Open Access Article
Pinar Seyitdanlioglu*
Department of Chemistry, Faculty of Science, Hacettepe University, Ankara, Türkiye. E-mail: pinarseyit@gmail.com; pseyitdanlioglu@hacettepe.edu.tr
First published on 16th June 2026
The spectral compatibility between an organic absorber and the illumination source is an important but often under-quantified descriptor in computational screening of organic photovoltaic materials, particularly for indoor applications where light sources have narrow and source-dependent emission profiles. Here, we introduce overlap-calculator, an open-source Python workflow for reproducible batch analysis of spectral overlap between molecular absorption spectra and solar or indoor reference light sources. The workflow accepts Gaussian TD-DFT output files and tabular UV-vis spectra in CSV or Excel formats within a common manifest-driven pipeline. TD-DFT transitions are reconstructed into continuous absorption profiles using Gaussian and Lorentzian broadening, converted into absorptance through a Beer–Lambert treatment, and compared with AM1.5G, CIE LED, fluorescent, or user-supplied spectra. The resulting descriptors include absorbed flux, absorbed fraction, and max-normalised shape overlap. The workflow is demonstrated using five previously generated TD-DFT OPV candidate absorbers and eight public organic UV-vis spectra used as spreadsheet-input examples. The TD-DFT case study illustrates automated transition parsing, spectral reconstruction, and source-dependent ranking, whereas the spreadsheet-input case demonstrates that tabular UV-vis data can be processed through the same descriptor-generation pipeline. By converting a previously script-based interpretation step into a documented and reusable workflow with command-line, Python library, HTTP API, Docker, structured output, and per-run provenance manifest support, overlap-calculator provides a practical tool for transparent spectral-compatibility analysis in data-driven optoelectronic materials research.
Computational materials screening commonly evaluates descriptors such as the HOMO–LUMO gap, vertical excitation energy, oscillator strength, open-circuit voltage estimates, light-harvesting efficiency, reorganization energy, and charge-transfer metrics.16–18 In TD-DFT-based screening studies, the spectral suitability of candidate photovoltaic materials is commonly assessed using absorption maxima, oscillator strengths, light-harvesting efficiencies, and simulated UV-vis absorption profiles.19–21 These descriptors remain essential, yet they do not by themselves quantify how much of a given light-source spectrum overlaps with the wavelength-dependent absorptance of a candidate material.22,23 The absorption maximum alone is particularly insufficient for indoor OPVs: two molecules with similar peak positions may show different spectral widths, line shapes, and degrees of overlap with LED or fluorescent emission bands. Conversely, molecules with different absorption maxima may exhibit comparable spectral compatibility when their broadened absorption profiles cover the same high-intensity region of the light source.22,24,25
Although useful for early-stage screening, such descriptor-level or visual interpretation can limit reproducibility and make it difficult to compare TD-DFT-derived transition data and spreadsheet-based spectra on a common quantitative basis.26,27 In practice, spectral compatibility is often evaluated by inspecting overlaid absorption and light-source spectra or by applying small workflow-specific calculations. While useful, such approaches can obscure processing choices such as spectral interpolation, normalization, broadening, and integration limits.22,26,28 This is a natural starting point during exploratory research, but it creates reproducibility challenges. First, theoretical TD-DFT spectra and experimental UV-vis spectra are often handled with different scripts or assumptions. Second, the choice of broadening function, wavelength grid, concentration, path length, normalization, and light-source data can remain implicit. Third, the resulting plots and tables are rarely generated from a single documented workflow that can be rerun on a new molecule set. These issues are especially important when spectral compatibility is used to rank candidates or justify molecular design decisions.
To address this need, the present work introduces overlap-calculator as a reproducible Python workflow for batch spectral-overlap analysis between molecular absorption spectra and solar or indoor light sources. The implementation supports multiple solar and indoor light sources, Gaussian TD-DFT outputs, spreadsheet UV-vis spectra, structured exports, command-line and Python API access, Docker deployment, provenance-aware output generation, and publication-quality visualization. Rather than replacing established electronic-structure or workflow-management software, overlap-calculator provides a focused post-processing layer that standardizes the transformation of absorption information into light-source-dependent overlap descriptors. This makes the analysis easier to audit, repeat, and extend.
The package accepts both theoretical and experimental absorption data within the same analysis run. Theoretical inputs are Gaussian TD-DFT output files,29 from which excited-state wavelengths and oscillator strengths are extracted. Experimental inputs are tabular UV-vis spectra supplied as CSV files or Excel workbooks. By treating both input families as first-class data sources, the workflow allows computed and measured spectra to be processed using the same downstream descriptors. This is particularly useful when TD-DFT predictions are compared with available UV-vis measurements or when a computational screening study is extended toward experimental validation.
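As a hedged illustration of the extraction step, the sketch below pulls wavelengths and oscillator strengths from the "Excited State" summary lines that Gaussian prints for TD-DFT jobs. The function name `parse_excited_states`, the regular expression, and the sample log text are assumptions made for this example; they are not the parser shipped with overlap-calculator.

```python
import re
from typing import List, Tuple

# Matches Gaussian TD-DFT summary lines of the form:
#  Excited State   1:      Singlet-A      2.5000 eV  495.94 nm  f=0.8500  <S**2>=0.000
EXCITED_STATE = re.compile(
    r"Excited State\s+(\d+):\s+\S+\s+([\d.]+)\s+eV\s+([\d.]+)\s+nm\s+f=([\d.]+)"
)


def parse_excited_states(log_text: str) -> List[Tuple[int, float, float]]:
    """Return (state index, wavelength in nm, oscillator strength) per transition."""
    states = []
    for match in EXCITED_STATE.finditer(log_text):
        index = int(match.group(1))
        wavelength_nm = float(match.group(3))
        oscillator_strength = float(match.group(4))
        states.append((index, wavelength_nm, oscillator_strength))
    return states


# Hypothetical log fragment used only to exercise the sketch.
sample_log = """
 Excited State   1:      Singlet-A      2.5000 eV  495.94 nm  f=0.8500  <S**2>=0.000
      120 ->121         0.70200
 Excited State   2:      Singlet-A      3.1000 eV  399.95 nm  f=0.0420  <S**2>=0.000
"""
states = parse_excited_states(sample_log)
```

Orbital-contribution lines between the summary lines are ignored because they do not match the pattern.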
The workflow is organised into six main stages. First, input files are resolved through a JSON manifest that can be generated automatically from a directory of raw files or edited manually by the user. Second, theoretical and experimental inputs are parsed using input-specific loaders. Gaussian TD-DFT files are processed to extract discrete excited-state information, whereas tabular spectra are read as wavelength-dependent absorbance data. Third, TD-DFT transitions are reconstructed as continuous absorption spectra on a shared wavelength grid using Gaussian and Lorentzian line-shape functions. Fourth, the absorption information is converted into absorptance through the Beer–Lambert relationship. Fifth, the resulting absorptance spectra are integrated against selected reference light-source spectra. Finally, the workflow exports structured result tables, descriptor summaries, skip reports, timing information, and publication-quality plots (Fig. 1). The main components of the workflow and their scientific roles are summarised in Table 1. The exact mixed input manifest used in the demonstration is provided in Table S2 of the SI.
| Component | Role in the workflow | Scientific purpose |
|---|---|---|
| input.json manifest | Defines sample identifiers, file paths, input type, series name, and sheet name | Makes the analysed dataset explicit and rerunnable |
| TD-DFT parser | Extracts excited-state wavelengths, oscillator strengths, and route metadata from Gaussian outputs | Transforms discrete excited-state information into spectral inputs |
| Experimental loader | Reads wavelength and absorbance columns from CSV or Excel files | Allows measured UV-vis spectra to be processed with the same overlap descriptors |
| Light-source loader | Loads AM1.5G, CIE LED/FL, or custom two-column spectra | Enables source-specific spectral compatibility analysis |
| Descriptor exporter | Writes results, summary tables, skip reports, timings, and plots | Supports ranking, troubleshooting, and publication-ready visualisation |
| Calibration block (optional) | Applies $E_{\mathrm{cal},i} = a\,E_{\mathrm{calc},i} + b$, $f_{\mathrm{cal}} = a \times f$ | Allows per-band overrides before broadening; the identity calibration leaves results bit-identical |
This manifest-driven design avoids hard-coded file paths in the analysis code and makes the analysed dataset explicit. It also improves traceability because each result row can be linked back to a specific raw file, sample identifier, sheet, and spectral series. As a result, the same analysis can be rerun on the same dataset or extended to a new molecule set without modifying the workflow code.
A mixed theoretical-experimental manifest can contain Gaussian TD-DFT outputs and spreadsheet-based UV-vis spectra in the same run:
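A minimal sketch of such a manifest is given below, built and serialised from Python. The key names (`samples`, `sample_id`, `path`, `input_type`, `sheet`, `series`) and the file paths are hypothetical and chosen only to mirror the fields listed in Table 1; the actual schema is the one given in Table S2 of the SI.

```python
import json

# Hypothetical manifest layout: every key name and path here is an
# illustrative assumption, not the package's documented schema.
manifest = {
    "samples": [
        {
            "sample_id": "3e_td",
            "path": "raw/3e_td.log",
            "input_type": "tddft",
        },
        {
            "sample_id": "dye_workbook",
            "path": "raw/uvvis_spectra.xlsx",
            "input_type": "experimental",
            "sheet": "Sheet1",
            "series": "example_series",
        },
    ]
}

# Serialise to JSON text and read it back to confirm a lossless round trip.
manifest_text = json.dumps(manifest, indent=2)
round_trip = json.loads(manifest_text)
```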
Although overlap-calculator is presented primarily through its command-line interface, the underlying analysis is exposed as an importable Python library: the spectrum-loading, reconstruction, overlap-integral, and descriptor routines are public, typed functions that can be called directly from a notebook or a larger pipeline. This positions the tool for orchestration by established workflow-management systems for computational materials science—AiiDA, jobflow, and pyiron—in which it would act as a calculator node that consumes upstream TD-DFT results and absorption spectra.30–32 Native plug-ins for these engines are not implemented in the present release and are noted as integration targets; the structured input contract and per-row provenance are also deliberately compatible with the recently proposed Python workflow definition exchange format, which we identify as future work for cross-engine portability.33
For each TD-DFT transition i, the excitation wavelength is converted to a wavenumber scale and represented by a normalised line profile. The wavelength grid is converted according to
| $\nu = 10^{7}/\lambda$ ($\lambda$ in nm; $\nu$ in cm$^{-1}$) | (1) |
The Gaussian and Lorentzian profiles are then defined as:
| $g_i(\nu) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{(\nu - \nu_i)^2}{2\sigma^2}\right\}$ | (2) |
![]() | (3) |
![]() | (4) |
Here, σ denotes the Gaussian standard deviation. For the Lorentzian profile, the half-width-at-half-maximum γ is defined as $\gamma = \sigma\sqrt{2\ln 2}$, ensuring that the Gaussian and Lorentzian line shapes have the same FWHM for a given σ value. This choice removes an implicit width mismatch between the two broadening models, so that the reported Gaussian-minus-Lorentzian delta columns primarily reflect differences in tail behaviour rather than differences in spectral width. The molar extinction coefficient is reconstructed as a weighted sum over oscillator strengths:
| $\varepsilon(\lambda) = 2.315 \times 10^{8} \sum_i f_i\, p_i(\nu(\lambda))$ [M$^{-1}$ cm$^{-1}$] | (5) |
The prefactor $P = 2.315 \times 10^{8}$ M$^{-1}$ cm$^{-2}$ in eqn (2) and (5) is the inverse of the constant in the integrated-intensity relation $f_i = 4.319 \times 10^{-9} \int \varepsilon_i(\nu)\,d\nu$ (with ν in cm$^{-1}$ and $\varepsilon_i$ in M$^{-1}$ cm$^{-1}$), distributed over a line shape normalised to unit area on the wavenumber axis. Taking P outside the integral is exact in the narrow-line limit $\sigma \ll \nu_i$ and otherwise incurs a fractional error of order $(\sigma/\nu_i)^2$. For the default broadening σ = 0.30 eV and the 200–800 nm window targeted here, the raw $(\sigma/\nu)^2$ scale reaches ∼3.8% at the red edge (λ = 800 nm, ν = 12 500 cm$^{-1}$). However, because the leading ν-resolved correction integrates out by symmetry for a symmetric line shape, the residual error on the band area and peak stays below ∼2% over 200–800 nm, well within the uncertainty already introduced by the phenomenological σ. The constant-prefactor form is therefore retained as the default for UV-vis screening, and a frequency-resolved option is provided for users extending the workflow into the near-infrared or to lower energies, where a ν-resolved P(ν) must be evaluated inside the integral.34,35 The wavelength grid, broadening width, and numerical integration window are user-controllable.
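The two numbers quoted above can be checked with a few lines of arithmetic. The sketch below inverts the integrated-intensity constant to recover P and evaluates the raw $(\sigma/\nu)^2$ scale at 800 nm; the variable names are illustrative, and the eV-to-cm$^{-1}$ factor is the one used elsewhere in this section.

```python
EV_TO_CM1 = 8065.544           # cm^-1 per eV

# Prefactor P as the inverse of the constant in f = 4.319e-9 * integral(eps dnu).
prefactor = 1.0 / 4.319e-9     # M^-1 cm^-2

# Default broadening width converted from eV to cm^-1.
sigma_cm1 = 0.30 * EV_TO_CM1

# Red edge of the 200-800 nm window on the wavenumber axis, eqn (1).
nu_red_edge = 1.0e7 / 800.0    # cm^-1

# Raw (sigma/nu)^2 scale of the constant-prefactor approximation.
error_scale = (sigma_cm1 / nu_red_edge) ** 2
```

With these inputs the prefactor evaluates to about 2.315 × 10^8 and the error scale lies between 3.7% and 3.8%, in line with the values stated in the text.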
The core reconstruction step can be represented as follows:
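A minimal, dependency-free sketch of this step is shown here under stated assumptions: the Gaussian follows eqn (2), the Lorentzian is written in the standard unit-area form with half-width $\gamma = \sigma\sqrt{2\ln 2}$ (assumed for this example), and the sum follows eqn (5) with the constant prefactor. The function names and the single test transition are hypothetical and do not reproduce the package's own routines.

```python
import math
from typing import List, Sequence, Tuple

EV_TO_CM1 = 8065.544   # cm^-1 per eV
PREFACTOR = 2.315e8    # M^-1 cm^-2, constant prefactor of eqn (5)


def gaussian(nu: float, nu_i: float, sigma: float) -> float:
    """Unit-area Gaussian on the wavenumber axis, eqn (2)."""
    return math.exp(-((nu - nu_i) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def lorentzian(nu: float, nu_i: float, gamma: float) -> float:
    """Unit-area Lorentzian with half-width-at-half-maximum gamma (assumed form)."""
    return (gamma / math.pi) / ((nu - nu_i) ** 2 + gamma ** 2)


def reconstruct_epsilon(
    wavelengths_nm: Sequence[float],
    transitions: Sequence[Tuple[float, float]],
    sigma_ev: float = 0.30,
    shape: str = "gaussian",
) -> List[float]:
    """Molar extinction coefficient on a wavelength grid from (nm, f) pairs."""
    sigma = sigma_ev * EV_TO_CM1
    gamma = sigma * math.sqrt(2.0 * math.log(2.0))  # matches the Gaussian FWHM
    spectrum = []
    for lam in wavelengths_nm:
        nu = 1.0e7 / lam  # eqn (1)
        total = 0.0
        for lam_i, f_i in transitions:
            nu_i = 1.0e7 / lam_i
            if shape == "gaussian":
                total += f_i * gaussian(nu, nu_i, sigma)
            else:
                total += f_i * lorentzian(nu, nu_i, gamma)
        spectrum.append(PREFACTOR * total)
    return spectrum


grid_nm = [200.0 + 0.5 * k for k in range(1201)]   # 200-800 nm in 0.5 nm steps
transitions = [(500.0, 1.0)]                       # hypothetical single transition
eps_gauss = reconstruct_epsilon(grid_nm, transitions, shape="gaussian")
eps_lorentz = reconstruct_epsilon(grid_nm, transitions, shape="lorentzian")
```

Because both profiles share the same FWHM and unit area, the Lorentzian peak is lower and its tails are heavier, which is the difference the Gaussian-versus-Lorentzian comparison is meant to expose.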
The use of two line-shape functions is not intended to imply that either function universally reproduces an experimental UV-vis spectrum. Instead, it provides a transparent sensitivity check. If the ranking of candidate absorbers or light sources remains stable across Gaussian and Lorentzian reconstructions, the conclusion is less likely to be an artefact of the selected spectral reconstruction model. The user-facing parameter is given in eV; the equations use σ in cm$^{-1}$ via $\sigma\,[\mathrm{cm}^{-1}] = 8065.544\,\sigma\,[\mathrm{eV}]$. Additional implementation details for spectral reconstruction and absorptance conversion are provided in Section S5 of the SI.
Both theoretical and experimental spectra are converted into wavelength-dependent absorptance before overlap descriptors are calculated. For TD-DFT-derived spectra, absorbance is obtained from the reconstructed molar extinction coefficient using the Beer–Lambert relationship:
| $A(\lambda) = \varepsilon(\lambda)\,c\,L$ | (6) |
| $\alpha(\lambda) = 1 - 10^{-A(\lambda)}$ | (7) |
Experimental spectra are treated as absorbance directly and passed through the same absorptance definition, eqn (7).
The runtime Beer–Lambert reference state and wavelength-grid settings used in the manuscript are listed in Section S3 of the SI.
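The two conversion routes can be sketched as follows, assuming the reference state c = 1 × 10^-5 M and L = 1 cm quoted in this article. The function names are illustrative and are not taken from the package.

```python
from typing import List, Sequence


def absorptance_from_epsilon(
    epsilon: Sequence[float], conc_molar: float = 1.0e-5, path_cm: float = 1.0
) -> List[float]:
    """Apply eqn (6), A = eps * c * L, followed by eqn (7), alpha = 1 - 10^(-A)."""
    return [1.0 - 10.0 ** (-(eps * conc_molar * path_cm)) for eps in epsilon]


def absorptance_from_absorbance(absorbance: Sequence[float]) -> List[float]:
    """Apply eqn (7) directly to measured absorbance values."""
    return [1.0 - 10.0 ** (-a) for a in absorbance]


# Hypothetical extinction coefficients (M^-1 cm^-1) and the matching absorbances.
alpha_theory = absorptance_from_epsilon([0.0, 1.0e4, 1.0e5])
alpha_measured = absorptance_from_absorbance([0.0, 0.1, 1.0])
```

Both routes end in the same absorptance definition, so the downstream overlap integrals do not need to know which input branch a spectrum came from.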
For each sample and light source, the workflow computes the following integrals over the selected wavelength grid:
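| $\text{absorbed\_flux} = \int \alpha(\lambda)\, I(\lambda)\, d\lambda$ | (8) |
| $\text{light\_flux\_total} = \int I(\lambda)\, d\lambda$ | (9) |
| $\text{absorbed\_fraction} = \text{absorbed\_flux} / \text{light\_flux\_total}$ | (10) |
| $\text{shape\_overlap} = \dfrac{\int \min\left(\hat{\alpha}(\lambda), \hat{I}(\lambda)\right) d\lambda}{\int \hat{I}(\lambda)\, d\lambda}$ | (11) |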
Here I(λ) is the light-source spectral distribution, and hats indicate max-normalised profiles. The absorbed_fraction descriptor reports the fraction of the available source flux captured by the absorptance profile within the grid, while shape_overlap focuses on the relative matching of spectral shapes. The latter is particularly useful for CIE LED and fluorescent spectra, which are provided as relative spectral power distributions.
Once the molecular absorptance and the light-source spectrum are defined on the same wavelength grid, the absorbed flux, absorbed fraction, and shape-overlap descriptors are calculated numerically:
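A hedged sketch of this calculation is given below. It assumes trapezoidal integration on the shared grid, a source with non-zero total flux, and max-normalisation of both profiles in the shape-overlap term, following the convention stated for the hatted quantities. The helper names and the flat test spectra are hypothetical.

```python
from typing import Dict, List, Sequence


def trapezoid(y: Sequence[float], x: Sequence[float]) -> float:
    """Trapezoidal integral of y over x (integration rule assumed for this sketch)."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def max_normalise(values: Sequence[float]) -> List[float]:
    """Scale a profile so that its maximum equals one."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


def overlap_descriptors(
    wavelengths_nm: Sequence[float], alpha: Sequence[float], intensity: Sequence[float]
) -> Dict[str, float]:
    """Evaluate eqn (8)-(11) for one sample and one light source."""
    absorbed_flux = trapezoid([a * i for a, i in zip(alpha, intensity)], wavelengths_nm)
    light_flux_total = trapezoid(intensity, wavelengths_nm)
    alpha_hat = max_normalise(alpha)
    intensity_hat = max_normalise(intensity)
    shared = [min(a, i) for a, i in zip(alpha_hat, intensity_hat)]
    return {
        "absorbed_flux": absorbed_flux,
        "light_flux_total": light_flux_total,
        "absorbed_fraction": absorbed_flux / light_flux_total,
        "shape_overlap": trapezoid(shared, wavelengths_nm) / trapezoid(intensity_hat, wavelengths_nm),
    }


grid = [400.0 + k for k in range(301)]   # 400-700 nm in 1 nm steps
flat_source = [1.0] * len(grid)          # hypothetical flat light source
half_absorber = [0.5] * len(grid)        # hypothetical absorptance of 0.5 everywhere
result = overlap_descriptors(grid, half_absorber, flat_source)
```

In this toy case the absorber captures half of the source flux while its normalised shape matches the source exactly, which illustrates why absorbed_fraction and shape_overlap answer different questions.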
The light-source definitions and descriptor equations used in the demonstration are summarised in Sections S6 and S7 of the SI.
The most relevant runtime parameters are the broadening width, wavelength range, number of grid points, reference concentration, path length, default light sources, custom light-source files, plot generation flag, and logging configuration. Because the manifest is prepared first and the numerical analysis is run second, this two-step structure allows the dataset description to be inspected and archived before the analysis is executed, and it makes it possible to rerun the same analysis with different settings without changing the raw input files.
In the default broadening path, a single phenomenological width σ is applied and the Lorentzian half-width is set to $\gamma = \sigma\sqrt{2\ln 2}$, ensuring that the Gaussian and Lorentzian line shapes have matched FWHM values. The width σ is identical for every transition, is fully under user control (default σ = 0.30 eV), and is not derived from any underlying physical width: no reorganisation energy is partitioned into the band shape, no Huang–Rhys factors are evaluated, and no Duschinsky mixing is included. This convention is retained because the workflow is a descriptor calculator built on top of an absorption profile, not a band-shape simulator, and because a single transparent parameter is more useful for screening than a hidden composite of unverified assumptions. To let users audit robustness, the pipeline emits Gaussian, Lorentzian, and delta columns side by side and supports a σ sweep. When quantitative band shapes are required (vibronic progressions, mirror-image symmetry with fluorescence, accurate 0–0 placement, or low-temperature spectra), this convenience path is not the recommended route. Two physically grounded alternatives feed the same descriptor pipeline. First, a Marcus-type Gaussian width, $\sigma = \sqrt{2\lambda k_{\mathrm{B}}T}$, where λ here denotes the reorganisation energy rather than the wavelength, is available as a built-in broadening mode and through the library entry point marcus_hush_sigma_ev, which assigns the classical high-temperature width per transition before broadening. At λ = 0.30 eV and T = 298.15 K this gives σ = 0.124 eV (FWHM 0.292 eV). Because this is the classical limit, it is only a first-order estimate for organic vibrational baths with $\hbar\omega \gg k_{\mathrm{B}}T$. Second, and recommended for production work, a fully vibronic Franck–Condon or Franck–Condon plus Herzberg–Teller spectrum can be computed with Gaussian (Freq = FC), DUSHIN, or FCclasses, exported on a wavelength grid, and ingested through the tabular-input branch.40–44
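The quoted Marcus-type width can be reproduced with the classical high-temperature expression $\sigma = \sqrt{2\lambda k_{\mathrm{B}}T}$, with λ the reorganisation energy. The sketch below is a stand-alone check; `marcus_sigma_sketch` is a hypothetical helper written for this example and is not the package's `marcus_hush_sigma_ev` function.

```python
import math

K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def marcus_sigma_sketch(reorg_energy_ev: float, temperature_k: float = 298.15) -> float:
    """Classical high-temperature Gaussian width, sigma = sqrt(2 * lambda * kB * T)."""
    return math.sqrt(2.0 * reorg_energy_ev * K_B_EV_PER_K * temperature_k)


sigma_ev = marcus_sigma_sketch(0.30)
# Gaussian FWHM = 2 * sqrt(2 ln 2) * sigma.
fwhm_ev = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_ev
```

For a reorganisation energy of 0.30 eV at 298.15 K this evaluates to the σ ≈ 0.124 eV and FWHM ≈ 0.292 eV stated above.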
The main output table contains one row per sample and light-source pair, with separate columns for Gaussian and Lorentzian broadening. The most important descriptors are the peak absorptance wavelength, peak absorptance, absorbed flux, absorbed fraction, and shape overlap (Table 2). Additional delta columns report the difference between Gaussian and Lorentzian metrics. Because the tables and plots are generated from the same documented workflow, the numerical descriptors and visual outputs remain consistent with one another.
| Descriptor | Definition or interpretation | Suggested use |
|---|---|---|
| absorbed_flux | Integral of absorptance multiplied by the light-source spectrum | Quantifies source-weighted absorbed light; unit depends on the source spectrum |
| absorbed_fraction | absorbed_flux divided by total light-source flux over the grid | Compares how much of the selected source is absorbed by a sample |
| shape_overlap | Integral comparator using max-normalised absorptance and light-source shapes | Useful when light-source data are relative, especially for LED and fluorescent spectra |
| Delta metrics | Gaussian minus Lorentzian descriptor values | Checks sensitivity to the broadening model |
| descriptor_summary | Condensed per-sample table across light sources | Starting point for ranking candidate absorbers |
Two grouped ranking tables, one with samples ordered within each light source and one with light sources ordered within each sample, are emitted in CSV, JSON, and XLSX formats for all four overlap metrics (gaussian/lorentzian × absorbed_fraction/shape_overlap). Companion bar charts accompany each table. The generated output files and ranking-table families are listed in Section S8 of the SI.
The design of overlap-calculator is aligned with the FAIR Principles for Research Software (FAIR4RS), which adapt the FAIR Guiding Principles from data to executable software.45,46 The source code is openly developed under a permissive licence, every tagged release is archived with a persistent Zenodo DOI, and the dependency surface is reproducible through a pinned environment and a versioned container image, supporting the findable, accessible, and reusable dimensions; interoperability is addressed by reading widely supported tabular inputs and emitting structured, schema-stable outputs. Throughout this manuscript we use the term traceability in the sense of computational provenance: every analyze run writes a run_manifest.json at the output root recording the software version, git commit, UTC timestamp, the full resolved parameter set, the light-source set, and one entry per (source, sample, series, sheet) with the SHA-256 of each input file. Each descriptor row additionally carries its own provenance fields and is linked via the input manifest to its originating file, sheet, series, sample, and spectrum, so that a third party can re-execute the analysis and reproduce the descriptors from the archived artefacts alone.27,47
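The kind of per-run record described here can be sketched with the standard library alone. The field names in the record (`timestamp_utc`, `parameters`, `inputs`) and the helper functions are assumptions for illustration; the actual layout of run_manifest.json is defined by the package.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large inputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(input_paths: List[Path], parameters: Dict[str, float]) -> dict:
    """Assemble a provenance record (hypothetical field names)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,
        "inputs": [{"path": str(p), "sha256": sha256_of_file(p)} for p in input_paths],
    }


# Exercise the sketch on a throwaway file so the example is self-contained.
demo_bytes = b"wavelength_nm,absorbance\n500,0.5\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_input.csv"
    demo_file.write_bytes(demo_bytes)
    record = build_run_record([demo_file], {"sigma_ev": 0.30})
    record_json = json.dumps(record, indent=2)
```

Storing the hash next to the resolved parameters lets a later reader confirm that a rerun used byte-identical inputs.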
Fig. 2 Representative TD-DFT-derived absorption profile reconstructed using Gaussian and Lorentzian broadening.
Fig. 3 Representative spectral-overlap plot between a TD-DFT-derived absorptance spectrum and AM1.5G illumination.
Fig. 4 Representative spectral-overlap plot between a TD-DFT-derived absorptance spectrum and LED B2 illumination.
The demonstration inputs are not a representative chemical space: the five TD-DFT files are a small B3LYP/6-31G donor–acceptor set, and the PhotochemCAD-derived spreadsheet spectra were chosen as well-characterised, strongly absorbing visible chromophores. They are used solely to exercise the two input branches of the workflow and are not proposed as a screening corpus or as OPV candidate absorbers.
Together, the demonstrations address three practical questions: (i) how Gaussian and Lorentzian broadening affect TD-DFT-derived overlap descriptors, (ii) whether candidate rankings change between outdoor and indoor light sources, and (iii) whether UV-vis spectra that do not originate from Gaussian output files and are instead arranged in an Excel workbook can be converted into the same descriptor tables and plots as Gaussian output files. Details of the theoretical and spreadsheet-input branches used in the demonstration are provided in Sections S2–S4 of the SI.
For each candidate absorber, Gaussian TD-DFT output files are parsed to extract excited-state wavelengths and oscillator strengths. The spectra are reconstructed over the same wavelength grid using both broadening models. The resulting absorptance profiles are then integrated against AM1.5G, LEDB2, LEDB3, LEDB4, and CIEFL10 spectra. This produces a matrix of light-source-dependent descriptors suitable for ranking and visualisation.
The molar extinction coefficient spectra were converted into absorbance-like profiles using $A(\lambda) = \varepsilon(\lambda)cL$ with a fixed reference concentration of $1 \times 10^{-5}$ M and a 1 cm optical path length. The reference concentration c and path length L are state parameters of the Beer–Lambert conversion, not fit parameters. In the optically thin limit, shape_overlap is invariant under any global scaling of εcL, and absorbed_fraction is a monotone-saturating function of cL that scales linearly with it for A ≪ 1; only absorbed_flux carries the absolute scale. For screening we therefore recommend choosing cL so that the peak reference absorbance lies in the range $A_{\max} \in [0.1, 1]$, the well-conditioned regime of experimental cuvette spectroscopy. For typical organic chromophores ($\varepsilon_{\max} \sim 10^{4}$–$10^{5}$ M$^{-1}$ cm$^{-1}$) the defaults c = $1 \times 10^{-5}$ M and L = 1 cm satisfy this, and both values are written to every output row, so no iterative optimisation of c and L is required. The resulting values were arranged in an Excel workbook with wavelength_nm in the first column and each molecule as an independent spectral series.
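The recommendation on cL can be checked with simple Beer–Lambert arithmetic. The sketch below evaluates the peak absorbance at the default reference state for three illustrative extinction coefficients and shows how a concentration could be chosen for a stronger chromophore. The helper names and the value 2 × 10^5 M^-1 cm^-1 are hypothetical.

```python
def peak_absorbance(eps_max: float, conc_molar: float = 1.0e-5, path_cm: float = 1.0) -> float:
    """Peak absorbance A_max = eps_max * c * L."""
    return eps_max * conc_molar * path_cm


def concentration_for_target(
    eps_max: float, target_absorbance: float = 0.5, path_cm: float = 1.0
) -> float:
    """Concentration that places the peak absorbance at a chosen target value."""
    return target_absorbance / (eps_max * path_cm)


# Peak absorbance at the default reference state for typical extinction coefficients.
eps_values = [1.0e4, 5.0e4, 1.0e5]                      # M^-1 cm^-1
a_max_defaults = [peak_absorbance(eps) for eps in eps_values]

# A hypothetical stronger chromophore would exceed A_max = 1 at the defaults,
# so a lower reference concentration is derived for it instead.
c_for_strong_dye = concentration_for_target(2.0e5, target_absorbance=0.5)
```

The three default cases span roughly 0.1 to 1, which is the window recommended above.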
This dataset is used only as a spreadsheet-input demonstration and is not intended as an experimental validation set for the TD-DFT-derived molecules (Fig. 5). It verifies that organic molecular UV-vis spectra supplied in tabular form can be ingested, converted to absorptance, compared with the same solar and indoor light-source spectra, and exported with the same output logic used for TD-DFT-derived spectra. The spreadsheet spectra were used only to demonstrate the tabular-input branch of the workflow and are not proposed as OPV candidate absorbers, as clarified in Section S4 of the SI.
Fig. 5 Representative spectral-overlap plot between a spreadsheet-input absorptance spectrum derived from an experimentally reported UV-vis spectrum and AM1.5G illumination.
This design is useful because it allows a project to start from TD-DFT screening data and later incorporate measured UV-vis spectra without changing the downstream analysis. The same descriptors can therefore be used for exploratory ranking, experimental follow-up, and reproducibility checks, while clearly distinguishing between calculated spectra, measured absorbance spectra, and public spreadsheet demonstration data.
Therefore, the workflow supports a design logic in which absorbers are not ranked by a single universal optical descriptor but by their compatibility with the intended operating environment. The top-ranked samples under each light source are provided in Table S7 of the SI.
The ranking analysis confirms that the spectral compatibility of the tested absorbers is strongly source-dependent. Under AM1.5G illumination, 4c_td gave the highest Gaussian shape-overlap value of 0.7952, followed by 3e_td with 0.7173 and 3g_td with 0.6437. Under LEDB4 illumination, 4c_td remained the top-ranked sample with a Gaussian shape-overlap value of 0.9643, but the order of the next two samples reversed: 3g_td (0.9536) ranked ahead of 3e_td (0.9374). For the representative TD-DFT-derived absorber 3e_td, the highest shape-overlap values were obtained under LEDB2, LEDB3, LEDB4, and CIEFL10, whereas the AM1.5G overlap was lower. This demonstrates that absorber ranking cannot be inferred from λmax alone and should be evaluated with respect to the intended illumination environment.
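The source dependence of the ordering can be made explicit by sorting the Gaussian shape-overlap values quoted above for the three samples named in the text. The dictionary layout and the `rank_samples` helper are illustrative only, and the remaining samples in the set are omitted because their values are not quoted here.

```python
from typing import Dict, List

# Gaussian shape-overlap values quoted in the text for three of the five samples.
shape_overlap_gaussian: Dict[str, Dict[str, float]] = {
    "AM1.5G": {"4c_td": 0.7952, "3e_td": 0.7173, "3g_td": 0.6437},
    "LEDB4": {"4c_td": 0.9643, "3g_td": 0.9536, "3e_td": 0.9374},
}


def rank_samples(values: Dict[str, float]) -> List[str]:
    """Order sample identifiers from highest to lowest descriptor value."""
    return sorted(values, key=values.get, reverse=True)


# One ranking per light source.
rankings = {source: rank_samples(vals) for source, vals in shape_overlap_gaussian.items()}
```

The top-ranked sample is the same under both sources, while the second and third positions swap between AM1.5G and LEDB4.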
For absolute irradiance spectra such as AM1.5G, absorbed_flux has a direct physical meaning within the simplified absorptance model. For relative indoor illuminant spectra, however, the absolute scale is arbitrary. In those cases, absorbed_fraction and shape_overlap are more informative for comparing molecules under the same light source. Shape overlap is especially useful when the goal is to quantify how well the positions and widths of absorption bands align with the emission profile of a lamp. However, it should not be interpreted as a device efficiency or as a replacement for experimental external quantum efficiency.
The descriptors reported by overlap-calculator are optical compatibility descriptors. They do not include exciton diffusion, morphology, donor–acceptor interfacial energetics, charge separation, transport, recombination, electrode losses, or optical interference in a real device stack. Therefore, they should be used as early-stage screening descriptors rather than final performance predictors. Their strength is that they quantify a necessary optical condition: if a candidate absorber does not overlap with the intended illumination source, high device performance is unlikely regardless of favourable electronic descriptors.
The current workflow has several limitations. TD-DFT reconstruction depends on the selected functional, basis set, solvent model, number of excited states, wavelength range, and broadening width. Beer–Lambert absorptance uses a simplified reference concentration and path length, and film optical constants and device-stack effects are not included. Experimental spectra are assumed to be absorbance spectra with a valid wavelength axis, so additional preprocessing may be needed for noisy or baseline-shifted data. Relative CIE LED and fluorescent spectra do not provide absolute absorbed power unless a calibrated intensity scale is supplied. The current descriptors quantify spectral compatibility, not EQE, Jsc, PCE, or long-term device stability. Interpretive limitations of the optical-overlap descriptors are discussed in Section S14 of the SI.
The resulting descriptors, including absorbed flux, absorbed fraction, and shape overlap, support light-source-dependent ranking of TD-DFT-derived OPV candidate absorbers while preserving transparency about the assumptions used in the calculation. The package is intended as a lightweight post-processing layer for computational and experimental optoelectronic materials research rather than as a replacement for electronic-structure calculations, spectroscopic measurements, or device-level photovoltaic characterization.
Future versions can extend the workflow in several directions. Support for EQE or IPCE spectra would allow the same light-source integration logic to estimate source-specific current-density limits. A graphical interface could make the workflow accessible to users who are less comfortable with command-line tools. Additional lamp libraries, calibrated spectral irradiance data, and user-defined spectral normalisation modes could further improve indoor PV analysis, while direct export of ranked summary reports could make the package more useful for high-throughput screening studies.
| This journal is © The Royal Society of Chemistry 2026 |