Open Access Article
This Open Access Article is licensed under a Creative Commons Attribution 3.0 Unported Licence

A reproducible Python workflow for absorber-light-source spectral matching: overlap-calculator

Pinar Seyitdanlioglu*
Department of Chemistry, Faculty of Science, Hacettepe University, Ankara, Türkiye. E-mail: pinarseyit@gmail.com; pseyitdanlioglu@hacettepe.edu.tr

Received 1st May 2026, Accepted 15th June 2026

First published on 16th June 2026


Abstract

The spectral compatibility between an organic absorber and the illumination source is an important but often under-quantified descriptor in computational screening of organic photovoltaic materials, particularly for indoor applications, where light sources have narrow, source-dependent emission profiles. Here, we introduce overlap-calculator, an open-source Python workflow for reproducible batch analysis of spectral overlap between molecular absorption spectra and solar or indoor reference light sources. The workflow accepts Gaussian TD-DFT output files and tabular UV-vis spectra in CSV or Excel format within a common manifest-driven pipeline. TD-DFT transitions are reconstructed into continuous absorption profiles using Gaussian and Lorentzian broadening, converted into absorptance through a Beer–Lambert treatment, and compared with AM1.5G, CIE LED, fluorescent, or user-supplied spectra. The resulting descriptors include absorbed flux, absorbed fraction, and max-normalised shape overlap. The workflow is demonstrated using previously generated TD-DFT outputs for five OPV candidate absorbers and eight publicly available organic UV-vis spectra supplied as spreadsheet inputs. The TD-DFT case study illustrates automated transition parsing, spectral reconstruction, and source-dependent ranking, whereas the spreadsheet case shows that tabular UV-vis data can be processed through the same descriptor-generation pipeline. By turning an interpretation step previously handled with ad hoc scripts into a documented, reusable workflow with command-line, Python-library, HTTP API, and Docker interfaces, structured outputs, and a per-run provenance manifest, overlap-calculator provides a practical tool for transparent spectral-compatibility analysis in data-driven optoelectronic materials research.


Introduction

Organic photovoltaics (OPVs) are attractive for emerging energy-harvesting applications because molecular structure can be tuned to control optical absorption, frontier orbital energies, excited-state character, and processing behaviour.1–3 Beyond outdoor solar conversion, OPVs are increasingly discussed for indoor and low-irradiance environments, where lightweight, flexible, and solution-processable devices could power distributed sensors or low-power Internet of Things (IoT) devices.4,5 This application space changes the screening problem. The photon flux available from common artificial sources (warm-white LEDs, fluorescent lamps) is concentrated in the visible window and is, per unit area, two to three orders of magnitude smaller than that of AM1.5G; this shifts the radiative-limit optimum towards wider-gap absorbers and amplifies the penalty of any mismatch between the absorption profile and the source spectrum. Recent indoor organic and perovskite cells have exceeded ∼28–30% power-conversion efficiency under indoor illuminants by exploiting this spectral match, so a quantitative descriptor of absorber–source overlap is a useful pre-synthesis screen. Because a sufficiently broad and inexpensive absorber may also be a viable route, spectral tailoring is best framed as a design advantage that indoor operation motivates, rather than a strict requirement.6–14 Under indoor lighting, the relevant illumination is no longer only a broad solar spectrum but also a source-specific emission profile, such as that of a phosphor-converted white LED or a fluorescent lamp. An absorber that appears promising under AM1.5G illumination may therefore be less suitable under a narrow or structured indoor spectrum.13–15

Computational materials screening commonly evaluates descriptors such as the HOMO–LUMO gap, vertical excitation energy, oscillator strength, open-circuit voltage estimates, light-harvesting efficiency, reorganization energy, and charge-transfer metrics.16–18 In TD-DFT-based screening studies, the spectral suitability of candidate photovoltaic materials is commonly assessed using absorption maxima, oscillator strengths, light-harvesting efficiencies, and simulated UV-vis absorption profiles.19–21 These descriptors remain essential, yet they do not by themselves quantify how much of a given light-source spectrum overlaps with the wavelength-dependent absorptance of a candidate material.22,23 The absorption maximum alone is an especially weak proxy for indoor OPVs: two molecules with similar peak positions may show different spectral widths, line shapes, and degrees of overlap with LED or fluorescent emission bands. Conversely, molecules with different absorption maxima may exhibit comparable spectral compatibility when their broadened absorption profiles cover the same high-intensity region of the light source.22,24,25

In practice, spectral compatibility is often evaluated by inspecting overlaid absorption and light-source spectra or by applying small, workflow-specific calculations. This is a natural starting point during exploratory research and remains useful for early-stage screening, but such descriptor-level or visual interpretation can obscure processing choices such as spectral interpolation, normalisation, broadening, and integration limits, and it makes it difficult to compare TD-DFT-derived transition data and spreadsheet-based spectra on a common quantitative basis.22,26–28 Three reproducibility problems follow. First, theoretical TD-DFT spectra and experimental UV-vis spectra are often handled with different scripts or assumptions. Second, the choice of broadening function, wavelength grid, concentration, path length, normalisation, and light-source data can remain implicit. Third, the resulting plots and tables are rarely generated from a single documented workflow that can be rerun on a new molecule set. These issues matter most when spectral compatibility is used to rank candidates or justify molecular design decisions.

To address this need, the present work introduces overlap-calculator as a reproducible Python workflow for batch spectral-overlap analysis between molecular absorption spectra and solar or indoor light sources. The implementation supports multiple solar and indoor light sources, Gaussian TD-DFT outputs, spreadsheet UV-vis spectra, structured exports, command-line and Python API access, Docker deployment, provenance-aware output generation, and publication-quality visualization. Rather than replacing established electronic-structure or workflow-management software, overlap-calculator provides a focused post-processing layer that standardizes the transformation of absorption information into light-source-dependent overlap descriptors. This makes the analysis easier to audit, repeat, and extend.

Workflow definition of overlap-calculator

The design of overlap-calculator follows a workflow-centred strategy in which the scientific operation of spectral-overlap analysis is separated from the technical details of file discovery, parsing, interpolation, spectral broadening, absorptance conversion, light-source handling, and output generation. The workflow was developed to support a recurring post-processing task in photovoltaic materials screening: comparing molecular absorption spectra with solar or indoor illumination spectra on a common quantitative basis (Fig. 1).
Fig. 1 Conceptual workflow of overlap-calculator.

The package accepts both theoretical and experimental absorption data within the same analysis run. Theoretical inputs are Gaussian TD-DFT output files,29 from which excited-state wavelengths and oscillator strengths are extracted. Experimental inputs are tabular UV-vis spectra supplied as CSV files or Excel workbooks. By treating both input families as first-class data sources, the workflow processes computed and measured spectra with the same downstream descriptors. This is particularly useful when TD-DFT predictions are compared with available UV-vis measurements or when a computational screening study is extended toward experimental validation.

The workflow is organised into six main stages. First, input files are resolved through a JSON manifest that can be generated automatically from a directory of raw files or edited manually by the user. Second, theoretical and experimental inputs are parsed using input-specific loaders. Gaussian TD-DFT files are processed to extract discrete excited-state information, whereas tabular spectra are read as wavelength-dependent absorbance data. Third, TD-DFT transitions are reconstructed as continuous absorption spectra on a shared wavelength grid using Gaussian and Lorentzian line-shape functions. Fourth, the absorption information is converted into absorptance through the Beer–Lambert relationship. Fifth, the resulting absorptance spectra are integrated against selected reference light-source spectra. Finally, the workflow exports structured result tables, descriptor summaries, skip reports, timing information, and publication-quality plots (Fig. 1). The main components of the workflow and their scientific roles are summarised in Table 1. The exact mixed input manifest used in the demonstration is provided in Table S2 of the SI.

Table 1 Main components of the overlap-calculator workflow and their scientific roles in reproducible spectral-overlap analysis
Component | Role in the workflow | Scientific purpose
input.json manifest | Defines sample identifiers, file paths, input type, series name, and sheet name | Makes the analysed dataset explicit and rerunnable
TD-DFT parser | Extracts excited-state wavelengths, oscillator strengths, and route metadata from Gaussian outputs | Transforms discrete excited-state information into spectral inputs
Experimental loader | Reads wavelength and absorbance columns from CSV or Excel files | Allows measured UV-vis spectra to be processed with the same overlap descriptors
Light-source loader | Loads AM1.5G, CIE LED/FL, or custom two-column spectra | Enables source-specific spectral compatibility analysis
Descriptor exporter | Writes results, summary tables, skip reports, timings, and plots | Supports ranking, troubleshooting, and publication-ready visualisation
Calibration block (optional) | Applies E_i^cal = a·E_i^calc + b and f^cal = α × f, with optional per-band overrides, before broadening | Corrects systematic TD-DFT errors; the identity calibration is bit-identical to an uncalibrated run


Input families and manifest-driven analysis

Overlap-calculator treats theoretical and experimental absorption data as two first-class input families. The theoretical path accepts Gaussian TD-DFT output files with .out or .log extensions. The experimental path accepts comma-separated values (CSV) files and Excel workbooks, including multi-sheet and multi-series datasets. Both families can be mixed in a single analysis run, so that computational screening results can be compared directly with available UV-vis measurements.

Each input is declared in an input.json manifest that records the sample identifier, file path, and input type and, for spreadsheets, the sheet and series names (Table 1). This manifest-driven design avoids hard-coded file paths in the analysis code and makes the analysed dataset explicit. It also improves traceability, because each result row can be linked back to a specific raw file, sample identifier, sheet, and spectral series. The same analysis can therefore be rerun on the same dataset, or extended to a new molecule set, without modifying the workflow code.

A mixed theoretical-experimental manifest can contain Gaussian TD-DFT outputs and spreadsheet-based UV-vis spectra in the same run:

image file: d6dd00247a-u1.tif
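As a rough illustration of what a mixed manifest records and of the checks it makes possible, the Python sketch below builds one and validates it. The key names (sample_id, path, input_type, sheet, series), the input_type values, the accepted suffixes, and validate_manifest are our own hypothetical stand-ins for the fields listed in Table 1, not the package's actual schema or API.

```python
import json
from pathlib import PurePath

# Hypothetical field names standing in for the manifest fields listed in Table 1;
# the real input.json schema used by overlap-calculator may differ.
REQUIRED_KEYS = {"sample_id", "path", "input_type"}
ALLOWED_SUFFIXES = {
    "tddft": {".out", ".log"},            # Gaussian TD-DFT outputs
    "experimental": {".csv", ".xlsx"},    # tabular UV-vis spectra (assumed suffixes)
}


def validate_manifest(entries):
    """Return a list of problems found in a list of manifest entries (empty if none)."""
    problems = []
    seen = set()
    for k, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {k}: missing {sorted(missing)}")
            continue
        suffix = PurePath(entry["path"]).suffix.lower()
        allowed = ALLOWED_SUFFIXES.get(entry["input_type"])
        if allowed is None:
            problems.append(f"entry {k}: unknown input_type {entry['input_type']!r}")
        elif suffix not in allowed:
            problems.append(f"entry {k}: {suffix} not valid for {entry['input_type']}")
        # One workbook can hold several series, so uniqueness is checked on
        # (sample, sheet, series) rather than on the file path alone.
        key = (entry["sample_id"], entry.get("sheet"), entry.get("series"))
        if key in seen:
            problems.append(f"entry {k}: duplicate {key}")
        seen.add(key)
    return problems


# A mixed theoretical/experimental manifest (file names are placeholders).
manifest = [
    {"sample_id": "cand_1", "path": "raw/cand_1.log", "input_type": "tddft"},
    {"sample_id": "dye_1", "path": "raw/uvvis.xlsx", "input_type": "experimental",
     "sheet": "spectra", "series": "dye_1"},
    {"sample_id": "dye_2", "path": "raw/uvvis.xlsx", "input_type": "experimental",
     "sheet": "spectra", "series": "dye_2"},
]
print(json.dumps(manifest, indent=2))
print(validate_manifest(manifest) or "manifest OK")
```

Keying uniqueness on (sample, sheet, series) rather than on the path reflects the fact that a single workbook can contribute several spectral series to the same run.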

Although overlap-calculator is presented primarily through its command-line interface, the underlying analysis is exposed as an importable Python library: the spectrum-loading, reconstruction, overlap-integral, and descriptor routines are public, typed functions that can be called directly from a notebook or a larger pipeline. This makes the tool suitable for orchestration by established workflow-management systems for computational materials science, such as AiiDA, jobflow, and pyiron, in which it would act as a calculator node that consumes upstream TD-DFT results and absorption spectra.30–32 Native plug-ins for these engines are not implemented in the present release and remain integration targets. The structured input contract and per-row provenance are also designed to be compatible with the recently proposed Python workflow definition exchange format; cross-engine portability through that format is left to future work.33

From oscillator strengths to broadened extinction spectra

For theoretical inputs, the workflow converts discrete TD-DFT transitions into continuous absorption spectra on a shared wavelength grid. Each excited state is represented according to its oscillator strength and broadened using both Gaussian and Lorentzian line-shape functions. Reporting both broadening models is useful because spectral width, tailing behaviour, and the resulting overlap with a light-source spectrum can depend on the selected reconstruction model, particularly when the illumination profile contains narrow LED or fluorescent emission bands.

For each TD-DFT transition i, the excitation wavelength is converted to a wavenumber scale and represented by a normalised line profile. The wavelength grid is converted according to

 
$$\nu = \frac{10^{7}}{\lambda}\qquad(\lambda\ \text{in nm};\ \nu\ \text{in cm}^{-1}) \tag{1}$$

The Gaussian and Lorentzian profiles are then defined as:

 
$$g_i(\nu) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(\nu-\nu_i)^2}{2\sigma^2}\right] \tag{2}$$
 
$$l_i(\nu) = \frac{1}{\pi}\,\frac{\gamma}{(\nu-\nu_i)^2+\gamma^2} \tag{3}$$
 
image file: d6dd00247a-t2.tif(4)

Here, σ denotes the Gaussian standard deviation. For the Lorentzian profile, the half-width-at-half-maximum is set to γ = σ√(2 ln 2) ≈ 1.177σ, so that the Gaussian and Lorentzian line shapes have the same FWHM (2√(2 ln 2)σ ≈ 2.355σ) for a given σ. This choice removes an implicit width mismatch between the two broadening models, so that the reported Gaussian-minus-Lorentzian delta columns primarily reflect differences in tail behaviour rather than differences in spectral width. The molar extinction coefficient is reconstructed as a weighted sum over oscillator strengths:

 
$$\varepsilon(\lambda) = 2.315\times10^{8}\sum_i f_i\,p_i\big(\nu(\lambda)\big)\qquad[\mathrm{M^{-1}\,cm^{-1}}] \tag{5}$$

The prefactor P = 2.315 × 10⁸ M⁻¹ cm⁻² in eqn (5) is the inverse of the coefficient in the integrated-intensity relation f_i = 4.319 × 10⁻⁹ ∫ε_i(ν) dν (with ν in cm⁻¹ and ε_i in M⁻¹ cm⁻¹), distributed over a line shape p_i normalised to unit area on the wavenumber axis. Taking P outside the integral is exact in the narrow-line limit σ ≪ ν_i and otherwise incurs a fractional error of order (σ/ν_i)². For the default broadening σ = 0.30 eV and the 200–800 nm window targeted here, the raw (σ/ν)² scale reaches ∼3.8% at the red edge (λ = 800 nm, ν = 12 500 cm⁻¹). Because the leading ν-resolved correction integrates out by symmetry for a symmetric line shape, however, the residual error on the band area and peak stays below ∼2% over 200–800 nm, well within the uncertainty already introduced by the phenomenological σ. The constant-prefactor form is therefore retained as the default for UV-vis screening. A frequency-resolved option, in which a ν-resolved P(ν) is evaluated inside the integral, is provided for users extending the workflow into the near-infrared or other lower-energy regions.34,35 The wavelength grid, broadening width, and numerical integration window are all user-controllable.
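The figures quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the constants stated in the text: it computes P as 1/(4.319 × 10⁻⁹), evaluates (σ/ν)² at 800 nm for σ = 0.30 eV, and confirms that spreading a hypothetical oscillator strength over a unit-area Gaussian with P and integrating back recovers it. It covers the constant-prefactor form only, not the frequency-resolved option, and the transition used is invented for illustration.

```python
import math

EV_TO_CM1 = 8065.544   # 1 eV expressed in cm^-1 (conversion used in the text)
F_COEFF = 4.319e-9     # f_i = 4.319e-9 * integral(eps_i(nu) dnu), nu in cm^-1

# Prefactor P of eqn (5): the inverse of the integrated-intensity coefficient.
P = 1.0 / F_COEFF

# Raw (sigma/nu)^2 scale at the red edge of the 200-800 nm window.
sigma_cm1 = 0.30 * EV_TO_CM1          # default sigma = 0.30 eV
nu_red = 1.0e7 / 800.0                # eqn (1): 12 500 cm^-1
ratio_sq = (sigma_cm1 / nu_red) ** 2


def gaussian(nu, nu0, sigma):
    """Unit-area Gaussian on the wavenumber axis."""
    return math.exp(-((nu - nu0) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


# Round trip: spread f_i over a unit-area line shape with P, then integrate back.
f_i, nu_i = 0.8, 1.0e7 / 500.0        # hypothetical transition at 500 nm
step = 5.0                            # integration step in cm^-1
n_steps = int(16.0 * sigma_cm1 / step)
nus = [nu_i - 8.0 * sigma_cm1 + k * step for k in range(n_steps + 1)]
eps = [P * f_i * gaussian(nu, nu_i, sigma_cm1) for nu in nus]
area = sum(0.5 * (eps[k] + eps[k + 1]) * step for k in range(n_steps))
f_back = F_COEFF * area               # trapezoidal integral, converted back to f

print(f"P = {P:.4e} M^-1 cm^-2; (sigma/nu)^2 at 800 nm = {ratio_sq:.4f}; f back = {f_back:.6f}")
```

The computed (σ/ν)² at the red edge is about 0.037, in line with the scale quoted above.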

The core reconstruction step can be represented as follows:

image file: d6dd00247a-u4.tif
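A minimal, standard-library-only sketch of eqns (1), (2), and (5) follows. It is not the package's implementation: the function names are hypothetical, the Lorentzian is written in its usual unit-area form, and its half-width-at-half-maximum is assumed to be γ = σ√(2 ln 2), which gives the matched FWHM described above. The two transitions are invented for illustration.

```python
import math

EV_TO_CM1 = 8065.544  # sigma(cm^-1) = 8065.544 * sigma(eV)
P = 2.315e8           # prefactor of eqn (5), M^-1 cm^-2


def gaussian(nu, nu0, sigma):
    """Unit-area Gaussian on the wavenumber axis, eqn (2)."""
    return math.exp(-((nu - nu0) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


def lorentzian(nu, nu0, gamma):
    """Unit-area Lorentzian with half-width-at-half-maximum gamma."""
    return (gamma / math.pi) / ((nu - nu0) ** 2 + gamma**2)


def reconstruct_epsilon(grid_nm, transitions, sigma_ev=0.30, shape="gaussian"):
    """Molar extinction coefficient (M^-1 cm^-1) on a wavelength grid, eqn (5).

    transitions: iterable of (excitation wavelength in nm, oscillator strength).
    """
    sigma = sigma_ev * EV_TO_CM1
    if shape == "gaussian":
        profile, width = gaussian, sigma
    elif shape == "lorentzian":
        # gamma = sigma * sqrt(2 ln 2) gives both shapes the same FWHM.
        profile, width = lorentzian, sigma * math.sqrt(2.0 * math.log(2.0))
    else:
        raise ValueError(f"unknown shape {shape!r}")
    sticks = [(1.0e7 / lam_i, f_i) for lam_i, f_i in transitions]   # eqn (1)
    return [P * sum(f_i * profile(1.0e7 / lam, nu_i, width) for nu_i, f_i in sticks)
            for lam in grid_nm]


grid = [200.0 + k for k in range(601)]        # 200-800 nm in 1 nm steps
transitions = [(480.0, 1.1), (350.0, 0.4)]    # hypothetical TD-DFT states
eps_g = reconstruct_epsilon(grid, transitions, shape="gaussian")
eps_l = reconstruct_epsilon(grid, transitions, shape="lorentzian")
peak_nm = grid[eps_g.index(max(eps_g))]
print(f"Gaussian peak {max(eps_g):.0f} M^-1 cm^-1 near {peak_nm:.0f} nm; "
      f"Lorentzian peak {max(eps_l):.0f} M^-1 cm^-1")
```

For equal FWHM, the Lorentzian band has a lower peak and heavier tails than the Gaussian, which is the kind of difference the Gaussian-minus-Lorentzian comparison is intended to expose.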

The use of two line-shape functions is not intended to imply that either function universally reproduces an experimental UV-vis spectrum. Instead, it provides a transparent sensitivity check: if the ranking of candidate absorbers or light sources remains stable across Gaussian and Lorentzian reconstructions, the conclusion is less likely to be an artefact of the selected reconstruction model. The user-facing width parameter is given in eV and is converted for the line-shape equations via σ(cm⁻¹) = 8065.544 × σ(eV). Additional implementation details for spectral reconstruction and absorptance conversion are provided in Section S5 of the SI.

Beer–Lambert absorptance conversion

For experimental inputs, the user-provided spectral signal is treated as absorbance, A(λ), and is processed on the same wavelength grid used for theoretical spectra. CSV files and Excel workbooks are parsed to identify a wavelength column and one or more numeric signal columns. Each numeric signal column is treated as an independent spectrum, allowing multi-series UV-vis datasets to be analysed without preparing separate input files for each sample.
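A minimal sketch of this column handling is shown below, assuming a CSV whose wavelength column is named wavelength_nm (the header used in the spreadsheet demonstration) and treating any column that contains a non-numeric cell as metadata rather than signal. The function name and that rule are hypothetical; Excel workbooks would additionally need a spreadsheet reader and are not handled here.

```python
import csv
import io


def load_tabular_spectra(text, wavelength_col="wavelength_nm"):
    """Return {series_name: (wavelengths_nm, absorbance)} for every numeric column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    spectra = {}
    for name in reader.fieldnames:
        if name == wavelength_col:
            continue
        try:
            # Sort by wavelength so each series is monotone on the x axis.
            pairs = sorted((float(row[wavelength_col]), float(row[name])) for row in rows)
        except (TypeError, ValueError):
            continue  # any non-numeric or missing cell marks a non-signal column
        if pairs:
            spectra[name] = ([w for w, _ in pairs], [a for _, a in pairs])
    return spectra


table = """wavelength_nm,dye_1,dye_2,comment
500,0.80,0.20,peak
400,0.10,0.50,edge
600,0.05,0.01,tail
"""
spectra = load_tabular_spectra(table)   # each numeric column becomes its own spectrum
for name, (wl, absorbance) in spectra.items():
    print(name, wl, absorbance)
```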

Both theoretical and experimental spectra are converted into wavelength-dependent absorptance before overlap descriptors are calculated. For TD-DFT-derived spectra, absorbance is obtained from the reconstructed molar extinction coefficient using the Beer–Lambert relationship:

 
$$A(\lambda) = \varepsilon(\lambda)\,c\,L \tag{6}$$

$$\alpha(\lambda) = 1 - 10^{-A(\lambda)} \tag{7}$$
where c is the reference concentration and L is the optical path length. For experimental spectra, the measured absorbance is inserted directly into α(λ) = 1 − 10^(−A(λ)). This common treatment places TD-DFT-derived and spreadsheet-based spectra on the same numerical basis before they are compared with light-source spectra. The reference concentration and path length are configurable parameters rather than fixed assumptions, and the selected values are written to the output tables so that the assumptions used in the analysis remain visible and reproducible.

Experimental spectra are treated as absorbance directly and passed through the same absorptance definition:

image file: d6dd00247a-u5.tif
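The two conversions can be sketched as follows, using the default c = 1 × 10⁻⁵ M and L = 1 cm quoted for the demonstration. The function names are hypothetical, and the sketch does not clip negative baseline absorbance, which would yield a small negative α; how the package treats such values is not stated here.

```python
def absorbance_from_epsilon(eps, conc_m=1.0e-5, path_cm=1.0):
    """Eqn (6): A = eps * c * L, with eps in M^-1 cm^-1, c in M and L in cm."""
    return [e * conc_m * path_cm for e in eps]


def absorptance(absorbance):
    """Eqn (7): alpha = 1 - 10**(-A), the fraction of incident light absorbed."""
    return [1.0 - 10.0 ** (-a) for a in absorbance]


# TD-DFT branch: reconstructed eps -> absorbance -> absorptance.
alpha_theory = absorptance(absorbance_from_epsilon([0.0, 1.0e4, 5.0e4, 1.0e5]))
# Experimental branch: measured absorbance enters eqn (7) directly.
alpha_expt = absorptance([0.0, 0.3, 1.0, 2.0])
print([round(x, 3) for x in alpha_theory], [round(x, 3) for x in alpha_expt])
```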

The runtime Beer–Lambert reference state and wavelength-grid settings used in the manuscript are listed in Section S3 of the SI.

Light-source-dependent overlap descriptors

The workflow compares each molecular absorptance spectrum with one or more reference light-source spectra. Built-in light sources include the AM1.5G solar spectrum and representative CIE LED and fluorescent illuminants.36–39 Users can also supply custom light-source spectra as two-column files containing wavelength and intensity values. This flexibility allows the same molecular dataset to be evaluated under outdoor solar illumination, standard indoor illuminants, or public experimental lamp spectra.
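Loading a custom two-column source and placing it on the shared wavelength grid might look like the sketch below. The parsing rules, the use of linear interpolation, the function names, and the choice to set the intensity to zero outside the tabulated range are assumptions made for illustration, not documented behaviour of overlap-calculator.

```python
from bisect import bisect_left


def read_two_column(text):
    """Parse (wavelength_nm, intensity) pairs; comma-, tab- or space-separated."""
    points = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            continue
        try:
            points.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # header or comment line
    points.sort()
    return points


def interpolate_onto_grid(points, grid_nm):
    """Linear interpolation onto the shared grid; zero outside the tabulated range."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    values = []
    for x in grid_nm:
        if x < xs[0] or x > xs[-1]:
            values.append(0.0)  # assumption: no emission outside the supplied data
            continue
        j = bisect_left(xs, x)
        if xs[j] == x:
            values.append(ys[j])
        else:
            t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
            values.append(ys[j - 1] + t * (ys[j] - ys[j - 1]))
    return values


lamp = """# hypothetical custom lamp file
wavelength_nm,relative_power
400,0.2
450,1.0
500,0.4
"""
spd = interpolate_onto_grid(read_two_column(lamp), [380.0, 400.0, 425.0, 450.0, 475.0, 520.0])
print(spd)
```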

For each sample and light source, the workflow computes the following integrals over the selected wavelength grid:

 
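$$\mathrm{absorbed\_flux} = \int \alpha(\lambda)\,I(\lambda)\,\mathrm{d}\lambda \tag{8}$$

$$\mathrm{light\_flux\_total} = \int I(\lambda)\,\mathrm{d}\lambda \tag{9}$$

$$\mathrm{absorbed\_fraction} = \frac{\mathrm{absorbed\_flux}}{\mathrm{light\_flux\_total}} \tag{10}$$

$$\mathrm{shape\_overlap} = \frac{\int \min\big(\hat{\alpha}(\lambda),\,\hat{I}(\lambda)\big)\,\mathrm{d}\lambda}{\int \hat{I}(\lambda)\,\mathrm{d}\lambda} \tag{11}$$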

Here I(λ) is the spectral distribution of the light source, and hats denote max-normalised profiles. The absorbed_fraction descriptor reports the fraction of the source flux within the grid that is captured by the absorptance profile, whereas shape_overlap measures how well the two spectral shapes match irrespective of absolute scale. The latter is particularly useful for CIE LED and fluorescent spectra, which are provided as relative spectral power distributions.

Once the molecular absorptance and the light-source spectrum are defined on the same wavelength grid, the absorbed flux, absorbed fraction, and shape-overlap descriptors are calculated numerically:

image file: d6dd00247a-u6.tif
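A standard-library sketch of eqns (8)–(11) is given below, using the trapezoidal rule on the shared grid. The integration scheme and the function names are assumptions; the package's own numerical routine may differ.

```python
def trapz(y, x):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def overlap_descriptors(wl_nm, alpha, intensity):
    """Eqns (8)-(11) for one absorptance spectrum and one source on a shared grid."""
    a_max, i_max = max(alpha), max(intensity)
    if a_max <= 0.0 or i_max <= 0.0:
        raise ValueError("max-normalisation needs a positive maximum")
    absorbed_flux = trapz([a * i for a, i in zip(alpha, intensity)], wl_nm)
    light_flux_total = trapz(intensity, wl_nm)
    a_hat = [a / a_max for a in alpha]         # max-normalised absorptance
    i_hat = [i / i_max for i in intensity]     # max-normalised source
    shape_overlap = trapz([min(a, i) for a, i in zip(a_hat, i_hat)], wl_nm) / trapz(i_hat, wl_nm)
    return {
        "absorbed_flux": absorbed_flux,
        "light_flux_total": light_flux_total,
        "absorbed_fraction": absorbed_flux / light_flux_total,
        "shape_overlap": shape_overlap,
    }


# Toy check: a flat 50 % absorber under a triangular source.
wl = [400.0, 450.0, 500.0, 550.0, 600.0]
source = [0.0, 1.0, 2.0, 1.0, 0.0]
d = overlap_descriptors(wl, [0.5] * 5, source)
# An absorber confined to 400 nm, where this toy source emits nothing.
d_edge = overlap_descriptors(wl, [1.0, 0.0, 0.0, 0.0, 0.0], source)
print(d, d_edge)
```

Because both profiles are max-normalised, shape_overlap is unchanged if either spectrum is rescaled, which is why it suits relative spectral power distributions; absorbed_flux, by contrast, carries the units of I(λ). A flat 50% absorber gives absorbed_fraction = 0.5 and shape_overlap = 1 under any source.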

The light-source definitions and descriptor equations used in the demonstration are summarised in Sections S6 and S7 of the SI.

Command-line interface

The command-line interface is designed around two main steps. First, generate-input scans a directory and writes a manifest. Second, analyse processes the manifest and writes tables and plots to an output directory. This separation is useful for publication workflows because the manifest can be archived together with the raw inputs and output tables. The command-line execution is shown below:
image file: d6dd00247a-u7.tif

The most relevant runtime parameters are the broadening width, wavelength range, number of grid points, reference concentration, path length, default light sources, custom light-source files, plot generation flag, and logging configuration. This two-step structure allows the dataset description to be inspected and archived before the numerical analysis is executed, and it makes it possible to rerun the same analysis with different settings without changing the raw input files.

HTTP API

For deployment or integration with a web interface, the package also exposes a lightweight HTTP API. The /health endpoint provides a liveness check, while /analyze accepts multipart uploads containing mixed theoretical and experimental files, optional custom light sources, and runtime parameters. The response is a ZIP archive containing the same file structure generated by the command-line workflow, so the tool can be used both as a local research script and as a reproducible service:
image file: d6dd00247a-u8.tif

Choice of broadening

The single user-facing width parameter σ, used to convert TD-DFT stick spectra into a continuum, is an explicitly phenomenological screening parameter. For the Lorentzian reconstruction, the half-width-at-half-maximum is set to γ = σ√(2 ln 2), so that the Gaussian and Lorentzian line shapes have matched FWHM values. The same σ is applied to every transition, is fully under user control (default σ = 0.30 eV), and is not derived from any underlying physical width: no reorganisation energy is partitioned into the band shape, no Huang–Rhys factors are evaluated, and no Duschinsky mixing is included. This convention is retained because the workflow is a descriptor calculator built on top of an absorption profile, not a band-shape simulator, and because a single transparent parameter is more useful for screening than a hidden composite of unverified assumptions. To let users audit robustness, the pipeline emits Gaussian, Lorentzian, and Gaussian-minus-Lorentzian delta columns side by side and supports a σ sweep.
When quantitative band shapes are required (vibronic progressions, mirror-image symmetry with fluorescence, accurate 0–0 placement, or low-temperature spectra), this convenience path is not the recommended route. Two physically grounded alternatives feed the same descriptor pipeline. First, a Marcus-type Gaussian width, σ = √(2λk_BT), where λ here denotes the reorganisation energy rather than the wavelength, is available as a built-in broadening mode that takes λ and T as inputs, and through the library entry point marcus_hush_sigma_ev; it assigns the classical high-temperature width to each transition before broadening. At λ = 0.30 eV and T = 298.15 K this gives σ = 0.124 eV (FWHM 0.292 eV). Because this is the classical limit, it is only a first-order estimate for organic vibrational baths dominated by high-frequency modes with ħω ≫ k_BT. Second, and recommended for production work, a fully vibronic Franck–Condon or Franck–Condon plus Herzberg–Teller spectrum computed with Gaussian (Freq=FC), DUSHIN, or FCclasses can be exported on a wavelength grid and ingested through the tabular-input branch.40–44
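The quoted Marcus-type width can be reproduced directly. In the sketch below, marcus_sigma_ev is a hypothetical stand-in, not the package's marcus_hush_sigma_ev, whose signature is not given here; it evaluates σ = √(2λk_BT) with λ the reorganisation energy in eV and uses the CODATA value of the Boltzmann constant.

```python
import math

K_B_EV = 8.617333262e-5                              # Boltzmann constant in eV/K
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # Gaussian FWHM / sigma, about 2.3548


def marcus_sigma_ev(reorg_ev, temperature_k=298.15):
    """Classical high-temperature Gaussian width: sigma = sqrt(2 * lambda * kB * T)."""
    return math.sqrt(2.0 * reorg_ev * K_B_EV * temperature_k)


sigma = marcus_sigma_ev(0.30)
print(f"sigma = {sigma:.3f} eV, FWHM = {FWHM_PER_SIGMA * sigma:.3f} eV")
# sigma = 0.124 eV, FWHM = 0.292 eV
```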

Output structure

The output directory is divided into tables and plots. The tables directory contains results in CSV, JSON, and XLSX formats, a descriptor summary, timing information, and a skipped-input report if applicable. The plots directory contains per-sample absorptance spectra, light-source spectra, overlap plots, and multi-sample overlays. All TIFF plots are intended for manuscript preparation and are produced in both absolute and max-normalised forms.

The main output table contains one row per sample and light-source pair, with separate columns for Gaussian and Lorentzian broadening. The most important descriptors are the peak absorptance wavelength, peak absorptance, absorbed flux, absorbed fraction, and shape overlap (Table 2). Additional delta columns report the difference between Gaussian and Lorentzian metrics. Because the tables and plots are generated from the same documented workflow, the numerical descriptors and visual outputs remain consistent with one another.

Table 2 Main descriptors exported by overlap-calculator and their suggested use
Descriptor | Definition or interpretation | Suggested use
absorbed_flux | Integral of absorptance multiplied by the light-source spectrum | Quantifies source-weighted absorbed light; the unit depends on the source spectrum
absorbed_fraction | absorbed_flux divided by the total light-source flux over the grid | Compares how much of the selected source is absorbed by a sample
shape_overlap | Integral comparator using max-normalised absorptance and light-source shapes | Useful when light-source data are relative, especially for LED and fluorescent spectra
Delta metrics | Gaussian minus Lorentzian descriptor values | Checks sensitivity to the broadening model
descriptor_summary | Condensed per-sample table across light sources | Starting point for ranking candidate absorbers


Two grouped ranking tables, one ordering samples within each light source and one ordering light sources within each sample, are emitted in CSV, JSON, and XLSX formats for all four overlap metrics (Gaussian and Lorentzian absorbed_fraction and shape_overlap), and each table is accompanied by companion bar charts. The generated output files and ranking-table families are listed in Section S8 of the SI.
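A minimal sketch of the grouping logic behind the two ranking families is shown below in plain Python. The column names, the function name, and the toy metric values are hypothetical; only one of the four metrics is ranked, and the others would be handled the same way.

```python
from collections import defaultdict


def rank_within(rows, group_key, item_key, metric):
    """Rank items by a metric (highest first) separately inside each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    ranked = []
    for group in sorted(groups):
        ordered = sorted(groups[group], key=lambda r: r[metric], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            ranked.append({group_key: group, "rank": rank,
                           item_key: row[item_key], metric: row[metric]})
    return ranked


# Hypothetical descriptor rows: one per (sample, light source) pair.
rows = [
    {"sample": "A", "light_source": "AM1.5G", "gaussian_absorbed_fraction": 0.31},
    {"sample": "B", "light_source": "AM1.5G", "gaussian_absorbed_fraction": 0.27},
    {"sample": "A", "light_source": "LEDB4", "gaussian_absorbed_fraction": 0.44},
    {"sample": "B", "light_source": "LEDB4", "gaussian_absorbed_fraction": 0.52},
]
metric = "gaussian_absorbed_fraction"
by_source = rank_within(rows, "light_source", "sample", metric)   # samples within each source
by_sample = rank_within(rows, "sample", "light_source", metric)   # sources within each sample
for row in by_source:
    print(row)
```

In this toy example the two samples swap places between AM1.5G and LEDB4, which is the kind of source-dependent reordering the ranking tables are meant to surface.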

Reproducibility and deployment

The workflow can be installed through a standard Python environment, Conda environment, or Docker image. Docker deployment is useful when the analysis is shared with collaborators who do not have the same local Python environment. For archival purposes, the final publication should provide a versioned GitHub release and a Zenodo DOI, together with the input manifest, example data, and output tables used in the manuscript. Command-line, API, Docker, software-environment, and test commands are provided in Sections S10–S13 of the SI.

The design of overlap-calculator is aligned with the FAIR Principles for Research Software (FAIR4RS), which adapt the FAIR Guiding Principles from data to executable software.45,46 The source code is openly developed under a permissive licence, every tagged release is archived with a persistent Zenodo DOI, and the dependency surface is reproducible through a pinned environment and a versioned container image; together these support the findable, accessible, and reusable dimensions. Interoperability is addressed by reading widely supported tabular inputs and emitting structured, schema-stable outputs. Throughout this manuscript, traceability is used in the sense of computational provenance. Every analyze run writes a run_manifest.json at the output root recording the software version, git commit, UTC timestamp, full resolved parameter set, light-source set, and one entry per (source, sample, series, sheet) with the SHA-256 hash of each input file. Each descriptor row additionally carries provenance columns and is linked via the input manifest to its originating file, sheet, series, sample, and spectrum, so that a third party can re-execute the analysis and reproduce the descriptors from the archived artefacts alone.27,47
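A sketch of how such a run-level record could be assembled with the standard library is given below. All key names, the placeholder version and commit strings, and the function names are illustrative assumptions rather than the actual run_manifest.json schema.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_manifest(entries, params, light_sources, version, commit):
    """Run-level provenance record; every key name here is illustrative only."""
    return {
        "software_version": version,
        "git_commit": commit,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "parameters": params,
        "light_sources": sorted(light_sources),
        # One entry per (source file, sample, series, sheet), each with its hash.
        "inputs": [dict(entry, sha256=sha256_of(entry["source"])) for entry in entries],
    }


with tempfile.TemporaryDirectory() as tmp:
    spectrum = Path(tmp) / "uvvis.csv"
    spectrum.write_bytes(b"wavelength_nm,dye_1\n400,0.1\n")
    record = build_run_manifest(
        [{"source": str(spectrum), "sample": "dye_1", "series": "dye_1", "sheet": None}],
        params={"sigma_ev": 0.30, "conc_m": 1.0e-5, "path_cm": 1.0},
        light_sources={"AM1.5G", "LEDB4"},
        version="0.0.0",      # placeholder, not a real release number
        commit="unknown",     # placeholder for the output of `git rev-parse HEAD`
    )
    (Path(tmp) / "run_manifest.json").write_text(json.dumps(record, indent=2))
```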

Demonstration workflows

Rationale for the demonstrations

The manuscript uses two complementary demonstrations, one for each input branch of overlap-calculator, to separate the scientific application from input-format validation. Five Gaussian TD-DFT output files, generated previously by the author and reused here only as representative calculated-spectrum inputs,48 demonstrate automated transition parsing, spectral reconstruction, Beer–Lambert absorptance conversion, and light-source-dependent descriptor generation. These files are not introduced as a new TD-DFT dataset, and the present analysis does not re-evaluate the molecular systems of the original study; they serve solely to show how overlap-calculator processes Gaussian TD-DFT outputs. The resulting broadened spectra were compared with AM1.5G, CIE LED, and fluorescent illumination spectra (Fig. 2–4). In parallel, eight experimentally reported organic UV-vis spectra derived from PhotochemCAD were arranged as spreadsheet inputs to exercise the tabular-data branch. These molecules are not proposed as OPV candidate absorbers, nor are their spectra intended as an experimental validation of the TD-DFT candidates. They were selected only as publicly available organic UV-vis spectra with diverse absorption profiles, to test whether overlap-calculator can ingest tabular experimental spectra, convert them into absorptance profiles, and process them through the same descriptor-generation pipeline used for TD-DFT-derived inputs. This part of the demonstration therefore validates the spreadsheet-input branch of the workflow, not the photovoltaic relevance of these particular molecules.
Fig. 2 Representative TD-DFT-derived absorption profile reconstructed using Gaussian and Lorentzian broadening.

Fig. 3 Representative spectral-overlap plot between a TD-DFT-derived absorptance spectrum and AM1.5G illumination.

Fig. 4 Representative spectral-overlap plot between a TD-DFT-derived absorptance spectrum and LED B2 illumination.

The demonstration inputs are not a representative chemical space: the five TD-DFT files are a small B3LYP/6-31G donor–acceptor set, and the PhotochemCAD-derived spreadsheet spectra were chosen as well-characterised, strongly absorbing visible chromophores. They are used solely to exercise the two input branches of the workflow and are not proposed as a screening corpus or as OPV candidate absorbers.

Together, the demonstrations address three practical questions: (i) how Gaussian and Lorentzian broadening affect TD-DFT-derived overlap descriptors, (ii) whether candidate rankings change between outdoor and indoor light sources, and (iii) whether non-Gaussian UV-vis spectra arranged in Excel format can be converted into the same descriptor tables and plots as Gaussian output files. Details of the theoretical and spreadsheet-input branches used in the demonstration are provided in Sections S2–S4 of the SI.

For each candidate absorber, Gaussian TD-DFT output files are parsed to extract excited-state wavelengths and oscillator strengths. The spectra are reconstructed over the same wavelength grid using both broadening models. The resulting absorptance profiles are then integrated against AM1.5G, LEDB2, LEDB3, LEDB4, and CIEFL10 spectra. This produces a matrix of light-source-dependent descriptors suitable for ranking and visualisation.

TD-DFT-derived OPV screening case

We emphasise that the workflow is method-agnostic: the parser accepts Gaussian TD-DFT output files regardless of the functional and basis set used, and the tabular branch accepts any wavelength-gridded spectrum, whether computed at a higher level of theory, simulated vibronically, calibrated empirically, or measured experimentally. The B3LYP/6-31G level of the five demonstration files reflects the legacy inputs reused here only to demonstrate parsing, spectral reconstruction, broadening, and descriptor generation; it is not a recommendation for new excited-state calculations and does not restrict the applicability of the workflow. For new electronic-structure inputs, current TD-DFT benchmark studies for organic chromophores generally show that the accuracy depends strongly on the exchange–correlation functional, the basis set, and the charge-transfer character of the transition; range-separated hybrids such as CAM-B3LYP or ωB97X-D, as well as modern global hybrids such as M06-2X and PBE0, are therefore often preferred over uncalibrated conventional hybrids.49–52 To address the systematic errors documented in those benchmarks, overlap-calculator (v1.1.0 onward) ships a calibration block, supplied via --calibration PATH (CLI) or Calibration/load_calibration/apply_calibration (library). It applies a linear excitation-energy calibration E_i^cal = a·E_i^calc + b, an oscillator-strength scaling f^cal = α × f, and optional per-band σ or reorganisation-energy overrides before broadening; the identity calibration (a = 1, b = 0, α = 1) is bit-identical to running without it. The tabular branch additionally accepts any externally pre-corrected spectrum.
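The linear calibration can be illustrated on (wavelength, oscillator strength) pairs as below. The function and argument names are hypothetical and do not reproduce the signatures of Calibration, load_calibration, or apply_calibration; per-band σ or reorganisation-energy overrides are omitted. Wavelengths are converted to energies with hc ≈ 1239.84198 eV nm, and the identity case is short-circuited so that the output is exactly the input, which is one simple way to obtain bit-identical behaviour.

```python
HC_EV_NM = 1239.84198  # h*c in eV nm: E(eV) = HC_EV_NM / wavelength(nm)


def calibrate_transitions(transitions, a=1.0, b=0.0, alpha=1.0):
    """Apply E_cal = a * E_calc + b (eV) and f_cal = alpha * f to (nm, f) pairs."""
    if (a, b, alpha) == (1.0, 0.0, 1.0):
        # Identity: return the inputs untouched rather than round-tripping
        # nm -> eV -> nm, which could alter the last bit of each value.
        return list(transitions)
    calibrated = []
    for lam_nm, f in transitions:
        e_cal = a * (HC_EV_NM / lam_nm) + b
        if e_cal <= 0.0:
            raise ValueError(f"calibration gives a non-positive energy for {lam_nm} nm")
        calibrated.append((HC_EV_NM / e_cal, alpha * f))
    return calibrated


sticks = [(500.0, 1.0), (400.0, 0.5)]                    # hypothetical TD-DFT states
print(calibrate_transitions(sticks, b=-0.2, alpha=0.8))  # red-shift by 0.2 eV, scale f
```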

Spreadsheet-input demonstration using public UV-vis spectra

To demonstrate the spreadsheet-input branch of the workflow, a separate UV-vis dataset of representative organic chromophores was prepared from publicly available PhotochemCAD/OMLC absorption spectra associated with the Taniguchi and Lindsey PhotochemCAD database.53

The molar extinction coefficient spectra were converted into absorbance profiles using A(λ) = ε(λ)cL with a fixed reference concentration of 1 × 10⁻⁵ M and a 1 cm optical path length. The reference concentration c and path length L are state parameters of the Beer–Lambert conversion, not fit parameters. In the optically thin limit, shape_overlap is invariant under any global scaling of εcL, and absorbed_fraction is a monotonically saturating function of cL that scales linearly with it for A ≪ 1; only absorbed_flux carries the absolute scale. For screening we therefore recommend choosing cL so that the peak reference absorbance lies in the range Amax ∈ [0.1, 1], the well-conditioned regime of experimental cuvette spectroscopy. For typical organic chromophores (εmax ∼ 10⁴–10⁵ M⁻¹ cm⁻¹), the defaults c = 1 × 10⁻⁵ M and L = 1 cm give Amax ≈ 0.1–1 and therefore satisfy this condition; both values are written to every output row, so no iterative optimisation of c and L is required. The resulting values were arranged in an Excel workbook with wavelength_nm in the first column and each molecule as an independent spectral series.
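The conversion can be sketched as follows for a wide table of ε(λ) values in the layout described above, supplied here as CSV text rather than an Excel workbook. The sketch assumes that absorptance is computed as 1 − 10^(−A), so reflection and scattering are neglected; the function names and the two example molecules are hypothetical, and the ε values are illustrative numbers, not PhotochemCAD data.

```python
import csv
import io

C_REF_M = 1e-5  # reference concentration c in mol/L (default stated in the text)
L_REF_CM = 1.0  # optical path length L in cm (default stated in the text)


def read_wide_table(text):
    """Parse wavelength_nm in column 1 and one spectral series per further column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if header[0] != "wavelength_nm":
        raise ValueError("first column must be wavelength_nm")
    wavelengths = [float(row[0]) for row in body]
    series = {
        name: [float(row[col]) for row in body]
        for col, name in enumerate(header[1:], start=1)
    }
    return wavelengths, series


def absorbance(eps, c=C_REF_M, path_cm=L_REF_CM):
    """Beer-Lambert: A(lambda) = eps(lambda) * c * L, with eps in M^-1 cm^-1."""
    return [e * c * path_cm for e in eps]


def absorptance(a_values):
    """Fraction of incident light absorbed, 1 - 10**(-A) (assumed model)."""
    return [1.0 - 10.0 ** (-a) for a in a_values]


def well_conditioned(a_values, lo=0.1, hi=1.0):
    """True if the peak absorbance lies in the recommended [0.1, 1] window."""
    return lo <= max(a_values) <= hi


table = """wavelength_nm,dye_A,dye_B
400,12000,800
450,50000,2500
500,20000,4000
"""
wavelengths, series = read_wide_table(table)
for name, eps in series.items():
    a = absorbance(eps)
    print(name, round(max(a), 3), well_conditioned(a))
# dye_A 0.5 True
# dye_B 0.04 False   (eps_max = 4000 lies below the typical 10^4-10^5 range)
```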

This dataset serves only as a spreadsheet-input demonstration (Fig. 5); it is not an experimental validation set for the TD-DFT-derived molecules, and the spreadsheet spectra are not proposed as OPV candidate absorbers, as clarified in Section S4 of the SI. It verifies that organic molecular UV-vis spectra supplied in tabular form can be ingested, converted to absorptance, compared with the same solar and indoor light-source spectra, and exported with the same output logic used for TD-DFT-derived spectra.


Fig. 5 Representative spectral-overlap plot between a spreadsheet-input absorptance spectrum derived from an experimentally reported UV-vis spectrum and AM1.5G illumination.

Unified descriptor generation from theoretical and spreadsheet inputs

Both input branches are routed to a common output structure. For TD-DFT files, the workflow first reconstructs Gaussian- and Lorentzian-broadened spectra from discrete excited states before applying the Beer–Lambert absorptance model. For spreadsheet spectra, the supplied profile is treated as absorbance and converted directly into absorptance. After this point, both branches are compared with the same light-source library and exported as descriptor tables, summary files, skip reports, timing tables, and TIFF visualisations.
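The reconstruction step of the TD-DFT branch can be sketched as a sum of area-normalised line shapes, one per transition, weighted by oscillator strength. This is an illustrative sketch with assumed conventions, not the package's implementation: bands are centred on an energy axis in eV and evaluated at the photon energies of the wavelength grid without a Jacobian correction, the Gaussian width is a standard deviation and the Lorentzian width a half-width at half-maximum, the 0.2 eV default is hypothetical, and the result is in arbitrary units proportional to f rather than a calibrated ε(λ).

```python
import math

HC_EV_NM = 1239.841984  # h*c in eV*nm


def gaussian(x, x0, sigma):
    """Area-normalised Gaussian; sigma is the standard deviation."""
    return math.exp(-0.5 * ((x - x0) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lorentzian(x, x0, gamma):
    """Area-normalised Lorentzian; gamma is the half-width at half-maximum."""
    return (gamma / math.pi) / ((x - x0) ** 2 + gamma**2)


KERNELS = {"gaussian": gaussian, "lorentzian": lorentzian}


def broaden(wavelengths_nm, energies_ev, osc_strengths, width_ev=0.2, shape="gaussian"):
    """Oscillator-strength-weighted stick spectrum broadened on the energy axis."""
    try:
        kernel = KERNELS[shape]
    except KeyError:
        raise ValueError(f"unknown line shape: {shape!r}") from None
    profile = []
    for wl in wavelengths_nm:
        e = HC_EV_NM / wl  # photon energy of this grid point (no Jacobian applied)
        profile.append(
            sum(f * kernel(e, e0, width_ev) for e0, f in zip(energies_ev, osc_strengths))
        )
    return profile


grid = list(range(300, 601))  # hypothetical 300-600 nm grid in 1 nm steps
energies, strengths = [2.9785, 3.4500], [1.2345, 0.0120]
gauss = broaden(grid, energies, strengths, shape="gaussian")
lorentz = broaden(grid, energies, strengths, shape="lorentzian")
peak_nm = grid[max(range(len(grid)), key=gauss.__getitem__)]
print(peak_nm)  # grid point closest to the strong 416.26 nm transition
```

Broadening on the energy axis keeps band widths physically comparable across the spectrum; the same profile, once scaled to absorbance, would then pass through the Beer–Lambert absorptance step shared with the spreadsheet branch.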

This design is useful because it allows a project to start from TD-DFT screening data and later incorporate measured UV-vis spectra without changing the downstream analysis. The same descriptors can therefore be used for exploratory ranking, experimental follow-up, and reproducibility checks, while clearly distinguishing between calculated spectra, measured absorbance spectra, and public spreadsheet demonstration data.

Source-dependent ranking of candidate absorbers

The ranking analysis shows that the spectral compatibility of absorbers depends on the selected illumination source. Under AM1.5G, broad absorption across the visible region may increase absorbed flux. Under LED or fluorescent sources, however, a narrower but better-aligned absorptance profile may provide a higher shape_overlap (Fig. 6).
Fig. 6 Source-dependent spectral-overlap ranking generated by overlap-calculator. (a) Sample ranking under AM1.5G illumination based on Gaussian spectral shape-overlap values. (b) Light-source-dependent Gaussian shape-overlap values for the representative TD-DFT-derived absorber 3e_td.

Therefore, the workflow supports a design logic in which absorbers are ranked not by a single universal optical descriptor but by their compatibility with the intended operating environment. The top-ranked samples under each light source are provided in Table S7 of the SI.

The numerical results confirm this source dependence. Under AM1.5G illumination, 4c_td gave the highest Gaussian shape-overlap value (0.7952), followed by 3e_td (0.7173) and 3g_td (0.6437). Under LEDB4 illumination, 4c_td remained the top-ranked sample (0.9643), but the order of the next two samples was reversed, with 3g_td (0.9536) ahead of 3e_td (0.9374). For the representative TD-DFT-derived absorber 3e_td, all four indoor sources (LEDB2, LEDB3, LEDB4, and CIEFL10) gave higher shape-overlap values than AM1.5G. Absorber ranking therefore cannot be inferred from λmax alone and should be evaluated with respect to the intended illumination environment.
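The rank reversal between the second- and third-placed samples can be reproduced directly from the values quoted above. The helper names rank and rank_changes are hypothetical, and the table contains only the three samples per source reported in the text; the full lists are in Table S7 of the SI.

```python
# Gaussian shape-overlap values quoted in the text (top three samples per source).
REPORTED = {
    "AM1.5G": {"4c_td": 0.7952, "3e_td": 0.7173, "3g_td": 0.6437},
    "LEDB4": {"4c_td": 0.9643, "3g_td": 0.9536, "3e_td": 0.9374},
}


def rank(scores):
    """Sample names ordered by descending shape overlap."""
    return sorted(scores, key=scores.get, reverse=True)


def rank_changes(table, ref_source, other_source):
    """Map each sample ranked under both sources to (rank under ref, rank under other)."""
    ref, other = rank(table[ref_source]), rank(table[other_source])
    return {name: (ref.index(name) + 1, other.index(name) + 1) for name in ref if name in other}


for source, scores in REPORTED.items():
    print(source, rank(scores))
# AM1.5G ['4c_td', '3e_td', '3g_td']
# LEDB4 ['4c_td', '3g_td', '3e_td']

print(rank_changes(REPORTED, "AM1.5G", "LEDB4"))
# {'4c_td': (1, 1), '3e_td': (2, 3), '3g_td': (3, 2)}
```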

Scope and extensibility of the workflow

The main contribution of overlap-calculator is the standardisation of a recurring interpretation step in OPV screening. A small in-house script can answer a specific question for a single dataset, but it is difficult to reuse, document, and audit unless its assumptions are made explicit. By separating input definition, parsing, spectral reconstruction, light-source integration, descriptor calculation, and output export, the present workflow makes each assumption visible. This is particularly valuable when the analysis is used to support publication claims, compare molecule families, or share data with collaborators.

For absolute irradiance spectra such as AM1.5G, absorbed_flux has a direct physical meaning within the simplified absorptance model. For relative indoor illuminant spectra, however, the absolute scale is arbitrary. In those cases, absorbed_fraction and shape_overlap are more informative for comparing molecules under the same light source. Shape overlap is especially useful when the goal is to quantify how well the positions and widths of absorption bands align with the emission profile of a lamp. However, it should not be interpreted as a device efficiency or as a replacement for experimental external quantum efficiency.
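The scale argument above can be checked numerically. The sketch below uses assumed definitions that may differ in detail from the package's implementation: absorbed_flux is the trapezoidal integral of absorptance times the source spectrum, absorbed_fraction divides it by the integrated source, and shape_overlap integrates the product of the max-normalised absorptance and the max-normalised source and divides by the integral of the normalised source. The source is used as given (no power-to-photon-flux conversion), and the Gaussian test spectra are synthetic.

```python
import math


def trapz(y, x):
    """Trapezoidal integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def descriptors(wl, absorptance, source):
    """Assumed definitions of the three overlap descriptors (see lead-in)."""
    a_max, s_max = max(absorptance), max(source)
    if a_max <= 0.0 or s_max <= 0.0:
        raise ValueError("absorptance and source must each have a positive maximum")
    absorbed = trapz([a * s for a, s in zip(absorptance, source)], wl)
    a_hat = [a / a_max for a in absorptance]
    s_hat = [s / s_max for s in source]
    shape = trapz([a * s for a, s in zip(a_hat, s_hat)], wl) / trapz(s_hat, wl)
    return {
        "absorbed_flux": absorbed,
        "absorbed_fraction": absorbed / trapz(source, wl),
        "shape_overlap": shape,
    }


def band(wl, centre, width, height=1.0):
    """Synthetic Gaussian band used only for this illustration."""
    return [height * math.exp(-0.5 * ((x - centre) / width) ** 2) for x in wl]


wl = [float(x) for x in range(380, 781)]  # 380-780 nm, 1 nm steps
absorber = band(wl, centre=500.0, width=40.0, height=0.6)
lamp = band(wl, centre=520.0, width=30.0)  # relative spectrum, arbitrary units

d1 = descriptors(wl, absorber, lamp)
d2 = descriptors(wl, absorber, [1000.0 * s for s in lamp])  # same lamp, rescaled
# absorbed_flux scales by 1000; absorbed_fraction and shape_overlap are unchanged
# up to floating-point rounding.
for key in d1:
    print(key, d2[key] / d1[key])
```

Under these assumed definitions, shape_overlap equals absorbed_fraction divided by the peak absorptance, so it rewards band alignment with the lamp rather than absolute absorption strength.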

The descriptors reported by overlap-calculator are optical compatibility descriptors. They do not include exciton diffusion, morphology, donor–acceptor interfacial energetics, charge separation, transport, recombination, electrode losses, or optical interference in a real device stack. Therefore, they should be used as early-stage screening descriptors rather than final performance predictors. Their strength is that they quantify a necessary optical condition: if a candidate absorber does not overlap with the intended illumination source, high device performance is unlikely regardless of favourable electronic descriptors.

The current workflow has several limitations. TD-DFT reconstruction depends on the selected functional, basis set, solvent model, number of excited states, wavelength range, and broadening width. Beer–Lambert absorptance uses a simplified reference concentration and path length, and film optical constants and device-stack effects are not included. Experimental spectra are assumed to be absorbance spectra with a valid wavelength axis, so additional preprocessing may be needed for noisy or baseline-shifted data. Relative CIE LED and fluorescent spectra do not provide absolute absorbed power unless a calibrated intensity scale is supplied. The current descriptors quantify spectral compatibility, not EQE, Jsc, PCE, or long-term device stability. Interpretive limitations of the optical-overlap descriptors are discussed in Section S14 of the SI.

Conclusions

overlap-calculator provides a reproducible Python workflow for quantifying spectral compatibility between molecular absorbers and reference light sources. By accepting both Gaussian TD-DFT outputs and spreadsheet-based UV-vis spectra, reconstructing or importing absorption profiles, converting them into absorptance, and integrating them against AM1.5G, CIE LED, fluorescent, or custom spectra, the workflow transforms a previously manual interpretation step into a documented and batch-compatible analysis.

The resulting descriptors, including absorbed flux, absorbed fraction, and shape overlap, support light-source-dependent ranking of TD-DFT-derived OPV candidate absorbers while preserving transparency about the assumptions used in the calculation. The package is intended as a lightweight post-processing layer for computational and experimental optoelectronic materials research rather than as a replacement for electronic-structure calculations, spectroscopic measurements, or device-level photovoltaic characterisation.

Future versions can extend the workflow in several directions. Support for EQE or IPCE spectra would allow the same light-source integration logic to estimate source-specific current-density limits. A graphical interface could make the workflow accessible to users who are less comfortable with command-line tools. Additional lamp libraries, calibrated spectral irradiance data, and user-defined spectral normalisation modes could further improve indoor PV analysis, while direct export of ranked summary reports could make the package more useful for high-throughput screening studies.

Author contributions

Pinar Seyitdanlioglu: conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing – original draft, writing – review and editing, visualization.

Conflicts of interest

There are no conflicts to declare.

Data availability

The source code, README file, user manual, worked case studies, example input files, spreadsheet demonstration data, light-source spectra, and output tables supporting this study are available at: https://github.com/pinarsyt/overlap-calculator. A versioned archival release containing the same reproducibility materials is available at Zenodo: https://zenodo.org/records/20579269. Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6dd00247a.

Acknowledgements

The author thanks Gökay Öztürk for his valuable support and encouragement during the preparation of this manuscript. The author also gratefully acknowledges her family for their continuous support, encouragement, and patience throughout the preparation of this work. AI-assisted language-editing tools were used to improve grammar, clarity, and readability; all scientific content and interpretations were reviewed and approved by the author. The TD-DFT calculations used in this paper were performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). The open access publishing fees for this article were covered by the institutional agreement between the Scientific and Technological Research Council of Türkiye (TÜBİTAK) and the Royal Society of Chemistry (RSC).

References

  1. B. Kippelen and J.-L. Brédas, Energy Environ. Sci., 2009, 2, 251–261.
  2. G. J. Hedley, A. Ruseckas and I. D. W. Samuel, Chem. Rev., 2017, 117, 796–837.
  3. G. Zhang, F. R. Lin, F. Qi, T. Heumüller, A. Distler and H.-J. Egelhaaf, et al., Chem. Rev., 2022, 122, 14180–14274.
  4. M. Jahandar, S. Kim and D. C. Lim, ChemSusChem, 2021, 14, 3449–3474.
  5. X. Liu, S. Xu, B. Tang and X. Song, Chem. Eng. J., 2024, 497, 154944.
  6. Y. J. You, C. E. Song, Q. V. Hoang, Y. Kang, J. S. Goo and D. H. Ko, et al., Adv. Funct. Mater., 2019, 29, 1901171.
  7. S. Biswas, Y. Lee, H. Choi, H. W. Lee and H. Kim, RSC Adv., 2023, 13, 32000–32022.
  8. A. Venkateswararao, J. K. Ho, S. K. So, S.-W. Liu and K.-T. Wong, Mater. Sci. Eng., R, 2020, 139, 100517.
  9. M. Freunek, M. Freunek and L. M. Reindl, IEEE J. Photovolt., 2012, 3, 59–64.
  10. H. K. H. Lee, J. Wu, J. Barbé, S. M. Jain, S. Wood and E. M. Speller, et al., J. Mater. Chem. A, 2018, 6, 5618–5626.
  11. A. Venkateswararao and K. T. Wong, Bull. Chem. Soc. Jpn., 2021, 94, 812–838.
  12. M. Li, F. Igbari, Z. K. Wang and L. S. Liao, Adv. Energy Mater., 2020, 10, 2000641.
  13. Y. Cui, L. Hong and J. Hou, ACS Appl. Mater. Interfaces, 2020, 12, 38815–38828.
  14. Y. Cui, Y. Wang, J. Bergqvist, H. Yao, Y. Xu and B. Gao, et al., Nat. Energy, 2019, 4, 768–775.
  15. A. Chakraborty, G. Lucarelli, J. Xu, Z. Skafi, S. Castro-Hermosa and A. B. Kaveramma, et al., Nano Energy, 2024, 128, 109932.
  16. J. Hachmann, R. Olivares-Amaya, S. Atahan-Evrenk, C. Amador-Bedolla, R. S. Sánchez-Carrera and A. Gold-Parker, et al., J. Phys. Chem. Lett., 2011, 2, 2241–2251.
  17. S. A. Lopez, E. O. Pyzer-Knapp, G. N. Simm, T. Lutzow, K. Li and L. R. Seress, et al., Sci. Data, 2016, 3, 160086.
  18. M. Bourass, A. T. Benjelloun, M. Benzakour, M. Mcharfi, M. Hamidi, S. M. Bouzzine and M. Bouachrine, Chem. Cent. J., 2016, 10, 67.
  19. J. Yan, X. Rodríguez-Martínez, D. Pearce, H. Douglas, D. Bili and M. Azzouzi, et al., Energy Environ. Sci., 2022, 15, 2958–2973.
  20. S. Rani, N. Al-Zaqri, J. Iqbal, S. J. Akram, A. Boshaala and R. F. Mehmood, et al., RSC Adv., 2022, 12, 29300–29318.
  21. S. R. Bora and D. J. Kalita, RSC Adv., 2023, 13, 26418–26429.
  22. D. Lübke, P. Hartnagel, J. Angona and T. Kirchartz, Adv. Energy Mater., 2021, 11, 2101474.
  23. M. A. Saeed, Appl. Phys. Lett., 2025, 127, 130501.
  24. S. Y. Park, C. Labanti, J. Luke, Y. C. Chin and J. S. Kim, Adv. Energy Mater., 2022, 12, 2103237.
  25. X. Hou, Y. Wang, H. K. H. Lee, R. Datt, N. U. Miano and D. Yan, et al., J. Mater. Chem. A, 2020, 8, 21503–21525.
  26. A. S. Tarleton, J. C. Garcia-Alvarez, A. Wynn, C. M. Awbrey, T. P. Roberts and S. Gozem, J. Phys. Chem. A, 2022, 126, 435–443.
  27. C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil and M. R. Crusoe, et al., Data Intell., 2020, 2, 108–121.
  28. J. Popp and T. Biskup, Chem.: Methods, 2022, 2, e202100097.
  29. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, et al., Gaussian 16, Revision C.02, Gaussian, Inc., Wallingford CT, 2019.
  30. G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari and B. Kozinsky, Comput. Mater. Sci., 2016, 111, 218–230.
  31. A. S. Rosen, M. Gallant, J. George, J. Riebesell, H. Sahasrabuddhe and J. X. Shen, et al., J. Open Source Softw., 2024, 9, 5995.
  32. J. Janssen, S. Surendralal, Y. Lysogorskiy, M. Todorova, T. Hickel, R. Drautz and J. Neugebauer, Comput. Mater. Sci., 2019, 163, 24–36.
  33. J. Janssen, J. George, J. Geiger, M. Bercx, X. Wang and C. Ertural, et al., Digital Discovery, 2025, 4, 3149.
  34. R. C. Hilborn, Am. J. Phys., 1982, 50, 982–986.
  35. R. S. Mulliken, J. Chem. Phys., 1939, 7, 20–34.
  36. International Commission on Illumination, Relative spectral power distributions of illuminants representing typical LED lamps, 1 nm spacing, CIE Central Bureau, Vienna, 2018, DOI: 10.25039/CIE.DS.dhcw57sd.
  37. International Commission on Illumination, Relative spectral power distributions of illuminants representing typical fluorescent lamps, 1 nm wavelength steps, CIE Central Bureau, Vienna, 2018, DOI: 10.25039/CIE.DS.54hy6srn.
  38. International Commission on Illumination, CIE 015:2018 Colorimetry, CIE Central Bureau, 4th edn, Vienna, 2018.
  39. National Renewable Energy Laboratory, Reference Air Mass 1.5 Spectra (ASTM G-173-03), U.S. Department of Energy, 2003.
  40. R. A. Marcus and N. Sutin, Biochim. Biophys. Acta, Rev. Bioenerg., 1985, 811, 265–322.
  41. S. F. Nelsen, S. C. Blackstock and Y. Kim, J. Am. Chem. Soc., 1987, 109, 677–682.
  42. F. Santoro, A. Lami, R. Improta and V. Barone, J. Chem. Phys., 2007, 126, 184102.
  43. J. Bloino, M. Biczysko, F. Santoro and V. Barone, J. Chem. Theory Comput., 2010, 6, 1256–1274.
  44. J. R. Reimers, J. Chem. Phys., 2001, 115, 9103–9109.
  45. M. Barker, N. P. Chue Hong, D. S. Katz, A. L. Lamprecht, C. Martinez-Ortiz and F. Psomopoulos, et al., Sci. Data, 2022, 9, 622.
  46. M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton and A. Baak, et al., Sci. Data, 2016, 3, 160018.
  47. S. Soiland-Reyes, P. Sefton, M. Crosas, L. J. Castro, F. Coppens and J. M. Fernández, et al., Data Sci., 2022, 5, 97–138.
  48. P. Seyitdanlioglu, J. Mol. Model., 2026, 32, 8.
  49. A. D. Laurent and D. Jacquemin, Int. J. Quantum Chem., 2013, 113, 2019–2039.
  50. C. Adamo and D. Jacquemin, Chem. Soc. Rev., 2013, 42, 845–856.
  51. R. Send, M. Kühn and F. Furche, J. Chem. Theory Comput., 2011, 7, 2376–2386.
  52. D. Jacquemin, V. Wathelet, E. A. Perpete and C. Adamo, J. Chem. Theory Comput., 2009, 5, 2420–2435.
  53. M. Taniguchi and J. S. Lindsey, Photochem. Photobiol., 2018, 94, 290–327.

This journal is © The Royal Society of Chemistry 2026