Open Access Article
Iqra Sarfraz
,
Sergei F. Vyboishchikov
,
Miquel Solà
* and
Albert Artigas
*
Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, C/Maria Aurèlia Capmany, 69, 17003 Girona, Catalonia, Spain. E-mail: albert.artigas@udg.edu; miquel.sola@udg.edu
First published on 9th March 2026
Singlet fission (SF) is a physical phenomenon exhibited by some families of organic materials potentially able to boost the power conversion efficiency of solar cells beyond the theoretical Shockley–Queisser limit of 33%. To experience SF, a molecule must fulfil the so-called energy matching conditions (EMCs), which can be evaluated using DFT and TD-DFT calculations. Here, we propose a simple protocol that exploits machine learning workflows to screen large libraries of molecules using a reduced number of quantum chemical calculations. The protocol is based on the AQME and ROBERT platforms and is adapted to users with no experience in data science and basic computational chemistry knowledge. Using this approach, we screened a library of 3835 benzannulated biphenylenes to identify 505 candidates fulfilling the first EMC for SF. Fragment-based statistical analysis was employed to rationalize the structural features associated with SF. The workflow is general and can be applied to other families of compounds for the accelerated discovery of SF materials.
For a molecule to undergo SF, it must satisfy at least two key energy matching conditions (EMCs): (i) the lowest singlet excited state (S1) must have at least twice the energy of the lowest triplet state (T1) to ensure exergonic SF, and (ii) the second triplet state (T2) must also exceed twice the T1 energy to prevent triplet–triplet annihilation (TTA).1 However, these two energetic criteria alone do not guarantee efficient SF.16 The chromophore must also exhibit a high absorption coefficient and an appropriate bandgap to capture high-energy photons. A fast SF rate is essential to outcompete other deactivation pathways, such as charge transfer from the S1 state to the acceptor, internal conversion to S0, or excimer formation. Likewise, the material should display a slow triplet–triplet annihilation (TTA) rate. In practical device configurations, additional electronic factors become crucial. For instance, proper alignment between the T1 energy and the acceptor's bandgap (≈1.1 eV for silicon-based solar cells)17 or compatible frontier orbital energies (e.g., LUMO matching in organic photovoltaics) to ensure efficient triplet harvesting.18 Moreover, practical considerations also include chemical stability and synthetic feasibility, which are essential for real-world implementation.
Considering all these aspects, the screening of molecular systems fulfilling the first EMC remains a good starting point for SF material discovery. In this context, the identification of new SF materials has largely relied on in silico studies guided by these EMCs.6 A paradigmatic example is the rational design of 1,3-diphenylisobenzofuran, which is considered the first rationally engineered SF chromophore.19–21 Subsequent works have demonstrated that SF is operational in other open-shell systems.22–24
Excited state engineering can also be achieved through aromaticity manipulation. In this sense, polycyclic systems exhibiting ground-state (Hückel) antiaromaticity and excited-state (Baird) aromaticity show simultaneous stabilization of S1 and T1 states, enabling tailored SF behavior through aromatic/antiaromatic character modulation.25 A more sophisticated approach involves the design of the so-called aromatic chameleons (Fig. 1c), which are defined as π-conjugated compounds that can modify their π-electron distributions to satisfy various aromaticity laws in various electronic states, in particular, in the S0 and T1 states.26–31 Compounds that are aromatic in S0 and T1 states are candidates for SF applications because, commonly, they have a small singlet–triplet energy gap. While the computational search for new SF has been so far a fruitful strategy, evaluating the EMCs for large compound libraries is often tedious and computationally expensive, as it requires relatively accurate DFT and TD-DFT calculations. To accelerate this discovery phase, many groups have recently turned their attention to genetic algorithms32 and machine learning (ML) techniques.33–38 However, ML protocols can be complex to implement and reproduce, particularly for non-expert researchers. To address this challenge, here we propose a streamlined workflow using ROBERT39 and AQME40 packages—developed and maintained by the Alegre-Requena group—to efficiently screen large libraries of potential SF chromophores. This workflow is ideally suited for organic and materials chemists seeking to identify promising synthetic targets for SF applications.
![]() | ||
| Fig. 2 Overview of the ML protocol employed in this work to predict S1/T1 ratio of benzannulated biphenylenes. | ||
The first step of the protocol involved generating a library of benzannulated biphenylenes (step 1 in Fig. 2). To achieve this, we employed an in-house developed Python code44 that successively attaches benzene rings to a given input geometry (xyz format) in a cata-condensed fashion (see Scheme S1 in the SI for further details). For our study, we considered six generations of cata-benzannulated biphenylenes. Thus, the program generated a family of 3835 distinct molecules, each containing 2–8 benzene rings fused to the Hückel antiaromatic cyclobutadiene core. The molecules are represented as graphs using Python's NetworkX module.45 Importantly, the program performs a preliminary optimization of each structure with the Merck molecular force field (MMFF)46 as implemented in RDKit.47 Redundancy among newly generated molecules is assessed by computing the InChIKey48 for each structure and checking its presence against the set of previously generated structures within the same generation. The final results are compiled into a single multi-xyz output file. Next step was the obtention of target values (step 2 in Fig. 2). Accordingly, 300 members of the library were randomly selected and transformed into multi-job Gaussian 1649 input files. The S0 and T1 states of this set of molecules were optimized with the (U)M06-2X/def2-SVP method and their energies were obtained at the (U)M06-2X/def2-TZVPP//(U)M06-2X/def2-SVP (gas phase) level of theory.50,51 This level of theory was selected based on previous results.6,25 Vertical S1 energies were obtained from TD-DFT calculations at the S0 geometries at the same level of theory. Input file generation could be done automatically with a single command line using the QPREP module included in the AQME platform. Successfully terminated calculations with no imaginary frequencies (280 out of 300) were identified and the energies of each state were extracted and assembled in a spreadsheet that contained an identification code for each molecule together with their computed S1/T1 ratio.
We are aware that the use of vertical energies for the S1 state results in somewhat overestimated S1/T1 ratios.52 Additionally, each molecule was annotated with its SMILES code using the Open Babel program,53 a requirement for the subsequent generation of chemical descriptors to be used during model training.
Training set generation and model training (steps 3 and 4 in Fig. 2) were carried out using the AQME and ROBERT platforms, respectively. Both processes are fully automated and can be executed with a single command line thanks to their streamlined integration. First, AQME's CSEARCH and CMIN modules perform an RDKit based conformational search, followed by xTB54 geometry optimization starting from SMILES strings. Then, the QDESCP module generates three distinct databases containing a number of Boltzmann weighted xTB and RDKit descriptors, together with the desired target values (in our case, the S1/T1 ratio). These three databases—denoted full, denovo, and interpret—contain descriptor sets organized across increasing levels of human interpretability. The interpret database, which contains 22 human-interpretable molecular descriptors, was selected for ML model training due to its lower dimensionality, reduced redundancy, and enhanced interpretability. This database was then treated with the ROBERT platform (step 4 in Fig. 2), which operates through 4 consecutives modules. First, the CURATE module improves data quality by automatically suppressing highly correlated variables.
Next, the GENERATE module screens diverse combinations of built-in scikit-learn55 algorithms—by default, random forest (RF), gradient boosting (GB), multivariate linear regression (MVL), and dense neural networks (NN) as implemented in sklearn.neural_network. MLPRegressor class—along with different partition sizes. It also employs permutation feature importance (PFI) analysis56 to quantify the relative contribution of each descriptor to the model's predictive performance. Additionally, SHapley Additive exPlanations (SHAP)57,58 analysis is used to assess the contribution of each descriptor to the model predictions. This method provides feature importance values based on Shapley theory.59 In this approach, the values of each descriptor are randomly permuted, and the resulting decrease in model accuracy is used as a measure of its importance. The VERIFY module then carries out a series of tests to assess the performance and reliability of the ML predictors produced by the GENERATE module. Finally, the PREDICT module computes a set of performance metrics for the selected models. All results are compiled into a comprehensive report delivered as a pdf file, which also includes a 1-to-10 score derived from the model's performance metrics. The details regarding the ROBERT platform can be found the original publications39,40 and the online documentation websites.
The most relevant results obtained from the first round of model training can be found in the corresponding ROBERT report (see SI). Out of the 280 molecules randomly chosen from the family of 3836 compounds, 224 (80%) were allocated to the training set for 10-times repeated 5-fold cross-validation, while the remaining 56 (20%) were set aside by the program as the test set. The best-performing models involved either a NN algorithm using 6 descriptors [model without PFI filtering, denoted as NN (No PFI)] or an MVL model using 4 descriptors (model with PFI filtering). In both cases, the overall score was 8/10. The NN (No PFI) model yielded particularly strong metrics: R2 = 0.92, MAE = 0.046, and RMSE = 0.066, with only 3.6% outliers in both the test set. In both models, PFI and SHAP analyses identified the xTB-computed S0–T1 gap as the most important descriptor, which is fully consistent with the target variable under study. While the model successfully passed most of the diagnostic tests performed by the VERIFY module, a major issue was identified regarding data distribution (Fig. 3). That is, 86% of the data points were found concentrated in the first two quartiles, which may significantly limit the model's applicability. This bias in data distribution could limit the model's ability to reliably predict molecules with S1/T1 ≥ 2, which are precisely the desired targets.
![]() | ||
| Fig. 3 Data distribution of target values within CV (10 times repeated 5-fold cross-validation) and test sets. | ||
Taking this into account, we used the NN (No PFI) model to select additional compounds predicted to belong to the upper quartiles of the S1/T1 ratio distribution (step 5 in Fig. 2). While this model was expected to be less reliable in the 2.0–2.6 range, it could still identify molecules for which S1/T1 ≥ 2 with remarkable accuracy. Indeed, all picked systems (100 systems with predicted S1/T1 in the 2.25–2.6 range, see SI) were found to satisfy the first EMC. Based on these predictions, a new training set was assembled by repeating steps 2 and 3 of the protocol.
This new set comprised 369 molecules, including the original 280 systems used in round 1 and 89 extra data points with computed S1/T1 in the 2.2–3.0 range (Fig. 4a). The best algorithm identified in a second round of ROBERT model training again the NN (No PFI), which employed 6 AQME-generated descriptors (Fig. 4b): S0–T1 gap, second electron affinity, G of H bonds H2O, electronegativity, MolLogP and dipole moment.† The xTB-computed S0–T1 gap once more emerged as the most important contributor based on PFI and SHAP analyses (see ROBERT report in the SI). The overall model score remained at 8/10, and excellent test-set performance metrics were computed using the PREDICT module: R2 = 0.96, MAE = 0.057, RMSE = 0.076 (Fig. 4c). The proportion of outliers was 3.7% for the cross validation (CV) set and 5.4% for the test set. In terms of binary (qualitative) prediction, an accuracy of 96.2% and an F-score of 0.95 was obtained. Moreover, all false negatives yielded predicted S1/T1 values above 1.90, very close to the 2.0 singlet-fission threshold. Likewise, all false positives have actual S1/T1 that are also above 1.90.
Despite the good predictive metrics obtained, it should be noted that only the three most relevant descriptors identified by PFI and SHAP analyses (Fig. 4b) show a clear connection to the molecular electronic structure and are thus related to the energetic requirements for SF at some extent. In contrast, the remaining descriptors (i.e. G of H bonds H2O, dipole moment, and MolLogP) appear to lack a direct physical relationship with the process. For this reason, we selected the PFI-filtered model [denoted NN (PFI) in Fig. 4c and d], which includes only the three most relevant and chemically interpretable descriptors (i.e. S0–T1 energy gap, electronegativity, and second electron affinity). This simplified model maintained excellent performance (R2 = 0.94, MAE = 0.066, RMSE = 0.092 for the test set) and achieved an even higher ROBERT score of 9/10. The consistency between descriptor relevance (PFI) and chemical interpretability confirms that the model's predictive power is driven by physically meaningful features. Moreover, this outcome highlights the importance of combining statistical feature selection with chemical interpretability to ensure that the resulting models remain both accurate and insightful.
To further evaluate the workflow's performance, this second round of model training was repeated using three different combinations of two descriptors and S0–T1 energy gap alone (see SI for details). While good performance (ROBERT score up to 9) was obtained using an NN and the S0–T1 gap alone or in combination with electronegativity, the predictive power in the range of interest was found poorer. Therefore, we retained the NN (PFI) model employing all three physically meaningful descriptors as the champion model. To assess the model's robustness, its performance was evaluated on an external random test set of 257 molecules (Fig. 4d). The results remained consistent, with R2 = 0.90, MAE = 0.053, and RMSE = 0.063. In terms of binary predictions, only 11 of 257 results are misclassified (6 false negatives plus 5 false positives). This corresponds to an accuracy of 95.7%, a precision (true positives divided by all positive predictions) of 86.8%, and an F-score of 0.86.
This model was finally employed to predict once more the S1/T1 value of all 3835 compounds in the library. As a result, 505 systems with predicted values ≥1.90 were identified as hit compounds for further investigation and development (see Fig. 5a for selected examples and the SI for the full list in multi-xyz format).
While the present study is primarily data-driven, some degree of chemical interpretation of the hit molecules is desirable. To rationalize the identified hits in terms of their benzannulation patterns, we first analyzed the occurrence of all fragments obtained by fusing one to four benzene rings to a single side of the cyclobutadiene core (fragments F1–F27 in Fig. S1) and compared their relative occurrence within the hit subset to that in the full library of 3835 compounds. Substructure searches were performed using functions available in the RDKit.Chem module.41 The results indicate that no simple, clear-cut prediction based on the presence of a single substructure is possible. Nevertheless, one fragment, namely the relatively simple cyclobuta[a]anthracene motif (F5 in Fig. 5b), exhibits an increased relative occurrence among the hit molecules. Using the presence of this fragment as a binary classifier yields 398 true positives, 661 false positives, 107 false negatives, and 2669 true negatives. This corresponds to an accuracy of 80%, a precision of 38%, a sensitivity of 79%, and an F-score of 0.51. In practical terms, this criterion allows a large fraction of unsuitable candidates to be correctly discarded. However, it is insufficiently selective, as the proportion of false positives (the structures that have the fragment in question but show S1/T1 ≤ 1.90) remains substantial, limiting its usefulness as a standalone predictor. None of the other investigated substructures display statistically meaningful predictive power. Complete statistics for all fragments are provided in the SI (Table S2).
The preference for the cyclobuta[a]anthracene motif indicates that linearly cata-condensed π-extension adjacent to the cyclobutadiene core favors SF propensity within the analyzed family. This trend is chemically consistent with the well-known behavior of linearly fused acenes, which are established SF chromophores.1,9 It should be noted, however, that isolated anthracene does not satisfy the first energetic matching criterion, whereas longer acenes such as tetracene and pentacene do. Our results suggest that, in benzannulated biphenylenes, a three-ring linear fragment can already be sufficient. This difference is attributed to the synergistic effect of linear conjugation and the presence of the antiaromatic core. From a practical perspective, this finding is also insightful, since shorter linear acene segments are generally more stable and synthetically more accessible than longer acenes. The latter typically suffer from reduced stability and handling difficulties. In contrast, kinked annulation patterns tend to shift the excited state energies outside the SF window. Intriguingly, the closely related and more linear cyclobuta[b]anthracene (F8 in Fig. 5b) exhibits a very low F-score of 0.03, suggesting that a strictly collinear alignment of the substituents across the two sides of the cyclobutadiene core is disfavored. To further examine the linear versus kinked annulation, we analyzed the occurrence of four fragments that correspond to second-generation structures (F28–F31 in Fig. S1). Among these, the two kinked isomeric motifs (F28 and F30) are significantly enriched among the hit compounds, with F-scores of 0.54. In contrast, the linear topology (F29 and F31) is strongly depleted, with an F-score of 0.03 and 0.01, respectively. Overall, the later results confirm that although locally linear benzannulation on each side of the cyclobutadiene core correlates with SF propensity, the most favorable architectures are not globally linear. Instead, an overall kinked topology—i.e., when considering the relative positioning of the fused benzene rings on the left and right sides of the cyclobutadiene core—is preferred.
Finally, in order to assess the robustness and transferability of the ML workflow, we applied the same descriptor set and training protocol to an independent dataset of 302 benzannulated dibenzopentalenes, a structurally distinct yet chemically related family of ground-state antiaromatic polycyclic molecules (Fig. 5d, see SI). The resulting model displays predictive performance comparable to that obtained for benzannulated biphenylenes (test set R2 = 0.85, MAE = 0.039, RMSE = 0.047; ROBERT score = 8/10), indicating that the learned structure–property relationships are not specific to a single molecular family. When both datasets are combined into a single, more chemically diverse training set, the predictive accuracy is further improved (R2 = 0.94, MAE = 0.052, RMSE = 0.073), demonstrating that the dominant descriptors remain meaningful across families and that increased chemical diversity enhances the overall generalization of the model.
Supplementary information (SI): computational details, tables with metrics of model trainings, chemical structure of the fragments considered, compound library (multi-xyz file), input and output datasets (.csv files), and ROBERT reports (pdf files). See DOI: https://doi.org/10.1039/d5tc04137f.
Footnote |
† S0–T1 gap refers to the energy difference between the ground singlet state and the lowest triplet excited state computed with xTB. The values are given in kcal/mol. Second EA refers to the energy difference between the dianion and the monoanion. It is estimated vertically from xTB-computed self-consistent-charge (SCC) energies of the monoanion and dianion plus an xTB correction factor. Values are given in eV. Electronegativity refers to the ability of a molecule to attract electrons. It is estimated as the average of the xTB-computed SCC ionization potential and the xTB-computed SCC electron affinity. Values are given in eV. G of H bonds H2O refers to the Gibbs energy of interaction through hydrogen bonds between the molecule and an implicit solvation model, the so-called analytical linearized Poisson–Boltzmann (ALPB) that uses water as the solvent. MolLogP refers to the logarithm of the molecule's partition coefficient between octanol and water, which estimates its lipophilicity (hydrophobicity). It is estimated with RDKit using the Wildman–Crippen fragment method,60 in which predefined atom-environment fragments contribute additively to the final value. This model assigns each atom a fragment constant based on its type and bonding context, and the molecular log P is obtained as the sum of these contributions. Dipole moment refers to the xTB computed molecular dipole moment module expressed in Debye (Note that the AQME and ROBERT programs incorrectly refer to this descriptor as dipole module). |
| This journal is © The Royal Society of Chemistry 2026 |