Haochen Shi ab, Yiming Shi ab, Pengcheng Jiang *c, Bo Qiao ab, Zhiqin Liang ab, Suling Zhao ab and Dandan Song *ab
a Key Laboratory of Luminescence and Optical Information, Beijing Jiaotong University, Ministry of Education, Beijing 100044, China
b Institute of Optoelectronics Technology, Beijing Jiaotong University, Beijing 100044, China
c Hebei Technology Innovation Center for Energy Conversion Materials and Devices, College of Chemistry and Materials Science, Hebei Normal University, Shijiazhuang 050024, China
First published on 17th September 2025
Unlocking the potential of circularly polarized thermally activated delayed fluorescence (CP-TADF) molecules for advanced optoelectronic applications necessitates an accurate understanding of the luminescent dissymmetry factor (glum). To develop a robust predictive framework, we first conducted a comprehensive statistical analysis of reported experimental |glum| values and their theoretical predictions, which revealed significant discrepancies between experimental and calculated results. Based on these findings, we employ machine learning (ML) models with Morgan fingerprints to predict |glum|, complemented by Klekota-Roth fingerprints and Shapley additive explanations for detailed structural insights. Using this predictive model, we perform inverse design of CP-TADF materials through a generative model based on a modified variational autoencoder, with |glum| as the objective function. This integrative approach successfully identifies CP-TADF molecules with both high |glum| values and favorable synthetic accessibility. Our framework serves as a powerful tool for the intelligent design of CP-TADF materials, bridging theoretical predictions with experimental realization and accelerating the discovery of next-generation CP-TADF materials.
A key parameter for evaluating the performance of CP-TADF materials is the circularly polarized luminescence dissymmetry factor (glum), which quantifies the degree of polarization in the emitted light. Recent studies on CP-TADF materials have primarily concentrated on the design of novel molecular structures to achieve higher |glum| values,13 alongside improving stability and efficiency.14 However, the development of CP-TADF materials is still limited by challenges in accurately predicting |glum| and understanding the structural factors that govern their chiroptical properties.
Machine learning (ML) techniques have become invaluable tools in materials science, offering advanced capabilities to analyze complex relationships between molecular structures and their properties. ML-driven quantitative structure–property relationship (QSPR) approaches have been successfully applied in various domains, including the prediction of material properties such as electronic energy levels,15 bond dissociation energies,16 and key photophysical parameters.17 Beyond material-centric studies, ML has also been employed to investigate OLED-specific properties.18 By handling high-dimensional data, ML enables both automatic and targeted feature selection, allowing researchers to identify the most relevant molecular descriptors and establish precise structure–property–performance relationships.
In our previous work, we demonstrated that ML-driven QSPR can predict the photoluminescence quantum yield (PLQY) of multi-resonance thermally activated delayed fluorescence (MR-TADF) materials, aiding the design of high-performance structures.19 In this work, we use ML techniques to design and optimize CP-TADF materials with enhanced |glum| values. Through a comprehensive analysis of the discrepancies between reported experimental |glum| values and theoretical predictions from density functional theory (DFT) calculations, we develop a robust ML framework to accurately predict and optimize |glum| for CP-TADF molecules. Furthermore, we enhance a variational autoencoder (VAE) model by incorporating |glum| as an optimization objective, enabling the inverse design and generation of novel CP-TADF structures with superior chiroptical properties.
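In DFT-based calculations, glum is typically calculated using the following equation:21

$$g_{\mathrm{lum}} = \frac{4\,|\boldsymbol{\mu}|\,|\boldsymbol{m}|\cos\theta}{|\boldsymbol{\mu}|^{2} + |\boldsymbol{m}|^{2}}$$

where $\boldsymbol{\mu}$ and $\boldsymbol{m}$ represent the electric dipole moment and the magnetic dipole moment, respectively, and $\theta$ is the angle between them. This equation provides a theoretical framework to predict |glum| values, offering a systematic and efficient method for molecular evaluation.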
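As a minimal sketch of how this expression is evaluated, the snippet below assumes the widely used form glum = 4|μ||m|cosθ/(|μ|² + |m|²), with both vectors real and given in the same (Gaussian) units. The function name `g_lum` and the example vectors are hypothetical and are not taken from the authors' code.

```python
def g_lum(mu, m):
    """Dissymmetry factor from electric (mu) and magnetic (m) dipole vectors.

    Assumes both vectors are real and expressed in the same units.
    """
    dot = sum(a * b for a, b in zip(mu, m))  # equals |mu||m|cos(theta)
    mu_sq = sum(a * a for a in mu)           # |mu|^2
    m_sq = sum(b * b for b in m)             # |m|^2
    return 4.0 * dot / (mu_sq + m_sq)

# Hypothetical vectors, for illustration only
mu = (1.0, 0.0, 0.0)
m_parallel = (1.0, 0.0, 0.0)   # same magnitude, parallel: upper bound of 2
m_perp = (0.0, 1.0, 0.0)       # perpendicular: no dissymmetry
m_small = (0.001, 0.0, 0.0)    # |m| << |mu|, the usual case for organic emitters

print(g_lum(mu, m_parallel))
print(g_lum(mu, m_perp))
print(g_lum(mu, m_small))
```

When |m| is much smaller than |μ|, the expression reduces to roughly 4|m|cosθ/|μ|, which is why small errors in either computed moment or in the angle translate directly into errors in the calculated glum.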
To evaluate the reliability of DFT-based predictions, we conducted a detailed comparison between theoretically calculated |glum| values and experimentally reported data. As shown in Fig. 1 and Fig. S1, we selected 50 CP-TADF materials1–3,6,7,9,10,22–38 from various literature sources and computed their electric dipole moments ($\boldsymbol{\mu}$) and magnetic dipole moments ($\boldsymbol{m}$) by DFT calculations using BDF software (B3LYP/6-31G(d,p)). It is worth noting that the def2-TZVP basis set is considered an excellent choice for dipole moment calculations.39 However, our tests revealed that, owing to its high accuracy, it imposes stringent convergence requirements in calculations on CP-TADF materials. Moreover, the computation for a single material is both time- and energy-intensive, making it unsuitable for the high-throughput screening we aim to achieve. The comparison reveals notable inconsistencies between calculated and reported |glum| values, both in magnitude and in trend. To further validate this observation, we performed additional calculations using Gaussian 16 (ref. 40) at the B3LYP/6-31G(d,p) level on three materials sourced from different studies: BN-DCz, TRZ-MeIAc and SFOT. The calculated |glum| values for these materials were 0.019, 0.4765, and 0.4735, respectively, which deviate significantly from the reported experimental values of 0.0013, 0.00059, and 0.0022. This discrepancy may have several causes, such as the choice of functional, the basis set, and the computational approximations involved. These limitations underscore the need for alternative approaches that can integrate experimental data with theoretical insights to improve prediction accuracy. To address these challenges, we adopted an ML-driven approach that combines experimental data and computational insights.
To begin with, we selected three ML models for analysis: random forest (RF), extreme gradient boosting (XGBoost), and neural network (NN). These models were trained using Morgan fingerprints as input features to determine the best-performing model, with hyperparameter optimization carried out through Optuna.41 The scatter plots in Fig. 3(a) depict the predictive capabilities of these models. The data points from the RF model show less dispersion on the test set and align more closely with the ideal prediction line, demonstrating its superior ability to capture the critical factors influencing |glum| and achieve higher fitting accuracy. Meanwhile, the results shown in Table 1 indicate that the RF model achieves the highest Pearson coefficient (r = 0.97), the highest determination coefficient (R² = 0.93), and the lowest root mean squared error (RMSE = 2 × 10⁻³) on the test set, outperforming both XGBoost and NN. The superior performance of the RF model can be attributed to its ability to handle the high-dimensional (2048 dimensions) and complex nature of Morgan fingerprints, capturing key relationships between molecular structures and |glum| values with high accuracy. While XGBoost also shows strong predictive capabilities, its slightly lower performance compared to RF may stem from its sensitivity to hyperparameter tuning and its tendency to lose precision at higher |glum| values, as indicated by the scatter plots in Fig. 3(a). NN, on the other hand, performs poorly, with significant deviations from the diagonal line in both the training and test sets, likely due to overfitting and its dependence on larger datasets for good generalization. Overall, these results establish RF as the most suitable model for predicting |glum|, offering a robust balance between predictive accuracy and consistency, particularly in scenarios involving limited datasets.
| Metric | RF | | XGBoost | | NN | |
|---|---|---|---|---|---|---|
| | Train | Test | Train | Test | Train | Test |
| Root mean squared error (RMSE; one value reported per model) | 2 × 10⁻³ | | 3.2 × 10⁻³ | | 4.8 × 10⁻³ | |
| Pearson coefficient (r) | 0.98 | 0.97 | 0.83 | 0.92 | 0.98 | 0.86 |
| Determination coefficient (R²) | 0.92 | 0.93 | 0.65 | 0.86 | 0.86 | 0.70 |
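The three metrics reported in Table 1 can be computed as sketched below. This is an illustration only: it assumes R² is defined as 1 − SS_res/SS_tot (the convention used by scikit-learn's `r2_score`), which may differ from the authors' definition, and the |glum| values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, Pearson r and R^2 for paired true/predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)              # total sum of squares
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r": cov / math.sqrt(ss_tot * var_p),
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical |g_lum| values: predictions carry a constant offset of 1e-3
y_true = [1e-3, 2e-3, 3e-3, 4e-3]
y_pred = [2e-3, 3e-3, 4e-3, 5e-3]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the predictions are perfectly correlated with the targets (r = 1) yet systematically shifted, so R² drops to about 0.2 and the RMSE equals the offset. This is why r, R² and RMSE are worth reporting together.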
Fig. 3(b) compares the performance of the RF model in predicting |glum| with three descriptor systems: Morgan fingerprints, molecular descriptors (including cheminformatics and E-state indices obtained from PaDEL (ref. 42) and CDK Descriptor), and their combination. Among these, the Morgan fingerprint system performs best, achieving the highest correlation metrics (r and R²) along with the lowest prediction error (RMSE) on the test set. The molecular descriptor system shows moderate performance, with slightly lower correlation metrics and a higher prediction error. This suggests that molecular descriptors, which primarily capture global or statistical properties, may be less effective at representing the local structural features critical for predicting |glum|. Interestingly, the combined system, which integrates both Morgan fingerprints and molecular descriptors, performs the worst. It yields a higher prediction error and noticeably lower correlation metrics, likely because redundant or irrelevant features interfere with the model's ability to generalize.
To further validate the predictive performance of our model on new data, we selected several recently reported CP-TADF materials31,43–45 that were not included in the previous training and testing datasets, and evaluated their predicted performance, as shown in Table S1. Overall, the predictions align reasonably well with the experimental results, with most data points located near the reference line, indicating the reliability and generalization capability of the RF model on unseen compounds. Deviations observed for certain molecules (e.g., CzCzP, R-DWBN) may arise from structural features or measurement conditions not fully captured in the training data, yet the model successfully preserves the relative ranking of |glum| across the new materials. This result highlights the robustness of the model in guiding the design and screening of novel high-performance CP-TADF candidates.
While Morgan fingerprints demonstrate strong predictive performance for modeling |glum|, their abstract and high-dimensional nature makes it difficult to correlate predictions directly with specific structural features. To overcome this limitation, we employed Klekota–Roth fingerprints (KRFP) combined with Shapley additive explanations (SHAP) analysis, as illustrated in Fig. 3(c) and (d), to identify key structural factors influencing |glum| values. The analysis highlights the top 20 KRFP features with the most significant impact, and the four most influential structures are visualized in Fig. 3(d). These structures exhibit distinct electronic and geometric characteristics that are closely tied to chiroptical properties. For instance, nitrogen-containing heterocycles, such as KRFP 3141, introduce asymmetry and electronic polarization, significantly influencing the electric and magnetic dipole moments. Similarly, structures like KRFP 3587 and KRFP 3602, with amine or alkyl-substituted aromatic units, may contribute electronic effects that impact |glum|. Additionally, the fused aromatic system in KRFP 3084 enhances molecular rigidity, stabilizing chiral configurations and improving |glum| values. These findings offer deeper insight into how specific electronic and geometric factors contribute to chiroptical performance, and can guide the rational design of CP-TADF materials with enhanced |glum| values.
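To make the attribution idea concrete, the sketch below computes exact Shapley values for a toy model with three binary substructure bits. It is not the `shap` package and not the authors' pipeline: the model `toy_predict`, its coefficients, and the single all-zero baseline are assumptions chosen purely for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a single baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from the baseline
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

def toy_predict(bits):
    # Hypothetical model: bit 0 acts alone, bits 1 and 2 only matter together
    return 0.001 + 0.004 * bits[0] + 0.002 * bits[1] * bits[2]

x = [1, 1, 1]          # all three substructures present
baseline = [0, 0, 0]   # none present
phi = shapley_values(toy_predict, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and for the baseline, and the interaction term is split equally between the two bits that jointly produce it. Tree-based explainers approximate the same quantity efficiently over a background dataset rather than a single baseline.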
Further analysis was conducted to evaluate the generative performance of the g-VAE model by visualizing its latent space. Both generated molecules and reported CP-TADF molecules were projected into a two-dimensional chemical space using t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) based on 2D Morgan fingerprints. As shown in Fig. 4, the t-SNE projection reveals a widely dispersed distribution of generated molecules, with partial overlap with reported molecules, indicating that the g-VAE effectively explores diverse regions of the chemical space while retaining structural relevance to experimentally validated compounds. In contrast, the UMAP projection presents more compact and locally aggregated clusters, highlighting its ability to preserve neighborhood relationships and reveal subtle structural similarities. The coexistence of overlapping regions and distinct clusters in both projections demonstrates that the g-VAE latent space is well-organized and capable of generating novel chemotypes that are chemically meaningful, structurally reasonable, and consistent with high-performance CP-TADF design principles.
Fig. 4 (a) t-SNE and (b) UMAP projections of the generated molecules and reported CP-TADF molecules based on 2D Morgan fingerprints.
To identify high-performing candidates, we conducted a multi-step screening process. Initially, generated molecules were selected not only for their high predicted |glum| values but also for their structurally reasonable configurations. To ensure novelty, only molecules with a maximum Tanimoto similarity of ≤80% to any reported material were retained. Subsequently, DFT calculations were performed to evaluate the photophysical properties of the filtered molecules, focusing on their potential for efficient luminescence. Through this systematic screening, we identified a top candidate.
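The novelty filter described above can be sketched as follows, treating each fingerprint as the set of its on-bit indices. The bit sets and the helper names `tanimoto` and `is_novel` are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit, and the fingerprint type the authors used for this step is not stated here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def is_novel(candidate, reported, threshold=0.80):
    """Keep a candidate only if its maximum similarity to any reported molecule is <= threshold."""
    max_sim = max((tanimoto(candidate, ref) for ref in reported), default=0.0)
    return max_sim <= threshold

# Hypothetical on-bit sets, for illustration only
reported = [{1, 2, 3, 4, 5}, {10, 11, 12}]
candidate_close = {1, 2, 3, 4, 5, 6}   # shares 5 of 6 bits with the first reported molecule
candidate_new = {1, 2, 20, 21}         # shares only 2 bits with any reported molecule

print(is_novel(candidate_close, reported))
print(is_novel(candidate_new, reported))
```

Taking the maximum over all reported molecules, rather than the mean, ensures that a single near-duplicate is enough to reject a candidate.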
Fig. 5(a) presents the |glum| distribution of all generated molecules, as predicted by the RF model based on Morgan fingerprints. The majority of |glum| values fall below 3 × 10⁻³, prompting us to focus on the top-performing molecules in the upper range of the distribution. For these high-|glum| candidates, we further conducted DFT calculations to evaluate their oscillator strengths (fosc), a key parameter for determining luminescence efficiency. From this analysis, D4_0633, a CP-TADF molecule based on the binaphthol structure, emerged as the most promising candidate due to its high predicted |glum| and fosc, as shown in Fig. 5(b). D4_0633 exhibits several advantageous features for CP-TADF performance. Its rigid binaphthol framework provides a stable chiral backbone, which is essential for enhancing |glum| values. The extended conjugation across multiple aromatic units, together with the donor and acceptor groups, facilitates intramolecular charge transfer, improving both luminescence efficiency and circularly polarized light emission. With a predicted |glum| value of 1.18 × 10⁻², D4_0633 stands out as a promising candidate for CP-TADF applications.
To further optimize the structure, we replaced the methyl group attached to the nitrogen with a phenyl ring, which enhances the conjugation and rigidity of the structure. This modification results in an improved derivative, D4_0633_1, which exhibits an even higher predicted |glum| of 1.25 × 10⁻². The predicted |glum| values of D4_0633 and D4_0633_1 exceed those of the reported binaphthol-based CP-TADF molecules (compounds 1–10 in Fig. 5(b)) by an order of magnitude, indicating exceptional chiroptical properties. This significant improvement highlights the effectiveness of integrating ML predictions with property-directed molecular generation in discovering high-performance CP-TADF materials.
In Fig. 5(c), we present a detailed quantum chemical analysis of D4_0633 and D4_0633_1. The highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels are similar for the two molecules. The distributions of the natural transition orbitals (NTOs) for the first singlet excited state (S1) and the first triplet excited state (T1), along with key photophysical parameters such as the oscillator strength (fosc), the S1, T1 and T2 excitation energies, and the singlet–triplet energy gap (ΔEST), are also presented in Fig. 5(c). For D4_0633, the ΔEST is 0.24 eV. After modification, D4_0633_1 exhibits slightly lower S1 and T1 excitation energies, with a reduced ΔEST of 0.21 eV. This reduction in ΔEST is beneficial for reverse intersystem crossing (RISC), a critical process for CP-TADF performance. The NTO analysis reveals that the electron and hole distributions in the S1 and T1 states remain largely consistent for both molecules. For D4_0633, the calculated spin–orbit coupling (SOC) values indicate that the SOC between S1 and T2 (〈S1|ĤSOC|T2〉 = 0.65 cm⁻¹) is significantly larger than that between S1 and T1 (〈S1|ĤSOC|T1〉 = 0.26 cm⁻¹). This indicates that the largely localized T2 state acts as the dominant channel for RISC, as the stronger coupling facilitates faster transitions between S1 and T2. After modification, D4_0633_1 exhibits reduced SOC values, with 〈S1|ĤSOC|T2〉 = 0.33 cm⁻¹ and 〈S1|ĤSOC|T1〉 = 0.11 cm⁻¹. The SOC reduction for both T1 and T2 originates from the substitution of the peripheral methyl group with a phenyl ring, which increases the π-conjugation length and introduces additional steric hindrance, thereby twisting the donor–acceptor dihedral angles. These changes reduce the spatial overlap between the frontier orbitals and extend the charge-transfer (CT) distance. According to perturbation theory and the El-Sayed rule, such reduced overlap weakens wavefunction interactions and lowers SOC magnitudes. Despite these decreases, the modified compound retains an energetically accessible T2–S1 channel and a favorable orbital configuration, enabling efficient RISC.
Importantly, the calculated fosc values of the S1–S0 transition for D4_0633 and D4_0633_1 both exceed 0.17. For luminescent materials, a higher fosc value is generally desirable, as it correlates with stronger radiative transitions and higher PLQY. For donor–acceptor type TADF materials, an fosc value of 0.17 is reasonable for a strong radiative transition. This suggests that these materials have the potential for efficient light emission, making them promising candidates for TADF-based applications.
To facilitate experimental realization, we calculated the synthetic accessibility (SA) scores for D4_0633 and D4_0633_1 and compared them with those of all reported CP-TADF molecules in the dataset. As shown in Fig. 6(a), the SA scores of D4_0633 and D4_0633_1 fall within the interquartile range of the reported compounds. In the overall molecular space, SA scores range from 2 to 7, with most reported CP-TADF materials concentrated between 3 and 4.5. The SA scores of D4_0633 and D4_0633_1 are both around 4, which lies within this dominant range and indicates that these molecules are highly likely to be synthetically accessible.
Fig. 6 (a) Comparison of SA scores between reported CP-TADF molecules and D4_0633 and D4_0633_1. (b) Practical synthetic routes for D4_0633 and D4_0633_1.
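The interquartile-range comparison of SA scores can be sketched with the standard library as below. The reference scores are hypothetical placeholders, not the values behind Fig. 6(a), and the quartiles depend on the interpolation method (here the default of `statistics.quantiles`).

```python
import statistics

def within_iqr(value, reference):
    """Check whether value lies between the first and third quartiles of reference."""
    q1, _median, q3 = statistics.quantiles(reference, n=4)  # default 'exclusive' method
    return q1 <= value <= q3, (q1, q3)

# Hypothetical SA scores of reported molecules, for illustration only
reported_sa = [2.8, 3.1, 3.4, 3.6, 3.9, 4.1, 4.3, 4.6, 5.2]

inside, (q1, q3) = within_iqr(4.0, reported_sa)     # a candidate with SA score around 4
outside, _ = within_iqr(6.5, reported_sa)           # a much harder-to-make candidate
print(inside, outside, q1, q3)
```

A candidate inside the interquartile range is no harder to make, by this heuristic score, than the middle half of the molecules that have already been synthesized.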
Practical synthetic routes were also designed for both molecules, as shown in Fig. 6(b). They rely on well-established reactions such as Buchwald–Hartwig couplings and controlled substitution steps, and are designed to ensure accessible starting materials, scalability of the synthesis, and overall practicality. This combination of favorable SA scores and robust synthetic strategies suggests that the experimental realization of these high-performing CP-TADF molecules is both practical and efficient.
The code for this study has been uploaded to https://github.com/SpartAce7012/CP-TADF. It is also available to qualified researchers on reasonable request from the corresponding author.
This journal is © The Royal Society of Chemistry 2025