Generative AI-powered inverse design for tailored narrowband molecular emitters

Mianzhi Pan; Tianhao Tan; Yawen Ouyang; Qian Jin; Yougang Chu; Wei-Ying Ma; Jianbing Zhang; Lian Duan; Dong Wang; Hao Zhou

doi:10.1039/D5DD00268K



This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

DOI: 10.1039/D5DD00268K (Paper) Digital Discovery, 2025, Advance Article


Mianzhi Pan†^ade, Tianhao Tan†^bc, Yawen Ouyang†^a, Qian Jin^bc, Yougang Chu^d, Wei-Ying Ma^a, Jianbing Zhang*^de, Lian Duan*^bc, Dong Wang*^bc and Hao Zhou*^a
^aInstitute for AI Industry Research, Tsinghua University, Beijing, 100084, P. R. China. E-mail: zhouhao@air.tsinghua.edu.cn
^bLaboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, P. R. China. E-mail: dong913@mail.tsinghua.edu.cn
^cMOE Key Laboratory of Organic OptoElectronics and Molecular Engineering, Department of Chemistry, Tsinghua University, Beijing, 100084, P. R. China
^dNational Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210023, P. R. China
^eSchool of Artificial Intelligence, Nanjing University, Nanjing, Jiangsu 210023, P. R. China

Received 16th June 2025, Accepted 3rd September 2025

First published on 4th September 2025

Abstract

As organic display technology progresses, the urgent and daunting challenge lies in the development of next-generation molecular emitters capable of delivering an extensive color gamut with unparalleled color purity. The existing process for uncovering new emitters is largely reliant on a time-consuming and costly trial-and-error method. However, with the integration of AI, the pace of materials discovery is accelerated dramatically. Here, a molecular generation framework, MEMOS, which harnesses the efficiency of Markov molecular sampling techniques alongside multi-objective optimization for the inverse design of molecules, is presented. MEMOS facilitates the precise engineering of molecules capable of emitting narrow spectral bands at desired colors. Utilizing a self-improving iterative process, it can efficiently traverse millions of molecular structures within hours, pinpointing thousands of target emitters with an impressive success rate up to 80%, as validated by density functional theory calculations. Through the use of MEMOS, well-documented multiple resonance cores from the experimental literature have been successfully retrieved, and a broader color gamut has been achieved with the newly identified tricolor narrowband emitters. These findings underscore the immense potential of MEMOS as an efficient tool for expediting the exploration of the uncharted chemical territory of molecular emitters and their experimental discovery.

1. Introduction

Organic light-emitting diodes (OLEDs) have emerged as highly promising light-emitting devices for applications in lighting and displays, owing to their rapid response times, wide viewing angles, and inherent flexibility. Color purity is a critical attribute for emitters, as it directly influences color deviation and the realism of the displayed image. Typically, the color purity of an emitter is assessed by the full width at half-maximum (FWHM) of its emission spectrum. Minimizing the FWHM is essential for achieving color coordinates that lie closer to the outer boundary of the color space, thereby ensuring a more accurate and vibrant color representation.

Recently, a novel class of molecules with narrowband emission, known as multiple resonance thermally activated delayed fluorescent (MR-TADF) materials, has been introduced.¹ These materials leverage resonance effects to confine frontier orbitals to specific atoms by incorporating elements with complementary electronegativity, such as boron and nitrogen, at the ortho- or para-positions of rigid conjugated rings. This strategy not only effectively separates the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO), thereby endowing the molecule with excellent thermally activated delayed fluorescence properties, but also curtails the deformation of the excited state and reduces electron-vibrational coupling, leading to a narrowed emission profile.^2,3 To date, multiple resonance strategies have yielded narrowband emitters with the narrowest reported emission peaks demonstrating a FWHM of merely 13 nm in the blue, 14 nm in the green, and 21 nm in the red spectral regions.^4–10 However, the scarcity of molecules that exhibit narrowband emissions in the long-wavelength regions presents a significant challenge. The majority of molecules, especially those emitting in the green and red spectra, still display emission peaks with an FWHM exceeding 20 nm. This rarity of high-performance red and green emitters has emerged as a critical bottleneck in the advancement of OLED materials discovery.

Traditionally, designing ideal luminescent molecules involves a laborious cycle of extensive synthesis and validation, which often incurs significant costs due to the trial-and-error nature of the approach. High-throughput virtual screening (HTVS), complemented by theoretical calculations, is considered a more cost-effective approach for material discovery.¹¹ Nevertheless, accurately predicting the emission peaks and FWHM for a vast array of molecular spectra remains challenging, primarily due to the computationally intensive task of calculating Hessian matrices for excited states. Machine learning (ML) methods, which leverage data to establish structure–property relationships and enable rapid property prediction without the need for extensive theoretical calculations, have substantially expedited the HTVS process.^12–17 These methods have recently been applied to screen for organic emitters that exhibit high photoluminescence quantum yield (PLQY), a wide color gamut, or high color purity.^18–22

Despite the encouraging progress made in ML-accelerated HTVS, its efficiency is still limited by several key factors. Firstly, it is dependent on a finite collection of pre-designed candidate molecules, which encompass only a tiny fraction of the vast molecular space. Secondly, the processes of molecular generation and screening are disconnected, resulting in a low hit rate and necessitating a considerable time commitment. To achieve a thorough and efficient exploration of the chemical space, there is an urgent need for a more integrated approach that can freely navigate the extensive chemical landscape and optimize molecules on the fly, based on feedback from the molecules that are generated. Recent progress in generative modeling has opened new avenues for inverse design of materials, enabling the targeted generation of candidate molecules x by learning the conditional distribution p(x|y) over desired properties y.²³

In this research, we introduce Markov Emission MOlecular Sampling (MEMOS), a framework tailored for the efficient multi-objective inverse design of luminescent molecules. Inspired by the efficacy of the Markov Chain Monte Carlo (MCMC) sampling technique in generating text²⁴ and drug-like molecules,²⁵ MEMOS leverages the power of efficient MCMC sampling for the organic light-emitting field. It does so by integrating customized molecular editing operations and scoring functions. Within the MEMOS framework, candidate molecules are generated at each time step by applying a single editing operation to the molecules from the preceding step. This method facilitates the exploration of an almost limitless chemical space, free from the constraints of finite candidate pools. The acceptance of the newly generated molecules is determined based on their property scores, including emission wavelength and FWHM. This procedure systematically directs the properties of the sampled molecules towards the desired target values.

By employing MEMOS, we have successfully designed a diverse array of molecules with finely tunable colors and FWHM values that reach as low as 13 nm in the blue, 15.5 nm in the green, and 20 nm in the red spectral regions. The CIE color coordinates of these molecules closely adhere to the Rec. 2020 standard,²⁶ offering significant potential for expanding the color gamut and facilitating the development of ultra-high-definition displays. Our methodology has dramatically boosted the efficiency of molecular design and underscores the potential of artificial intelligence (AI) in the discovery of cutting-edge emission molecules.

2. Results and discussion

2.1 Overview of MEMOS

Navigating the vast molecular space to uncover molecules with desired optical properties presents a substantial challenge. To surmount this hurdle, we have crafted a framework, MEMOS, which is based on MCMC sampling, to design emission molecules with a specific color and high color purity. Fig. 1a depicts the workflow of MEMOS. Initially, the MCMC sampling process continuously generates candidate molecules from a seed molecule by modifying those from preceding steps using five molecular graph editing operations (Fig. 1b): fragment adding and deleting, ring fusing and defusing, and atom substitution. Proposals for these operations are represented by message-passing neural networks (MPNNs),²⁷ which are adaptively trained throughout the sampling process. Subsequently, proxy models predict the emission wavelength and FWHM of the sampled molecules. The acceptance of the newly generated molecule is determined based on its score on the objective function (Fig. 1c), which encompasses structural and representational rationality, wavelength, and color purity.


	Fig. 1 (a) Schematic overview of MEMOS: a self-improving OLED molecular design system utilizing MCMC sampling. (b) Schematic diagram of all molecular editing operations. (c) Schematic diagram of the scorer used in MCMC sampling.

As the cornerstone of our framework, the MCMC sampling method, enhanced with a simulated annealing scheme,²⁸ is employed to navigate the chemical space in search of molecules with the desired properties. Specifically, it constructs a Markov chain within the chemical space, where each state corresponds to a molecule, and the equilibrium distribution of the Markov chain is aligned with the target distribution.²⁹ Therefore, as the number of sampling steps increases, the initial molecule can progressively enhance its objective function score, ultimately achieving the desired properties. Density functional theory (DFT) is then applied to ascertain the precise optical properties of the top-k generated molecules. Finally, these data are used to iteratively refine the proxy model, creating a self-improving loop for the design of target molecules. Unlike conventional screening based on static predefined libraries, our method explores chemical space in a dynamic, model-guided manner. Only a few hundred top-ranked molecules are subjected to further DFT validation, and convergence is achieved within just a few iterations, greatly accelerating the design cycle.

To transform between molecules, the original Markov molecular sampling approach²⁵ considers fragment adding and deleting. However, relying solely on these operations is insufficient for effectively exploring the molecular space, since organic light-emitting molecules typically feature multiple fused ring structures. Moreover, simply adding single bonds through the adding operation may lead to significant long-range charge transfer, which is not conducive to the sampling of narrow emission molecules. To address this limitation, MEMOS expands the operation set to include ring fusing and defusing, as well as heteroatom substitution. With these additional operations, MEMOS can explore more than 100 novel molecules per step. Furthermore, up to 300 Markov chains, each undergoing 1500 steps, can be established and evolved on a single NVIDIA RTX 3090 Graphics Processing Unit (GPU) within a day. Consequently, MEMOS can traverse through ∼45 000 000 molecules in a single day, thereby facilitating a more efficient exploration of a broader chemical space than traditional HTVS approaches. This capability enables the generation of a more diverse set of target molecules and achieves the efficient inverse design of narrowband emitters in desired colors.
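The daily throughput quoted above follows from multiplying the three figures in the paragraph. The short sketch below only reproduces that arithmetic; the variable names are illustrative, and it assumes the "more than 100 novel molecules per step" figure applies to each chain at each step, so the result is a lower-bound estimate.

```python
# Back-of-the-envelope throughput implied by the figures quoted in the text.
CHAINS = 300              # Markov chains evolved on one GPU in a day
STEPS_PER_CHAIN = 1500    # editing steps per chain
MOLECULES_PER_STEP = 100  # assumed per chain and per step (text: "more than 100")

molecules_per_day = CHAINS * STEPS_PER_CHAIN * MOLECULES_PER_STEP
print(f"{molecules_per_day:,} molecules per day")  # 45,000,000 molecules per day
```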

2.2 Proxy model performance in optical property prediction

The accurate prediction of molecular properties is essential in machine learning applications. Uni-Mol,³⁰ as a pre-trained model, has demonstrated remarkable proficiency in forecasting the optical properties of organic molecules.¹⁸ Based on the Transformer architecture,³¹ Uni-Mol takes 3D molecular conformations as input and leverages its pre-training on a vast array of molecular conformations to generate high-quality molecular representations. Upon fine-tuning, Uni-Mol exhibits exceptional performance across a range of downstream tasks. Consequently, we utilize Uni-Mol as the proxy model to predict the emission peak wavelengths and spectral FWHM of molecules. We have developed separate models for each property, each comprising a Uni-Mol backbone complemented by a multilayer perceptron (MLP). The MLP transforms the vector output from Uni-Mol into a scalar value, which represents the final prediction for each property. Fig. 2a and b illustrate the performance of our models on the initially collected dataset from experiments (see Methods 4.1). These results are based on the models' predictions on a predefined test set, which is distinct from the training set. Overall, both models have been successfully trained using our meticulously curated dataset. The correlation coefficient (R) and mean absolute error (MAE) are 0.79 and 32.16 nm for the emission peak wavelength, and 0.86 and 0.05 eV for the FWHM, respectively. These metrics indicate that the models are capable of accurately determining these two properties, which in turn enables their use in informing the subsequent molecular sampling process.
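For readers who want to reproduce the two reported metrics on their own predictions, the following is a minimal sketch in plain Python. It assumes that R denotes the Pearson correlation coefficient, which the text does not state explicitly, and the wavelength values are invented toy numbers, not data from this work.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between labels and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same unit as the labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy emission peak wavelengths in nm (illustrative only).
measured = [450.0, 480.0, 520.0, 610.0]
predicted = [460.0, 470.0, 530.0, 600.0]

print("R   =", round(pearson_r(measured, predicted), 3))
print("MAE =", mean_absolute_error(measured, predicted), "nm")
```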


	Fig. 2 Mean absolute error (MAE) and distribution of labels on test set against the prediction of initial proxy models for (a) emission peak wavelengths and (b) the FWHM of the emission spectrum. Prediction of adapted proxy models in the last iteration for (c) emission peak wavelengths and (d) the FWHM of the emission spectrum. Data from the initial experiment dataset are marked in red, while newly added DFT-calculated data are highlighted in blue.

2.3 Iteratively adapting to the MEMOS-explored chemical space

One challenge in material design arises when proxy models, used in design algorithms, encounter novel chemical spaces that differ from the training data,³² leading to a degradation in the quality of the generated molecules. This issue is known as the out-of-distribution (OOD) problem.³³ To address this, a representation-based constraint is incorporated into the MEMOS sampling objective to exclude molecules that lie too far from the training distribution (see Methods 4.3). Moreover, we iteratively update the proxy models with newly generated molecules, enabling the models to adapt to the novel chemical space explored by MEMOS. This strategy aligns with the methods proposed by Fannjiang and Listgarten,³⁴ which involve iterative retraining based on previously high-scoring molecules.

As illustrated in Fig. 1a, the top-k molecules are selected as the final sampled targets and labeled based on the results from DFT calculations. These labeled molecules are then integrated into the dataset to enhance the training of the proxy models. The updated models are employed in the subsequent sampling phase, thus creating a self-improving loop. With each iteration, the proxy models become capable of predicting the properties of a broader range of molecules. Fig. 2c and d demonstrate that the refined models achieve superior accuracy in predictions on test datasets that include the initial test data and the sampled molecules. Consequently, the hit rate (the proportion of selected top-k molecules whose DFT-validated wavelength/FWHM values satisfy the predefined thresholds) of MEMOS has notably increased during the self-improving loop.
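The hit rate defined above can be computed per property or for both properties jointly. The sketch below is one possible reading, using the green-emitter thresholds given in this section (525 ± 25 nm and FWHM below 0.125 eV); the function names and the four DFT results are hypothetical.

```python
def wavelength_hit(wavelength_nm, target_nm=525.0, tolerance_nm=25.0):
    """True if the emission peak lies inside the target window."""
    return abs(wavelength_nm - target_nm) <= tolerance_nm

def fwhm_hit(fwhm_ev, max_fwhm_ev=0.125):
    """True if the band is narrower than the FWHM threshold."""
    return fwhm_ev < max_fwhm_ev

def hit_rates(dft_results):
    """Return (wavelength, FWHM, joint) hit rates for (nm, eV) pairs."""
    n = len(dft_results)
    wl = sum(wavelength_hit(w) for w, _ in dft_results) / n
    fw = sum(fwhm_hit(f) for _, f in dft_results) / n
    both = sum(wavelength_hit(w) and fwhm_hit(f) for w, f in dft_results) / n
    return wl, fw, both

# Hypothetical DFT-validated (peak wavelength in nm, FWHM in eV) pairs.
toy_results = [(520.0, 0.10), (560.0, 0.11), (530.0, 0.15), (505.0, 0.12)]
print(hit_rates(toy_results))  # (0.75, 0.75, 0.5)
```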

Fig. 3a and b show the shift in property distributions of the sampled molecules over successive iterations. Here, our target is green light-emitting molecules. The iterative refinement of the model has resulted in an increasing number of sampled molecules that emit within the 525 ± 25 nm range and have a FWHM below 0.125 eV. By the fifth iteration, approximately 70% of the sampled molecules emit colors that align with the target expectation (525 ± 25 nm), and nearly 80% meet the target standard for FWHM values (<0.125 eV). The hit rate for sampling red light-emitting molecules is somewhat lower, and the wavelength distributions are broader when compared to those emitting blue and green colors. This discrepancy is likely due to the smaller number of reported red light-emitting molecules in the initial dataset and the sensitivity to radiation energy in the long-wavelength region. Adapting MEMOS with chemically informed priors, such as enriching fragment libraries with electron-donating and electron-withdrawing substituents or introducing known red-emitting scaffolds such as the RBNN core into the initial seed pool, may improve sampling efficiency in these spectral ranges.
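Because the FWHM threshold is expressed in eV while peak positions and literature linewidths are usually quoted in nm, it can help to convert between the two. The sketch below uses the standard relation E = hc/λ and assumes the band is symmetric about the peak on the energy axis; this conversion is not described in the paper and is offered only as a convenience.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV nm

def fwhm_ev_to_nm(peak_nm, fwhm_ev):
    """Convert an energy-domain FWHM (eV) to a wavelength-domain FWHM (nm).

    Assumes the half-maximum points sit at E0 - FWHM/2 and E0 + FWHM/2,
    where E0 is the photon energy of the emission peak.
    """
    peak_ev = HC_EV_NM / peak_nm
    long_edge_nm = HC_EV_NM / (peak_ev - fwhm_ev / 2.0)
    short_edge_nm = HC_EV_NM / (peak_ev + fwhm_ev / 2.0)
    return long_edge_nm - short_edge_nm

# The green target threshold of 0.125 eV at 525 nm is roughly 28 nm wide.
green_threshold_nm = fwhm_ev_to_nm(525.0, 0.125)
print(round(green_threshold_nm, 1))
```

The same energy width corresponds to a larger width in nm at longer wavelengths, which is worth keeping in mind when comparing red and blue emitters.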


	Fig. 3 (a) Distribution of emission peak wavelengths for sampled molecules across different iterative batches. (b) Distribution of emission FWHM for sampled molecules across different iterative batches. (c) Distribution of sampled molecules from different iterative batches within chemical space. (d) Boxplot of synthetic accessibility scores for sampled molecules across different iterative batches. The upper and lower boundaries of the boxes represent the third and first quartiles, respectively; the whiskers indicate 1.5 times the standard deviation, and the median is denoted by a center line. (e) Distribution of emission peak wavelengths for sampled molecules intended for RGB light emission in the fifth iteration. (f) Distribution of emission FWHM for sampled molecules targeting RGB light emission in the fifth iteration.

The rising hit rate indicates that the proxy model is progressively adapting to the novel chemical space explored by MEMOS. We employ t-SNE³⁵ to visualize the distribution of sampled molecules in chemical space across iterations. As depicted in Fig. 3c, initially, the sampled molecules cluster at the outskirts of the distribution of molecules in the training data. Due to the scarcity of data in this area, the reliability of the predicted properties is relatively low, leading to a low hit rate for MEMOS. As the iterations progress, MEMOS gradually extends its exploration to a wider chemical space. The incorporation of this new data into the dataset enhances the representation of the chemical space, thereby improving the model's generalization capability. Consequently, the predictions become more reliable, and the sampling hit rate steadily increases. After five iterations, almost all newly sampled molecules are situated within the chemical space covered by the dataset. In addition to the increasing hit rate, the MAE on a distinct, non-overlapping dataset is employed to quantitatively assess predictive uncertainty (Fig. S2). The gradual decline in MAE over iterations indicates that the model becomes more capable of handling unseen molecules as more training data are incorporated.

Beyond achieving the desired optical properties, the sampled molecules must also be structurally valid and synthetically accessible to be of practical use. The synthetic accessibility of green light-emitting molecules sampled from different iterations was evaluated using the synthetic accessibility score (SAscore),³⁶ which combines fragment contributions with a structural complexity penalty and is integrated into the RDKit package.³⁷ Lower scores indicate molecules that are easier to synthesize. As shown in Fig. 3d, the SAscore tends to decrease with each iteration. This trend suggests that the structures of the sampled molecules are evolving towards greater rationality and simplicity, with a reduction in the presence of unconventional structural elements. After five iterations, around 50% of the molecules have an SAscore below 3.5, suggesting that these target molecules could be more readily synthesized. For reference, experimentally reported MR emitters have an average SAscore of 3.88 ± 0.88, with values ranging from 2.01 to 6.21. Further evaluation of synthetic accessibility, based on votes from experimental collaborators, along with selected molecular examples, is provided in Fig. S11. Most candidates were assessed as having only moderate synthetic feasibility, often requiring complex routes with low expected yields. To address this, more rigorous synthesis-aware constraints and modifications to molecular operations will be incorporated based on expert feedback in future work.

2.4 Target pool discovery and interpretable design by MEMOS

Using the iteratively adapted proxy models, we have successfully sampled narrowband molecules across the blue, green, and red-light spectra (Fig. 3e and f). During the sampling process, several well-known emitters such as CzBN, CzBO, and γ-Cb-B^5,38,39 were recovered from basic fragments that could not be further divided by deleting or defusing using our designed operations. Fig. 4 illustrates a trajectory that evolves from an indivisible core to the BN-ICZ core, which differs from the experimentally reported BN-ICZ molecule⁴⁰ by the absence of a tert-butyl group. During the sampling process, carbazole units and tert-butyl substituents have been introduced to further modulate the molecular structure and enable the fine-tuning of photophysical properties to meet our specific requirements. Our proxy model predictions show excellent agreement with experimental values, both in terms of wavelength and FWHM.


	Fig. 4 (a) Sampling trajectory of BN-ICZ, along with the corresponding emission wavelength and FWHM predicted by the proxy models for each molecule. (b) DFT calculated and experimental spectra of BN-ICZ, with the experimental spectra reproduced from the literature.⁴⁰

This demonstrates that the vocabulary and actions within MEMOS are adequate for constructing potential high-performance molecules. Structures with a Tanimoto similarity as high as 0.97 to more complex emitters, such as BTC-BNCz⁴¹ with an emission peak at 488 nm and a FWHM of 23 nm, or BNNO⁴² with an emission peak at 637 nm and a FWHM of 32 nm, can also be identified. To fully recover such structures, it is essential to incorporate additional constraints, such as symmetry and reactivity criteria.
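The Tanimoto similarity quoted above is the ratio of shared to total features between two molecular fingerprints. The sketch below computes it on sets of "on" bit indices; which fingerprint type was used in this work is not stated here, so the representation and the toy bit sets are assumptions for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)

# Toy fingerprints given as indices of set bits (illustrative only).
sampled = {1, 2, 3, 4}
reference = {2, 3, 4, 5}
print(tanimoto(sampled, reference))  # 0.6
```

A value of 0.97 therefore means that nearly all fingerprint features are shared, with only a small structural difference such as a missing substituent.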

Employing MEMOS, we can sample thousands of target molecules daily that meet our criteria for color and FWHM, thereby efficiently expanding the array of known MR emitters. The molecules sampled through this process are categorized into several distinct groups, including boron–nitrogen-based, boron–oxygen-based, carbonyl–nitrogen-based, and indolo[3,2,1-jk]carbazole (ICZ)-based types. Each group exhibits a diverse array of unique structures. These structures have considerably expanded the target pool for organic light-emitting materials. Representative structures that emit red, green, and blue colors as identified in the sampling are illustrated in Fig. 5. The quality, uniqueness, and novelty of the structures generated by MEMOS surpass those of other generative models, which often face challenges in producing conjugated structures. Detailed benchmarking results are provided in Table S1. As current editing operations, fragment libraries, and structural constraints are informed by known molecules to ensure chemical validity, most MEMOS-generated structures retain recognizable motifs. Incorporating richer chemical heuristics and expanding the design space could further unlock the potential of AI, enhancing its capacity to discover novel, non-intuitive yet promising candidates.


	Fig. 5 Typical structures of the sampled target molecules for (a) blue, (b) green, and (c) red emissions, with the corresponding emission wavelengths, FWHM, and oscillator strengths listed. The narrowest FWHM in each emission color is indicated by a red underline. Molecules exemplifying the application of the double boron embedded design strategy are highlighted.

To accurately tailor the optical properties of the molecules to meet target specifications, MEMOS automatically incorporates molecular design strategies learned from the dataset during the sampling process, without the need for pre-set manual adjustments. For instance, it implements the double boron embedding strategies,^8,43 which strategically introduce additional boron (B) or nitrogen (N) atoms at the meta/para positions relative to the existing boron atom. This strategic incorporation has effectively modulated the strengths of intramolecular charge transfer and enhanced the MR effect on the central ring, thereby finely adjusting the spectral color and narrowing the emission band. Particularly, compared to molecules without additional B/N incorporation, the spectra of the highlighted molecules in Fig. 5 with double boron embeddings have shifted from red to the desired green emission, while the FWHM has been reduced by nearly 10 nm. The introduction of an additional boron atom enhances the MR effect within the conjugated motif, leading to a more localized electron distribution. This, in turn, reduces electron-vibrational coupling and narrows the spectra (Fig. S8). This demonstrates that MEMOS has successfully learned the underlying patterns from the training data, strategically incorporating heteroatoms to meticulously adjust charge transfer properties, ultimately fine-tuning the optical characteristics to align with our specific requirements.

From the target pool, we have identified three emitters that together produce a broader color gamut. This gamut not only exceeds the coverage of the conventional sRGB color space but also closely aligns with the Rec. 2020 standard.²⁶ The CIE coordinates achieved are (0.137, 0.051) for blue, (0.167, 0.723) for green, and (0.715, 0.285) for red (Fig. 6a and b).


	Fig. 6 (a) Color space constructed using the three target molecules delineated by solid lines, compared to the standard sRGB color space outlined by dashed lines, with the CIE coordinates of the molecules provided. (b) Theoretically calculated emission spectra for the three target molecules, including the emission wavelength and FWHM values.

3. Conclusions

In summary, by combining efficient MCMC sampling with sophisticated proxy models for optical property prediction, MEMOS effectively tackles the inverse design challenge of molecules with narrowband emissions spanning the entire spectrum of wavelengths. This integration significantly speeds up the exploration of chemical space. Moreover, the performance of MEMOS is continually enhanced through a self-improving iterative process, resulting in high success rates of 70% for emission peak wavelengths and 80% for FWHM. Well-documented multiple resonance cores from the experimental literature have been successfully identified, and a broader color gamut has been achieved through the utilization of the newly designed tricolor narrowband emitters.

Building upon the foundation of MEMOS, we are now poised to concurrently optimize other pivotal optical properties, including photoluminescence efficiencies, fluorescence lifetimes, and singlet–triplet energy gaps. Relevant property datasets can be sourced from experiments or scalable DFT calculations. Once property-specific proxy models are trained, their predictions can be integrated into MEMOS's objective function, enabling efficient multi-objective optimization. We believe that MEMOS will play a pivotal role in propelling the advancement of next-generation organic luminescent materials, which are essential for applications in display technologies and various other fields.

4. Methods

4.1 Dataset construction and proxy models training

The initial dataset was constructed by collecting molecules featuring multiple resonance structures and narrow spectral profiles, as reported in the literature. To minimize the impact of extraneous factors on the spectral data, the emission peak wavelength (measured in nm) and FWHM (measured in eV) of the emission spectra in dichloromethane or toluene solutions were used as labels. Furthermore, we expanded the dataset by including molecules from the dataset reported by Park et al.⁴⁴ that exhibited an FWHM below 100 nm in either dichloromethane or toluene solutions. The final dataset consisted of 811 molecules, with 320 of them exhibiting multiple resonance structure characteristics. The Uni-Mol models were fine-tuned using this curated dataset. We trained two separate models to predict each property individually. It is important to note that both properties were normalized to conform to a standard normal distribution. The entire dataset was randomly partitioned into training and validation sets in a 4 : 1 ratio. Before input into Uni-Mol, all molecules were converted into a 3D conformation using the Experimental-Torsion Basic Knowledge Distance Geometry (ETKDG)⁴⁵ algorithm and optimized with the MMFF94 force field⁴⁶ using the RDKit program. During training, we utilized the Adam optimizer and employed mean squared error (MSE) as the loss function. Each model was trained for 100 epochs, and the training process was terminated if the validation MSE did not improve for 10 consecutive epochs. The batch size was set to 32, and the learning rate was fixed at 1 × 10⁻⁴. The remaining training hyperparameters were consistent with the pre-training procedure of Uni-Mol, as detailed in the original paper.³⁰
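The data-handling steps described above (label standardization, a random 4 : 1 split, and early stopping with a patience of 10 epochs) can be sketched in plain Python as follows. This is not the authors' training code: it assumes that normalization means z-scoring each label, the helper names are hypothetical, and the Uni-Mol fine-tuning itself is omitted.

```python
import random
import statistics

def standardize(values):
    """Z-score labels; also return mean and std to undo the scaling later."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std

def split_4_to_1(items, seed=0):
    """Shuffle and split into training and validation sets in a 4 : 1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def epochs_until_stop(val_losses, max_epochs=100, patience=10):
    """Number of epochs run before early stopping ends training."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

train_ids, val_ids = split_4_to_1(range(811))  # dataset size from the text
print(len(train_ids), len(val_ids))            # 648 163

scaled, label_mean, label_std = standardize([450.0, 480.0, 520.0, 610.0])
toy_val_mse = [1.0, 0.5] + [0.6] * 20          # invented loss curve
print(epochs_until_stop(toy_val_mse))          # 12
```

Predictions made in standardized units are mapped back to nm or eV by multiplying by `label_std` and adding `label_mean`.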

4.2 Spectra calculation

During the active learning phase, the DFT-calculated emission spectra of the sampled molecules were integrated into the dataset. The emission spectrum was obtained by computing the thermal vibration correlation function (TVCF) using the MOlecular MAterials Property Prediction Package (MOMAP),⁴⁷ without considering the Herzberg–Teller effect and Duschinsky rotation.

Structural optimizations for both the ground and excited states, along with electronic energy and frequency analyses, were conducted at the B3LYP-D3(BJ)/6-31G* level of theory using the Gaussian 16 package.⁴⁸ Solvent effects were taken into consideration through the Polarizable Continuum Model using the Integral Equation Formalism (IEFPCM), with the solvent's volume and dielectric constant set according to the properties of toluene. To further mitigate the impact of discrepancies between experimental and computational spectra on model training, the calculated emission wavelengths and FWHM were calibrated based on a linear regression relationship prior to model training. The calibration was performed on a set of 37 experimentally reported molecules. The resulting R² value for wavelength fitting reached 0.96, while the R² for FWHM was 0.55. Further details can be found in Fig. S1.
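The linear calibration step can be illustrated with an ordinary least-squares fit. The sketch below assumes the fit maps calculated values onto experimental ones (experimental ≈ slope × calculated + intercept), which is one plausible reading of the text; the four data points are invented and constructed to lie exactly on a line.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Invented calibration pairs: calculated vs. experimental peak wavelength (nm).
calculated = [440.0, 500.0, 560.0, 620.0]
experimental = [0.9 * c + 30.0 for c in calculated]

slope, intercept = fit_line(calculated, experimental)
calibrate = lambda value: slope * value + intercept  # apply to new DFT values
print(round(slope, 3), round(intercept, 3))
print(round(r_squared(calculated, experimental, slope, intercept), 3))
```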

4.3 Molecular sampling

4.3.1 Objective function. To generate valid molecules with desired properties, we developed a multi-objective scoring function defined as:

π(x) = S_rep(x) + S_struct(x) + S_wave(x) + S_FWHM(x)

Here, π(x) represents the objective function of molecule x and also serves as an unnormalized probability distribution over the chemical space from which we aim to sample. This function integrates a molecular representation constraint (S_rep(x)) to ensure that sampled molecules remain close to the training data, along with criteria for molecular structure validity (S_struct(x)), emission peak wavelength (S_wave(x)), and the FWHM of the emission spectrum (S_FWHM(x)). Each objective is formulated as a separate function that outputs a scalar, with higher values indicating better fulfillment of the objective. The overall sampling objective is computed as the sum of these individual functions. The detailed forms of each objective are provided in the SI.
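The additive structure of the objective can be sketched as below. The actual functional form of each term is given in the SI and is not reproduced here: the step functions, the property dictionary keys, and the distance cutoff are all hypothetical placeholders chosen only to show how four scalar terms combine into π(x).

```python
# Hypothetical placeholder terms; the real forms are defined in the SI.
TARGET_NM = 525.0        # green target from the text
WINDOW_NM = 25.0
MAX_FWHM_EV = 0.125
MAX_REP_DISTANCE = 1.0   # invented cutoff for the representation constraint

def s_rep(props):
    """Reward molecules whose representation stays near the training data."""
    return 1.0 if props["rep_distance"] <= MAX_REP_DISTANCE else 0.0

def s_struct(props):
    """Reward structurally valid molecules."""
    return 1.0 if props["valid_structure"] else 0.0

def s_wave(props):
    """Reward predicted emission peaks inside the target window."""
    return 1.0 if abs(props["wavelength_nm"] - TARGET_NM) <= WINDOW_NM else 0.0

def s_fwhm(props):
    """Reward predicted bands narrower than the FWHM threshold."""
    return 1.0 if props["fwhm_ev"] < MAX_FWHM_EV else 0.0

def objective(props):
    """pi(x) = S_rep(x) + S_struct(x) + S_wave(x) + S_FWHM(x)."""
    return s_rep(props) + s_struct(props) + s_wave(props) + s_fwhm(props)

on_target = {"rep_distance": 0.4, "valid_structure": True,
             "wavelength_nm": 530.0, "fwhm_ev": 0.10}
off_color = dict(on_target, wavelength_nm=600.0)
print(objective(on_target), objective(off_color))  # 4.0 3.0
```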

4.3.2 Molecular editing operations. To transform one molecule into another, we considered five molecular editing operations based on the molecular graph, as illustrated in Fig. 1b. Specifically, the adding and fusing operations involve expanding the molecule by attaching a structure from predefined vocabularies. Conversely, the deleting and defusing operations serve as the inverse of adding and fusing, respectively. Finally, the substitution operation converts a valency-permissive aromatic carbon (or nitrogen) to nitrogen (or carbon).

4.3.3 Parameterizing proposal distributions with MPNNs. All proposal distributions for the editing operations were parameterized using Message Passing Neural Networks (MPNNs), a type of Graph Neural Network (GNN) that iteratively updates node features by exchanging messages between neighboring nodes. Specifically, we employed graph convolutions as the message-passing function. The detailed architecture of the MPNNs is displayed in Fig. S6. The MPNNs were trained in a self-adaptive manner to increase the likelihood of generating high-quality proposals, thereby enhancing the efficiency of searching the chemical space. At each time step, proposals that improved the objective score relative to the previous step were collected, forming a dataset D. Subsequently, we trained the MPNNs on D using maximum likelihood estimation, aiming to maximize the probability of proposals that enhance the target score. This was implemented through the cross-entropy loss. The MPNNs were trained for one epoch at each sampling time step, utilizing the Adam optimizer with a learning rate of 3 × 10⁻⁴ and a batch size of 128.
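To make the message-passing idea concrete, the sketch below performs a single graph-convolution round on a toy graph. It assumes mean aggregation over each node and its neighbors, followed by a linear map and a ReLU; the architecture actually used is the one in Fig. S6, and the function name, toy graph, and identity weight matrix are hypothetical. Training (cross-entropy loss, Adam) is omitted.

```python
from typing import Dict, List

Vector = List[float]

def graph_conv_round(
    features: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    weight: List[List[float]],
) -> Dict[int, Vector]:
    """One round: average self and neighbor features, apply weight, then ReLU."""
    updated = {}
    for node, h in features.items():
        # Messages come from the node itself and from each bonded neighbor.
        group = [h] + [features[n] for n in neighbors[node]]
        dim = len(h)
        agg = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
        # Linear transform (agg @ weight) followed by ReLU.
        updated[node] = [
            max(0.0, sum(agg[i] * weight[i][j] for i in range(dim)))
            for j in range(len(weight[0]))
        ]
    return updated

# Toy three-atom chain 0-1-2 with two-dimensional node features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]

hidden = graph_conv_round(features, neighbors, identity)
```

Stacking several such rounds lets each node's feature vector reflect a progressively larger neighborhood of the molecular graph, which is what allows a proposal distribution to depend on local chemical context.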

4.3.4 Molecular sampling process. In each sampling run, 300 trajectories were initialized from randomly selected seed molecules, and each trajectory underwent 1500 editing steps. At each time step, an editing operation was selected at random, and the current molecule x was edited according to the corresponding MPNN-parameterized proposal distribution, yielding a new molecule x′. The new molecule x′ was accepted with a probability of min{1, π^α(x′)/π^α(x)}, where α = 1/0.95^[t/5] and t is the index of the sampling time step. Finally, molecules from the last 200 steps were collected, and the top-100 unique molecules with the highest scores were selected for DFT calculation.
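The annealed acceptance rule can be written out as a short sketch. It assumes that [t/5] denotes the integer part of t/5 and that π is strictly positive; the function names are hypothetical, and the proposal step itself (the MPNN-driven edit) is not modeled.

```python
import random

def alpha(t: int) -> float:
    """Annealing exponent 1 / 0.95**[t/5], with [.] taken as the integer part."""
    return 1.0 / (0.95 ** (t // 5))

def acceptance_probability(pi_old: float, pi_new: float, t: int) -> float:
    """min{1, (pi(x') / pi(x))**alpha}, assuming pi_old > 0."""
    return min(1.0, (pi_new / pi_old) ** alpha(t))

def accept(pi_old: float, pi_new: float, t: int, rng=random) -> bool:
    """Draw a uniform number and accept the edit with the probability above."""
    return rng.random() < acceptance_probability(pi_old, pi_new, t)

# A proposal that lowers the score from 3.0 to 2.0, early and late in a trajectory.
p_early = acceptance_probability(3.0, 2.0, t=0)
p_late = acceptance_probability(3.0, 2.0, t=500)
```

Because α grows with t, a score-lowering edit that is accepted fairly often at the start of a trajectory is almost never accepted later, whereas any edit that raises the score is always accepted. The trajectory therefore shifts from exploration toward refinement, in the manner of simulated annealing.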

Author contributions

Mianzhi Pan: methodology; software; investigation; visualization; writing – original draft; writing – review and editing. Tianhao Tan: methodology; investigation; formal analysis; visualization; writing – original draft; writing – review and editing. Yawen Ouyang: methodology; investigation; software; writing – review and editing. Qian Jin: investigation; data curation; formal analysis. Yougang Chu: software. Wei-Ying Ma: supervision; resources. Jianbing Zhang: supervision; resources. Lian Duan: supervision; resources. Dong Wang: conceptualization; supervision; resources; writing – review and editing. Hao Zhou: conceptualization; supervision; resources; methodology.

Conflicts of interest

The authors declare no competing financial interest.

Data availability

Our collected dataset, the OLED molecules generated in each round, and the source code of this study are available at https://doi.org/10.5281/zenodo.17034464.

Supplementary information, including the calibration of theoretically calculated spectra, molecular vocabulary, benchmarking against other generative models, and additional details, is available. See DOI: https://doi.org/10.1039/d5dd00268k.

Acknowledgements

This research is supported by the National Science and Technology Major Project (Grant No. 2022ZD0117502) and the National Natural Science Foundation of China (Grant No. 22073055 and Grant No. 62406170). This work is also sponsored by the Beijing Nova Program (20240484682). The authors extend their gratitude to Prof. Qiang Shi from the Institute of Chemistry, Chinese Academy of Sciences, and the Center of High Performance Computing at Tsinghua University for providing computational resources.

References

T. Hatakeyama, K. Shiren, K. Nakajima, S. Nomura, S. Nakatsuka, K. Kinoshita, J. Ni, Y. Ono and T. Ikuta, Ultrapure Blue Thermally Activated Delayed Fluorescence Molecules: Efficient HOMO–LUMO Separation by the Multiple Resonance Effect, Adv. Mater., 2016, 28, 2777–2781.
J.-M. Teng, Y.-F. Wang and C.-F. Chen, Recent progress of narrowband TADF emitters and their applications in OLEDs, J. Mater. Chem. C, 2020, 8, 11340–11353.
H. J. Kim and T. Yasuda, Narrowband Emissive Thermally Activated Delayed Fluorescence Materials, Adv. Opt. Mater., 2022, 10, 2201714.
Y. Kondo, K. Yoshiura, S. Kitera, H. Nishi, S. Oda, H. Gotoh, Y. Sasada, M. Yanai and T. Hatakeyama, Narrowband deep-blue organic light-emitting diode featuring an organoboron-based emitter, Nat. Photonics, 2019, 13, 678–682.
Y. Xu, Z. Cheng, Z. Li, B. Liang, J. Wang, J. Wei, Z. Zhang and Y. Wang, Molecular-Structure and Device-Configuration Optimizations toward Highly Efficient Green Electroluminescence with Narrowband Emission and High Color Purity, Adv. Opt. Mater., 2020, 8, 1902142.
Y. Zhang, D. Zhang, J. Wei, Z. Liu, Y. Lu and L. Duan, Multi-Resonance Induced Thermally Activated Delayed Fluorophores for Narrowband Green OLEDs, Angew. Chem., Int. Ed., 2019, 58, 16912–16917.
Y. C. Cheng, X. C. Fan, F. Huang, X. Xiong, J. Yu, K. Wang, C. S. Lee and X. H. Zhang, A Highly Twisted Carbazole-Fused DABNA Derivative as an Orange-Red TADF Emitter for OLEDs with Nearly 40% EQE, Angew. Chem., Int. Ed., 2022, 61, e202212575.
M. Yang, I. S. Park and T. Yasuda, Full-Color, Narrowband, and High Electroluminescence from Boron and Carbazole Embedded Polycyclic Heteroaromatics, J. Am. Chem. Soc., 2020, 142, 19468–19472.
X. C. Fan, F. Huang, H. Wu, H. Wang, Y. C. Cheng, J. Yu, K. Wang and X. H. Zhang, A Quadruple-Borylated Multiple-Resonance Emitter with para/meta Heteroatomic Patterns for Narrowband Orange-Red Emission, Angew. Chem., Int. Ed., 2023, 62, e202305580.
X. Zeng, L. Wang, H. Dai, T. Huang, M. Du, D. Wang, D. Zhang and L. Duan, Orbital Symmetry Engineering in Fused Polycyclic Heteroaromatics toward Extremely Narrowband Green Emissions with an FWHM of 13 nm, Adv. Mater., 2023, 35, e2211316.
S. Curtarolo, G. L. Hart, M. B. Nardelli, N. Mingo, S. Sanvito and O. Levy, The high-throughput highway to computational materials design, Nat. Mater., 2013, 12, 191–201.
K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. Von Lilienfeld, K.-R. Muller and A. Tkatchenko, Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space, J. Phys. Chem. Lett., 2015, 6, 2326–2331.
M. Nakata and T. Shimazaki, PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry, J. Chem. Inf. Model., 2017, 57, 1300–1308.
W. Ye, C. Chen, Z. Wang, I.-H. Chu and S. P. Ong, Deep neural networks for accurate predictions of crystal stability, Nat. Commun., 2018, 9, 3800.
C.-W. Ju, H. Bai, B. Li and R. Liu, Machine Learning Enables Highly Accurate Predictions of Photophysical Properties of Organic Fluorescent Materials: Emission Wavelengths and Quantum Yields, J. Chem. Inf. Model., 2021, 61, 1053–1065.
S. Xu, J. Li, P. Cai, X. Liu, B. Liu and X. Wang, Self-improving photosensitizer discovery system via Bayesian search with first-principle simulations, J. Am. Chem. Soc., 2021, 143, 19769–19777.
J. F. Joung, M. Han, J. Hwang, M. Jeong, D. H. Choi and S. Park, Deep learning optical spectroscopy based on experimental database: potential applications to molecular design, JACS Au, 2021, 1, 427–438.
Z. Cheng, J. Liu, T. Jiang, M. Chen, F. Dai, Z. Gao, G. Ke, Z. Zhao and Q. Ou, Automatic Screen-out of Ir(III) Complex Emitters by Combined Machine Learning and Computational Analysis, Adv. Opt. Mater., 2023, 11, 2370069.
R. Gomez-Bombarelli, J. Aguilera-Iparraguirre and T. Hirzel, et al., Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach, Nat. Mater., 2016, 15, 1120–1127.
F. Strieth-Kalthoff, H. Hao and V. Rathore, et al., Delocalized, asynchronous, closed-loop discovery of organic laser emitters, Science, 2024, 384, eadk9227.
A. Nigam, R. Pollice, P. Friederich and A. Aspuru-Guzik, Artificial design of organic emitters via a genetic algorithm enhanced by a deep neural network, Chem. Sci., 2024, 15, 2618–2639.
W. Cai, C. Zhong, Z.-W. Ma, Z.-Y. Cai, Y. Qiu, Z. Sajid and D.-Y. Wu, Machine-learning-assisted performance improvements for multi-resonance thermally activated delayed fluorescence molecules, Phys. Chem. Chem. Phys., 2024, 26, 144–152.
B. Sanchez-Lengeling and A. Aspuru-Guzik, Inverse molecular design using machine learning: Generative models for matter engineering, Science, 2018, 361, 360–365.
N. Miao, H. Zhou, L. Mou, R. Yan and L. Li, CGMH: Constrained sentence generation by metropolis-hastings sampling, Proc. AAAI Conf. Artif. Intell., 2019, 33, 6834–6842.
Y. Xie, C. Shi, H. Zhou, Y. Yang, W. Zhang, Y. Yu and L. Li, MARS: Markov Molecular Sampling for Multi-objective Drug Discovery, Proc. Int. Conf. Learn. Represent., 2021.
X. Fan, X. Hao, F. Huang, J. Yu, K. Wang and X. Zhang, RGB Thermally Activated Delayed Fluorescence Emitters for Organic Light-Emitting Diodes toward Realizing the BT. 2020 Standard, Adv. Sci., 2023, 10, e2303504.
J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals and G. E. Dahl, Neural message passing for quantum chemistry, Proc. Int. Conf. Mach. Learn., 2017, 1263–1272.
R. Chibante, Simulated annealing: theory with applications, BoD–Books on Demand, 2010.
C. Andrieu, N. De Freitas, A. Doucet and M. I. Jordan, An introduction to MCMC for machine learning, Mach. Learn., 2003, 50, 5–43.
G. Zhou, Z. Gao, Q. Ding, H. Zheng, H. Xu, Z. Wei, L. Zhang and G. Ke, Uni-Mol: a universal 3D molecular representation learning framework, Proc. Int. Conf. Learn. Represent., 2023.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, Attention is all you need, Proc. Adv. Neural Inf. Process. Syst., 2017, 30, 6000–6010.
D. Brookes, H. Park and J. Listgarten, Conditioning by adaptive sampling for robust design, Proc. Int. Conf. Mach. Learn., 2019, 773–782.
S. S. Omee, N. Fu, R. Dong, M. Hu and J. Hu, Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study, npj Comput. Mater., 2024, 10, 144.
C. Fannjiang and J. Listgarten, Autofocused oracles for model-based design, Proc. Adv. Neural Inf. Process. Syst., 2020, 33, 12945–12956.
L. Van der Maaten and G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res., 2008, 9, 2579–2605.
P. Ertl and A. Schuffenhauer, Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions, J. Cheminformatics, 2009, 1, 8.
Open-source cheminformatics, https://www.rdkit.org, online, accessed 11-april-2013.
I. S. Park, H. Min and T. Yasuda, Ultrafast Triplet-Singlet Exciton Interconversion in Narrowband Blue Organoboron Emitters Doped with Heavy Chalcogens, Angew. Chem., Int. Ed., 2022, 61, e202205684.
M. Yang, S. Shikita, H. Min, I. S. Park, H. Shibata, N. Amanokura and T. Yasuda, Wide-Range Color Tuning of Narrowband Emission in Multi-resonance Organoboron Delayed Fluorescence Materials through Rational Imine/Amine Functionalization, Angew. Chem., Int. Ed., 2021, 60, 23142–23147.
Y. Zhang, G. Li, L. Wang, T. Huang, J. Wei, G. Meng, X. Wang, X. Zeng, D. Zhang and L. Duan, Fusion of Multi-Resonance Fragment with Conventional Polycyclic Aromatic Hydrocarbon for Nearly BT. 2020 Green Emission, Angew. Chem., Int. Ed., 2022, 61, e202202380.
D. Li, M. Li, D. Liu, J. Yang, W. Li, Z. Yang, H. Yuan, S. Jiang, X. Peng, G. Yang, W. Xie, W. Qiu, Y. Gan, K. Liu and S. Su, High-Performance Narrowband OLED with Low Efficiency Roll-Off Based on Sulfur-Incorporated Organoboron Emitter, Adv. Opt. Mater., 2023, 11, 2301084.
T. Fan, M. Du, X. Jia, L. Wang, Z. Yin, Y. Shu, Y. Zhang, J. Wei, D. Zhang and L. Duan, High-Efficiency Narrowband Multi-Resonance Emitter Fusing Indolocarbazole Donors for BT. 2020 Red Electroluminescence and Ultralong Operation Lifetime, Adv. Mater., 2023, 35, 2301018.
R. K. Naveen, H. I. Yang and J. T. Kwon, Double boron-embedded multiresonant thermally activated delayed fluorescent materials for organic light-emitting diodes, Commun. Chem., 2022, 5, 149.
J. F. Joung, M. Han, M. Jeong and S. Park, Experimental database of optical properties of organic compounds, Sci. Data, 2020, 7, 295.
S. Riniker and G. A. Landrum, Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation, J. Chem. Inf. Model., 2015, 55, 2562–2574.
T. A. Halgren, Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94, J. Comput. Chem., 1996, 17, 490–519.
Z. Shuai, Thermal Vibration Correlation Function Formalism for Molecular Excited State Decay Rates, Chinese J. Chem., 2020, 38, 1223–1232.
M. J. Frisch et al., Gaussian 16, Revision C.01, Gaussian, Inc., Wallingford CT, 2016.

Footnote

† These authors contributed equally to this work.
