Open Access Article
Ngo T. Que (a), Vu D. Huan (b), Le T. Duy (b), Vu N. Bao (b), Vu L. Minh (c), Mai X. Trang (d), Anh D. Phan (a, b)†* and Pham T. Huy (b)
(a) Phenikaa Institute for Advanced Study, Phenikaa University, Hanoi 12116, Vietnam. E-mail: anh.phanduc@phenikaa-uni.edu.vn
(b) Faculty of Materials Science and Engineering, Phenikaa School of Engineering, Phenikaa University, Hanoi 12116, Vietnam
(c) Faculty of Science, Engineering and Built Environment, School of Information Technology, Deakin University, Australia
(d) Phenikaa School of Computing, Phenikaa University, Hanoi 12116, Vietnam
First published on 27th February 2026
We present a data-driven approach to predict the excitation wavelength, emission wavelength, and crystal-field energy levels (4T1, 4T2) of Mn4+-doped phosphors based solely on elemental composition. For the first time, we construct the largest and most comprehensive experimental dataset of Mn4+-activated phosphors and use it to train models that accurately predict these properties without relying on complex structural descriptors. Among the evaluated models, the K-Nearest Neighbors and Extra Trees Regressors achieve the highest accuracy for predicting excitation and emission wavelengths, respectively. To evaluate generalization, we test these models on Eu3+-doped systems and obtain high predictive accuracy. We further develop an inverse-design model that suggests candidate phosphor compositions for target optical outputs. By avoiding complex descriptors while preserving accuracy and interpretability, this work provides a foundation for theory-informed discovery of luminescent materials.
Accurately predicting the excitation wavelength, emission peak, and 4T1 and 4T2 energy levels is crucial for optimizing the performance of phosphor materials. It is also fundamental to advancing our theoretical understanding of their luminescent behavior.12,13 These quantities provide insight into the electronic structure and energy-transfer mechanisms that govern how materials interact with light. In particular, the 4T1 and 4T2 energy levels are associated with specific electronic transitions of the dopant ions, which influence both the position and intensity of emission bands. By analyzing these transitions, researchers can infer the local coordination environment, crystal-field strength, and site symmetry of activator ions within the host lattice.14 This information is essential for selecting suitable host materials and dopants to achieve the desired emission and thermal stability.15 Similarly, the emission peak indicates the energy of photons released as excited electrons return to lower energy states. The excitation wavelength represents the energy required to trigger the luminescent process. Knowing these two quantities helps in selecting suitable excitation sources, enhancing color quality, and evaluating the optical efficiency of phosphor materials.16 This understanding facilitates fabrication and application by identifying promising material systems prior to synthesis.
Experimental techniques such as photoluminescence, photoluminescence excitation, time-resolved luminescence, and temperature-dependent emission analyses have been widely used.1,17–19 However, they require costly equipment, demanding sample preparation, specialized environments, and time-consuming procedures. These requirements limit their use in high-throughput or exploratory studies. In contrast, computational methods that apply machine learning (ML) or deep learning (DL) to materials databases provide fast and accurate estimates of key optical parameters from compositional or structural data alone.20,21 These approaches dramatically reduce the time and resources needed to screen and optimize phosphor materials.
Machine learning and deep learning are increasingly applied to the research and design of luminescent materials, particularly phosphors used in LED technologies.22–34 These models can predict properties from a material's composition and crystal structure. The predicted properties include the emission wavelength,23,24,27,30–32 thermal quenching temperature,27,29,31 spectral bandwidth, and quantum yield.31 Several algorithms have shown strong performance in accelerating the discovery and optimization of phosphor materials. These include artificial neural networks (ANN),33 Gradient Boosting Regression,23,26,30,31 and Random Forest.22–25,34 Among the various dopants, europium (Eu)-doped phosphors are the most widely studied with machine learning because a large amount of experimental data is available for them.28–31,33,34 In contrast, other dopant systems remain underexplored because comprehensive, publicly available data are lacking. Another major challenge in this area is that most machine learning models require many input features, typically between 50 and 150.23,26–29,31–34 These features often include detailed atomic-level information such as atomic structure data,29 ionic radii,23,33,34 atomic weights,32,34 and electronegativity values.23,24,27,30,31,34 Furthermore, collecting complete descriptor data for each material is often difficult and time-consuming.
These challenges raise important questions about how to improve the accuracy and usefulness of machine learning models for designing phosphor materials. (1) Can the excitation wavelength, emission peak, and 4T1 and 4T2 energy levels of Mn4+-doped phosphors be accurately predicted using only elemental composition, without relying on experimental properties or complex descriptors? (2) Which machine learning algorithms provide the best predictive accuracy for the excitation and emission properties of Mn4+-doped phosphors? (3) Are models trained solely on Mn4+-doped compositions transferable to other dopant systems with different luminescent behavior? (4) Lastly, can an inverse-design approach be developed to propose candidate phosphor compositions from desired excitation and emission wavelengths? Answering these questions will help create more efficient and generalizable machine-learning tools for discovering and designing new phosphor materials.
In this work, we address these challenges by developing a data-driven approach to predict and design phosphor materials. We first collect experimental data on Mn4+-doped phosphors. We then use these data to train machine learning models that predict the excitation wavelength, the emission peak, and the wavelengths of the 4T1 and 4T2 transitions based solely on chemical composition. To evaluate the generalizability of our approach, we apply the trained models to predict the optical properties of Eu3+-doped phosphors. Once reliable forward-prediction models are established, we construct an inverse-design algorithm that suggests phosphor compositions from target properties.
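The descriptor construction is not reproduced in this section. Here is a minimal sketch that assumes each composition is encoded as a vector of normalized atomic fractions over a fixed element list. That encoding is our assumption, and the helpers `parse_formula` and `featurize` are hypothetical. It shows how a formula string such as LaMg3Sb0.999O7Mn0.001 could be turned into a composition-only input.

```python
import re

# One token = element symbol followed by an optional (possibly fractional) amount.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula: str) -> dict[str, float]:
    """Return element -> amount for a parenthesis-free formula string (hypothetical helper)."""
    counts: dict[str, float] = {}
    for element, amount in ELEMENT_TOKEN.findall(formula):
        # A missing amount means one atom; repeated symbols are summed.
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula: str, elements: list[str]) -> list[float]:
    """Atomic fractions in a fixed element order; absent elements get 0.

    Fractions are taken over all atoms in the formula, so elements missing
    from `elements` still count toward the normalizing total.
    """
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


vocabulary = ["La", "Mg", "Sb", "O", "F", "Mn"]  # hypothetical element list
print(featurize("LaMg3Sb0.999O7Mn0.001", vocabulary))
```

A fixed element order keeps every composition in the same feature space. That requirement lets a single regressor consume all samples.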
:20 ratio, which is commonly used in machine learning studies. In our previous work,21 we examined splitting ratios ranging from 60:40 to 90:10. We found that increasing the proportion of training data generally improves the predictive accuracy, but the improvement becomes marginal when the training fraction increases from 80 to 90%. To validate our models and avoid overfitting, 5-fold cross-validation was used during training. Moreover, we optimized the hyperparameters of each machine learning algorithm using a randomized search approach. All regression models and the GridSearchCV-based hyperparameter tuning were implemented with the scikit-learn library.35,36 This approach scans a broad range of parameter values and selects those that give the best cross-validated predictive accuracy.
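Here is a minimal scikit-learn sketch of this protocol: an 80:20 split, 5-fold cross-validation, and a randomized hyperparameter search. The synthetic data, the choice of `ExtraTreesRegressor`, and the parameter ranges are illustrative assumptions, not the settings tuned in this work. Replacing `RandomizedSearchCV` with `GridSearchCV` (and `param_distributions` with `param_grid`) gives the exhaustive-grid variant named in the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 compositions x 10 atomic-fraction features.
X = rng.dirichlet(np.ones(10), size=200)
y = 620.0 + 120.0 * X[:, 0] + rng.normal(0.0, 2.0, size=200)  # toy emission peak (nm)

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative search space (not the values used in this work).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                           # number of sampled parameter settings
    cv=5,                               # 5-fold cross-validation on the training set
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("held-out R2:", search.best_estimator_.score(X_test, y_test))
```

The held-out 20% is never seen by the search. The reported test score is therefore separate from the cross-validation used to pick hyperparameters.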
The model performance was evaluated using the coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE), defined as
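$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{3}$$

where $y_i$ and $\hat{y}_i$ are the experimental and predicted values for sample $i$, $\bar{y}$ is the mean of the experimental values, and $n$ is the number of samples.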
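Eqs. (1)–(3) translate directly into code. The plain-Python sketch below is dependency-free and mirrors each formula term by term. The function names and the four sample values are hypothetical.

```python
import math


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (1)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (2)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


measured = [620.0, 640.0, 660.0, 680.0]    # hypothetical emission peaks (nm)
predicted = [622.0, 638.0, 661.0, 676.0]   # hypothetical model output (nm)
print(r2(measured, predicted), rmse(measured, predicted), mae(measured, predicted))
```

RMSE squares each residual before averaging, so it always satisfies RMSE ≥ MAE. The gap between the two grows when a few large errors (outliers) dominate.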
| Size of data | Type of data | Type of doping | Predicted property | DL/ML model | R2 | RMSE | MAE | Reference |
|---|---|---|---|---|---|---|---|---|
| 39 | Experiment | Mn4+ | 2E energy (cm−1) (lowest energy excited state) | Linear regression | 0.95 | 149.99 | 89.33 | 22 |
| | | | | Robust regression | 0.94 | 153.73 | 95.68 | |
| | | | | Lasso regression | 0.95 | 149.86 | 91.05 | |
| | | | | Ridge regression | 0.93 | 168.07 | 133.81 | |
| | | | | ElasticNet | 0.66 | 383.54 | 281.9 | |
| | | | | DT | 0.31 | 541.56 | 401.17 | |
| | | | | RF | 0.72 | 348.04 | 249.26 | |
| 116 | Experiment | Mn4+ | Emission peak (nm) | XGB | 0.71 | 14.25 | 9.88 | 23 |
| | | | | RF | 0.80 | 16.65 | 10.77 | |
| | | | | Lasso regression | 0.64 | 17.09 | 11.37 | |
| | | | | Ridge regression | 0.69 | 18.93 | 12.82 | |
| | | | | KNN | 0.85 | 13.08 | 8.13 | |
| | | | | SVR | 0.81 | 13.6 | 9.39 | |
| 33 | Experiment | Mn4+ | Emission peak (nm) | RF | 0.87 | 0.7 | 24 | |
| 65 | Experiment | Mn4+ | Lifetime (ms) | RF | 0.432 | 25 | ||
| 2832 | DFT | Ce3+ | Relative permittivity (ϵr) (eV) | XGB | 0.93 | 0.65 | 26 | |
| 219 | Experiment | Ce3+ | Centroid shift (eV) | XGB | 0.90 | 0.18 | ||
| 76 | Experiment | Ce3+ | Emission peak (nm) | Kernel Ridge | 0.79 | 12.64 | 27 | |
| Thermal quenching (K) | 0.64 | 37 | ||||||
| 2610 | DFT | Eu2+ and Ce3+ | Debye temperature (K) | SVR | 0.89 | 59.9 | 37.9 | 28 |
| 269 | Experiment | Eu3+ | Thermal quenching (K) | SVR | 0.71 | 31 | 29 | |
| 129 | Experiment | Eu2+ | Emission peak (nm) | XGB | 0.78 | 42 | 30 | |
| 1665 | Experiment | Eu2+ and Eu3+ | Emission peak (nm) | XGB | 0.866 | 11.2 | 31 | |
| 877 | 1st excitation max (nm) | 0.775 | 8.83 | |||||
| 951 | Decay time (ns) | 0.987 | 0.09 | |||||
| 1252 | CIE X coordinate | 0.937 | 0.02 | |||||
| 1252 | CIE Y coordinate | 0.814 | 0.02 | |||||
| 183 | Thermal quenching (K) | 0.574 | 44.61 | |||||
| 555 | Internal quantum efficiency | 0.674 | 9.8 | |||||
| 56 | External quantum efficiency | 0.675 | 8.48 | |||||
| 186 | Experiment | Cr3+ | Emission peak (nm) | SVR | 0.821 | 8.761 | 32 | |
| KNN | 0.85 | 9.125 | ||||||
| 95 | Experiment | Eu2+ | Excitation wavelength (nm) | CBP | 0.999 | 1.68 | 33 | |
| Multiple linear | 0.999 | 1.74 | ||||||
| ANN | 0.9999 | 1.83 | ||||||
| 296 | Experiment | Eu3+ | Asymmetry ratio (Λ) | RF | 0.90 | 1.03 | 0.77 | 34 |
Compared with previous work on Eu-doped phosphors (ref. 31), in which R2 values typically remain below 0.8, our models achieve higher R2 values. In contrast, ref. 33 reported R2 values close to 1. There, the dataset is small, less noisy, and based on simple luminescent materials whose features are strongly related to the predicted property. Moreover, that evaluation was performed mainly on the training set, with a small test set and without rigorous cross-validation. In our case, the Mn4+ dataset is much larger and chemically more diverse. As a result, some residual scatter and a limited number of outliers are unavoidable in a minimal-input model. These outliers indicate that factors beyond chemical composition also influence the excitation behavior. Such factors include local structure, defects, and experimental uncertainties. We therefore view the present results as a realistic baseline for composition-only predictions. They also serve as a starting point for future models that incorporate more detailed structural descriptors.
Fig. 3 compares the emission-wavelength prediction accuracy of six machine learning models for Mn4+-doped phosphors. All models show high accuracy, with R2 values between 0.94 and 0.98, MAE values from 1.24 to 5.14 nm, and RMSE values from 4.01 to 7.27 nm. Among them, the Extra Trees Regressor performs best, with the highest R2 (0.98), the lowest MAE (1.24 nm), and an RMSE of 4.37 nm. The K-Nearest Neighbors, Random Forest, and Support Vector Regression models also provide good predictions, with R2 values above 0.96. Our models achieve better accuracy than the emission-peak predictions in earlier studies23,24,27,30–32 (Table 1). We attribute this improvement to the larger and more comprehensive dataset and to the use of advanced algorithms. The emission maxima of Mn4+-doped phosphors typically fall within two ranges: red (620–640 nm) and deep-red/far-red (650–740 nm). The Extra Trees Regressor is the most accurate in the red region, while the Support Vector Regressor performs slightly better for deep-red and far-red emissions. These results suggest that different algorithms capture distinct composition–property relationships. They also suggest that ensemble-based methods, particularly Extra Trees, are highly effective for modeling emission behavior.
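A region-resolved error makes the comparison between the red and deep-red/far-red ranges concrete. The sketch below assigns each test sample to a region by its measured peak and reports the MAE per region. It uses the ranges quoted above. The binning rule, the function name `region_mae`, and the sample values are our assumptions. Samples between 640 and 650 nm fall in neither range and are ignored.

```python
def region_mae(y_true, y_pred, low, high):
    """MAE over samples whose measured peak lies in [low, high] nm (hypothetical helper)."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    return sum(errors) / len(errors) if errors else float("nan")


# Spectral ranges quoted in the text (nm).
REGIONS = {"red": (620.0, 640.0), "deep/far-red": (650.0, 740.0)}

measured = [625.0, 635.0, 660.0, 700.0]    # hypothetical test-set peaks (nm)
predicted = [626.0, 633.0, 665.0, 690.0]   # hypothetical model output (nm)
for name, (low, high) in REGIONS.items():
    print(name, region_mae(measured, predicted, low, high))
```

Comparing several models per region, rather than on one pooled score, is what reveals that one model can lead in the red range while another leads in the deep-red range.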
To gain chemical insight into these predictions, we analyze the feature importance of the Extra Trees model for emission prediction (Fig. S4 in the SI). The analysis shows that fluorine (F), oxygen (O), lanthanum (La), and aluminum (Al) have the highest importance scores, while the remaining elements contribute more weakly. This trend is consistent with physical expectations. The F and O anions define the local anion environment around Mn4+. They therefore strongly influence the crystal-field strength, covalency, and nephelauxetic effect.37 La and Al are common host cations that control the local coordination geometry and lattice rigidity.38,39 By contrast, many other cations mainly play secondary structural or charge-balancing roles. As a result, they contribute less independent information to the model and receive lower feature-importance scores.
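This section does not state which importance measure underlies Fig. S4. scikit-learn's `ExtraTreesRegressor` exposes impurity-based importances through its `feature_importances_` attribute. As a model-agnostic alternative, the sketch below swaps in permutation importance: it measures how much the error grows when one feature column is shuffled. The toy model, in which only feature 0 (say, an F fraction) matters, and the data are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean increase in `metric` (an error; lower is better) when one column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy model: the peak depends on feature 0 only.
def toy_predict(rows):
    return [620.0 + 100.0 * row[0] for row in rows]


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = toy_predict(X)
print(permutation_importance(toy_predict, X, y, mae))
```

Impurity-based scores can favor features with many distinct values. Permutation scores instead reflect how much the predictions actually depend on a feature, which is why the two can rank features differently.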
Fig. 4 shows the predictive performance of six machine learning models for the 4T1 energy level of Mn4+-doped phosphors. The Decision Tree Regressor achieves the highest accuracy, with an R2 of 0.82, an MAE of 5.44 nm, and an RMSE of 13.04 nm. The remaining models (Extra Trees, Support Vector Regression, K-Nearest Neighbors, Gradient Boosting Regressor, and Random Forest) perform slightly worse, with R2 values from 0.77 to 0.81. Compared with the emission-peak prediction, the accuracy for 4T1 is clearly reduced. This difference arises mainly from the way 4T1 energies are determined and from their stronger dependence on local structure. The 4A2 → 4T1 transition can spectrally overlap with other electronic transitions, such as the charge-transfer band and the 4A2 → 2T2 transition. This overlap makes the experimental determination of 4T1 energies less precise and introduces uncertainties into the training dataset. In addition, the 4T1 energy is highly sensitive to local coordination geometry, crystal-field distortions, and covalency. Our composition-only descriptors do not fully capture these local effects.
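The 4T1 and 4T2 errors here are quoted in nm, while Table 1 lists the Mn4+ 2E energy in cm−1. A small unit-conversion helper may therefore be useful when comparing the two. This is a generic sketch, and the function names are hypothetical. Because photon energy scales as 1/λ, the same error in nm corresponds to a larger energy error at shorter wavelengths.

```python
HC_EV_NM = 1239.84198  # Planck constant x speed of light, in eV*nm


def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Photon energy in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def nm_to_ev(wavelength_nm: float) -> float:
    """Photon energy in eV."""
    return HC_EV_NM / wavelength_nm


def wavenumber_shift(wavelength_nm: float, delta_nm: float) -> float:
    """Energy difference (cm^-1) between wavelength_nm and wavelength_nm + delta_nm."""
    return nm_to_wavenumber(wavelength_nm) - nm_to_wavenumber(wavelength_nm + delta_nm)


# Illustrative wavelengths only: a 5 nm error at 350 nm vs. at 470 nm.
print(wavenumber_shift(350.0, 5.0), wavenumber_shift(470.0, 5.0))
```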
Fig. 5 shows the predictive performance of the six machine learning models for the 4T2 energy level. In contrast to the 4T1 results, the K-Nearest Neighbors Regressor outperforms the other models, with an R2 of 0.86, an MAE of 4.12 nm, and an RMSE of 13.96 nm. The Decision Tree Regressor gave the best results for 4T1 but shows markedly lower accuracy for the 4T2 level, with an R2 of 0.75, an MAE of 5.22 nm, and an RMSE of 18.76 nm. The remaining models show intermediate performance, with R2 values from 0.81 to 0.83. These findings suggest that different electronic transitions have distinct relationships with compositional features. They also suggest that some models, here K-Nearest Neighbors, may be particularly effective at modeling the 4T2 energy level.
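K-Nearest Neighbors predicts a property by averaging the values of the most similar compositions in the training set. That makes it a natural fit for composition-only descriptors, although this is a plausible interpretation rather than a demonstrated cause. The from-scratch sketch below uses uniform weights and Euclidean distance on atomic-fraction vectors. These settings, the value of `k`, the helper name `knn_predict`, and the toy data are assumptions, not the tuned configuration behind Fig. 5.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average target of the k training compositions closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical two-element atomic fractions and 4T2 wavelengths (nm).
train_X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [470.0, 474.0, 500.0, 504.0]
print(knn_predict(train_X, train_y, [0.95, 0.05], k=2))
```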
To further examine the generalization capability of the emission model, we applied the trained Extra Trees regressor to an independent set of Mn4+-doped phosphors that were used in neither training nor testing. Specifically, we considered all Mn4+-activated compositions in very recent works40–47 that (i) exhibit a dominant red emission band and (ii) contain only elements represented in our descriptor space. As shown in Table 2, the absolute differences between predicted and experimental wavelengths range from 1.1 to 26.2 nm. The mean deviation is approximately 10.7 nm and the RMSE is about 13.2 nm. These errors are larger than those on the internal test set, as expected for an external validation set of newly reported materials. Nevertheless, they indicate that the model can provide reasonably accurate first-order estimates of emission peaks for previously unseen Mn4+-doped phosphors. The current model is therefore best viewed as a screening tool for identifying promising candidate compositions in the desired spectral range, rather than as an exact line-position predictor.
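Selection criterion (ii) amounts to a vocabulary check on each candidate formula. The sketch below is a minimal version that assumes parenthesis-free formula strings. The helper names and the example vocabulary are hypothetical, since the actual descriptor element list is not given in this section.

```python
import re

ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def elements_in(formula: str) -> set[str]:
    """Element symbols appearing in a parenthesis-free formula."""
    return {element for element, _ in ELEMENT_TOKEN.findall(formula)}


def in_descriptor_space(formula: str, vocabulary: set[str]) -> bool:
    """True if every element of `formula` is one the model was trained on."""
    return elements_in(formula) <= vocabulary


vocabulary = {"La", "Mg", "Sb", "O", "Mn", "Ca", "Zn"}  # hypothetical descriptor elements
candidates = ["LaMg3Sb0.999O7Mn0.001", "CsNaWO2F4Mn0.01", "Zn1.99Mn0.01La3Sb3O14"]
print([f for f in candidates if in_descriptor_space(f, vocabulary)])
```

A composition containing an element absent from the training data has no corresponding input feature. Such samples would silently lose information, which is why they are excluded before validation.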
| Formula | Actual (nm) | Predicted (nm) | Ref. |
|---|---|---|---|
| LaMg3Sb0.999O7Mn0.001 | 695 | 696.81 | 40 |
| CaYMgNb0.997O6Mn0.003 | 688 | 691.82 | 41 |
| CaAl2Si1.992O8Mn0.1 | 680 | 690.9 | 42 |
| Mg28Ge6.4Sn1.1O32F15.04Mn0.05 | 659 | 638.37 | 43 |
| Ca0.8Na0.6Gd0.6MgWO6Mn0.0005 | 685 | 698.2 | 44 |
| La3Ga5Si0.9998O14Mn0.0001 | 713 | 686.8 | 45 |
| CsNaWO2F4Mn0.01 | 631 | 622.55 | 46 |
| Ca1.99Mn0.01La3Sb3O14 | 709 | 702.6 | 47 |
| Zn1.99Mn0.01La3Sb3O14 | 690 | 704.75 | 47 |
| Mg1.99Mn0.01La3Sb3O14 | 705 | 703.87 | 47 |
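
The summary statistics quoted for the external Mn4+ validation can be recomputed directly from the Table 2 entries. The short check below reproduces the 1.1–26.2 nm range, the ≈10.7 nm mean deviation, and the ≈13.2 nm RMSE. The values are copied from the table, and the script itself is only a verification aid.

```python
import math

# (actual, predicted) emission peaks in nm, copied from Table 2.
table2 = [
    (695, 696.81), (688, 691.82), (680, 690.9), (659, 638.37), (685, 698.2),
    (713, 686.8), (631, 622.55), (709, 702.6), (690, 704.75), (705, 703.87),
]

deviations = [abs(actual - predicted) for actual, predicted in table2]
mean_abs_dev = sum(deviations) / len(deviations)
rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))

print(f"range {min(deviations):.1f}-{max(deviations):.1f} nm, "
      f"mean {mean_abs_dev:.1f} nm, RMSE {rmse:.1f} nm")
# → range 1.1-26.2 nm, mean 10.7 nm, RMSE 13.2 nm
```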
After training the machine learning models on Mn4+-doped phosphor data, we evaluate their transferability by applying them to a dataset of Eu-doped phosphors. The experimental dataset was taken from a recent work by Jang et al.,31 and the results are presented in Fig. 6. As shown in Fig. 6a, the Extra Trees Regressor predicts the emission peaks with relatively high accuracy, reaching R2 = 0.89, MAE = 7.6 nm, and RMSE = 20.58 nm. Notably, the model was trained only on Mn4+-doped phosphors, which emit in the 620–740 nm range. It nevertheless provides reasonably accurate predictions over a broader spectral range of 360–780 nm. In contrast, Fig. 6b presents the excitation-wavelength prediction using the Gradient Boosting Regressor, which obtains an R2 of 0.7, an MAE of 15.88 nm, and an RMSE of 30.03 nm. These results indicate that cross-dopant excitation predictions are less accurate than emission predictions.
Jang et al.31 reported R2 = 0.866 for emission-peak prediction and R2 = 0.775 for prediction of the first excitation peak. Our model performs competitively with or better than these results. Jang et al.'s excitation model was trained only on the first excitation peak, whereas ours was trained on a broader dataset. For a fair comparison, we also retrained our model using only first-excitation-peak data and obtained R2 = 0.88. Details of this analysis are provided in the SI. Beyond the validation on Eu-doped phosphors, our approach also performs well on the Mn4+-doped dataset, where the best emission model reaches R2 = 0.98. This is a substantial improvement over previous models, which reported R2 values between 0.78 and 0.87.23,24,27,30–32
To further validate the generalizability of our model, we compare its predictions with experimental data from several recent studies on Eu3+-doped phosphors published in 2025. Table 3 compares predicted and experimental emission peaks for a series of compositions not included in training or testing. Across these 13 samples, the mean absolute deviation between predicted and experimental values is about 15 nm. The Extra Trees model therefore retains reasonable predictive accuracy for Eu3+-doped systems, even though it was trained exclusively on Mn4+-doped data. These very recent experimental data provide an independent check on model performance. They indicate that the proposed framework can be applied to newly reported luminescent materials that were not part of the original training set.
| Formula | Actual (nm) | Predicted (nm) | Ref. |
|---|---|---|---|
| Sr3CaNb1.994O9Eu0.06 | 613 | 613.3 | 48 |
| Ca2MgWO6Eu0.01Eu0.02 | 616 | 613.64 | 49 |
| Y4Al2O9Eu0.05 | 611 | 595.45 | 50 |
| La2LiNbO6Eu0.2 | 613 | 614.3 | 51 |
| LiZnPO4Eu0.03 | 594 | 572.57 | 52 |
| LiSnPO4Eu0.03 | 616 | 556.17 | 52 |
| Sr4La6Si6O24Cl2Eu0.1 | 614 | 579.35 | 53 |
| LaNb2VO9Eu0.003 | 618 | 614.95 | 54 |
| SrLaZnNbO6Eu0.11 | 618 | 613.5 | 55 |
| Ca2LaNbO6Eu0.01 | 615 | 614.15 | 56 |
| Ca3Zr2SiGa2O12Eu0.1 | 610 | 572.15 | 57 |
| Na2ZrO3Eu0.002 | 613 | 602.57 | 58 |
| Sr3La2W2O12Eu0.09 | 616 | 608.82 | 59 |
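
The Eu3+ summary can likewise be checked against Table 3. The short script below computes the mean absolute deviation over the 13 compositions and counts how many fall within 10 nm. The per-sample deviations show that a few compositions, such as LiSnPO4Eu0.03, dominate the mean. The values are copied from the table.

```python
# (actual, predicted) Eu3+ emission peaks in nm, copied from Table 3.
table3 = [
    (613, 613.3), (616, 613.64), (611, 595.45), (613, 614.3), (594, 572.57),
    (616, 556.17), (614, 579.35), (618, 614.95), (618, 613.5), (615, 614.15),
    (610, 572.15), (613, 602.57), (616, 608.82),
]

deviations = [abs(actual - predicted) for actual, predicted in table3]
mean_abs_dev = sum(deviations) / len(deviations)
within_10nm = sum(d <= 10 for d in deviations)

print(f"{len(table3)} samples, mean |deviation| {mean_abs_dev:.1f} nm, "
      f"{within_10nm} within 10 nm")
# → 13 samples, mean |deviation| 15.3 nm, 7 within 10 nm
```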
Our inverse-design approach is then applied to two datasets, one for Mn4+-doped phosphors and one for Eu-doped phosphors. On the Mn4+ dataset, the model correctly predicts the compositions of 265 out of 347 test samples. On the Eu dataset, it correctly identifies 144 out of 333 compositions in the test set. Because the Extra Trees regressor is an unconstrained continuous model, its outputs for the atomic fractions are real numbers. These outputs are not mathematically forced to satisfy compositional constraints. In principle, the predicted fractions may sum to slightly more or less than 100% or even become negative, although we do not observe negative fractions in our calculations. To obtain chemically meaningful formulas, we therefore discard any predicted composition whose total atomic fraction deviates from 100%. This screening is applied only to the model outputs in the inverse-design stage. All input compositions in the training and test sets are taken directly from experiment and are already physically valid. Under this constraint, about 96% of the suggested compositions remain valid. This indicates that our inverse-design scheme can propose phosphor compositions from desired optical targets. Rather than serving as a purely generative model, it provides a practical tool to rapidly screen and suggest new phosphor candidates. It thereby supports experimental synthesis and reduces the time and resources required for materials discovery. The compositions generated by the inverse-design model are listed in SII for Mn4+-doped phosphors and SIII for Eu-doped phosphors in the SI.
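The inverse step can be pictured as a multi-output regression from optical targets to atomic fractions. The sketch below illustrates that idea with scikit-learn's `ExtraTreesRegressor`, which accepts a two-dimensional target array. The five-element composition space, the toy forward relations, and the query targets are hypothetical. This is not the trained inverse model used in this work.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)
# Hypothetical forward data: 300 compositions over 5 elements (fractions sum to 1).
compositions = rng.dirichlet(np.ones(5), size=300)
excitation = 300.0 + 150.0 * compositions[:, 1] + rng.normal(0.0, 1.0, 300)  # toy (nm)
emission = 620.0 + 100.0 * compositions[:, 0] + rng.normal(0.0, 1.0, 300)    # toy (nm)
targets = np.column_stack([excitation, emission])

# Inverse model: (excitation, emission) -> atomic fractions, as multi-output regression.
# Each leaf stores the mean of the training fractions that reach it.
inverse_model = ExtraTreesRegressor(n_estimators=200, random_state=0)
inverse_model.fit(targets, compositions)

proposal = inverse_model.predict(np.array([[340.0, 660.0]]))[0]
print("suggested fractions:", np.round(proposal, 3), "sum:", round(float(proposal.sum()), 3))
```

The raw `proposal` would then go through the composition screening described in the text before it is read as a formula.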
Our inverse-design calculations are carried out purely in composition space under simple chemical constraints. All atomic fractions are non-negative and renormalized to sum to 100%, and the Mn or Eu dopant content is limited to the experimental range. Because it operates only at the composition level, the present reverse-engineering scheme is more limited than structure-aware inverse-design approaches that explicitly optimize lattice or microstructural degrees of freedom. However, such structure-resolved methods require reliable crystal-structure models and costly atomistic calculations. This makes their systematic application to thousands of candidate phosphors challenging. Our inverse-design model is therefore intended as a fast-screening tool that can guide subsequent structure-based simulations or experimental validation.
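The composition-level constraints above can be expressed as a small post-processing filter. This sketch follows the stated rules: non-negative fractions, renormalization to 100%, and dopant content limited to the experimental range. No tolerance is specified here for deciding when a total "deviates from 100%". The function name, the dictionary representation, the tolerance `tol`, and the example dopant range are therefore our assumptions.

```python
def screen_composition(fractions, dopant, dopant_range, tol=0.02):
    """Return a renormalized composition, or None if it fails the checks.

    fractions    -- dict element -> predicted atomic fraction (raw model output)
    dopant       -- e.g. "Mn" or "Eu"
    dopant_range -- (min, max) dopant fraction seen in the experimental data
    tol          -- hypothetical tolerance on |sum - 1| before rejection
    """
    if any(value < 0 for value in fractions.values()):
        return None                                  # negative fractions are unphysical
    total = sum(fractions.values())
    if abs(total - 1.0) > tol:
        return None                                  # total too far from 100%
    normalized = {el: value / total for el, value in fractions.items()}
    low, high = dopant_range
    if not low <= normalized.get(dopant, 0.0) <= high:
        return None                                  # dopant outside experimental range
    return normalized


mn_range = (0.0001, 0.01)  # hypothetical experimental Mn fraction range
raw = {"La": 0.1, "Mg": 0.25, "Sb": 0.08, "O": 0.57, "Mn": 0.001}
print(screen_composition(raw, "Mn", mn_range))
```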
This work directly addresses the research questions raised in the Introduction. We showed that the excitation wavelength, emission peak, and 4T1 and 4T2 transition wavelengths can be accurately predicted from elemental composition alone, without experimental properties or complex descriptors. We further showed that models trained solely on Mn4+-doped phosphors generalize to Eu3+-doped systems, highlighting their transferability across dopant types. Additionally, our models can be optimized for specific spectral regions and integrated into an inverse-design scheme to propose candidate compositions that meet desired optical targets. Compared with previous studies, our approach achieves higher predictive accuracy while requiring only simple compositional input. Our study thus provides a minimal-input, data-driven approach for accelerating the discovery and design of high-performance phosphors. It also expands the search space for next-generation luminescent materials.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6ra00029k.
Footnote
† Present address: Center for Materials Innovation and Technology, Vin University, Hanoi, Vietnam.
This journal is © The Royal Society of Chemistry 2026