Wei Wang‡ ab, Yueqiao Li‡ ab, Ang Zou ab, Haochen Shi ab, Xiaofeng Huang ab, Yaoyao Li ab, Dong Wei c, Bo Qiao ab, Suling Zhao ab, Zheng Xu ab and Dandan Song* ab
aKey Laboratory of Luminescence and Optical Information, Beijing Jiaotong University, Ministry of Education, Beijing 100044, China. E-mail: ddsong@bjtu.edu.cn
bInstitute of Optoelectronics Technology, Beijing Jiaotong University, Beijing 100044, China
cCollege of Physics and Energy, Fujian Normal University, Fuzhou, 350117, China
First published on 9th February 2022
Quasi-2D perovskites with the general formula L2An−1PbnX3n+1 (L = organic spacer cation, A = small organic or inorganic cation, X = halide ion, and n ≤ 5) are an emerging class of luminescent materials. Their emission color can be readily tuned through their composition and n value. Accurate prediction of the photon energy before experiments is essential but remains impractical on the basis of present studies. Herein, we use machine learning (ML) to explore the quantitative relationship between the photon energies of quasi-2D perovskite materials and their precursor compositions. The random forest (RF) model predicts with high accuracy, giving a root mean square error (RMSE) of ∼0.05 eV on a test set. Feature importance analysis shows that the composition of the A-site cation is a critical factor affecting the photon energy. Moreover, phase impurity is found to greatly lower the photon energy and needs to be minimized. Furthermore, the RF model predicts the compositions of quasi-2D perovskites with photon energies high enough for blue emission. These results highlight the advantage of machine learning in predicting the properties of quasi-2D perovskites before experiments and in providing color tuning directions for experiments.
The photon energy of quasi-2D perovskites can be tuned from deep red to blue by adjusting the halide ions and the n value. The physical origins of this tunability have been well explored in previous work,9–17 which provides general directions for color tuning. However, a large number of trial-and-error experiments are still needed to fabricate perovskites with the desired emission color, which consumes considerable materials and labor. Hence, it is critical to accurately predict the photon energy before carrying out experiments, which will significantly accelerate the development of quasi-2D perovskites with the expected emission colors.
Machine learning (ML) is an approach that can efficiently learn from existing results, and it is gaining increasing attention in material exploration.18–22 With the assistance of ML, researchers can explore a large number of new materials (such as lead-free perovskites),23,24 develop efficient solar cells,19,21,22,27 etc. Previously, using ML algorithms, we successfully predicted the bandgap of 3D lead halide perovskites from their compositions and proposed possible compositions of mixed halide perovskites that can be used in tandem solar cells.24 Marchenko et al. established a database of 2D perovskites; they employed ML algorithms to predict the bandgap of 2D perovskites from their compositions, n values, crystal structures, and so on.25 Subsequent work by Wan et al. used this database and ML algorithms to accurately predict the bandgap of 2D perovskites through molecular graphic descriptors.26 These pioneering studies reveal the power of ML in exploring material and device properties. They also suggest that it is possible to explore the quantitative relationship between the photon energy of quasi-2D perovskites and its governing factors.
Hence, in this work, the ML approach is employed to predict the photon energies of quasi-2D perovskites. To make the predictions more directly usable for experiments, we use the precursor compositions instead of the calculated material compositions as the input features and identify the relationship between the precursor compositions and the resultant photon energies. A dataset was established by collecting reported experimental data, which cover a large range of quasi-2D perovskites, including a variety of large organic cations and small cations. In particular, a series of quasi-2D perovskites with photon energies exceeding 2.58 eV for blue LEDs are predicted, which provides essential guidance for experimental screening.
To build a dataset, we searched for studies that reported the photon energy of quasi-2D perovskites and collected approximately 300 data points. The dataset was then cleaned using the following rules: (1) data points with P2L larger than 2.5 (n value larger than 5) were removed, as they generally reflect the properties of 3D perovskites; (2) data points with missing values for one or more input features were removed; (3) duplicate data points from different literature studies were removed; (4) for data points with the same input features but different photon energies, we retained the data point with the largest reported value. For PEA2Csn−1PbnBr3n+1 with P2L = 2.5, for example, we retained the data point with the highest energy of 2.451 eV. As a result, we obtained 106 data points, listed in Table S1.† These data points include Cl-based, Cl/Br-mixed, and Br-based perovskites with different organic spacer cations and A-site cations (Cs, FA, and MA). The maximum photon energy in the dataset is 2.840 eV, and the minimum is 2.280 eV.
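The four cleaning rules can be sketched as a small filter. This is only an illustration, not the authors' script: the record layout and field names (`cs`, `fa`, `br`, `p2l`, `p2a`, `xlogp3`, `pe`) are hypothetical, the example rows are invented apart from the 2.451 eV value quoted above, and the n > 5 cutoff is assumed to act on the Pb-to-spacer ratio P2L (n = 2 × P2L for the nominal formula).

```python
# Hypothetical record layout: six input features plus the photon energy "pe" (eV).
FEATURES = ("cs", "fa", "br", "p2l", "p2a", "xlogp3")


def clean(records, max_p2l=2.5):
    """Apply the four cleaning rules to a list of dict records."""
    best = {}
    for rec in records:
        # Rule 2: drop points with a missing input feature (or missing target).
        if any(rec.get(k) is None for k in FEATURES + ("pe",)):
            continue
        # Rule 1: drop points outside the quasi-2D range (assumed cutoff on P2L).
        if rec["p2l"] > max_p2l:
            continue
        key = tuple(rec[k] for k in FEATURES)
        # Rules 3 and 4: identical inputs collapse to a single point,
        # keeping the largest reported photon energy.
        if key not in best or rec["pe"] > best[key]["pe"]:
            best[key] = rec
    return list(best.values())


# Invented example rows (only the 2.451 eV value comes from the text).
base = {"cs": 1.0, "fa": 0.0, "br": 1.0, "p2l": 2.5, "p2a": 1.25, "xlogp3": 1.4}
rows = [
    {**base, "pe": 2.451},
    {**base, "pe": 2.430},             # same inputs, lower energy: dropped
    {**base, "p2l": 3.0, "pe": 2.40},  # n > 5: dropped
    {**base, "xlogp3": None, "pe": 2.50},  # missing feature: dropped
]
cleaned = clean(rows)
```

Keying on the full feature tuple handles exact duplicates and conflicting reports in one pass, which is why rules 3 and 4 share a branch.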
First, we use a correlation matrix to examine the linear correlations between the features and the photon energy. As displayed in Fig. 1, the photon energy (abbreviated as PE in Fig. 1) of the perovskites has a non-negligible correlation with the halide anions, the cations, and the ratios of the organic spacer cation and the small cation to Pb2+. As expected, the photon energy shows a positive correlation with the Cl ratio (=Cl/(Br + Cl), abbreviated as Cl) and a negative correlation with the Br ratio (=Br/(Br + Cl), abbreviated as Br). Among the three types of A-site cations (MA, FA, and Cs), the photon energy exhibits a negative correlation with the FA ratio (=FA/(Cs + FA + MA), abbreviated as FA), while it is positively correlated with the Cs ratio (=Cs/(Cs + FA + MA), abbreviated as Cs). The photon energy shows a negative correlation with P2L, as expected. It shows a positive correlation with P2A, implying that a higher concentration of Pb2+ relative to the small cation aids in reaching a higher photon energy.
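One column of such a correlation matrix can be computed as below. This is a minimal sketch using invented toy values rather than the data in Table S1; the function names are illustrative. It also makes explicit why the Cl and Br entries mirror each other: since Br ratio = 1 − Cl ratio, their Pearson coefficients with the photon energy are equal in magnitude and opposite in sign.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_with_target(columns, target):
    """Correlate every named feature column with the target column."""
    return {name: pearson_r(values, target) for name, values in columns.items()}


# Toy values for illustration only (not taken from the dataset).
cl = [0.0, 0.2, 0.4, 0.6]
br = [1.0 - c for c in cl]
pe = [2.40, 2.48, 2.55, 2.70]

corr = correlation_with_target({"Cl": cl, "Br": br}, pe)
```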
To show clearly how the photon energy depends on each factor, the statistics of the photon energy values as a function of the different factors are plotted in Fig. 2.
Four algorithms were used for ML: linear regression (LR), neural network (NN), random forest (RF) and extreme gradient boosting (XGBoost). LR is simple and easy to establish manually, and we previously showed that it was effective in predicting the bandgap of 3D perovskites.24 The input features for the ML algorithms are the Cs/(Cs + FA + MA) ratio, the FA/(Cs + FA + MA) ratio, the Br/(Br + Cl) ratio, the molar ratios of Pb2+ to the organic spacer cation (abbreviated as P2L) and of Pb2+ to the small cation (abbreviated as P2A) in the precursor solution, and the XLogP3 value. The output is the photon energy. 10-fold cross-validation was employed to optimize the hyperparameters. The dataset was randomly divided in a 7:3 ratio into a training set and a test set. The performance of the algorithms is evaluated using the root mean square error (RMSE) and Pearson's coefficient (r value). An algorithm with a lower RMSE and a higher r value predicts with higher accuracy and reliability.
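The splitting and scoring steps can be sketched with the standard library alone. Several details here are assumptions rather than the authors' procedure: the seed, rounding the test-set size to the nearest integer, and running the 10-fold cross-validation on the training portion only.

```python
import math
import random


def train_test_split(n_points, test_ratio=0.3, seed=0):
    """Shuffle indices 0..n_points-1 and return (train, test) index lists."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_test = round(n_points * test_ratio)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]


def k_folds(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def rmse(actual, predicted):
    """Root mean square error between paired values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# 106 data points, as in the cleaned dataset.
train_idx, test_idx = train_test_split(106)
folds = list(k_folds(train_idx))
```

Under these assumptions the 106 points split into 74 training and 32 test points, and each hyperparameter setting would be scored by its average validation RMSE over the ten folds.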
Table 1 shows the performance of the different algorithms in predicting the photon energy. Fig. 3a compares the actual photon energies with the values predicted by the different algorithms. A low RMSE (≤0.07 eV) is achieved by all algorithms, indicating that they predict with high accuracy. Moreover, the r value is higher than 0.70 for all algorithms, which means that the predicted values and the experimental values have a strong linear correlation. Of the four algorithms, RF performs best, with the smallest RMSE on the test set and a high r value, larger than 0.9. We also calculated the average RMSE on the test set based on 10 executions, and the RF model still shows the lowest RMSE. LR is not as accurate as it was in predicting the bandgap of 3D perovskites in our previous work, because the quasi-2D perovskite is more complex and LR is too simple to capture it. This also indicates that the photon energy of the quasi-2D perovskite does not depend linearly on the screened factors. The NN and XGBoost algorithms deliver excellent performance on the training set and relatively poor performance on the test set, implying that their models are over-fitted due to the limited dataset size. The RF model is not over-fitted and shows its advantages in complex systems such as those in this work.
ML algorithm | Training set RMSE (eV) | Training set r value | Test set RMSE (eV) | Test set r value
---|---|---|---|---
LR | 0.065 | 0.80 | 0.066 | 0.76
RF | 0.038 | 0.94 | 0.047 | 0.92
XGBoost | 0.012 | 1.00 | 0.069 | 0.71
NN | 0.013 | 0.99 | 0.059 | 0.88
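The over-fitting argument can be read directly from the table by taking the difference between the test and training RMSE for each algorithm. The short sketch below does only that arithmetic on the tabulated values; the variable names are illustrative.

```python
# RMSE values in eV copied from Table 1: (training set, test set).
table1_rmse = {
    "LR": (0.065, 0.066),
    "RF": (0.038, 0.047),
    "XGBoost": (0.012, 0.069),
    "NN": (0.013, 0.059),
}

# Generalization gap: how much worse each model does on unseen data.
gaps = {name: round(test - train, 3) for name, (train, test) in table1_rmse.items()}

# Algorithm with the lowest test-set RMSE.
best_on_test = min(table1_rmse, key=lambda name: table1_rmse[name][1])
```

The gaps come out at about 0.001 eV for LR, 0.009 eV for RF, 0.046 eV for NN and 0.057 eV for XGBoost, so the two models with the lowest training error are also the ones that generalize worst.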
The accuracy of the algorithms depends strongly on data screening. Fig. 3b depicts the prediction performance on the dataset that includes the outlier with a photon energy of 2.844 eV. The predicted values deviate further from the true values than those shown in Fig. 3a, which are based on the dataset without the outlier. Generally, higher photon energies are underestimated, whereas lower photon energies are overestimated in Fig. 3b. For the LR model, the RMSEs for the training and test sets are 0.068 and 0.067 eV, respectively, which are higher than the values (0.065 and 0.066 eV) listed in Table 1. For the RF algorithm, the RMSEs for the training and test sets are 0.040 and 0.050 eV, respectively, which are also higher than the corresponding values listed in Table 1. These results illustrate the importance of carefully screening the reported experimental dataset.
The RF model can assess the significance of the input features (shown in Fig. 4). The Cs ratio, rather than the P2L or Br ratio, is the most critical factor affecting photon energy. The Br ratio is also important, while P2A and XLogP3 are less important. Considering the extreme importance of the Cs ratio and its positive correlation with photon energy, these results imply that using Cs as the major A-site cation is crucial for obtaining high photon energy. In addition, it also implies that the reported quasi-2D perovskites suffer from phase impurity, as P2L is not as essential to determining photon energy as expected.
As the RF model predicts the photon energy most accurately, we use it to screen quasi-2D perovskites with desired compositions and photon energies. Here, we consider MA-free quasi-2D perovskites, which tend to achieve high material stability. The settings for the precursor compositions are listed in Table S3,† leading to the generation of nearly 40000 data points. These quasi-2D perovskites are Cs-dominated (Cs ratio in the range of 0.75–1.00) and have small n values (P2L in the range of 0.6–1.6). The RF model, trained on the experimental dataset listed in Table S1,† was used to predict their photon energies.
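Generating such a candidate set amounts to taking the Cartesian product of the allowed values of each feature. The sketch below illustrates the idea only: the Cs and P2L ranges follow the text, but every step size, the Br and P2A ranges, and the two XLogP3 values are assumptions, since Table S3 is not reproduced here. It therefore does not reproduce the roughly 40000 points of the actual screening.

```python
from itertools import product


def frange(start, stop, step):
    """Inclusive list of evenly spaced values, rounded to avoid float drift."""
    count = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(count + 1)]


# Axis values: ranges for "cs" and "p2l" follow the text; all else is assumed.
grid_axes = {
    "cs": frange(0.75, 1.00, 0.05),   # Cs/(Cs + FA + MA), assumed step
    "br": frange(0.0, 1.0, 0.1),      # Br/(Br + Cl), assumed range and step
    "p2l": frange(0.6, 1.6, 0.1),     # Pb2+ to spacer cation, assumed step
    "p2a": frange(1.0, 2.0, 0.2),     # Pb2+ to small cation, assumed
    "xlogp3": [1.9, 2.8],             # assumed pair of spacer values
}

# MA-free compositions: the FA ratio is whatever Cs leaves over.
candidates = [
    dict(zip(grid_axes, combo), fa=round(1.0 - combo[0], 10))
    for combo in product(*grid_axes.values())
]
```

Each candidate dictionary would then be converted to a feature vector and passed to the trained model for prediction.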
The predicted photon energies of the perovskites range from 2.478 to 2.628 eV, with the emission peak wavelengths varying from 471 nm to 500 nm. For blue emission, photon energies larger than 2.58 eV (wavelengths shorter than 480 nm) are required. Fig. 5a shows the statistics of the factors that enable the photon energies to meet this requirement. As can be seen, a high Cs ratio (0.90–1.00) is essential to get blue emission. The other factors span the same ranges as in the settings. Moreover, the factors are found to interact with each other. For example, for a Cs ratio of 0.90, the Br ratio should be lower than 0.50 to get blue emission, whereas for a Cs ratio of 1.00, pure Br-based perovskites can also achieve blue emission, which is in accordance with the experimental results.34 Two aspects can explain the importance of the Cs ratio in getting high photon energy: the small ionic radius35 and the low solubility36 of Cs+ in the typically used polar solvents DMSO/DMF. The small radius of Cs+ tilts the PbX6 octahedra, generating the large bandgap as discussed before.35,37 The lower solubility of Cs+ compared to that of FA+ and MA+ (ref. 34) makes it participate in the initial nucleation process together with the large spacer cations of low solubility, which can form moderate n phases. Hence, the phase impurity can be reduced, bringing the photon energies closer to the expected values.
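The energy and wavelength figures quoted above are related by E = hc/λ, with hc ≈ 1239.84 eV nm. A small helper, with illustrative function names, makes the conversion and the blue-emission threshold explicit:

```python
H_C_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm


def ev_to_nm(energy_ev):
    """Convert a photon energy in eV to a wavelength in nm."""
    return H_C_EV_NM / energy_ev


def nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV."""
    return H_C_EV_NM / wavelength_nm


def is_blue(energy_ev, threshold_ev=2.58):
    """Blue-emission criterion used in the text: photon energy above 2.58 eV."""
    return energy_ev > threshold_ev


shortest = ev_to_nm(2.628)   # upper end of the predicted energy range
longest = ev_to_nm(2.478)    # lower end of the predicted energy range
threshold_nm = ev_to_nm(2.58)
```

With this constant, 2.628 eV and 2.478 eV correspond to roughly 471.8 nm and 500.3 nm, and the 2.58 eV threshold to roughly 480.6 nm, consistent with the rounded values in the text.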
To show clearly the effects of the different factors on the photon energy of quasi-2D perovskites, the predicted values are plotted against the factors in Fig. 5b–d. Fig. 5b shows the variation of the photon energy with the Br and Cs ratios, with P2L, P2A and XLogP3 fixed at 1.5, 1.6, and 2.80, respectively. Increasing the Cs ratio from 0.75 to 1.00 leads to an increase in photon energy from 2.50 eV to 2.58 eV. A decrease in the Br ratio (an increase in the Cl ratio) leads to a more complex change in the photon energy. Pure Br-based perovskites have a relatively higher photon energy than Br0.9Cl0.1-based ones; in Br–Cl mixed perovskites, the photon energy first increases with decreasing Br ratio and then displays only a slight change. Although this trend is contrary to theoretical expectation, it is in accordance with the reported experimental results shown in Fig. 2b. The RF model learned the role of the Br ratio from the data in Fig. 2b, i.e., decreasing the Br ratio does not always increase the photon energy. Hence, it assigned a corresponding weight to the Br ratio. In comparison, increasing the Cs ratio was found to be effective in increasing the photon energy, and the RF model assigned a large weight to the Cs ratio. As a consequence, increasing the Cs ratio efficiently increases the predicted photon energy. These findings further demonstrate the RF model's ability to handle complex relationships.
Decreasing P2L generally increases the photon energy, as shown in Fig. 5c, which is to be expected. However, the change is minor: decreasing P2L from 1.5 to 1.0 (corresponding to n decreasing from 3 to 2) leads to an increase of less than 0.02 eV. Although decreasing P2L is thought to be efficient in increasing the photon energy,14,38,39 large n phases with low photon energies still form, and the main emission peak arises from these large n phases. Hence, the RF model found that decreasing P2L does not effectively enhance the photon energy.
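The correspondence between P2L and n used above follows from the nominal formula L2An−1PbnX3n+1, in which n Pb2+ ions are paired with 2 spacer cations and n − 1 small cations. The helpers below (illustrative names) encode only this nominal stoichiometry; as the text stresses, real films contain a distribution of n phases, so these values describe the precursor ratio rather than the phases actually formed.

```python
def n_from_p2l(p2l):
    """Nominal n value for a given Pb2+ to spacer-cation molar ratio (Pb:L = n:2)."""
    return 2.0 * p2l


def p2l_from_n(n):
    """Nominal Pb2+ to spacer-cation molar ratio for a given n value."""
    return n / 2.0


def p2a_from_n(n):
    """Nominal Pb2+ to small-cation molar ratio (Pb:A = n:(n - 1))."""
    if n <= 1:
        raise ValueError("the n = 1 phase contains no small A-site cation")
    return n / (n - 1)


n_values = [n_from_p2l(p2l) for p2l in (1.0, 1.5, 2.5)]
```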
Additionally, modifying XLogP3 also leads to a change in photon energy, as shown in Fig. 5d. Increasing XLogP3 from 1.9 to 2.8 induces the largest increase of 0.04 eV in the photon energy. This is in line with the observations from the experimental results in Fig. 2d. As P2L and other factors are controlled to be the same, this increase can be ascribed to the reduction of large n phases with low photon energy.
Based on these results and the evidence in the literature, it can be summarized that the photon energy of quasi-2D perovskites is largely affected by the formation of large n phases, i.e., the phase impurities. Hence, suppressing phase impurity is critical for getting blue emission. From the correlation between the photon energy and the XLogP3 value, it can be speculated that the solubility of the organic spacer cation in the solvents is probably an important factor that needs to be considered to suppress the phase impurity. From these results, it can also be observed that a high Cs ratio is essential to obtain blue emission.
In the expressions for the RMSE and the r value, Xi, Yi, X̄, Ȳ, and n represent the ith value of the experimental dataset, the ith value of the predicted dataset, the mean value of the experimental dataset, the mean value of the predicted dataset, and the number of data points, respectively. The ratio of the test set is 0.3. To train the ML algorithms, we use 10-fold cross-validation to optimize the hyperparameters, which divides the dataset into 10 parts (90% of the data points for training and 10% for validation) and repeats the training 10 times. The model with the lowest average RMSE on the validation sets was selected for evaluation on the test set and for further use. In detail, the NN model had 3 hidden layers with 8, 6 and 4 neurons, respectively; the number of trees in the RF model was 5000. The maximum depth and the number of rounds in the XGBoost model were 10 and 40, respectively.
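The equations that these symbols belong to did not survive in this text. For reference, the standard definitions of the two metrics, written with the same symbols, are given below; they are assumed to match the forms used by the authors.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2}

r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}
         {\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,
          \sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}}
```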
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d2na00052k
‡ Wei Wang and Yueqiao Li contributed equally to this work.
This journal is © The Royal Society of Chemistry 2022