Predicting Bandgap of ABX2 Materials: Supervised Machine Learning on Small Datasets
Abstract
Predicting the electronic bandgap of ABX2-type materials (where A = {Al, Ga, Zn, Cd}, B = {Ga, In, Ge, Sn, Si}, and X = {N, P, As, Sb}) is essential for photovoltaic applications, but the task becomes significantly more challenging when only a limited dataset is available. Using a machine learning (ML) approach, we classify and predict the electronic bandgaps of a dataset of 99 ABX2-type materials with good accuracy. Logistic regression achieves 97% classification accuracy. For regression, the Sure Independence Screening and Sparsifying Operator (SISSO) method is used for feature selection, followed by various ML models, of which the Least Absolute Shrinkage and Selection Operator (LASSO) yields the best test performance (R² = 0.92, RMSE = 0.30 eV). We also develop an ML model to predict the volume, a structural feature used for bandgap prediction, with R² = 0.99 and RMSE = 8.65 Å³. K-means clustering, used to find underlying patterns in the data, reveals two distinct material families. Notably, when crystallographic phase information is included, the bandgap prediction accuracy improves further, to R² = 0.95 and RMSE = 0.28 eV. Our results therefore show that ML can efficiently replace computationally expensive density functional theory (DFT) methods and can perform well on small, costly datasets, making it suitable for broader and more specialized problem spaces than previously assumed.
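The two regression metrics quoted above can be made concrete with a minimal sketch. It uses the standard definitions of the coefficient of determination and the root-mean-square error. The function names and the five bandgap values are hypothetical, invented purely for illustration. They are not taken from the study's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (eV here)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted bandgaps in eV (illustrative only,
# not values from the paper).
reference = [0.5, 1.0, 1.5, 2.0, 2.5]
predicted = [0.6, 0.9, 1.7, 1.9, 2.4]

print(f"RMSE = {rmse(reference, predicted):.3f} eV")
print(f"R^2  = {r2_score(reference, predicted):.3f}")
```

For these made-up values the sketch gives an RMSE of about 0.13 eV and an R² of about 0.97. Because R² is normalized by the spread of the reference values while RMSE is an absolute error, the two are usually reported together, as in the abstract.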