Predicting the bandgap of ABX2 materials: supervised machine learning on small dataset
Abstract
Predicting the electronic bandgap of ABX2-type materials (where A = {Al, Ga, Zn, Cd}, B = {Ga, In, Ge, Sn, Si}, and X = {N, P, As, Sb}) is essential for photovoltaic applications, but the task becomes significantly more challenging when only a limited dataset is available. Using a machine learning (ML) approach, we classify and predict the electronic bandgaps of a 99-entry dataset of ABX2-type materials with good accuracy. For classification, logistic regression achieves 97% accuracy. For regression, the sure independence screening and sparsifying operator (SISSO) method is used for feature selection, followed by various ML models, among which the least absolute shrinkage and selection operator (LASSO) yields the best test R² (0.92) and RMSE (0.30 eV). We also develop an ML model to predict the volume, a structural feature used for bandgap prediction, which achieves R² = 0.99 and RMSE = 6.86 Å³. K-means clustering is used to find underlying patterns in the data and reveals two distinct material families. Notably, when crystallographic phase information is included, the bandgap prediction improves further to R² = 0.95 and RMSE = 0.28 eV. Our results therefore show that ML can efficiently replace computationally expensive DFT methods and can also perform well on small datasets that are costly to generate, making it suitable for broader and more specialized problem spaces than previously assumed.
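To make the regression step and its two reported metrics concrete, the following is a minimal sketch of LASSO fitted by coordinate descent, together with R² and RMSE. It is not the authors' implementation: the function names (`lasso_fit`, `r2_score`, `rmse`), the penalty `alpha = 0.1`, and the six-row toy dataset are all assumptions made for illustration, and the toy data have no connection to the ABX2 dataset or to SISSO-selected features. The sketch minimises (1/2n)·Σ(y − Xw − b)² + alpha·Σ|w|, which is one common form of the LASSO objective.

```python
import math

def soft_threshold(rho, alpha):
    # Shrinks rho toward zero by alpha; this is what sets weak coefficients to exactly 0.
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_fit(X, y, alpha, n_sweeps=100):
    # Hypothetical helper: coordinate-descent LASSO on centered data.
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a constant feature carries no information
            rho = 0.0
            for i in range(n):
                # Prediction from all features except j.
                partial = sum(w[k] * Xc[i][k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - partial)
            w[j] = soft_threshold(rho / n, alpha) / z
    b = y_mean - sum(w[j] * x_mean[j] for j in range(p))  # unpenalised intercept
    return w, b

def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    # Root-mean-square error, in the same units as the target.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy data (assumed): the target depends only on the first feature, y = 2*x0 + 1.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

w, b = lasso_fit(X, y, alpha=0.1)
y_hat = predict(X, w, b)
r2 = r2_score(y, y_hat)
err = rmse(y, y_hat)
print("weights:", w, "intercept:", b)
print("R2:", r2, "RMSE:", err)
```

In this toy case the L1 penalty drives the coefficient of the uninformative second feature to zero and slightly shrinks the first, which is the sparsifying behaviour that makes LASSO attractive when the number of samples is small. The metrics here are computed on the fitting data only for brevity; the values quoted in the abstract are test-set values.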
