Predicting the bandgap of ABX₂ materials: supervised machine learning on small dataset

Abstract

Predicting the electronic bandgap of ABX₂-type materials (where A = {Al, Ga, Zn, Cd}, B = {Ga, In, Ge, Sn, Si}, and X = {N, P, As, Sb}) is essential for photovoltaic applications, but the task becomes significantly more challenging when only a limited dataset is available. Using a machine learning (ML) approach, we classify and predict the electronic bandgaps for a dataset of 99 ABX₂-type material entries with good accuracy. Logistic regression achieves 97% classification accuracy. For regression, the sure independence screening and sparsifying operator (SISSO) method is used for feature selection, followed by various ML models, of which the least absolute shrinkage and selection operator (LASSO) yields the best test performance (R² = 0.92, RMSE = 0.30 eV). We also develop an ML model to predict volume, a structural feature used for bandgap prediction, which reaches R² = 0.99 and RMSE = 6.86 Å³. K-means clustering is used to find underlying patterns in the data and reveals two distinct material families. Notably, when crystallographic phase information is included, the bandgap prediction accuracy improves further to R² = 0.95 and RMSE = 0.28 eV. Our results therefore show that ML can efficiently replace computationally expensive DFT methods and can also perform well on small, costly datasets, making it suitable for broader and more specialized problem spaces than previously assumed.
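The R² and RMSE figures quoted above are standard regression metrics. As a minimal sketch of how such values are typically computed from true and predicted bandgaps, the following uses only the Python standard library. The function names and the four sample values are hypothetical and purely illustrative; they are not data or code from the article.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up bandgap values in eV, chosen only to illustrate the formulas
# (assumption: NOT taken from the article's dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]

print(f"R2 = {r2_score(y_true, y_pred):.2f}, RMSE = {rmse(y_true, y_pred):.2f} eV")
```

For a small dataset, both metrics would normally be reported on held-out test samples, since values computed on the training data tend to overstate predictive accuracy.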

Article information

Article type: Paper
Submitted: 21 Dec 2025
Accepted: 27 Apr 2026
First published: 28 Apr 2026

J. Mater. Chem. C, 2026, Advance Article

U. Kumar, M. W. Ullah, V. Kumar, J. M. R. Muir and F. Zhang, J. Mater. Chem. C, 2026, Advance Article, DOI: 10.1039/D5TC04449A
