Uncertainty-Aware Active Learning Reveals Reliability Limits in Lead-Free Halide Perovskite Screening
Abstract
The discovery of stable, lead-free perovskite materials for photovoltaic applications is challenged by the vast chemical space of possible compositions and by the systematic inaccuracies inherent in high-throughput density functional theory (DFT) calculations. In particular, widely used semi-local functionals such as PBE are known to underestimate band gaps, while data-driven screening workflows often treat all machine learning predictions as equally reliable. In this work, we present an uncertainty-aware active learning framework for the screening of lead-free halide perovskites that explicitly distinguishes between reliable predictions and regions of limited model knowledge. By expanding the search space beyond ideal cubic perovskites to include distorted, vacancy-ordered, and mixed-anion structures, we intentionally address a more realistic and challenging materials landscape. An ensemble regression model is employed to predict DFT band gaps while quantifying epistemic uncertainty arising from data sparsity and model disagreement. To correct the systematic bias of PBE-calculated band gaps, we introduce a statistically validated, stratified PBE-to-experiment calibration scheme based on experimentally characterized benchmark compounds. This calibration aligns theoretical predictions with experimental trends without artificially improving predictive accuracy. The resulting screening reveals recurring patterns in candidate selection, including the frequent emergence of heavy d-electron halides, which we identify as potential false positives arising from functional limitations and feature-level abstractions. Rather than claiming definitive material discoveries, this study demonstrates how uncertainty quantification and active learning can be used to expose blind spots in conventional screening pipelines and to prioritize materials for higher-fidelity electronic structure calculations. 
The proposed framework provides a principled strategy for allocating computational and experimental resources in the search for lead-free perovskite photovoltaics.
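As an illustration of the ensemble-based uncertainty estimate and the active learning selection step described above, the following minimal sketch treats the disagreement among bootstrap-trained regressors as a proxy for epistemic uncertainty and ranks candidates by it. It is not the model used in this work: the one-dimensional descriptor, the linear base learner, the toy band gap values, and the function names (`fit_line`, `bootstrap_ensemble`, `predict_with_uncertainty`, `select_most_uncertain`) are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def bootstrap_ensemble(xs, ys, n_models=30, seed=0):
    """Fit one linear model per bootstrap resample of the training set.

    Assumes xs contains at least two distinct values.
    """
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # skip degenerate resamples with a single x value
        sample_y = [ys[i] for i in idx]
        models.append(fit_line(sample_x, sample_y))
    return models


def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the spread (std) across members.

    The spread is used here as a simple proxy for epistemic uncertainty.
    """
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


def select_most_uncertain(models, candidates, k=1):
    """Rank candidates by ensemble disagreement, largest first."""
    ranked = sorted(
        candidates,
        key=lambda x: predict_with_uncertainty(models, x)[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: a scalar descriptor and made-up PBE band gaps (eV).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
pbe_gap = [1.3, 1.6, 2.4, 2.7, 3.5, 3.6]

models = bootstrap_ensemble(descriptor, pbe_gap)
# One candidate inside the sampled descriptor range, one far outside it.
queue = select_most_uncertain(models, [2.5, 12.0], k=1)
```

In a setup like this, a candidate far outside the sampled descriptor range would typically show the larger ensemble spread and therefore be queued first for a higher-fidelity calculation, which is the allocation behavior the framework aims for. A real screening workflow would replace the linear base learner with a multi-feature regressor and would apply the PBE-to-experiment calibration separately from the uncertainty estimate.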