Predicting electronic properties of molecules: a stacking ensemble model for HOMO and LUMO energy estimation
Abstract
The energies of the Highest Occupied Molecular Orbital (HOMO) and Lowest Unoccupied Molecular Orbital (LUMO) are important determinants of molecular reactivity and stability. Traditional quantum chemical (QC) methods for calculating these energies are computationally expensive and slow, so machine learning is well positioned to accelerate QC predictions. In this study, we report a stacking ensemble model named HLP-Stack (HOMO–LUMO predictor via stacking) that predicts HOMO and LUMO energies from molecular descriptors computed for the QM9 dataset. By combining 2D and 3D descriptors, HLP-Stack achieved high predictive performance on the test set (R² ≈ 9.999 × 10⁻¹, RMSE ≈ 3.219 × 10⁻⁴ Hartree (Eh) for HOMO; R² ≈ 9.999 × 10⁻¹, RMSE ≈ 1.903 × 10⁻⁴ Eh for LUMO), outperforming each individual baseline model. Feature selection using the SelectKBest algorithm with mutual information regression identified the most influential descriptors. To ensure these descriptors did not trivially encode HOMO or LUMO energies, we performed correlation analysis between each descriptor and the target properties. SHAP Tree Explainer analysis further revealed the contribution of each feature to the model predictions. In addition, analysis of molecular topology and functional groups highlighted trends in aromaticity and ring structures, and their impact on electronic behavior. Finally, HOMO–LUMO gap analysis demonstrated how molecular structure and functionalization affect electronic properties.
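The workflow summarized above (mutual-information feature selection, a descriptor–target correlation check, and a stacked regressor) can be illustrated with a minimal scikit-learn sketch. This is not the HLP-Stack implementation: the abstract does not name the base learners, the meta-learner, or the number of selected descriptors, so the random forest and gradient boosting base models, the linear meta-learner, `k=5`, and the synthetic stand-in data below are all assumptions made purely for illustration.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a descriptor matrix and an orbital energy (in Eh).
# Assumption: real work would use descriptors computed for QM9 molecules.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (
    -0.25
    + 0.02 * X[:, 0]
    - 0.01 * X[:, 1]
    + 0.005 * X[:, 2] ** 2
    + rng.normal(scale=0.001, size=400)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep the k descriptors with the highest mutual information with the target.
# k=5 is an arbitrary illustrative choice.
selector = SelectKBest(
    score_func=partial(mutual_info_regression, random_state=0), k=5
)

# Stacking: out-of-fold predictions of the base models feed a meta-learner.
# The choice of base models and meta-learner here is hypothetical.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)

model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))

# Indices of the selected descriptors and their absolute Pearson correlation
# with the target, as a check that none of them trivially encodes it.
selected = model.named_steps["selectkbest"].get_support(indices=True)
abs_corr = {
    int(j): float(abs(np.corrcoef(X_train[:, j], y_train)[0, 1]))
    for j in selected
}

print(f"R2 = {r2:.4f}, RMSE = {rmse:.2e} Eh")
print("selected descriptors:", selected.tolist())
print("abs. correlation with target:", abs_corr)
```

Placing the selector inside the pipeline means it is fitted only on the training split, which avoids leaking test-set information into the feature ranking. In this sketch, a selected descriptor with an absolute correlation close to 1 would be a candidate for the kind of trivial target encoding the correlation analysis is meant to rule out.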
