Exploring the degree of long-range order/disorder in indaceno-based photovoltaic small molecules using data-driven machine learning analysis†
Abstract
Long-range order and disorder in small molecules significantly affect their physical and chemical properties and, in turn, their performance in photovoltaic devices. In this study, a data-driven machine learning (ML) approach was applied to explore the relationship between molecular structure and crystallinity in 480 indaceno-based small molecules. Three ML models, including support vector machine (SVM) and random forest (RF) models, were trained to predict crystal propensity. A heatmap analysis revealed that 72.71% of the small molecules exhibit crystalline behavior, while the remaining 27.29% are non-crystalline. The ML models achieved near-perfect classification performance (AUC: SVM-RBF = 0.999, RF = 0.998; MSE: RF = 0.00, SVM-RBF = 0.01). The predicted crystal propensity values showed high accuracy, with mean squared errors ranging from 0.0 to 0.64. Feature importance analysis using SHAP values identified Chi0v, kappa1, Chi1n, and NumRotatableBonds as the descriptors contributing most to crystal propensity. The synthetic accessibility scores of the small molecules ranged from 0.02 to 0.12, providing insights for designing and optimizing indaceno-based small molecules with tailored crystallinity and photovoltaic properties. This study demonstrates the potential of ML approaches in guiding the development of high-performance small molecules for solar energy applications.
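To make the two reported metrics concrete, the following minimal sketch computes an AUC and an MSE from scratch in plain Python. It is an illustration only, not the authors' pipeline: the function names `roc_auc` and `mean_squared_error` and the eight toy labels and scores are hypothetical, and none of the numbers come from the study's data set.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 1 = crystalline, 0 = non-crystalline,
# with made-up predicted crystal propensities in [0, 1].
labels = [1, 1, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.7, 0.2, 0.6, 0.1]

auc = roc_auc(labels, scores)
mse = mean_squared_error(labels, scores)
print(f"AUC = {auc:.3f}, MSE = {mse:.4f}")
```

An AUC close to 1 means that almost every crystalline molecule is ranked above every non-crystalline one, whereas the MSE measures how far the predicted propensities lie from the target values, so the two metrics capture different aspects of model quality.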