Structure-guided machine learning for efficiency prediction of organic photovoltaics using experimentally informed molecular descriptors
Abstract
We estimated the efficiency of organic photovoltaics using a machine learning (ML) approach, drawing on the organic photovoltaics database built in-house by the Korea Research Institute of Chemical Technology. The dataset comprises reliable and representative experimental results for 1010 ternary organic solar cells (D1:D2:A), obtained through repeated measurements. It covers 67 donors and 24 non-fullerene acceptors and records device structures, donor/acceptor structures, donor-to-acceptor ratios, active-layer thicknesses, experimental conditions, and local symmetry. We fragmented the donors and acceptors using a self-developed method, generated descriptors for the resulting fragments, and used these descriptors to train various ML algorithms, including random forest, XGBoost, LightGBM, support vector regression, and multilayer perceptron. Model performance was evaluated using the coefficient of determination (R²); XGBoost achieved the highest R² of 0.849. The contributions of key features were interpreted using SHAP analysis. This paper presents an ML framework that combines molecular fragmentation and data-driven modeling.
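For readers unfamiliar with the evaluation metric, the coefficient of determination can be sketched as follows. This is a minimal illustration of the standard definition, R² = 1 − SS_res / SS_tot, and not the authors' evaluation code; the function name `r2_score` and the efficiency values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted efficiencies (%), for illustration only.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
score = r2_score(measured, predicted)
print(f"R2 = {score:.3f}")
```

An R² of 1 means the predictions match the measurements exactly, while 0 means the model does no better than predicting the mean measured efficiency for every device.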
