Data-driven approach towards identifying dye sensitizer molecules for higher power conversion efficiency in solar cells†
Abstract
Machine learning (ML) based on the quantitative structure–property relationship (QSPR) has been applied to the development of highly efficient dye-sensitized solar cells (DSSCs). This study presents a robust method for interpreting a QSPR model of 1448 dye molecules that combines three classes of properties, namely structural, quantum and experimental, to predict the power conversion efficiency (PCE) of DSSCs using ML and computational methods. The features used to build the ML models were extracted from PaDEL (structural properties), density functional theory (DFT)/time-dependent DFT (TD-DFT) calculations (quantum properties) and the literature/databases (experimental properties). The descriptors with the greatest influence on the predicted PCE were selected to develop ML models based on linear regression, sequential minimal optimization (SMO) regression, random forest and multilayer perceptron neural networks. Random forest emerged as the best model, with a prediction accuracy of 95.31% and a root mean squared error (RMSE) of 0.802. The reliability of the models was validated through 10-fold cross-validation. The developed ML model gives insight into the descriptors that contribute most to PCE, and this insight was used to propose novel dye molecules for DSSCs with improved efficiency. Notably, 75% of the designed molecules showed an improvement in PCE compared with their parent molecules, which indicates that such a data-driven approach can be used to design novel molecules with improved energy efficiency.
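As an illustration of the validation step mentioned in the abstract, the following is a minimal sketch of how a 10-fold cross-validated RMSE can be computed. It is not the authors' pipeline: the data are synthetic, the single `descriptor` feature is hypothetical, and a one-variable least-squares line stands in for the random forest so that the example needs only the Python standard library. All function names (`k_fold_indices`, `fit_line`, `cross_validated_rmse`) are assumptions introduced here.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature).

    Stand-in model only; the abstract reports random forest as the best model.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def cross_validated_rmse(xs, ys, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(slope * xs[i] + intercept)
    return rmse(y_true, y_pred)


# Synthetic data (assumption): one hypothetical descriptor and a noisy
# linear "PCE" response. These are not values from the study.
rng = random.Random(42)
descriptor = [rng.uniform(0.0, 10.0) for _ in range(200)]
pce = [0.5 * x + 2.0 + rng.gauss(0.0, 0.8) for x in descriptor]

cv_rmse = cross_validated_rmse(descriptor, pce, k=10)
print(f"10-fold CV RMSE: {cv_rmse:.3f}")
```

Because every molecule is predicted exactly once by a model that never saw it during fitting, the pooled RMSE estimates out-of-sample error rather than fit quality. The same loop would apply unchanged to any regressor that exposes a fit step and a predict step.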