High throughput molecular design of electron donors and non-fullerene acceptors using machine learning combined with substructure importance†
Abstract
The electron donor and acceptor materials in the active layer critically influence the performance of organic solar cells (OSCs). However, traditional experimental methods for discovering high-performance materials are time-consuming, costly and inefficient. To address this challenge, we established a database of 547 donor–acceptor pairs in OSCs. Each molecule in the database was represented using Morgan and MACCS fingerprints. A random forest (RF) model, with hyperparameters optimized through grid search, was trained to predict the power conversion efficiency (PCE). To gain insight into the relationship between PCE and the molecular substructures of both donors and non-fullerene acceptors, SHapley Additive exPlanations (SHAP) analysis was performed based on the MACCS fingerprints. For both donor and non-fullerene acceptor molecules, the five most important MACCS fingerprints that correlate positively with PCE were identified. The donor and non-fullerene acceptor molecules in the database were then fragmented into molecular substructures to enrich the chemical space available for molecular design. Important donor, acceptor and π substructures were screened and selected to design donor (D–π–A–π type) and non-fullerene acceptor (A–π–D–π–A and A–D–A types) molecules, generating 4914 donor and 701 800 acceptor molecules and, correspondingly, 3 448 645 200 donor–acceptor pairs. The PCEs of the newly designed donor–acceptor pairs were predicted using the optimized RF model. In total, 14 296 new donor–acceptor pairs were identified with a predicted PCE exceeding 14.00%. Among them, 123 pairs had a predicted PCE greater than 15.50%, with the highest predicted PCE being 15.91%. This method enables the efficient design of a large number of potential OSC materials.
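The combinatorial step described above can be illustrated with a minimal sketch. The fragment labels (`D1`, `pi1`, `A1`, and so on) are hypothetical placeholders, not the substructures selected in this work, and the sketch assumes that the same π bridge and the same end group are reused within one molecule. This symmetry assumption is not stated in the abstract. The final line only checks that the reported number of donor–acceptor pairs equals the product of the reported donor and acceptor counts.

```python
from itertools import product

# Hypothetical placeholder fragment labels; the actual substructures
# selected in the paper are not listed in the abstract.
donor_units = ["D1", "D2", "D3"]
pi_units = ["pi1", "pi2"]
acceptor_units = ["A1", "A2"]


def enumerate_donors(d_units, p_units, a_units):
    """Build D-pi-A-pi donor labels (assumes one pi bridge type per molecule)."""
    return [f"{d}-{p}-{a}-{p}" for d, p, a in product(d_units, p_units, a_units)]


def enumerate_acceptors(d_units, p_units, a_units):
    """Build A-pi-D-pi-A and A-D-A acceptor labels (assumes symmetric molecules)."""
    bridged = [f"{a}-{p}-{d}-{p}-{a}" for a, p, d in product(a_units, p_units, d_units)]
    unbridged = [f"{a}-{d}-{a}" for a, d in product(a_units, d_units)]
    return bridged + unbridged


donors = enumerate_donors(donor_units, pi_units, acceptor_units)
acceptors = enumerate_acceptors(donor_units, pi_units, acceptor_units)

# Every donor can be paired with every acceptor, so the pair count is a product.
n_pairs_toy = len(donors) * len(acceptors)

# Consistency check on the counts reported in the abstract.
reported_pairs = 4914 * 701_800

print(len(donors), len(acceptors), n_pairs_toy)
print(reported_pairs)
```

Because every designed donor is paired with every designed acceptor, the number of pairs grows multiplicatively with the fragment libraries, which is why a fast surrogate such as the RF model is needed to screen them.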