The generation of a database of promising short-circuit current organic solar cell hole transport layers with machine learning and density functional theory
Abstract
In contrast to traditional quantitative structure–property relationship (QSPR) studies, which primarily screen existing libraries, this work integrates machine learning (ML) with generative design to discover novel hole transport layers (HTLs) for organic solar cells (OSCs). We developed an ML model that predicts the short-circuit current (JSC) of benzodithiophene-based molecules; the best model, an XtraTrees regressor, achieves an R² of 0.803. Beyond prediction, we used the breaking of retrosynthetically interesting chemical substructures (BRICS) algorithm to generate a de novo library of 9278 molecules, substantially expanding the chemical space beyond the training set of 515 experimentally validated compounds. To provide deeper chemical insight than standard automated pipelines offer, we calculated the structure–activity landscape index (SALI), which revealed pronounced "activity cliffs" and correlations between molecular diversity and JSC, whose values range from 0.005 to 25.93 mA cm−2. Dimensionality reduction with t-distributed stochastic neighbor embedding (t-SNE), combined with k-means clustering, identified high-performing candidates with low SALI scores and distinct structural features. The top candidates were validated by density functional theory (DFT), which confirmed their suitable electronic properties and revealed that extended π-conjugation and favorable charge distribution underlie their high predicted performance. This integrated pipeline, in which ML guides the exploration, generative algorithms expand the chemical space, and DFT provides validation and mechanistic insight, accelerates the targeted discovery of efficient HTLs.
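The SALI mentioned above is commonly defined for a pair of molecules i and j as SALI(i, j) = |A_i − A_j| / (1 − sim(i, j)), where A is the activity (here JSC) and sim is a fingerprint similarity such as the Tanimoto coefficient. A large value flags an activity cliff: two structurally similar molecules with very different JSC. The minimal sketch below computes it in plain Python, assuming fingerprints are stored as sets of on-bit indices. The fingerprint representation, the toy JSC values, and the per-molecule summary (the maximum SALI over all partners) are illustrative assumptions, not the procedure used in this study.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / union


def sali(act_i: float, act_j: float, sim: float) -> float:
    """SALI = |A_i - A_j| / (1 - sim); infinite when the fingerprints are identical."""
    if sim >= 1.0:
        return float("inf")
    return abs(act_i - act_j) / (1.0 - sim)


def pairwise_sali(fps: list[set[int]], activities: list[float]) -> dict[tuple[int, int], float]:
    """SALI for every unordered pair (i, j) with i < j."""
    return {
        (i, j): sali(activities[i], activities[j], tanimoto(fps[i], fps[j]))
        for i, j in combinations(range(len(fps)), 2)
    }


def max_sali_per_molecule(scores: dict[tuple[int, int], float], n: int) -> list[float]:
    """Hypothetical per-molecule summary: the largest SALI among a molecule's pairs."""
    best = [0.0] * n
    for (i, j), value in scores.items():
        best[i] = max(best[i], value)
        best[j] = max(best[j], value)
    return best


if __name__ == "__main__":
    # Toy fingerprints and JSC values (mA cm^-2); purely illustrative, not data from this study.
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11}]
    jsc = [20.0, 5.0, 18.0]
    scores = pairwise_sali(fps, jsc)
    print(scores)
    print(max_sali_per_molecule(scores, len(fps)))
```

In this toy example, molecules 0 and 1 share most of their bits (Tanimoto 0.6) yet differ by 15 mA cm−2, so their pair scores far higher than the two dissimilar pairs, which is the signature of an activity cliff. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.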
