Xabier Rodríguez-Martínez a, Enrique Pascual-San-José a, Zhuping Fei b, Martin Heeney b, Roger Guimerà cd and Mariano Campoy-Quiles *a
a Institut de Ciència de Materials de Barcelona, ICMAB-CSIC, Campus UAB, 08193, Bellaterra, Spain. E-mail: m.campoy@csic.es
b Department of Chemistry and Centre for Plastic Electronics, White City Campus, Imperial College London, London W12 0BZ, UK
c Institució Catalana de Recerca i Estudis Avançats, ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain
d Department of Chemical Engineering, Universitat Rovira i Virgili, 43007, Tarragona, Spain
First published on 7th January 2021
The continuous development of improved non-fullerene acceptors and deeper knowledge of the fundamental mechanisms governing performance underpin the vertiginous increase in efficiency witnessed by organic photovoltaics. While the influence of parameters like film thickness and morphology is generally understood, what determines the strong dependence of the photocurrent on the donor and acceptor fractions remains elusive. Here we approach this problem by training artificial intelligence algorithms with self-consistent datasets consisting of thousands of data points obtained by high-throughput evaluation methods. Two ensemble learning methods are implemented, namely a Bayesian machine scientist and a random decision forest. While the former demonstrates large descriptive power to complement the experimental high-throughput screening, the latter is found to predict with excellent accuracy the photocurrent–composition phase space for material systems outside the training set. Interestingly, we identify highly predictive models that only employ the materials' band gaps, thus largely simplifying the rationale of the photocurrent–composition space.
Broader context
Multicomponent systems underlie the significant performance improvements recently witnessed in many energy fields, from electrodes in batteries, to multication perovskites for photovoltaics and to high-ZT thermoelectric composites. Predicting the specific composition that would result in optimum performance is, however, one of the greatest unresolved problems in materials science. This is, in part, due to the fact that performance maximization is a complex multiparametric problem. Here we show that combining high-throughput experimental data with artificial intelligence (AI) algorithms enables unprecedented predictive capability. Specifically, we applied our methodology to the case of organic photovoltaics (OPV), since active layer composition is the parameter that most strongly affects OPV efficiency (e.g. 1:0 or 0:1 compositions result in zero efficiency for a blend whose optimum gives 18%). We generate thousands of data points in the performance–composition phase space for 15 different donor:acceptor blends and use the generated datasets to feed AI algorithms. Our work results in the identification of highly accurate and predictive models for the photocurrent–composition dependence, unravelling the key material descriptors governing such behaviour (i.e. band gaps and charge mobility imbalance). This study paves the way for the use of AI and high-throughput experimentation to predict optimum composition in energy materials.
At the molecular level, computational algorithms such as those developed in the on-going Harvard Clean Energy Project (CEP)11 serve to rapidly pre-screen millions of molecular motifs and classify them according to their theoretical OPV outcome, thus motivating their subsequent synthesis. Beyond purely in silico screening, the development of data-driven models in the OPV field has so far been mostly restricted to data mining and training of artificial intelligence (AI) algorithms using intrinsic material descriptors.12 This approach has been applied to make predictions in terms of novel materials13,14 and their corresponding expected PCE,15 as well as to guide researchers in the design of potentially top-performing materials.16 While promising, these calculations have had modest success thus far. This is due, in part, to the lack of sufficiently reproducible data in the literature; to the difficulty of predicting solid-state properties of the blend such as microstructure or gas-to-solid shifts in the optical properties and molecular energy levels; and, finally, to the fact that relevant device information is not considered in the calculations.17
At the device level, one aspect that has been modelled very successfully is the dependence of the performance on the active layer thickness. Device modelling based on transfer matrices has been demonstrated to reproduce accurately the mild oscillations of the photocurrent found experimentally.18,19 Further refinements based on advanced charge transport descriptions and unintentional doping effects have precisely described the thickness-dependent photocurrent.20
Despite acutely affecting the OPV performance,21–25 predicting the optimum D:A ratio has been much more challenging due to the complexity of charge photogeneration and transport through the blend towards the electrodes. For semicrystalline polymers blended with fullerenes, optimum D:A ratios have been rationalized by the binary phase diagram.21 In particular, slightly hypoeutectic concentrations with respect to the polymer loading were found to lead to a good compromise between charge generation and appropriate percolating pathways for charges to reach the electrodes. Also the balance of charge carrier mobilities between electrons and holes has often been considered a key feature determining the shape of the photocurrent–composition curve26 (hereafter referred to as Jsc–vol%). The current OPV paradigm led by NFAs as excellent light harvesters adds another ingredient to the Jsc–vol% dependence compared to fullerene-based devices since photocurrent generation is now fully distributed between both materials. Given the intricate optoelectronic trade-off that sets the location of the optimum D:A ratio, novel experimental approaches and data-driven predictive models are required to enhance the current understanding of the Jsc–vol% dependence in binary OPV blends.
In this work, we adopt a synergistic combination of experimental high-throughput screening and AI to study the relationship between the photocurrent generation and the active layer parameters (i.e. thickness and D:A ratio) in binary OPV devices. The experimental exploration is performed by processing orthogonal parametric gradients or libraries, which in combination with local probing techniques (namely Raman spectroscopy and photocurrent imaging) serve to assess the corresponding photocurrent phase space diagrams with minimal effort.27–32 The exploration results in a plethora of possible Jsc–vol% dependences: from strongly skewed bell shapes to bimodal distributions. Then, in an attempt to rationalize these complex relationships, we implement two different AI algorithms that take as input a series of intrinsic optoelectronic material descriptors. The first algorithm is a Bayesian machine scientist,33 which is found to complement the high-throughput experimental screening due to its large descriptive power while providing an analytical equation to describe the intricate Jsc–vol% phase spaces. Second, we use a random forest (RF) algorithm as a predictive model for the normalized Jsc–vol% dependences, retrieving a mean absolute error (MAE) below 0.20 for OPV binaries outside the training set. In the RF models, we find that descriptors related to the alignment of the frontier energy levels and the mobility difference are statistically relevant in shaping the Jsc–vol% space. Finally, feature selection procedures reveal highly predictive models when only the donor and acceptor electronic (or optical) band gaps are employed in the training step. The RF models found herein define the Jsc–vol% curves in both NFA and fullerene-based binary blends with excellent accuracy.
The highly efficient screening process generated large amounts of thickness–composition parametric combinations per D:A pair in the corresponding Jsc space (ca. 24000 data points). This was then employed in the training and validation of AI algorithms together with fundamental optoelectronic material descriptors such as highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels, absorption coefficients (see Section III in the ESI†) and charge carrier mobilities. As detailed in Section IV in the ESI,† the descriptors extracted from literature can be selected following distinct criteria. With these data, we first validated the ability of the AI models to reconstruct the complete Jsc–vol% diagram and then predict the corresponding dependence for material combinations out of the original training set. This approach is exploited to determine the optimum active layer thickness and composition in terms of photocurrent for any D:A pair (with known input descriptors).
Fig. 2 (a and b) Normalized distribution of the short-circuit current density obtained in discrete devices as a function of the D:A weight ratio in PTB7-Th:ITIC and PBDB-T:ITIC-C2C6 blends. (c and d) Normalized photocurrent dispersion obtained following the high-throughput optimization approach for the same blends. Green dashed lines are polynomial fits of the high-performing envelope of photocurrent values. In each case we indicate the number of devices needed to generate these plots. The reader is referred to Section I in the ESI† for further details on device manufacture.
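The caption does not state how the high-performing envelope is extracted before the polynomial fit. As a minimal sketch, one could assume that the envelope is a high percentile of the normalized photocurrent within composition bins, followed by a least-squares polynomial fit. The function name `envelope_polyfit`, the bin count, the percentile, the polynomial degree and the synthetic data are all illustrative assumptions and not the authors' procedure.

```python
import numpy as np

def envelope_polyfit(x, y, n_bins=20, percentile=95.0, degree=4):
    """Fit a polynomial to the upper envelope of scattered (x, y) data.

    Assumed procedure (not taken from the paper): bin x, take a high
    percentile of y in each non-empty bin, then least-squares fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Map every point to a bin index in 0..n_bins-1 (right edge included).
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centers, tops = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            centers.append(0.5 * (edges[b] + edges[b + 1]))
            tops.append(np.percentile(y[mask], percentile))
    return np.poly1d(np.polyfit(centers, tops, degree))

# Synthetic stand-in for a Jsc-composition scatter (NOT experimental data):
# a bell-shaped envelope centred at 40% donor, with a vertical spread that
# mimics thickness and morphology variations.
rng = np.random.default_rng(0)
donor_frac = rng.uniform(0.0, 1.0, 5000)
true_envelope = np.exp(-((donor_frac - 0.4) / 0.3) ** 2)
jsc_norm = true_envelope * rng.uniform(0.2, 1.0, donor_frac.size)

fit = envelope_polyfit(donor_frac, jsc_norm)
grid = np.linspace(0.05, 0.95, 181)
best_frac = grid[np.argmax(fit(grid))]
print(f"donor fraction maximizing the fitted envelope: {best_frac:.2f}")
```

Using a percentile rather than the per-bin maximum makes the envelope less sensitive to isolated outliers, at the cost of slightly underestimating the true upper bound.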
Notably, the high-throughput methodology reproduces the unimodal and bimodal photocurrent dispersions observed as a function of the donor loading in both binaries (Fig. 2c and d). Importantly, it does so with very large statistics, strongly reducing the uncertainty with respect to the actual shape of the curve. In this approach, the vertical Jsc dispersion originates from both active layer thickness and morphology variations, the latter being a consequence of the in situ mixing of the pristine inks in the blade reservoir. By quantifying the Raman blue-shifting of the corresponding D and A vibrational fingerprints, we have verified that devices containing compositional gradients can reproduce the degree of mixing attained when casting fully premixed solutions. Moreover, they embrace a richer catalogue of film morphologies than conventional methodologies expanding the available data to correlate microstructure and device performance (see Section II in the ESI†). Note that the high-throughput screening requires a significantly reduced amount of experimental time and resources: hundreds of discrete devices (here 137 and 176 devices in PTB7-Th:ITIC and PBDB-T:ITIC-C2C6, respectively) versus a single combinatorial device per binary.
The high-throughput methodology has, however, the caveat of measuring Jsc only rather than the full set of solar cell parameters (open-circuit voltage (Voc), fill factor (FF) and PCE). Nevertheless, since Voc remains fairly constant, the PCE correlates well with Jsc for the range of compositions of interest despite the variations observed in FF (see Section II in the ESI†), which is in excellent agreement with the overall trends drawn from data-mining studies (i.e. Jsc is the best proxy for PCE).17 Interestingly, we have determined that parameters related to morphology such as Raman (blue-)shifting and photoluminescence (PL) quenching and shifting are closely related to microstructure and FF, thus opening up the possibility to take into account their effect in the high-throughput determination of the D:A ratio that maximizes the overall PCE (see Section II in the ESI†). On the other hand, the use of non-standardized white light for the acquisition of the Jsc data (see Section I in the ESI†) and the intrinsic uncertainty of the Raman-based determination of composition34 can explain the small differences observed in the photocurrent distributions when comparing both optimization protocols; more specifically, the ca. 15 wt% offset in the donor content that maximizes the photocurrent in PTB7-Th:ITIC devices. Nonetheless, these drawbacks are significantly outweighed by the rapid attainment of large experimental datasets, which serve as ideal seeds for training AI algorithms. Furthermore, the experimental approach demonstrates high reproducibility from batch to batch (see Section II in the ESI†).
Our first implementation of AI is a Bayesian machine scientist,33 which includes the dimensionless descriptors as inputs to develop analytical models that explore the Jsc–vol% dependence (see Section V in the ESI†). For any given set of scattered data, the Bayesian machine scientist identifies the plausible and simplest mathematical models that describe the observed trends. We applied this methodology to two families of OPV binaries (Fig. 3) with PTB7-Th and PBDB-T as donors, individually blended with four ITIC-based acceptors showing either distinct end groups (ITIC-M) or side-chains (ITIC-C8 and ITIC-C2C6). We note that, experimentally, small but non-zero photocurrent has been measured for some pristine NFAs. A full study of this goes beyond the scope of this manuscript, but it is worth noting it, even when the corresponding solar cells have comparatively low overall efficiencies.
According to the solid curves in Fig. 3, which delimit the Jsc–vol% space encountered at different active layer thickness values, the Bayesian machine scientist reproduces well the highest-performing experimental trends. The actual model equation is provided in Section V in the ESI.† While we do not attempt to ascribe any physical meaning to it, we use it to evaluate the parameter space. Importantly, regarding the photocurrent phase space, the modeling indicates that (i) PTB7-Th binaries are characterized by sharp and unbalanced compositional optimum peaks; (ii) PBDB-T blends are more tolerant to compositional fluctuations and their maxima are more balanced in D:A ratio; (iii) binaries containing ITIC and ITIC-M show limited thickness dependence; (iv) ITIC-C8 and ITIC-C2C6 blends are very sensitive to active layer thickness variations; and (v) the bimodal distribution is more or less pronounced depending on the actual thickness range. Despite the great descriptive power of the machine scientist in completing the exploration of the complex photocurrent phase space, it has some limitations arising from its computational complexity and the size of the training dataset, including: (i) month-scale times needed for training; (ii) poor predictive capability outside the training materials dataset due to the infeasibility of sampling models for a long enough time; and (iii) an uninformative utilization of the features, which makes it impossible to determine which of them are really important.
Therefore, we tested alternative ML approaches such as the random forest (RF) algorithm to improve the predictive capability of the AI models. The RF ensemble is initially trained using the same OPV binaries previously explored by the Bayesian machine scientist, which are highlighted with green frames in Fig. 4. The validation (testing) datasets are highlighted in blue while the purely predictive scenarios are framed in magenta color. Thus, Fig. 4 accordingly depicts a combinatorial matrix of the scattered Jsc–vol% dependences obtained in distinct D:A pairs following the high-throughput experimentation approach, as well as the RF model predictions (dashed lines) at different active layer thickness values (colored from grey to black). Therein, we include organic semiconductors out of the pristine training material set such as PCDTBT as donor polymer and two additional acceptors, namely a fluorinated ITIC derivative (ITIC-4F) and the workhorse fullerene, PC70BM.
As part of the RF model validation process, we first perform a leave-one-out cross-validation (LOO-cv) of the RF ensemble including the 8 training datasets (green frames in Fig. 4), as detailed in Section VI in the ESI.† Based on the extrapolation reliability found (a success rate of ca. 65%), we further validated the RF model by comparing the predicted trends with the experimental results obtained in D:A pairs outside the training set selection, i.e. binaries for which either one or both materials have not been used within the training step (blue frames in Fig. 4). This ability to extrapolate to unseen materials is precisely the main feature desired for highly predictive models.
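The leave-one-out validation over training datasets described above can be sketched as follows, assuming that each D:A binary is held out as a whole group while a random forest is trained on the remaining ones. The sketch uses scikit-learn; the synthetic data generator `make_blend`, its toy rule for the peak position, the band-gap values and the hyperparameters are illustrative assumptions, and the actual procedure is the one detailed in the ESI.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
N_POINTS = 200  # points per synthetic blend

def make_blend(egap_d, egap_a):
    """Synthetic stand-in for one D:A library (NOT experimental data)."""
    vol = rng.uniform(0.0, 1.0, N_POINTS)            # donor volume fraction
    thickness = rng.uniform(50.0, 300.0, N_POINTS)   # nm
    peak = 0.5 + 0.5 * (egap_a - egap_d)             # arbitrary toy rule
    jsc_norm = np.exp(-((vol - peak) / 0.25) ** 2)   # normalized to (0, 1]
    X = np.column_stack([vol, thickness,
                         np.full(N_POINTS, egap_d),
                         np.full(N_POINTS, egap_a)])
    return X, jsc_norm

# Eight hypothetical binaries, each labelled by placeholder band gaps (eV).
gaps = [(1.6, 1.6), (1.6, 1.7), (1.6, 1.8), (1.6, 1.9),
        (1.8, 1.6), (1.8, 1.7), (1.8, 1.8), (1.8, 1.9)]
blocks = [make_blend(d, a) for d, a in gaps]
X = np.vstack([b[0] for b in blocks])
y = np.concatenate([b[1] for b in blocks])
groups = np.repeat(np.arange(len(gaps)), N_POINTS)  # one group per binary

# Hold out one whole binary at a time and score the prediction on it.
maes = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    maes.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

for (d, a), mae in zip(gaps, maes):
    print(f"held-out blend Egap_d={d}, Egap_a={a}: MAE = {mae:.3f}")
```

Splitting by binary rather than by individual data point matters here: points from the same library are strongly correlated, so a random split would overestimate how well the model extrapolates to a new material pair.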
Our results indicate that the RF model extrapolates very well (MAE < 0.20) in all validation binaries explored, both the position of the Jsc maximum and its modulation in the composition (and thickness) diagram. The results obtained are equally consistent when validating a larger combinatorial matrix including data for high performing donor polymers such as PBDB-T-2Cl (PM7) and PBDB-T-2F (PM6), see Fig. S23 in the ESI.† It is worth highlighting that regardless of the molecular nature of the materials blended, the only model inputs for the extrapolation are the corresponding optoelectronic descriptors used in the training of the RF algorithm. In this particular case, we hand-picked the HOMO/LUMO energy levels reported from cyclic voltammetry (CV) measurements only as well as the corresponding mobilities from the same references (whenever possible), as detailed in Section IV in the ESI.†
AI models, such as the RF ensemble, also provide the so-called feature importance (FI), a magnitude that serves to identify and rank quantitatively those characteristics that mostly govern the experimental observables, i.e. the Jsc–vol% dependence. Accordingly, we perform successive FI analyses using three distinct selections of optoelectronic descriptors. These differ in how the actual values are picked from the literature database (>80 references accessed): either randomly, by a consistent manual selection or calculated from the statistical medians of the scattered data. This analysis is performed to evaluate the sensitivity of the model to the consistency of the input descriptors. The accumulated analysis of the FI in each model (Fig. 5) indicates that parameters related to the HOMO/LUMO energy level alignment, such as CTe, CTh or Egap,d–a, as well as those related to the mobility of the blended species (Δμ and μimb) are, statistically, the most important descriptors in defining the Jsc–vol% dependence. These findings are in good agreement with the current understanding of the performance–composition space in OPV, as the existence of unbalanced charge carrier mobility has been considered one of the key features that influences the Jsc–vol% diagram in binary blends.26 Nevertheless, we observe that the actual selection of values for the descriptors has a large effect on the FI distribution, thus highlighting the requirement for great consistency among the experimental data selected or measured. In this regard, feature selection approaches might help in identifying those combinations of descriptors that return models that are more robust against experimental noise.
Fig. 5 Accumulated feature importance (FI) depending on the choice of descriptor values. Since the HOMO and LUMO energy levels as well as the charge carrier mobilities are taken from the literature, we explore the effect that the actual value of the descriptors has on the FI drawn by the RF ensemble. We generally observe that descriptors related to energy level alignment (CTe, CTh, Egap,d–a) as well as those related to the mobility imbalance (Δμ = |μd − μa|, μimb = μa/μd) show the highest accumulated FI. The reader is referred to Section IV in the ESI† for a detailed mathematical definition of the descriptors employed.
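The two mobility descriptors defined in the caption, together with the donor and acceptor band gaps, can be computed from tabulated material properties as in the sketch below. The electronic gap is assumed here to be the LUMO minus the HOMO level; the definitions of CTe, CTh and Egap,d–a are given only in the ESI and are therefore omitted. The `Material` class and all numerical values are hypothetical placeholders, not literature data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    name: str
    homo_eV: float   # HOMO level vs. vacuum (negative)
    lumo_eV: float   # LUMO level vs. vacuum (negative)
    mobility: float  # charge carrier mobility, cm^2 V^-1 s^-1

def pair_descriptors(donor: Material, acceptor: Material) -> dict:
    """Descriptors for one D:A pair.

    delta_mu and mu_imb follow the definitions in the Fig. 5 caption;
    Egap = LUMO - HOMO is an assumption about the electronic gap.
    """
    return {
        "Egap_d": donor.lumo_eV - donor.homo_eV,
        "Egap_a": acceptor.lumo_eV - acceptor.homo_eV,
        "delta_mu": abs(donor.mobility - acceptor.mobility),
        "mu_imb": acceptor.mobility / donor.mobility,
    }

# Placeholder values for illustration only.
donor = Material("donor_X", homo_eV=-5.2, lumo_eV=-3.6, mobility=1.0e-3)
acceptor = Material("acceptor_Y", homo_eV=-5.5, lumo_eV=-3.8, mobility=2.0e-4)

descriptors = pair_descriptors(donor, acceptor)
for key, value in descriptors.items():
    print(f"{key}: {value:.4g}")
```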
In particular, by performing a greedy MAE feature selection procedure we identify several two-parameter combinations that yield highly accurate RF models, even showing in some cases lower MAE than those models trained with a larger list of descriptors (see Section VI in the ESI†). Among them, we would like to highlight the pair formed by Egap,d and Egap,a, whose LOO-cv in 15 distinct D:A binaries is depicted in Fig. 6. This model, based on the two band gaps, is remarkably robust against experimental fluctuations and it extrapolates moderately well in some unseen blends, including all-polymer binaries (see Section VI in the ESI†). Moreover, successful model equations are drawn by the Bayesian machine scientist when employing these two descriptors only (see Section V in the ESI†). Finally, we observe that model training using the consistently extracted solid-state optical band gaps from Tauc plots also results in successfully validated RF models (see Section VI in the ESI†). Hence, Egap,d and Egap,a (either electronic or optical) unify the main learning characteristics previously found by the 23-parameter model, yet in a more physically intuitive approach and with comparable predictive accuracy. Nevertheless, model predictions in workhorse D:A pairs such as P3HT:PC60BM are not successful, which we believe is a consequence of the limited extent of the training datasets employed and the absence of highly semi-crystalline donor polymers in our training dataset.
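A greedy MAE feature selection of the kind mentioned above can be sketched as a forward search: starting from the composition alone, the descriptor that most reduces the cross-validated MAE is added at each step. The details of the authors' procedure are in the ESI, so the forward direction, the leave-one-blend-out scoring, the function names `cv_mae` and `greedy_mae_selection`, the dummy descriptors and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def cv_mae(X, y, groups, cols):
    """Leave-one-blend-out MAE of an RF trained on the selected columns."""
    model = RandomForestRegressor(n_estimators=30, random_state=0)
    scores = cross_val_score(model, X[:, cols], y, groups=groups,
                             cv=LeaveOneGroupOut(),
                             scoring="neg_mean_absolute_error")
    return -scores.mean()

def greedy_mae_selection(X, y, groups, base_cols, candidates, n_select=2):
    """Forward greedy search: add the descriptor giving the lowest MAE."""
    chosen, history = [], []
    remaining = dict(candidates)  # descriptor name -> column index
    while remaining and len(chosen) < n_select:
        fixed = list(base_cols) + [candidates[name] for name in chosen]
        trials = {name: cv_mae(X, y, groups, fixed + [col])
                  for name, col in remaining.items()}
        best = min(trials, key=trials.get)
        chosen.append(best)
        history.append(trials[best])
        del remaining[best]
    return chosen, history

# Synthetic stand-in (NOT experimental data): six blends whose normalized
# Jsc depends on composition and on two band gaps; two dummy descriptors
# carry no information.
rng = np.random.default_rng(1)
n_blends, n_points = 6, 100
rows, targets, labels = [], [], []
for blend in range(n_blends):
    egap_d, egap_a = rng.uniform(1.5, 2.0, 2)
    dummy_1, dummy_2 = rng.normal(size=2)
    vol = rng.uniform(0.0, 1.0, n_points)
    peak = 0.5 + 0.5 * (egap_a - egap_d)  # arbitrary toy rule
    targets.append(np.exp(-((vol - peak) / 0.25) ** 2))
    constants = [np.full(n_points, v)
                 for v in (egap_d, egap_a, dummy_1, dummy_2)]
    rows.append(np.column_stack([vol] + constants))
    labels.append(np.full(n_points, blend))
X = np.vstack(rows)
y = np.concatenate(targets)
groups = np.concatenate(labels)

candidates = {"Egap_d": 1, "Egap_a": 2, "dummy_1": 3, "dummy_2": 4}
chosen, history = greedy_mae_selection(X, y, groups, base_cols=[0],
                                       candidates=candidates)
print("selected descriptors:", chosen)
print("cross-validated MAE after each step:", np.round(history, 3))
```

A greedy search evaluates far fewer descriptor combinations than an exhaustive one, which keeps the cost manageable for longer descriptor lists, but it is not guaranteed to find the globally best subset.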
In spite of such limitations, we believe that the simplicity and accuracy of this two-parameter model are powerful for several reasons: (i) for theoretical material screening, as Egap is a byproduct of density functional theory (DFT) calculations; (ii) for combinatorial material synthesis, as organic semiconductors are usually subjected to CV to quantify the HOMO–LUMO energy levels as part of a routine set of electrochemical characterizations; and (iii) because Egap is a magnitude sufficiently unrelated to processing. We further note that similar predictive accuracy can be obtained by using easily measured solid-state optical band gaps in the model training.
These features are especially advantageous when dealing with small batches of novel materials. In this case, RF models may help researchers to tailor more effectively the optimal device features (i.e. active layer thickness and composition) and explore de facto the full photovoltaic potential of the new molecular species. The training dataset employed here, initially formed by 15 D:A binaries, is in constant growth; therefore, the conclusions extracted by the RF model will be progressively refined blend after blend. For this reason, we make our combinatorial screening database accessible in a public CSIC repository (http://hdl.handle.net/10261/223231), which is open to contributions from any researchers as part of a joint OPV materials screening project.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0ee02958k
This journal is © The Royal Society of Chemistry 2021