Na Gyeong An,*abc Leonard Wei Tat Ng,d Yang Liu,e Seyeong Song,cf Mei Gao,a Yinhua Zhou,e Chang-Qi Ma,g Zhixiang Wei,h Jin Young Kim,*c Udo Bach*b and Doojin Vak*a
aCommonwealth Scientific and Industrial Research Organisation (CSIRO) Manufacturing, Clayton, Victoria 3168, Australia. E-mail: asy0720@unist.ac.kr; Doojin.Vak@csiro.au
bDepartment of Chemical and Biological Engineering, Monash University, Clayton, Victoria 3800, Australia. E-mail: Udo.Bach@monash.edu
cGraduate School of Carbon Neutrality, School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, South Korea. E-mail: jykim@unist.ac.kr
dSchool of Materials Science and Engineering (MSE), Nanyang Technological University (NTU), 50 Nanyang Avenue, Singapore 639798, Singapore
eWuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
fResearch Institute of Molecular Alchemy, Gyeongsang National University, Jinju 52828, Republic of Korea
gi-Lab & Printable Electronics Research Center, Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences (CAS), Suzhou, 215123, China
hChinese Academy of Sciences (CAS) Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology, Beijing, 100190, China
First published on 15th September 2025
High-throughput experimentation (HTE) combined with machine learning (ML) has emerged as a powerful tool to accelerate material discovery or optimize fabrication processes. However, in the photovoltaics field, only a few studies have successfully applied this approach using industrially relevant techniques, such as the roll-to-roll (R2R) process. We developed a universal and extendable data structure for ML training that accommodates upcoming materials while retaining compatibility with the existing dataset. Using the MicroFactory platform, which enables mass-customization of organic photovoltaics (OPVs), we fabricated and characterized over 26 000 unique cells within four days. To guide the selection of ML models that precisely predict device behavior, we developed photovoltaic parameter (PP) prediction models to forecast device parameters and J–V prediction models to generate J–V curves. The Random Forest model proved to be the most effective, and a device fabricated from its predicted formulation achieved a PCE of 11.8% (0.025 cm2), the highest reported for a fully-R2R-fabricated OPV. By integrating accumulated datasets with smaller new-component datasets, we enhanced model performance for PM6:Y6:IT-4F and PM6:D18:L8-BO systems, showing that models trained on binary systems can predict ternary device performance and enabling the development of generalized ML models for future high-performance materials.
Broader context: Combining high-throughput experimentation with machine learning (ML) accelerates material discovery or optimizes fabrication processes through data-driven decision-making. Previous studies in photovoltaics have explored ML-driven research using roll-to-roll (R2R) technology, but no systematic approach has guided the selection of ML models to precisely predict device behavior. The next critical step is to forecast the device properties of new materials using insights from existing ones, which requires an expandable data structure. Here, we present a universal and expandable data structure designed to enhance ML models and develop models that enable accurate prediction of photovoltaic parameters and J–V characteristics. Using the MicroFactory platform, we fabricated over 26 000 unique OPV cells.
High-throughput experimentation (HTE) has emerged as a promising avenue to address the limitations of conventional labor-intensive experimentation. It allows for the rapid screening of large combinatorial parametric libraries with fast characterization tools, thereby reducing labor, time and resource requirements. By combining HTE with machine learning (ML), vast datasets can be efficiently processed, accelerating material discovery or optimizing fabrication processes through data-driven decision-making.12 However, many ML-driven studies in photovoltaics have relied on fabrication techniques incompatible with large-scale and mass production, such as spin coating.13–18 ML models trained with data from such industry-incompatible techniques are of limited use for a lab-to-fab transition, because processing conditions must be redeveloped for scalable manufacturing methods.19 Furthermore, high-efficiency materials optimized under small-scale laboratory conditions often fail to perform reliably in large-scale processing, emphasizing the need to develop materials tailored for scalable manufacturing rather than modifying existing processes to accommodate lab-optimized materials.20,21 While some pioneering work has successfully demonstrated ML-driven research with roll-to-roll (R2R) technology, an industrially relevant method,22–24 no systematic study has been conducted to guide the selection of ML models for accurately forecasting device behavior. Furthermore, the next critical step is to predict the device behavior of new materials by leveraging insights from existing materials. This requires an expandable data structure for ML training that can accommodate high-performance materials yet to be developed, a critical gap that has not been addressed.
Here, we present a universal and expandable data structure designed to train ML models that continuously improve as more data are accumulated, regardless of the OPV material components. Through feature engineering, the training features are crafted to accommodate new materials while maintaining compatibility with the existing dataset, enabling ML models to develop a comprehensive knowledge of OPV materials. Using the MicroFactory platform that mimics industrial processes,24,25 we applied R2R slot-die coating in a desktop machine to develop a manufacturing process with minimal material usage and automated in situ formulation. This approach allowed for mass-customization of OPVs by continuously altering fabrication parameters and formulations, while an R2R tester facilitated testing of the completed devices, leading to the production of over 26 000 unique OPV cells.
The fabrication parameters and device outcomes were digitized and used to train ML models. Using this unprecedented amount of OPV data, we developed not only photovoltaic parameter (PP) prediction models that forecast device parameters, but also J–V prediction models that generate J–V curves of untested devices. ML model selection was guided by systematically screening various algorithms, followed by hyperparameter optimization, to predict device behavior precisely. The Random Forest (RF) model proved to be the most effective. Using the RF-predicted formulation, we fabricated a device achieving a PCE of 11.8%, a record-breaking efficiency for fully-R2R-fabricated OPVs. Furthermore, we demonstrated that combining large, accumulated datasets with smaller datasets from different material systems improves ML model performance, showing that models trained on binary systems can predict ternary performance and supporting the creation of generalized models capable of forecasting device behavior for new materials.
Fig. 1 Materials, energy diagrams and structure of R2R OPV device and illustration of MicroFactory platform. (a) Chemical structures of materials used in this study. (b) Illustration of detailed device structure. (c) Energy diagram of fully-R2R-fabricated OPVs. (d) A 3D schematic of the MicroFactory platform featuring a custom-built automatic R2R coater and tester, along with a database that stores all collected data. The inset image in Fig. 1d illustrates the fabrication of an active layer with a gradient composition, represented by a rainbow-colored film, in a single deposition process. By employing programmable syringe pumps, two solution flows were controlled linearly, and the solutions were mixed in situ and deposited onto a continuously moving substrate. Nine deposition parameters were collected and stored in a database during deposition. Subsequently, the devices were automatically characterized using the R2R tester until a specified number of measurements had been made. The data collected from three coating runs and one test run were integrated into one consolidated dataset based on the position of the roll (device position). |
With this optimized device configuration and the advanced capabilities of the MicroFactory platform, we fabricated over 26 000 OPV cells across two experimental batches. These experiments systematically explored fabrication parameters, with each batch divided into multiple sub-batches to alter the donor:acceptor (D:A) ratio or film thickness. We previously reported a dual-feed deposition method to formulate solutions in situ and a way to digitize active layer composition by introducing deposition density (DD) and total deposition density (TDD).23 DD is an absolute quantity of a component per unit area (in μg cm−2), and TDD is the sum of the DDs of all materials and shows a strong correlation with film thickness. Due to the lossless nature of slot-die coating, DD can be derived from deposition parameters and solution compositions. This lossless deposition significantly reduces material consumption compared to traditional spin-coating methods. In our experiments, we used only 10.3 mg of PM6 and 12.2 mg of L8-BO per 1000 devices, corresponding to ∼964 devices per mL of solution for both materials. In contrast, spin coating typically yields only 50–58 devices per mL. This high material efficiency highlights the practicality of our approach for screening new materials, even when available sample quantities are limited.
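Because slot-die coating is lossless, DD and TDD can be computed directly from the pump flow rates, solution solids content, web speed and coating width. A minimal sketch is given below; the web speed and coating width values are assumptions for illustration only, not the exact MicroFactory settings.

```python
# Minimal sketch: deposition density (DD) and total deposition density (TDD)
# for a lossless slot-die coating step (parameter values are illustrative).
def deposition_density(flow_ul_min, solid_conc_mg_ml, web_speed_cm_min, width_cm):
    """Return DD in ug cm-2: deposited solid mass per unit coated area."""
    solids_ug_per_min = flow_ul_min * solid_conc_mg_ml      # uL/min * mg/mL = ug/min
    coated_area_cm2_per_min = web_speed_cm_min * width_cm   # cm/min * cm = cm2/min
    return solids_ug_per_min / coated_area_cm2_per_min

# Dual-feed example: donor-rich (3:1, ~13.3 mg/mL solids) and acceptor-rich
# (1:3, 40 mg/mL solids) streams mixed in situ at assumed pump flow rates.
dd_donor_stream = deposition_density(30, 13.3, web_speed_cm_min=50, width_cm=1.3)
dd_acceptor_stream = deposition_density(20, 40.0, web_speed_cm_min=50, width_cm=1.3)
tdd = dd_donor_stream + dd_acceptor_stream   # TDD correlates with film thickness
```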
While our previous study required several months to collect data from about 2000 devices,23 the MicroFactory platform enabled the high-throughput fabrication of over 10 000 devices in one day by one researcher, with testing completed the next day. Fabricating and characterizing over 26 000 OPV cells took only four days with the coater and tester operating in parallel. All data were saved online and could be simply combined using Python scripts. Although we paused fabrication for data analysis, this unprecedented fabrication capability demonstrates significant potential for creating big data on OPV manufacturing parameters.
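As an illustration of how such position-keyed logs can be combined, the sketch below merges coater and tester records on the roll position using pandas; the file and column names are hypothetical placeholders, not the actual MicroFactory log schema.

```python
# Sketch: combine three coating-run logs and one test-run log into a single
# dataset keyed on device position along the roll (file/column names assumed).
import pandas as pd

coater = pd.concat(
    [pd.read_csv(f) for f in ["run1_coater.csv", "run2_coater.csv", "run3_coater.csv"]],
    ignore_index=True,
)
tester = pd.read_csv("tester.csv")

# merge_asof pairs each tested device with the nearest logged coating position
dataset = pd.merge_asof(
    tester.sort_values("position_mm"),
    coater.sort_values("position_mm"),
    on="position_mm",
    direction="nearest",
)
dataset.to_csv("combined_dataset.csv", index=False)
```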
Fig. 2a presents individual PPs, the DD of each material and L8-BO/PM6 ratios for 26 422 devices based on the device position on a 150-m-long substrate (Fig. 2b). Notably, failed devices were intentionally fabricated as markers between sub-batches and used to verify the combined dataset of fabrication and testing parameters. As it is impossible to display J–V curves for all devices, we show J–V curves for the four most characteristic devices with the highest PPs in Fig. 2c and Table 1. The best device achieved a power conversion efficiency (PCE) of 11.6% at a D:A ratio of 1:1.13 (Fig. S15, SI), similar to the optimal formulation found in a previous study.27 Although this PCE is already significantly higher than that of the best fully-R2R-fabricated OPVs in the literature (Fig. S16 and Table S9, SI), further improvement might be possible through ML-assisted device optimization. However, the achievable PCE would remain below 14.1%, a boundary determined by the best fill factor (FF), current density (JSC) and open circuit voltage (VOC), due to the inherent trade-offs among these photovoltaic parameters. In any case, identifying PP trends within a multidimensional fabrication parameter space from such large datasets is impractical for humans, necessitating the use of ML.
Device | JSC (mA cm−2) | VOC (V) | FF | PCE (%) | Rsh (Ω cm2) | Rs (Ω cm2) |
---|---|---|---|---|---|---|
Max JSC | 22.0 | 0.805 | 0.548 | 9.70 | 1130 | 3.01 |
Max VOC | 9.89 | 0.884 | 0.410 | 3.59 | 210 | 11.6 |
Max FF | 17.4 | 0.831 | 0.727 | 10.5 | 996 | 2.00 |
Max PCE | 20.5 | 0.835 | 0.677 | 11.6 | 5004 | 2.58 |
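For reference, the 14.1% ceiling mentioned above follows from combining the best individual parameters in Table 1 under standard 100 mW cm−2 illumination (a simple estimate, not a device result):

$$\text{PCE}_{\max} \le \frac{J_{\text{SC}}^{\text{best}} \times V_{\text{OC}}^{\text{best}} \times \text{FF}^{\text{best}}}{P_{\text{in}}} \times 100\% = \frac{22.0 \times 0.884 \times 0.727\ \text{mW cm}^{-2}}{100\ \text{mW cm}^{-2}} \times 100\% \approx 14.1\%$$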
Fig. 3 Workflow overview in this study, measured and predicted J–V characteristics and predicted PPs of 3 randomly selected devices. (a) Illustration of the comprehensive workflow in this study, comprising automated device fabrication and characterization, data storage of deposition and photovoltaic data in a database, and ML application. (b)–(g) Actual and predicted J–V characteristics of 3 randomly selected devices. Predicted PPs, including JSC, VOC, Rsh and Rs, are denoted by star-shaped markers and thick solid lines, derived from each best PP prediction model. Measured J and predicted J values were obtained from actual characterization and the J–V prediction model, respectively. The MPP candidates were identified using the same method as that described in Fig. 2b and are represented by star-shaped markers, with larger ones indicating the MPP point closest to the actual J–V curve (MPPclose). The predicted FFs are depicted as ivory-colored filled rectangles, with J and V values determined from MPPclose. Predictions were generated using RF models for panels (b)–(d), and XGBoost models for panels (e)–(g).
After preparing and cleaning the data, ML models using various algorithms were trained and compared for a given multidimensional regression problem. Seven bagging and boosting algorithms were initially selected for evaluation. Bagging independently trains multiple weak learners in parallel and averages their predictions, reducing variance and improving stability. In contrast, boosting sequentially builds weak learners, with each new model correcting the errors made by its predecessor.28 These ensemble approaches are generally less prone to overfitting, where a model learns the training data too closely, including noise or fluctuations, resulting in poor generalization to new data. To identify the best-performing models, we optimized hyperparameters for each algorithm using GridSearchCV with 5-fold cross-validation. This approach systematically explores all possible combinations of predefined hyperparameter values and evaluates performance to find the optimal set. Due to the training time required for up to two million hyperparameter combinations, we used a small fraction (2.5%) of the data for training (see Note S5 for details, SI). Each hyperparameter combination was evaluated based on performance metrics: coefficient of determination (R2), mean absolute percentage error (MAPE), root mean squared error (RMSE)23 and overfitting index (OI, defined as the R2 of the test or validation dataset divided by the R2 of the training dataset, where lower values indicate stronger overfitting, whereas an OI close to 1 implies that the model generalizes well). This evaluation utilized 80% of the training dataset for model training and the remaining 20% for model validation (Fig. S17, SI). RMSE is inherently tied to the scale of the target variables, making it a scale-dependent metric,29–31 and MAPE can become undefined or problematic when actual values are zero or near zero, resulting in infinite or extreme values.30–32 In contrast, R2 offers a scale-independent evaluation,33 quantifying how well the model accounts for the variance in data relative to total variance. Given these considerations, we employed R2 as a universal metric to determine the optimal hyperparameter combination for the models generated by each algorithm and to enable cross-scale comparison between different prediction models. The performance metrics for the best model from each algorithm are shown in Fig. S18–S24 and Tables S13–S19, with detailed discussions in Note S6. We found no outstanding model, although the AdaBoost algorithm produced a model with inferior performance to the others.
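A minimal sketch of this search procedure is shown below for the RF case, assuming the combined dataset is available as a CSV with the fabrication features and a PCE label; the file name, column names and grid values are illustrative, not the settings used in this work.

```python
# Sketch: GridSearchCV with 5-fold CV on a 2.5% data fraction, then an 80/20
# train/validation split and evaluation with R2, MAPE, RMSE and the OI.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_absolute_percentage_error, mean_squared_error

df = pd.read_csv("combined_dataset.csv")           # hypothetical file name
X, y = df.drop(columns=["PCE"]), df["PCE"]

# 2.5% of the data for the grid search, split 80/20 into training/validation
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.025, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_small, y_small, train_size=0.8, random_state=0)

param_grid = {                                      # illustrative grid
    "n_estimators": [100, 300, 500],
    "max_features": [0.3, 0.6, 1.0],
    "max_leaf_nodes": [None, 256, 1024],
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2", n_jobs=-1)
search.fit(X_tr, y_tr)

best = search.best_estimator_
pred = best.predict(X_val)
r2_val = r2_score(y_val, pred)
oi = r2_val / r2_score(y_tr, best.predict(X_tr))    # overfitting index
print(search.best_params_, r2_val,
      mean_absolute_percentage_error(y_val, pred),
      np.sqrt(mean_squared_error(y_val, pred)), oi)
```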
Based on this evaluation, we selected one widely used algorithm from each category for subsequent analysis: Random Forest (RF) for bagging and Extreme Gradient Boosting (XGBoost) for boosting. While optimizing most of the available hyperparameters (Note S7, SI), we discovered that only a few hyperparameters (max leaf nodes, max features and number of estimators for RF, and learning rate, number of estimators and max depth for XGBoost) made a meaningful difference across all PPs. Therefore, we picked the top three most effective hyperparameters and selectively showcased their impact on model performance for predicting JSC and PCE (Fig. S25–S28, SI). This information will serve as a comprehensive guide for future hyperparameter optimization.
The models were then re-optimized with different dataset sizes (2.5%, 20%, and 80%) using only the three most influential hyperparameters. We observed general agreement in the optimal hyperparameters across various data fractions, indicating that hyperparameters optimized with a small dataset do not require re-optimization for larger datasets (Fig. S29–S32 and Tables S20 and S21, SI). The final optimized hyperparameters are shown in Note S8.
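As a sketch, the reduced search spaces can be written as small grids over only these three hyperparameters and re-run with the same GridSearchCV routine on 2.5%, 20% and 80% of the data; the values below are placeholders, not the optimized settings reported in Note S8.

```python
# Reduced grids over the three most influential hyperparameters (placeholder values)
rf_grid = {
    "max_leaf_nodes": [512, 2048, 8192],
    "max_features": [0.3, 0.6, 1.0],
    "n_estimators": [100, 300, 500],
}
xgb_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [300, 600, 1000],
    "max_depth": [6, 9, 12],
}
```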
Our models therefore offer three ways to predict PCE: (i) direct prediction from the PCE prediction model, (ii) PCE calculated from the predicted FF, VOC and JSC, and (iii) PCE calculated from the created J–V curves. We found general agreement in PCE from all three approaches, with similar and exceptionally high R2 values (Fig. 3 and Fig. S33, Table 2 and Table S22, SI). This high model performance is attributed to a consistently produced training dataset, rather than data collected from multiple sources, and to the optimized hyperparameters. Although any of these three approaches can predict PCE, the J–V-based approach provides comprehensive device insights while demanding more computing resources. However, once hyperparameters are optimized, training each model with 1.8 million data points of 10 input parameters takes only 15 min for RF and 7 s for XGBoost on a desktop computer, suggesting that computing resources will soon no longer be a limitation.
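The three routes can be expressed as in the sketch below, assuming trained PP models (pce_model, jsc_model, voc_model, ff_model) and a J–V model (jv_model) that takes the device features plus a voltage value; the names, feature ordering and voltage range are illustrative assumptions.

```python
import numpy as np

P_IN = 100.0  # incident power density, mW cm-2 (AM1.5G)

def pce_direct(x):
    """(i) Direct prediction from the PCE PP-prediction model."""
    return pce_model.predict(x)[0]

def pce_from_pps(x):
    """(ii) PCE recomputed from predicted JSC (mA cm-2), VOC (V) and FF (fraction)."""
    jsc, voc, ff = (m.predict(x)[0] for m in (jsc_model, voc_model, ff_model))
    return jsc * voc * ff / P_IN * 100.0

def pce_from_jv(x):
    """(iii) PCE from the maximum power point of the predicted J-V curve."""
    v = np.linspace(0.0, 0.9, 181)                                 # voltage sweep, V
    features = np.column_stack([np.repeat(x, len(v), axis=0), v])  # V appended last
    j = jv_model.predict(features)                                 # predicted J, mA cm-2
    return np.max(j * v) / P_IN * 100.0
```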
Model | Dataset | PP | R2 (J–V) | MAPE (J–V) | RMSE (J–V) | R2 (PP) | MAPE (PP) | RMSE (PP)
---|---|---|---|---|---|---|---|---
a Metrics within the parentheses are derived from Rsh values belonging to the bottom 30% (<366 Ω cm2). b Metrics within the parentheses are derived after excluding outliers (about 3% of the entire dataset).
RF | Test | JSC | 0.980 | 13.6 | 0.720 | 0.982 | 21.0 | 0.683
XGBoost | Test | JSC | 0.979 | 248 | 0.737 | 0.984 | 21.1 | 0.650
RF | Test | VOC | 0.953 | 1.46 | 0.037 | 0.964 | 1.38 | 0.032
XGBoost | Test | VOC | 0.893 | 2.04 | 0.056 | 0.965 | 1.46 | 0.032
RF | Test | FF | 0.925 | 3.89 | 4.37 | 0.957 | 3.24 | 3.30
XGBoost | Test | FF | 0.866 | 6.48 | 5.84 | 0.961 | 3.20 | 3.17
RF | Test | PCE | 0.949 | 6.82 | 0.681 | 0.958 | 6.83 | 0.613
XGBoost | Test | PCE | 0.932 | 9.28 | 0.784 | 0.962 | 6.77 | 0.589
RF | Test | Rsh | −7.26 | 98.7 | 13 […] | −0.225 (0.839)a | 75.5 (15.6)a | 5146 (41.5)a
XGBoost | Test | Rsh | −9.11 | 528 | 14 […] | −2.65 (0.800)a | 98.5 (16.4)a | 8879 (46.3)a
RF | Test | Rs | −0.001 | 10.4 | 17 […] | 0.028 (0.886)b | 3816 (6.80)b | 17 […]
XGBoost | Test | Rs | −0.001 | 15.4 | 17 […] | 0.028 (0.885)b | 3751 (7.04)b | 17 […]
Despite XGBoost's advantage over RF in training time due to parallelized computing, it performed poorly in the J–V prediction model, as observed in 3 and 15 additional randomly selected devices (Fig. 3 and Fig. S34, SI). Although it still achieves comparable model evaluation metrics to the RF model (Fig. S35 and Table S23, SI), these metrics fail to capture localized fluctuations in the predicted J–V curves, which become evident only when the curves are plotted and compared against experimental data. To further understand model behavior, we calculated feature importance for both RF and XGBoost J–V prediction models, where higher values indicate greater influence on the model predictions. Among all features, V showed the highest importance for both RF and XGBoost models. However, as J and V are directly correlated in J–V prediction, we excluded V from subsequent analyses to better highlight the relative importance of the remaining features (Fig. S36, SI). Our analysis revealed that ‘Acceptor Frac’ was the most influential feature for both models, while TDD showed much lower contributions. The lower importance of ‘Donor Frac’ is expected because it is linearly dependent on ‘Acceptor Frac’ (Donor Frac = 1 − Acceptor Frac), meaning that the model relies primarily on one of the two to avoid redundancy. Importantly, this does not imply that donor content is less significant for device behavior, but rather that its contribution has already been captured through ‘Acceptor Frac’. These results suggest that the D:A ratio plays a more critical role than active layer thickness in predicting J–V characteristics. Taken together, these results indicate that while RF and XGBoost identified similar trends in feature importance, RF provided more stable and consistent predictions across datasets. Therefore, RF-based J–V prediction models were chosen for photovoltaic optimization.
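A minimal sketch of this comparison, assuming a trained RF J–V model jv_model and its training DataFrame X_train (column names as used above, otherwise illustrative), is:

```python
import pandas as pd

# Raw impurity-based importances from the trained J-V prediction model
importances = pd.Series(jv_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))

# Drop the dominant voltage feature and re-normalise, so the remaining
# fabrication-related features (e.g. 'Acceptor Frac', TDD) can be compared
no_v = importances.drop("V")
print((no_v / no_v.sum()).sort_values(ascending=False))
```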
We observed poor performances in resistance prediction models. For the Rs model, this was attributed to a small number of extreme outliers, possibly from measurement error (poor contact during the measurement). After excluding about 3% of outliers, the model showed good performance with an R2 score of ∼0.88. The Rsh model underperformed because the slope of the J–V curves at 0 V for good devices is close to zero, making the inverse value highly sensitive to measurement noise. Despite this, we found that the model proved useful in predicting defective devices (Rsh < 336 Ω cm2, the bottom 30% in this dataset) with an R2 score of 0.80.
To identify optimal formulations across the vast fabrication parameter space, we split all possible parameter combinations using a coarse data resolution (2 μL min−1 or μg cm−2) into clusters and examined the optimum condition for each cluster. Using K-means clustering, an unsupervised ML algorithm, we created ten clusters (Fig. S37 and Table S24, SI). While clusters of evenly spaced datasets differ from the conventional meaning of clusters, this provides a simple solution to grouping multidimensional data. Subsequently, we generated all possible combinations with a finer data resolution (0.5 μL min−1 or μg cm−2, 6561 parameter combinations for each cluster) around each cluster's top formulation to refine them further. Finally, we compared the Euclidean distance (ED) (eqn (1)) of each cluster's top formulation from the global-best formulation in cluster 0 (Table S25 and Note S10 for details, SI).
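A sketch of this cluster-then-refine procedure is given below, assuming a trained RF PCE model rf_model and an illustrative three-parameter search space; the column names, ranges and step sizes are assumptions, not the actual parameter space used here.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Coarse grid of candidate formulations (2-unit steps), scored by the RF model
coarse = pd.DataFrame(
    list(product(np.arange(10, 41, 2),     # donor DD, ug cm-2
                 np.arange(10, 41, 2),     # acceptor DD, ug cm-2
                 np.arange(24, 121, 2))),  # total flow, uL min-1
    columns=["donor_dd", "acceptor_dd", "flow"],
)
coarse["pce_pred"] = rf_model.predict(coarse[["donor_dd", "acceptor_dd", "flow"]])

# Group the evenly spaced grid into ten clusters
coarse["cluster"] = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(
    coarse[["donor_dd", "acceptor_dd", "flow"]]
)

# Best coarse formulation per cluster (each would be refined on a 0.5-unit grid)
tops = coarse.loc[coarse.groupby("cluster")["pce_pred"].idxmax()].copy()

# Euclidean distance (ED) of each cluster's top formulation from the global best
best = tops.loc[tops["pce_pred"].idxmax(), ["donor_dd", "acceptor_dd", "flow"]].values
tops["ed"] = np.linalg.norm(
    tops[["donor_dd", "acceptor_dd", "flow"]].values - best.astype(float), axis=1
)
```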
Based on ED and predicted PCE, we fabricated devices using the predicted formulation of the global best and that of the cluster farthest from cluster 0. Clusters even farther from cluster 0 were also identified; however, their parameters were challenging to fabricate because the excessively high flow rates caused overflow issues, and their predicted PCEs were slightly lower. The device results are shown in Fig. 4 and summarized in Table 3. The global-best formulation achieved up to 11.8% PCE, significantly higher than the best PCE for fully-R2R-fabricated OPVs in the literature. Previously, the highest reported PCE was 5.6%, as summarized in Fig. S16 and Table S9. Only recently, we reported 9.35% using this high-throughput setup,24 and this work marks another leap in the record PCE for R2R-fabricated OPVs. This clearly demonstrates how laboratory innovation and digital technologies such as ML can accelerate progress in OPVs and potentially other printed electronics.
Cluster Number | Prediction | Solvent | JSC (mA cm−2) | Cal. JSC (mA cm−2) | VOC (V) | FF | PCE (%) | Rsh (Ω cm2) | Rs (Ω cm2)
---|---|---|---|---|---|---|---|---|---
a 200 devices were fabricated for each formulation. All statistical data were calculated from 200 devices.
0 | PCE | CB | 19.7 | — | 0.834 | 0.632 | 11.3 | 913 | 2.72
0 | J–V | CB | 20 | — | 0.829 | 0.679 | 11.3 | 1319 | 2.67
0 | —a | CB | 20.3 (19.9 ± 0.821) | 19.91 | 0.835 (0.831 ± 0.010) | 0.694 (0.688 ± 0.035) | 11.8 (11.3 ± 0.550) | 874 (2065 ± 6883) | 2.18 (2.14 ± 0.279)
0 | —a | Xyl | 20.0 (19.8 ± 0.674) | 19.77 | 0.824 (0.820 ± 0.004) | 0.678 (0.672 ± 0.025) | 11.2 (10.9 ± 0.195) | 6993 (1885 ± 5139) | 1.77 (1.99 ± 0.184)
1 | PCE | CB | 20.5 | — | 0.810 | 0.625 | 10.3 | 534 | 2.09
1 | J–V | CB | 20.7 | — | 0.810 | 0.619 | 10.4 | 1730 | 2.01
1 | —a | CB | 21.6 (21.3 ± 0.573) | 20.83 | 0.814 (0.804 ± 0.008) | 0.632 (0.564 ± 0.056) | 11.1 (9.67 ± 1.01) | 3894 (813 ± 700) | 2.90 (3.80 ± 0.808)
1 | —a | Xyl | 20.8 (19.7 ± 1.51) | 20.63 | 0.819 (0.816 ± 0.010) | 0.638 (0.590 ± 0.063) | 10.9 (9.51 ± 1.35) | 664 (1210 ± 3153) | 1.72 (1.96 ± 0.447)
With a view towards the eco-friendly manufacturing of OPV, we tested the parameters using a non-halogenated solvent, o-xylene (Xyl), and found Xyl-based devices performed comparably (Fig. 4 and Table 3). These results show great potential for the commercial application of fabrication parameters found in this work. External quantum efficiency (EQE) spectra were obtained after printing the silver electrode and encapsulating the cells for manual testing (Fig. S37, SI). All four devices show good agreement between JSC measured from the tester and calculated from EQE spectra (Fig. 4e and f).
To test the hypothesis that knowledge accumulated from one material system can improve predictions for new systems, we combined the large dataset with smaller datasets that include new components, adding one new material-specific feature per component (see Fig. S2 for detailed chemical structures, SI), trained RF-based models, and compared their performance depending on the size of the additional datasets. We first used a dataset created in our previous study on a PM6:Y6:IT-4F ternary blend.23 In that case, we simplified the system and minimized the number of features to visualize all feature-dependent performances in 3D space, manually re-creating features based on experimental parameters. New datasets for a PM6:D18:L8-BO ternary system and a PM6:PYF-T-o all-polymer system were also created by running relatively small batches. For training, we used subsets of the data rather than individual cells, specifically 294, 467, 1240 and 1990 data points for the PM6:Y6:IT-4F system, 257, 517, 1289 and 2053 for the PM6:D18:L8-BO ternary system, and 258, 784, 1538 and 2323 for the PM6:PYF-T-o all-polymer system. The remaining datasets were reserved for testing (for details, see Note S11, SI). The complete datasets and model performance results are available in Data S3 and S4, respectively.
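As a sketch of how the accumulated and new-material datasets can be merged while keeping the feature space compatible, assuming placeholder CSV files and column names (the real schema is provided in Data S3):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

base = pd.read_csv("pm6_l8bo_accumulated.csv")    # large accumulated dataset
new = pd.read_csv("pm6_d18_l8bo_train.csv")       # small new-material subset

# Material-specific DD columns absent from one dataset (e.g. 'D18 DD') are
# filled with zero so the feature space stays compatible as components are added
combined = pd.concat([base, new], ignore_index=True).fillna(0.0)

X, y = combined.drop(columns=["PCE"]), combined["PCE"]
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

test = (pd.read_csv("pm6_d18_l8bo_test.csv")
        .reindex(columns=combined.columns)
        .fillna(0.0))
print("test R2:", r2_score(test["PCE"], model.predict(test.drop(columns=["PCE"]))))
```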
Fig. 5 illustrates the performance of RF-based models trained without and with the accumulated PM6:L8-BO dataset. Despite the PM6:Y6:IT-4F system sharing only one common material, the models trained with the accumulated data consistently outperformed those without it, with the improvement being most pronounced for smaller training datasets. The same trend was observed in the PM6:D18:L8-BO system, which shares two common materials with the accumulated dataset, leading to even more significant improvements. Notably, the model trained with only 10% of the new dataset performed well in predicting the remaining 90%. Interestingly, while integrating the accumulated dataset substantially enhanced predictive accuracy for the ternary systems, the improvement was not consistent for the PM6:PYF-T-o all-polymer system (Fig. S39, SI). Specifically, for training dataset sizes of 784 or 1538, the test R2 values decreased slightly after adding the accumulated data. This behavior may arise from intrinsic differences between all-polymer systems and small-molecule NFA-based systems. These findings suggest that polymer acceptors may require a separate model category or an additional independent feature to better capture their distinct characteristics.
Nonetheless, these results highlight the value of digitalizing research parameters and demonstrate how existing knowledge can be applied to new material systems in OPV research. By showing that models trained on binary systems can predict ternary performance, this work illustrates how accumulated, compatible datasets enable the development of ML models with a comprehensive understanding of materials and fabrication parameters. Such models could significantly accelerate OPV commercialization.
Batch 1: active layer solutions of PM6:L8-BO were prepared in CB at 80 °C overnight, with two distinct D:A ratios of 3:1 and 1:3 (denoted as donor- and acceptor-rich solutions), maintaining a fixed donor concentration of 10 mg mL−1. These solutions were deposited using two programmable syringe pumps, combined via a Y-connector, and the in situ mixed solution was fed through tubing with an inner diameter of 0.5 mm. 13 sub-batches were conducted throughout the batch. For the first 4 sub-batches, the total solution flow remained constant at 72, 48, 36, or 24 μL min−1 (resulting in WFT of 1.85, 1.23, 0.923, or 0.616 μm) while the relative flow rates of the two pumps were adjusted. This manipulation varied the D:A ratio of the total solution flow, leading to linear increases in TDD during deposition. In contrast, for the remaining 9 sub-batches, the volumetric ratios of the donor- and acceptor-rich solutions were held constant to achieve fixed D:A ratios of 1:3, 1:2, 1:1.5, 1:1.2, 1:1, 1.2:1, 1.5:1, 2:1, or 3:1 in the total solution flow, and the total solution flow was varied within the range 24–96 μL min−1 (resulting in WFT of 0.616–2.46 μm), leading to linear decreases in TDD during deposition.
Batch 2: active layer solutions of PM6:L8-BO, prepared at 80 °C overnight, consisted of a donor-rich solution (D:A ratio of 3:1 and a donor concentration of 15 mg mL−1) and an acceptor-rich solution (D:A ratio of 1:3 and a donor concentration of 5 mg mL−1). The deposition process mirrored that for batch 1. For the first 4 sub-batches, the total solution flow was maintained at a constant rate of 120, 90, 60, or 30 μL min−1 (resulting in WFT of 3.08, 2.31, 1.54, or 0.769 μm) while the relative flow rates of the two pumps were altered. The D:A ratios of the total solution flow were varied while TDD remained constant, corresponding to 63.5, 47.7, 31.8, and 15.9 μg cm−2, respectively. For the remaining 9 sub-batches, the volumetric ratios of the donor- and acceptor-rich solutions were held constant to achieve fixed D:A ratios of 1:3, 1:2, 1:1.5, 1:1.2, 1:1, 1.2:1, 1.5:1, 2:1, or 3:1 in the total solution flow, and the total solution flow was varied within the range 24–120 μL min−1 (resulting in WFT of 0.616–3.08 μm), leading to linear decreases in TDD during deposition.
All active layers were fabricated at RT for both head and bed temperatures, and the films were exposed to an air-blowing process during deposition. The dead volumes of the tubing from the Y-connector and the slot die were calculated to be ∼33.7 and ∼39.6 μL for batch 1 and batch 2, respectively. This implies that the in situ mixed solution was deposited around 28 and 20 s after mixing at flow rates of 72 and 120 μL min−1, respectively. The composition of the film at each position was calculated based on the relative flow rate and the deposition offset originating from the dead volume of the deposition system. BM-HTL-1, diluted with IPA in a 1:1 ratio (v/v), was used as the HTL material and was concurrently deposited with the S315 layer in the same manner as described in the previous section.
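The deposition delays quoted above follow directly from dividing the dead volume by the corresponding flow rate:

$$t_{\text{delay}} = \frac{V_{\text{dead}}}{Q}: \quad \frac{33.7\ \mu\text{L}}{72\ \mu\text{L min}^{-1}} \approx 0.47\ \text{min} \approx 28\ \text{s}, \qquad \frac{39.6\ \mu\text{L}}{120\ \mu\text{L min}^{-1}} \approx 0.33\ \text{min} \approx 20\ \text{s}$$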
PM6:D18:L8-BO system: the device fabrication followed the procedures outlined in the previous section, with the exception of active layer deposition. The experiment consisted of two independent batches, each with 5 sub-batches. In batch 1, a donor-only solution (PM6:D18 = 4:1, total concentration: 20 mg mL−1) and an acceptor-only solution (L8-BO concentration: 20 mg mL−1) were prepared. In batch 2, pre-mixed solutions of PM6:D18:L8-BO (0.5:0.5:1.2, PM6 or D18 concentration: 5 mg mL−1) and PM6:L8-BO (1:1.2, PM6 concentration: 10 mg mL−1) were dissolved in CB at 80 °C overnight. During deposition, total solution flows were maintained at constant rates of 120, 96, 72, 48, or 24 μL min−1 (resulting in WFT of 3.08, 2.46, 1.85, 1.23 or 0.615 μm), with alterations in the relative flow rates of the two pumps. The D:A ratios (batch 1) or PM6:D18 ratios (batch 2) of the total solution flow were varied, while the TDD was either changed (batch 1) or kept constant (batch 2), resulting in TDDs of 68.7, 54.9, 41.2, 27.5, or 13.7 μg cm−2, respectively. The deposition occurred under identical environments, and the compositions of the film at each position were calculated using the same methods as in the previous experiments. Dataset preparation involved consistent feature design and data-cleaning procedures as in the previous section. The final dataset comprises 2570 data points, with 10 features and 1 label for PCE prediction, incorporating D18 as the third material, represented by the column ‘D18 DD’ in the dataset. Any inconsistent data were removed prior to analysis.
PM6:PYF-T-o system: device fabrication followed the procedures described previously, except for the active layer deposition. Three independent batches were prepared; batches 1 and 2 each consisted of 5 or 6 sub-batches, while batch 3 comprised 1 sub-batch. In batch 1, donor-only (PM6 concentration: 14 mg mL−1) and acceptor-only (PYF-T-o concentration: 14 mg mL−1) solutions were prepared. In batch 2, pre-mixed solutions of PM6:PYF-T-o at ratios of 2.5:1 and 1:2.5 (PM6 concentration: 7 mg mL−1) were prepared. In batch 3, a pre-mixed PM6:PYF-T-o solution (1:1.2, w/w) (PM6 concentration: 7 mg mL−1) was prepared along with an identical solution containing 2-PN as a solid additive (100 wt% relative to the total donor and acceptor solids). All solutions were dissolved in CB and stirred overnight at 70 °C. During deposition, the total solution flow rates were set to 120, 96, 72, 48, or 24 μL min−1 in batches 1 and 2, resulting in WFTs of 3.08, 2.46, 1.85, 1.23, and 0.615 μm, respectively, while batch 3 was fixed at 72 μL min−1. Relative pump flow rates were varied to control D:A ratios, film thickness, or the solid content of 2-PN. The TDD was kept constant in batch 1 (43.1, 34.5, 25.8, 17.2, and 8.62 μg cm−2) but varied in batches 2 and 3. Deposition conditions were consistent with earlier experiments, and the film composition at each position was determined using the same calculation methods. Dataset preparation followed the same feature design and data-cleaning protocols described previously. The final dataset comprised 2840 data points, including 10 features and 1 target label for PCE prediction. Two additional feature columns, ‘PYF-T-o DD’ and ‘PN DD’, were introduced to represent PYF-T-o as the acceptor material and 2-PN as the additive, respectively. Any inconsistent data were removed before analysis.
In this study, all ensemble-based models were built using scikit-learn and xgboost, which are open-source libraries. All code, relevant datasets and supplementary data files are openly available at DOI: https://doi.org/10.6084/m9.figshare.28140359.
This journal is © The Royal Society of Chemistry 2025