Na Gyeong An,*abc Leonard Wei Tat Ng,d Yang Liu,e Seyeong Song,cf Mei Gao,a Yinhua Zhou,e Chang-Qi Ma,g Zhixiang Wei,h Jin Young Kim,*c Udo Bach*b and Doojin Vak*a
aCommonwealth Scientific and Industrial Research Organisation (CSIRO) Manufacturing, Clayton, Victoria 3168, Australia. E-mail: asy0720@unist.ac.kr; Doojin.Vak@csiro.au
bDepartment of Chemical and Biological Engineering, Monash University, Clayton, Victoria 3800, Australia. E-mail: Udo.Bach@monash.edu
cGraduate School of Carbon Neutrality, School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, South Korea. E-mail: jykim@unist.ac.kr
dSchool of Materials Science and Engineering (MSE), Nanyang Technological University (NTU), 50 Nanyang Avenue, Singapore 639798, Singapore
eWuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
fResearch Institute of Molecular Alchemy, Gyeongsang National University, Jinju 52828, Republic of Korea
gi-Lab & Printable Electronics Research Center, Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences (CAS), Suzhou, 215123, China
hChinese Academy of Sciences (CAS) Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology, Beijing, 100190, China
First published on 15th September 2025
High-throughput experimentation (HTE) combined with machine learning (ML) has emerged as a powerful tool to accelerate material discovery or optimize fabrication processes. However, in the photovoltaics field, only a few studies have successfully applied this approach using industrially relevant techniques, such as the roll-to-roll (R2R) process. We developed a universal and extendable data structure for ML training that accommodates upcoming materials while retaining compatibility with the existing dataset. Using the MicroFactory platform, which enables mass-customization of organic photovoltaics (OPVs), we fabricated and characterized over 26 000 unique cells within four days. To guide the selection of ML models that precisely predict device behavior, we developed photovoltaic parameter (PP) prediction models to forecast device parameters and J–V prediction models to generate J–V curves. The Random Forest model proved to be the most effective, and a device fabricated from its predicted formulation achieved a PCE of 11.8% (0.025 cm2), the highest reported for a fully-R2R-fabricated OPV. By integrating accumulated datasets with smaller new-component datasets, we enhanced model performance for PM6:Y6:IT-4F and PM6:D18:L8-BO systems, showing that models trained on binary systems can predict ternary device performance and enabling the development of generalized ML models for future high-performance materials.
Broader context: Combining high-throughput experimentation with machine learning (ML) accelerates material discovery or optimizes fabrication processes through data-driven decision-making. Previous studies in photovoltaics have explored ML-driven research using roll-to-roll (R2R) technology, but no systematic approach has guided the selection of ML models to precisely predict device behavior. The next critical step is to forecast the device properties of new materials using insights from existing ones, which requires an expandable data structure. Here, we present a universal and expandable data structure designed to enhance ML models and develop models that enable accurate prediction of photovoltaic parameters and J–V characteristics. Using the MicroFactory platform, we fabricated over 26 000 unique OPV cells.
High-throughput experimentation (HTE) has emerged as a promising avenue to address the limitations of conventional labor-intensive experimentation. It allows for the rapid screening of large combinatorial parametric libraries with fast characterization tools, thereby reducing labor, time and resource requirements. By combining HTE with machine learning (ML), vast datasets can be efficiently processed, accelerating material discovery or optimizing fabrication processes through data-driven decision-making.12 However, many ML-driven studies in photovoltaics have relied on fabrication techniques incompatible with large-scale and mass production, such as spin coating.13–18 ML models trained with data from such industry-incompatible techniques are of limited use for a lab-to-fab transition, because processing conditions must be redeveloped for scalable manufacturing methods.19 Furthermore, high-efficiency materials optimized under small-scale laboratory conditions often fail to perform reliably in large-scale processing, emphasizing the need to develop materials tailored for scalable manufacturing rather than modifying existing processes to accommodate lab-optimized materials.20,21 While some pioneering work has successfully demonstrated ML-driven research with roll-to-roll (R2R) technology, an industrially relevant method,22–24 no systematic study has been conducted to guide the selection of ML models for accurately forecasting device behavior. Furthermore, the next critical step is to predict the device behavior of new materials by leveraging insights from existing materials. This requires an expandable data structure for ML training that can accommodate high-performance materials yet to be developed, a critical gap that has not been addressed.
Here, we present a universal and expandable data structure designed to train ML models that continuously improve as more data are accumulated, regardless of the OPV material components. Through feature engineering, the training features are crafted to accommodate new materials while maintaining compatibility with the existing dataset, enabling ML models to develop a comprehensive knowledge of OPV materials. Using the MicroFactory platform that mimics industrial processes,24,25 we applied R2R slot-die coating in a desktop machine to develop a manufacturing process with minimal material usage and automated in situ formulation. This approach allowed for mass-customization of OPVs by continuously altering fabrication parameters and formulations, while an R2R tester facilitated testing of the completed devices, leading to the production of over 26 000 unique OPV cells.
The fabrication parameters and device outcomes were digitized and used to train ML models. Using this unprecedented amount of OPV data, we developed not only photovoltaic parameter (PP) prediction models that forecast device parameters, but also J–V prediction models that generate J–V curves of untested devices. ML model selection was guided by systematically screening various algorithms, followed by hyperparameter optimization, to predict device behavior precisely. The Random Forest (RF) model proved to be the most effective. Using the RF-predicted formulation, we fabricated a device achieving a PCE of 11.8%, a record-breaking efficiency for fully-R2R-fabricated OPVs. Furthermore, we demonstrated that combining large, accumulated datasets with smaller datasets from different material systems improves ML model performance, showing that models trained on binary systems can predict ternary performance and supporting the creation of generalized models capable of forecasting device behavior for new materials.
Fig. 1 Materials, energy diagrams and structure of R2R OPV device and illustration of MicroFactory platform. (a) Chemical structures of materials used in this study. (b) Illustration of detailed device structure. (c) Energy diagram of fully-R2R-fabricated OPVs. (d) A 3D schematic of the MicroFactory platform featuring a custom-built automatic R2R coater and tester, along with a database that stores all collected data. The inset image in Fig. 1d illustrates the fabrication of an active layer with a gradient composition, represented by a rainbow-colored film, in a single deposition process. By employing programmable syringe pumps, two solution flows were controlled linearly, and the solutions were mixed in situ and deposited onto a continuously moving substrate. Nine deposition parameters were collected and stored in a database during deposition. Subsequently, the devices were automatically characterized using the R2R tester until a specified number of measurements had been made. The data collected from three coating runs and one test run were integrated into one consolidated dataset based on the position of the roll (device position). |
With this optimized device configuration and the advanced capabilities of the MicroFactory platform, we fabricated over 26 000 OPV cells across two experimental batches. These experiments systematically explored fabrication parameters, with each batch divided into multiple sub-batches to alter the donor:acceptor (D:A) ratio or film thickness. We previously reported a dual-feed deposition method to formulate solutions in situ and a way to digitize active layer composition by introducing deposition density (DD) and total deposition density (TDD).23 DD is an absolute quantity of a component per unit area (in μg cm−2), and TDD is the sum of the DDs of all materials and shows a strong correlation with film thickness. Due to the lossless nature of slot-die coating, DD can be derived from deposition parameters and solution compositions. This lossless deposition significantly reduces material consumption compared to traditional spin-coating methods. In our experiments, we used only 10.3 mg of PM6 and 12.2 mg of L8-BO per 1000 devices, corresponding to ∼964 devices per mL of solution for both materials. In contrast, spin coating typically yields only 50–58 devices per mL. This high material efficiency highlights the practicality of our approach for screening new materials, even when available sample quantities are limited.
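Because slot-die coating is lossless, DD and TDD can be computed directly from the pump flow rates, solution solids content, web speed and coating width. A minimal sketch is given below; the web speed and coating width values are assumptions for illustration only, not the exact MicroFactory settings.

```python
# Minimal sketch: deposition density (DD) and total deposition density (TDD)
# for a lossless slot-die coating step (parameter values are illustrative).
def deposition_density(flow_ul_min, solid_conc_mg_ml, web_speed_cm_min, width_cm):
    """Return DD in ug cm-2: deposited solid mass per unit coated area."""
    solids_ug_per_min = flow_ul_min * solid_conc_mg_ml      # uL/min * mg/mL = ug/min
    coated_area_cm2_per_min = web_speed_cm_min * width_cm   # cm/min * cm = cm2/min
    return solids_ug_per_min / coated_area_cm2_per_min

# Dual-feed example: donor-rich (3:1, ~13.3 mg/mL solids) and acceptor-rich
# (1:3, 40 mg/mL solids) streams mixed in situ at assumed pump flow rates.
dd_donor_stream = deposition_density(30, 13.3, web_speed_cm_min=50, width_cm=1.3)
dd_acceptor_stream = deposition_density(20, 40.0, web_speed_cm_min=50, width_cm=1.3)
tdd = dd_donor_stream + dd_acceptor_stream   # TDD correlates with film thickness
```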
While our previous study required several months to collect data from about 2000 devices,23 the MicroFactory platform enabled the high-throughput fabrication of over 10 000 devices in one day by one researcher, with testing completed the next day. Fabricating and characterizing over 26 000 OPV cells took only four days with the coater and tester operating in parallel. All data were saved online and could be simply combined using Python scripts. Although we paused fabrication for data analysis, this unprecedented fabrication capability demonstrates significant potential for creating big data on OPV manufacturing parameters.
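As an illustration of how such position-keyed logs can be combined, the sketch below merges coater and tester records on the roll position using pandas; the file and column names are hypothetical placeholders, not the actual MicroFactory log schema.

```python
# Sketch: combine three coating-run logs and one test-run log into a single
# dataset keyed on device position along the roll (file/column names assumed).
import pandas as pd

coater = pd.concat(
    [pd.read_csv(f) for f in ["run1_coater.csv", "run2_coater.csv", "run3_coater.csv"]],
    ignore_index=True,
)
tester = pd.read_csv("tester.csv")

# merge_asof pairs each tested device with the nearest logged coating position
dataset = pd.merge_asof(
    tester.sort_values("position_mm"),
    coater.sort_values("position_mm"),
    on="position_mm",
    direction="nearest",
)
dataset.to_csv("combined_dataset.csv", index=False)
```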
Fig. 2a presents individual PPs, the DD of each material and L8-BO/PM6 ratios for 26 422 devices based on the device position on a 150-m-long substrate (Fig. 2b). Notably, failed devices were intentionally fabricated as markers between sub-batches and used to verify the combined dataset of fabrication and testing parameters. As it is impossible to display J–V curves for all devices, we show J–V curves for the four most characteristic devices with the highest PPs in Fig. 2c and Table 1. The best device achieved a power conversion efficiency (PCE) of 11.6% at a D:A ratio of 1:1.13 (Fig. S15, SI), similar to the optimal formulation found in a previous study.27 Although this PCE is already significantly higher than that of the best fully-R2R-fabricated OPVs in the literature (Fig. S16 and Table S9, SI), further improvement might be possible through ML-assisted device optimization. However, the achievable PCE would remain below 14.1%, a boundary determined by the best fill factor (FF), current density (JSC) and open circuit voltage (VOC), due to the inherent trade-offs among these photovoltaic parameters. In any case, identifying PP trends within a multidimensional fabrication parameter space from such large datasets is impractical for humans, necessitating the use of ML.
Device | JSC (mA cm−2) | VOC (V) | FF | PCE (%) | Rsh (Ω cm2) | Rs (Ω cm2) |
---|---|---|---|---|---|---|
Max JSC | 22.0 | 0.805 | 0.548 | 9.70 | 1130 | 3.01 |
Max VOC | 9.89 | 0.884 | 0.410 | 3.59 | 210 | 11.6 |
Max FF | 17.4 | 0.831 | 0.727 | 10.5 | 996 | 2.00 |
Max PCE | 20.5 | 0.835 | 0.677 | 11.6 | 5004 | 2.58 |
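For reference, the 14.1% ceiling mentioned above follows from combining the best individual parameters in Table 1 under standard 100 mW cm−2 illumination (a simple estimate, not a device result):

$$\text{PCE}_{\max} \le \frac{J_{\text{SC}}^{\text{best}} \times V_{\text{OC}}^{\text{best}} \times \text{FF}^{\text{best}}}{P_{\text{in}}} \times 100\% = \frac{22.0 \times 0.884 \times 0.727\ \text{mW cm}^{-2}}{100\ \text{mW cm}^{-2}} \times 100\% \approx 14.1\%$$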
Fig. 3 Workflow overview in this study, measured and predicted J–V characteristics and predicted PPs of 3 randomly selected devices. (a) Illustration of the comprehensive workflow in this study, comprising automated device fabrication and characterization, data storage of deposition and photovoltaic data in a database, and ML application. (b)–(g) Actual and predicted J–V characteristics of 3 randomly selected devices. Predicted PPs, including JSC, VOC, Rsh and Rs, are denoted by star-shaped markers and thick solid lines, derived from each best PP prediction model. Measured J and predicted J values were obtained from actual characterization and the J–V prediction model, respectively. The MPP candidates were identified using the same method as that described in Fig. 2b and are represented by star-shaped markers, with larger ones indicating the MPP point closest to the actual J–V curve (MPPclose). The predicted FFs are depicted as ivory-colored filled rectangles, with J and V values determined from MPPclose. Predictions were generated using RF models for panels (b)–(d), and XGBoost models for panels (e)–(g).
After preparing and cleaning the data, ML models using various algorithms were trained and compared for a given multidimensional regression problem. Seven bagging and boosting algorithms were initially selected for evaluation. Bagging independently trains multiple weak learners in parallel and averages their predictions, reducing variance and improving stability. In contrast, boosting sequentially builds weak learners, with each new model correcting the errors made by its predecessor.28 These ensemble approaches are generally less prone to overfitting, where a model learns the training data too closely, including noise or fluctuations, resulting in poor generalization to new data. To identify the best-performing models, we optimized hyperparameters for each algorithm using GridSearchCV with 5-fold cross-validation. This approach systematically explores all possible combinations of predefined hyperparameter values and evaluates performance to find the optimal set. Due to the training time required for up to two million hyperparameter combinations, we used a small fraction (2.5%) of the data for training (see Note S5 for details, SI). Each hyperparameter combination was evaluated based on performance metrics: coefficient of determination (R2), mean absolute percentage error (MAPE), root mean squared error (RMSE)23 and overfitting index (OI, defined as the R2 of the test or validation dataset divided by the R2 of the training dataset, where lower values indicate stronger overfitting, whereas an OI close to 1 implies that the model generalizes well). This evaluation utilized 80% of the training dataset for model training and the remaining 20% for model validation (Fig. S17, SI). RMSE is inherently tied to the scale of the target variables, making it a scale-dependent metric,29–31 and MAPE can become undefined or problematic when actual values are zero or near zero, resulting in infinite or extreme values.30–32 In contrast, R2 offers a scale-independent evaluation,33 quantifying how well the model accounts for the variance in data relative to total variance. Given these considerations, we employed R2 as a universal metric to determine the optimal hyperparameter combination for the models generated by each algorithm and to enable cross-scale comparison between different prediction models. The performance metrics for the best model from each algorithm are shown in Fig. S18–S24 and Tables S13–S19, with detailed discussions in Note S6. We found no outstanding model, although the AdaBoost algorithm produced a model with inferior performance to the others.
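A minimal sketch of this search procedure is shown below for the RF case, assuming the combined dataset is available as a CSV with the fabrication features and a PCE label; the file name, column names and grid values are illustrative, not the settings used in this work.

```python
# Sketch: GridSearchCV with 5-fold CV on a 2.5% data fraction, then an 80/20
# train/validation split and evaluation with R2, MAPE, RMSE and the OI.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_absolute_percentage_error, mean_squared_error

df = pd.read_csv("combined_dataset.csv")           # hypothetical file name
X, y = df.drop(columns=["PCE"]), df["PCE"]

# 2.5% of the data for the grid search, split 80/20 into training/validation
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.025, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_small, y_small, train_size=0.8, random_state=0)

param_grid = {                                      # illustrative grid
    "n_estimators": [100, 300, 500],
    "max_features": [0.3, 0.6, 1.0],
    "max_leaf_nodes": [None, 256, 1024],
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2", n_jobs=-1)
search.fit(X_tr, y_tr)

best = search.best_estimator_
pred = best.predict(X_val)
r2_val = r2_score(y_val, pred)
oi = r2_val / r2_score(y_tr, best.predict(X_tr))    # overfitting index
print(search.best_params_, r2_val,
      mean_absolute_percentage_error(y_val, pred),
      np.sqrt(mean_squared_error(y_val, pred)), oi)
```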
Based on this evaluation, we selected one widely used algorithm from each category for subsequent analysis: Random Forest (RF) for bagging and Extreme Gradient Boosting (XGBoost) for boosting. While optimizing most of the available hyperparameters (Note S7, SI), we discovered that only a few hyperparameters (max leaf nodes, max features and number of estimators for RF, and learning rate, number of estimators and max depth for XGBoost) made a meaningful difference across all PPs. Therefore, we picked the top three most effective hyperparameters and selectively showcased their impact on model performance for predicting JSC and PCE (Fig. S25–S28, SI). This information will serve as a comprehensive guide for future hyperparameter optimization.
The models were then re-optimized with different dataset sizes (2.5%, 20%, and 80%) using only the three most influential hyperparameters. We observed general agreement in the optimal hyperparameters across various data fractions, indicating that hyperparameters optimized with a small dataset do not require re-optimization for larger datasets (Fig. S29–S32 and Tables S20 and S21, SI). The final optimized hyperparameters are shown in Note S8.
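As a sketch, the reduced search spaces can be written as small grids over only these three hyperparameters and re-run with the same GridSearchCV routine on 2.5%, 20% and 80% of the data; the values below are placeholders, not the optimized settings reported in Note S8.

```python
# Reduced grids over the three most influential hyperparameters (placeholder values)
rf_grid = {
    "max_leaf_nodes": [512, 2048, 8192],
    "max_features": [0.3, 0.6, 1.0],
    "n_estimators": [100, 300, 500],
}
xgb_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [300, 600, 1000],
    "max_depth": [6, 9, 12],
}
```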
Our models therefore offer three ways to predict PCE: (i) direct prediction from the PCE prediction model, (ii) PCE calculated from the predicted FF, VOC and JSC, and (iii) PCE calculated from the created J–V curves. We found general agreement in PCE from all three approaches, with similar and exceptionally high R2 values (Fig. 3 and Fig. S33, Table 2 and Table S22, SI). This high model performance is attributed to a consistently produced training dataset, rather than data collected from multiple sources, and to the optimized hyperparameters. Although any of these three approaches can predict PCE, the J–V-based approach provides comprehensive device insights while demanding more computing resources. However, once hyperparameters are optimized, training each model with 1.8 million data points of 10 input parameters takes only 15 min for RF and 7 s for XGBoost on a desktop computer, suggesting that computing resources will soon no longer be a limitation.
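The three routes can be expressed as in the sketch below, assuming trained PP models (pce_model, jsc_model, voc_model, ff_model) and a J–V model (jv_model) that takes the device features plus a voltage value; the names, feature ordering and voltage range are illustrative assumptions.

```python
import numpy as np

P_IN = 100.0  # incident power density, mW cm-2 (AM1.5G)

def pce_direct(x):
    """(i) Direct prediction from the PCE PP-prediction model."""
    return pce_model.predict(x)[0]

def pce_from_pps(x):
    """(ii) PCE recomputed from predicted JSC (mA cm-2), VOC (V) and FF (fraction)."""
    jsc, voc, ff = (m.predict(x)[0] for m in (jsc_model, voc_model, ff_model))
    return jsc * voc * ff / P_IN * 100.0

def pce_from_jv(x):
    """(iii) PCE from the maximum power point of the predicted J-V curve."""
    v = np.linspace(0.0, 0.9, 181)                                 # voltage sweep, V
    features = np.column_stack([np.repeat(x, len(v), axis=0), v])  # V appended last
    j = jv_model.predict(features)                                 # predicted J, mA cm-2
    return np.max(j * v) / P_IN * 100.0
```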
Model | Dataset | PP | R2 (J–V) | MAPE (J–V) | RMSE (J–V) | R2 (PP) | MAPE (PP) | RMSE (PP)
---|---|---|---|---|---|---|---|---
a Metrics within the parentheses are derived from Rsh values belonging to the bottom 30% (<366 Ω cm2). b Metrics within the parentheses are derived after excluding outliers (about 3% of the entire dataset).
RF | Test | JSC | 0.980 | 13.6 | 0.720 | 0.982 | 21.0 | 0.683
XGBoost | Test | JSC | 0.979 | 248 | 0.737 | 0.984 | 21.1 | 0.650
RF | Test | VOC | 0.953 | 1.46 | 0.037 | 0.964 | 1.38 | 0.032
XGBoost | Test | VOC | 0.893 | 2.04 | 0.056 | 0.965 | 1.46 | 0.032
RF | Test | FF | 0.925 | 3.89 | 4.37 | 0.957 | 3.24 | 3.30
XGBoost | Test | FF | 0.866 | 6.48 | 5.84 | 0.961 | 3.20 | 3.17
RF | Test | PCE | 0.949 | 6.82 | 0.681 | 0.958 | 6.83 | 0.613
XGBoost | Test | PCE | 0.932 | 9.28 | 0.784 | 0.962 | 6.77 | 0.589
RF | Test | Rsh | −7.26 | 98.7 | 13 […] | −0.225 (0.839)a | 75.5 (15.6)a | 5146 (41.5)a
XGBoost | Test | Rsh | −9.11 | 528 | 14 […] | −2.65 (0.800)a | 98.5 (16.4)a | 8879 (46.3)a
RF | Test | Rs | −0.001 | 10.4 | 17 […] | 0.028 (0.886)b | 3816 (6.80)b | 17 […]
XGBoost | Test | Rs | −0.001 | 15.4 | 17 […] | 0.028 (0.885)b | 3751 (7.04)b | 17 […]
Despite XGBoost's advantage over RF in training time due to parallelized computing, it performed poorly in the J–V prediction model, as observed in 3 and 15 additional randomly selected devices (Fig. 3 and Fig. S34, SI). Although it still achieves comparable model evaluation metrics to the RF model (Fig. S35 and Table S23, SI), these metrics fail to capture localized fluctuations in the predicted J–V curves, which become evident only when the curves are plotted and compared against experimental data. To further understand model behavior, we calculated feature importance for both RF and XGBoost J–V prediction models, where higher values indicate greater influence on the model predictions. Among all features, V showed the highest importance for both RF and XGBoost models. However, as J and V are directly correlated in J–V prediction, we excluded V from subsequent analyses to better highlight the relative importance of the remaining features (Fig. S36, SI). Our analysis revealed that ‘Acceptor Frac’ was the most influential feature for both models, while TDD showed much lower contributions. The lower importance of ‘Donor Frac’ is expected because it is linearly dependent on ‘Acceptor Frac’ (Donor Frac = 1 − Acceptor Frac), meaning that the model relies primarily on one of the two to avoid redundancy. Importantly, this does not imply that donor content is less significant for device behavior, but rather that its contribution has already been captured through ‘Acceptor Frac’. These results suggest that the D:A ratio plays a more critical role than active layer thickness in predicting J–V characteristics. Taken together, these results indicate that while RF and XGBoost identified similar trends in feature importance, RF provided more stable and consistent predictions across datasets. Therefore, RF-based J–V prediction models were chosen for photovoltaic optimization.
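A minimal sketch of this comparison, assuming a trained RF J–V model jv_model and its training DataFrame X_train (column names as used above, otherwise illustrative), is:

```python
import pandas as pd

# Raw impurity-based importances from the trained J-V prediction model
importances = pd.Series(jv_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))

# Drop the dominant voltage feature and re-normalise, so the remaining
# fabrication-related features (e.g. 'Acceptor Frac', TDD) can be compared
no_v = importances.drop("V")
print((no_v / no_v.sum()).sort_values(ascending=False))
```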
We observed poor performances in resistance prediction models. For the Rs model, this was attributed to a small number of extreme outliers, possibly from measurement error (poor contact during the measurement). After excluding about 3% of outliers, the model showed good performance with an R2 score of ∼0.88. The Rsh model underperformed because the slope of the J–V curves at 0 V for good devices is close to zero, making the inverse value highly sensitive to measurement noise. Despite this, we found that the model proved useful in predicting defective devices (Rsh < 336 Ω cm2, the bottom 30% in this dataset) with an R2 score of 0.80.
To identify optimal formulations across the vast fabrication parameter space, we split all possible parameter combinations using a coarse data resolution (2 μL min−1 or μg cm−2) into clusters and examined the optimum condition for each cluster. Using K-means clustering, an unsupervised ML algorithm, we created ten clusters (Fig. S37 and Table S24, SI). While clusters of evenly spaced datasets differ from the conventional meaning of clusters, this provides a simple solution to grouping multidimensional data. Subsequently, we generated all possible combinations with a finer data resolution (0.5 μL min−1 or μg cm−2, 6561 parameter combinations for each cluster) around each cluster's top formulation to refine them further. Finally, we compared the Euclidean distance (ED) (eqn (1)) of each cluster's top formulation from the global-best formulation in cluster 0 (Table S25 and Note S10 for details, SI).
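A sketch of this cluster-then-refine procedure is given below, assuming a trained RF PCE model rf_model and an illustrative three-parameter search space; the column names, ranges and step sizes are assumptions, not the actual parameter space used here.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Coarse grid of candidate formulations (2-unit steps), scored by the RF model
coarse = pd.DataFrame(
    list(product(np.arange(10, 41, 2),     # donor DD, ug cm-2
                 np.arange(10, 41, 2),     # acceptor DD, ug cm-2
                 np.arange(24, 121, 2))),  # total flow, uL min-1
    columns=["donor_dd", "acceptor_dd", "flow"],
)
coarse["pce_pred"] = rf_model.predict(coarse[["donor_dd", "acceptor_dd", "flow"]])

# Group the evenly spaced grid into ten clusters
coarse["cluster"] = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(
    coarse[["donor_dd", "acceptor_dd", "flow"]]
)

# Best coarse formulation per cluster (each would be refined on a 0.5-unit grid)
tops = coarse.loc[coarse.groupby("cluster")["pce_pred"].idxmax()].copy()

# Euclidean distance (ED) of each cluster's top formulation from the global best
best = tops.loc[tops["pce_pred"].idxmax(), ["donor_dd", "acceptor_dd", "flow"]].values
tops["ed"] = np.linalg.norm(
    tops[["donor_dd", "acceptor_dd", "flow"]].values - best.astype(float), axis=1
)
```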
Based on ED and predicted PCE, we fabricated devices using the predicted formulation of the global best and that of the cluster farthest from cluster 0. Clusters even farther from cluster 0 were also identified; however, their parameters were challenging to fabricate because the excessively high flow rates caused overflow issues, and their predicted PCEs were slightly lower. The device results are shown in Fig. 4 and summarized in Table 3. The global-best formulation achieved up to 11.8% PCE, significantly higher than the best PCE for fully-R2R-fabricated OPVs in the literature. Previously, the highest reported PCE was 5.6%, as summarized in Fig. S16 and Table S9. Only recently, we reported 9.35% using this high-throughput setup,24 and this work marks another leap in the record PCE for R2R-fabricated OPVs. This clearly demonstrates how laboratory innovation and digital technologies such as ML can accelerate progress in OPVs and potentially other printed electronics.
Cluster Number | Prediction | Solvent | JSC (mA cm−2) | Cal. JSC (mA cm−2) | VOC (V) | FF | PCE (%) | Rsh (Ω cm2) | Rs (Ω cm2)
---|---|---|---|---|---|---|---|---|---
a 200 devices were fabricated for each formulation. All statistical data were calculated from 200 devices.
0 | PCE | CB | 19.7 | — | 0.834 | 0.632 | 11.3 | 913 | 2.72
0 | J–V | CB | 20 | — | 0.829 | 0.679 | 11.3 | 1319 | 2.67
0 | —a | CB | 20.3 (19.9 ± 0.821) | 19.91 | 0.835 (0.831 ± 0.010) | 0.694 (0.688 ± 0.035) | 11.8 (11.3 ± 0.550) | 874 (2065 ± 6883) | 2.18 (2.14 ± 0.279)
0 | —a | Xyl | 20.0 (19.8 ± 0.674) | 19.77 | 0.824 (0.820 ± 0.004) | 0.678 (0.672 ± 0.025) | 11.2 (10.9 ± 0.195) | 6993 (1885 ± 5139) | 1.77 (1.99 ± 0.184)
1 | PCE | CB | 20.5 | — | 0.810 | 0.625 | 10.3 | 534 | 2.09
1 | J–V | CB | 20.7 | — | 0.810 | 0.619 | 10.4 | 1730 | 2.01
1 | —a | CB | 21.6 (21.3 ± 0.573) | 20.83 | 0.814 (0.804 ± 0.008) | 0.632 (0.564 ± 0.056) | 11.1 (9.67 ± 1.01) | 3894 (813 ± 700) | 2.90 (3.80 ± 0.808)
1 | —a | Xyl | 20.8 (19.7 ± 1.51) | 20.63 | 0.819 (0.816 ± 0.010) | 0.638 (0.590 ± 0.063) | 10.9 (9.51 ± 1.35) | 664 (1210 ± 3153) | 1.72 (1.96 ± 0.447)
With a view towards the eco-friendly manufacturing of OPV, we tested the parameters using a non-halogenated solvent, o-xylene (Xyl), and found Xyl-based devices performed comparably (Fig. 4 and Table 3). These results show great potential for the commercial application of fabrication parameters found in this work. External quantum efficiency (EQE) spectra were obtained after printing the silver electrode and encapsulating the cells for manual testing (Fig. S37, SI). All four devices show good agreement between JSC measured from the tester and calculated from EQE spectra (Fig. 4e and f).
To test the hypothesis that knowledge accumulated from one material system can improve predictions for new systems, we combined the large dataset with smaller datasets that include new components, adding one new material-specific feature per component (see Fig. S2 for detailed chemical structures, SI), trained RF-based models, and compared their performance depending on the size of the additional datasets. We first used a dataset created in our previous study on a PM6:Y6:IT-4F ternary blend.23 In that case, we simplified the system and minimized the number of features to visualize all feature-dependent performances in 3D space, manually re-creating features based on experimental parameters. New datasets for a PM6:D18:L8-BO ternary system and a PM6:PYF-T-o all-polymer system were also created by running relatively small batches. For training, we used subsets of the data rather than individual cells, specifically 294, 467, 1240 and 1990 data points for the PM6:Y6:IT-4F system, 257, 517, 1289 and 2053 for the PM6:D18:L8-BO ternary system, and 258, 784, 1538 and 2323 for the PM6:PYF-T-o all-polymer system. The remaining datasets were reserved for testing (for details, see Note S11, SI). The complete datasets and model performance results are available in Data S3 and S4, respectively.
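As a sketch of how the accumulated and new-material datasets can be merged while keeping the feature space compatible, assuming placeholder CSV files and column names (the real schema is provided in Data S3):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

base = pd.read_csv("pm6_l8bo_accumulated.csv")    # large accumulated dataset
new = pd.read_csv("pm6_d18_l8bo_train.csv")       # small new-material subset

# Material-specific DD columns absent from one dataset (e.g. 'D18 DD') are
# filled with zero so the feature space stays compatible as components are added
combined = pd.concat([base, new], ignore_index=True).fillna(0.0)

X, y = combined.drop(columns=["PCE"]), combined["PCE"]
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

test = (pd.read_csv("pm6_d18_l8bo_test.csv")
        .reindex(columns=combined.columns)
        .fillna(0.0))
print("test R2:", r2_score(test["PCE"], model.predict(test.drop(columns=["PCE"]))))
```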
Fig. 5 illustrates the performance of RF-based models trained without and with the accumulated PM6:L8-BO dataset. Despite the PM6:Y6:IT-4F system sharing only one common material, the models trained with the accumulated data consistently outperformed those without it, with the improvement being most pronounced for smaller training datasets. The same trend was observed in the PM6:D18:L8-BO system, which shares two common materials with the accumulated dataset, leading to even more significant improvements. Notably, the model trained with only 10% of the new dataset performed well in predicting the remaining 90%. Interestingly, while integrating the accumulated dataset substantially enhanced predictive accuracy for the ternary systems, the improvement was not consistent for the PM6:PYF-T-o all-polymer system (Fig. S39, SI). Specifically, for training dataset sizes of 784 or 1538, the test R2 values decreased slightly after adding the accumulated data. This behavior may arise from intrinsic differences between all-polymer systems and small-molecule NFA-based systems. These findings suggest that polymer acceptors may require a separate model category or an additional independent feature to better capture their distinct characteristics.
Nonetheless, these results highlight the value of digitalizing research parameters and demonstrate how existing knowledge can be applied to new material systems in OPV research. By showing that models trained on binary systems can predict ternary performance, this work illustrates how accumulated, compatible datasets enable the development of ML models with a comprehensive understanding of materials and fabrication parameters. Such models could significantly accelerate OPV commercialization.
Batch 1: active layer solutions of PM6:L8-BO were prepared in CB at 80 °C overnight, with two distinct D:A ratios of 3:1 and 1:3 (denoted as donor- and acceptor-rich solutions), maintaining a fixed donor concentration of 10 mg mL−1. These solutions were deposited using two programmable syringe pumps, combined via a Y-connector, and the in situ mixed solution was fed through tubing with an inner diameter of 0.5 mm. 13 sub-batches were conducted throughout the batch. For the first 4 sub-batches, the total solution flow remained constant at 72, 48, 36, or 24 μL min−1 (resulting in WFT of 1.85, 1.23, 0.923, or 0.616 μm) while the relative flow rates of the two pumps were adjusted. This manipulation varied the D:A ratio of the total solution flow, leading to linear increases in TDD during deposition. In contrast, for the remaining 9 sub-batches, the volumetric ratios of the donor- and acceptor-rich solutions were held constant to achieve fixed D:A ratios of 1:3, 1:2, 1:1.5, 1:1.2, 1:1, 1.2:1, 1.5:1, 2:1, or 3:1 in the total solution flow, and the total solution flow was varied within the range 24–96 μL min−1 (resulting in WFT of 0.616–2.46 μm), leading to linear decreases in TDD during deposition.
Batch 2: active layer solutions of PM6:L8-BO, prepared at 80 °C overnight, consisted of a donor-rich solution (D:A ratio of 3:1 and a donor concentration of 15 mg mL−1) and an acceptor-rich solution (D:A ratio of 1:3 and a donor concentration of 5 mg mL−1). The deposition process mirrored that for batch 1. For the first 4 sub-batches, the total solution flow was maintained at a constant rate of 120, 90, 60, or 30 μL min−1 (resulting in WFT of 3.08, 2.31, 1.54, or 0.769 μm) while the relative flow rates of the two pumps were altered. The D:A ratios of the total solution flow were varied while TDD remained constant, corresponding to 63.5, 47.7, 31.8, and 15.9 μg cm−2, respectively. For the remaining 9 sub-batches, the volumetric ratios of the donor- and acceptor-rich solutions were held constant to achieve fixed D:A ratios of 1:3, 1:2, 1:1.5, 1:1.2, 1:1, 1.2:1, 1.5:1, 2:1, or 3:1 in the total solution flow, and the total solution flow was varied within the range 24–120 μL min−1 (resulting in WFT of 0.616–3.08 μm), leading to linear decreases in TDD during deposition.
All active layers were fabricated at RT for both head and bed temperatures, and the films were exposed to an air-blowing process during deposition. The dead volumes of the tubing from the Y-connector and the slot die were calculated to be ∼33.7 and ∼39.6 μL for batch 1 and batch 2, respectively. This implies that the in situ mixed solution was deposited around 28 and 20 s after mixing at flow rates of 72 and 120 μL min−1, respectively. The composition of the film at each position was calculated based on the relative flow rate and the deposition offset originating from the dead volume of the deposition system. BM-HTL-1, diluted with IPA in a 1:1 ratio (v/v), was used as the HTL material and was concurrently deposited with the S315 layer in the same manner as described in the previous section.
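The deposition delays quoted above follow directly from dividing the dead volume by the corresponding flow rate:

$$t_{\text{delay}} = \frac{V_{\text{dead}}}{Q}: \quad \frac{33.7\ \mu\text{L}}{72\ \mu\text{L min}^{-1}} \approx 0.47\ \text{min} \approx 28\ \text{s}, \qquad \frac{39.6\ \mu\text{L}}{120\ \mu\text{L min}^{-1}} \approx 0.33\ \text{min} \approx 20\ \text{s}$$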
PM6:D18:L8-BO system: the device fabrication followed the procedures outlined in the previous section, with the exception of active layer deposition. The experiment consisted of two independent batches, each with 5 sub-batches. In batch 1, a donor-only solution (PM6:D18 = 4:1, total concentration: 20 mg mL−1) and an acceptor-only solution (L8-BO concentration: 20 mg mL−1) were prepared. In batch 2, pre-mixed solutions of PM6:D18:L8-BO (0.5:0.5:1.2, PM6 or D18 concentration: 5 mg mL−1) and PM6:L8-BO (1:1.2, PM6 concentration: 10 mg mL−1) were dissolved in CB at 80 °C overnight. During deposition, total solution flows were maintained at constant rates of 120, 96, 72, 48, or 24 μL min−1 (resulting in WFT of 3.08, 2.46, 1.85, 1.23 or 0.615 μm), with alterations in the relative flow rates of the two pumps. The D:A ratios (batch 1) or PM6:D18 ratios (batch 2) of the total solution flow were varied, while the TDD was either changed (batch 1) or kept constant (batch 2), resulting in TDDs of 68.7, 54.9, 41.2, 27.5, or 13.7 μg cm−2, respectively. The deposition occurred under identical environments, and the compositions of the film at each position were calculated using the same methods as in the previous experiments. Dataset preparation involved consistent feature design and data-cleaning procedures as in the previous section. The final dataset comprises 2570 data points, with 10 features and 1 label for PCE prediction, incorporating D18 as the third material, represented by the column ‘D18 DD’ in the dataset. Any inconsistent data were removed prior to analysis.
PM6:PYF-T-o system: device fabrication followed the procedures described previously, except for the active layer deposition. Three independent batches were prepared; batches 1 and 2 each consisted of 5 or 6 sub-batches, while batch 3 comprised 1 sub-batch. In batch 1, donor-only (PM6 concentration: 14 mg mL−1) and acceptor-only (PYF-T-o concentration: 14 mg mL−1) solutions were prepared. In batch 2, pre-mixed solutions of PM6:PYF-T-o at ratios of 2.5:1 and 1:2.5 (PM6 concentration: 7 mg mL−1) were prepared. In batch 3, a pre-mixed PM6:PYF-T-o solution (1:1.2, w/w) (PM6 concentration: 7 mg mL−1) was prepared along with an identical solution containing 2-PN as a solid additive (100 wt% relative to the total donor and acceptor solids). All solutions were dissolved in CB and stirred overnight at 70 °C. During deposition, the total solution flow rates were set to 120, 96, 72, 48, or 24 μL min−1 in batches 1 and 2, resulting in WFTs of 3.08, 2.46, 1.85, 1.23, and 0.615 μm, respectively, while batch 3 was fixed at 72 μL min−1. Relative pump flow rates were varied to control D:A ratios, film thickness, or the solid content of 2-PN. The TDD was kept constant in batch 1 (43.1, 34.5, 25.8, 17.2, and 8.62 μg cm−2) but varied in batches 2 and 3. Deposition conditions were consistent with earlier experiments, and the film composition at each position was determined using the same calculation methods. Dataset preparation followed the same feature design and data-cleaning protocols described previously. The final dataset comprised 2840 data points, including 10 features and 1 target label for PCE prediction. Two additional feature columns, ‘PYF-T-o DD’ and ‘PN DD’, were introduced to represent PYF-T-o as the acceptor material and 2-PN as the additive, respectively. Any inconsistent data were removed before analysis.
In this study, all ensemble-based models were built using scikit-learn and xgboost, which are open-source libraries. All code, relevant datasets and supplementary data files are openly available at DOI: https://doi.org/10.6084/m9.figshare.28140359.
This journal is © The Royal Society of Chemistry 2025