Robert W. Epps,a Amanda A. Volk,a Kristofer G. Reyesb and Milad Abolhasani*a
aDepartment of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27606, USA. E-mail: abolhasani@ncsu.edu; Web: http://www.abolhasanilab.com
bDepartment of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260, USA
First published on 9th March 2021
Autonomous robotic experimentation strategies are rapidly rising in use because, without the need for user intervention, they can efficiently and precisely converge onto optimal intrinsic and extrinsic synthesis conditions for a wide range of emerging materials. However, as the material syntheses become more complex, the meta-decisions of artificial intelligence (AI)-guided decision-making algorithms used in autonomous platforms become more important. In this work, a surrogate model is developed using data from over 1000 in-house conducted syntheses of metal halide perovskite quantum dots in a self-driven modular microfluidic material synthesizer. The model is designed to represent the global failure rate, unfeasible regions of the synthesis space, synthesis ground truth, and sampling noise of a real robotic material synthesis system with multiple output parameters (peak emission, emission linewidth, and quantum yield). With this model, over 150 AI-guided decision-making strategies within a single-period horizon reinforcement learning framework are automatically explored across more than 600000 simulated experiments – the equivalent of 7.5 years of continuous robotic operation and 400 L of reagents – to identify the most effective methods for accelerated materials development with multiple objectives. Specifically, the structure and meta-decisions of an ensemble neural network-based material development strategy are investigated, which offers a favorable technique for intelligently and efficiently navigating a complex material synthesis space with multiple targets. The developed ensemble neural network-based decision-making algorithm enables more efficient material formulation optimization than well-established algorithms in an environment with no prior information.
While the aforementioned systems greatly improve upon manual, ad-hoc reaction exploration strategies, they still often require substantial quantities of reactants (up to 200 mL of solvents per experimental condition) and generate significant amounts of waste (e.g., 1–2 L of washing solvents). To address the high chemical consumption of batch reactors, microfluidic reactors, which can reduce chemical consumption and waste generation by more than two orders of magnitude, have emerged as an effective platform for autonomous robotic experimentation.21 While the majority of microfluidic reactors have been utilized for autonomous exploration of organic syntheses,6,7,13,22,23 many have recently been used for colloidal nanomaterial syntheses, primarily the formation of colloidal quantum dots (QDs).24–26 These highly efficient and modular material synthesis platforms are particularly adept at studying vast colloidal nanomaterial synthesis spaces with multiple input/outputs. Furthermore, the reduced experimental variability of microfluidic platforms can significantly improve material synthesis precision and accelerate the discovery of (i) optimized materials with desired optical and optoelectronic properties and (ii) fundamental reaction mechanisms controlling the physicochemical properties of target materials. The design of microfluidic reactors integrated within autonomous experimentation platforms varies from temperature-controlled, single-phase flow with integrated robotic sample handling27 to multi-phase flow formats28 for reduced fouling and improved reactor lifetime.
In developing fully autonomous microfluidic material exploration platforms, choosing an effective experiment selection strategy in each iteration is vital to minimize the cost of each optimization campaign. The earliest usage of a self-optimizing flow reactor for the synthesis of colloidal nanomaterials applied the algorithm Stable Noisy Optimization by Branch and FIT (SNOBFIT),21,29 which is a robust quadratics-based system frequently used in guided reaction optimizations. However, recent studies of chemical systems with higher-dimensional parameter spaces have gravitated towards more efficient system modeling approaches that use artificial intelligence (AI) based decision-making strategies.
The use of AI-guided chemical synthesis strategies, including Bayesian methods and reinforcement learning (RL), is more appropriate in the setting of autonomous, closed-loop experimental science. Here, obtaining data means running an experiment to obtain an experimental response, which can be noisy, suffer from equipment failure, or involve a complex response. Traditional optimization methods, including gradient-based techniques, are therefore not directly applicable – as they treat such an experiment as a “function evaluation” and are particularly susceptible to noise. These methods, however, can be used effectively in tandem with surrogate models – computer models that are trained on a set of experimental data and treated as the ground-truth. After training, surrogates are optimized in silico and the resulting optimized settings are tested experimentally. This process, however, decouples the general learning of a globally accurate surrogate from optimizing, which leads to inefficiencies. In contrast, modern, closed-loop AI based techniques use Bayesian models to capture evolving beliefs of the experimental response function, but acknowledge that these beliefs are uncertain. Uncertainties are used within the AI and decision-making framework to strategically explore experiment space, with a specific goal of identifying the optimal regions. In this way, rather than decoupling learning and optimization, accurate beliefs of the response co-evolve simultaneously with identification of promising response regions. Such models are then used within an AI framework to make experimental decisions that balance between exploring uncertain regions of accessible synthesis space vs. more directly achieving experimental objectives; they may also factor in operational considerations such as time and cost constraints of running an experimental campaign.
Given the nature of material synthesis space search algorithms under limited resources, the best possible theoretical methods can sometimes be beaten by random sampling within a given closed-loop experimental optimization campaign. The consistency with which an AI agent can search through a vast material synthesis space is, therefore, a more important algorithm metric than the outcome of a single optimization. However, in a physical experimentation platform, an impractically large number of experiments are required to determine the optimization variance for a large set of input parameters (≥5). In this work, we develop a simulation-optimization framework for evaluating AI algorithm performance within a single-period horizon RL setting. Single-period horizon refers to a myopic decision-making policy that maximizes expected reward through the consideration of reward/regret obtained from a single experimental decision. Here, reward/regret functions can measure information gained from the result of running an experiment but can also factor in the time to run such an experiment or the amount of material consumed. Surrogate modeling of experimentally obtained QD synthesis data using a robotic quantum dot synthesizer platform – shown in Fig. 1 – facilitated multiple material synthesis exploration campaigns. We selected lead halide perovskite (LHP) QDs as our testbed of AI-guided material synthesis exploration, due to their potential impact in next-generation display and photovoltaic devices.30–32
The surrogate model used in this study, built from over 1000 physically conducted experiments, replicates the experimentally derived failure rate of the microfluidic material synthesis platform, non-emitting sample regions, sampling variance for three output parameters, and predicted outputs. Using the developed AI-guided LHP QD synthesis framework within a high-performance computing environment, we conduct the equivalent of over 600000 experiments (normally requiring over 400 L of precursors and 7.5 years of continuous operation in a microfluidic reactor) to systematically study the performance of more than 150 AI-guided material space exploration strategies. We validate the accuracy of the developed surrogate model in replicating an experimental space with experimental optimization campaigns conducted using the developed autonomous QD synthesis bot. Then, we automatically investigate different aspects of RL-based LHP QD synthesis through tuning of the belief model architecture, AI-guided decision policies, and data boosting algorithms. The results of the optimized AI-guided material synthesis agent are then compared with four established optimization algorithms. With this optimized RL-based LHP QD synthesis approach, we demonstrate a mixed exploration/exploitation strategy for comprehensively studying the vast colloidal synthesis universe of LHP QDs in as few experiments as possible (i.e., minimum time and chemical resources).
All experiments presented in this work cover material exploration from a starting position of no prior knowledge. While many AI-guided experimentation methods implement prior knowledge or mechanistic models as initial estimates of the material space,22 the limited data availability, composition and morphological diversity, and high reaction sensitivity of many colloidal nanomaterials make informed material exploration difficult to conduct effectively.
Further building upon this distinction, (ii) the developed surrogate model checks whether the given input conditions are within a synthesizable region of the colloidal synthesis space, shown in Fig. 3A. Using a naïve Bayes classifier trained on the starting 1000 sample data set, the feasibility model predicts whether the provided input parameters (Xi) will produce an emitting solution of LHP QDs. Between experiment failure and unfeasible sampling regions, 4.5% of the training data set resulted in no measurable parameters. Given that the input parameters are projected to produce a viable experiment, (iii) the output of each parameter is predicted by sampling from a ground truth model comprising three Gaussian process regressions (GPRs) corresponding to each of the three output parameters – peak emission energy (EP), emission full-width at half-maximum (EFWHM), and photoluminescence quantum yield (Φ).
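The feasibility check can be sketched as follows. This is an illustrative reconstruction using scikit-learn, not the authors' code: the training inputs and the toy labeling rule (tuned to reproduce a roughly 4.5% infeasible fraction) are synthetic stand-ins, and only the use of a naïve Bayes classifier over the five synthesis inputs is taken from the text.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for the five synthesis inputs X_i; in the study the
# classifier is trained on the ~1000-sample experimental data set.
X_train = rng.uniform(0.0, 1.0, size=(1000, 5))
# Toy feasibility rule: label ~4.5% of conditions as non-emitting
# (sum of 5 U(0,1) inputs < 1.4 occurs with probability 1.4^5/120 ~ 4.5%).
y_train = (X_train.sum(axis=1) > 1.4).astype(int)

feasibility = GaussianNB().fit(X_train, y_train)

def is_feasible(x):
    """True if the classifier predicts an emitting LHP QD solution."""
    return bool(feasibility.predict(np.atleast_2d(x))[0] == 1)
```

In the surrogate pipeline, conditions failing this check return no measurable outputs, mirroring the non-emitting regions of the experimental space.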
The large data set sufficiently trained the ground truth model (i.e., LHP QD synthesis simulator) for accurate prediction of the output parameters as well as the quality metric objective functions (Z) for 200 test experiments, shown in Fig. 3B–E. Ensemble neural network (ENN) models were shown to be equally effective in prediction accuracy; however, GPs were chosen for their reduced computational requirements. Finally, (iv) to simulate the expected measurement variability, additive, homoscedastic Gaussian noise is applied to each of the output values as determined by the variance of each parameter (see ESI Section S.1‡ for more details). Using the developed multi-stage surrogate model of a real-world microfluidic material synthesis platform, the parameters and strategies surrounding AI-guided material synthesis algorithms can be extensively explored. The surrogate model was further compared to real-world experimental data sets through campaigns designed to mimic optimizations conducted in prior work. The optimizations, using the ENN model reported in prior literature, produced optimization algorithm performance windows that capture the real-world data effectively – see ESI section S.2.‡
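Steps (iii) and (iv), sampling a per-output GPR ground truth and then adding homoscedastic noise, might look like the sketch below. The kernel choice, the toy response function, and the NOISE_SD value are assumptions for illustration; in the study one GPR per output is trained on the ~1000-sample data set and the noise level is set from the measured variance of each parameter.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Toy stand-ins for the 5-D inputs and ONE of the three outputs (e.g. E_P);
# the study fits one GPR per output parameter.
X = rng.uniform(size=(200, 5))
y = 2.2 + 0.3 * np.sin(2.0 * np.pi * X[:, 0]) + 0.05 * rng.normal(size=200)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True, random_state=1).fit(X, y)

NOISE_SD = 0.02  # assumed homoscedastic noise level (set from measured variance)

def sample_output(x, rng):
    """Ground-truth GPR prediction plus additive homoscedastic Gaussian noise."""
    return gpr.predict(np.atleast_2d(x))[0] + rng.normal(0.0, NOISE_SD)
```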
In this work, we evaluate two objective function techniques: (i) a weighted mean utility function and (ii) probability sampling of the three output parameters. The utility form of the objective function (Zutil) is a weighted mean of the three outputs defined by percentage weights [APE, AFWHM, APLQY] after appropriate non-dimensionalization and inversion of the variables:
Z_{util} = A_{PE}\,\bar{Z}_{E_P} + A_{FWHM}\,\bar{Z}_{E_{FWHM}} + A_{PLQY}\,\bar{Z}_{\Phi} \qquad (1)
Parameter sampling operates similarly; however, instead of applying these weights to the normalized output parameters, each parameter is individually optimized. At the time of experimentation, samples are allocated with a selection probability corresponding to [APE, AFWHM, APLQY], as shown by
Z = \begin{cases} Z_{E_P} & \text{with probability } A_{PE} \\ Z_{E_{FWHM}} & \text{with probability } A_{FWHM} \\ Z_{\Phi} & \text{with probability } A_{PLQY} \end{cases} \qquad (2)
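A minimal sketch of the two objective strategies, with hypothetical helper names; the normalized (and, where appropriate, inverted) single-output objectives are taken as given inputs:

```python
import numpy as np

WEIGHTS = {"E_P": 0.8, "E_FWHM": 0.1, "PLQY": 0.1}  # [A_PE, A_FWHM, A_PLQY]

def utility_objective(z_pe, z_fwhm, z_plqy, weights=WEIGHTS):
    """Weighted-mean utility: combine the three normalized single-output
    objectives into one scalar using the percentage weights."""
    return (weights["E_P"] * z_pe
            + weights["E_FWHM"] * z_fwhm
            + weights["PLQY"] * z_plqy)

def sampled_objective(rng, weights=WEIGHTS):
    """Probability sampling: choose ONE output to optimize for this
    experiment, with selection probabilities equal to the weights."""
    names = list(weights)
    p = np.array([weights[n] for n in names], dtype=float)
    return rng.choice(names, p=p / p.sum())
```

Over a campaign, probability sampling devotes roughly 80% of experiments to the peak-emission objective under the weight set above, whereas the utility form blends all three outputs into every decision.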
Using the above-mentioned multi-objective strategies with an expected improvement (EI) decision policy on a GP belief model (ESI Section S.3‡), seven sets of data weights were tested with 100 replicates for each LHP QD optimization campaign. As shown in Fig. 5A, the weighted utility objective outperformed probability sampling for most parameter weight sets, and, while selection of appropriate quality metrics across different output weight models is difficult to impose without bias, the weight set [0.8, 0.1, 0.1] appears to most holistically optimize the three LHP QD output parameters. The valuation of each parameter is ultimately a qualitative decision, and a comparison of the predicted optical spectra indicates that most of the tested data weights are sufficient optimization tools producing similar results (Fig. 5B). Upon closer inspection, however (Fig. 5A), the weight set [0.8, 0.1, 0.1] produced the highest precision EP after 20 experiments without sacrificing quality in EFWHM and Φ. As expected, the quality of the highest performing measurement consistently improves as the number of experiments increases and the best measured spectra across all replicates narrows onto an optimum (Fig. 5C).
Interestingly, the simulation results for the weight set [0.98, 0.01, 0.01], which should prioritize achieving an optimal EP value, underperform in terms of the best achieved EP value compared to a more balanced weight set [0.8, 0.1, 0.1] (Fig. 6A and B). This observed behavior points to a training regularization effect of running experiments not specifically geared toward achieving a single objective. By focusing only on EP values, the EI policy tended not to explore the accessible LHP QD synthesis space effectively enough to discover potentially better regions for EP; this haste by EI to exploit rather than explore is mitigated when opting for a multi-target optimization. This undesirable behavior of EI has been previously reported.34 For example, Gongora et al.35 suggest a random discretization of the design space as a preprocessing step prior to calculating EI values to prevent early exploitation. With respect to EFWHM and Φ, the balanced weight set provided notable improvements over the EP-biased weighting, as expected (Fig. 6C and D).
Individual NNs fi(x) can be trained to obtain predictions of an experimental response given some set of experimental inputs x. Associated with each model within the ensemble is a weight, θi, which represents the posterior probability that model fi describes the ground-truth function generating the data. Through combination of these NNs and their weight terms into the ensemble, we can obtain estimates of the response and corresponding uncertainty
\hat{f}(x) = \sum_i \theta_i f_i(x) \qquad (3)
\sigma^2(x) = \sum_i \theta_i \left( f_i(x) - \hat{f}(x) \right)^2 \qquad (4)
\operatorname{cov}(x, x') = \sum_i \theta_i \left( f_i(x) - \hat{f}(x) \right)\left( f_i(x') - \hat{f}(x') \right) \qquad (5)
Such estimates, uncertainties, and covariances are then used in subsequent decision-making policies to calculate expected rewards/regret for running a particular experiment. While other types of neural networks are appropriate for high-dimensional structured input data, such as convolutional neural networks for image data,37,38 or for sequential data, such as long short-term memory networks,39,40 in the case of solution-processed materials, the input data is neither high-dimensional nor structured. As such, we treat the five-dimensional LHP QD synthesis parameters directly as input features to a dense NN.
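These posterior-weighted moments over the member predictions can be computed directly; the helper below is our own sketch (function and argument names are illustrative), with models standing in for the trained ensemble members f_i and theta for their posterior weights θ_i.

```python
import numpy as np

def ensemble_stats(models, theta, x, x2=None):
    """Posterior-weighted ensemble mean and variance at input x, plus the
    covariance between predictions at x and x2 when x2 is given."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / theta.sum()           # posterior weights sum to one
    f = np.array([m(x) for m in models])  # member predictions f_i(x)
    mean = float(theta @ f)               # weighted ensemble mean
    var = float(theta @ (f - mean) ** 2)  # weighted spread about the mean
    if x2 is None:
        return mean, var
    g = np.array([m(x2) for m in models])
    cov = float(theta @ ((f - mean) * (g - theta @ g)))
    return mean, var, cov
```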
Next, we systematically varied the architecture of the neural networks within the ENN model from one to six layers with one to five nodes per layer – further detailed in ESI Section S.4.‡ In addition to this strategy, different combinations of constraints were employed on a set of randomly selected architectures where each model in the ensemble featured a different combination of nodes and layers, as shown in Table 1.
Variant | NL (layers) | NN (nodes per layer)
---|---|---
A | 1 to 6 | 1 to 5 |
B | 1 to 6 | 1 to 3 |
C | 1 to 6 | 2 |
D | 2 to 4 | 1 to 5 |
E | 2 to 4 | 1 to 3 |
F | 2 to 4 | 2 |
G | 3 | 1 to 5 |
H | 3 | 1 to 3 |
After training an ENN model on acquired data sets, an EI decision policy was used to search the generated LHP QD synthesis model for the next experiment to conduct. The quality of the optimization was evaluated by fitting the median best measured objective function value from 100 optimization replicates to a learning rate decay curve
Z_b(i_e) = Z_{\infty} + \eta_0 e^{-\eta_d i_e} \qquad (6)
where Z_b(i_e) is the best measured objective function value as a function of the experiment number, η0 is the initial learning rate, ηd is the fitted learning rate decay constant, ie is the experiment number, and Z∞ is the predicted convergent objective function value as ie → ∞. Additionally, the median best measured objective function value after 25 experiments was compared across the tested models.
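Fitting such a decay curve is a routine nonlinear least-squares problem. The sketch below assumes an exponential decay form with a decay constant alongside η0 and Z∞ (the exact functional form used in the study may differ) and recovers the parameters from toy data with scipy.optimize.curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay_curve(i_e, eta_0, eta_d, z_inf):
    """Assumed exponential decay: Z_b approaches Z_inf as i_e grows."""
    return z_inf + eta_0 * np.exp(-eta_d * i_e)

# Toy stand-in for the median best-measured objective vs experiment number.
i_e = np.arange(1, 26, dtype=float)
z_b = 0.10 + 0.90 * np.exp(-0.20 * i_e)

# Fit initial learning rate, decay constant, and convergent objective value.
(eta_0, eta_d, z_inf), _ = curve_fit(decay_curve, i_e, z_b, p0=(1.0, 0.1, 0.0))
```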
Throughout the ENN-based optimization campaigns, the cascade forward structure consistently outperformed feed forward neural networks. For both equivalent architecture and total parameters – see ESI Section S.5‡ – cascade forward-based ENNs resulted in lower convergent objective function values (Z∞) and faster learning rate decay than corresponding feed forward ENNs (Fig. 7B–D). It should be noted that this advantage is present while using typical training parameters. Under an alternative training algorithm, feed forward ENNs could significantly improve in performance, even beyond the capabilities of cascade forward ENNs. While many of the randomized neural network architecture ENNs performed similarly to the constant structure ENNs, the ENN architecture variant F (i.e., two to four layers with two nodes per layer) was able to reach the lowest median Z∞ with a moderately high learning rate decay. Furthermore, increasing the size of the ensembles from 50 to 200 models (Fig. 7B) improved model performance notably across all architectures. Shown in Fig. 7E, based on the behavior of this highest performing ENN architecture, the improvements were unlikely to continue past 200 models. While the number of models in the ensemble may be reduced to lower computational costs at the expense of precision, ensembles greater than 200 models do not provide any clear additional benefit.
Shown in Fig. 8A, both EI and upper confidence bound (UCB) policies reached a lower Z∞ than the maximum variance (MV) and pure exploitation (EPLT) policies throughout the entirety of the optimization. MV is not expected to sample the optimal formulation, but it is reasonable to expect the policy to build an accurate belief model. However, when the ENNs built through each policy were instructed to predict the optimal conditions after each experiment (Fig. 8B), EI and UCB still produced lower median objective function values than both MV and EPLT. While UCB and EI resulted in very similar learning curves, the UCB policy presented here is the result of tuning the explore-exploit control parameter (ε) over twelve campaigns – see ESI Section S.6.‡ EI inherently does not require this same tuning and is therefore considered the more favorable decision policy for use in the AI-guided LHP QD synthesis.
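For reference, minimization-form EI and a UCB-style score can be written as below; the exact UCB expression and ε value used in the study may differ from this common textbook form.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, sd, best):
    """EI for minimization: expected amount by which a candidate improves
    on the best objective value measured so far."""
    sd = np.maximum(sd, 1e-12)   # guard against zero predictive uncertainty
    z = (best - mean) / sd
    return (best - mean) * norm.cdf(z) + sd * norm.pdf(z)

def ucb_score(mean, sd, eps=1.0):
    """UCB-style score for minimization; eps is the explore-exploit
    control parameter that required tuning in the study."""
    return -(mean - eps * sd)    # larger score = more promising candidate

# Two candidates with the same predicted mean but different uncertainty:
mean = np.array([0.5, 0.5])
sd = np.array([0.01, 0.30])
best_idx = int(np.argmax(expected_improvement(mean, sd, best=0.6)))  # -> 1
```

With equal predicted means, EI favors the higher-uncertainty candidate, which is the partially exploratory behavior discussed above.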
Furthermore, the advantage of EI over EPLT demonstrates the necessity of partially exploratory policies over purely exploiting a belief model. This advantage is relatively small for the 2.2 eV target emission used in this study; however, expanding the number of target emissions (Fig. 8C and D) further demonstrates the superiority of EI. Throughout the entirety of the tunable parameter space, EI results in lower objective function values than EPLT and can reach these values more consistently (Fig. 8E and F). This advantage is most apparent near the outer bounds of the attainable synthesis space – i.e., at an EP of 1.9 eV and 2.4 eV. Exploitation of the model can likely capture parameters near the optima, but due to the challenges of extrapolation outside the understood reaction space, more complex strategies that attempt to first explore that external space are required when optimizing near the system extremes.
Shown in Fig. 9A, the newly developed ENN-EI structure provides clear advantages over the previously reported structure in computational cost, optimization efficiency, and effectiveness. Furthermore, the inclusion of the ensemble boosting algorithm Adaboost.RT44,45 produced no clear advantage over the standard ENN which uses a uniform weighted mean within the ensemble. Finally, the effect of staggering the experiment execution and synthesis formulation selection was evaluated. Staggered sampling, as reported in our prior work, allows the optimization algorithm to search for the next experimental conditions while the robotic material synthesizer is operating, therefore maximizing the available time of the experimental platform. This process is conducted by providing experiments 1 to ie − 1 to the AI-guided synthesis algorithm while sample ie is being collected. This process reduces the total optimization time and, as demonstrated by the simulation campaigns, does not significantly reduce the effectiveness of the AI-guided synthesis algorithm.
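The staggered scheme can be expressed as a simple one-step-lag loop; the function names here are ours, and the real platform runs selection and synthesis concurrently rather than in this sequential simulation of the overlap:

```python
def staggered_campaign(run_experiment, select_next, n_experiments):
    """Staggered sampling: the condition for experiment i_e + 1 is selected
    from experiments 1..i_e - 1 while sample i_e is still being synthesized
    and measured, so the reactor never idles while the planner runs."""
    data = []
    pending = select_next(data)               # condition for experiment 1
    for _ in range(n_experiments):
        next_cond = select_next(data)         # plan from data 1..i_e - 1 ...
        data.append(run_experiment(pending))  # ... while sample i_e finishes
        pending = next_cond
    return data
```

The one-experiment lag is visible in the trace: each selection sees one fewer completed experiment than a fully sequential loop would.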
In the next set of simulation campaigns, we investigated the performance of the optimized ENN-EI algorithm vs. established optimization algorithms. The newly tuned ENN-based autonomous material synthesis method provided a clear advantage over all tested established optimization techniques (Fig. 9B). Even though both evolutionary strategies, CMA-ES and NSGA-II, were pre-tuned for their generation population size (ESI Section S.7‡), neither matched the performance of ENN-EI. Additionally, SNOBFIT performed similarly to a pure exploration policy, a finding similar to our previously reported work. It should be noted that at higher experiment numbers, SNOBFIT is expected to converge onto an optimum, while pure exploration is most likely to fail. In comparing the highest performing AI-guided synthesis strategies, the ENN method demonstrated a clear advantage over GP – shown in Fig. 9B–D.
To better understand the performance improvement of ENN models over GPs, the marginal impact of an additional data point after every experiment was studied. As shown in Fig. 9C and D, the uncertainties associated with the experimental response before and after the experiment is run (gray and red lines) are closer to one another when using an ENN model than they are when using a GP. This suggests that GPs (at least with a squared-exponential kernel) learn locally – a single data point describing the experimental response at some input x provides significant correlative information for the experimental responses x′ nearby. In contrast, in the ENN, the information provided by a single data point impacts the entire training of the ensemble models, especially in the case of low data. The correlation between prediction error and model uncertainty may be further understood using the Kendall rank coefficient. By comparing the median uncertainty before sampling and prediction error (shown in Fig. 9C–D) through a one-sided Kendall tau test, the ENN resulted in a p-value of <0.01, indicating that there is a correlation between the uncertainty and error, while the GP resulted in a p-value of 0.61, which failed to identify a correlation.
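This style of calibration check can be reproduced with SciPy's one-sided Kendall tau test (the alternative keyword requires SciPy ≥ 1.8); the two toy series below are our own stand-ins for an ENN-like (correlated) and a GP-like (uncorrelated) uncertainty/error pairing:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)

# Per-experiment series: pre-sampling model uncertainty vs. prediction error.
uncertainty = rng.uniform(0.0, 1.0, size=40)
error_calibrated = uncertainty + 0.2 * rng.normal(size=40)  # correlated
error_uncorrelated = rng.uniform(0.0, 1.0, size=40)         # no relation

# One-sided test: is uncertainty positively rank-correlated with error?
tau_enn, p_enn = kendalltau(uncertainty, error_calibrated,
                            alternative="greater")
tau_gp, p_gp = kendalltau(uncertainty, error_uncorrelated,
                          alternative="greater")
```

A small p-value for the calibrated series indicates the model's uncertainty estimates are informative about where its predictions are wrong.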
As a result, a single data point may impact the prediction being made globally. This is not unlike the effect seen when training a parametric model – a change in parameter values impacts the predictions made by such a model for any choice of inputs to that model. Therefore, the ENN updates models globally while the GP updates them locally. This effect may be exaggerated particularly in the case of the EI policy, which has been demonstrated to stagnate due to an eventual imbalance between exploration and exploitation. Because uncertainties are not decreased in a local manner with ENN, such a stagnation may be mitigated, allowing for a more exploratory search to take place with ENN-EI versus GP-EI.
As shown in Fig. 10C and Fig. 11A–D, the policies reliant on EPLT consistently struggle to reach the stopping criteria at the outer bounds of the LHP QD synthesis space, a finding similar to that shown in the uninformed studies. Even with a prior training set (MV-EPLT), EPLT struggles to find these outer bound optima, which suggests that MV on its own does not completely capture the full range of relevant colloidal synthesis conditions in the number of experiments allotted. In the internal positions of the material synthesis space, i.e., setpoints 2.0 eV to 2.3 eV, EPLT performs similarly to EI. However, EI-EPLT provided the most efficient navigation of the LHP QD synthesis space. This meta-decision policy operates efficiently due to the targeted exploration of the outer bounds of the material synthesis space through EI, followed by the exploitation of the newly structured model through an EPLT policy.
Four types of meta-decision policies were evaluated through multi-stage optimization completion criteria. The optimizations were performed by consecutively targeting peak emissions of 1.9–2.4 eV using 0.1 eV intervals and additive data across the set-points. The stopping criteria were met when, at each set point, the measured peak emission was within 2 meV of the target peak emission energy. The MV-EPLT policy was conducted by first sampling a starting set size of MV selected experiments, followed by EPLT for each of the target emissions (Fig. 10A). The EI-EPLT policy operated by performing consecutive EI experiments followed by a single EPLT selected experiment. The obtained ratio of EPLT to EI experiments (nEPLT/nEI) for each optimization campaign is shown in Fig. 10B. We also conducted pure EPLT and pure EI as control experiments. As shown in Fig. 10A and B, an MV starting set size of 12–25 experiments passed the stopping criteria in the fewest experiments for the MV-EPLT policy, but the EI-EPLT policy at a ratio of 0.25 outperformed MV-EPLT by 8 experiments.
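One way to realize the EI-EPLT ratio as a concrete schedule is sketched below; the study specifies only the ratio nEPLT/nEI, so this fixed-period interleaving rule is our assumption:

```python
def ei_eplt_schedule(step, ratio=0.25):
    """Assumed EI-EPLT interleaving: run 1/ratio consecutive EI-selected
    experiments, then one EPLT (pure exploitation) experiment, so that
    n_EPLT / n_EI = ratio over the campaign."""
    period = round(1.0 / ratio) + 1  # ratio 0.25 -> 4 EI then 1 EPLT
    return "EPLT" if step % period == period - 1 else "EI"
```

At the reported best ratio of 0.25, every fifth experiment exploits the current belief model while the other four refine it through EI.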
It should be noted that while the use of a surrogate model has enabled further development of a range of experiment selection algorithms, there are limitations to surrogate representations of real-world systems. Most notably, the ground truth model surface is likely smoother than the actual system, resulting in more optimistic estimates of the selection algorithm effectiveness. In this case, the performance of ENN methods in the surrogate model will likely be boosted over a real-world system. However, based on the comparison of real-world optimization runs with the surrogate system (ESI Section S.2‡), the developed surrogate model appears to approximately reflect the expected performance of the tested algorithms. Furthermore, the comparison of the proposed optimal ENN algorithm with various ENN variants as well as with established algorithms is intended to demonstrate the effectiveness of the methods and not necessarily claim a holistic advantage.
Footnotes
† The surrogate model used to generate the data in this study is publicly available for download at https://github.com/AbolhasaniLab/Reaction-Optimization-Surrogate-Model.
‡ Electronic supplementary information (ESI) available: Surrogate model design and validation, Gaussian process regression training parameters, ensemble neural net architecture and tuning, UCB parameter tuning, and evolutionary algorithm tuning. See DOI: 10.1039/d0sc06463g
This journal is © The Royal Society of Chemistry 2021