Open Access Article
Dan Guevarra,ab Michael J. Statt,*c Kostiantyn Popovich,c Brian A. Rohr,*c John M. Gregoire,ab Kevin Tran,d Santosh K. Suram,d Joel A. Haber*ab and Willie Neiswanger*e
aDivision of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125, USA. E-mail: jahaber@caltech.edu
bLiquid Sunlight Alliance, California Institute of Technology, Pasadena, CA 91125, USA
cModelyst LLC, Palo Alto, CA 94306, USA. E-mail: michael.statt@modelyst.io; brian.rohr@modelyst.io
dToyota Research Institute, Los Altos, CA 94022, USA
eDepartment of Computer Science, University of Southern California, Los Angeles, CA 90089, USA. E-mail: neiswang@usc.edu
First published on 4th November 2025
Solid state materials are central to many modern technologies in which a given material may be exposed to a variety of environments. The material properties often vary with the sequence of environments in an irreversible manner, resulting in a quintessential path-dependency in experimental observables. While sequential learning techniques have been effectively deployed for accelerating learning of state properties of materials, they often use a consistent environment path in all experiments. To elevate such techniques for making optimal decisions in experimental investigations of path-dependent properties, we introduce an iterated expected information gain acquisition function that optimizes over entire experimental trajectories. This approach is implemented within a cloud-based Materials Acceleration Platform architecture utilizing an event-driven stateful broker coupled with remote HELAO (Hierarchical Experimental Laboratory Automation and Orchestration) instances and an AI science manager. The platform's efficacy was demonstrated through a case study optimizing multi-step spectro-electrochemical experiments to identify optically stable potential windows in (Co–Ni–Sb)Oz metal oxides. The system successfully integrated AI-driven experiment design, remote laboratory automation, and cloud-based data infrastructure, validating the platform's capability for managing complex, adaptive, path-dependent workflows in materials discovery.
The substantial cost associated with measurement time, coupled with the general scarcity of experimental data5 for such path-dependent phenomena, positions sequential learning techniques as highly suitable for the efficient exploration of these complex parameter spaces. While active experimental design has been pursued for many settings without path dependency,6–8 effective experiment selection in our setting must explicitly account for the path-dependence of predicted properties; the outcome of a future measurement is contingent not only on the current state but also on the sequence of preceding observations. To address this challenge, acquisition strategies must consider the influence of experimental history on both the observations themselves and the accessibility of subsequent data points.
We introduce an acquisition function termed iterated expected information gain (iEIG), extending the classical EIG concept.9–11 This function evaluates the anticipated reduction in uncertainty over a trajectory of potential observations. It explicitly incorporates the compounding effect of early experimental decisions on the informativeness and feasibility of subsequent measurements, which is particularly relevant when initial observations constrain or enable access to parts of an experimental path. Consequently, the total information gain accrued over a path becomes a more pertinent objective than its pointwise equivalent.
This active learning strategy is deployed within a cloud-based Materials Acceleration Platform (MAP), i.e., a self-driving laboratory.12–14 While early distributed MAPs demonstrated remote control,15 and cloud-server management for shared hardware,16 recent architectures utilize “stateless” brokers17 to integrate diverse resources across institutions. Although stateless designs offer resilience, event-driven “stateful” brokers18 provide a verifiable event log essential for data provenance, aligning with modern cloud data pipelines (e.g., AWS, Azure, GCP) that inherently support scalability and reliability. Our implementation couples a stateful, event-driven cloud broker with remote instances of the Hierarchical Experimental Laboratory Automation and Orchestration (HELAO) system19 and an AI science manager operating in a closed loop. The platform's utility is demonstrated via a case study involving the optimization of multi-step spectro-electrochemical experiments to identify optically stable potential windows in (Co–Ni–Sb)Oz metal oxides.
Operationally, the working electrode chamber was located beneath a quartz window positioned below the fiber optic terminal. The entire cell assembly was mounted to a fixed arm and kept at constant elevation. Motorized X-, Y-, and Z-stages control the translation and height of the sample composition library beneath the cell assembly. During operation, the composition library was translated to a given sample location then raised to engage the cell assembly. The assembly was gasket-sealed around the working electrode chamber aperture, then filled with electrolyte. Electrolyte was pulled into the working chamber and out through two flow paths: one high-flow path directly across the working electrode chamber, and a low-flow path that established liquid contact with the counter electrode membrane or separator frit. An independent flow path and peristaltic pump was used for the counter electrode chamber. For every sample composition, electrolyte flow was constant throughout the series of spectro-electrochemical measurements, then stopped and both working and counter chambers drained at a higher pump rate prior to disengaging the cell assembly. When switching between electrolytes, the cell assembly was purged with the working electrolyte at ∼2 mL per minute for three minutes, enough for 30 changeovers of the ∼200 μL working electrode chamber volume. Human interaction was required in replacing the electrolyte reservoir and mounting the binary and ternary composition libraries to the motorized stage. A connectorized white light LED (Doric Lenses LEDC2_385/W35) was used as the illumination source. An integrating sphere (Spectral Products AT-IS-1) was positioned about 2 m below the bottom side of the composition library and connected to a spectrometer (Spectral Products SM303) for transmission measurements.
An optical “instability” metric, β(V), was defined to quantify spectral changes during chronoamperometry (CA) relative to an initial open-circuit potential (OCV) measurement. Specifically, β(V) represents the mean relative change in absorbance, ΔA(V, λ) averaged over the wavelength range 430–700 nm during the final 5 seconds of each 85-second CA step,
β(V) = ⟨ΔA(V, λ)⟩λ∈[430, 700] nm, t∈final 5 s | (1)
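As a concrete illustration, the metric can be computed from transmitted-intensity spectra roughly as follows. This is a minimal sketch: the function name, array shapes, and the use of a log10 absorbance change relative to the preceding OCV spectrum are our assumptions, not the exact implementation used on the platform.

```python
import numpy as np

def instability_metric(intensity, intensity_ref, wavelengths, lo=430.0, hi=700.0):
    """Sketch of the optical instability metric beta(V).

    intensity: transmitted spectra from the final 5 s of a CA step
               (shape: n_times x n_wavelengths).
    intensity_ref: spectrum from the preceding OCV measurement.
    Returns the mean absorbance change over the 430-700 nm window.
    """
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    # absorbance change relative to the OCV reference: dA = -log10(I / I_ref)
    dA = -np.log10(intensity[:, mask] / intensity_ref[mask])
    # average the magnitude of the change over wavelength and time
    return float(np.mean(np.abs(dA)))
```

A uniform 10× drop in transmission over the whole window, for example, would yield a metric of 1.0 under this convention.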
For each catalyst composition and pH, a sequence of eleven 85-second CA steps was executed at potentials evenly spaced between 0.0 V and 2.0 V vs. RHE. Each CA step was preceded by a 5-second OCV measurement. The starting potential V0 and the initial potential sweep direction for the sequence were determined by the active learning algorithm. The potential was stepped systematically in the chosen direction until a limit (0.0 V or 2.0 V vs. RHE) was reached, after which the direction reversed to scan back towards the opposite limit, without revisiting intermediate potentials. The initial potential step was measured only once during the sequence. Fig. 2 depicts an example measurement sequence.
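The potential-step scheduling described above (start at V0, step in the chosen direction until a limit, then reverse without revisiting intermediate potentials) can be sketched as follows; `scan_sequence` and its signature are illustrative, not part of the HELAO codebase.

```python
import numpy as np

def scan_sequence(v0, direction, v_min=0.0, v_max=2.0, n_steps=11):
    """Order the n_steps evenly spaced potentials for one scan.

    Steps from v0 toward the limit given by direction ('up' or 'down'),
    then reverses back toward the opposite limit, visiting each of the
    n_steps potentials exactly once.
    """
    grid = np.linspace(v_min, v_max, n_steps)
    i0 = int(np.argmin(np.abs(grid - v0)))  # snap v0 to the potential grid
    if direction == "up":
        first, second = grid[i0:], grid[:i0][::-1]
    else:
        first, second = grid[i0::-1], grid[i0 + 1:]
    return np.concatenate([first, second])
```

For instance, starting at 0.6 V scanning up visits 0.6 → 2.0 V, then 0.4 → 0.0 V, for eleven steps total.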
Fig. 2 An example measurement sequence request made by the AI science manager (active learning algorithm) to the automated instrumentation (HELAO) using the cloud-deployed data requests API. We illustrate the eleven 85-second CA steps at evenly spaced potentials (column 1), the transmitted intensity at a given potential (column 2), and the relative changes in absorbance (column 3), which are used in eqn (1).
The probabilistic surrogate model characterizes the optical instability metric as an unknown function f: X → ℝ, where X is the 5-dimensional design space and the co-domain consists of optical instability metric measurements, real values in ℝ. Each design point x ∈ X is a five-dimensional element, x = (fraction of Co, fraction of Ni, pH value, potential V, initial step direction), where the composition dimension molar fraction of Sb is constrained by fraction of Sb = 1 − (fraction of Co + fraction of Ni). Each query of the black-box function f at input design point x yields a noisy observation β of the optical instability metric f(x), written as β ∼ f(x) + ε, where ε ∼ N(0, σ2) represents Gaussian noise (i.e., a perturbation of the deterministic metric).
A Gaussian process (GP) Bayesian model models this black-box function. Given a dataset of T optical instability metric observations from T design points, denoted as D = {(x1, β1), …, (xT, βT)}, the GP posterior predictive distribution for an input x is written as p(β|x, D), with variance denoted as Var[p(β|x, D)]. To make predictions, we use the mean of our optical instability forecast, 𝔼[p(β|x, D)], where 𝔼 denotes an expectation.
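The GP posterior predictive mean and variance can be sketched directly in a few lines; the squared-exponential kernel, zero-mean prior, unit signal variance, length scale, and noise level below are placeholders rather than the settings used in the campaign.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xq, ls=1.0, noise=1e-4):
    """Posterior predictive mean and variance of p(beta | x, D)
    for a zero-mean GP with unit prior signal variance."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ls)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    # predictive variance of a noisy observation at each query point
    var = 1.0 + noise - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

Near observed data the predictive variance collapses toward the noise level, while far from the data it reverts to the prior variance, which is what drives the EIG acquisition below.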
The acquisition function maximizes the expected information gain (EIG) about f for any design point x ∈ X. For homoscedastic noise and a Gaussian posterior, maximizing EIG is equivalent to maximizing the posterior predictive variance:
EIGf(x) ∝ log(Var[p(β|x, D)]) | (2)
Intuitively, we want to modify the EIG to produce an acquisition function that selects design points which are both highly informative and unlikely to terminate prematurely due to instabilities. In practice, this means the acquisition function should favor compositions, starting potentials, and sweep directions that maximize the expected reduction in model uncertainty while extending the number of stable steps in each scan. To incorporate a sequence of potential steps V1, …, Vm until an instability, we employ an “expected-EIG acquisition function” using the iterated expected information gain. For a tuple v = (composition, pH value, starting potential V0, potential scan direction Vd) representing a potential scan, the iterated-EIG (iEIG) is:
iEIGf(v) = Σi=1…m P(s1 = 1, …, si−1 = 1) · EIGf(v, Vi), | (3)
where we define si ∈ {0, 1} to be a binary variable indicating the stability of potential Vi within the scan v (si = 1 for stable, si = 0 for unstable). This acquisition function is approximated as follows: first, for i = 1, …, m, each term in the sum in eqn (3) is computed similarly to the vanilla EIG expression, where
EIGf(v, Vi) = EIGf(x) ∝ log(Var[p(β|x, D)]).
| (4)
Second, for each i = 1, …, m, the probability P(s1 = 1, …, si−1 = 1) is approximated via posterior sampling (similar to Thompson sampling22,23): we draw a posterior sample from our model and record the sequence of stability indicators, starting from potential V1 and proceeding to potential Vi−1. Note that, once a stability of zero (si = 0) has been observed, the stability indicator variable remains at zero for the remainder of the scan (i.e., si′ = 0, for all i′ > i). Intuitively, the iEIGf(v) acquisition function value is a sum of the EIGf(x) values over the potential scan, each weighted by the probability that the scan remains stable up to that step.
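The two-stage approximation above (per-step EIG values, weighted by Monte Carlo estimates of the probability of reaching each step) can be sketched as follows. Using a cumulative product to enforce the absorbing instability is one simple implementation choice; all names here are illustrative.

```python
import numpy as np

def ieig(eig_values, stability_samples):
    """Sketch of the iterated EIG for one candidate scan.

    eig_values: EIG at each of the m potentials, in scan order.
    stability_samples: boolean array (n_samples x m) of posterior-sampled
    stability indicators s_i for the scan.
    """
    # absorbing instability: s stays 0 for the rest of a sampled scan
    # once any step in that sample is unstable
    surviving = np.cumprod(stability_samples.astype(float), axis=1)
    # P(s_1 = 1, ..., s_{i-1} = 1): probability the scan reaches step i
    reach = np.ones(len(eig_values))
    reach[1:] = surviving[:, :-1].mean(axis=0)
    return float(np.sum(reach * np.asarray(eig_values)))
```

When every posterior sample is stable, this reduces to the plain sum of EIG values; when instabilities are likely early in the scan, later steps contribute little.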
We use the iEIG acquisition function to guide sequential experiments, as it balances expected information gain against the risk of path-dependent instabilities that can prematurely terminate measurement sequences—e.g., environment-induced sample destruction. The iEIG acquisition function explicitly balances these two considerations: it prioritizes queries that are expected to yield the largest reduction in model uncertainty (as in the classical EIG formulation), while simultaneously incorporating the probability of encountering instabilities along a potential scan. This “iterated” extension is essential in our experimental setup, where each query corresponds to a trajectory of sequential measurements rather than an isolated point.
In Fig. 3, we illustrate the iEIG acquisition function for a single composition and pH value, and compare it against a standard EIG acquisition function over the same scanning range. Note that both scanning up and scanning down lead to path-dependent measurements—i.e., can run into an instability—but our acquisition function aims to choose a design point (e.g., composition, initial voltage, scanning direction) that is expected both to be informative and to permit a larger number of measurements before hitting such an instability. In Fig. 4, we illustrate the acquisition optimization process, showing the queried value over the course of three time steps.
Upon completion of an experiment and subsequent data upload to AWS S3 by HELAO, an AWS EventBridge trigger invoked an AWS Lambda function. This function inserted metadata—including the S3 location of raw data and experiment details—into the lab DynamoDB table, adhering to the ESAMP (Event-Sourced Architecture for Materials Provenance) paradigm.24 A second Lambda function initiated data processing routines, generating performance metrics. These metrics were then associated with corresponding experiment requests within the data requests database.
The machine learning model iteratively updated its predictions based on newly available data. A Python client interfaced with the data requests API to retrieve processed data, then a Python script retrained the model, and computed acquisition scores (iterated-EIG values) for unevaluated design points. These scores were subsequently stored back into the data requests database. The HELAO orchestrator queried this database, identified the highest-priority acquisition score, and executed the corresponding spectro-electrochemical experiment sequence. Each entry in the data requests database contained a unique ID, experiment input parameters (composition, pH, V0, Vd), the calculated acquisition score, and ultimately, the resulting performance metrics upon experiment completion. This closed-loop system enabled dynamic prioritization and execution of experiments driven by the AI model. Fig. 5 provides a schematic overview of the interactions between instrument orchestration (HELAO), data orchestration (cloud services), and AI modeling entities.
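A toy sketch of the closed loop's data-request records and the orchestrator's priority selection follows; the field names and helper functions are illustrative and do not reflect the actual DynamoDB schema or API.

```python
import uuid

def make_request(frac_co, frac_ni, ph, v0, direction, score):
    """Illustrative shape of one entry in the data requests database."""
    return {
        "id": str(uuid.uuid4()),
        "params": {"frac_Co": frac_co, "frac_Ni": frac_ni,
                   "pH": ph, "V0": v0, "direction": direction},
        "acquisition_score": score,
        "metrics": None,  # filled in after the experiment completes
    }

def next_request(requests):
    """Pick the pending request with the highest acquisition score,
    mimicking the orchestrator's polling of the database."""
    pending = [r for r in requests if r["metrics"] is None]
    return max(pending, key=lambda r: r["acquisition_score"]) if pending else None
```

In the real system the scores are written by the AI science manager and the winning request is dequeued by HELAO; this sketch only captures the record shape and the max-score selection rule.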
Evaluating the performance of the online experimental procedure involves assessing both the probabilistic surrogate model p(β∣x, D) and the experimental design efficacy. This evaluation is complex due to the unavailability of the true underlying function f(x) and the corruption of measurements following instability events during potential scans of V, limiting ground truth access to a subset of observed measurements.
To assess the quality of our surrogate model, we utilize two distinct approaches. The first approach involves evaluating the error of the probabilistic surrogate model throughout the course of the experiments. Specifically, we compare the predicted value of a given measurement before a query is performed with the true measurement observed during experimentation. Recall that our Gaussian process model predicts the mean of our optical instability forecast, 𝔼[p(β|x, D)], which is then compared against a noisy draw of the ground truth optical instability measurement β ∼ f(x) + ε. Therefore, we anticipate that there will always be a lower bound on the possible error, owing to aleatoric uncertainty.
This type of error metric has been used in prior online experimentation studies,8,14,25 and it can be computed online during the course of the experiment without the need to wait until the conclusion of the experiment or rely on synthetic experimental setups.
In Fig. 6, we show the results of this error metric over the course of our eight composition libraries. The first six experimental scenarios involve binary combinations of (Co–Ni, Co–Sb, and Ni–Sb) oxides, while the final two experimental scenarios involve ternary (Co–Ni–Sb) oxide combinations. Specifically, we plot the absolute error,
|𝔼[p(β|xt, Dt−1)] − βt| | (5)
where xt is the design point queried at step t, βt is the resulting measurement, and Dt−1 denotes the data observed before that query.
Fig. 6 Absolute error of model predictions on each subsequent query, over all experiments. Top: for each plate, the subset of experimental compositions within the full set of ternary combinations are illustrated. Bottom: the absolute error, as defined in eqn (5), is shown for each query (excluding instabilities) on each plate.
Several interesting behaviors emerge as we examine the results over the course of the training. In general, we observe a decrease in error as additional measurements are acquired for a given composition space. This trend is relatively consistent across all experiments, and suggests that the model improves as it gains more observational data. For example, the elevated error observed in the first Ni–Co plate reflects its position as the very first experiment in the sequence, when the model had little prior data. More generally, early experiments begin with higher error that decreases as observations accumulate, with brief increases only when the design space expands substantially (e.g., from binary to ternary systems).
In our second approach, we record an error metric that cannot be computed online during the experiments, but can be computed after the experiments are finished. This approach still adheres to our constraints, as it avoids using corrupted measurements that occur after encountering an instability. In this approach, we take a held-out set of data, denoted Dk, from the full ternary (Co–Ni–Sb) oxide composition space, where we have ground truth measurements from the end of the experimental regime. This held-out dataset serves as a consistent reference set with which we can estimate the error of our model over the course of the full experiment.
In this approach, we compute the mean absolute error over predictions on our held-out set, defined as
(1/|Dk|) Σ(x, β)∈Dk |𝔼[p(β|x, Dt)] − β| | (6)
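The held-out mean absolute error amounts to a few lines of code; in this sketch, `predict_mean` stands in for the GP posterior mean and is an assumption rather than the platform's actual prediction interface.

```python
def mean_absolute_error(predict_mean, held_out):
    """Mean absolute error of posterior-mean predictions over a
    held-out set of (design point, measurement) pairs.

    predict_mean: callable x -> predicted mean for design point x.
    held_out: iterable of (x, beta) pairs with ground-truth measurements.
    """
    held_out = list(held_out)
    return sum(abs(predict_mean(x) - b) for x, b in held_out) / len(held_out)
```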
We show the results of this metric over the course of all eight experimental scenarios in Fig. 7. In addition to evaluating this metric on a common held-out dataset from the full ternary (Ni–Co–Sb) oxide composition space (solid lines), we also compute it on plate-specific held-out datasets for each experimental plate (dashed lines). The plate-specific held-out datasets reveal how model accuracy improves during each set of plate experiments, while the common dataset provides a consistent benchmark across plates and more directly reflects the objective of learning unseen compositions—e.g., in ternary space. Together, these two evaluations distinguish between within-plate learning dynamics and cross-plate generalization.
Fig. 7 Mean absolute error of model predictions on fixed held-out datasets given measurements on the ternary space, shown over all experiments. Top: for each plate, the subset of experimental compositions within the full set of ternary combinations are illustrated. Bottom: the solid lines depict the mean absolute error, as defined in eqn (6), on a common held-out dataset derived from the final ternary plate, and the dashed lines depict the mean absolute error on a plate-specific held-out dataset, for a sequence of queries (excluding instabilities) from each plate. Note that, for the final ternary plate, the solid and dashed lines are equivalent (as the plate-specific dataset is equal to the held-out dataset) and thus only one line is shown.
In general, we observe that the held-out error decreases smoothly as the experiment progresses, reflecting the continual improvement of the model as more data points are observed. The only exceptions to this smooth decrease occur in the first Ni–Sb binary experiment and the first ternary-combination experiment, where a temporary increase in error was observed. Notably, in the final ternary experiment, we see a large decrease in error. This is expected, as the common held-out set (solid lines in Fig. 7) was sourced from a ternary composition space with a similar pH value.
Overall, these findings demonstrate the robustness of our probabilistic surrogate model and experimental design procedure, as well as the ability of the model to adapt to the challenges posed by the experimental environment. The consistent reduction in error, especially following space expansion and brief anomalies, supports the validity of the active learning approach in guiding the experimental process effectively.
The cloud-based active learning platform successfully orchestrated the spectro-electrochemical stability campaign, acquiring 9755 CA measurements across 887 unique composition-pH combinations over 11 days. The platform demonstrated robust integration of the AI science manager, HELAO orchestrator, cloud data infrastructure, and remote hardware. Experiment selection was efficiently guided by the iterated-EIG acquisition function, prioritizing informative measurements based on model uncertainty and stability predictions. The optical instability metric employed does not account for irreversible spectral changes. To date, the Materials Project Pourbaix application20,26 provides the largest repository of aqueous electrochemistry data, where Pourbaix energetics are derived from the thermodynamics of dissolved molecular species and bulk solid state materials, without consideration of the electrode–electrolyte interface. Given the lack of kinetic passivation in the calculations, it is interesting to consider the nature of the relationship between the optical instability metric and the Pourbaix energetics. As we show in the SI, these data are poorly correlated, with no apparent predictability of the optical instability results based on the computational data, highlighting the critical role of experimentation in mapping the electrochemical behavior of solid state materials.
EventBridge – Amazon EventBridge is a serverless event bus that makes it easy to connect applications together using data from your own applications.
DynamoDB – Amazon DynamoDB is a managed NoSQL database service provided by Amazon Web Services (AWS). It supports key-value and document data structures and is designed to handle a wide range of applications requiring scalability and performance.
S3 – Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon uses to run its e-commerce network.
HELAO – Hierarchical Experimental Laboratory Automation and Orchestration, an open-source automation and orchestration software which facilitates parallel and distributed operation of devices and instruments through an interconnected network of web services deployed via FastAPI.
ESAMP – An event-sourced architecture for materials provenance management and application to accelerated materials discovery.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5dd00325c.
This journal is © The Royal Society of Chemistry 2025