Open Access Article
Luca Torresi ab and Pascal Friederich *ab
aInstitute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. E-mail: pascal.friederich@kit.edu
bInstitute of Anthropomatics and Robotics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany
First published on 7th April 2026
Currently, Bayesian optimisation is the most widely used algorithm for identifying informative experiments in self-driving labs (SDLs). While versatile, standard Bayesian optimisation relies on fixed experimental workflows with predefined parameters and objective functions. This prevents on-the-fly adjustments to operation sequences or the inclusion of intermediate results in the decision-making process. Therefore, many real-world experiments need to be adapted and simplified to fit standard SDL settings. In this paper, we introduce multi-stage Bayesian optimisation (MSBO), an extension to Bayesian optimisation that allows flexible sampling of multi-stage workflows and makes data-efficient decisions based on intermediate observables, which we call proxy measurements. MSBO is designed to address common SDL challenges, such as high downstream characterisation costs, sequential dependencies, and the effective use of proxy measurements. To evaluate this approach, we validate our method using computational simulations and retrospective datasets of chemical discovery, demonstrating its potential to accelerate future SDLs. We systematically compare the advantage of taking into account proxy measurements over conventional Bayesian optimisation, in which only the final measurement is observed. We find that across a wide range of scenarios, proxy measurements substantially improve both the time to find solutions and their overall optimality. This not only paves the way to use more complex and thus more realistic experimental workflows in autonomous labs but also to smoothly combine simulations and experiments in the next generation of SDLs.
Several machine learning algorithms have been deployed to drive SDLs, including genetic algorithms,7 reinforcement learning,8–10 and Bayesian optimisation (BO).11 Among these, BO is the most widely adopted method thanks to its ability to operate effectively with a limited number of evaluations, navigating noisy data and complex landscapes without the need for gradient information. BO has been successfully applied to SDLs for both the optimisation of single objectives,11–13 and, more recently, for the simultaneous optimisation of multiple objectives.14
Despite the considerable advantages these methods offer over traditional approaches, they also come with inherent limitations. A key shortcoming of all these frameworks is their reliance on a fixed experimental workflow. This rigidity prevents making on-the-fly adjustments to the planned sequence of operations. Additionally, standard approaches typically treat experiments as black boxes, ignoring the known structure of the problem, i.e. the sequential dependencies between process stages and their intermediate results. As a result, existing frameworks often prove too rigid for practical materials science experiments, which are rarely isolated black-box problems but rather complex, multi-stage processes with tunable parameters at each stage and measurable intermediate outcomes.
One illustrative example is the fabrication of perovskite solar cells, which involves multiple separate stages, such as the deposition of the electrode, transport, and perovskite layers.15 Although the quality of each layer critically impacts the final device performance, key metrics such as power conversion efficiency can only be measured after the full device is completed. While it is possible to optimise each layer individually,16 this approach fails to capture the complex interdependencies between the different components.
Going beyond multi-fidelity BO, Astudillo and Frazier24 extended BO to structured objectives by considering composite functions of the form f(x) = g(h(x)), where h is an expensive vector-valued black box and g is a cheap real-valued function. They modelled h with a Gaussian process (GP) and designed a Monte Carlo-based acquisition function, called expected improvement for composite functions, which uses intermediate measurements to guide the optimisation process. This framework was later generalised to directed acyclic graphs (DAGs) of functions,25 where each node depends on its parents and is modelled as an independent GP. This approach, termed Bayesian optimisation of function networks (BOFN), leverages known structure for more efficient optimisation but requires evaluating the full network at each iteration, limiting flexibility in selectively sampling sub-graphs.
Recent extensions, such as partially observable knowledge-gradient methods for function networks,26 relax the rigidity of standard BO by allowing selective evaluation of intermediate nodes. While the original formulation achieves strong query efficiency, it comes at a high computational cost due to its nested Monte Carlo estimators and repeated acquisition optimisations. A faster variant27 reduces runtime by optimising over subsets of the input space, with modest reductions in solution quality. Both approaches assume that intermediate nodes can be probed at arbitrary inputs, independent of upstream stages. While this abstraction facilitates algorithm design, it may not hold in practical laboratory settings, where intermediate measurements usually require executing preceding stages of the process, and the possible value ranges of observations at each stage are typically not known in advance.
Building on cascade Bayesian optimisation28 and BOFN, Kusakawa et al.29 proposed a method for cascade processes, modelling each stage as a separate GP and dynamically adjusting downstream stage parameters conditional on the measured outputs of earlier stages, with the core contribution focusing on this continuous, multi-stage look-ahead sampling. The effectiveness of the method was demonstrated using a combination of synthetic numerical test functions and a solar cell simulator learned from data.
An alternative paradigm for sample efficiency relies on dynamic resource allocation and early-stopping mechanisms; notable examples include Hyperband,30 BO hyperband,31 and Freeze–Thaw BO,32 which have become established as state-of-the-art methods for hyperparameter optimisation in neural networks. However, because such algorithms assume the temporal continuation of a single process (for instance, by extrapolating asymptotic learning curves as in Freeze–Thaw), they cannot model the transfer functions required when intermediate observables pass between structurally distinct, heterogeneous physical domains. Furthermore, approaches such as Hyperband employ deterministic pruning heuristics intended for massive parallelisation; such mechanisms are fundamentally incompatible with the costly, sequential experimentation inherent to self-driving laboratories.
Bayesian optimisation addresses the problem of finding the global optimum x* of an unknown objective function f over a search domain 𝒳:

| x* = arg maxx∈𝒳 f(x) | (1) |
BO is particularly suited for functions where evaluations are costly, noisy, or lack an analytic expression, precluding the use of gradient-based methods.33 The core idea is to balance exploration, i.e. sampling regions with high uncertainty, and exploitation, i.e. sampling regions likely to improve the current best-observed value, to reach the global optimum in a minimal number of function evaluations.
The BO process is iterative, relying on two essential components: a surrogate model and an acquisition function. The surrogate model is a probabilistic model, constructed using a small set of observations to approximate f(x). It provides fast estimates of predictions and associated uncertainty across the optimisation domain 𝒳. The acquisition function quantifies the expected utility of evaluating f at a specific point x. The next sampling location xt+1 is chosen by maximising this computationally inexpensive acquisition function. The alternating execution of these two steps efficiently generates a sequence of samples converging to the global optimum.
The most common choice of surrogate model is the Gaussian process (GP), a distribution over functions, specified completely by a mean function m(x) and a covariance function k(x, x′):

| f(x) ∼ GP(m(x), k(x, x′)). | (2) |
The covariance function, or kernel k(·, ·), is a positive semi-definite function (e.g., Matérn or radial basis function kernels) that defines the similarity between input pairs x and x′. The mean function is often taken to be zero for simplicity.
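To make the kernel choices concrete, here is a minimal sketch of the radial basis function and Matérn 5/2 kernels for scalar inputs (function names and the unit length-scale default are illustrative, not taken from the paper):

```python
import math

def rbf_kernel(x, x_prime, lengthscale=1.0):
    """Radial basis function (squared exponential) kernel."""
    r = abs(x - x_prime)
    return math.exp(-0.5 * (r / lengthscale) ** 2)

def matern52_kernel(x, x_prime, lengthscale=1.0):
    """Matérn kernel with smoothness parameter nu = 5/2."""
    r = abs(x - x_prime) / lengthscale
    s = math.sqrt(5.0) * r
    return (1.0 + s + s ** 2 / 3.0) * math.exp(-s)
```

Both kernels return 1 at x = x′ and decay with distance, which is exactly the notion of input similarity the text refers to.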
While standard GP training suffers from 𝒪(n3) computational complexity in the number of observations n, various approximation methods exist to scale GPs to large datasets.36 Furthermore, GPs can be extended to model multiple correlated outputs using multi-task Gaussian processes,37 often employing an intrinsic coregionalization model to define the inter-task covariance structure.38
Acquisition functions translate the probabilistic belief from the surrogate model into a metric for selecting the next query point. Maximising the acquisition function α(x) replaces the original expensive optimisation problem with a computationally cheap one. Acquisition functions are typically defined as the expectation of a utility function l(y), which measures the desirability of a hypothetical outcome y given the current input x. If p(y|x) is the predictive distribution from the surrogate model, α(x) is defined as:
| α(x) = 𝔼p(y|x)[l(y)] = ∫l(y)p(y|x)dy | (3) |
This integral is often approximated using methods such as Monte Carlo estimation. Standard choices for acquisition functions include the expected improvement (EI) and the upper confidence bound.39
| EI(x) = (µ(x) − y*)Φ(z) + σ(x)ϕ(z), with z = (µ(x) − y*)/σ(x) | (4) |

where µ(x) and σ(x) denote the posterior mean and standard deviation of the surrogate model, y* is the best objective value observed so far, Φ(·) is the cumulative distribution function of the standard normal distribution, and ϕ(·) is the probability density function of the standard normal distribution.
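The closed-form EI of eq. (4) is easy to evaluate from the GP posterior mean and standard deviation at a candidate point; a minimal sketch (the function name is ours):

```python
import math

def expected_improvement(mu, sigma, y_best):
    """Closed-form EI for maximisation, given the GP posterior mean mu
    and standard deviation sigma at a candidate point."""
    if sigma <= 0.0:
        # Degenerate case: the surrogate is certain about the outcome.
        return max(mu - y_best, 0.0)
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mu - y_best) * Phi + sigma * phi
```

For example, with µ = 1, σ = 1 and y* = 0, EI ≈ 1.083, balancing the expected gain against the residual uncertainty.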
We define a cascade process as a specific type of multistage system presenting a sequential and dependent structure (see Fig. 1). In a cascade process, the product, or output, (hi−1) of any given stage serves as input for the subsequent stage (i). Formally, a cascade process consists of N stages, where each stage i is represented by a function fi. The stage function depends on a set of controllable parameters xi and the output from the preceding stage:
| hi = fi(xi, hi−1) + εip, for i = 1,…,N. | (5) |
In general, the product hi of each stage is not directly accessible but is observed through a measurement mi, obtained by applying a linear measurement operator Mi with additive measurement noise εim:

| mi = Mihi + εim | (6) |

The optimisation objective is the final measurement, y = mN, where we assume the final measurement is scalar. While we restrict this study to scalar objectives, the framework can be readily extended to multi-objective BO.
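As a toy illustration of eqn (5) and (6), consider a two-stage cascade with scalar states, additive Gaussian process noise, and identity measurement operator (the stage functions and noise model below are invented for illustration, not the paper's benchmarks):

```python
import math
import random

def stage1(x1, h0=0.0):
    """f_1: first-stage transformation of the control parameter x1."""
    return math.sin(3.0 * x1) + h0

def stage2(x2, h1):
    """f_2: second stage, depending on its own control x2 and the upstream product h1."""
    return -((h1 - 0.5) ** 2) + 0.1 * x2

def run_cascade(x1, x2, noise_std=0.0, rng=random):
    """h_i = f_i(x_i, h_{i-1}) + eps_p ; m_i = h_i (M = I, no measurement noise)."""
    h1 = stage1(x1) + rng.gauss(0.0, noise_std)
    h2 = stage2(x2, h1) + rng.gauss(0.0, noise_std)
    return h1, h2  # (m1, y = m2)
```

Running the full cascade yields both the intermediate proxy m1 and the final objective y, mirroring the sequential dependency the equations describe.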
The use of an inventory allows for sample continuation: the optimisation can resume from any intermediate state, and partially processed sample states can be retrieved and applied as inputs to subsequent sub-processes. This system promotes data efficiency, since intermediate results are preserved and reused rather than being discarded, thereby avoiding redundant computations. Finally, this architecture supports dynamic resource allocation by allowing the algorithm to select which sub-process to sample next based on acquisition function values.
This resumable sampling capability allows the algorithm to dynamically balance effort between advancing existing samples with high expected utility through subsequent evaluation stages versus initiating new samples to explore other regions of the parameter space. The resulting framework is particularly valuable in real-world scenarios, enabling dynamic decision-making and enhancing overall data efficiency.
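The inventory can be thought of as a pool of partially processed samples, each tagged with the last completed stage; a minimal sketch under that reading (class and method names are ours, not from the paper's implementation):

```python
class Inventory:
    """Stores intermediate measurements so that optimisation can resume
    from any partially processed sample."""

    def __init__(self, n_stages):
        self.n_stages = n_stages
        self.records = []  # each record: (completed_stage, params_so_far, last_measurement)

    def add(self, stage, params, measurement):
        """Register a sample that has just completed `stage`."""
        self.records.append((stage, params, measurement))

    def continuation_points(self):
        """Valid continuations: every stored sample that has not yet reached
        the final stage, plus the option of a fresh sample at stage 1."""
        points = [(1, None)]  # initialise a new sample at the first sub-process
        for stage, params, m in self.records:
            if stage < self.n_stages:
                points.append((stage + 1, m))
        return points
```

For a two-stage process, a sample stored after stage 1 with measurement 0.8 yields the continuation point (2, 0.8) alongside the always-available fresh start (1, None).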
Each stage function fi is modelled by an independent GP over an augmented input vector xaugi that combines the stage's controllable parameters with information from the preceding stage:

| fi(xaugi) ∼ GP(µi(xaugi), ki(xaugi, x′augi)) | (7) |

We consider two configurations for the augmented input:
• Standard cascade: xaugi = [xi, mi−1], creating a direct Markovian dependency.
• Residual connections: xaugi = [xi, xi−1, mi−1], enabling the model to preserve information from previous inputs, analogous to residual links in deep neural networks. This configuration helps mitigate information loss, especially with partially observed outputs.
To propagate the epistemic uncertainty (the uncertainty in the model due to the limited amount of training data) of the GPs along the chain, we implement a Monte Carlo sampling approach. For a cascade of N processes, uncertainty propagation proceeds as follows: at stage i = 1, we draw S samples from the posterior of the first GP, m(s)1 ∼ GP1(x1) for s = 1, …, S. For subsequent stages i > 1, each sample m(s)i−1 from the previous process serves as an input to the next GP, yielding m(s)i ∼ GPi([xi, m(s)i−1]). This sampling-based approach naturally captures the non-Gaussian, potentially multimodal uncertainty distributions that emerge from the cascade of nonlinear transformations, providing realistic uncertainty quantification for the final process outputs. The resulting ensemble of samples {m(s)N}Ss=1 represents the compound uncertainty and enables acquisition functions to make decisions that account for uncertainty accumulation throughout the entire process cascade.
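The sampling scheme above can be sketched as follows, with `random.gauss` stand-ins for posterior draws from the stage GPs (the toy posteriors are assumptions for illustration, not the paper's surrogate models):

```python
import random

def propagate_uncertainty(stage_posteriors, xs, n_samples=1000, rng=random):
    """Monte Carlo propagation of epistemic uncertainty through a cascade.

    stage_posteriors[i](x_i, m_prev, rng) draws one sample from the posterior
    of stage i's surrogate at control input x_i, conditioned on an upstream
    sample m_prev (None for the first stage). Returns the ensemble of
    final-stage samples m_N^(s).
    """
    ensemble = []
    for _ in range(n_samples):
        m = None
        for post, x in zip(stage_posteriors, xs):
            m = post(x, m, rng)  # feed each draw into the next stage's surrogate
        ensemble.append(m)
    return ensemble

# Toy posteriors (assumed): stage means depend on the inputs, fixed predictive std.
post1 = lambda x, m, rng: rng.gauss(x, 0.1)
post2 = lambda x, m, rng: rng.gauss(m + x, 0.2)
```

With these stand-ins, the final ensemble has variance ≈ 0.1² + 0.2², i.e. the compound uncertainty accumulated across both stages.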
We formulate the acquisition function recursively. For the final stage N, the acquisition function is the standard EI on the objective y, conditioned on the previous measurement mN−1. For any preceding stage i < N, the acquisition function is defined as the expected value of the acquisition function of the subsequent stage i + 1, integrated over the probabilistic outcome of the current stage mi, thus allowing the uncertainty at intermediate stages to propagate forward to the final objective.
Formally, let y* denote the best objective value observed so far. The nested acquisition function α is defined recursively as:
| αN(xN; mN−1) = EI(xN; mN−1),  αi(xi; mi−1) = 𝔼mi[maxxi+1 αi+1(xi+1; mi)] for i < N | (8) |
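For a two-stage cascade, the recursion can be sketched with Monte Carlo sampling over m1 and a simple grid maximisation over candidate x2 values (the `ei` helper, the grid, and the posterior stand-ins are illustrative assumptions, not the paper's implementation):

```python
import math
import random

def ei(mu, sigma, y_best):
    """Closed-form expected improvement for maximisation."""
    if sigma <= 0.0:
        return max(mu - y_best, 0.0)
    z = (mu - y_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - y_best) * Phi + sigma * phi

def alpha_final(x2, m1, y_best, gp2):
    """Stage-N acquisition: standard EI, conditioned on the previous measurement m1."""
    mu, sigma = gp2(x2, m1)
    return ei(mu, sigma, y_best)

def alpha_stage1(x1, y_best, gp1_sample, gp2, x2_grid, n_mc=200, rng=random):
    """alpha_1(x1) = E_{m1}[ max_{x2} alpha_2(x2; m1) ], estimated by Monte Carlo."""
    total = 0.0
    for _ in range(n_mc):
        m1 = gp1_sample(x1, rng)  # one posterior draw of the intermediate outcome
        total += max(alpha_final(x2, m1, y_best, gp2) for x2 in x2_grid)
    return total / n_mc

# Toy posteriors (assumed): stage 1 predictive N(sin(3 x1), 0.1^2);
# stage 2 predictive mean -(m1 - 0.5)^2 + 0.1 x2 with std 0.05.
gp1_sample = lambda x1, rng: rng.gauss(math.sin(3.0 * x1), 0.1)
gp2 = lambda x2, m1: (-(m1 - 0.5) ** 2 + 0.1 * x2, 0.05)
```

The inner maximisation plays the role of the downstream acquisition optimum, so uncertainty about m1 is averaged over rather than ignored.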
At each iteration, the optimiser queries the inventory to identify all valid continuation points, as well as the option to initialise a new sample at the first sub-process. For each candidate process i and its corresponding available input mi−1 (retrieved from the inventory), we optimise the parameters xi to maximise the nested acquisition function. To prevent stagnation in regions where the EI vanishes (i.e., is below a numerical tolerance of 5 × 10−3 across the search space), the algorithm automatically switches to the upper confidence bound acquisition function as a tie-breaking mechanism. The associated exploration weight, β, follows a linear decay schedule (from an initial value of 100 to 1) as a function of the remaining budget to favour exploration.
The final selection of the next experiment is determined by comparing the maximal acquisition values across all stages. The framework allows for cost-weighted selection to account for heterogeneous experimental costs ci per stage:

| i* = arg maxi [maxxi αi(xi; mi−1)/ci] | (9) |
To ensure numerical stability, we enforce strictly positive costs (ci > 0). For stages with physically negligible costs, a small positive constant (e.g., ε = 10−6) must be assigned to avoid singularities. In the experiments reported here, setting a uniform cost (ci = 1) for all stages yielded superior optimisation performance compared to explicitly modelling the cost ratios; therefore, we use uniform costs unless otherwise stated. We hypothesise that strict cost-weighting may inadvertently penalise expensive but highly informative regions. In contrast, a uniform cost structure ensures the algorithm prioritises global information gain and solution quality over short-term cost savings. The complete algorithmic procedure governing stage selection and the corresponding inventory updates is detailed in Algorithm 1 of the SI.
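The selection rule of eq. (9) then reduces to an argmax over per-stage acquisition optima, optionally divided by the stage costs; a minimal sketch (function and argument names are ours):

```python
def select_next_stage(stage_acq_optima, costs=None):
    """Pick the stage whose maximised acquisition value, weighted by a
    strictly positive cost c_i, is largest.

    stage_acq_optima: {stage_index: max_x alpha_i(x)}
    costs: {stage_index: c_i}; defaults to the uniform cost c_i = 1
    used in the paper's experiments.
    """
    if costs is None:
        costs = {i: 1.0 for i in stage_acq_optima}
    for c in costs.values():
        assert c > 0.0, "costs must be strictly positive (use e.g. 1e-6)"
    return max(stage_acq_optima, key=lambda i: stage_acq_optima[i] / costs[i])
```

With uniform costs, optima {1: 0.4, 2: 0.3} select stage 1; making stage 2 half as expensive (c2 = 0.5) flips the choice to stage 2, illustrating how cost-weighting shifts sampling toward cheap stages.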
The optimisation begins with an initialisation phase, in which an initial dataset is collected through quasi-random sampling of the entire cascade to build an initial surrogate model. To this aim, we use Sobol sequences, ensuring space-filling properties in the high-dimensional joint input space. In all experiments, the size of the initial dataset was set to 2(D + 1), where D is the total number of controllable parameters (components of all xi for i = 1, …, N). This is followed by an adaptive optimisation phase, where new sampling points are selected based on the acquisition function. In this adaptive phase, the algorithm can choose from which sub-process to sample next, rather than being restricted to evaluating the entire cascade from the initial input. We allow users to define a constraint on the minimal relative frequency at which each sub-process is sampled. This constraint is essential in settings where the final sub-process is relatively easy to model. In this scenario, the high uncertainty propagating from complex upstream stages can dominate the acquisition function, leading the algorithm to undersample the final stage and consequently fail to identify the optimal set of parameters (i.e., the best objective value).
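The 2(D + 1) initialisation budget is easy to state in code; here `random.uniform` stands in for a Sobol sequence (in practice a proper quasi-random generator, e.g. `scipy.stats.qmc.Sobol`, would be used, and the function names are ours):

```python
import random

def n_initial_points(stage_dims):
    """Initial dataset size 2(D + 1), with D the total number of
    controllable parameters summed over all stages."""
    D = sum(stage_dims)
    return 2 * (D + 1)

def initial_design(stage_dims, bounds=(0.0, 1.0), rng=random):
    """Stand-in for quasi-random sampling: one joint parameter vector per point."""
    n = n_initial_points(stage_dims)
    D = sum(stage_dims)
    lo, hi = bounds
    return [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(n)]
```

For a two-stage cascade with two first-stage and one second-stage parameter, D = 3 and the initial design contains 8 full-cascade evaluations.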
• Random search: in each iteration, the set of candidate parameters x = (x1, …, xN) is drawn from a uniform distribution. This establishes a lower bound for optimisation performance.
• Standard BO: this baseline treats the entire multi-stage process as a single black-box function, optimising the final objective y(x1, …, xN) without leveraging intermediate measurements. It uses the expected improvement acquisition function.
• BOFN:25 this baseline leverages the known function network structure and intermediate measurements via the same cascade GP surrogate model employed by MSBO. However, it operates under a monolithic evaluation scheme, requiring the execution of all stages per iteration to obtain the final objective. Consequently, it neither maintains an inventory for resumable sampling nor employs a stage-aware acquisition rule; rather, it applies a standard expected improvement acquisition function across the entire network.
Optimisation performance is quantified via simple regret, yopt − yt*, where yt* is the best objective value observed up to iteration t. In settings with process noise, noisy samples may occasionally yield values exceeding the true, denoised optimum. To address this, we define the reference yopt as the true optimum plus three times the standard deviation of the cumulative output noise. This prevents the regret metric from becoming undefined or artificially saturated due to stochastic outliers. For the real-world task, we report percentile-based measures that reflect the best-performing fraction of the dataset discovered by the optimiser.
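The noise-robust regret reference can be written directly from this definition (the function name is ours; the 3σ offset follows the text):

```python
def simple_regret(best_observed, y_true_opt, noise_std=0.0):
    """Simple regret against a noise-robust reference:
    y_ref = true optimum + 3 * std of the cumulative output noise,
    so that lucky noisy observations cannot push regret below zero."""
    y_ref = y_true_opt + 3.0 * noise_std
    return y_ref - best_observed
```

In the noise-free case this reduces to the plain regret yopt − yt*; with noise, even an observation slightly above the true optimum still yields a positive regret.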
We first consider a two-stage cascade with low-dimensional inputs, so that both landscapes can be visualised directly. In these experiments, we set M = I and both process and measurement noise to 0. The complexity of the first and second stages is set to 8 and 2, respectively. In our framework, complexity is controlled by the number of random seed points used to construct the synthetic functions, where higher values correspond to more rugged landscapes and increased modelling difficulty of the objective (see SI A).

We find that, for any given cost, MSBO consistently achieves lower regret than the standard BO baseline and random search, demonstrating improved data efficiency (see Fig. 3 for representative results from these experiments). All experiments are averaged over 10 independent runs to obtain uncertainty bounds and to ensure statistical significance. Furthermore, while MSBO suggests a similar number of first-stage function evaluations as BO (left panels in Fig. 3B and C), it allocates second-stage resources more efficiently, concentrating samples in the region of the global optimum. In the standard BO run, the samples for the second process are broadly scattered across the domain and are not concentrated near the true optimum of the second-stage landscape. By contrast, MSBO concentrates the majority of its second-stage samples in the region close to the maximum of the second-stage objective, showing that the method is substantially more efficient at allocating second-stage evaluations to informative regions.
MSBO discovers and explores multiple high-performing modes in the first process: samples are clustered around several distinct regions of the f1 landscape that correspond to high-valued objectives, indicating the algorithm's ability to identify and maintain promising alternatives rather than collapsing prematurely to a single local mode. This behaviour is a direct consequence of (i) the cascade GP that propagates multi-modal uncertainty, and (ii) the nested, resumable acquisition strategy that allows the optimiser to selectively invest further budget only for the most promising samples, pruning the search space early. More examples are provided in SI B (Fig. S7 and S8).
In this benchmark, the first-stage control parameters are x1, the intermediate observables are m1 = h1, and the second-stage control parameters are x2. We assume fully observable states (M = I) and no noise. The synthetic generator allows us to independently adjust the complexity of both the first (f1) and second (f2) sub-processes (see SI A for details). This enables a broad exploration across different relative complexity settings.

To explore different possible real-world scenarios, our experiments test nine combinations of first- and second-stage complexities. We compare MSBO with standard BO (which treats the cascade as a monolithic black box) and the BOFN baseline, which can also use the intermediate measurement m1 = h1. We find that MSBO consistently outperforms the other baselines in all scenarios. The ability of MSBO and BOFN to learn the intermediate mapping ĥ1(x1) provides a strong driving signal, overcoming the difficulties standard BO faces in optimising over the high-dimensional joint space (x1, x2). This is clearly demonstrated by the significant performance gap that both MSBO and BOFN exhibit with respect to BO across the entire sweep. Furthermore, MSBO's multi-stage acquisition function and support for resumable sampling provide a substantial additional boost in performance over all baselines, including BOFN. MSBO consistently achieves the lowest regret for a given cost and maintains a clear margin over the baselines. This advantage persists whether the difficulty arises primarily in the first stage (left column), in the second stage (bottom row), or simultaneously in both. The results are illustrated in Fig. 4, which reports the mean simple regret as a function of normalised sampling cost, aggregated over 20 independent runs.
Specifically, our results suggest that MSBO's advantage is not confined to specific complexity patterns: the ability to decouple upstream and downstream exploration, and to commit second-stage resources only when justified by intermediate observations, yields systematic improvements. This is evident not only when the initial process (f1) is more complex (top-left corner), but crucially, even when the second process (f2) is the most difficult to regress (bottom-right corner). This observation suggests substantial potential for applying MSBO in scenarios where no highly informative proxy measurement is available. Even in such cases, low-fidelity or indirectly related observations, auxiliary experiments, and simulations can improve the overall sample efficiency of complex and costly evaluations of the actual objective function.
Furthermore, we observe that the relative sampling frequency of first-stage and second-stage evaluations adjusts dynamically, demonstrating the framework's ability to allocate resources depending on the relative complexity of the stages. We report the number of samples drawn from each of the two subprocesses for each setting in SI B, Fig. S3.
To evaluate how robust MSBO remains under such realistic conditions, we examine its behaviour when the relative costs of the two stages are varied. We focus on the process defined by complex upstream dynamics (corresponding to the top-left case in Fig. 4) and compare three regimes: a uniform-cost setting (ratio 1:1); a downstream-heavy heterogeneous-cost setting (ratio 1:10), where the second stage is ten times more expensive than the first; and an upstream-heavy setting, representing the inverse scenario (ratio 10:1) (see Fig. 5). As detailed in Section 4.1, the total cost of a full cascade evaluation is normalised to 1 in all experiments. Consequently, the baseline methods (standard BO and BOFN), which do not support dynamic early stopping, exhibit identical behaviour regardless of the internal cost distribution. By contrast, MSBO naturally adapts to the cost asymmetry through its resumable sampling mechanism. Even though MSBO does not explicitly incorporate cost parameters into the acquisition function, its resumable sampling mechanism allows it to implicitly capitalise on low-cost intermediate evaluations.
Empirically, MSBO exhibits improved data efficiency in the heterogeneous-cost regime compared to the uniform one. The MSBO algorithm relies more heavily on the inexpensive first stage to discard unpromising candidates early, and only progresses samples to the costly second stage when their intermediate results indicate sufficiently high utility. This behaviour leads to faster convergence toward better-performing solutions. Furthermore, in the inverted 10:1 cost regime, while the absolute budget saved by early stopping inherently diminishes due to the low cost of the downstream stage, MSBO retains a strict performance advantage over both standard BO and BOFN. This demonstrates that the flexibility gained by decoupling the sampling processes allows MSBO to systematically outperform baselines irrespective of whether the first or second stage constitutes the primary cost driver.
When evaluating our model on a three-stage cascade process, we observe that the performance advantages of MSBO, as demonstrated in the two-stage setting, extend effectively to this deeper process, where it consistently outperforms the baselines (see aggregated results in Fig. S4). To test the impact of partial observability of intermediate results, we evaluate our model in a more complex three-stage cascade, where each intermediate process outputs a four-dimensional state, of which only the first two dimensions are accessible to the optimiser. Consequently, half of the information required to fully model each stage is missing. Due to the increased complexity and the information loss, both standard BO and BOFN degrade to the performance level of random search, whereas MSBO remains significantly more efficient (see Fig. S5). This result is significant, as it reflects a more realistic experimental setting in which the intermediate result of an experiment is not fully observable: for example, a thin-film material is produced, but only selected properties (e.g. conductivity or absorption) are measured, while other details (microstructure, interfaces, further thin-film properties) remain inaccessible.
Finally, to assess robustness in realistic laboratory settings, we evaluate performance on a two-stage cascade with process noise ε1p. MSBO demonstrates superior robustness in this challenging regime compared to the baselines, consistently identifying high-performing solutions by effectively leveraging intermediate measurements and flexibly sampling from the various subprocesses. Furthermore, calculating the denoised performance of the sample with the best observed objective y reveals that relying on single observations often leads to selecting suboptimal parameters driven by favourable noise outliers. Conversely, selecting the final parameters based on the surrogate model's maximum predicted mean among the collected samples consistently yields superior performance. This result suggests that in noisy real-world environments, a model-derived policy for selecting the final best-performing set of parameters is significantly more reliable than one based strictly on historical observations, as the model effectively filters out stochastic fluctuations. This advantage is particularly pronounced for MSBO. Because MSBO models the underlying process structure more accurately than the baselines, the performance gain achieved by relying on its surrogate, rather than on single stochastic outcomes, is significantly larger (see Fig. S6).
S) for nearly 10 000 unique chemical compounds. The database aggregates and curates data from nine different public datasets and includes corresponding molecular representations. In our optimisation setup, we employ the partition coefficient (log P), computed via RDKit, as a proxy for the experimental solubility values. Given that the computational cost of calculating log P is negligible compared to the cost of performing wet-lab experiments, we treat the first-stage cost as effectively zero. We note that this specific experiment serves as a conceptual demonstration using real-world data. In scenarios where a proxy incurs negligible computational cost, and the design space is finite and discrete, exhaustive proxy evaluation remains preferable to any adaptive strategy. However, MSBO is explicitly designed for typical SDL conditions where exhaustive evaluation is not possible; such constraints arise either because the design space is continuous (e.g., flow rates, temperatures), rendering exhaustive enumeration impossible, or because the proxy itself incurs a non-trivial computational or physical cost. In this setting, we use the residual connection configuration for our surrogate model (as described in Section 3.1) to mitigate the information loss inherent in compressing the molecular features into a single scalar proxy.

Furthermore, the comparison with BOFN reveals that simply modelling the cascade structure is insufficient. In the FreeSolv task, BOFN performs worse than standard BO, whereas MSBO's resumable sampling strategy effectively leverages the proxy measurements to outperform the baselines (see Panel D). Finally, MSBO exhibits the most rapid optimisation trajectory in the early stages (costs 0–20), indicating superior initial exploration and immediate identification of high-utility regions. These results confirm that the cascade structure provides a robust navigational signal, enabling efficient optimisation even with high-dimensional molecular representations and varying regimes of proxy fidelity and cost.
We validated the framework through extensive synthetic benchmarking, demonstrating that MSBO consistently achieves lower regret than standard baselines and network-based alternatives (BOFN) while using a smaller budget. These experiments confirmed the method's robustness across various regimes of stage complexity and heterogeneous cost ratios. Furthermore, our evaluation of real-world optimisation tasks, including the optimisation of molecular solubility and electronic properties, revealed significant efficiency gains. Notably, MSBO identified top-performing molecules with considerably fewer resources than traditional methods, validating its practical utility in high-dimensional search spaces.
Ultimately, these results suggest that incorporating intermediate proxy measurements is a critical step toward more autonomous and efficient materials discovery. By bridging the gap between cheap computational approximations and expensive experimental validations, MSBO offers a scalable pathway for hybrid computational-experimental self-driving labs. The MSBO approach can replace (potentially biased) multi-stage funnel approaches, where selection criteria need to be defined beforehand, and can thus efficiently learn to optimise in high-dimensional parameter spaces with multiple intermediate observations. This opens possibilities to extend self-driving labs from confined optimisation tasks to more complex open challenges of autonomous discovery.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5dd00572h.
| This journal is © The Royal Society of Chemistry 2026 |