Open Access Article
Amanda A. Volk,*ab Kristofer G. Reyes,cd Jeffrey G. Ethier b and Luke A. Baldwin*b
aNational Research Council, Washington, District of Columbia 20001, USA
bDepartment of Materials Design and Innovation, University at Buffalo, Buffalo, NY 14260, USA. E-mail: luke.baldwin.1@us.af.mil
cMaterials and Manufacturing Directorate, Air Force Research Laboratory, Wright–Patterson Air Force Base, OH 45433, USA
dBrookhaven National Laboratory, Computational and Data Science Directorate, Upton, NY 11973, USA
First published on 6th March 2026
Asynchronous Bayesian optimization is a recently implemented technique that allows experimental systems to operate in parallel and supports disjointed workflows in autonomous experimentation settings. In contrast to serial Bayesian optimization, which selects one experiment at a time and waits for its measurement before selecting the next, asynchronous policies sequentially assign multiple experiments before measurements are available and evaluate new measurements continually as they arrive. This technique allows for faster data generation and therefore faster optimization of an experimental space. This work extends the capabilities of asynchronous optimization methods beyond prior studies by evaluating policies that incorporate pessimistic and random predictions in the training data set. The conventional realistic prediction method and five additional asynchronous policies were evaluated in a simulated environment and benchmarked against serial sampling. In many of the tested scenarios, the pessimistic prediction asynchronous policy reached optimum experimental conditions in significantly fewer experiments than both existing asynchronous methods and serial policies, and proved less susceptible to convergence onto local optima at higher dimensions. Even without accounting for the faster sampling rate enabled by asynchronous operation, the pessimistic asynchronous algorithm could enable more efficient algorithm-driven optimization of high-cost experimental spaces. Accounting for sampling rate, the presented asynchronous algorithm could facilitate faster and more robust optimization in parallel autonomous experimentation settings.
One strategy for resolving incomplete utilization of resources is batch sampling, also referred to as parallel sampling. In batch sampling, a set of experiments is defined and conducted with complete utilization of parallelized experimental resources during each stage of an experimental process; the measurements from that set are then simultaneously returned to the algorithm for selection of the next set of experiments.10 This approach is suitable for select experimental environments, such as combinatorial screening platforms or high time cost measurements. However, batch sampling poses several intuitive challenges in sampling efficiency. First, while equipment utilization is improved, there is typically still equipment down time when alternating between the different stages of the experiments. Second, batch sampling often does not maximize data availability in algorithm decision making. Unless the experimental system is inherently structured for batch sampling, there is typically a missed opportunity to complete an experiment, and its measurement, that could inform the experiment selection algorithm before all the experiments in the set have been conducted. Finally, batch methods are not suitable for experimental systems with time-dependent outcomes. For example, if an experiment were to produce a material that degrades over time, batch methods would not maintain a uniform time step between experiment and measurement, resulting in imprecise data generation.
In response to the constraints of batch sampling strategies, asynchronous sampling methods have recently been implemented in high-cost experimental environments,11 specifically in delocalized experimentation networks.12,13 Shown in Fig. 1, asynchronous sampling methods implement similar strategies to batch sampling by selecting multiple experiments without completing measurements. However, in asynchronous designs, experiments are continually measured and added to the data set while other experimental steps are being conducted. In an asynchronous Bayesian optimization design, there is a moving window buffer that contains placeholder data for the currently running experiments. This buffer set is appended to the real value data set for model training. When an experiment measurement is complete, the real data replaces the placeholder data. Then, a new experiment is selected, and the placeholder data is added to the buffer. Several strategies have been implemented to generate placeholder values in asynchronous Bayesian optimization, including local penalty strategies14–17 and realistic constant liar predictions,11 among others.18,19 Within these studies, several acquisition functions and strategies have been evaluated, including Thompson sampling, expected improvement, and upper confidence bounds. In prior studies, asynchronous sampling resulted in faster data generation rates and therefore faster approach to optimal experimental conditions.
Fig. 1 Illustration of sampling policies that can be harnessed for optimization algorithms. Here a general workflow is depicted for (A) serial sampling and (B) asynchronous sampling.
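As a concrete illustration of the buffer mechanism described above, the sketch below runs one asynchronous selection step on a toy one-dimensional maximization problem, using a Gaussian process belief model and a realistic (constant liar) placeholder buffer. The toy function, buffer size, grid resolution, and exploration constant are illustrative choices, not values from this study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
f = lambda x: np.exp(-40 * (x - 0.7) ** 2)   # toy ground truth (not TriPeak)

X = rng.uniform(0, 1, (5, 1))                # inputs with completed measurements
y = f(X).ravel()                             # measured outputs

buffer_X = rng.uniform(0, 1, (3, 1))         # currently running experiments
gp = GaussianProcessRegressor().fit(X, y)
buffer_y = gp.predict(buffer_X)              # realistic placeholder values

# Train on real data plus placeholders, then pick the next experiment
# with an upper confidence bounds policy (lam = exploration constant).
gp_aug = GaussianProcessRegressor().fit(
    np.vstack([X, buffer_X]), np.concatenate([y, buffer_y]))
grid = np.linspace(0, 1, 201).reshape(-1, 1)
mu, sigma = gp_aug.predict(grid, return_std=True)
lam = 2.0
x_next = grid[np.argmax(mu + lam * sigma)]   # condition assigned to a free worker
```

When the measurement for `x_next` completes, its placeholder is replaced with the real value and the loop repeats, which is the cycle depicted in Fig. 1B.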
Most prior studies in asynchronous Bayesian optimization algorithms implement a realistic prediction, where the placeholder value is assumed to be the predicted output of the belief model. The assumed primary mechanism of this approach is that the model induces a reduced uncertainty around the previously sampled point, thereby discouraging repeat sampling. In this work, we present five asynchronous sampling value prediction policies that operate around alternative assumptions: (1) pessimistic constant liar prediction, (2) random prediction, (3) descending pessimism constant liar prediction, (4) ascending pessimism constant liar prediction, and (5) lower confidence bounds prediction. These alternative methods implement three strategies: pessimistic predictions, which presume that the outcome of queued experiments is the most undesired value; random predictions, which presume a uniform sampling of all possible outcomes without consideration of prior information; and lower confidence bounds predictions, which infer an undesirable outcome within the prediction bounds of the belief model.
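The placeholder strategies named above can be sketched as simple functions of the belief model's prediction (`mu`) and uncertainty (`sigma`) at the buffered inputs, assuming outputs normalized to [0, 1] so that the pessimistic value is zero. The function names are ours, and the direction of the weighting in the descending/ascending variants is one plausible reading rather than a definition taken from this work.

```python
import numpy as np

def realistic(mu, sigma, rng):
    # constant liar: placeholder equals the model prediction
    return mu

def pessimistic(mu, sigma, rng):
    # presume the worst possible (normalized) outcome for every buffered run
    return np.zeros_like(mu)

def descending_pessimism(mu, sigma, rng):
    # oldest buffer entry fully pessimistic, newest closest to realistic
    w = np.linspace(0.0, 1.0, len(mu))
    return w * mu

def ascending_pessimism(mu, sigma, rng):
    # oldest buffer entry realistic, newest fully pessimistic
    w = np.linspace(1.0, 0.0, len(mu))
    return w * mu

def random_prediction(mu, sigma, rng):
    # uniform over all possible outcomes; resampled at every model fit
    return rng.uniform(0.0, 1.0, size=len(mu))

def lower_confidence_bound(mu, sigma, rng, beta=2.0):
    # undesirable outcome inferred from the model's own prediction bounds
    return mu - beta * sigma
```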
In each of these policies, we explore methods for selecting the values used in the placeholder prediction buffer. We benchmark these five policies against serial sampling and realistic prediction asynchronous sampling on a selection of representative ground truth functions using an upper confidence bounds decision policy; other decision policies could generate different outcomes. The simulated optimization campaigns on ground truth functions showed that, with an upper confidence bounds decision policy and a Gaussian process regressor, the realistic prediction policy and all five alternative policies outperformed serial sampling considerably when accounting for the improved sampling rate. Furthermore, all five alternative policies consistently performed competitively with serial sampling, and in some cases significantly outperformed it, when considering the number of experiments conducted. Additionally, the pessimistic policy was shown to provide some durability to low exploration constants in the upper confidence bounds policy, and the policy's performance advantage decreases at higher exploration constant values.
The pessimistic constant liar prediction also outperforms serial sampling on discrete input spaces of real-world data modeled with a random forest regressor. The proposed strategies not only generate data at a faster rate than serial sampling, but also select experiments equally or more efficiently. Implementation of the proposed algorithm has notable implications in asynchronous experiment conduction loops for high-cost experiments, and could improve sampling efficiency of serial closed-loop systems. The findings of this study provide a broader context on the role of non-realistic prediction policies in asynchronous Bayesian optimization within real-world relevant design spaces.
BReal = [y′(xC+1), y′(xC+2), …, y′(xC+NBuff)]

BPess = [0, 0, …, 0]
Each value in the random prediction buffer is drawn from a uniform random distribution bounded from 0 to 1. Additionally, all values in x are constrained to the bounds (0, 1). A pessimistic value is defined as the lower bound of the expected response range, which in the case of the TriPeak function is zero. The pessimistic assumption, also referred to as censorship in prior works,23 has been leveraged in multi-worker contexts where delay distributions are randomly sampled to dynamically determine the buffer lengths, but it has not been evaluated under uniform delay asynchronous sampling. The uniform random distribution is resampled for every value in the buffer each time the model is trained.
Due to the normalization scalar, the function minimum and maximum are equal to 0 and 1, respectively. In the context of these studies, the campaign objective is to maximize the function, and the target feature set (x*) is defined by:
x* = argmax(f(x))
Simulations using additional ground truth functions are reported in the SI, along with randomized sampling control groups, shown in SI Fig. S3. The TriPeak function was designed to represent a parameter space that is both reminiscent of real-world experimental spaces and of non-negligible complexity for algorithm benchmarking. Many common computational benchmarking functions pose extreme criteria to navigate, such as many local optima. While relevant in many computational spaces, these functions are far more complex than the response surfaces typically found in experimental design spaces. Algorithm refinement around these functions, therefore, may not reflect performance in real-world laboratories. Conversely, simple convex unimodal surfaces, while common in experimental optimization spaces, are often not complex enough to justify algorithm development and optimization.
This compression occurs from the assumptions that multiple experiments are conducted simultaneously, new experiments are executed as soon as the oldest running experiment completes, and all experiments are of equal duration. This compression also assumes that algorithm calculation times and equipment execution limitations are negligible relative to the parallel experiment conduction. While this asynchronous time compression is relevant to many experimental workflows, a simple example would be automated formulation, reaction, and characterization of liquid samples in a well plate using a liquid handler. In a serial configuration, the system would remain idle during the reaction phase of an experiment, then select a new condition after characterizing the complete reaction. In the asynchronous configuration, the system can prepare and/or characterize other experiments during this downtime, thereby increasing throughput.
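Under the stated assumptions (equal-duration experiments, saturated workers, negligible overhead), the time compression can be written down directly. The helper names below are illustrative: with n workers, measurements arrive in waves, so the k-th measurement is available after ceil(k/n) experiment durations rather than k.

```python
import math

def serial_done(k, T=1.0):
    # serial loop: the k-th measurement is available after k
    # back-to-back experiments of duration T
    return k * T

def async_done(k, n_workers, T=1.0):
    # asynchronous loop: n_workers run in parallel and a new experiment
    # starts as soon as the oldest finishes, so measurements arrive in
    # waves of size n_workers
    return T * math.ceil(k / n_workers)

# e.g. 20 experiments with 4 workers: 20 time units serially,
# 5 time units asynchronously
```

This is the compression applied when asynchronous and serial campaigns are compared by experimental time rather than by experiment number.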
and interquartile range (IQRk) for a given experiment number (k) and repeat (r) are formally defined as:

Lk,r = f(x*) − max([f(x1,r), f(x2,r), …, f(xk,r)])

Lk = [Lk,1, Lk,2, …, Lk,NRep]

IQRk = IQR(Lk)
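A minimal sketch of these metrics, assuming a maximization campaign where `f_opt` is f(x*) and observations are ordered by experiment number (the function names are ours):

```python
import numpy as np

def loss_trace(f_opt, observations):
    # L_{k,r} = f(x*) minus the best observation among the first k
    # experiments of repeat r; monotonically non-increasing in k
    best_so_far = np.maximum.accumulate(np.asarray(observations))
    return f_opt - best_so_far

def iqr_at_k(losses_by_repeat, k):
    # losses_by_repeat: array of shape (n_repeats, n_experiments);
    # IQR_k is the spread of L_k across repeats
    col = np.asarray(losses_by_repeat)[:, k]
    q75, q25 = np.percentile(col, [75, 25])
    return q75 - q25
```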
Both the pessimistic and descending pessimism policies outperformed serial sampling for all tested buffer lengths. After 500 experiments, all buffer lengths in the descending pessimism policy reached approximately 60% of the final loss achieved in serial sampling. For the pessimistic policy, the 1, 2, 4, and 9 buffer campaigns matched the performance of the 500-experiment serial campaign in approximately 380, 380, 410, and 480 experiments, respectively. After 500 experiments, all asynchronous policies leveraging pessimism showed a significant reduction in the interquartile range, with the pessimistic policy reaching an interquartile range over an order of magnitude lower than the serial trial. The interquartile ranges of the realistic prediction and serial policies continued to increase or plateau after 500 experiments.
The median optimization performance results suggest that the presence of pessimism in asynchronous policies provides more time- and material-efficient optimizations. Additionally, the narrower convergence in the interquartile range across trials suggests these results can be achieved more consistently than with realistic or serial methods. Despite achieving a lower global accuracy, the pessimistic policy reaches an accurate estimate of near-optimal conditions more quickly than serial policies, as shown in SI Fig. S5. Increasing the range of pessimistic prediction policies can further increase the consistency with which campaigns reach optimal conditions. Additionally, the greatest algorithm improvement is observed after the inclusion of a single pessimistic prediction, i.e. a buffer length of one for the pessimistic, ascending pessimism, and descending pessimism policies. Significant improvements are observed with modest additions of pessimistic predictions, whereas realistic predictions either have no impact or decrease algorithm performance. As shown in SI Fig. S6 and Section S.2, the constant buffer length implementation outperformed a simulated scenario where buffer lengths change dynamically between zero and the specified buffer throughout the optimization. Additionally, the placement of pessimistic predictions near predicted optima is shown to be important, as introducing randomized pessimism into a realistic buffer policy performed significantly worse than the pessimistic policy.
Similar trends are observed across both pessimistic and realistic policies when varying the exploration constant across five different values, as shown in SI Fig. S1. Pessimism generates the most significant improvement over serial sampling when lower exploration constants are used, and it influences experimental efficiency less significantly when higher exploration constants are used. For the two lowest exploration constant values tested, the serial policy quickly plateaued and reached a loss approximately 16 times higher than any of the pessimistic buffer policies. For the middle exploration constant value, the serial policy demonstrated improvement with increasing experiments and reached a loss approximately 1.6 times higher than any of the pessimistic policies. However, for the two highest exploration constant values, the serial policy reached a similar loss to the pessimistic policies. The greatest discrepancy occurred between the serial and nine-buffer pessimistic policies at the highest exploration constant value, where the serial policy reached a 33% lower loss. The inverse relationship between exploration constant values and the performance advantage of the pessimistic policy suggests that forced exploration may be one of the mechanisms that improves optimization efficiency.
More interestingly, pessimistic policies were more robust to the selection of the exploration term than the realistic policies. Across all tested exploration constant values, the one and two buffer length realistic policies performed similarly to the serial policy. The four and nine buffer length realistic policies, however, demonstrated substantial performance drops, particularly in the scenarios where the serial policy performed well. At the highest exploration constant value, for example, the nine-buffer realistic policy reached an order of magnitude higher loss. Due to the sensitivity of the design space to the exploration constant, the advantage of pessimism under better tuned hyperparameters is unclear. With this in mind, a logarithmically increasing λ policy, which in some scenarios outperforms fixed constants, was benchmarked on the pessimistic and realistic policies. As shown in SI Fig. S2, when a more robust and dynamic exploration term is used, the improvement of pessimism over realistic policies is reduced further, indicating that higher exploration rates reduce the relative effectiveness of the pessimistic policy.
Applying pessimism with dynamic exploration terms on high complexity surfaces and very low complexity surfaces generates negligible improvement for most conditions, as shown in SI Fig. S2. Using the dynamic exploration term across all ground truth functions, with the one exception of the very simple surface function Trid, the pessimistic policy performed equivalently to or better than the serial and realistic policies as a function of experiment number. No discernible difference could be identified between the serial, realistic, and pessimistic policies for the Ackley, Michalewicz, and Schwefel functions at any buffer length, except the nine-buffer realistic policy on Ackley. All simulation campaigns, however, achieved poor performance on these complex surface functions, suggesting more detailed analyses may be necessary before drawing conclusions. As shown in SI Fig. S3, the dynamic exploration term policies performed worse than, or equal to, random sampling for the very low complexity ground truth, Trid, and the high complexity ground truths, Michalewicz and Schwefel. In all scenarios where the decision policies outperformed random sampling, the pessimistic policy generated a substantial improvement in performance at the highest tested buffer length. When comparing all buffer lengths by experimental time rather than experiment number, the pessimistic policy significantly outperforms serial sampling.
The lower confidence bounds policy performed equivalently to the pessimistic policy when evaluated over one and two buffer lengths, but the policy appears to converge prematurely and perform worse than the serial policy at high buffer lengths. Small buffer lower confidence bounds policies likely behave similarly to pessimistic policies in that the uncertainty near local optima is high enough to provide a sufficiently pessimistic hallucination. The failure at higher buffer lengths could be attributed to excessively confident models near local optima where clusters of buffer experiments are selected. In this latter case, the policy likely behaves more similarly to the realistic policy and provides insufficient pessimism to encourage exploration.
One potential explanation for the efficacy of pessimism assisted asynchronous sampling strategies is that the pessimistic predictions reduce the occurrence of premature convergence in upper confidence bounds policies. Forcing a pessimistic prediction on what the current model indicates is the optimal condition prevents resampling at that point, and in cases where replicates already exist outside the buffer, increases model uncertainty at that point which enables improved exploration within the peak. This advantage becomes more dominant when the number of local maxima (i.e., the dimensionality of the TriPeak function) increases.
The integration of the pessimistic prediction within the model training data set contrasts with prior pessimistic prediction methods on constant buffer length since these systems implement a penalty region over a defined area around the prior data point. It is possible that these penalty region methods could suffer from the curse of dimensionality as the volume covered by the defined penalty areas represents a smaller fraction of the overall parameter space.26
A final study introduced noise on the TriPeak ground truth function, evaluating the pessimistic buffer policy across two to six dimensions. Shown in Fig. 4, increasing the noise of the ground truth system resulted in less efficient optimization in most cases, but at higher dimensions the serial policy gained a performance advantage that is likely due to the normalization effect of noisy sampling. As in the noise-free simulations, the serial policy outperformed the asynchronous policies for all noise levels at lower dimensionality, and the magnitude of the sampling penalty increased as the buffer size increased. While the introduction of noise negates any advantage with respect to experiment number attained by the buffer policies at five and six dimensions, the asynchronous policies substantially overlap with the serial policy with respect to experiment number at these higher dimensions. The pessimistic policy outperforms the realistic policy for most sets of comparable conditions, as seen in SI Fig. S7. This result further supports the notion that large buffers in pessimistic asynchronous sampling algorithms can provide faster optimizations with negligible impact on experimental efficiency.
By implementing pessimistic predictions through model integration, the asynchronous sampling policies presented here could more effectively navigate higher dimension parameter spaces through more efficient and comprehensive integration of pessimism. In the complex experimental spaces relevant to algorithm driven experimentation, asynchronous policies provide a notable advantage to serial algorithms when parallel operation is viable. Furthermore, pessimistic asynchronous policies may provide an additional advantage over realistic hallucinations.
Like the prior optimization campaigns, the pessimistic asynchronous sampling policy was benchmarked against a standard serial optimization strategy. In this campaign, we leveraged modeling information from the original study and the highest performing model from prior work: a random forest regressor was used as the belief model instead of the Gaussian process regressor applied in all prior simulations. The random forest model was selected over a Gaussian process, shown in Fig. 5B, because of the difficulty the latter exhibited in navigating the experimental space. Uncertainty was estimated through the standard deviation of the predictions of all forest members.
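The uncertainty estimate described here can be sketched as follows, with synthetic placeholder data standing in for the reaction database: the ensemble mean serves as the prediction and the spread of the individual tree predictions serves as the uncertainty proxy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (50, 3))                 # synthetic stand-in features
y = X.sum(axis=1) + rng.normal(0, 0.05, 50)    # synthetic stand-in yields

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
X_query = rng.uniform(0, 1, (5, 3))            # candidate conditions

# per-tree predictions, shape (n_trees, n_query)
tree_preds = np.stack([t.predict(X_query) for t in forest.estimators_])
mu = tree_preds.mean(axis=0)       # ensemble mean (equals forest.predict)
sigma = tree_preds.std(axis=0)     # ensemble spread as uncertainty proxy
```

The (`mu`, `sigma`) pair can then drive the same upper confidence bounds decision policy used with the Gaussian process belief model.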
Of the approximately 4000 samples in the database, two sets of conditions result in a yield of 100%. While the parameter space used for belief model training is technically 120 dimensions, it is highly constrained through discretization of the parameters and a limited number of possible chemical feature combinations. A more realistic approximation of problem space dimensionality is the five categorical parameters. Regardless, pessimistic asynchronous sampling demonstrated a significant improvement over equivalent serial sampling on this system. As seen in Fig. 5C, the pessimistic policy outperformed the serial policy by experiment number for all tested buffer lengths below nine. The highest performing policy, pessimistic sampling with one buffer, reached the optimal yield in 60% fewer samples than serial sampling. While a one sample buffer improved the sampling efficiency over all other methods, increasing the buffer size decreased the efficiency of the policies. The nine-sample buffer performed equivalently to serial sampling with respect to experiment number. Accounting for the accelerated sampling rate of asynchronous policies further amplifies this advantage.
This result not only indicates that asynchronous pessimistic policies can effectively navigate real-world chemistry systems, but also shows viability in discrete numerical and other constrained spaces. Additionally, the observed improvement over serial methods while using a random forest belief model indicates that asynchronous pessimism helps alleviate deficiencies in uncertainty estimation. Ensemble member variance is unlikely to provide an optimal estimate of uncertainty for a non-parametric ensemble; despite this, the asynchronous policy performed favorably in a high complexity space. Further exploration and development of these methods could reduce the need for accurate uncertainty estimates and enable the effective application of different models.
Disregarding the increased sampling rate of asynchronous policies, pessimistic policies may offer greater performance for Bayesian optimization algorithms in high-cost sampling systems. When considering the increased sampling rates, pessimistic policies provide a considerable advantage over existing realistic asynchronous and serial sampling approaches. To fully detail the capabilities of the methodologies presented in this work, additional benchmarking studies with similar strategies are required. Further implementation and development of the methods presented here could result in more efficient algorithm driven experimentation and more effective parallelization of experimental processes.
This journal is © The Royal Society of Chemistry 2026