Open Access Article
This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

A critical examination of active learning workflows in materials science

Akhil S. Nair*ab and Lucas Foppaac
aThe NOMAD Laboratory at the Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany. E-mail: akhil.sugathan.nair@fu-berlin.de
bInstitut für Chemie und Biochemie, Freie Universität Berlin, Arnimallee 22, 14195, Berlin, Germany
cMolecular Simulations from First Principles e.V., Akazienstr. 3A, 10823 Berlin, Germany

Received 18th February 2026, Accepted 22nd May 2026

First published on 25th May 2026


Abstract

Active learning (AL) is an increasingly important approach for data-efficient machine learning (ML) in materials science. It is widely used, from building training datasets to guiding autonomous materials discovery platforms. However, the performance of AL workflows depends on a number of often implicit design choices that are rarely examined systematically. Here, we critically analyze commonly used AL strategies in materials science, highlighting overlooked assumptions, hidden biases, and methodological limitations across different applications. Based on this, we provide practical guidelines to enhance the efficiency and reliability of AL workflows for materials science applications.


1. Introduction

The application of machine learning (ML) in materials science and engineering has expanded rapidly, enabling progress across a wide range of tasks, including high-throughput screening, accelerated materials property prediction and autonomous experimentation.1–4 Central to many of these efforts is the challenge of learning complex mappings between structure, composition, processing conditions, and materials properties, which are typically non-linear and only partially understood. By learning directly from data rather than relying on explicit physical models, ML methods provide a flexible framework for capturing such complex relationships and have demonstrated transformative potential in materials science.5,6 However, their success is fundamentally constrained by the availability, accuracy and quality of the training data. Despite advances in high-throughput experimentation and simulation frameworks,7,8 generating consistent, high-fidelity materials data remains time- and resource-intensive. As a result, many materials science applications operate in regimes better characterized as data-scarce rather than data-rich.9 Moreover, materials datasets are often not “statistically representative”, i.e., they tend to be shaped by prior domain knowledge, biased curation strategies, or feasibility constraints, and frequently fail to adequately represent the broader space of potentially interesting materials. As a result, ML models trained on such data are prone to unreliable performance when applied to unexplored or underrepresented regions.10,11

Active machine learning (hereafter referred to as active learning, AL) has emerged as a promising framework for addressing these challenges in ML-driven materials science. As formalized by Cohn et al.,12 “AL studies the closed-loop phenomenon of a learner selecting actions or making queries that influence what data are added to its training set”. In AL, the training dataset is iteratively updated during the learning process, as new data points are selected and labeled at each step. This is in contrast with static approaches which execute a predetermined design without incorporating feedback from incoming data or real-time experimental observations. By adaptively expanding the dataset, AL prioritizes the acquisition of the most relevant data points.13 While often conceptualized as a data-efficient labelling strategy, the applications of AL in materials science are broader. To illustrate the diversity of its use cases, we highlight two representative settings that are discussed in detail in this perspective: (i) efficient data acquisition for training ML models, and (ii) optimization-driven workflows for materials discovery. For data acquisition tasks, the goal of AL is to achieve broad and representative coverage of the relevant regions of the design space, enabling the ML models trained on the acquired data to generalize reliably across diverse conditions.14,15 In this context, the design space refers to all possible sets of inputs that define materials, for instance, all possible atomic arrangements and compositions. In autonomous platforms focused on materials discovery, the goal of AL is to sample specific, high-value regions of the design space. 
Here, AL strategies aim to balance exploration of previously unobserved regions with exploitation of ML model predictions and uncertainties to efficiently guide the search toward optimal candidates.16–18 The success of AL in such cases is defined by a specific discovery objective, such as identifying materials with target properties below a given threshold value with a minimal number of evaluations.19,20

While AL workflows are now routinely applied in materials science,17,21,22 their design choices vary widely, even for closely related tasks. This diversity partly reflects the aforementioned breadth of materials science applications and is therefore neither surprising nor inherently problematic. However, it also introduces substantial inconsistency in how AL workflows are set up, executed, and assessed, making it difficult to compare performance or draw general conclusions across studies. For example, for the data acquisition task, an AL workflow may aim to prioritize unfamiliar data points (e.g. novel compositions or structures),23 reduce biases in the initial dataset,24 improve predictive performance of ML models,15 or decrease model uncertainty.25 While these objectives are often interrelated, practitioners typically evaluate the effectiveness of AL using only a subset of metrics, most commonly improvements in model accuracy, without systematically assessing whether coverage of critical regions of the design space has also improved. Similarly, in materials discovery applications, the choice of surrogate models,10 acquisition functions (vide infra),26 and uncertainty quantification methods27 can strongly influence outcomes, yet these choices are often made heuristically and evaluated using application-specific criteria that hinder cross-study comparison. Moreover, a fundamental question remains insufficiently addressed: “to what extent does AL provide benefits beyond those achievable through simpler data selection strategies based on human intuition, prior domain knowledge, or hand-crafted rules?”. In other words, it is often unclear whether the observed performance gains stem from a principled, algorithmic AL framework or could instead be achieved by informed, human-guided selection of data points without an explicit AL loop.
These issues highlight the need for modular AL workflows that enable consistent evaluation and comparison across applications, while still allowing flexibility to accommodate domain-specific objectives. Without such structure, AL workflows risk being inefficient, difficult to interpret, or misdirected, potentially leading to unnecessary computational or experimental cost and convergence toward suboptimal solutions.

In this perspective, we critically examine AL workflows commonly employed in materials science, focusing on two key applications: data acquisition and optimization-driven materials discovery. We begin with a brief overview of the AL methodology and analyze the strengths and limitations of tools and techniques used at different stages of AL workflows. We do not aim to provide a comprehensive review here and instead refer readers to existing review articles for the broader context.21,28 Our focus is to raise awareness of practitioner-driven biases in the design choices that can impact the performance of AL workflows. Building on this analysis, we propose guidelines to support the rigorous design, assessment, and interpretation of AL workflows in materials science.

2. Active learning methodology

An AL algorithm performs data acquisition iteratively and adaptively, with the goal of selecting the most important data points for labelling under a limited evaluation budget. In materials science, the “labels” are typically obtained from an oracle such as a high-fidelity simulation, a physical experiment, or expert annotation (Fig. 1a), all of which are costly in terms of time or resources. An AL workflow therefore seeks to maximize learning efficiency by prioritizing which data points to evaluate next, rather than relying on random sampling.29 It is important to distinguish AL from the classical framework of statistical design of experiments (DoE), which is widely used in engineering and industrial applications.30 Traditional DoE methods typically rely on predefined, space-filling or statistically optimal designs (e.g., Latin hypercube sampling31) that aim to efficiently explore the design space. In contrast, AL is inherently sequential and adaptive, selecting new samples based on information gained from previously acquired data. Because AL constructs a sequence of targeted queries, the framework is often referred to in the literature as “query learning”32,33 or “sequential learning”.34,35 In addition, different AL settings can also be distinguished depending on how unlabeled data are accessed or generated. In “query synthesis”, new candidate inputs are generated during the learning process rather than selected from a predefined dataset,36 whereas “stream-based AL” assumes that data arrive sequentially and must either be selected for labeling or discarded.37 In this perspective, however, we use the term “active learning” and restrict the discussion to pool-based AL,38 in which a relatively large pool of unlabelled data is assumed to be available for labelling, because of its simplicity and wide adoption in the field.
Note that while the design space represents the total theoretical bounds of exploration and can be continuous, vast regions of it may not correspond to physically realizable or stable materials. A pool-based AL framework circumvents the need to map such abstract coordinates back to unique, valid materials, a task that is often non-trivial.
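The contrast drawn above between a predefined DoE design and an adaptive AL query can be made concrete with a small sketch of Latin hypercube sampling. This is an illustrative, standard-library-only implementation on the unit hypercube; the function name `latin_hypercube` and its parameters are assumptions introduced here, not part of any workflow discussed in this perspective. The point to notice is that the design is fixed before any label is observed.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Static space-filling design on [0, 1)^n_dims (illustrative sketch).

    Each dimension is cut into n_samples equal strata and every stratum is
    used exactly once, so the design never depends on observed labels.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each stratum [i/n, (i+1)/n).
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


design = latin_hypercube(n_samples=5, n_dims=2)
```

An AL workflow would instead choose each new point only after the labels of the previous ones are known.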
Fig. 1 Schematic representation of (a) an AL workflow where both model-based and model-free AL strategies can be employed to acquire samples from pool data and update the seed data by interacting with an oracle, (b) various factors that need to be considered while designing the sampling strategy for AL.

Algorithm 1 outlines the core logic of pool-based AL, which can be tailored to accommodate specific objectives. In a supervised learning setting, let $\mathcal{D}_{\mathrm{seed}} = \{(x_i, y_i)\}$ denote the initially available labelled dataset (hereafter referred to as “seed data”), and let $\mathcal{D}_{\mathrm{pool}} = \{x_j\}$ represent the pool of unlabeled data (hereafter referred to as “pool data”). Here, $\mathcal{X}$ represents the design space and $\mathcal{Y}$ the target space of quantities of interest (e.g., band gaps, formation energies). A surrogate ML model M is trained to approximate an unknown target function $f$, mapping $\mathcal{X}$ to $\mathcal{Y}$. Crucially, the surrogate model should provide not only predictions but also reliable estimates of uncertainty, which quantify the model's confidence in its predictions at a given point. Common choices for surrogate models in AL include Gaussian processes (GP),39 random forests (RF),40 and neural networks,41 each offering different trade-offs in terms of predictive accuracy, training cost, and interpretability (see Table 2). At each iteration, a sampling strategy Q evaluates the pool data and selects a batch of k candidates $\{x^{*}_{1}, \ldots, x^{*}_{k}\} \subseteq \mathcal{D}_{\mathrm{pool}}$. This selection is typically guided by the model's predictions and/or uncertainty estimates, with the goal of identifying regions of the design space where additional labels are expected to be most relevant. Note that Q can be deterministic, yielding a fixed set of top-ranked candidates,20 or stochastic, where candidates are sampled based on a probability distribution.14 The selected candidates are then evaluated by an oracle $\mathcal{O}$ to obtain corresponding labels, and the labeled dataset is updated accordingly. The selected candidates are removed from the unlabeled pool, and the model is retrained on the updated dataset.
This process is repeated until a stopping criterion is satisfied, such as a performance threshold or exhaustion of a query budget B.

Algorithm 1 Pool-based active learning.
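The loop described in the text for Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the algorithm as published: the one-dimensional `oracle`, the `NearestNeighbourSurrogate` (which uses the distance to the closest labelled point as a crude uncertainty proxy), and the deterministic top-k rule `select_top_k` are all hypothetical stand-ins chosen so that the example runs with the standard library only.

```python
import math


def oracle(x):
    """Hypothetical stand-in for an expensive simulation or experiment."""
    return math.sin(3.0 * x) + 0.5 * x


class NearestNeighbourSurrogate:
    """Toy surrogate M: predicts the label of the closest labelled point and
    reports the distance to that point as a heuristic uncertainty."""

    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x):
        i = min(range(len(self.xs)), key=lambda j: abs(self.xs[j] - x))
        return self.ys[i], abs(self.xs[i] - x)


def select_top_k(model, pool, k):
    """Deterministic sampling strategy Q: the k most uncertain pool points."""
    ranked = sorted(pool, key=lambda x: model.predict(x)[1], reverse=True)
    return ranked[:k]


def active_learning(seed_x, seed_y, pool, k=2, budget=6):
    """Pool-based AL: query, label, update, retrain until the budget B is spent."""
    labelled_x, labelled_y = list(seed_x), list(seed_y)
    pool = list(pool)
    model = NearestNeighbourSurrogate().fit(labelled_x, labelled_y)
    queries = 0
    while pool and queries < budget:
        batch = select_top_k(model, pool, min(k, budget - queries))
        for x in batch:
            labelled_x.append(x)
            labelled_y.append(oracle(x))  # label the candidate with the oracle
            pool.remove(x)                # remove it from the unlabelled pool
        queries += len(batch)
        model.fit(labelled_x, labelled_y)  # retrain on the updated dataset
    return model, labelled_x, labelled_y, pool
```

Because the whole batch is ranked with the same, not yet retrained, model, the first batch here contains two neighbouring points at the far end of the pool. This is a small-scale instance of the batch redundancy that diversity criteria are meant to address.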

3. Active learning applications and challenges

In this section, we highlight the key challenges in employing AL workflows for materials science problems through two representative application domains: (i) efficient data acquisition for training ML models and (ii) optimization-driven materials discovery. Note that here the term “materials discovery” refers broadly to identifying materials with desirable properties, whether among previously synthesized, computationally generated, or yet-to-be-synthesized candidates. In this sense, discovery entails jointly establishing material identity and properties within a vast, largely unexplored materials space. For both (i) and (ii), we discuss, based on existing literature, how AL can be useful and identify critical challenges that remain unaddressed or are frequently overlooked. We provide practical guidelines and recommendations for addressing these challenges for future AL-driven materials research in the Outlook section.

3.1. Efficient data acquisition for training ML models

3.1.1 Redundancy problem in materials data. Materials datasets are often curated based on prior knowledge (e.g. well-known materials with desirable properties) or ease of access (e.g. from existing data repositories). These practices can introduce substantial redundancy and lead to biased coverage of the broader materials space. Recent work by Li et al.42 demonstrated that widely used computational materials databases, including the Materials Project (MP)43 and OQMD,44 contain significant data redundancy. Using a data-pruning strategy, they showed that removing a large fraction of these redundant entries neither degraded in-distribution model performance nor improved out-of-distribution (OOD) generalization for ML models trained on the full datasets (Fig. 2a). While such redundancy is perhaps unsurprising given the high-throughput nature of computational database generation, biased sampling can persist even when subsets of these databases are selectively curated. For example, in many materials discovery studies,6,45 candidate materials are preferentially chosen near the convex hull of a given compositional phase diagram, which can distort an ML model's representation of the underlying stability landscape. Indeed, Bartel et al.46 showed that ML models can accurately predict formation energies yet still perform poorly in classifying stable versus unstable materials, particularly in sparsely sampled compositional spaces such as underrepresented quaternary systems in MP. In such scenarios, AL offers a promising alternative by adaptively focusing on data in the underexplored regions of the materials space. By doing so, AL can enable the construction of smaller, more relevant datasets that mitigate redundancy while maintaining or even improving predictive performance.
Fig. 2 (a) Active learning reduces redundancy in materials datasets: performance of XGBoost (XGB) and Random Forests (RF) models on band gap prediction trained on datasets obtained by uncertainty-guided active learning, pruning, and random sampling from the OQMD14 dataset. Comparable accuracy is achieved with only 10% of the data, highlighting substantial redundancy in the dataset. Adapted with permission from ref. 42, Copyright 2023 Springer. (b) Schematic representation of active learning bias induced by sampling data points that do not follow the i.i.d. assumption. (c) Information-entropy-guided active learning (ETAL) minimizing the large structure-stability bias by improving the coverage of less symmetric crystal systems in the JARVIS dataset (top panel) and improved performance of ML models trained on such an actively learned dataset compared to a randomly sampled dataset (bottom panel). Adapted with permission from ref. 24, Copyright 2023 American Institute of Physics.
3.1.2 Inadequate sampling strategy. One of the key components of an AL workflow is the sampling strategy, which determines which data points are selected from an unlabelled pool. Many of the AL workflows adopted in materials science15,47 use informativeness, i.e., the ability of a sample to improve model performance, as the sole sampling criterion. However, an effective AL sampling strategy is inherently multi-faceted and cannot be fully captured by informativeness alone (Fig. 1b). A key complementary criterion is representativeness, which assesses whether a queried sample reflects the structure of the unlabeled data distribution.48 This is particularly important to prevent sampling extreme outliers (e.g. highly distorted structures) that are not statistically representative of the remaining, relevant design space of interest. While informativeness and representativeness are conceptually distinct, they are often complementary when AL performance is evaluated over the full search space. However, in the presence of a significant distribution mismatch between the labelled seed data and the unlabeled pool, the two criteria may diverge. In such cases, representativeness favors sampling from high-density regions of the pool distribution, whereas informativeness prioritizes regions that are underrepresented in the seed data, so the two criteria can select different samples. Various methods have been proposed by the ML community to include the representativeness factor,49,50 or jointly optimize informativeness and representativeness,51,52 yet most AL workflows in materials science remain focused on informativeness due to the ease of monitoring via proxies such as model test errors. Beyond these standard criteria, additional materials-science-specific factors need to be considered for sampling.
These include: (i) diversity, which ensures that selected samples are sufficiently distinct both from the seed data and from one another to avoid redundancy;53,54 (ii) physical validity, which ensures that samples are chemically and physically meaningful, for example by excluding crystal structures under extreme conditions; and (iii) feasibility, which accounts for practical constraints such as computational cost, favoring candidates that provide maximal information without incurring excessive expense (e.g. by avoiding extremely large system sizes for simulations). In addition, for AL campaigns in which a reasonably large seed dataset is already available, pre-existing biases can persist, such as overrepresentation of certain chemistries, phases, or structural motifs. If these biases are not identified and addressed at the outset, AL may inadvertently reinforce them, leading to sampling that further skews the dataset. Failure to consider these criteria while designing the sampling strategy can therefore severely limit the reliability of AL in materials science.
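One simple way to combine the criteria discussed in this subsection is sketched below. It is an illustrative heuristic, not a method taken from the cited works: candidates are ranked by the product of a supplied uncertainty (informativeness) and a Gaussian kernel density of the pool (representativeness), a minimum-separation radius `min_sep` enforces diversity, and a user-supplied predicate `is_valid` stands in for physical validity or feasibility checks. All names, the multiplicative score, and the default values are assumptions made for this example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def pool_density(x, pool, bandwidth=0.5):
    """Unnormalised Gaussian kernel density of the pool at x (representativeness)."""
    return sum(
        math.exp(-euclidean(x, p) ** 2 / (2.0 * bandwidth ** 2)) for p in pool
    ) / len(pool)


def select_batch(pool, uncertainty, seed, k, min_sep=0.5, bandwidth=0.5,
                 is_valid=lambda x: True):
    """Return up to k pool indices.

    Ranking: uncertainty * density (informativeness x representativeness).
    Diversity: a candidate is skipped if it lies closer than min_sep to the
    seed data or to an already chosen candidate.
    Validity/feasibility: candidates failing is_valid are never considered.
    """
    reference = list(seed)
    ranked = sorted(
        (i for i, x in enumerate(pool) if is_valid(x)),
        key=lambda i: uncertainty[i] * pool_density(pool[i], pool, bandwidth),
        reverse=True,
    )
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(euclidean(pool[i], r) >= min_sep for r in reference):
            chosen.append(i)
            reference.append(pool[i])
    return chosen


# A dense cluster plus one isolated outlier with the largest uncertainty.
pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
uncertainty = [0.5, 0.6, 0.5, 0.5, 1.0]
picked = select_batch(pool, uncertainty, seed=[(1.0, 1.0)], k=2)
```

In this toy pool the density term ranks a point from the dense cluster ahead of the outlier even though the outlier has the largest uncertainty, and the separation radius prevents the second pick from coming from the same cluster.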
3.1.3 The ill-addressed active learning bias. While AL can mitigate the redundancy problem, it can paradoxically introduce a new form of bias, termed active learning bias (ALB).55,56 This arises because, during AL, samples are no longer independent and identically distributed (i.i.d.), a fundamental assumption underlying ML model training. As a result, actively curated training sets may deviate substantially from the application-relevant data distribution in the design space of interest (Fig. 2b). This deviation has important implications for model training and evaluation. Standard empirical risk minimization, which optimizes model parameters by minimizing the average loss over the training data, implicitly assumes that the training set is representative of the target distribution. When this assumption is violated, as is often the case in AL, model performance measured on finite or randomly curated test sets may reflect optimization with respect to a biased objective rather than genuine generalization across the materials domain. In materials science applications, this can manifest as models that perform well on actively sampled configurations while failing to generalize to unexplored chemistries or structures. Statistical corrections have been proposed to mitigate ALB, such as reweighting the training loss by the inverse probability of sample acquisition,57 thereby partially restoring consistency with the underlying data distribution. However, such approaches have not yet been systematically explored in materials science.
Moreover, reweighting introduces additional subtleties: in overparameterized models, including deep neural networks, ALB can sometimes act as an implicit form of regularization, reducing overfitting and even improving apparent generalization.58 While this effect may be beneficial in practice, it complicates the interpretation of AL performance, as improvements may stem from sampling-induced regularization rather than principled coverage of the design space. Because the magnitude and impact of ALB depend strongly on the mismatch between the seed dataset, the unlabeled pool, and the target application domain, careful attention to data distributions is essential. Incorporating explicit analysis of distributional coverage using tools such as dimensionality reduction59 or density estimation60,61 can therefore provide critical context for evaluating AL outcomes and for designing more robust, application-aware AL workflows.62,63 Without such considerations, AL strategies risk reinforcing hidden biases, limiting transferability, and overstating their effectiveness when the workflow is deployed.
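The idea of reweighting by the inverse acquisition probability can be illustrated with a deliberately simplified estimator. The sketch below assumes that points are drawn with replacement from a finite pool with known acquisition probabilities, which is a plain inverse-propensity (importance) weighting scheme and not necessarily the estimator proposed in ref. 57. The function names and the toy numbers are assumptions made for illustration.

```python
def naive_risk(losses):
    """Unweighted average loss over the actively acquired samples."""
    return sum(losses) / len(losses)


def reweighted_risk(losses, acquisition_probs, pool_size):
    """Inverse-propensity weighted risk (simplifying assumption: samples are
    drawn with replacement, sample i with known probability q_i).

    Each loss is divided by pool_size * q_i, so that the expectation equals
    the uniform average loss over the whole pool.
    """
    return sum(
        loss / (pool_size * q) for loss, q in zip(losses, acquisition_probs)
    ) / len(losses)


# Toy pool of four points with per-point losses [1, 1, 1, 5]; the uniform
# pool-average loss is 2.0. The acquisition rule favours the high-loss point
# (q = 0.7) over the others (q = 0.1 each). The sample below has exactly the
# frequencies expected from ten draws under that rule.
sample_losses = [1.0, 1.0, 1.0] + [5.0] * 7
sample_probs = [0.1, 0.1, 0.1] + [0.7] * 7

biased = naive_risk(sample_losses)                                   # 3.8
corrected = reweighted_risk(sample_losses, sample_probs, pool_size=4)  # ~2.0
```

The unweighted average overstates the pool-level loss because the high-loss point is oversampled, while the weighted average recovers it. In practice the acquisition probabilities of a deterministic top-k strategy are not well defined, which is one reason such corrections are easier to apply to stochastic sampling strategies.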
3.1.4 Model-based vs. model-free active learning. While most AL workflows employed in materials science are built around surrogate ML models and consequently face the challenges outlined above, strategies which do not involve training a surrogate model could be adapted as an alternative. These approaches are typically formulated under unsupervised settings, enabling sampling to target specific objectives without relying on model predictions or uncertainty estimates, in contrast to model-based approaches that operate in a supervised setting. In this work, we refer to such approaches as “model-free”. A key advantage of model-free strategies is their conceptual simplicity and flexibility. They are particularly useful for scenarios involving a very small amount of seed data, where surrogate models have limited predictive accuracy, and their uncertainty estimates may be unreliable. Zhang et al.24 employed an information-entropy-based AL workflow to mitigate structure-stability bias in computational crystal databases, where low-symmetry structures are often underrepresented. By prioritizing structurally informative samples, measured using information entropy, their approach improved the coverage in crystallographic space and yielded ML models with superior predictive performance compared to random sampling (Fig. 2c). Similarly, Schwalbe-Koda et al.64 demonstrated that atomistic information entropy, computed directly from local atomic descriptors, can serve as a model-free proxy for uncertainty of machine learning interatomic potentials (MLIPs), guiding molecular dynamics (MD) simulations. Beyond informativeness, model-free AL can explicitly enhance representativeness and diversity, two criteria that are often weakly controlled in model-based AL workflows. 
Density-based strategies, such as clustering or kernel density estimation, promote sampling from statistically significant regions of the design space, thereby preserving global coverage and facilitating representative sampling.68,72 On the other hand, similarity-based model-free strategies emphasize improving diversity and minimizing redundancy. These methods are often implemented using distance-based metrics defined over feature, descriptor, latent, or embedding spaces42,73,74 and select samples that are maximally distinct from one another and from the existing seed data. Note that the distinction between model-based and model-free approaches becomes less clear when distances are computed in latent spaces derived from trained models, as these representations implicitly depend on the model, even if the sampling criterion itself does not directly rely on the model. Alternatively, diversity can also be enforced using physically or chemically motivated similarity measures, for example, based on composition, local coordination environments, structural topology, bonding motifs, or symmetry classes.71,74 It has been shown that when labeled data are scarce, similarity-based model-free methods can outperform model-based AL due to their robustness against less accurate surrogate models.75 However, these approaches are not without limitations. High-dimensional vector spaces used to represent materials data may suffer from the curse of dimensionality, and outcomes of distance-based sampling are sensitive to the choice of representations (e.g. elemental-property-based features or SOAP descriptors76), similarity metrics (e.g. Euclidean or Mahalanobis distances), and analytical choices such as centroid-based versus nearest-neighbor selection.77 Additionally, since such approaches do not leverage predictive models, they may lack adaptivity to variations in the underlying response surface, potentially leading to inefficient sampling in regions where the target property varies non-uniformly across the design space.78 Recent benchmarking by Bi et al.79 further suggests that, on average, model-free strategies underperform model-based AL when evaluated across diverse materials datasets, as they lack explicit mechanisms to capture the relationship between samples and target properties. Despite these limitations, model-free AL remains practically valuable: it avoids the computational overhead of training and retraining surrogate models (e.g. ensembles of ML models) and remains effective when initial datasets are small or highly biased. These observations highlight an unresolved dilemma in AL design, namely whether to prioritize model-based or model-free strategies, and motivate hybrid approaches that combine the robustness of model-free sampling with the task-awareness of model-based methods.80,81 Representative model-free AL strategies and their applications in materials science are summarized in Table 1.
Table 1 Examples of model-free active learning strategies and representative applications in materials science

| Sampling criterion | Methods | Central idea | Application in materials science |
| --- | --- | --- | --- |
| Informativeness | Entropy-guided | Select samples that maximize information entropy computed from structural or descriptor distributions, enabling bias reduction and improved coverage without model uncertainty estimates | Curation of bias-minimized crystal structure datasets;24 model-free uncertainty for MLIP-driven MD64,65 |
| Representativeness | Clustering | Partition the unlabeled pool into clusters and select representative samples (e.g. cluster centroids) to preserve global coverage of the distribution and avoid oversampling statistical outliers | Discovery of perovskite oxides for oxygen evolution catalysis;66 training data generation for MLIPs67 |
| Representativeness | Density estimation | Prioritize samples located in high-density regions of the unlabeled data distribution to ensure representativeness | Functionalized nanoporous materials (MOFs/COFs) property prediction;68 assessing out-of-distribution performance of ML models60 |
| Diversity | Distance-based | Select samples that maximize the minimum distance to the seed data in feature, output, latent, or an embedding space to avoid redundancy and maximize diversity | Discovery of high-entropy oxides for H2 production;69 surface structure exploration for catalysis70 |
| Diversity | Physical metric-based | Select batches of samples that are mutually dissimilar while also distinct from the existing labeled data based on a physical or chemically motivated metric | Developing accurate property-prediction models for structure–property mapping of microstructures71 |
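The distance-based diversity criterion in Table 1 can be illustrated with greedy farthest-point (max-min) sampling, a common model-free heuristic. The sketch below is an assumption-laden example rather than the procedure of any cited study: the function name, the default Euclidean metric, and the tie-breaking rule are choices made here, and the `dist` argument is where a different representation or metric (e.g. a Mahalanobis distance) would be plugged in.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def farthest_point_sampling(pool, seed, k, dist=euclidean):
    """Greedy max-min selection: repeatedly pick the pool point whose minimum
    distance to the seed data and to all previously picked points is largest.

    Requires a non-empty seed. Returns pool indices in order of selection.
    """
    # Minimum distance from every pool point to the current reference set.
    min_d = [min(dist(x, s) for s in seed) for x in pool]
    remaining = set(range(len(pool)))
    chosen = []
    for _ in range(min(k, len(pool))):
        # Ties are broken in favour of the lower index for reproducibility.
        best = max(remaining, key=lambda i: (min_d[i], -i))
        remaining.remove(best)
        chosen.append(best)
        # The picked point joins the reference set: update all distances.
        for i, x in enumerate(pool):
            min_d[i] = min(min_d[i], dist(x, pool[best]))
    return chosen


pool = [(0.0,), (1.0,), (2.0,), (10.0,)]
order = farthest_point_sampling(pool, seed=[(0.0,)], k=3)
```

The first pick in this toy pool is the isolated point at 10.0, which shows why purely distance-based selection tends to favour outliers unless it is combined with a representativeness or validity check.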


3.2. Optimization-driven materials discovery

3.2.1 Interplay between surrogate models and sampling strategy. The dominant AL practices in both computational and experimental settings for materials discovery are based on black-box optimization (BBO), with Bayesian Optimization (BO)88 being the most widely adopted approach.89 BO's ability to navigate complex search spaces under data-scarce conditions has led to numerous success stories in materials discovery.90–92 Notable examples include the identification of Pb-free BaTiO3-based piezoelectrics with enhanced electrostrictive strain,93 NiTi-based shape memory alloys with low thermal hysteresis,20 and efficient high-entropy alloy catalysts.92 The key components of a BO-driven AL (BO-AL) workflow include: (i) a surrogate model to approximate the expensive objective function that maps a given set of input parameters to the materials property, (ii) an acquisition function (analogous to the sampling strategy in non-BO AL) to guide sample selection, and (iii) an oracle for labelling new data points. While BO traditionally relies on probabilistic surrogate models such as GP for their principled uncertainty estimates, non-Bayesian models have also been adopted in materials applications.94,95 This is often motivated by practical considerations, including the poor scalability of GP in high-dimensional spaces and the superior extrapolation performance observed with certain non-Bayesian models on specific datasets.96 Nevertheless, the criteria used to select a suitable surrogate model for BO-AL in materials science are rarely discussed in depth. Since no surrogate model demonstrates universally optimal performance across all problem settings, surrogate model selection is inherently task-dependent. For instance, Lim et al.94 demonstrated that GP with carefully selected kernels outperformed alternative models, including RF, on experimental materials datasets.
In contrast, Liang et al.95 found that RF-based BO outperformed GP-BO using standard isotropic kernels and performed comparably to GP with anisotropic kernels. Tables 2 and 3 summarize some of the characteristics of various surrogate models and acquisition strategies used in BO-AL for materials science applications, respectively.
Table 2 Common surrogate models and their properties relevant to Bayesian optimization-based active learning applications. The labels (exp) and (comp) after the literature references indicate whether the work involves experiments or purely computational simulations, respectively

| Model type | Data efficiency | UQ | Interpretability | Cost | Application in materials science |
| --- | --- | --- | --- | --- | --- |
| Gaussian processes | High | Principled (exact Bayesian posterior) | Limited in high dimensions | High | Phase-change memory material for photonic switching devices78 (exp); layered materials with suitable electronic properties17 (comp) |
| Random forests | Moderate | Heuristic (ensemble-based) | Moderate with feature importance | Low | Biochar synthesis for CO2 capture82 (exp); screening of inorganic materials83 (comp) |
| Gradient boosting methods | Moderate | Heuristic (ensemble-based) | Moderate with feature importance | Moderate | High-entropy oxides for H2 production69 (exp); power factor prediction of thermoelectrics47 (comp) |
| Bayesian neural networks | Low | Heuristic (approximate posterior) | Low | High | Optimal parameters for chemical reactions84 (exp); van der Waals heterostructures with suitable bandgaps85 (comp) |
| Support vector regression | Moderate | Limited | Limited | Moderate | Shape memory alloys with low thermal hysteresis20 (exp); piezoelectric materials screening27 (comp) |
| Deep ensembles | Low | Heuristic (inter-model predictive variance) | Low | High | Crystal structure prediction86 (comp); MLIP assisted material simulations87 (comp) |
| Symbolic regression | High | Limited | High (analytical equations) | High | Screening of acid-stable oxides for electrocatalysis19 (comp) |


Table 3 Common acquisition strategies and their properties relevant to Bayesian optimization-based active learning applications in materials science
| Acquisition strategy | Exploration-exploitation balance | UQ dependency | Application in materials science |
| --- | --- | --- | --- |
| Random sampling (RS) | Exploration only (uninformed) | None | Baseline for benchmarking AL workflows in optimization-driven materials discovery34,95 |
| Pure exploitation (XT) | Exploitation only | Low (mean prediction) | Identifying perovskites with high bulk modulus;10 benchmarking against EI in shape memory alloy design20 |
| Probability of improvement (PI) | Balanced; tunable via ξ | High | Identifying materials with low lattice thermal conductivity;97 discovery of materials with high melting points98 |
| Expected improvement (EI) | Balanced; tunable via ξ | High | Discovery of stable materials;99 accelerating synthesis of superconducting materials100 |
| Upper confidence bound (UCB) | Exploration-leaning; tunable via β | High | Mg alloy design;101 parametrization of DFT for accurate bandstructure prediction102 |
| Probability of feasibility (POF) | Exploitation (constraint-driven) | High | Computational discovery of acid-stable oxides19 |
| Thompson sampling (TS) | Exploration-leaning (stochastic) | High (posterior sampling) | Atomic structure determination;103 crystal structure prediction104 |
| Knowledge gradient (KG) | Balanced | High | Optimizing processing conditions of hybrid organic–inorganic perovskites105 |
| Expected hypervolume improvement (EHVI/qNEHVI) | Balanced (Pareto front-focused) | High | Nanoparticle synthesis;106 additive manufacturing107 |
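For reference, three of the acquisition functions in Table 3 have compact closed forms when the surrogate returns a Gaussian predictive mean and standard deviation. The sketch below assumes a maximization problem. The function names, the default values of ξ (`xi`) and β (`beta`), and the two example candidates are illustrative choices and are not taken from the cited works. Some formulations of UCB use the square root of β as the multiplier instead.

```python
import math


def _pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """P(f > best + xi) under a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    return _cdf((mu - best - xi) / sigma)


def expected_improvement(mu, sigma, best, xi=0.0):
    """E[max(f - best - xi, 0)] under a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    # First term rewards a high mean, second term rewards high uncertainty.
    return (mu - best - xi) * _cdf(z) + sigma * _pdf(z)


def upper_confidence_bound(mu, sigma, beta=2.0):
    """Optimistic estimate: mean plus beta standard deviations."""
    return mu + beta * sigma


best = 1.0            # current best observation, which EI and PI depend on
safe = (1.0, 0.1)     # (mean, std): equal to the incumbent, low uncertainty
risky = (0.8, 0.5)    # lower predicted mean, high uncertainty
ei_safe = expected_improvement(*safe, best)
ei_risky = expected_improvement(*risky, best)
print(f"EI safe = {ei_safe:.4f}, EI risky = {ei_risky:.4f}")
```

In this toy case pure exploitation would pick the `safe` candidate (higher mean), whereas EI ranks the `risky` one higher because of its larger uncertainty. Both EI and PI change if `best` changes, which is how a biased seed set can influence the ranking.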


Importantly, the performance of BO-AL is often governed not by the surrogate model or the acquisition function in isolation, but by their combined behaviour. A highly accurate surrogate model may still perform poorly if paired with a suboptimal sampling strategy. For example, expected improvement (EI), a popular acquisition function in BO-AL for material property optimization, depends on the current best observation, which can be an unreliable reference if the seed data is strongly biased and can thus mislead the optimization trajectory. Therefore, evaluating surrogate model–acquisition function combinations through after-the-fact (AFT) AL trials (where pool data is already labelled but excluded from seed data) can help identify optimal configurations,20 though these analyses may not generalize to all pool datasets due to distribution shifts. As illustrated by Boley et al.10 (Fig. 3a), in the AFT-AL experiment for discovering perovskites with high bulk modulus, RF with either pure exploitation (XT) or EI as the acquisition function performs similarly to GP with EI, and all perform only as well as random sampling. However, in real AL runs (using DFT calculations as ground truth), GP with EI clearly outperforms the others, highlighting that surrogate model–acquisition function selection based solely on initial data may be misleading due to distributional shifts and the underrepresentation of high-performing materials. Although dynamic switching between acquisition functions has been proposed,108 it remains underexplored in BO-AL workflows applied to materials discovery. Such sensitivity to the sampling strategy is a general challenge in AL, even beyond BO frameworks. 
In particular, the choice of representation plays a critical role in shaping the trajectory and performance of AL.109,110 We have recently shown that efficient on-the-fly feature selection at each AL iteration can significantly enhance the performance of BO-AL campaigns compared to high-dimensional representations that are not updated during AL iterations.111 Moreover, different representations induce distinct geometries of the design space, which in turn determine how similarity and diversity are quantified. This becomes especially important when the sampling strategy relies on geometric notions (e.g., distance-based approaches), as the representation directly influences sampling behaviour and the overall efficiency of AL. For instance, a materials representation based on global, composition-only descriptors (e.g., Magpie112) is fundamentally distinct from one based on local atomic environments (e.g., SOAP). A distance-based sampling strategy navigating a compositionally defined space will naturally prioritize exploring broad chemical families and diverse elemental combinations. This approach is well suited to AL workflows targeting exceptional materials when the underlying property landscape is predominantly composition-driven, such as screening vast compositional spaces for novel high-entropy alloys. Conversely, the same strategy operating within a local structural descriptor space will prioritize exploring local atomic environments, which could be more suitable for AL aimed at diversifying the training data of MLIPs, where capturing subtle structural variations is vital for accurate force and energy predictions. The choice of representation must therefore be aligned with both the specific objective of the AL campaign and the underlying physics of the target property landscape.
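The dependence of distance-based sampling on the representation can be made concrete with a small sketch. It applies the same greedy farthest-point (max-min distance) selection to two hypothetical feature sets describing the same four candidates. The vectors `composition_like` and `structure_like` are invented for illustration and do not correspond to actual Magpie or SOAP descriptors.

```python
import math


def farthest_point_order(features, start=0, n_select=3):
    """Greedy max-min selection: repeatedly pick the candidate whose
    distance to the already selected set is largest."""
    selected = [start]
    while len(selected) < min(n_select, len(features)):
        remaining = [i for i in range(len(features)) if i not in selected]
        nxt = max(
            remaining,
            key=lambda i: min(
                math.dist(features[i], features[j]) for j in selected
            ),
        )
        selected.append(nxt)
    return selected


# Two hypothetical representations of the SAME four candidates (index 0..3).
composition_like = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.9, 0.1)]
structure_like = [(0.0, 0.0), (0.0, 1.0), (0.2, 0.0), (0.0, 0.9)]

print("composition-like order:", farthest_point_order(composition_like))
print("structure-like order:  ", farthest_point_order(structure_like))
```

The sampling rule and the starting point are identical, but candidates 0 and 1 are near-duplicates in one representation and maximally distant in the other, so the two runs select different candidates.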


Fig. 3 (a) Impact of surrogate model and acquisition function choices on active learning for identifying perovskites with high bulk modulus: Real AL runs (top panel) show varied efficiencies across GP and RF models with exploitation (XT), expected improvement (EI), and uniform (random) sampling acquisition functions, compared to retrospective seed-only baselines (bottom panel). Adapted with permission from ref. 10, Copyright 2024 Institute of Physics. (b) Lack of correlation between model performance and materials discovery in Bayesian optimization based AL for identifying high-bandgap materials. Colored curves denote acquisition functions (blue: expected value, violet: EI, red: maximum uncertainty, yellow: random sampling). Discovery yield (DY) (see Table 4 for definition) improves, but non-dimensional model error (NDME = RMSE/standard deviation of holdout set) does not consistently decrease. Adapted with permission from ref. 113, Copyright 2023 Royal Society of Chemistry.

In many applications, BO-AL is preferably applied in batched or parallel settings, where several samples (instead of one, as in standard BO) are selected per iteration to leverage oracles with parallel execution capabilities (e.g. high performance computing facilities, high-throughput synthesis platforms).22,84 However, standard acquisition functions are limited by their non-additivity and do not account for information overlap between simultaneously queried samples. To address this limitation, specialized batch acquisition functions have been developed that explicitly account for correlations between candidate points. Notable examples include multipoint Expected Improvement (qEI),114 which generalizes EI to jointly evaluate a batch of points, and the recently proposed multipoint Probability of Optimality (qPO),115 which maximizes the joint likelihood that the true optimum is contained within the selected batch. For the large batches common in materials screening (10–100 samples), hybrid methods combining uncertainty-based acquisition with explicit diversity mechanisms such as determinantal point processes116 or clustering-based selection have been shown to improve coverage of the design space and reduce redundancy. It is important to note that while batching is desirable from a resource management perspective, it may degrade sampling efficiency if diversity and information gain are not explicitly accounted for, as acquisition functions can otherwise fail to correctly rank candidates in terms of marginal utility. This challenge is particularly pronounced in materials science applications, where multiscale structure–property relationships and heterogeneous synthesis or characterization pathways can further amplify redundancy and bias in batch selection. Nevertheless, platforms such as Phoenics,84 ChemOS,117 and Olympus118 have successfully coupled BO-AL with robotic experimentation or lab automation systems, enabling parallel acquisition and execution in chemistry and materials science. In addition to batch selection, incorporating evaluation cost into the acquisition strategy is critical for real-world applications, as the cost of evaluating candidates can vary by orders of magnitude. This can be addressed either through cost-aware acquisition functions,119 such as Expected Improvement per Unit Cost (EIpu),120 which prioritize candidates offering high utility relative to their evaluation cost, or through multi-fidelity approaches,121 where each candidate can be evaluated at multiple levels of fidelity with different cost-accuracy trade-offs. Such approaches enable more efficient utilization of limited resources.
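The redundancy problem in batch selection can be illustrated with a deliberately simple heuristic: rank candidates by their acquisition score and accept a candidate only if it lies at least a minimum distance from everything already in the batch. This is not qEI, qPO or a determinantal point process. The function `select_diverse_batch`, the threshold `min_dist` and the toy data are hypothetical and only meant to show why naive top-k selection can be wasteful.

```python
import math


def select_diverse_batch(features, scores, batch_size, min_dist):
    """Greedy batch selection: walk down the score ranking and accept a
    candidate only if it is at least min_dist away from the current batch."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    batch = []
    for i in order:
        if all(math.dist(features[i], features[j]) >= min_dist for j in batch):
            batch.append(i)
        if len(batch) == batch_size:
            break
    return batch  # may be shorter than batch_size if min_dist is too strict


# Hypothetical 1D design space: three near-duplicates with the top scores.
features = [(0.0,), (0.05,), (0.1,), (1.0,), (2.0,)]
scores = [0.90, 0.85, 0.80, 0.50, 0.30]

naive_top3 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]
diverse_top3 = select_diverse_batch(features, scores, batch_size=3, min_dist=0.5)
print("naive top-3:  ", naive_top3)
print("diverse top-3:", diverse_top3)
```

The naive batch spends all three evaluations on nearly identical candidates, while the filtered batch keeps the top-ranked one and spreads the remaining budget over distinct regions. A cost-aware variant could divide each score by a per-candidate evaluation cost before ranking.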

3.2.2 Unreliable uncertainty quantification. In AL strategies with an exploration component such as those used in BO, uncertainty quantification (UQ) plays a pivotal role. This is based on the notion that the surrogate models are most error-prone in regions where their predictions are least confident, and hence, acquiring data from such regions is often the most informative. Reliable UQ helps in identifying these regions by assigning a confidence interval to predictions, thus guiding the sampling strategy toward high-impact data points.122 However, two core challenges hinder this process: (i) most ML models lack inherent UQ capabilities (see Table 2), and (ii) many UQ techniques produce unreliable estimates which are often underconfident or overconfident relative to actual prediction errors.123,124 Unreliable UQ can misguide AL by undervaluing informative samples, potentially leading to premature convergence or excessive exploitation of suboptimal regions. For instance, in BO-AL with EI as the acquisition function, overconfident uncertainty estimates shrink the exploration term, causing the algorithm to overlook uncertain but informative regions in favor of already well-explored ones.

The reliability of UQ in ML remains an open research challenge in materials science, with relatively few studies that critically benchmark UQ methods.127,128 Existing efforts typically evaluate the quality of uncertainty estimates based on hold-out test sets using metrics based on the uncertainty distribution,129 correlation with prediction errors,127 or computational cost.128 These studies consistently show that no single UQ method outperforms across all scenarios, with strong dependencies on surrogate model and dataset characteristics. Nevertheless, calibration approaches such as scaling uncertainty estimates with respect to residuals can enhance their reliability.129,130 Openly available tools like uncertainty-toolbox131 and UQLab132 facilitate such calibration processes. However, the effectiveness of these calibration methods within AL workflows remains largely unexplored, with existing studies still in their early stages.133 Since such approaches require labeled data, they are constrained to the currently known samples and hence cannot guarantee improved uncertainty estimates on the unlabeled pool data where accurate UQ is most critical. This highlights the need for more standardized, AL-focused metrics to assess and compare UQ methods for materials property prediction and optimization tasks.
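One simple form of the residual-based scaling mentioned above can be sketched as follows. A single factor is fitted on labelled calibration data so that the standardized residuals have unit mean square, and the empirical coverage of the resulting intervals is then checked. This is an illustrative variant under a Gaussian-error assumption, not necessarily the scheme used in the cited works. The names `scaling_factor` and `coverage` and the toy numbers are hypothetical.

```python
import math


def scaling_factor(y_true, y_pred, sigma):
    """Factor s such that residuals divided by (s * sigma) have unit mean square."""
    z2 = [((t - p) / s) ** 2 for t, p, s in zip(y_true, y_pred, sigma)]
    return math.sqrt(sum(z2) / len(z2))


def coverage(y_true, y_pred, sigma, z=1.96):
    """Fraction of true values inside the predicted +/- z * sigma interval."""
    hits = sum(abs(t - p) <= z * s for t, p, s in zip(y_true, y_pred, sigma))
    return hits / len(y_true)


# Hypothetical calibration set with clearly overconfident uncertainties.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.2, 1.7, 3.4, 3.5]
sigma = [0.1, 0.1, 0.1, 0.1]

s = scaling_factor(y_true, y_pred, sigma)
sigma_cal = [s * v for v in sigma]
print("scaling factor:", round(s, 3))
print("coverage before:", coverage(y_true, y_pred, sigma))
print("coverage after: ", coverage(y_true, y_pred, sigma_cal))
```

The factor is estimated only from labelled points, so the improved coverage is guaranteed on the calibration set itself and not on the unlabelled pool, which is the limitation discussed above.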

Since one of the reasons for unreliable UQ is that surrogate models are primarily trained to minimize prediction error and do not provide uncertainty estimates by default, a range of alternative, model-agnostic UQ strategies have been proposed. These rely on heuristic measures of uncertainty, such as distance in feature134 or latent spaces,73 and do not depend on a specific predictive model architecture. It has been demonstrated that latent space distance provides more reliable uncertainty estimates (by better capturing residuals) for both artificial73 and graph135 neural networks, indicating that such model-agnostic methods could be beneficial to AL. Similarly, integrating Domain of Applicability (DoA) analysis59,60 with UQ in AL workflows allows for the identification and expansion of the model's reliable prediction regime, enabling more effective sample selection near or beyond the current DoA to improve model robustness.136 However, these methods are more mature in cheminformatics and lack standardized implementations in materials science, owing to the absence of universally accepted DoA frameworks.60,137 Moreover, to the best of our knowledge, a fair comparison of model-agnostic and model-based UQ methods in terms of AL performance for materials discovery has not yet been carried out, although it is necessary for improving the reliability of AL practices in materials science.
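A distance-based uncertainty heuristic of the kind described above can be sketched in a few lines. Here the mean distance to the k nearest training points serves as an uncertainty proxy. The function name, the choice of k and the toy feature vectors are assumptions for illustration. For a latent-space variant, the feature vectors would be replaced by embeddings taken from a trained network.

```python
import math


def knn_distance_uncertainty(query, train_features, k=3):
    """Mean Euclidean distance from the query to its k nearest training
    points, used as a model-agnostic proxy for predictive uncertainty."""
    dists = sorted(math.dist(query, x) for x in train_features)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


# Hypothetical 2D feature vectors of the labelled training set.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

u_inside = knn_distance_uncertainty((0.5, 0.5), train)   # within the data
u_outside = knn_distance_uncertainty((5.0, 5.0), train)  # far from the data
print(f"inside: {u_inside:.3f}, outside: {u_outside:.3f}")
```

The proxy only orders candidates by how far they are from the known data. It is not expressed in the units of the target property unless it is subsequently calibrated against residuals.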

3.2.3 Lack of standardized performance evaluation metrics. A rigorous and accurate evaluation of AL performance is essential for assessing its efficiency and determining when to terminate the AL loop. This is especially critical in closed-loop, automated experimentation platforms with limited human intervention, where failure to recognize diminishing returns can result in substantial resource consumption. Various performance metrics have been proposed to quantify AL efficiency in optimization-driven workflows for materials discovery. However, no single metric captures all aspects of performance, and even metrics that are expected to be complementary can show diverging trends across AL iterations. The most popular metrics for assessing the efficacy of AL are based on comparison with random sampling,94,125 which draws data points from the pool without any surrogate-model-derived insight and can hence also be referred to as “passive learning”. Rohr et al.34 proposed quantitative metrics such as the acceleration and enhancement factors (see Table 4), measuring the fraction of promising materials found and the iterations required to find them relative to random sampling, which have been used to benchmark surrogate models and acquisition strategies for AFT materials discovery campaigns.95,138 However, random sampling may not be a universally appropriate baseline in materials science, where experimental choices are typically guided by prior knowledge rather than random selection. 
Moreover, AL can underperform random sampling if the surrogate model's inductive bias, i.e., the set of foundational assumptions about the underlying data, does not accurately reflect the true property landscape, thereby misguiding the search process.139 In such cases, acquisition strategies like uncertainty sampling can become ‘trapped’ in intrinsically noisy or physically irrelevant regions of the design space that offer little contribution to the model's generalizability. This was recently found to be the case for training MLIPs,140 where random sampling achieved superior predictive accuracy because the AL oversampled high-energy and distorted configurations. This introduced systematic energy offsets and compromised the model's performance within the physically relevant regions of interest.
Table 4 Evaluation metrics used for assessing AL performance in optimization-driven materials discovery. iAL and iRS denote the AL and random sampling iterations, respectively. The figure of merit (FOM) is a general quantitative measure of AL efficiency and could be, for example, the desired material property value or the number of promising materials in a target range. Previous studies that employed each metric for AL performance evaluation are indicated
| Metric | Definition | Limitation |
| --- | --- | --- |
| Discovery yield34,113 | image file: d6dd00081a-t11.tif | Requires a priori knowledge of interesting materials in the pool |
| Acceleration factor95,113 | image file: d6dd00081a-t12.tif | Not (always) a fair baseline |
| Enhancement factor95,113 | image file: d6dd00081a-t13.tif | Not (always) a fair baseline |
| Decision efficiency34 | image file: d6dd00081a-t14.tif | Dependence on pool quality |
| Model accuracy42,125 | Error of surrogate model estimated on a (holdout) test set | Prone to overestimation due to active learning bias |
| Model uncertainty19,126 | Uncertainty of surrogate model | Can be misleading due to unreliable uncertainties |
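The defining equations in Table 4 are not reproduced in this text, so the sketch below rests on an assumption: it uses the commonly cited ratio forms, in which the acceleration factor is the ratio of random sampling to AL iterations needed to reach a given FOM, and the enhancement factor is the ratio of AL to random sampling FOM at a given iteration. The function names and the toy FOM curves (cumulative number of target materials found) are hypothetical.

```python
def first_iteration_reaching(fom_curve, target):
    """1-based index of the first iteration whose FOM reaches the target."""
    for i, value in enumerate(fom_curve, start=1):
        if value >= target:
            return i
    return None


def acceleration_factor(fom_al, fom_rs, target):
    """Assumed form: i_RS / i_AL, the iterations needed to reach the target FOM."""
    i_al = first_iteration_reaching(fom_al, target)
    i_rs = first_iteration_reaching(fom_rs, target)
    if i_al is None or i_rs is None:
        return None  # target not reached within the recorded campaign
    return i_rs / i_al


def enhancement_factor(fom_al, fom_rs, iteration):
    """Assumed form: FOM_AL(i) / FOM_RS(i) at a given 1-based iteration i."""
    rs_value = fom_rs[iteration - 1]
    if rs_value == 0:
        return None  # undefined while random sampling has found nothing
    return fom_al[iteration - 1] / rs_value


# Hypothetical curves: cumulative number of target materials found.
fom_al = [1, 2, 4, 5, 5]
fom_rs = [0, 1, 1, 2, 4]

print("AF (target FOM = 4):", acceleration_factor(fom_al, fom_rs, target=4))
print("EF at iteration 3:  ", enhancement_factor(fom_al, fom_rs, iteration=3))
```

Both quantities inherit the limitation noted in the table: they are only as meaningful as the random sampling baseline they are normalized by, and for stochastic baselines they would normally be averaged over repeated trials.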


Another common evaluation criterion in optimization-driven AL workflows is tracking model performance over AL iterations. However, Borg et al.113 observed that improvements in model accuracy do not necessarily correlate with better discovery rates of high-performing materials. As shown in Fig. 3b, AFT BO-AL campaigns targeting high band gap materials achieved high discovery rates even without noticeable improvements in surrogate model accuracy. This suggests that the popular notion that the “model is getting better” may not qualify as an accurate evaluation criterion for AL in materials discovery. The discrepancy arises because global accuracy metrics (e.g. model error measured on hold-out test sets) primarily reflect performance in the bulk of the data distribution, while successful discovery depends on the model's ability to prioritize candidates in sparsely populated, high-performing regions. Similarly, Koizumi et al.14 demonstrated that AL performance can vary substantially with changes in the material property, descriptor dimension and size of seed data, even for a fixed choice of surrogate model and sampling strategy. Kim et al.141 emphasized that, beyond the model and seed data, the quality of the candidate pool, particularly the fraction of promising materials, critically affects AL performance evaluation. They pointed out that the efficiency of optimization-driven AL workflows depends on how likely candidates in the pool are to outperform those in the initial seed data, and proposed metrics such as the predicted fraction of improved candidates to quantify the pool quality. It is important to note that these insights are derived from post hoc studies with fully labeled datasets and may not directly translate to real-time AL workflows involving partial and potentially noisy data acquisition. 
Nevertheless, the use of diverse datasets spanning different properties, surrogate models, and acquisition functions, along with repeated trials to gather statistics suggests that these trends may hold more broadly, including for AL workflows involving real-time experiments and simulations.
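The non-dimensional model error (NDME) referred to in Fig. 3b, defined there as the RMSE divided by the standard deviation of the hold-out set, can be tracked alongside a discovery metric at each iteration. A minimal sketch follows, which assumes the population standard deviation as the normalizer, since the text does not specify population versus sample. The function name `ndme` and the toy data are illustrative.

```python
import math
import statistics


def ndme(y_true, y_pred):
    """RMSE on the hold-out set divided by the (population) standard
    deviation of the hold-out labels."""
    rmse = math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
    return rmse / statistics.pstdev(y_true)


holdout = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_only = [3.0] * 5   # a model that always predicts the hold-out mean
perfect = list(holdout)  # a model with zero error

print("NDME, mean-only model:", ndme(holdout, mean_only))
print("NDME, perfect model:  ", ndme(holdout, perfect))
```

With this normalization, a value near 1 means the model is no better than predicting the mean of the hold-out set. Because this is a global measure, it can remain nearly flat while the discovery yield in the sparsely populated high-performing region keeps rising.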

In BO-AL frameworks focused on multi-fidelity or multi-objective optimization, performance metrics must be adapted beyond those used in single-fidelity and single-objective tasks. In the multi-fidelity setting, data acquisition comes with differing costs and fidelities, making it essential to quantify not just predictive accuracy, but also cost-efficiency. Acquisition functions that explicitly incorporate cost information across fidelity levels such as multi-fidelity Expected Improvement, multi-fidelity Maximum Entropy Search,142 and Targeted Variance Reduction,143 have been proposed for this purpose. Despite their promise, the performance of multi-fidelity AL is highly sensitive to the cost ratio and informativeness across the different fidelities. Jacobs et al.144 demonstrated that low-fidelity data improves optimization only when it is significantly cheaper (e.g. ≤5% of high-fidelity cost) and partially available upfront. This underscores the need for performance metrics that jointly account for data cost and the degree of information transfer between fidelity levels. Steps in this direction include the recently proposed discount metric, which measures how much cheaper a multi-fidelity optimization campaign is compared to a single-fidelity one to achieve a comparable quality of solution.121 In multi-objective AL, the aim is to identify candidate materials that perform well across multiple properties. Most studies have centered on Pareto optimization, where the goal is to approximate the Pareto front of non-dominated solutions,145,146 using acquisition strategies such as Expected Hypervolume Improvement (EHVI),147,148 Pareto AL,149 and scalarization-based approaches such as ParEGO.146 As the number of objectives grows, however, computing the hypervolume improvement becomes increasingly expensive and coverage of the Pareto front becomes sparse, limiting the efficiency of these approaches. 
Furthermore, materials properties are often unevenly distributed, and may vary substantially in measurement cost, which are not accounted for in standard multi-objective formulations. These challenges collectively highlight the pressing need for task-aware and cost-sensitive AL performance metrics that go beyond standard error, accuracy, or hypervolume, and that reflect the practical realities of materials design.
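For two objectives, the Pareto front and the hypervolume that hypervolume-based acquisition strategies build on can be computed directly. The sketch below assumes both objectives are maximized. It evaluates the deterministic hypervolume improvement of a candidate whose property values are already known, not the expectation over a surrogate posterior that EHVI requires. The function names, the reference point and the toy property values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(points):
    """Points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front, ref):
    """Area dominated by a two-objective front relative to a reference
    point that is worse than every front point (maximization)."""
    area, prev_y = 0.0, ref[1]
    # Sweep from the largest first objective; each point adds a new strip.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


# Hypothetical (property 1, property 2) values of evaluated materials.
evaluated = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1), (1, 1)]
ref = (0, 0)

front = pareto_front(evaluated)
hv = hypervolume_2d(front, ref)
candidate = (3, 4)
hv_new = hypervolume_2d(pareto_front(evaluated + [candidate]), ref)
print("Pareto front:", sorted(front))
print("hypervolume:", hv, "| improvement from candidate:", hv_new - hv)
```

The sweep used here is specific to two objectives. With more objectives, exact hypervolume computation becomes considerably more expensive, which is the scaling limitation noted above.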

4. Outlook

AL has shown substantial promise for advancing data-driven materials science, yet its broader impact remains limited by unresolved challenges in the design, execution, and evaluation of AL workflows. While this perspective has highlighted several aspects in this direction, we conclude by summarizing these into a set of core challenges and outlining future directions that we believe are most critical for advancing the field.

• Towards systematic selection of design choices: AL workflows involve many interacting design choices including the surrogate model, acquisition function, sampling strategy, UQ method, and stopping criterion, yet these components are rarely tuned systematically for a specific materials task. In analogy to ML model selection, these workflow components should be treated as hyperparameters that require careful optimization before deploying AL in resource-intensive discovery campaigns. An important next step is the development of large-scale benchmarking studies that systematically compare combinations of these choices across diverse materials datasets and target properties. Such benchmarks could establish practical guidelines and serve as reference points for future AL applications in related materials design problems. In parallel, open and modular software frameworks that support automated optimization of these hyperparameters would reduce reliance on ad hoc decisions and lower the barrier to adoption for practitioners.

• Improved integration of domain knowledge and practical constraints: generic AL frameworks often overlook the domain-specific constraints inherent to materials science, such as physical validity of candidate structures, synthetic feasibility and cost. This limitation is particularly pronounced in complex, non-uniform design spaces, where the underlying property landscape may vary significantly across regions, and uninformed sampling can fail to concentrate effort where it is most needed. When such considerations are neglected, AL may waste valuable evaluation budget on impractical or infeasible candidates, reducing the efficiency gains. Future AL workflows should incorporate domain knowledge and practical constraints, for example, through physics-informed surrogate models or sampling strategies that directly encode feasibility constraints.78,150 Additionally, physically motivated representations enable AL to be grounded on the relevant structural and physicochemical characteristics of the materials. Such integration not only enhances data efficiency but also improves the reliability and interpretability of AL outcomes, particularly in scenarios with limited data.

• Leveraging emerging AI paradigms: several recent advances in AI offer promising pathways to address persistent limitations of current AL workflows. The unreliability of surrogate models in early AL stages could be alleviated by incorporating pretrained or foundation models trained on broad materials or even synthetic datasets, which provide more reliable uncertainty estimates in low-data regimes.151 However, care must be taken, as these models can introduce systematic inductive biases; if the pretraining distribution is poorly aligned with the target domain, the AL process may become trapped in over-represented regions of the design space. Also, for high-dimensional and complex design spaces where conventional surrogate-based AL struggles, reformulating the sampling problem as sequential decision-making through reinforcement learning offers a route to more adaptive exploration policies.152 In parallel, recent advances in generative models153,154 open opportunities to go beyond pool-based AL, which requires a predefined set of unlabeled candidates that may not be available in many materials discovery applications. In this setting, AL can be combined with generative models to sample new candidates by exploring an open design space, which are then decoded into material compositions and structures using generative models such as variational autoencoders.155 However, it is important to ensure that such generated candidates are physically plausible and chemically meaningful by enforcing strict physicochemical constraints. Realizing the full potential of these approaches also requires a stronger integration between the materials science and ML communities, as many advances relevant to AL, such as advanced UQ methods156 developed by the latter, remain underexplored in AL for materials science applications.

Author contributions

A. S. N. proposed the idea, conducted the literature study and drafted the manuscript. All authors discussed and commented on the manuscript.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability

No new data were generated in this work.

Acknowledgements

A. S. N. thanks Matthias Scheffler for helpful discussions, and Beate Paulus and the German Research Foundation (DFG) for support through the Walter-Benjamin Fellowship Program (project no. 540316537).

References

  1. K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev and A. Walsh, Machine learning for molecular and materials science, Nature, 2018, 559, 547.
  2. R. Vasudevan, G. Pilania and P. V. Balachandran, Machine learning for materials design and discovery, J. Appl. Phys., 2021, 129, 070401.
  3. S. Axelrod, D. Schwalbe-Koda, S. Mohapatra, J. Damewood, K. P. Greenman and R. Gómez-Bombarelli, Learning matter: Materials design with machine learning and atomistic simulations, Acc. Mater. Res., 2022, 3, 343.
  4. B. G. Sumpter, R. K. Vasudevan, T. Potok and S. V. Kalinin, A bridge for accelerating materials by design, npj Comput. Mater., 2015, 1, 1.
  5. E. O. Pyzer-Knapp, J. W. Pitera, P. W. Staar, S. Takeda, T. Laino, D. P. Sanders, J. Sexton, J. R. Smith and A. Curioni, Accelerating materials discovery using artificial intelligence, high performance computing and robotics, npj Comput. Mater., 2022, 8, 84.
  6. A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon and E. D. Cubuk, Scaling deep learning for materials discovery, Nature, 2023, 624, 80.
  7. W. F. Maier, K. Stoewe and S. Sieg, Combinatorial and high-throughput materials science, Angew. Chem., Int. Ed., 2007, 46, 6016.
  8. S. Curtarolo, G. L. Hart, M. B. Nardelli, N. Mingo, S. Sanvito and O. Levy, The high-throughput highway to computational materials design, Nat. Mater., 2013, 12, 191.
  9. P. Xu, X. Ji, M. Li and W. Lu, Small data machine learning in materials science, npj Comput. Mater., 2023, 9, 42.
  10. S. Bauer, P. Benner, T. Bereau, V. Blum, M. Boley, C. Carbogno, R. Catlow, G. Dehm, S. Eibl and R. Ernstorfer, et al., Roadmap on data-centric materials science, Model. Simulat. Mater. Sci. Eng., 2024, 063301.
  11. P. Karande, B. Gallagher and T. Y.-J. Han, A strategic approach to machine learning for material science: how to tackle real-world challenges and avoid pitfalls, Chem. Mater., 2022, 34, 7650.
  12. D. A. Cohn, Z. Ghahramani and M. I. Jordan, Active learning with statistical models, J. Artif. Intell. Res., 1996, 4, 129.
  13. D. Cohn, L. Atlas and R. Ladner, Improving generalization with active learning, Mach. Learn., 1994, 15, 201.
  14. A. Koizumi, G. Deffrennes, K. Terayama and R. Tamura, Performance of uncertainty-based active learning for efficient approximation of black-box functions in materials science, Sci. Rep., 2024, 14, 27019.
  15. A. Jose, E. Devijver, N. Jakse and R. Poloni, Informative training data for efficient property prediction in metal–organic frameworks by active learning, J. Am. Chem. Soc., 2024, 146, 6134.
  16. H. Jang, W. Lee, H.-J. Kim, S. Cha, H. Shin, W. B. Lee, M.-W. Oh, Y. S. Jung and Y. Kim, Active learning-guided accelerated discovery of ultra-efficient high-entropy thermoelectrics, Adv. Mater., 2026, 38, e15054.
  17. L. Bassman Oftelie, P. Rajak, R. K. Kalia, A. Nakano, F. Sha, J. Sun, D. J. Singh, M. Aykol, P. Huck and K. Persson, et al., Active learning for accelerated design of layered materials, npj Comput. Mater., 2018, 4, 74.
  18. K. Tran and Z. W. Ulissi, Active learning across intermetallics to guide discovery of electrocatalysts for CO2 reduction and H2 evolution, Nat. Catal., 2018, 1, 696.
  19. A. S. Nair, L. Foppa and M. Scheffler, Materials-discovery workflow guided by symbolic regression for identifying acid-stable oxides for electrocatalysis, npj Comput. Mater., 2025, 11, 1.
  20. D. Xue, P. V. Balachandran, J. Hogden, J. Theiler, D. Xue and T. Lookman, Accelerated search for materials with targeted properties by adaptive design, Nat. Commun., 2016, 7, 1.
  21. T. Lookman, P. V. Balachandran, D. Xue and R. Yuan, Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design, npj Comput. Mater., 2019, 5, 21.
  22. O. Ozbayram, D. Olsen, M. Annamaraju, A. E. Robertson, A. Venkatraman, S. R. Kalidindi, M. Zhou and L. Graham-Brady, Batch active learning for microstructure–property relations in energetic materials, Mech. Mater., 2025, 205, 105308.
  23. M. Hu, Q. Tan, R. Knibbe, M. Xu, G. Liang, J. Zhou, J. Xu, B. Jiang, X. Li and M. Ramajayam, et al., Designing unique and high-performance Al alloys via machine learning: Mitigating data bias through active learning, Comput. Mater. Sci., 2024, 244, 113204.
  24. H. Zhang, W. W. Chen, J. M. Rondinelli and W. Chen, ET-AL: Entropy-targeted active learning for bias mitigation in materials data, Appl. Phys. Rev., 2023, 10, 021403.
  25. K. Kang, T. A. Purcell, C. Carbogno and M. Scheffler, Accelerating the training and improving the reliability of machine-learned interatomic potentials for strongly anharmonic materials through active learning, Phys. Rev. Mater., 2025, 9, 063801.
  26. A. Wang, H. Liang, A. McDannald, I. Takeuchi and A. G. Kusne, Benchmarking active learning strategies for materials optimization and discovery, Oxford Open Mater. Sci., 2022, 2, itac006.
  27. Y. Tian, R. Yuan, D. Xue, Y. Zhou, X. Ding, J. Sun and T. Lookman, Role of uncertainty estimation in accelerating materials development via active learning, J. Appl. Phys., 2020, 128, 014103.
  28. M. Kulichenko, B. Nebgen, N. Lubbers, J. S. Smith, K. Barros, A. E. Allen, A. Habib, E. Shinkle, N. Fedik and Y. W. Li, et al., Data generation for machine learning interatomic potentials and beyond, Chem. Rev., 2024, 124, 13681.
  29. K. Konyushkova, R. Sznitman and P. Fua, Learning active learning from data, Adv. Neural Inf. Process. Syst., 2017, 30, 4228.
  30. T. Lookman, P. V. Balachandran, D. Xue, J. Hogden and J. Theiler, Statistical inference and adaptive design for materials discovery, Curr. Opin. Solid State Mater. Sci., 2017, 21, 121.
  31. M. Stein, Large sample properties of simulations using Latin hypercube sampling, Technometrics, 1987, 29, 143.
  32. C. Campbell, N. Cristianini, A. Smola, et al., Query learning with large margin classifiers, in ICML, 2000, vol. 20.
  33. J.-N. Hwang, J. J. Choi, S. Oh and R. Marks, et al., Query-based learning applied to partially trained multilayer perceptrons, IEEE Trans. Neural Network., 1991, 2, 131.
  34. B. Rohr, H. S. Stein, D. Guevarra, Y. Wang, J. A. Haber, M. Aykol, S. K. Suram and J. M. Gregoire, Benchmarking the acceleration of materials discovery by sequential learning, Chem. Sci., 2020, 11, 2696.
  35. H. Khosravi, T. Olajire, A. S. Raihan and I. Ahmed, A data driven sequential learning framework to accelerate and optimize multi-objective manufacturing decisions, J. Intell. Manuf., 2024, 1, 4087.
  36. D. Angluin, Queries and concept learning, Mach. Learn., 1988, 2, 319.
  37. L. Atlas, D. Cohn and R. Ladner, Training connectionist networks with queries and selective sampling, Adv. Neural Inf. Process. Syst., 1989, 2, 566–573.
  38. D. D. Lewis, A sequential algorithm for training text classifiers: Corrigendum and additional data, ACM SIGIR Forum, 1995, 29, 13–19.
  39. C. Williams and C. Rasmussen, Gaussian processes for regression, Adv. Neural Inf. Process. Syst., 1995, 8, 514–520.
  40. L. Breiman, Random forests, Mach. Learn., 2001, 45, 5.
  41. Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature, 2015, 521, 436.
  42. K. Li, D. Persaud, K. Choudhary, B. DeCost, M. Greenwood and J. Hattrick-Simpers, Exploiting redundancy in large materials datasets for efficient machine learning with less data, Nat. Commun., 2023, 14, 7283.
  43. A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner and G. Ceder, et al., Commentary: The Materials Project: A materials genome approach to accelerating materials innovation, APL Mater., 2013, 1, 011002.
  44. S. Kirklin, J. E. Saal, B. Meredig, A. Thompson, J. W. Doak, M. Aykol, S. Rühl and C. Wolverton, The open quantum materials database (OQMD): assessing the accuracy of DFT formation energies, npj Comput. Mater., 2015, 1, 1.
  45. P. Lyngby and K. S. Thygesen, Data-driven discovery of 2D materials by deep generative models, npj Comput. Mater., 2022, 8, 232.
  46. C. J. Bartel, A. Trewartha, Q. Wang, A. Dunn, A. Jain and G. Ceder, A critical examination of compound stability predictions from machine-learned formation energies, npj Comput. Mater., 2020, 6, 97.
  47. Y. Sheng, Y. Wu, J. Yang, W. Lu, P. Villars and W. Zhang, Active learning for the power factor prediction in diamond-like thermoelectric materials, npj Comput. Mater., 2020, 6, 171.
  48. B. Settles, Active learning literature survey, Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
  49. Z. Wang and J. Ye, Querying discriminative and representative samples for batch mode active learning, ACM Trans. Knowl. Discov. Data, 2015, 9, 1.
  50. X. Li, D. Kuang and C. X. Ling, Active learning for hierarchical text classification, in Pacific-Asia conference on knowledge discovery and data mining, Springer, 2012, pp. 14–25.
  51. B. Du, Z. Wang, L. Zhang, L. Zhang, W. Liu, J. Shen and D. Tao, Exploring representativeness and informativeness for active learning, IEEE Trans. Cybern., 2015, 47, 14.
  52. S.-J. Huang, R. Jin and Z.-H. Zhou, Active learning by querying informative and representative examples, Adv. Neural Inf. Process. Syst., 2010, 23, 892–900.
  53. V. Zaverkin, D. Holzmüller, I. Steinwart and J. Kästner, Exploring chemical and conformational spaces by batch mode deep active learning, Digital Discovery, 2022, 1, 605.
  54. K. Brinker, Incorporating diversity in active learning with support vector machines, in Proceedings of the 20th international conference on machine learning, (ICML-03), 2003, pp. 59–66.
  55. D. J. MacKay, Information-based objective functions for active data selection, Neural Comput., 1992, 4, 590.
  56. S. Dasgupta, Two faces of active learning, Theor. Comput. Sci., 2011, 412, 1767.
  57. S. Farquhar, Y. Gal and T. Rainforth, On statistical bias in active learning: How and when to fix it, arXiv, 2021, preprint, arXiv:2101.11665, DOI:10.48550/arXiv.2101.11665.
  58. C. Murray, J. U. Allingham, J. Antorán and J. M. Hernández-Lobato, Addressing bias in active learning with depth uncertainty networks…or not, in I (Still) Can't Believe It's Not Better! Workshop at NeurIPS 2021, PMLR, 2022, pp. 59–63.
  59. J. Hu, D. Liu, N. Fu and R. Dong, Realistic material property prediction using domain adaptation based machine learning, Digital Discovery, 2024, 3, 300.
  60. L. E. Schultz, Y. Wang, R. Jacobs and D. Morgan, A general approach for determining applicability domain of machine learning models, npj Comput. Mater., 2025, 11, 95.
  61. C. Zeni, A. Anelli, A. Glielmo and K. Rossi, Exploring the robust extrapolation of high-dimensional machine learning potentials, Phys. Rev. B, 2022, 105, 165141.
  62. X. Yang, Y. Liu, C. Mi and X. Wang, Active learning Kriging model combining with kernel-density-estimation-based importance sampling method for the estimation of low failure probability, J. Mech. Des., 2018, 140, 051402.
  63. S. Xiong, J. Azimi and X. Z. Fern, Active learning of constraints for semi-supervised clustering, IEEE Trans. Knowl. Data Eng., 2013, 26, 43.
  64. D. Schwalbe-Koda, S. Hamel, B. Sadigh, F. Zhou and V. Lordi, Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory, Nat. Commun., 2025, 16, 4014.
  65. A. PA Subramanyam and D. Perez, Information-entropy-driven generation of material-agnostic datasets for machine-learning interatomic potentials, npj Comput. Mater., 2025, 11, 218.
  66. J. Moon, W. Beker, M. Siek, J. Kim, H. S. Lee, T. Hyeon and B. A. Grzybowski, Active learning guides discovery of a champion four-metal perovskite oxide for oxygen evolution electrocatalysis, Nat. Mater., 2024, 23, 108.
  67. J. Qi, T. W. Ko, B. C. Wood, T. A. Pham and S. P. Ong, Robust training of machine learning interatomic potentials with dimensionality reduction and stratified sampling, npj Comput. Mater., 2024, 10, 43.
  68. V. Gkatsis, P. Maratos, C. Rekatsinas, G. Giannakopoulos and P. Krokidas, Density-aware active learning for materials discovery: A case study on functionalized nanoporous materials, Phys. Chem. Chem. Phys., 2025, 23152–23165.
  69. S. Nie, Y. Xiang, L. Wu, G. Lin, Q. Liu, S. Chu and X. Wang, Active learning guided discovery of high entropy oxides featuring high H2-production, J. Am. Chem. Soc., 2024, 146, 29325.
  70. H. Jung, L. Sauerland, S. Stocker, K. Reuter and J. T. Margraf, Machine-learning driven global optimization of surface adsorbate geometries, npj Comput. Mater., 2023, 9, 114.
  71. H. Liu, B. Yucel, B. Ganapathysubramanian, S. R. Kalidindi, D. Wheeler and O. Wodo, Active learning for regression of structure–property mapping: the importance of sampling and representation, Digital Discovery, 2024, 3, 1997.
  72. Y. Kim and B. Shin, In defense of core-set: A density-aware core-set selection for active learning, in Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, 2022, pp. 804–812.
  73. J. P. Janet, C. Duan, T. Yang, A. Nandy and H. J. Kulik, A quantitative uncertainty metric controls error in neural network-driven chemical discovery, Chem. Sci., 2019, 10, 7913.
  74. Q. Li, N. Fu, S. S. Omee and J. Hu, MD-HIT: Machine learning for material property prediction with dataset redundancy control, npj Comput. Mater., 2024, 10, 245.
  75. X. Zhan, H. Liu, Q. Li and A. B. Chan, A comparative survey: Benchmarking for pool-based active learning, in International Joint Conference on Artificial Intelligence, 2021, pp. 4679–4686.
  76. A. P. Bartók, R. Kondor and G. Csányi, On representing chemical environments, Phys. Rev. B: Condens. Matter Mater. Phys., 2013, 87, 184115.
  77. S. S. Omee, N. Fu, R. Dong, M. Hu and J. Hu, Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study, npj Comput. Mater., 2024, 10, 144.
  78. A. G. Kusne, H. Yu, C. Wu, H. Zhang, J. Hattrick-Simpers, B. DeCost, S. Sarker, C. Oses, C. Toher and S. Curtarolo, et al., On-the-fly closed-loop materials discovery via Bayesian active learning, Nat. Commun., 2020, 11, 5966.
  79. J. Bi, Y. Xu, F. Conrad, H. Wiemer and S. Ihlenfeldt, A comprehensive benchmark of active learning strategies with AutoML for small-sample regression in materials science, Sci. Rep., 2025, 15, 37167.
  80. A. Jose, J. P. A. de Mendonça, E. Devijver, N. Jakse, V. Monbet and R. Poloni, Regression tree-based active learning, Data Min. Knowl. Discov., 2024, 38, 420.
  81. S. Kee, E. Del Castillo and G. Runger, Query-by-committee improvement with diversity and density in batch active learning, Inf. Sci., 2018, 454, 401.
  82. X. Yuan, M. Suvarna, J. Y. Lim, J. Pérez-Ramírez, X. Wang and Y. S. Ok, Active learning-based guided synthesis of engineered biochar for CO2 capture, Environ. Sci. Technol., 2024, 58, 6628.
  83. K. Min and E. Cho, Accelerated discovery of novel inorganic materials with desired properties using active learning, J. Phys. Chem. C, 2020, 124, 14759.
  84. F. Häse, L. M. Roch, C. Kreisbeck and A. Aspuru-Guzik, Phoenics: a Bayesian optimizer for chemistry, ACS Cent. Sci., 2018, 4, 1134.
  85. M. Fronzi, O. Isayev, D. A. Winkler, J. G. Shapter, A. V. Ellis, P. C. Sherrell, N. A. Shepelin, A. Corletto and M. J. Ford, Active learning in Bayesian neural networks for bandgap predictions of novel van der Waals heterostructures, Adv. Intell. Syst., 2021, 3, 2100080.
  86. S. S. Hessmann, K. T. Schütt, N. W. Gebauer, M. Gastegger, T. Oguchi and T. Yamashita, Accelerating crystal structure search through active learning with neural networks for rapid relaxations, npj Comput. Mater., 2025, 11, 44.
  87. L. Zhang, D.-Y. Lin, H. Wang, R. Car and W. E, Active learning of uniformly accurate interatomic potentials for materials simulation, Phys. Rev. Mater., 2019, 3, 023804.
  88. B. Shahriari, K. Swersky, Z. Wang, R. P. Adams and N. De Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proc. IEEE, 2015, 104, 148.
  89. Y. Wu, A. Walsh and A. M. Ganose, Race to the bottom: Bayesian optimisation for chemical problems, Digital Discovery, 2024, 3, 1086.
  90. A. Deshwal, C. M. Simon and J. R. Doppa, Bayesian optimization of nanoporous materials, Mol. Syst. Des. Eng., 2021, 6, 1066.
  91. P. Honarmandi, V. Attari and R. Arroyave, Accelerated materials design using batch Bayesian optimization: A case study for solving the inverse problem from materials microstructure to process specification, Comput. Mater. Sci., 2022, 210, 111417.
  92. J. K. Pedersen, C. M. Clausen, O. A. Krysiak, B. Xiao, T. A. Batchelor, T. Löffler, V. A. Mints, L. Banko, M. Arenz and A. Savan, et al., Bayesian optimization of high-entropy alloy compositions for electrocatalytic oxygen reduction, Angew. Chem., 2021, 133, 24346.
  93. R. Yuan, Z. Liu, P. V. Balachandran, D. Xue, Y. Zhou, X. Ding, J. Sun, D. Xue and T. Lookman, Accelerated discovery of large electrostrains in BaTiO3-based piezoelectrics using active learning, Adv. Mater., 2018, 30, 1702884.
  94. Y.-F. Lim, C. K. Ng, U. Vaitesswar and K. Hippalgaonkar, Extrapolative Bayesian optimization with Gaussian process and neural network ensemble surrogate models, Adv. Intell. Syst., 2021, 3, 2100101.
  95. Q. Liang, A. E. Gongora, Z. Ren, A. Tiihonen, Z. Liu, S. Sun, J. R. Deneault, D. Bash, F. Mekki-Berrada and S. A. Khan, et al., Benchmarking the performance of Bayesian optimization across multiple experimental materials science domains, npj Comput. Mater., 2021, 7, 188.
  96. R. Moriconi, M. P. Deisenroth and K. Sesh Kumar, High-dimensional Bayesian optimization using low-dimensional feature spaces, Mach. Learn., 2020, 109, 1925.
  97. A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput and I. Tanaka, Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and Bayesian optimization, Phys. Rev. Lett., 2015, 115, 205901.
  98. A. Seko, T. Maekawa, K. Tsuda and I. Tanaka, Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single-and binary-component solids, Phys. Rev. B: Condens. Matter Mater. Phys., 2014, 89, 054303.
  99. Y. Zuo, M. Qin, C. Chen, W. Ye, X. Li, J. Luo and S. P. Ong, Accelerating materials discovery with Bayesian optimization and graph deep learning, Mater. Today, 2021, 51, 126.
  100. A. Ishii, S. Kikuchi, A. Yamanaka and A. Yamamoto, Application of Bayesian optimization to the synthesis process of BaFe2(As,P)2 polycrystalline bulk superconducting materials, J. Alloys Compd., 2023, 966, 171613.
  101. M. Ghorbani, M. Boley, P. Nakashima and N. Birbilis, An active machine learning approach for optimal design of magnesium alloys using Bayesian optimisation, Sci. Rep., 2024, 14, 8299.
  102. M. Yu, S. Yang, C. Wu and N. Marom, Machine learning the Hubbard U parameter in DFT+U using Bayesian optimization, npj Comput. Mater., 2020, 6, 180.
  103. T. Ueno, T. D. Rhone, Z. Hou, T. Mizoguchi and K. Tsuda, COMBO: An efficient Bayesian optimization library for materials science, Mater. Discovery, 2016, 4, 18.
  104. T. Yamashita, N. Sato, H. Kino, T. Miyake, K. Tsuda and T. Oguchi, Crystal structure prediction accelerated by Bayesian optimization, Phys. Rev. Mater., 2018, 2, 013803.
  105. H. C. Herbol, M. Poloczek and P. Clancy, Cost-effective materials discovery: Bayesian optimization across multiple information sources, Mater. Horiz., 2020, 7, 2113.
  106. S. R. Chitturi, A. Ramdas, Y. Wu, B. Rohr, S. Ermon, J. Dionne, F. H. d. Jornada, M. Dunne, C. Tassone and W. Neiswanger, et al., Targeted materials discovery using Bayesian algorithm execution, npj Comput. Mater., 2024, 10, 156.
  107. J. I. Myung, J. R. Deneault, J. Chang, I. Kang, B. Maruyama and M. A. Pitt, Multi-objective Bayesian optimization: a case study in material extrusion, Digital Discovery, 2025, 4, 464.
  108. C. Benjamins, E. Raponi, A. Jankovic, K. van der Blom, M. L. Santoni, M. Lindauer and C. Doerr, PI is back! Switching acquisition functions in Bayesian optimization, arXiv, 2022, preprint, arXiv:2211.01455, DOI:10.48550/arXiv.2211.01455.
  109. S. B. Torrisi, M. Z. Bazant, A. E. Cohen, M. G. Cho, J. S. Hummelshøj, L. Hung, G. Kamat, A. Khajeh, A. Kolluru and X. Lei, et al., Materials cartography: A forward-looking perspective on materials representation and devising better maps, APL Mach. Learn., 2023, 1, 020901.
  110. M. Tenorio, M. H. Rahman, A. Mannodi-Kanakkithodi and J. Chapman, Out-of-distribution machine learning for materials discovery: Challenges and opportunities, Chem. Phys. Rev., 2026, 7, 011317.
  111. A. S. Nair, L. Foppa and M. Scheffler, Interpretable Bayesian optimization for catalyst discovery, Faraday Discuss., 2026, DOI:10.1039/D5FD00159E.
  112. L. Ward, A. Agrawal, A. Choudhary and C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials, npj Comput. Mater., 2016, 2, 16028.
  113. C. K. Borg, E. S. Muckley, C. Nyby, J. E. Saal, L. Ward, A. Mehta and B. Meredig, Quantifying the performance of machine learning models in materials discovery, Digital Discovery, 2023, 2, 327.
  114. Y. Tenne and C.-K. Goh, Computational intelligence in expensive optimization problems, Springer Science & Business Media, 2010, vol. 2.
  115. J. Fromer, R. Wang, M. Manjrekar, A. Tripp, J. M. Hernández-Lobato and C. W. Coley, Batched Bayesian optimization by maximizing the probability of including the optimum, J. Chem. Inf. Model., 2025, 65, 4808.
  116. T. Kathuria, A. Deshpande and P. Kohli, Batched Gaussian process bandit optimization via determinantal point processes, Adv. Neural Inf. Process. Syst., 2016, 29, 4206–4214.
  117. L. M. Roch, F. Häse, C. Kreisbeck, T. Tamayo-Mendoza, L. P. Yunker, J. E. Hein and A. Aspuru-Guzik, ChemOS: orchestrating autonomous experimentation, Sci. Robot., 2018, 3, eaat5559.
  118. F. Häse, M. Aldeghi, R. J. Hickman, L. M. Roch, M. Christensen, E. Liles, J. E. Hein and A. Aspuru-Guzik, Olympus: a benchmarking framework for noisy optimization and experiment planning, Mach. Learn.: Sci. Technol., 2021, 2, 035021.
  119. E. H. Lee, V. Perrone, C. Archambeau and M. Seeger, Cost-aware Bayesian optimization, arXiv, 2020, preprint, arXiv:2003.10870, DOI:10.48550/arXiv.2003.10870.
  120. J. Snoek, H. Larochelle and R. P. Adams, Practical Bayesian optimization of machine learning algorithms, Adv. Neural Inf. Process. Syst., 2012, 25, 2951–2959.
  121. V. Sabanza-Gil, R. Barbano, D. Pacheco Gutiérrez, J. S. Luterbacher, J. M. Hernández-Lobato, P. Schwaller and L. Roch, Best practices for multi-fidelity Bayesian optimization in materials and molecular research, Nat. Comput. Sci., 2025, 5, 572.
  122. F. Grasselli, S. Chong, V. Kapil, S. Bonfanti and K. Rossi, Uncertainty in the era of machine learning for atomistic modeling, Digital Discovery, 2025, 4, 2654.
  123. P. Pernot, Calibration in machine learning uncertainty quantification: beyond consistency to target adaptivity, APL Mach. Learn., 2023, 1, 046121.
  124. Y. Hwang, W. Jo, J. Hong and Y. Choi, Overcoming overconfidence for active learning, IEEE Access, 2024, 12, 118707–118716.
  125. L. Kavalsky, V. I. Hegde, E. Muckley, M. S. Johnson, B. Meredig and V. Viswanathan, By how much can closed-loop frameworks accelerate computational materials discovery?, Digital Discovery, 2023, 2, 1112.
  126. L. Kavalsky, V. I. Hegde, B. Meredig and V. Viswanathan, A multiobjective closed-loop approach towards autonomous discovery of electrocatalysts for nitrogen reduction, Digital Discovery, 2024, 3, 999.
  127. D. Varivoda, R. Dong, S. S. Omee and J. Hu, Materials property prediction with uncertainty quantification: A benchmark study, Appl. Phys. Rev., 2023, 10, 021409.
  128. K. Tran, W. Neiswanger, J. Yoon, Q. Zhang, E. Xing and Z. W. Ulissi, Methods for comparing uncertainty quantifications for material property predictions, Mach. Learn.: Sci. Technol., 2020, 1, 025006.
  129. C. J. Gruich, V. Madhavan, Y. Wang and B. R. Goldsmith, Clarifying trust of materials property predictions using neural networks with distribution-specific uncertainty quantification, Mach. Learn.: Sci. Technol., 2023, 4, 025019.
  130. G. Palmer, S. Du, A. Politowicz, J. P. Emory, X. Yang, A. Gautam, G. Gupta, Z. Li, R. Jacobs and D. Morgan, Calibration after bootstrap for accurate uncertainty quantification in regression models, npj Comput. Mater., 2022, 8, 115.
  131. Y. Chung, I. Char, H. Guo, J. Schneider and W. Neiswanger, Uncertainty toolbox: an open-source library for assessing, visualizing, and improving uncertainty quantification, arXiv, 2021, preprint, arXiv:2109.10254, DOI:10.48550/arXiv.2109.10254.
  132. C. Lataniotis, S. Marelli and B. Sudret, UQLab 2.0 and UQCloud: open-source vs. cloud-based uncertainty quantification, in SIAM Conference on Uncertainty Quantification (SIAM UQ 2022), ETH Zurich, Institute of Structural Engineering, 2022.
  133. A. Thomas-Mitchell, G. Hawe and P. L. Popelier, Calibration of uncertainty in the active learning of machine learning force fields, Mach. Learn.: Sci. Technol., 2023, 4, 045034.
  134. V. Korolev, I. Nevolin and P. Protsenko, A universal similarity based approach for predictive uncertainty quantification in materials science, Sci. Rep., 2022, 12, 14931.
  135. J. Musielewicz, J. Lan, M. Uyttendaele and J. R. Kitchin, Improved uncertainty estimation of graph neural network potentials using engineered latent space distances, J. Phys. Chem. C, 2024, 128, 20799.
  136. S. Zhong, D. R. Lambeth, T. K. Igou and Y. Chen, Enlarging applicability domain of quantitative structure–activity relationship models through uncertainty-based active learning, ACS ES&T Eng., 2022, 2, 1211.
  137. C. Sutton, M. Boley, L. M. Ghiringhelli, M. Rupp, J. Vreeken and M. Scheffler, Identifying domains of applicability of machine learning models for materials science, Nat. Commun., 2020, 11, 4428.
  138. A. Palizhati, S. B. Torrisi, M. Aykol, S. K. Suram, J. S. Hummelshøj and J. H. Montoya, Agents for sequential learning using multiple-fidelity data, Sci. Rep., 2022, 12, 4694.
  139. J. N. Fuhg, A. Fau and U. Nackenhorst, State-of-the-art and comparative review of adaptive sampling methods for Kriging, Arch. Comput. Methods Eng., 2021, 28, 2689.
  140. N. Stolte, J. Daru, H. Forbert, D. Marx and J. Behler, Random sampling versus active learning algorithms for machine learning potentials of quantum liquid water, J. Chem. Theory Comput., 2025, 21, 886.
  141. Y. Kim, E. Kim, E. Antono, B. Meredig and J. Ling, Machine-learned metrics for predicting the likelihood of success in materials discovery, npj Comput. Mater., 2020, 6, 131.
  142. S. Takeno, H. Fukuoka, Y. Tsukada, T. Koyama, M. Shiga, I. Takeuchi and M. Karasuyama, Multi-fidelity Bayesian optimization with max-value entropy search and its parallelization, in International Conference on Machine Learning, PMLR, 2020, pp. 9334–9345.
  143. C. Fare, P. Fenner, M. Benatan, A. Varsi and E. O. Pyzer-Knapp, A multi-fidelity machine learning approach to high throughput materials screening, npj Comput. Mater., 2022, 8, 257.
  144. R. Jacobs, P. E. Goins and D. Morgan, Role of multifidelity data in sequential active learning materials discovery campaigns: case study of electronic bandgap, Mach. Learn.: Sci. Technol., 2023, 4, 045060.
  145. V. Trinquet, M. L. Evans, C. J. Hargreaves, P.-P. De Breuck and G.-M. Rignanese, Optical materials discovery and design with federated databases and machine learning, Faraday Discuss., 2025, 256, 459.
  146. P. Xu, Y. Ma, W. Lu, M. Li, W. Zhao and Z. Dai, Multi-objective optimization in machine learning assisted materials design and discovery, J. Mater. Inf., 2025, 5, 26.
  147. K. Park, C. Song, J. Park and S. Ryu, Multi-objective Bayesian optimization for the design of nacre-inspired composites: optimizing and understanding biomimetics through AI, Mater. Horiz., 2023, 10, 4329.
  148. B. P. MacLeod, F. G. Parlane, C. C. Rupnow, K. E. Dettelbach, M. S. Elliott, T. D. Morrissey, T. H. Haley, O. Proskurin, M. B. Rooney and N. Taherimakhsousi, et al., A self-driving laboratory advances the Pareto front for material properties, Nat. Commun., 2022, 13, 995.
  149. K. M. Jablonka, G. M. Jothiappan, S. Wang, B. Smit and B. Yoo, Bias free multiobjective active learning for materials design and discovery, Nat. Commun., 2021, 12, 2312.
  150. H. A. Doan, G. Agarwal, H. Qian, M. J. Counihan, J. Rodríguez-López, J. S. Moore and R. S. Assary, Quantum chemistry-informed active learning to accelerate the design and discovery of sustainable energy storage materials, Chem. Mater., 2020, 32, 6338.
  151. J. Hu, R. Dong, Y. Feng, M. Hu and J. Hu, Foundation-model surrogates enable data-efficient active learning for materials discovery, arXiv, 2026, preprint, arXiv:2603.12567, DOI:10.48550/arXiv.2603.12567.
  152. Y. Xian, X. Ding, X. Jiang, Y. Zhou, J. Sun, D. Xue and T. Lookman, Unlocking the black box beyond Bayesian global optimization for materials design using reinforcement learning, npj Comput. Mater., 2025, 11, 1.
  153. H. Metni, L. Ruple, L. N. Walters, L. Torresi, J. Teufel, H. Schopmans, J. Östreicher, Y. Zhang, M. Neubert and Y. Koide, et al., Generative models for crystalline materials, Adv. Mater., 2026, e23620.
  154. H. Park, Z. Li and A. Walsh, Has generative artificial intelligence solved inverse materials design?, Matter, 2024, 7, 2355.
  155. R. Xin, E. M. Siriwardane, Y. Song, Y. Zhao, S.-Y. Louis, A. Nasiri and J. Hu, Active-learning-based generative design for the discovery of wide-band-gap materials, J. Phys. Chem. C, 2021, 125, 16118.
  156. S. Lahlou, M. Jain, H. Nekoei, V. I. Butoi, P. Bertin, J. Rector-Brooks, M. Korablyov and Y. Bengio, DEUP: Direct epistemic uncertainty prediction, arXiv, 2021, preprint, arXiv:2102.08501, DOI:10.48550/arXiv.2102.08501.

This journal is © The Royal Society of Chemistry 2026