Open Access Article
Cayden Maguire
a,
Christofer Hardcastle
a,
Trevor Hastings
a,
Raymundo Arróyave
abc and
Brent Vela
*a
aDepartment of Materials Science and Engineering, Texas A&M University, College Station, TX 77843, USA. E-mail: brentvela@tamu.edu
bJ. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA
cWm Michael Barnes '64 Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843, USA
First published on 28th May 2026
In alloy design, the search for candidate materials is often framed as an optimization problem, with the goal of identifying Pareto-optimal solutions across multiple objectives. However, Pareto-optimal solutions do not necessarily satisfy all minimum performance thresholds required for practical deployment. An alternative approach is to treat alloy design as a constraint satisfaction problem, in which the goal is to identify any solution that meets all bare minimum requirements across multiple quantities of interest. These approaches have yet to be benchmarked against each other in the context of realistic alloy design problems. In this work, we demonstrate that, in realistic alloy design campaigns involving multiple objectives and constraints, the constraint satisfaction framework yields a higher likelihood of finding viable alloys than optimization-based approaches. Furthermore, constraint-satisfaction approaches find the first viable alloy solutions earlier than optimization. Our results suggest that focusing on feasibility rather than optimality can lead to more actionable outcomes in materials discovery, particularly in highly constrained applications.
In the materials design literature, multi-objective optimization has been the dominant paradigm, enabled by advances in Bayesian optimization, evolutionary algorithms, and other surrogate-modeling techniques.1,2,4–6 However, the emphasis on optimal trade-offs often overlooks the reality that many candidate materials, while Pareto-efficient, are unsuitable for deployment due to failing one or more hard constraints (e.g., corrosion resistance, manufacturability, phase stability, cost, etc.).3 The constraints may be binary7 or thresholds8 on continuous properties.
The alternative design strategy is constraint satisfaction frameworks7,9–13 which explicitly seek any solution that satisfies all predefined requirements, regardless of whether it is optimal in the multi-objective sense. While this perspective aligns more directly with the decision-making processes of industrial alloy qualification, its systematic evaluation within the context of high-dimensional, uncertainty-aware Bayesian materials design has been largely absent, with few exceptions.8,14,15
For example, in our prior work,1 we deployed a feasibility-first active-learning pipeline: we devoted the early budget to learning the constraint surfaces until the Shannon entropy of the feasibility map plateaued, indicating a stable decision boundary. We then pruned candidates with low probability of satisfying all constraints and performed multi-objective optimization within the resulting feasible subspace to trace the relevant trade-offs. This two-stage strategy avoids spending effort in infeasible regions and yields Pareto sets that are directly actionable for qualification. Despite the efficiency gain with respect to a brute-force querying of constraints, there were inefficiencies in this method. There is no need to accurately map the decision boundary in regions of the design space where objective values are poor i.e. there is no need to finely resolve the boundary in regions with poor objective values, but to rapidly identify alloys that satisfy all constraints.
This issue was recently addressed with the ANUBIS method,15 which uses a modified optimization acquisition function to account for the probability of feasibility (PoF), e.g. EI × PoF where EI is expected improvement. ANUBIS adopts a feasibility-aware BO paradigm, learning the unknown constraints on-the-fly with a variational Gaussian process classifier while simultaneously learning the objectives. This joint treatment focuses effort where high performance and plausible feasibility coincide, learning just enough of the boundary to discriminate promising candidates rather than mapping it everywhere.
However, this paradigm still seeks optimal (albeit feasible) points. Furthermore, all examples provided were single constraints; in practice, alloy design imposes multiple, often coupled constraints. In this paper, we raise the question: “in realistic alloy design scenarios, are we truly interested in finding the feasible Pareto set?” We argue that, in closed-loop alloy design, the practical goal is neither to train the most accurate regressor nor to chase a notional “utopia” composition, but to minimize the time it takes to find the first feasible alloy to identify a material that satisfies all non-negotiable requirements rapidly. We substantiate this perspective with a benchmark comparing our proposed Bayesian multi-constraint satisfaction against Bayesian multi-objective optimization and a modified Bayesian multi-objective optimization scheme that accounts for feasibility (EI × PoF), with a random search as a baseline. Additionally, we show that the time to find the first feasible alloy can be reduced by equipping Gaussian Process Classifiers (GPCs) with informative prior mean functions.
Constraint satisfaction naturally aligns with stakeholder priorities: certify that a candidate quickly meets pre-registered, application-specific minimum thresholds. Faster feasibility discovery reduces program risk and preserves the runway for scale-up, qualification, and integration.16,17 Moreover, in black-box, high-dimensional, discrete spaces, global Pareto optimality is not practically provable; at best, one can certify local non-dominance within the sampled set,18 or one can run statistical tests for “Bayesian advantage” by comparing principled optimization outcomes against a baseline random experiment selection policy.19 By contrast, feasibility is auditable: a candidate either satisfies the thresholds (under uncertainty-aware acceptance tests) or does not. This reframes success around certifiable outcomes, aligns with decision-maker needs, and provides a natural, decision-relevant stopping rule for alloy discovery.
. For any design
, the GP predictive distribution is Gaussian with mean µ(x) and standard deviation σ(x), quantifying both our best estimate and its uncertainty. These uncertainties enable principled exploration, risk-aware selection, and efficient allocation of experimental and computational budgets to optimize objectives and learn constraints. In other words, the output of a GP is a normal distrubtion at any point
, and based on all the normal distributions within
principled exploration and design of experiments can be done.
The posterior predictive distribution (i.e. the output of a trained GP) is defined by the equations below. An untrained GP is specified by a prior mean function m(x) and a positive-definite covariance (kernel) k(x, x′). Let
denote the training designs with noisy observations
(i.i.d. Gaussian noise with variance σn2). For any test design
, the posterior is Gaussian with mean and variance
| µ(x*) = m(x*) + k*⊤[K + σn2I]−1(y − m(X)), | (1) |
| σ2(x*) = k(x*, x*) − k*⊤[K + σn 2I]−1k*, | (2) |
![]() | (3) |
is a per-feature length-scale.
We use ARD to assign an independent length-scale to each elemental fraction (feature). Compositions are expressed in at %; accordingly, we initialize
in composition units and constrain them to broad bounds (e.g., 30–100 at %), letting marginal-likelihood optimization tune each
to the data and grid resolution.
To capture observation noise, we add an independent white-noise term:
| ktotal(x, x′) = kRBF(x, x′) + σn2δxx′, | (4) |
, the probability that the constraint is satisfied is| p(y = 1|x) = σ(f(x)), | (5) |
![]() | (6) |
The link function σ(·) maps the GP output to probabilities i.e. the link function bounds the latent function between 0 and 1 so it can be interpreted as a probability. Here we use the logistic sigmoid,
![]() | (7) |
A one-dimensional example illustrating the latent GP and its categorical GP counterpart is shown in Fig. 1.
![]() | ||
| Fig. 1 Categorical Gaussian Processes (GPCs) are used to model binary feasibility constraints during both constraint satisfaction and constrained optimization. Latent GP f(x) is fit to binary labels. Feasibility probability p(y = 1|x) = σ(f(x)) with the same uncertainty propagated via σ(µf ± σf). This approach is appropriate for modeling single-phase stability, as demonstrated previously,7 and is amenable to incorporating physics-based prior knowledge via informative prior mean functions to accelerate learning; see (ref. 7) for details. | ||
In this work, categorical GPs are used to model binary feasibility constraints during both constraint satisfaction and constrained optimization. This approach is appropriate for modeling single phase stability, as demonstrated in previous work.7 Furthermore, this implementation is amenable to the incorporation of physics-based prior knowledge, further accelerating the learning process. More details on Gaussian process classification with informative priors can be found in ref. 7.
we model the latent response
with a GP and compute the probability that a thresholded constraint is satisfied. Let the constraint be y(x) ≥ t (the case y(x) ≤ t follows analogously).
We place a GP prior on the latent function,
![]() | (8) |
![]() | (9) |
![]() | (10) |
Fig. 2 illustrates how a Gaussian-process regressor (GPR) can be used for continuous classification via thresholding. The left panel shows observations of y, the posterior mean prediction µ(x), and a credibility band (e.g., ± 2σ(x)). The horizontal dashed line marks the feasibility threshold y = 5. The marked point at x = 11 highlights the model's prediction and its uncertainty (vertical error bar). Under a Gaussian predictive marginal
, the probability of satisfying the constraint is
Which converts the continuous surrogate into a probabilistic classifier relative to the constraint.
be subject to C constraints. For each constraint c ∈ {1, …, C}, letAssuming conditional independence of the constraints given x and the learned surrogates, the overall probability of feasibility (PoF) is the product of the individual probabilities:
![]() | (11) |
Fig. 3 shows an example of PoF calculations across the design domain. The product in ref. 11 is simple and intuitive; however, it assumes conditional independence of constraints given x and the fitted surrogates. When constraints are strongly correlated (e.g., share underlying physics or data)11, can over- or underestimate the joint feasibility and become overconfident. In future work, we will replace the pure product with dependence-aware aggregators, however, the current implementation is shown to be effective for alloy design.
![]() | ||
| Fig. 3 PoF calculations across the design space. The Nb–Mo–Ta–V–W–Cr system is represented using a barycentric projection. The markers show the data used to train the GP. After training, the probability of meeting each invdividual constraint is computed across the design space (upper five projections). Assuming conditional independence, the PoF is the product of the individual probabilities (bottom projection). Note, overlap occurs on barycentric projections when datapoints are mapped closely together. This limitation is discussed in our previous work.20 | ||
Formally, let yi≔f(xi) denote the objective vector for design xi, and write
for the current Pareto set. Let
be a chosen reference point that is dominated by all feasible designs. The hypervolume indicator is defined as the d-dimensional Lebesgue measure (i.e., length in 1D, area in 2D, volume in 3D, and hypervolume in dD) of the region dominated by
and bounded by r:21
![]() | (12) |
denotes the axis-aligned hyperrectangle between y and r. In other words, take each Pareto point y, draw the axis-aligned box to the reference point r, take the union of all such boxes, and measure its size—area in 2D, volume in 3D, and hypervolume in dD.
A bi-objective example of this is shown in Fig. 4. In Fig. 4, the two axes correspond to the objectives f1 and f2, both to be minimized. The black points y1, …, y4 represent nondominated solutions that together form the Pareto front. The dashed lines mark the reference point r, which defines the bounding box for hypervolume calculation. The shaded staircase-shaped region highlights the portion of objective space dominated by the Pareto set and bounded by r. The total area of this region corresponds to the hypervolume indicator
, and it increases as additional nondominated points are discovered closer to the origin. This visual emphasizes how hypervolume quantifies simultaneous improvements in both objectives by capturing the volume of objective space excluded from domination by the reference point.
![]() | ||
Fig. 4 Bi-objective hypervolume (minimization). The shaded staircase is the region dominated by the Pareto set and bounded by the reference point r. | ||
We will track HV as a function of iteration, wanting to maximize it as fast as possible. To ensure a fair comparison between objectives (so that those with large magnitudes do not arbitrarily dominate HV), we compute HV in a scaled objective space using fixed objective ranges. This is shown in eqn (13),
![]() | (13) |
is the current measured Pareto set, yi = [STi,−Densityi, YSi, Pughi] is the vector of measured objective values for input i, r = [0, −30, 0, 0] is the vector of reference points for each objective, Δ is the vector of objective ranges (max − min), and HV(·,·) is the dominated hypervolume.
![]() | (14) |
is the indicator (Iverson) function. This is the deterministic analogue of the probability of feasibility
introduced in Section 2.5: here, the product aggregates truth values rather than probabilities of passing. If any of the constraint c is not met, the term
and thus the entire product F(x) = 0 for alloy x.
With this definition, the time required to identify the first feasible alloy (TTFF) can be expressed either in terms of iterations in the sequential case, or in the batch case, in terms of total alloys investigated. We take the perspective that the iteration-based metric is more significant, as time is the true limiting factor in practical discovery workflows. When operating in batches, the process already leverages economies of scale, reducing the marginal cost per experiment and emphasizing the importance of minimizing the number of sequential iterations rather than the total number of evaluations. However, for simplicity in this work, we benchmark sequential optimization against sequential constraint satisfaction. For the sequential case, we define TTFF as the iteration where the average cumulative number of feasible queries is greater than or equal to 1.
In our constraint-satisfaction setting, we target the feasible set of alloys with an acquisition function,
| a(x) = PoF(X), | (15) |
Since only a small percentage of alloys (0.05% of the design space) with similar properties are feasible, an exploration term is omitted from the acquisition function.
and reference point r isThe EHVI is the expectation of this improvement under the multivariate normal predictive distribution:
Several forms and approximations exist to calculate EHVI; however for speed (due to the computation expense), we opt to use pEHVI, achieving near-exact accuracy with markedly reduced runtime.22
The pointwise EHVI (pEHVI) computes, for each anchor point
, the EHVI as if y(i) were the only Pareto point, then takes the minimum across anchors:
![]() | (16) |
![]() | (17) |
![]() | (18) |
![]() | (19) |
The solidus temperature and density were calculated in Thermo-Calc using the TCHEA7 thermodynamic database using the property module. The prior models for these properties are rule of mixtures melting temperature and rule of mixtures density, respectively.
Phase stability was evaluated at 600 °C with the Equilibrium module, selected as the target temperature for in-house hot rolling. Alloys predicted to be single-phase BCC at 600 °C were deemed manufacturable (a simplifying assumption used here to illustrate a simple manufacturability constraint i.e. if this constraint is not met, other quantities of interest cannot be queried). We define “single-phase BCC” as a BCC mole fraction ≥99%. The prior for BCC phase stability in this work is
Which is subsequently bound between 0 and 1 with the logistic sigmoid. VEC < 6.87 is an experimental indicator for single phase BCC.23
We used the Maresca–Curtin model for yield strength, which posits edge-dislocation glide as the rate-controlling mechanism in BCC HEAs at elevated temperatures.24 This model provides a conservative (lower-bound) estimate of yield strength for BCC RHEAs25 and is therefore an appropriate ground-truth model for our in silico alloy-design exercise. The yield strength was queried at 700 °C. The associated prior model was the Maresca–Curtin yield strength queried at 25 °C.
As a first-order proxy for ductility, we used the Pugh ratio (B/G), computed from rule-of-mixtures estimates of the bulk and shear moduli. Prior work supports B/G as a reasonable first-approximation indicator of ductility, making it suitable for our in silico design campaigns. We did not have a prior model for ductility in this work and instead relied on a zero-mean prior for the GP.
553 unique candidate alloys.
Within these 40
553 alloys, 22 (0.054%) meet the constraints listed in Table 1, and 12
591 (31.05%) are Pareto-optimal regarding the objectives in Table 2. There are 18 (0.044%) points that are both feasible and Pareto-optimal. These 22 feasible points constitute less than 1% of the initial design space. This is consistent with prior works12,26 that application-relevant constraints drastically narrow the apparent breadth of the high-entropy-alloy design space.
| Property | Criterion |
|---|---|
| Solidus temperature | Tsolidus > 2473 K (2200 °C) |
| Density | ρ < 9.0 g cm−3 |
| Yield strength | σy > 700 MPa |
| Pugh ratio | B/G > 2.5 |
| Phase stability | xBCC ≥ 99% at 600 °C |
| Quantity | Objective |
|---|---|
| Solidus temperature | Maximize Tsolidus [K] |
| Density | Minimize ρ [g cm−3] |
| Yield strength | Maximize σy [MPa] |
| Pugh ratio | Maximize B/G |
| Constraint: xBCC ≥ 99% at 600 °C | |
To initiate the active learning scheme, first an alloy is sampled from the design space and evaluated for BCC phase stability. In our in silico setup, we treat “processable” as single-phase BCC. If the alloy is single-phase BCC, we then measure the remaining properties in Table 1 and update all classifiers with the new results. If it is not single-phase BCC, we stop there and update only the BCC classifier; no other data are collected for that alloy. This mirrors real world scenarios where before investing in property tests, the alloy must be shown to be processable.
Using the trained GPCs, we estimate the probability of passing each constraint for every candidate alloy in the design space. We then multiply these probabilities to obtain the overall probability of feasibility (the chance that an alloy satisfies all constraints at once). This product assumes the constraints are conditionally independent under the model and that the classifier outputs are well-calibrated; if constraints are correlated, the result can be biased, but it remains a simple and effective screening heuristic. See the supplemental section for an investigation of a PoF model that accounts for objective dependence.
We select the next alloy using a constraint-satisfaction acquisition function (eqn (15)) that favors candidates with the highest probability of feasibility. Once this point is selected for querying, the process repeats. This closed-loop is summarized in Fig. 5.
To start the optimization loop, we again sample an alloy and apply the same BCC manufacturability gate. If it fails, we update only the phase-stability (BCC) classifier. If it passes, we observe the objective values.
With the current GPRs for objectives and the BCC GPC for the constraint, we evaluate predictions across the design space. For each alloy, we compute pointwise EHVI (pEHVI; see Section 2.9) and the probability of being single-phase BCC, pBCC. Our acquisition is the product AQF(x) = pBCC(x) × pEHVI(x), which prioritizes candidates that both expand the Pareto front and are likely manufacturable. We pick the alloy that maximizes this quantity and iterate in a closed loop. This process is summarized in Fig. 6.
![]() | ||
| Fig. 6 Closed-loop multi-objective optimization workflow using pEHVI under a single-phase BCC constraint. | ||
Fig. 7 shows dominated hypervolume versus iteration, and Fig. 8 shows the cumulative number of feasible alloys found. As expected in 7, the optimization strategy increases hypervolume more rapidly, whereas in Fig. 8 constraint satisfaction discovers feasible points sooner and in greater numbers. By the end of the experiment, on average, the constraint satisfaction loop identifies all but one of the feasible alloys and its average time-to-first-feasible (TTFF) is 9 iterations. Recall that only 22 out of 41
496 candidates are feasible (0.05% of the design space), making this a needle-in-a-haystack problem solved within a short campaign (half of the feasible alloys found by iteration 23; 21 out of 22 feasible alloys found by iteration 48). In contrast, the optimization strategy struggles to find feasible candidates, performing as poorly as a random search, while the combined model performs between the two strategies (with a TTFF of 11 iterations and an average of 19 feasible alloys found by the end of the campaign).
Constraint satisfaction reframes alloy design around certifiable feasibility rather than unverifiable optimality. In practical alloy development, the ground-truth material response surfaces are never fully known; consequently, global Pareto optimality cannot be proven—it can only be approximated within the sampled design space. At best, one can certify local non-dominance among observed candidates, a fact that limits the decision value of optimization-based strategies in finite-budget campaigns. By contrast, feasibility is auditable: a candidate either satisfies the registered performance thresholds (within uncertainty bounds) or it does not. The difference between feasibility (which we can test and certify) and optimality (which we can only approximate) marks the fundamental limit of what can be known in materials discovery—its epistemic boundary—when our information is incomplete or uncertain.
The observed impact of informative prior mean functions further supports the value of embedding physical intuition into probabilistic classifiers. By incorporating thermodynamic or mechanical priors, we showed that Gaussian Process Classifiers (GPCs) can learn feasibility boundaries more efficiently and focus querying on physically meaningful regions of the design space. This approach naturally extends to multi-fidelity and cost-aware frameworks, in which prior knowledge derived from simulations or relevant prior experiments accelerates feasibility discovery.
Finally, given the unique strengths of constraint satisfaction and optimization, and the fact that the combined model does not fully take advantage of either strategy, suggests that these paradigms should be used in a sequential, complementary framework. A hybrid strategy, beginning with feasibility discovery and transitioning to Pareto refinement, mirrors the logic of stage-gate design processes in engineering practice. In this structure, constraint satisfaction would provide the go/no-go certainties required for decision-making, while optimization explores trade-offs within the feasible subspace. Consistent with entropy-based active learning of constraints, continued constraint focused querying becomes progressively less effective once the feasibility boundary is sufficiently learned. The expected entropy reduction plateaus (“an elbow”), indicating diminishing returns. In this regime, reallocating evaluations to objective optimization is more efficient; accordingly, entropy-based frameworks explicitly trigger the transition to Bayesian optimization once the average entropy reduction over constraint models flattens and drops below a threshold (e.g., 3%).28 The combination of these complementary approaches constitutes a unified, fully-Bayesian workflow that effectively balances exploration and exploitation over the long range within a materials discovery campaign.
• Time-to-first-feasible (TTFF). Constraint satisfaction consistently achieved a lower TTFF than optimization, identifying feasible alloys earlier and in greater numbers. By the end of the campaign, the prior equipped model had, on average, recovered 21 out of 22 feasible candidates (0.05% of the 41, 496 total), effectively resolving a needle-in-a-haystack problem within a limited number of iterations.
• Hypervolume growth. As expected, optimization expanded dominated hypervolume faster, reflecting its objective.
• Hybrid strategy performance. The combined model sits between optimization and constraint satisfaction in terms of performance, not fully taking advantage of either strategy.
• Informative priors. Equipping the GPCs with informative prior means further reduced TTFF and shortened the time required to find all feasible points relative to the baseline constraint-satisfaction loop.
When the priority is to obtain any viable alloy quickly (e.g., early-stage, resource-constrained campaigns), constraint satisfaction is preferable. When the goal shifts to improving trade-offs among properties, multi-objective optimization is the right tool. In practice, a hybrid schedule—using constraint satisfaction to secure the first feasible alloy, then handing off to optimization—balances speed with frontier improvement. When prior knowledge exists, informative priors are a low-cost way to accelerate feasibility discovery.
Limitations include the fact that our feasibility model assumes approximate independence among constraints and well-calibrated classifier outputs; deviations from either can lead to systematic over- or under-estimation of the probability of feasibility (PoF). The BCC constraint serves as a practical proxy for manufacturability in silico, but it may not fully capture the spectrum of processing risks encountered experimentally. Another limitation is that the scope of the work is limited to a sequential evaluation strategy; batch selection methods were not considered. These caveats motivate future work on joint (correlated) constraint models, improved probabilistic calibration, cost-aware and multi-fidelity acquisition strategies, and adaptive hybrid frameworks that transition from feasibility-first discovery to Pareto frontier expansion as evidence accumulates. More broadly, these findings delineate the epistemic limits of optimization in materials discovery: while feasibility can be demonstrated, optimality can only be inferred.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5dd00517e.
| This journal is © The Royal Society of Chemistry 2026 |