Open Access Article
Chase Katz*, Ting-Yu Yang, Parker King, Md Shafiqul Islam, Brent Vela and Raymundo Arróyave
Department of Materials Science and Engineering, Texas A&M University, College Station, TX 77843, USA. E-mail: ichasekatz@icloud.com
First published on 5th March 2026
Many materials design attributes that are central to adoption, such as aesthetics, perceived quality, or user-specific preferences, are difficult to quantify directly, making preference feedback a practical proxy for optimization. Here we present a preference-driven Bayesian optimization framework and demonstrate it using ring color as a concrete case study. This case study simulates a scenario in which a jeweler synthesizes a ring with a given chemistry and presents it to a stakeholder, who expresses a preference relative to an incumbent ring color. Our approach, preferential Bayesian optimization (PBO), learns a latent utility function over alloy composition space, with perceived color obtained via a Thermo-Calc optical forward model. Using preference feedback, the framework iteratively proposes new alloy chemistries predicted to better align with user aesthetic preferences. After a limited number of proposed alloys and their associated colors, the user selects the most desirable option, guiding the search toward optimal aesthetic outcomes. We then evaluate the cost of the proposed alloys and identify a cost-aesthetic Pareto front, enabling informed trade-offs between affordability and visual appeal. The proposed framework is readily applicable to materials design problems in which subjective or hard-to-measure attributes play a dominant role in decision-making.
This scale presents a significant financial opportunity for jewelers. In an ideal scenario with unlimited resources, a custom jeweler could fabricate every possible alloy within a given chemical design space and present the full palette of ring colors to the user for selection. However, exhaustive fabrication to explore alloy compositions for a preferred color is costly and time-consuming. The expense is particularly acute because precious metals such as gold and platinum can cost over $130 and $50 per gram,6,7 respectively, meaning that even a handful of prototype rings represents a substantial investment.
Traditional alloy design for rings and other jewelry has historically relied on trial-and-error synthesis and subjective evaluation of properties such as color. Moreover, aesthetic preferences are inherently personal and difficult to quantify, making systematic optimization of ring color for individual users challenging.8 Consequently, there is a need for methods that can efficiently identify optimal aesthetic outcomes while minimizing the number of prototypes required.
More broadly, many materials design problems are ill-posed as the optimization of a fully specified scalar or even a fixed multi-objective function. Key attributes that affect ring preference (aesthetics, tactile feel, surface finish, perceived quality, or designer intent) are not directly measurable, vary across stakeholders, and evolve as candidates are explored. In these settings, the objective is better treated as a latent preference model rather than a known function. Stakeholders can often provide reliable pairwise judgments (A vs. B) even when absolute scoring is inconsistent, making preference data a natural feedback channel for learning and optimization. In this context, “jeweler-in-the-loop” refers to a human-in-the-loop design workflow in which a jeweler or designer provides iterative pairwise preference judgments over candidate ring colors, guiding the optimization process without explicitly specifying a quantitative objective function.
Preference-based optimization formalizes this process by combining a latent-utility model with an acquisition strategy that selects the next comparison to maximize expected utility improvement or information gain.9 This yields an iterative loop that can efficiently explore high-dimensional composition spaces while remaining faithful to subjective or context-dependent criteria. The approach is compatible with conventional objectives and constraints, enabling joint optimization (e.g., preference-informed utility vs. cost, processability, or compliance) within a multi-objective PBO framework. In this work, alloy color is a concrete example of a hard-to-quantify attribute, but the method is intended as a general recipe for materials design problems where preference signals are the most reliable data available.
In this work, we present a preference-driven Bayesian optimization framework to identify ring colors that best match an individual user's aesthetic preferences.8,9 Preferential Bayesian optimization learns a latent utility function over alloy composition space, with color induced by the Thermo-Calc optical forward model, and proposes new alloy chemistries predicted to align with the user. We implement this as an iterative loop in which each query renders two candidate alloy colors and elicits a pairwise preference, using that feedback to update the utility model and drive subsequent proposals.
To demonstrate the proposed preference-based Bayesian optimization method, we perform an in silico study using Thermo-Calc's optical properties module within the Noble Metal Alloys Database (TCNOBL4) as the ground truth.10,11 Each query is analogous to fabricating a physical prototype and retrieves a predicted color from Thermo-Calc's optical model. The user is then presented with two candidate designs and selects the preferred option. After a fixed number of iterations, the resulting set of candidate alloys is analyzed in a multi-objective setting to identify the aesthetic-cost trade-off frontier, which serves as a decision aid for selecting a composition that balances visual appeal with affordability.
Preference-based Bayesian optimization has been widely studied, including multi-objective and scalable variants.8,9,12–17 Prior work has addressed preferences over explicit objectives, such as preference-order constraints within multi-objective BO frameworks that bias sampling toward preferred regions of the Pareto front,13 as well as extensions of preferential BO to multi-objective settings by scalarized Thompson sampling.12 Other studies have focused on algorithmic improvements, including preference exploration strategies for efficient multi-outcome optimization,15 decision-theoretic acquisition functions such as qEUBO,17 and improved inference schemes for preferential Gaussian processes that account for posterior skewness.16 Applications of preferential BO to subjective design problems, such as additive manufacturing parameter tuning, have also been demonstrated.14
These prior works primarily focus on developing new acquisition strategies, inference schemes, or scalable variants of preferential Bayesian optimization. In contrast, this work does not introduce a new preferential BO algorithm. Instead, we adopt the established pairwise Gaussian-process preference model and deploy it within a materials-design setting, demonstrating how preference-based BO can be embedded within alloy composition design using a CALPHAD-based optical model.
The novelty of this work lies in the application and integration of established preference-based optimization within a materials-design workflow. To our knowledge, this is one of the first applied design studies to deploy Thermo-Calc's optical properties module within TCNOBL4 for alloy color prediction in an iterative design setting. This deployment enables a preference-driven, human-in-the-loop Bayesian optimization workflow for ring design that integrates a physically grounded optical forward model, human preference feedback, and cost trade-off analysis within alloy composition space. Ring color serves as a concrete demonstration of how preference-based optimization can be used in materials problems where key attributes are subjective or difficult to quantify directly. Together, these contributions establish a practical use case for the Thermo-Calc optical model and a generalizable template for preference-based materials design.
We consider a preference-learning setting in which each candidate alloy is represented by a composition vector $x$, where each component of $x$ corresponds to the fractional concentration of a constituent element $e$ in the alloy design space. There are $d + 1$ elements in the alloy system, with $x$ constrained to the composition simplex (non-negative fractions summing to a fixed total). The goal is to infer a latent utility function $f(x)$ from noisy pairwise ring comparisons.9 The training set consists of a collection of items $X = \{x_1, \ldots, x_n\}$ together with $m$ observed preferences, where the $k$-th preference states that $x_{i_k}$ is preferred to $x_{j_k}$.
Pairwise comparisons are encoded through a design matrix $C \in \mathbb{R}^{m \times n}$, where each row corresponds to a single incumbent–challenger preference $(i_k, j_k)$ and contains a +1 in the column of the preferred item and a −1 in the column of the non-preferred item, with zeros elsewhere, as shown in Table 1. Let $f = (f_1, \ldots, f_n)^\top$ denote the latent utilities at the $n$ candidate alloys, with $f_i = f(x_i)$. The design matrix maps utilities to comparison differences via
$$(Cf)_k = f_{i_k} - f_{j_k}, \qquad k = 1, \ldots, m.$$
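The encoding described above can be illustrated with a minimal sketch. The helper names `build_design_matrix` and `utility_differences`, the example pairs, and the utility values are hypothetical and chosen only for illustration; they are not taken from the study's repository.

```python
def build_design_matrix(pairs, n_items):
    """Return an m x n matrix (list of rows) with +1 for the preferred item
    and -1 for the non-preferred item of each comparison, zeros elsewhere."""
    C = [[0.0] * n_items for _ in pairs]
    for k, (preferred, other) in enumerate(pairs):
        C[k][preferred] = 1.0
        C[k][other] = -1.0
    return C


def utility_differences(C, f):
    """Matrix-vector product C f: one utility difference per comparison."""
    return [sum(c * fi for c, fi in zip(row, f)) for row in C]


# Hypothetical example: four alloys, three comparisons (preferred, non-preferred).
pairs = [(0, 1), (2, 0), (2, 3)]
C = build_design_matrix(pairs, n_items=4)
f = [0.5, -1.0, 2.0, 0.0]           # made-up latent utilities
diffs = utility_differences(C, f)   # row k equals f[preferred] - f[other]
```

Each row of `C` sums to zero, so adding a constant to every utility leaves all differences unchanged; only relative utilities are identifiable from pairwise data.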
We use a Gaussian process (GP) prior to model the latent utility:
$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big).$$
The prior mean $m(x)$ is taken to be constant, providing an uninformative, non-biasing baseline. The covariance function is chosen as a scaled radial basis function kernel,
$$k(x, x') = s^2 \exp\left(-\frac{1}{2}\sum_{i=1}^{d}\frac{(x_i - x'_i)^2}{\ell_i^2}\right),$$
where $\ell_i > 0$ are lengthscale parameters (optionally distinct across dimensions) and $s^2 > 0$ is an output scale. Kernel hyperparameters were restricted to positive values through the default parameterization and constraints in GPyTorch/BoTorch. The multi-objective model used the BoTorch PairwiseGP, which applies default weakly informative priors to the kernel lengthscale and output scale (the prior distributions and parameters are specified in the SI). The single-objective study used a GPyTorch RBF kernel with a ScaleKernel and no explicit hyperparameter priors. Alloy compositions were represented as normalized fractions in [0, 1], and no manual hyperparameter tuning was performed. Implementation details and hyperparameter settings are described in the SI and documented in the repository.
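For readers who want to see the kernel evaluated directly, the following is a minimal standard-library sketch of a scaled RBF kernel with per-dimension lengthscales. It is not the GPyTorch implementation used in the study; the function name `scaled_rbf` and the numerical values are illustrative assumptions.

```python
import math


def scaled_rbf(x, y, lengthscales, outputscale):
    """Scaled RBF kernel: outputscale * exp(-0.5 * sum(((x_i - y_i) / l_i)^2))."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return outputscale * math.exp(-0.5 * sq_dist)


# Hypothetical two-component compositions expressed as fractions in [0, 1].
x_a = (0.2, 0.8)
x_b = (0.4, 0.6)
k_ab = scaled_rbf(x_a, x_b, lengthscales=(0.2, 0.2), outputscale=2.0)
k_aa = scaled_rbf(x_a, x_a, lengthscales=(0.2, 0.2), outputscale=2.0)
```

The kernel equals the output scale when the two compositions coincide and decays as they move apart, so shorter lengthscales make the utility model vary more rapidly with composition.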
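Observed comparisons are modeled through a probit likelihood.9 Conditional on the latent utilities at the compared items, the probability that $x_{i_k}$ is preferred to $x_{j_k}$ is the standard normal cumulative distribution function $\Phi$ evaluated at the suitably scaled utility difference $f_{i_k} - f_{j_k}$.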
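A probit preference probability can be sketched as follows. The scaling of the utility difference by $\sqrt{2}$ times a noise scale is an assumption borrowed from the common Gaussian-noise formulation of preference learning; the exact scaling used in the study is not stated here, and the function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_probability(f_i, f_j, noise_scale=1.0):
    """Probability that item i is preferred to item j under a probit model.

    ASSUMPTION: the utility difference is divided by sqrt(2) * noise_scale.
    """
    return std_normal_cdf((f_i - f_j) / (math.sqrt(2.0) * noise_scale))


p_equal = preference_probability(1.0, 1.0)    # equal utilities: a coin flip
p_better = preference_probability(2.0, 0.0)   # higher utility is favored
```

Larger noise scales flatten the response, which models a user whose judgments are less consistent.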
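Because the probit likelihood yields a non-Gaussian posterior, inference proceeds via a Laplace approximation centered at the maximum a posteriori (MAP) utilities $f_{\mathrm{MAP}}$.9 Define the negative log posterior (up to an additive constant),
$$S(f) = -\log p(C \mid f) + \tfrac{1}{2}\,(f - m(X))^\top K^{-1} (f - m(X)),$$
whose Hessian is
$$\nabla^2 S(f) = K^{-1} + W,$$
where $W = -\nabla^2 \log p(C \mid f)$ is the negative Hessian of the log likelihood and $K$ is the prior covariance matrix.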
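To learn the kernel hyperparameters $\theta$, we maximize the Laplace-approximated marginal log likelihood (evidence).9 Because the probit likelihood renders the exact evidence intractable, we expand the log posterior around the MAP estimate $f_{\mathrm{MAP}}$. Let $W = -\nabla^2 \log p(C \mid f)\big|_{f_{\mathrm{MAP}}}$ denote the negative Hessian of the log likelihood evaluated at the MAP point, and let $K$ be the prior covariance matrix. The Laplace approximation yields
$$\log p(C \mid \theta) \approx \log p(C \mid f_{\mathrm{MAP}}) - \tfrac{1}{2}\,(f_{\mathrm{MAP}} - m(X))^\top K^{-1} (f_{\mathrm{MAP}} - m(X)) - \tfrac{1}{2}\log\left|I + KW\right|.$$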
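The predictive mean of the latent utilities at test compositions $X_*$ is
$$\mu_* = m(X_*) + k_* K^{-1}\big(f_{\mathrm{MAP}} - m(X)\big),$$
where $k_*$ is the cross-covariance between the test and training compositions.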
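Given the Gaussian predictive distribution for the latent utilities, the model can evaluate the probability that one candidate is preferred over another. For two test compositions $a, b \in X_*$, let $\mu_a, \mu_b$ denote their predictive means and let
$$\sigma_{ab}^2 = \Sigma_{*,aa} + \Sigma_{*,bb} - 2\Sigma_{*,ab}$$
denote the predictive variance of their utility difference, where $\Sigma_*$ is the predictive covariance.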
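Let $\mathcal{N}\big(\mu_t(x), \sigma_t^2(x)\big)$ denote the Laplace-approximated posterior over $f(x)$ at iteration $t$. Let $f_t^+$ denote the incumbent (best) utility value at iteration $t$. The improvement of a candidate $x$ over the incumbent is
$$I(x) = \max\big(0,\, f(x) - f_t^+\big),$$
and the expected improvement (EI) acquisition function is its posterior expectation, $\alpha_{\mathrm{EI}}(x) = \mathbb{E}[I(x)]$.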
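Because the posterior over $f(x)$ is Gaussian, EI admits a closed form. Define the standardized improvement
$$z(x) = \frac{\mu_t(x) - f_t^+}{\sigma_t(x)}.$$
Then
$$\alpha_{\mathrm{EI}}(x) = \big(\mu_t(x) - f_t^+\big)\,\Phi(z(x)) + \sigma_t(x)\,\phi(z(x)),$$
where $\Phi$ and $\phi$ are the standard normal cumulative distribution and density functions.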
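The closed-form expression can be evaluated with a few lines of standard-library Python. This is a minimal sketch, not the BoTorch acquisition function used in the study; the function name and the handling of zero predictive uncertainty are illustrative choices.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Closed-form EI for a Gaussian posterior N(mu, sigma^2) and incumbent `best`."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(0.0, mu - best)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


ei_at_incumbent = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
ei_uncertain = expected_improvement(mu=-1.0, sigma=2.0, best=0.0)
```

The second call shows the exploratory character of EI: a candidate whose mean lies below the incumbent still receives a positive score when its uncertainty is large.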
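The two objectives are the latent utility $u(x)$ and the negated cost $c(x)$,
$$f_1(x) = u(x), \qquad f_2(x) = -c(x),$$
so that both are maximized, giving the objective vector
$$f(x) = \big(f_1(x), f_2(x)\big) = \big(u(x), -c(x)\big).$$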
Each objective is modeled using an independent GP. The GP posterior at any unobserved point $x$ provides Gaussian predictive distributions $f_j(x) \sim \mathcal{N}\big(\mu_j(x), \sigma_j^2(x)\big)$ for $j = 1, 2$.
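Let $\mathcal{P}$ denote the current set of non-dominated (Pareto optimal) points in the bi-objective space. We choose a reference point $r = (r_1, r_2)$ that is dominated by all feasible objective values. The dominated hypervolume (in this case, area) is defined as
$$\mathrm{HV}(\mathcal{P}) = \mathrm{Area}\left(\bigcup_{p \in \mathcal{P}} [r_1, p_1] \times [r_2, p_2]\right).$$
If a new point $f(x)$ is added, the hypervolume improvement is
$$\mathrm{HVI}(x) = \mathrm{HV}\big(\mathcal{P} \cup \{f(x)\}\big) - \mathrm{HV}(\mathcal{P}).$$
Since $f_1(x)$ and $f_2(x)$ are uncertain under the GP posteriors, we evaluate the Expected Hypervolume Improvement (EHVI):21
$$\mathrm{EHVI}(x) = \mathbb{E}\big[\mathrm{HVI}(x)\big],$$
where the expectation is taken over the independent GP posteriors of $f_1(x)$ and $f_2(x)$.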
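The quantities above can be made concrete with a small sketch for two maximized objectives. Note that the expectation here is estimated by Monte Carlo sampling, which is a simple stand-in for the analytical EHVI used in the study; the function names, the example front, and the reference point are all hypothetical.

```python
import random


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives maximized)."""
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: p[0], reverse=True)   # sweep from largest f1 downward
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > best_f2:                         # dominated points add no area
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


def hypervolume_improvement(new_point, front, ref):
    return hypervolume_2d(front + [new_point], ref) - hypervolume_2d(front, ref)


def expected_hvi_mc(mu, sigma, front, ref, n_samples=2000, seed=0):
    """Monte Carlo estimate of EHVI under independent Gaussian posteriors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sample = (rng.gauss(mu[0], sigma[0]), rng.gauss(mu[1], sigma[1]))
        total += hypervolume_improvement(sample, front, ref)
    return total / n_samples


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]   # made-up (utility, -cost) values
ref = (0.0, 0.0)
hv = hypervolume_2d(front, ref)
hvi = hypervolume_improvement((2.5, 2.5), front, ref)
ehvi = expected_hvi_mc(mu=(2.5, 2.5), sigma=(0.5, 0.5), front=front, ref=ref)
```

A candidate that is dominated by the current front yields zero improvement in every sample where it stays dominated, so EHVI naturally concentrates on candidates likely to extend the front.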
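For the two-objective case, the Pareto frontier can be written as an ordered sequence of points
$$p^{(1)}, p^{(2)}, \ldots, p^{(K)},$$
which partitions the non-dominated region into cells $A_k$ with corner coordinates
$$a_k = p_1^{(k)}, \qquad b_k = p_2^{(k-1)},$$
and boundary points
$$p^{(0)} = \big(r_1, p_2^{(1)}\big), \qquad p^{(K+1)} = \big(p_1^{(K)}, r_2\big).$$
A newly sampled point $f(x)$ contributes hypervolume improvement only if it falls inside one of these cells. In such a case, the improvement is exactly the area of the rectangle:
$$\mathrm{HVI}(x) = \big(f_1(x) - a_k\big)\big(b_k - f_2(x)\big), \qquad \text{for } f(x) \in A_k.$$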
In our preference-based bi-objective Bayesian optimization framework, the acquisition function (EHVI) is defined over individual candidate points, while the latent utility model requires pairwise comparisons for learning. Thus, at each iteration, we first identify the candidate that maximizes the analytical expected hypervolume improvement over the candidate composition set $\mathcal{X}$, selecting
$$x_{n+1} = \arg\max_{x \in \mathcal{X}} \mathrm{EHVI}(x).$$
Because the latent utility is inferred from pairwise feedback, this EHVI-selected challenger is not evaluated alone, but is compared against the current incumbent x★, defined as the design with the highest posterior mean utility under the PairwiseGP model.
The next experiment therefore consists of presenting the pair (x★, xn+1) to the user, who indicates which of the two is preferred. This comparison updates the latent utility posterior, while the cost of the challenger is evaluated directly. This incumbent–challenger mechanism allows EHVI to guide exploration in the multi-objective space, while remaining consistent with the structure of pairwise preference learning.
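The incumbent–challenger step can be sketched as a short selection routine. This is an illustrative assumption about the bookkeeping, not the study's code: here the challenger is restricted to designs not yet shown to the user and the incumbent to designs already shown, and the function names and numbers are hypothetical.

```python
def select_query(acquisition, posterior_mean, evaluated):
    """Return (incumbent, challenger) indices for the next pairwise query.

    ASSUMPTION: the challenger maximizes the acquisition value among designs
    not yet evaluated; the incumbent has the highest posterior mean utility
    among designs already evaluated.
    """
    seen = set(evaluated)
    unseen = [i for i in range(len(acquisition)) if i not in seen]
    challenger = max(unseen, key=lambda i: acquisition[i])
    incumbent = max(seen, key=lambda i: posterior_mean[i])
    return incumbent, challenger


def record_preference(comparisons, incumbent, challenger, challenger_wins):
    """Append the outcome as a (preferred, non-preferred) index pair."""
    pair = (challenger, incumbent) if challenger_wins else (incumbent, challenger)
    comparisons.append(pair)
    return comparisons


# Hypothetical values for four candidate alloys; alloys 0 and 2 were already shown.
acquisition = [0.1, 0.9, 0.5, 0.7]
posterior_mean = [0.2, 1.5, 0.8, -0.3]
incumbent, challenger = select_query(acquisition, posterior_mean, evaluated=[0, 2])
comparisons = record_preference([], incumbent, challenger, challenger_wins=True)
```

After each recorded comparison the utility model would be refit and the acquisition values recomputed before the next query.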
The electronic-structure origins of the dielectric response for individual elements (e.g., Pt or Al) are not analyzed explicitly here; instead, we treat the Thermo-Calc optical properties model as a ground-truth oracle that deterministically maps alloy composition to visible color for the purposes of preference-based optimization.
The equilibrium phases $\{\alpha_i\}_{i=1}^{N}$ present at a given alloy composition $x$ and their corresponding volume fractions $\{v_i\}_{i=1}^{N}$ were obtained from the Equilibrium Calculator. Here, $N$ denotes the number of thermodynamically stable phases at composition $x$, $\alpha_i$ labels the $i$-th equilibrium phase, and $v_i$ is its volume fraction, satisfying $\sum_{i=1}^{N} v_i = 1$. These phase identities and volume fractions constitute the phase-resolved inputs required for subsequent optical mixing calculations.
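For each phase $\alpha_i$, the complex dielectric function $\varepsilon_i(\lambda) = \varepsilon_{1,i}(\lambda) + i\varepsilon_{2,i}(\lambda)$ was taken from the Thermo-Calc noble-metal optical database as a function of wavelength $\lambda$.11,22 For multiphase alloys, these were combined using the generalized Bruggeman effective-medium model to obtain an effective dielectric function $\varepsilon_{\mathrm{eff}}(\lambda)$ satisfying
$$\sum_{i=1}^{N} v_i\, \frac{\varepsilon_i(\lambda) - \varepsilon_{\mathrm{eff}}(\lambda)}{\varepsilon_i(\lambda) + 2\,\varepsilon_{\mathrm{eff}}(\lambda)} = 0. \tag{1}$$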
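The effective complex refractive index $\tilde{n}(\lambda) = n(\lambda) + ik(\lambda)$ was computed from $\varepsilon_{\mathrm{eff}}(\lambda) = \varepsilon_{1,\mathrm{eff}}(\lambda) + i\varepsilon_{2,\mathrm{eff}}(\lambda)$ via
$$n(\lambda) = \sqrt{\frac{|\varepsilon_{\mathrm{eff}}(\lambda)| + \varepsilon_{1,\mathrm{eff}}(\lambda)}{2}}, \tag{2}$$
$$k(\lambda) = \sqrt{\frac{|\varepsilon_{\mathrm{eff}}(\lambda)| - \varepsilon_{1,\mathrm{eff}}(\lambda)}{2}}. \tag{3}$$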
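The relation between the dielectric function and the refractive index, $\tilde{n}^2 = \varepsilon$, can be checked numerically with a short sketch. Both routes below (the principal complex square root and the closed-form expressions for $n$ and $k$) are standard optics identities; the function names and the example value of $\varepsilon$ are illustrative assumptions, not data from the Thermo-Calc database.

```python
import cmath
import math


def refractive_index(eps):
    """Return (n, k) from a complex dielectric value using the principal root.

    For eps with a non-negative imaginary part, the principal square root
    lies in the first quadrant, so n >= 0 and k >= 0.
    """
    n_tilde = cmath.sqrt(eps)
    return n_tilde.real, n_tilde.imag


def refractive_index_closed_form(eps):
    """Same quantities from the closed-form expressions for n and k."""
    magnitude = abs(eps)
    n = math.sqrt((magnitude + eps.real) / 2.0)
    k = math.sqrt((magnitude - eps.real) / 2.0)
    return n, k


eps_example = complex(-3.0, 4.0)      # made-up metal-like dielectric value
n, k = refractive_index(eps_example)
n_cf, k_cf = refractive_index_closed_form(eps_example)
```

A negative real part of $\varepsilon$, typical of metals in the visible range, gives an extinction coefficient $k$ larger than $n$, which is what drives high reflectance.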
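Assuming light incident from air onto a semi-infinite alloy surface, the wavelength-dependent reflectance $R(\lambda)$ was calculated using the Fresnel equations for s- and p-polarized light at incidence angle $\theta$:
$$R_s(\lambda) = \left|\frac{\cos\theta - \tilde{n}_2\cos\theta_t}{\cos\theta + \tilde{n}_2\cos\theta_t}\right|^2, \tag{4}$$
$$R_p(\lambda) = \left|\frac{\tilde{n}_2\cos\theta - \cos\theta_t}{\tilde{n}_2\cos\theta + \cos\theta_t}\right|^2, \tag{5}$$
where $\tilde{n}_2$ is the complex refractive index of the alloy, the refractive index of air is taken as 1, and the transmission angle $\theta_t$ follows from Snell's law, $\sin\theta = \tilde{n}_2\sin\theta_t$. The unpolarized reflectance was taken as $R(\lambda) = \tfrac{1}{2}\big[R_s(\lambda) + R_p(\lambda)\big]$.24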
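A minimal sketch of the Fresnel calculation for an air/alloy interface is given below. It assumes an ambient refractive index of exactly 1 and uses the principal complex square root for the transmitted-angle cosine; the function name and the test indices are illustrative and are not values from the optical database.

```python
import cmath
import math


def fresnel_reflectance(n_tilde, theta_deg=0.0):
    """Return (R_s, R_p, R_unpolarized) for light incident from air (index 1)
    onto a semi-infinite medium with complex refractive index n_tilde."""
    theta = math.radians(theta_deg)
    cos_i = math.cos(theta)
    sin_t = math.sin(theta) / n_tilde            # Snell's law: sin(theta) = n sin(theta_t)
    cos_t = cmath.sqrt(1.0 - sin_t ** 2)
    r_s = (cos_i - n_tilde * cos_t) / (cos_i + n_tilde * cos_t)
    r_p = (n_tilde * cos_i - cos_t) / (n_tilde * cos_i + cos_t)
    R_s, R_p = abs(r_s) ** 2, abs(r_p) ** 2
    return R_s, R_p, 0.5 * (R_s + R_p)


# Sanity checks with textbook-style indices (not alloy data).
_, _, R_glass = fresnel_reflectance(1.5)               # transparent dielectric
_, _, R_absorbing = fresnel_reflectance(complex(1, 2)) # absorbing medium
brewster = math.degrees(math.atan(1.5))
_, R_p_brewster, _ = fresnel_reflectance(1.5, brewster)
```

At normal incidence the result reduces to $R = [(n-1)^2 + k^2]/[(n+1)^2 + k^2]$, so a large extinction coefficient directly raises the reflectance.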
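Visible color was obtained by integrating the reflectance spectrum over the visible range (400–700 nm) in the CIE XYZ framework:25
$$X = k\int_{400}^{700} R(\lambda)\,S(\lambda)\,\bar{x}(\lambda)\,d\lambda, \tag{6}$$
$$Y = k\int_{400}^{700} R(\lambda)\,S(\lambda)\,\bar{y}(\lambda)\,d\lambda, \tag{7}$$
$$Z = k\int_{400}^{700} R(\lambda)\,S(\lambda)\,\bar{z}(\lambda)\,d\lambda, \tag{8}$$
where $S(\lambda)$ is the spectral power distribution of the illuminant, $\bar{x}$, $\bar{y}$, $\bar{z}$ are the CIE color-matching functions for the selected standard observer (2° or 10°), and $k$ is a normalization constant such that $Y = 100$ for a perfectly reflecting surface.25 The resulting tristimulus vector $(X, Y, Z)$ was then converted to the reported color space (CIE L*a*b* or sRGB) using standard CIE transforms.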
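The color pipeline can be sketched in two steps: a discrete version of the tristimulus integrals on a uniform wavelength grid, followed by a conversion to 8-bit sRGB. The conversion below uses the widely published D65 sRGB matrix and transfer function; whether the study used this exact transform or illuminant is not stated here, so treat that as an assumption. The caller must supply tabulated illuminant and color-matching data, which are not reproduced here.

```python
def tristimulus(reflectance, illuminant, xbar, ybar, zbar):
    """Discrete tristimulus values on a uniform wavelength grid.

    All arguments are equal-length sequences sampled at the same wavelengths;
    the grid spacing cancels in the normalization constant k.
    """
    k = 100.0 / sum(s * y for s, y in zip(illuminant, ybar))
    X = k * sum(r * s * c for r, s, c in zip(reflectance, illuminant, xbar))
    Y = k * sum(r * s * c for r, s, c in zip(reflectance, illuminant, ybar))
    Z = k * sum(r * s * c for r, s, c in zip(reflectance, illuminant, zbar))
    return X, Y, Z


def xyz_to_srgb(X, Y, Z):
    """Convert D65-referenced XYZ (Y = 100 for white) to 8-bit sRGB.

    ASSUMPTION: standard sRGB matrix and gamma encoding with simple clipping.
    """
    x, y, z = X / 100.0, Y / 100.0, Z / 100.0
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def encode(c):
        c = min(max(c, 0.0), 1.0)                      # clip out-of-gamut values
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055

    return tuple(round(255 * encode(c)) for c in linear)


white = xyz_to_srgb(95.047, 100.0, 108.883)   # D65 white point
```

Clipping is the crudest form of gamut handling; a production renderer might prefer a perceptual gamut-mapping step instead.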
The alloy design space, $\mathcal{X}$, was defined as a quinary composition manifold spanned by {Au, Ag, Cu, Pt, Al}, with five composition variables constrained to sum to 100 at%, and sampled on a uniform grid in 5 at% increments. Only quinary alloys were considered; that is, each candidate composition contained at least 5 at% of every element, which substantially increases the combinatorial complexity of the color-composition mapping and provides a stringent test of the Thermo-Calc noble-metal optical database. Enumeration of this constrained 5 at% grid yielded 3876 distinct compositions. For each $x \in \mathcal{X}$, the Thermo-Calc optical properties-noble model was used to compute equilibrium phase fractions and wavelength-resolved optical properties.11 Due to database coverage limitations, complete dielectric and optical solutions were successfully obtained for 933 of the 3876 quinary compositions, as shown in Fig. 1. These 933 alloys constitute the design space that is explored by the optimization in search of the optimal color.
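The size of the constrained grid can be reproduced by direct enumeration. The sketch below is an independent check rather than the study's code, and the function name is hypothetical; the count also follows from the stars-and-bars identity $\binom{19}{4} = 3876$ for distributing 20 units of 5 at% among five elements with at least one unit each.

```python
from itertools import product

ELEMENTS = ("Au", "Ag", "Cu", "Pt", "Al")


def enumerate_quinary_grid(step=5, total=100, minimum=5):
    """List all compositions (in at%) on a uniform grid in which every
    element is present at `minimum` at% or more and the total is `total`."""
    levels = range(minimum, total + 1, step)
    grid = []
    for combo in product(levels, repeat=len(ELEMENTS) - 1):
        last = total - sum(combo)          # the final element closes the balance
        if last >= minimum:
            grid.append(combo + (last,))
    return grid


grid = enumerate_quinary_grid()
n_compositions = len(grid)
```

Each tuple is ordered as in `ELEMENTS`, so `dict(zip(ELEMENTS, grid[0]))` gives a labeled composition ready to pass to a forward model.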
At iteration $t$, the user is presented with two candidate rings having compositions $x_t^{(a)}$ and $x_t^{(b)}$. For each candidate, the corresponding color is predicted by the Thermo-Calc optical model and rendered onto a ring visualization to facilitate realistic comparison. The user provides a binary preference indicating which of the two rings is preferred.
At each iteration, we select the next candidate by maximizing the acquisition function,
$$x_{t+1} = \arg\max_{x \in \mathcal{X}} \alpha_{\mathrm{EI}}(x),$$
where $\mathcal{X}$ is the candidate composition set. The newly selected composition is paired with a reference candidate (e.g., the current incumbent best) to form a two-ring query shown to the user; the resulting preference is added to the set of observed comparisons, the Pairwise GP posterior is updated, and the process repeats. In this way, user feedback on rendered ring colors is converted into pairwise constraints on $f(x)$, allowing the model to efficiently converge to compositions that best match the user's aesthetic preference.
Starting from an initial set of candidate rings, the campaign proceeded by iteratively proposing two color-rendered options, eliciting a preference, and updating the pairwise Gaussian process posterior over latent utility. Over 25 iterations, the model progressively concentrated sampling near compositions predicted to yield higher gold-likeness, producing a clear trajectory toward the gold-dominant region of the design space, as seen in Fig. 2. To complement this visualization, the SI reports the optimization quantitatively, showing the R, G, and B channel values of sampled alloys moving toward those of the target gold color over successive iterations. This convergence demonstrates that preference-based Bayesian optimization can efficiently infer and exploit a user's latent aesthetic utility using only pairwise feedback.8,9
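A simulated user with a preference for gold-like colors can be emulated by a simple distance rule. The sketch below is an illustrative assumption and not the user model of the study: it prefers the candidate whose sRGB color is closer, in squared Euclidean distance, to a target taken here as the CSS named color "gold", (255, 215, 0). The candidate colors are made up.

```python
TARGET_GOLD = (255, 215, 0)   # CSS named color "gold"; an assumed target


def squared_distance(rgb, target):
    return sum((c - t) ** 2 for c, t in zip(rgb, target))


def simulated_preference(rgb_a, rgb_b, target=TARGET_GOLD):
    """Return 0 if candidate a is preferred and 1 if candidate b is preferred.

    ASSUMPTION: the user always picks the color closer to the target; ties go to a.
    """
    return 0 if squared_distance(rgb_a, target) <= squared_distance(rgb_b, target) else 1


yellowish = (230, 200, 60)     # hypothetical rendered alloy color
silvery = (192, 192, 192)      # CSS named color "silver"
choice = simulated_preference(yellowish, silvery)
```

A noisier user could be emulated by passing the negative distances through a probit choice rule instead of taking the deterministic minimum.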
These results indicate that the framework converges, as queried colors become increasingly gold-like over successive iterations. To assess the robustness of this behavior, we performed additional runs under varied hyperparameter and initialization settings; the resulting convergence trends were consistent and are reported in the SI. We further evaluated robustness to surrogate-model choices by repeating the optimization with alternative Gaussian-process kernels and a random sampling baseline, with results provided in the SI.
In this case study, we demonstrate a bi-objective optimization over ring color and price. As before, we model the color objective with a pairwise GP, while price is modeled with a standard GP regressor. Because price is relatively easy to predict from rule-of-mixtures estimates of elemental costs, we train the price GP once using all candidate alloys upfront; only the preference GP for color is updated iteratively as new comparisons are collected. At each iteration, the EHVI acquisition function combines predictions from both GPs to propose new alloys that are expected to improve the current color-price Pareto front, i.e., alloys whose colors the user is likely to prefer at an attractive price point. As before, we initialize the optimization scheme with two initial data points (a pairwise GP requires at least two points to form an initial preference). We then run the optimization campaign for 25 iterations.
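A rule-of-mixtures price estimate can be sketched as a mass-weighted average of elemental prices. Two assumptions are made here and should be read as such: compositions in at% are converted to mass fractions using standard atomic masses, and the elemental prices are placeholders (the Au and Pt figures echo the approximate values quoted earlier in the text, while the Ag, Cu, and Al figures are purely illustrative). The study's actual cost model may differ.

```python
ATOMIC_MASS = {"Au": 196.967, "Ag": 107.868, "Cu": 63.546, "Pt": 195.084, "Al": 26.982}

# PLACEHOLDER prices in USD per gram, for illustration only.
EXAMPLE_PRICES = {"Au": 130.0, "Pt": 50.0, "Ag": 1.0, "Cu": 0.01, "Al": 0.003}


def price_per_gram(at_pct, prices=EXAMPLE_PRICES):
    """Mass-weighted rule-of-mixtures price for a composition given in at%."""
    mass = {el: frac * ATOMIC_MASS[el] for el, frac in at_pct.items()}
    total_mass = sum(mass.values())
    return sum(mass[el] / total_mass * prices[el] for el in at_pct)


pure_gold = price_per_gram({"Au": 100})
au_al = price_per_gram({"Au": 50, "Al": 50})
quinary = price_per_gram({"Au": 20, "Ag": 20, "Cu": 20, "Pt": 20, "Al": 20})
```

Because gold atoms are much heavier than aluminum atoms, a 50/50 at% Au–Al alloy is mostly gold by mass, so its price per gram stays close to that of gold rather than halfway between the two elements.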
To analyze the results of this optimization campaign, we visualize the preference-rank-versus-price Pareto front, as shown in Fig. 3. As expected, more metallic silver-like colors tend to have lower costs, whereas more gold-like colors (i.e., those closer to the preferred target color) are generally more expensive. A key advantage of this approach is that it presents users with a menu of Pareto-efficient options: they can select alloys along the color-price front according to their own willingness to pay for a given color when choosing candidates for further scale-up.20
Fig. 3 Pareto front of preference rank versus price per gram, showing the trade-off between a dummy user's learned preferred ring color and the cost of the ring.
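Extracting the menu of Pareto-efficient options from a set of evaluated alloys amounts to a non-dominated filter. The sketch below assumes that a higher preference score is better and a lower price is better; the alloy labels, scores, and prices are made up for illustration.

```python
def pareto_efficient(options):
    """Return the non-dominated options, sorted by price.

    Each option is (label, preference_score, price); the score is maximized
    and the price is minimized.
    """
    front = []
    for label, score, price in options:
        dominated = any(
            (s2 >= score and p2 <= price) and (s2 > score or p2 < price)
            for _, s2, p2 in options
        )
        if not dominated:
            front.append((label, score, price))
    return sorted(front, key=lambda option: option[2])


# Hypothetical candidates: (label, preference score, price per gram).
candidates = [
    ("A", 0.9, 120.0),
    ("B", 0.7, 60.0),
    ("C", 0.6, 80.0),    # dominated by B: lower score at a higher price
    ("D", 0.2, 10.0),
]
menu = pareto_efficient(candidates)
```

Moving along the returned list from cheapest to most expensive, the preference score never decreases, which is what lets a user read the front as a price list for color quality.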
On top of this forward model, we implemented PBO to infer a latent utility function over alloy composition from pairwise comparisons. In a simulated-user setting, the framework efficiently retrieved target colors within a limited number of queries, illustrating that pairwise feedback alone is sufficient to drive the search toward compositions that match an individual aesthetic preference. Extending this approach to a bi-objective setting, we incorporated a cost model and used expected hypervolume improvement to construct a color-cost Pareto front. This front provides users or designers with a transparent set of trade-off options between visual appeal and material expense.
Although the present study relies on in silico data and an idealized user model, the results highlight a practical path toward interactive, data-driven alloy design for jewelry. Future work will focus on experimental validation of selected compositions, integrating real user studies, and developing a more accessible graphical interface that connects preference learning, Thermo-Calc simulations, and visualization of color-cost trade-offs. Taken together, these developments point toward a customizable ring design workflow in which users can explore personalized aesthetic choices within physically realistic and economically informed alloy spaces.
The code supporting this article can be accessed at: https://doi.org/10.5281/zenodo.18164808.
Optical color outputs shown in this article were generated using Thermo-Calc with a licensed database that cannot be redistributed publicly. To enable transparency and reuse, the workflow reports and visualizes obfuscated output data, included in the repository. Reproducing the full workflow requires access to Thermo-Calc.
Supplementary information (SI): additional figures, tables, computational details, model parameters, and supporting data related to the preference-based Bayesian optimization workflow and alloy color optimization results. See DOI: https://doi.org/10.1039/d6dd00009f.
This journal is © The Royal Society of Chemistry 2026