Open Access Article
Danila Shiryaev (a), Emil I. Jaffal (b, c), Sangjoon Lee (d), Balaranjan Selvaratnam (c) and Anton O. Oliynyk* (b, c)
(a) PhD-Track Program in Advanced Materials, Institut Polytechnique de Paris, Palaiseau 91120, France
(b) Ph.D. Program in Chemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA. E-mail: anton.oliynyk@hunter.cuny.edu
(c) Department of Chemistry, Hunter College, City University of New York, New York, NY 10065, USA
(d) Department of Materials Science and Engineering, Stanford University, Stanford, CA 94305, USA
First published on 25th May 2026
Machine learning (ML) has become a central component of data-driven materials discovery, yet its practical impact depends heavily on how its predictions are translated into experimentally realizable outcomes. In this review, we examine ML-guided crystal structure discovery through the lens of recommendations as well as unconstrained generation, emphasizing interpretable workflows that embed chemical intuition, physical constraints, and experimental validation. Surveying standalone ML, hybrid ML–DFT, and machine-learned interatomic potential (MLIP) approaches, we highlight how constrained design spaces, data preprocessing, and validation strategies shape the success of novel discoveries. Drawing on our own experimentally validated case studies, ranging from supervised to unsupervised learning, as well as recommendation-type explorations, we outline the shift towards interpretable and explainable ML models that guide synthetic decisions, reveal trends that were previously difficult to identify, confirm established patterns, and uncover new ones. Collectively, these results show that interpretable ML is more effective when deployed within experimental workflows, where it bridges learning and chemistry and enables a reliable discovery pathway to solid-state materials.
Data-driven materials discovery in the solid state is commonly organized around two complementary paradigms: forward prediction5,6 and inverse design.7,8 This distinction reflects how machine learning is deployed within a discovery workflow rather than differences in algorithmic form. Forward approaches use machine learning (ML) to approximate mappings from composition or structure to properties, while inverse approaches frame discovery as a decision-making problem aimed at identifying candidates that meet predefined targets under physical and experimental constraints.9,10 These two paradigms and their points of interaction across structure, property, and performance objectives are schematically summarized in Fig. 1.
In the forward paradigm, supervised ML models are trained on large, static datasets, most often derived from density functional theory (DFT), to predict properties such as formation energy,11,12 elastic moduli,13–15 transport coefficients,16–20 or magnetic ordering temperatures.21–23 Given the consistency of the data (e.g., DFT), these reliable models enable rapid screening and ranking within predefined chemical or structural spaces and are widely used as surrogates for first principles calculations in high-throughput studies.24,25 Forward prediction is therefore most effective when the hypothesis space is well-defined; however, commonly, the validation remains computational rather than experimental.26–30
Inverse design, as the name suggests, inverts this logic by starting from a desired property or performance objective and iteratively narrowing the search space through ranked recommendations, which should enable faster material targeting. In this setting, ML models are embedded within optimization or feedback loops, such as Bayesian optimization, active learning, or evolutionary search, where their primary role is to prioritize candidates under uncertainty and limited validation budgets.10,31,32 When the exploration space is large, uncertainty-driven exploration can mitigate biases inherent to supervised learning workflows. These include dataset selection bias, the overrepresentation of well-studied chemistries, and the exclusion of negative or failed experimental outcomes, which collectively restrict exploration to historically sampled regions of materials space.33 The success of inverse workflows is evaluated not by predictive accuracy alone, but by their ability to convert ranked candidates into experimentally confirmed structures or material properties.34,35
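The prioritization step described above can be sketched as an upper-confidence-bound (UCB) ranking, one common acquisition rule in Bayesian optimization. This is a minimal illustration rather than the procedure of any cited study: the candidate labels, predicted means, uncertainties, and the exploration weight `kappa` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str    # hypothetical composition label
    mean: float  # surrogate-model prediction of the target property
    std: float   # surrogate-model uncertainty (one standard deviation)


def ucb_score(candidate, kappa):
    """Upper confidence bound: rewards high predictions and high uncertainty."""
    return candidate.mean + kappa * candidate.std


def select_batch(pool, budget, kappa=1.0):
    """Return the `budget` candidates with the highest UCB score."""
    ranked = sorted(pool, key=lambda c: ucb_score(c, kappa), reverse=True)
    return ranked[:budget]


# Hypothetical candidate pool with model predictions and uncertainties.
pool = [
    Candidate("A", mean=0.80, std=0.05),
    Candidate("B", mean=0.60, std=0.40),
    Candidate("C", mean=0.75, std=0.10),
    Candidate("D", mean=0.30, std=0.20),
]

# kappa = 0 ranks purely by predicted value (exploitation only).
exploit = [c.name for c in select_batch(pool, budget=2, kappa=0.0)]
# A larger kappa promotes uncertain candidates (more exploration).
explore = [c.name for c in select_batch(pool, budget=2, kappa=2.0)]
```

With a limited validation budget, the choice of `kappa` controls how strongly the selection moves away from well-sampled, confidently predicted regions toward poorly sampled ones.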
While forward prediction and inverse design provide the two principal operational modes of machine-learning-driven discovery, several recent studies have proposed a complementary paradigm referred to as physics-informed machine learning (PIML).36,37 In this framework, machine learning models are explicitly constrained by physical knowledge, which may include governing equations,38 as well as structural,39 compositional,40 chemical,41 and physical42 constraints that define feasible regions of materials space. Rather than relying solely on correlations learned from data, physics-informed approaches incorporate domain knowledge directly into model architectures, descriptors, or training objectives to guide learning toward physically admissible solutions. Such constraints can take many forms, ranging from conservation laws and differential equations to empirical chemical rules or stability criteria derived from materials thermodynamics. The common objective of these approaches is to improve generalization and reduce the risk of discovering candidates that are mathematically plausible yet physically unrealistic. In addition to academic implementations, physics-informed and ML-integrated materials modeling frameworks are increasingly available through commercial platforms, such as Matlantis,43 QuantumATK,44 Schrödinger Materials Science Suite45 and Citrine Platform,46 which integrate machine learning with first-principles or physics-based simulations to enable scalable and experimentally-relevant materials discovery workflows.
In the context of materials discovery, this integration of physical knowledge is particularly important because the accessible design space is strongly restricted by factors that are difficult to capture purely from data.47 Structural stability, phase competition, chemical compatibility, and synthesis pathways impose constraints that may not be explicitly represented in available datasets but nevertheless determine whether a predicted material can exist and be experimentally realized. In practice, inverse design strategies often rely on predefined candidate spaces derived from materials databases, enumerated substitutions, or structure prototypes, which constrain exploration to chemically plausible regions of materials space.48,49 Other approaches attempt to learn the underlying distribution of materials and sample new compositions or structures from continuous latent representations.32,34,50 Regardless of the specific strategy, the feasible design space of solid-state materials remains strongly limited by phenomena such as polymorphism,51–53 metastability,54 and synthesis pathways,55 which ultimately determine whether a predicted compound can be experimentally realized.
For this reason, many recent materials discovery workflows increasingly combine machine-learning models with physical knowledge to guide exploration toward experimentally meaningful regions of materials space.7,56 In such approaches, physical constraints act as an additional source of inductive bias that complements both forward prediction and inverse design strategies. While forward models enable scalable property prediction and inverse workflows support targeted search within large chemical spaces, physics-informed approaches help ensure that the resulting candidates remain consistent with known physical and chemical principles. Together, these strategies form a complementary toolkit for translating machine-learning predictions into experimentally realizable materials discoveries.
Large language models (LLMs) represent an increasingly prominent yet distinct branch of AI-driven materials discovery,57 with applications spanning automated literature mining and dataset construction,58 autonomous chemical research and reaction planning,59 text-guided crystal structure generation,60 multimodal property prediction from atomic structure and natural language,61 and closed-loop inverse design workflows for functional materials such as perovskites.62 While these approaches demonstrate considerable promise in accelerating data aggregation and hypothesis generation, they operate through fundamentally different mechanisms than the constraint-driven, experimentally validated workflows that form the focus of this review, and are therefore not examined in detail here.
In this review, we demonstrate that the most successful machine-learning strategies for materials discovery are not those that generate unconstrained candidate structures, but those that operate within chemically and physically constrained regions of materials space and provide interpretable recommendations that guide experimental exploration. To illustrate this perspective, we examine different classes of recent machine-learning approaches, from data-driven predictive models to hybrid ML–DFT workflows and machine-learned interatomic potentials, and analyze how these frameworks incorporate physical knowledge, handle experimental constraints, and ultimately enable experimentally validated materials discovery. We define successful ML-driven materials discovery as experimentally validated outcomes, including (i) successful synthesis of the predicted phase, (ii) agreement between predicted and measured properties when applicable, and (iii) reproducibility of the reported synthesis pathway, which is frequently not demonstrated in existing studies. These criteria distinguish computational predictions from experimentally actionable discoveries.
In contrast, training data in modern machine-learning workflows are predominantly based on high-throughput DFT repositories,68,69 with the Open Quantum Materials Database (OQMD)70 and Materials Project71 as widely used examples. These resources provide formation energies and relaxed crystal structures at scale, which makes them practical foundations for stability screening and for training structure-based models. In parallel, expert-curated approaches such as ME-AI72 demonstrate that interpretable descriptor discovery can be achieved using small, measurement-based datasets, highlighting a complementary data paradigm to large DFT-dominated repositories.
Interoperability across repositories has become critical as workflows increasingly combine prediction, screening, and validation. OPTIMADE73 provides a standardized application programming interface for querying structures and metadata across independent databases. AFLOW74 provides complementary infrastructure for symmetry normalization, prototype assignment, and standardized descriptor generation. Together, these tools reduce ambiguity during data aggregation and enable reproducible cross-database screening.
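As an illustration of what a standardized query looks like, the sketch below assembles an OPTIMADE-style request URL for binary Fe–P structures without sending it. The base URL is a placeholder and the helper name `optimade_structures_url` is hypothetical; the filter uses the `elements HAS ALL` and `nelements` fields of the OPTIMADE filter language, and each provider documents its own base URL.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def optimade_structures_url(base_url, elements, nelements, page_limit=10):
    """Build a /v1/structures query URL with an OPTIMADE filter expression."""
    element_list = ", ".join(f'"{symbol}"' for symbol in elements)
    filter_expr = f"elements HAS ALL {element_list} AND nelements={nelements}"
    # Percent-encode the filter so quotes and spaces survive in the URL.
    query = urlencode(
        {"filter": filter_expr, "page_limit": page_limit}, quote_via=quote
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Placeholder base URL (an assumption); substitute a real provider endpoint.
url = optimade_structures_url("https://example.org/optimade", ["Fe", "P"], nelements=2)

# Decode the query string again to confirm the filter round-trips unchanged.
recovered_filter = parse_qs(urlsplit(url).query)["filter"][0]
```

Because the same filter string can in principle be sent to any compliant provider, the aggregation logic does not need to be rewritten for each database.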
Pre-processing choices strongly influence which physical distinctions a model can learn, and which biases it inherits. Structural standardization, symmetry reduction, and removal of duplicate or near-duplicate entries are essential when repositories contain many closely related variants of the same prototype family. Several discovery workflows therefore enforce explicit space group and Wyckoff constraints during structure handling to align candidate evaluation with physically meaningful degrees of freedom.75 A recurring limitation of large DFT repositories is the disconnect between zero-Kelvin stability labels and experimental realizability. Many datasets emphasize formation energy and convex hull stability, while information on metastability, kinetics, and processing-dependent phase selection is sparse. This gap has motivated synthesis-aware strategies that move beyond the static stability screening based solely on zero-Kelvin thermodynamic criteria, including autonomous experimentation35 and temperature-dependent phase stability prediction.76
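Removal of duplicate entries can be sketched as grouping by a coarse key. The example below is a minimal sketch, assuming each entry already carries a space-group number and an energy per atom. The field names, the example entries, and the rule of keeping the lowest-energy representative are illustrative assumptions, and production workflows typically rely on full structure matching rather than this simple key.

```python
from functools import reduce
from math import gcd


def reduced_formula(composition):
    """Reduce integer element counts by their GCD, e.g. Fe4P2 -> Fe2P."""
    divisor = reduce(gcd, composition.values())
    parts = []
    for element in sorted(composition):  # alphabetical order, a simplification
        count = composition[element] // divisor
        parts.append(element if count == 1 else f"{element}{count}")
    return "".join(parts)


def deduplicate(entries):
    """Keep one entry per (reduced formula, space group): the lowest in energy."""
    best = {}
    for entry in entries:
        key = (reduced_formula(entry["composition"]), entry["spacegroup"])
        current = best.get(key)
        if current is None or entry["energy_per_atom"] < current["energy_per_atom"]:
            best[key] = entry
    return list(best.values())


# Hypothetical repository entries; energies are invented for illustration.
entries = [
    {"id": "e1", "composition": {"Fe": 2, "P": 1}, "spacegroup": 189, "energy_per_atom": -0.50},
    {"id": "e2", "composition": {"Fe": 4, "P": 2}, "spacegroup": 189, "energy_per_atom": -0.52},
    {"id": "e3", "composition": {"Fe": 2, "P": 1}, "spacegroup": 62, "energy_per_atom": -0.48},
]

unique_ids = [entry["id"] for entry in deduplicate(entries)]
```

Entries `e1` and `e2` share a reduced formula and space group, so only the lower-energy `e2` survives, while `e3` is retained as a distinct polymorph.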
We note that constrained ML workflows often outperform unconstrained generation, particularly when they are built to obey general chemical physics. Recent work by Amazon's Science team examines in depth the enforcement of elementary physics-based laws in deep-learning methods at the industrial level. One case77 enforces physics through boundary conditions, exhibiting “a 20-fold performance improvement over previous operator models.” In another case78 dealing with mass conservation, they found that their constrained model “outperforms other ML-based approaches that do not guarantee volume conservation” and had the “smallest relative L2 errors across various values of viscosity ν, by enforcing the exact boundary values.”
A prior academic ML-guided discovery study79 demonstrated that an ML-guided search for Li superionic conductors was 2.7 times more likely to identify fast ion conductors than random exploration (unconstrained generation), and achieved at least a 44-fold improvement in the log-average room-temperature ionic conductivity of selected candidates. The model additionally achieved an F1 score roughly 3.5 times greater than random selection, and substantially outperformed human expert screening in both predictive accuracy and speed. These are only a few of the results that illustrate the benefit of constraining workflows: chemically informed strategies maintain scientific rigor and improve efficiency while still driving exploration across broad and diverse materials spaces.
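Comparisons of this kind rest on a few standard quantities. The sketch below computes precision, recall, F1, and the enrichment of a screen over random selection from a confusion matrix. The counts are hypothetical and are not taken from the cited study; they only show how a statement such as "N times more likely than random" can be derived.

```python
def screening_metrics(tp, fp, fn, tn):
    """Summarize a binary screen from its confusion-matrix counts."""
    precision = tp / (tp + fp)   # fraction of selected candidates that are hits
    recall = tp / (tp + fn)      # fraction of all hits that were selected
    f1 = 2 * precision * recall / (precision + recall)
    base_rate = (tp + fn) / (tp + fp + fn + tn)  # hit rate of random selection
    enrichment = precision / base_rate           # improvement over random
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "base_rate": base_rate,
        "enrichment": enrichment,
    }


# Hypothetical screen: 100 candidates, 10 true hits, 20 selected by the model.
metrics = screening_metrics(tp=8, fp=12, fn=2, tn=78)
```

In this invented case the model selects hits at four times the base rate, which is the sense in which a guided search is "more likely" to succeed than random exploration.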
In tandem, data sources and preprocessing choices define the effective design space explored by ML models. In experimentally-oriented discovery, data handling functions as an active decision layer that shapes candidate ranking and validation, rather than as a neutral technical step.80–82
One common class of ML approaches focuses on property prediction from composition or structure using supervised learning. Structure-aware neural networks such as crystal graph convolutional neural network (CGCNN) variants predict formation energies, elastic properties, or transport-related quantities directly from relaxed crystal structures, enabling rapid ranking of known or hypothetical materials within fixed design spaces.83,84 Composition-based models are also widely used, particularly when structural information is unavailable, and have demonstrated success in narrowly defined application targets such as magnetocaloric screening and thermoelectric property mapping.85,86
Another class of approaches applies generative or sequence-based models to propose new crystal structures. Previous work demonstrated that deep generative models can reproduce statistical features of crystallographic data without explicit physical constraints.34 More recent autoregressive and diffusion-based models87–89 extend this idea by generating full crystal structures conditioned on symmetry or composition, but validation typically relies on downstream stability filtering rather than intrinsic guarantees of synthesizability. As a result, these models function more reliably as proposal engines than as standalone discovery tools.
Recommendation-focused ML approaches avoid unconstrained generation by ranking candidates drawn from predefined and physically meaningful spaces. A representative example of such a constrained, recommendation-driven workflow is shown in Fig. 3, which summarizes results from the study of Fe2P-type magnetocaloric compounds93–95 where supervised learning is embedded within an iterative pipeline linking literature-derived data, compositional screening, and experimental validation. In this work, the authors constructed a curated experimental dataset comprising 603 samples collected from published studies and new measurements, of which 558 remained after filtering and deduplication. The feature space was intentionally restricted to chemical composition and annealing conditions, reflecting the most consistently reported experimental descriptors. A feed-forward neural network with a single hidden layer of 256 neurons was trained to predict the magnetic transition temperature; the resulting agreement between predicted and experimental values across the full dataset is shown in Fig. 3a, with the model achieving a mean absolute error of approximately 20 K and R² = 0.89 on held-out data. The trained model was subsequently used to map low-temperature regions of the Mn–Fe–P–Si compositional space, with Fig. 3b presenting the predicted transition temperature landscape for the base Mn–Fe–P–Si system together with experimental compositions overlaid for direct validation. Fig. 3c presents the predicted transition temperature landscape for Co-substituted compositions; however, experimental transition temperatures are not overlaid in this panel, despite the fact that selected compositions were synthesized and characterized in subsequent analysis.
Rather than generating unconstrained candidates, the workflow therefore operates as a recommendation engine over a physically admissible design space, with model performance assessed by its ability to identify experimentally realizable compositions.
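The two accuracy measures quoted for the transition-temperature model, the mean absolute error and the coefficient of determination, can be computed as in the sketch below. The temperatures are invented for illustration and are unrelated to the cited dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out transition temperatures in kelvin.
measured = [250.0, 300.0, 350.0, 400.0]
predicted = [260.0, 290.0, 370.0, 380.0]

mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
```

The mean absolute error is expressed in the units of the target (here kelvin), whereas the coefficient of determination is dimensionless and depends on the spread of the held-out data, so the two are best reported together.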
Fig. 3 Machine-learning prediction of transition temperatures in Fe2P-type magnetocaloric compounds. (a) Comparison between experimentally measured transition temperatures and values predicted by the trained neural-network model, illustrating predictive accuracy, (b) predicted transition-temperature map for the MnxFe2−xSiγP1−γ compositional space under fixed annealing conditions, with experimental samples overlaid for validation and (c) corresponding prediction map for Co-substituted compositions, showing the expansion of the low-temperature region upon partial Fe–Co substitution. Adapted with permission from ref. 93. Copyright 2022 Elsevier.
Evidence-based recommenders and matrix factorization methods have been applied to alloy discovery where labelled data are sparse, enabling prioritization of compositions likely to form targeted phases and supporting direct experimental validation.96–98 Similar logic underpins unsupervised and similarity-based workflows that progressively narrow candidate sets based on proximity to known functional materials, leading to experimentally confirmed discoveries without explicit property regression.99
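The similarity-based narrowing mentioned above can be sketched as ranking candidates by their closeness to known functional materials in a descriptor space. The descriptor vectors, labels, and the choice of cosine similarity are assumptions made for illustration and do not reproduce any cited workflow.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(candidates, references):
    """Score each candidate by its best match to any reference material."""
    scored = []
    for name, vector in candidates.items():
        best = max(cosine_similarity(vector, ref) for ref in references.values())
        scored.append((name, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-component descriptors.
references = {"known_functional": [1.0, 0.0, 1.0]}
candidates = {
    "X": [1.0, 0.1, 0.9],
    "Y": [0.0, 1.0, 0.0],
    "Z": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(candidates, references)
shortlist = [name for name, _ in ranking[:2]]
```

No property value is regressed here: candidates are retained only because they resemble materials already known to perform well, which is why such workflows can operate with very few labels.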
Across the literature, ML-only approaches succeed when the prediction target is conservative and closely tied to available data, such as phase existence or relative ranking within a restricted family. Failures are most common when models extrapolate beyond the structural or chemical support of the training set or when generated candidates lack an explicit pathway to synthesis.100–103 These limitations motivate the integration of first-principles information or explicit physical constraints, which is addressed in the following section.
Table 1 highlights recent cases where ML-driven ranking directly informed experimental synthesis and characterization, with a particular focus on alloys and intermetallic systems. The most common application areas covered here are structural performance, energy-related properties, and especially magnetism (e.g., type of magnetism and transition temperature), as the key metrics can often be represented by a single scalar value, which is straightforward to model.
| Material class | Application | Predicted properties | Initial training dataset | ML/AI algorithm | Constraints | # Candidates evaluated | Year | Ref. |
|---|---|---|---|---|---|---|---|---|
| Co-based superalloys | Structural | γ′ solvus temperature, γ′ fraction | Literature (417 entries) | SVM, RF, GBoost, XGB, KNN | Thermodynamic | 6 experiments | 2020 | 106 |
| High-entropy alloys (HEA) | Magnetism | Phase formation probability | 4 HEA datasets (∼2000 entries) | Evidence-based recommender | None | 1 experiment | 2021 | 76 |
| Transition-metal borides/carbides | Mechanical | Formation energy, elastic moduli, stability | Materials Project (12 277 entries) | MEGNet GNN + Bayesian optimization | Structural, thermodynamic | 2 experiments | 2021 | 34 |
| MnZnSb intermetallics | Magnetism | Curie temperature | ICSD, PCD (∼200 000 entries) | RF Regressor | Structural | 4 experiments | 2021 | 107 |
| FCC high-entropy alloys (Ni-based) | Mechanical | Precipitate fraction, morphology | Ni-based superalloy database (not specified) | Artificial NN | Thermodynamic, kinetic | 1 experiment | 2021 | 108 |
| Fe–Co–B intermetallics | Magnetism | Formation energy, magnetic anisotropy | Materials Project (28 046 entries) | CGCNN (1G/2G) + AGA | Structural, thermodynamic | 1 experiment | 2022 | 75 |
| High-entropy alloys | Mechanical | Hardness | Literature (370 entries) | SVM + SHAP | Chemical | 3 experiments | 2022 | 105 |
| Half-Heusler materials | Energy | Thermoelectric suitability | Materials Project (456 entries) | Unsupervised clustering | Physical | 2 experiments | 2022 | 10 |
| Fe2P-type intermetallics | Magnetism | Transition temperature | Literature (603 entries) | GB/SVRs, artificial NNs | Structural, physical | 8 experiments | 2022 | 93 |
| SnSe materials | Energy | Thermal conductivity | In-house dataset (776 entries) | XGBoost, RF, SVR, k-NN | Chemical | 7 experiments | 2023 | 109 |
| Sn–Ag–Cu–Bi–In–Ti alloys | Mechanical | Strength, ductility | Experimental data + active learning (27 entries) | Active learning with GPR, Bayesian optimization (UCB) | Thermodynamic, physical | 1 experiment | 2024 | 110 |
| B2 multi-principal intermetallics | Structural | Phase stability, mechanical performance | In-house dataset (1251 entries) | CVAE + physics-informed ANN | Thermodynamic, physical | 3 experiments | 2025 | 111 |
| Laves phases (AB2) | Magnetism | Curie temperature, phase stability | In-house dataset (2060 entries) | RF, GB, NN | Thermodynamic, chemical | 9 experiments | 2025 | 94 |
| Novel intermetallics | Structural | Structure-type, latent variables | AB3 CIFs, PCD (2366 entries) | PLS-DA, SVM, XGBoost | Structural, chemical | 1 experiment | 2025 | 112 |
| Novel intermetallics | Structural, energy | Structure-type, thermoelectric properties | Multiple datasets (191 236 entries) | PLS-DA | Structural, chemical | 1 experiment | 2025 | 113 |
Predicted properties are generally framed within established physical theories: phase stability is often linked to formation energy and electronic structure, magnetic ordering to some form of exchange interactions, mechanical strength to microstructure, and so forth. Training datasets are often quite small compared with big-data settings that scale to millions of data points, which has long been recognized as a problem in the materials informatics community. Experimental datasets are even more limited, typically consisting of a couple of hundred entries, whereas computational databases such as the Materials Project can contain many thousands. The models used are diverse, spanning support vector machines (SVM), neural networks (NN), and ensemble tree models such as random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGBoost). With regard to data for analysis and model development, open availability and machine-readable reporting are on the rise thanks to recent efforts such as the findable, accessible, interoperable and reusable (FAIR) standards,104 but the bulk of the data is still behind paywalls.
Across the analysed literature, only a small fraction of studies report direct experimental validation of machine-learning predictions, and even among those, negative results are seldom disclosed. This positive-outcome reporting bias is commonplace across academia in general, and it hampers the development of models that can navigate the experimental synthesis landscape. Another important aspect of progress in machine-learning-driven materials chemistry is the availability of data and code, which is highly inconsistent across the studies listed in Table 1: a small fraction of studies make their data and code readily accessible, but in most cases they are available only upon request. Although most of the studies presented in Table 1 report only successfully synthesized compounds with predicted properties, some report experimental validation failures as well. In a study on ML-guided design of high-entropy alloys with enhanced hardness,105 two out of three recommended compositions failed to meet their hardness targets upon synthesis, with prediction errors of 58.2% and 39.1% attributed to extrapolation beyond the training domain and insufficient data in the target compositional region, respectively. In a study on accelerated design of Co-based superalloys,106 four out of six ML-filtered candidates were found upon synthesis to contain unwanted secondary phases beyond the desired γ + γ′ two-phase microstructure. In a study combining Bayesian optimization and graph deep learning for the discovery of ultra-incompressible hard materials,34 six out of eight candidate compositions attempted via spark plasma sintering yielded multiphase products rather than the predicted single-phase compounds.
Most experimentally validated studies incorporate explicit domain constraints within the machine-learning workflow. In Fe2P-type magnetocaloric materials, the search is restricted to a single structural family and a curated dataset of 558 experimentally derived compositions, following filtering from 603 reported samples.10 The model is then applied to map transition temperatures within this predefined compositional space, rather than exploring unconstrained chemistries. In high-entropy alloy design, compositions are constrained to physically meaningful ranges of 5–35 at% per element and restricted to a limited set of alloying systems.105 The feature space is further reduced from 142 candidate descriptors to 5 key variables, and candidate generation is guided using inverse projection and high-throughput screening to target high-performance regions.
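A composition constraint of this kind can be turned into an explicit candidate list by enumeration. The sketch below lists all compositions on a regular grid in which every element lies between 5 and 35 at% and the fractions sum to 100 at%. The 5 at% grid spacing, the element labels, and the function name are assumptions chosen for illustration and are not taken from the cited study.

```python
from itertools import product


def enumerate_compositions(elements, lo=5, hi=35, step=5, total=100):
    """List all compositions (in at%) on a regular grid within [lo, hi]."""
    levels = range(lo, hi + 1, step)
    compositions = []
    # Choose the first n-1 fractions freely; the last one is fixed by the sum.
    for head in product(levels, repeat=len(elements) - 1):
        last = total - sum(head)
        if lo <= last <= hi and (last - lo) % step == 0:
            compositions.append(dict(zip(elements, head + (last,))))
    return compositions


# Hypothetical five-element system on an assumed 5 at% grid.
hea_grid = enumerate_compositions(["Al", "Co", "Cr", "Fe", "Ni"])

# Smaller hand-checkable case: three elements, 20-50 at%, 10 at% grid.
small_grid = enumerate_compositions(["A", "B", "C"], lo=20, hi=50, step=10)
```

Enumerating the admissible grid first, and ranking only afterwards, guarantees that every recommended composition respects the constraint by construction rather than by post hoc filtering.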
A dominant hybrid pattern uses machine learning as an energy predictor embedded within structure search or optimization loops. Graph-based models approximate total energies and guide evolutionary or adaptive genetic algorithms to explore large structure spaces efficiently, with explicit DFT calculations reserved for a small subset of low-energy candidates.114,115 Bayesian optimization under symmetry and Wyckoff constraints further replaces direct structural relaxation, allowing scalable evaluation of hypothetical crystals while preserving crystallographic constraints.75 Several compounds identified using these approaches were subsequently synthesized and structurally validated.
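The division of labour between a cheap surrogate and an expensive reference calculation can be sketched as a two-stage screen. Both energy functions below are toy stand-ins (simple parabolas over one hypothetical structural parameter), not an ML model or a DFT code; the point is only that the expensive function is called for the shortlisted candidates alone.

```python
def two_stage_screen(candidates, surrogate, reference, top_k):
    """Rank all candidates with a cheap surrogate, then refine only the top_k."""
    shortlist = sorted(candidates, key=surrogate)[:top_k]  # lowest energy first
    refined = [(c, reference(c)) for c in shortlist]
    return sorted(refined, key=lambda pair: pair[1])


reference_calls = []  # records every expensive evaluation


def toy_reference(x):
    """Stand-in for an expensive first-principles energy."""
    reference_calls.append(x)
    return (x - 1.0) ** 2


def toy_surrogate(x):
    """Stand-in for an ML energy model with a small systematic bias."""
    return (x - 1.2) ** 2


# Hypothetical one-parameter candidate structures.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
results = two_stage_screen(candidates, toy_surrogate, toy_reference, top_k=3)
best_candidate, best_energy = results[0]
```

Even though the surrogate is biased, the true minimum survives the prefilter because it only needs to rank well enough to reach the shortlist, and the reference calculation is run three times instead of six.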
Hybrid workflows are also used to overcome the limitations of static zero-Kelvin stability screening. Machine-learning potentials trained on first-principles data enable finite-temperature sampling, configurational averaging, and free-energy estimation, allowing prediction of synthesis-relevant stability windows inaccessible to conventional DFT screening.76,116 These approaches have been applied to complex alloys and ceramics, where entropic stabilization and disorder play a decisive role in phase formation.
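The role of entropic stabilization can be illustrated with the ideal-mixing estimate ΔG_mix = ΔH_mix − TΔS_conf, where ΔS_conf = −R Σ x_i ln x_i. The sketch below is a back-of-the-envelope calculation under the ideal-solution assumption, far simpler than the sampling methods discussed above, and the mixing enthalpy is a hypothetical value rather than a result for any cited system.

```python
from math import log

R = 8.314462618  # gas constant, J/(mol K)


def ideal_mixing_entropy(fractions):
    """Ideal configurational entropy, -R * sum(x ln x), in J/(mol K)."""
    return -R * sum(x * log(x) for x in fractions if x > 0)


def mixing_free_energy(delta_h, fractions, temperature):
    """Delta G_mix = Delta H_mix - T * Delta S_conf, in J/mol."""
    return delta_h - temperature * ideal_mixing_entropy(fractions)


equimolar = [0.2] * 5     # five species in equal proportions
delta_h_mix = 20_000.0    # J/mol, hypothetical positive mixing enthalpy

s_conf = ideal_mixing_entropy(equimolar)  # equals R * ln(5) for this case
t_crossover = delta_h_mix / s_conf        # temperature where Delta G_mix = 0
g_low = mixing_free_energy(delta_h_mix, equimolar, 500.0)
g_high = mixing_free_energy(delta_h_mix, equimolar, 2000.0)
```

With these assumed numbers, mixing is unfavourable at 500 K and favourable at 2000 K, with the sign change near 1500 K; a zero-Kelvin screen, which sees only the enthalpy term, would classify the same solid solution as unstable.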
Representative examples where temperature-dependent predictions aligned with experimental phase formation are included in Table 2. In contrast to Table 1, which emphasizes ML-guided experimental validation without first-principles calculations, Table 2 highlights workflows in which ML is tightly integrated with DFT, including high-throughput studies and, in some cases, generative modelling. While both tables span roughly the same applications (energy, magnetism, structural prediction) and similar materials classes (intermetallics, alloys), the key distinction lies in the scaling potential: considerably more hypothetical structures can be screened computationally with DFT.83,117
| Material class | Application | Predicted properties | Computational details | Design space | # Candidates evaluated | Year | Ref. |
|---|---|---|---|---|---|---|---|
| Pnictides, chalcogenides | Thermoelectrics | Power factor (S²σ) | DFT dataset (∼1600 compounds) + active learning gradient boosting regressor | Binary and ternary diamond-like compounds | Computational | 2020 | 119 |
| Mn–Fe–P–Si intermetallics | Magnetism | Curie temperature (T_C), ΔS_m, ΔT_hys | Feed-forward neural network (trained on experimental datasets: n(T_C) = 503, n(ΔT_hys) = 465, n(ΔS_m) = 660) + DFT validation trends | Mn–Fe–P–Si composition + processing parameters | Computational | 2022 | 95 |
| High-entropy alloys | Low thermal expansion alloys | Thermal expansion coefficient | Active learning loop: autoencoder generative model + ensemble regression + DFT + CALPHAD | Fe–Ni–Co–Cr–Cu high-entropy alloys | 17 experiments | 2022 | 48 |
| Laves phases | Magnetism | Formation enthalpy | Descriptor-based ML (SISSO symbolic regression) trained on >600 DFT-calculated rare-earth compounds + high-throughput screening + experimental XRPD validation | RE(TM1TM2) compositions | 3 experiments | 2022 | 120 |
| Metal oxides, phosphates | Autonomous solid-state synthesis | Reaction yield, phase purity, phase fractions | NLP-based recipe generator + ML temperature model + active learning (ARROWS) + DFT convex-hull screening | 58 predicted targets | 41 experiments | 2023 | 35 |
| Inorganic crystalline materials | Multiple purposes | Formation energy, decomposition energy (DFT stability) | Large-scale GNNs (GNoME) + active learning + DFT relaxation (VASP) | >10⁹ generated; 2.2M predicted stable; 381k on convex hull | Cross-validation with literature (736 structures) | 2023 | 83 |
| Intermetallic nanoparticles | Oxygen reduction reaction (ORR) | Ordering energy, thermodynamic stability, ORR activity | Active-learning Gaussian process regression trained on DFT formation energies + DFT slab calculations for adsorption energetics | Ternary Pt2CoM intermetallics (M = 16 candidate elements) | 2 experiments | 2024 | 121 |
| Inorganic crystalline materials | Multiple purposes | Stability (DFT hull), magnetic density, symmetry, mechanical and electronic properties | Diffusion-based generative model (MatterGen) + adapter fine-tuning for property constraints + MLFF pre-filtering + DFT relaxation and validation | 607k structures; full periodic table | 1 experiment | 2025 | 117 |
| Rutile-type binary oxides | Oxygen evolution reaction (OER) | OER overpotential (DFT), free-energy barriers | DFT + Gaussian-process Bayesian optimization | 66 binary oxides (6 hosts × 11 dopants) | 1 experiment | 2025 | 122 |
The studies outlined in Table 2 operate over vastly larger design spaces, combining distinct components (such as active learning, DFT relaxation, and adsorption-energy calculations) as integral parts of the workflow to refine predictions before validation. In turn, this allows the exploration of chemical spaces that were not included in the original training data. Algorithmically, there is much overlap with Table 1, particularly in the use of gradient-boosted models, Gaussian processes (GP), and NNs. Notable differences, however, include shifts toward symbolic regression methods such as the sure independence screening and sparsifying operator (SISSO), graph NNs, and closed-loop discovery frameworks.118 Their success reflects the incorporation of physical knowledge, conservative energetic ordering, and iterative validation, illustrating how physics-informed machine-learning frameworks can bridge the gap between computational prediction and experimental realization.
Another major class of hybrid pipelines combines machine-learning screening with adaptive feedback between computation and experiment. In these closed-loop workflows, ML models rank candidates, DFT refines stability and property predictions, and experimental outcomes are used to iteratively update the ranking policy. Such a closed-loop scheme is demonstrated in the active-learning framework developed for high-entropy Invar alloys,48 where generative sampling, DFT calculations, thermodynamic modeling, and experimental measurements were integrated into a continuous discovery loop. Starting from a sparse database of roughly 699 known compounds, the workflow iteratively proposed candidates, screened them using physics-informed descriptors, and experimentally validated selected alloys. The inclusion of DFT- and CALPHAD-derived descriptors significantly improved predictive performance, reducing the testing error from about 19% to 14% compared with composition-only learning; Fig. 4a compares the model training and testing history with and without physics-based descriptors. This closed-loop strategy enabled the identification of new high-entropy alloys combining very low thermal expansion coefficients (≈2 × 10⁻⁶ K⁻¹) with high configurational entropy. Fig. 4b maps the thermal expansion coefficient against configurational entropy for both known alloys and compositions discovered through the active learning loop, demonstrating how adaptive ML–DFT pipelines can progressively narrow vast composition spaces under real experimental constraints. The same strategy has enabled the discovery of new intermetallic and magnetic compounds by progressively narrowing the design space under real validation constraints.31
Fig. 4 Closed-loop machine-learning discovery of high-entropy Invar alloys. (a) Training and testing history of the regression model with and without physics-based descriptors, demonstrating that inclusion of DFT- and CALPHAD-derived quantities reduces prediction error and stabilizes learning under sparse data. (b) Property landscape showing the thermal expansion coefficient as a function of configurational entropy for known alloys and compositions discovered in that work, highlighting the identification of alloys combining Invar-level thermal expansion with high entropy. Adapted with permission from ref. 48. Copyright 2022 American Association for the Advancement of Science (AAAS).
Across the literature, hybrid DFT and machine learning pipelines consistently outperform standalone ML approaches when experimental validation is required. Their success reflects the use of conservative physical constraints, consistent energetic ordering, and iterative validation rather than unconstrained exploration.
000 atoms, the model enabled temperature-dependent sampling of configurational equilibria across the range 500–2000 K. The simulations predicted phase segregation at low temperature and entropy-stabilized single-phase behavior at high temperature, directly guiding experimental synthesis conditions. As illustrated in Fig. 5, spontaneous chemical segregation into multiple carbide-rich domains is observed at 500 K (Fig. 5a), whereas a homogeneous elemental distribution is recovered at 2000 K (Fig. 5b); the layer-resolved concentration profiles in panels (c) and (d) quantify the suppression of compositional fluctuations upon heating. The formation of the predicted TiZrNbHfTaC5 high-entropy carbide was experimentally validated through electric arc synthesis, with a single-phase material obtained at 2000 K and confirmed by X-ray diffraction and microscopy analysis.
Fig. 5 Temperature-dependent configurational stability of the TiZrNbHfTaC5 high-entropy carbide predicted using a machine-learned interatomic potential. (a) Atomic distribution in a simulated supercell at 500 K, showing spontaneous chemical segregation into multiple carbide-rich domains. (b) Corresponding supercell at 2000 K, exhibiting a homogeneous single-phase solid solution stabilized by configurational entropy. (c and d) Layer-resolved concentration profiles of metallic species along the simulation cell, quantifying compositional fluctuations in the segregated state and their suppression in the high-temperature phase. Adapted with permission from ref. 86. Copyright 2023 Springer Nature.
Comprehensive reviews position ML interatomic potentials as reactive force fields that support recommendation of low-energy structures, metastable phases, diffusion pathways, and deformation mechanisms across metals, oxides, and complex functional materials.123,124 Universal and transferable graph-based architectures further expand this role by enabling chemically diverse simulations without refitting for each new system, allowing large-scale structure exploration prior to selective first-principles verification.125
Beyond structural stability, interatomic potentials enable mechanism-level recommendation by resolving atomistic processes that govern macroscopic behaviour. Graph-neural-network-based potentials trained on energies and forces reproduce experimentally-measured lattice parameters, stacking fault energies, and elastic responses, enabling molecular dynamics simulations that reveal dominant deformation and transport mechanisms in chemically complex alloys.126 Data-efficient formulations demonstrate that carefully chosen low-dimensional descriptors can outperform highly expressive representations when training data are limited.127 For example, in a five-component refractory HEA dataset comprising ∼2.9k structures (∼1.5 × 10⁵ atomic environments) spanning the full composition range of Mo–Nb–Ta–V–W alloys, models based on simple two-body, three-body, and density descriptors achieved ∼2–3 meV per atom accuracy while reaching acceptable predictive performance using only 25–35% of the training data. This corresponds to roughly a two-to-three-fold improvement in data efficiency relative to SOAP-based representations,128 which are obtained through a local expansion of the Gaussian-smeared atomic density using orthonormal radial basis functions combined with spherical harmonics. These models can additionally be tabulated to enable molecular dynamics simulations that are two to three orders of magnitude faster, making large-scale simulations of multicomponent alloys computationally tractable. This behaviour reflects the fact that descriptor dimensionality controls both data requirements and sampling efficiency in compositionally complex systems. These studies demonstrate that ML interatomic potentials do not merely accelerate existing workflows but also expose which atomic-scale mechanisms control phase stability, strength, and ductility.
Interatomic potentials also enable thermodynamic recommendation beyond zero-Kelvin stability. Structure-agnostic training combined with free-energy integration allows direct prediction of temperature–composition phase diagrams without assuming known crystal structures, producing phase boundaries consistent with experimental observations under processing-relevant conditions.129 In contrast to generative or purely data-driven recommendation approaches, ML interatomic potentials operate through physically grounded energy landscapes, providing a bridge between first-principles theory and experimentally meaningful phase behavior. Although distinct from high-throughput screening databases71 and end-to-end generative design models,117 interatomic potentials define a complementary paradigm in which discovery emerges from exhaustive and physically-constrained exploration rather than direct proposal of candidate materials.
A recurring shortcut in experimentally successful workflows is restriction of the design space prior to learning or screening. Rather than exploring unconstrained chemical space, many studies limit candidates to known prototypes, fixed stoichiometries, or narrow composition ranges, implicitly encoding chemical intuition into the workflow. This strategy reduces extrapolation risk48 and increases experimental conversion rates, but also biases discovery toward incremental extensions of known structure families. Such constraints are rarely framed as assumptions, yet they play a decisive role in reported success rates.
The most explicit realization of tightly constrained, validation-centered discovery workflows is the A-Lab autonomous platform,35 which integrates DFT-based stability screening, machine-learning-guided synthesis planning, robotic execution, and active learning within a closed experimental loop. Starting from targets predicted to lie on or near the convex hull (<10 meV per atom), the system autonomously proposed synthesis recipes, executed them using robotic powder processing and furnace operations, and evaluated products via automated X-ray diffraction analysis. Over 17 days of continuous operation, the platform performed 355 synthesis experiments and successfully realized 41 of 58 target compounds, corresponding to a reported success rate of 71% (Fig. 6). However, synthesis outcomes plotted against predicted decomposition energies show little systematic correlation between thermodynamic stability and experimental success. Rather than functioning as a reliable predictor of synthesizability, zero-Kelvin stability appears primarily as a coarse prioritization heuristic, with outcomes strongly mediated by precursor selection, reaction pathways, and kinetic accessibility.
Fig. 6 Experimental outcomes of A-Lab synthesis campaigns plotted against predicted decomposition energies. Each bar represents a target compound, coloured by successful realization, failure, or optimization via active learning, while overlaid markers indicate outcomes of individual synthesis recipes. The inset charts summarize the fractions of successful targets and successful recipes. The distribution demonstrates that experimental success is only weakly correlated with zero-temperature thermodynamic stability, highlighting the importance of kinetic factors and synthesis pathway design in autonomous materials discovery. Adapted with permission from ref. 35. Copyright 2023 Springer Nature.
This disconnect reflects a broader assumption embedded in many ML-guided discovery workflows: zero-Kelvin thermodynamic stability can serve as a proxy for synthesizability. Convex-hull distance is therefore widely used to rank candidates, yet numerous experimentally realized materials are metastable, kinetically trapped, or stabilized only under specific processing conditions. Stability metrics thus capture only one dimension of experimental feasibility while neglecting reaction pathways, competing intermediates, and practical constraints of precursor chemistry. Recognizing this limitation, several recent studies81,82,130,131 incorporate auxiliary recommendation layers, including synthesizability classifiers, synthesis-aware ranking schemes, or heuristic filters derived from experimental data, to bridge the gap between thermodynamic prediction and laboratory realization. These approaches acknowledge that reliable experimental targeting requires integrating thermodynamic, kinetic, and procedural knowledge rather than relying on stability prediction alone.
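The convex-hull ranking discussed above can be illustrated with a minimal sketch for a hypothetical binary A–B system. The entries, energies, and function names are illustrative assumptions, not data or code from the cited studies; the 10 meV per atom cutoff mirrors the A-Lab target selection. The sketch computes only the zero-Kelvin energy above the hull and therefore carries no information about precursors, reaction pathways, or kinetics.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points via a monotone-chain scan."""
    best = {}
    for x, e in points:                       # keep the lowest energy per composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross <= 0:                    # hull[-1] lies on or above the chord
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x, e, hull):
    """Vertical distance (eV per atom) from an entry to the hull at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e - (e1 + (e2 - e1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the hull range")


# Hypothetical entries: (label, fraction of B, formation energy in eV per atom)
ENTRIES = [
    ("A", 0.0, 0.0),
    ("A3B", 0.25, -0.10),
    ("AB", 0.5, -0.30),
    ("AB_alt", 0.5, -0.295),   # a competing polymorph 5 meV above the ground state
    ("AB3", 0.75, -0.12),
    ("B", 1.0, 0.0),
]

HULL = lower_hull([(x, e) for _, x, e in ENTRIES])
E_ABOVE = {label: energy_above_hull(x, e, HULL) for label, x, e in ENTRIES}
SHORTLIST = [label for label, de in E_ABOVE.items() if de < 0.010]  # < 10 meV per atom
```

In this toy system the near-degenerate polymorph `AB_alt` passes the cutoff while `A3B` and `AB3` do not, even though nothing in the calculation indicates which of them could actually be made.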
Finally, validation itself is often selective rather than exhaustive. Experimental follow-up typically focuses on top-ranked candidates, while lower-ranked or ambiguous predictions remain untested, limiting the ability to assess false-negative rates. Closed-loop workflows partially address this limitation by iteratively updating ranking policies based on experimental feedback, but they remain resource-intensive and are still applied to narrowly defined systems. Taken together, these observations indicate that experimental validation in ML-guided discovery is inseparable from workflow design choices, shortcuts, and assumptions that must be made explicit when assessing reported success.
Establishing validation strategies and incorporating physical constraints addresses how models should be evaluated and guided. However, an equally critical challenge lies in defining which properties are most meaningful to predict and measure, and how these properties are connected to experimental conditions and materials realization.
:
1 ratio). The dataset comprised 974 compounds experimentally confirmed to exist at ambient temperature and pressure, crystallizing in 107 unique structure types. Although many of these structure types are rare, 706 compounds adopt one of seven common structure types, each with at least 30 representatives (Fig. 7a). For modelling, the data were represented as a 974 × 56 matrix of elemental descriptors, and only the 706 compounds belonging to these seven structure types were retained to ensure statistical reliability. Both PLS-DA and SVM techniques were used to train models on this dataset.
Fig. 7 (a) Structure types adopted by binary AB phases. (b) Fisher ratio scores for variables selected in the CR-FS procedure. (c) Predicted probability for CsCl-type structures using a machine-learning model based on SVM. Adapted with permission from ref. 101. Copyright 2016 American Chemical Society.
Through an iterative feature-selection procedure combining forward selection and backward elimination, our 56 initial descriptors were filtered down to 31 in an unbiased manner (highlighted in Fig. 7b). Model results are displayed in Table 3. The SVM model, in conjunction with our feature selection process, gave excellent performance for classifying the crystal structures of AB compounds. To validate the predictions, RhCd was synthesized; at the time, it was the first new binary AB compound to be discovered in over 15 years. The sample was prepared by sintering a stoichiometric mixture of the elements followed by annealing at 800 °C, and was characterized by X-ray diffraction. Among the candidates with high prediction probability (Fig. 7c), RhCd (91.8% probability) was chosen because its synthesis is straightforward and because systems containing precious metals are underexplored.
| Model | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|
| PLS-DA | 96.5 | 66.0 | 77.1 |
| SVM | 94.2 | 92.7 | 93.2 |
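For reference, the three metrics reported in Table 3 follow from the confusion-matrix counts of a one-versus-rest classifier. The sketch below uses hypothetical counts (not the values from ref. 101) to show how high sensitivity can coexist with much lower specificity, the pattern seen for PLS-DA.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)              # true positives recovered
    specificity = 100.0 * tn / (tn + fp)              # true negatives recovered
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for one structure type versus all others
SENS, SPEC, ACC = classification_metrics(tp=96, fn=4, tn=132, fp=68)
```

With these counts, nearly all members of the class are found (sensitivity 96%), but a third of the non-members are wrongly assigned to it (specificity 66%), which pulls the overall accuracy down to 76%.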
In this study, the recommendation for experimental validation was drawn from the test set, with prediction probability as the only factor driving the recommendation. Feature selection, even on a relatively small feature set, improves model performance and enhances explainability.
For example, among equiatomic ABC phases, polymorphism between the orthorhombic TiNiSi-type and the hexagonal ZrNiAl-type structures is prevalent, and it is especially common for metal-rich phosphides M2P.135 For 19 compositions that are experimentally reported in both structure types, the model distinguishes the low-temperature polymorph from its high-temperature form with high confidence (>0.7). Interestingly, the class probability plot (Fig. 8a), especially in the validation subset, contains datapoints that are misclassified or have probabilities close to the decision barrier. Typically, this would indicate lower model quality, but viewed with knowledge of the underlying chemistry, it reveals an interesting phenomenon. The model could not clearly differentiate certain compositions, which lie in a region of confusion (confidence score between 0.3 and 0.7; Fig. 8b), suggesting that both polymorphs may be observed in a single sample under varying synthetic conditions.
Fig. 8 (a) Predicted probability for TiNiSi-type (upper panel) and ZrNiAl-type (lower panel) structures based on an SVM model, with a dashed line marking the decision barrier. (b) Predicted probability for 19 compounds adopting both TiNiSi- and ZrNiAl-type structures. Adapted with permission from ref. 134. Copyright 2017 American Chemical Society.
At first, these intermediate probability values, lying in this region of confusion, may seem uninformative or even indicative of poor model performance. In reality, they reflect cases where experimental reports show that closely related synthesis conditions can yield two different polymorphs. Such context is typically unknown to a software engineer working outside the traditional chemistry realm; thus, we consider it of the utmost importance for software to be not only validated, but also curated within each domain. We cannot stress enough that chemical intuition can only come from being part of the field. In such polymorphic systems, one phase is likely thermodynamically favored, while the other persists in a metastable form because its transformation is suppressed. Remarkably, the model appears to encode this subtle energetic competition, capturing behavior that is often difficult to resolve even with first-principles approaches. A striking example is TiFeP, which exhibits one of the most uncertain predictions: 60% likelihood for the TiNiSi-type structure and 40% for the ZrNiAl-type structure. Consistent with this ambiguity, the literature reports conflicting structural assignments. A reinvestigation confirms that TiFeP readily forms as a two-phase mixture, even after extended annealing, likely because a kinetically hindered phase transformation leads to the coexistence of both phases, rather than because of simple model confusion.
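One way to make this reading of intermediate probabilities operational is to treat the 0.3–0.7 band as its own outcome instead of forcing a binary label. The sketch below is a hypothetical helper, not code from ref. 134; only the TiFeP probability is taken from the text, and the other two entries are invented placeholders.

```python
def triage(p_tinisi, low=0.3, high=0.7):
    """Map a predicted TiNiSi-type probability onto a recommendation label.

    For a two-class model the ZrNiAl-type probability is 1 - p_tinisi, so one
    number is enough. The band [low, high] is the region of confusion.
    """
    if not 0.0 <= p_tinisi <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_tinisi > high:
        return "TiNiSi-type"
    if p_tinisi < low:
        return "ZrNiAl-type"
    return "ambiguous"


PREDICTIONS = {
    "TiFeP": 0.60,            # value quoted in the text
    "placeholder-1": 0.92,    # hypothetical
    "placeholder-2": 0.08,    # hypothetical
}
LABELS = {name: triage(p) for name, p in PREDICTIONS.items()}
```

Compositions labelled `ambiguous` would then be flagged as candidates for polymorph coexistence and examined under several synthesis conditions, instead of being discarded as low-confidence predictions.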
In a similar vein, recent work has extended synthesizability prediction from composition alone to the level of specific crystal structures, directly addressing the polymorph problem described above.136 By fine-tuning large language models (LLMs), which have not been discussed so far in this text, on text-based descriptions derived from CIFs, researchers framed synthesizability as a positive-unlabelled (PU) learning task using experimentally realized versus hypothetical structures. Models incorporating these explicit structural descriptions significantly outperformed stoichiometry-only approaches and were competitive with, or better than, bespoke GNNs. Even greater gains were achieved by using LLM-derived embeddings as structural representations for a dedicated PU classifier, suggesting that the text-based encodings capture symmetry, coordination, and environmental information that traditional graph constructions may represent incompletely, particularly for structurally complex systems where polymorphism is prevalent.137
The logic behind the LLM is simple and intuitive, and the model can also respond to the user's prompt by explaining why a particular polymorph is or is not synthesizable. Using GPT-4o138 and the associated probabilities, a “true” answer to a question can be traced back to a principle common in textbooks. For example, the LLM can respond to a query with “If a new compound exhibits uniform bond lengths, then it likely has high internal strain, indicating that the compound is unstable and unsynthesizable.” Reasons based on thermodynamic stability alone account for a minority of predictions, reinforcing the experimental reality that energy-above-hull criteria are insufficient proxies for synthetic accessibility. Sensitivity analyses further show that small perturbations to atomic positions dramatically alter predicted synthesizability, underscoring the delicate structural balance governing polymorph selection. In this way, intermediate probabilities and structural uncertainty, such as those observed in our TiNiSi/ZrNiAl case, may reflect genuine thermodynamic–kinetic competition rather than model confusion, highlighting how interpretable frameworks can augment chemical intuition in understanding and predicting polymorphic crystal structures.
Returning to the structure prediction of polymorphs, the ParetoCSP2 work did not treat polymorph prediction as a simple global-minimum search problem,139 but instead explicitly interrogated why conventional algorithms fail to recover experimentally-known competing phases. The authors demonstrated that single-objective evolutionary searches tend to converge prematurely into one symmetry basin, which suppresses structurally distinct but energetically close polymorphs. By reformulating structure prediction as a multi-objective optimization problem that balances energy with symmetry diversity, the method preserves multiple structural lineages throughout the search. This design makes the algorithm's behavior interpretable at the population level: one can directly monitor how different symmetry classes/space-group representations are maintained, suppressed, or amplified over generations, rather than observing a black-box collapse into a single dominant structure type.
The study showed that recovery of experimentally-reported polymorphs correlates with sustained representation of distinct space groups during optimization, rather than strictly with lowest-energy ranking. In several benchmark systems containing multiple Materials Project71 polymorphs with identical cell sizes, higher-energy but experimentally realized structures appeared only when symmetry diversity constraints were enforced.
This provided mechanistic insight into polymorph competition: structural accessibility emerges from the interplay between near-degenerate energy basins and search-path diversity, rather than from absolute energetic ordering alone. Much like the region of confusion between TiNiSi- and ZrNiAl-type phases, ParetoCSP2 makes visible the existence of parallel low-energy funnels and demonstrates that algorithmic diversity control can reveal genuine thermodynamic–kinetic competition, offering a more interpretable picture of why multiple structure types coexist within a single chemical system.
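The difference between single-objective and multi-objective selection can be sketched with a plain Pareto filter over two objectives, energy (to be minimized) and a diversity score (to be maximized). This is a generic illustration and not the ParetoCSP2 implementation; the candidate labels, energies, and diversity scores are hypothetical, and how a diversity score would be defined is left open.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates.

    Each candidate is (label, energy, diversity). A candidate is dominated if
    another one is at least as good in both objectives and strictly better in one.
    """
    front = []
    for a in candidates:
        dominated = any(
            b[1] <= a[1] and b[2] >= a[2] and (b[1] < a[1] or b[2] > a[2])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical population: (label, energy above minimum in eV per atom, diversity score)
POPULATION = [
    ("A", 0.00, 0.2),
    ("B", 0.01, 0.8),
    ("C", 0.05, 0.5),
    ("D", 0.03, 0.9),
]

SINGLE_OBJECTIVE = min(POPULATION, key=lambda c: c[1])[0]     # energy only
MULTI_OBJECTIVE = [c[0] for c in pareto_front(POPULATION)]    # energy and diversity
```

An energy-only selection keeps just `A`, whereas the Pareto filter also retains `B` and `D`, which are slightly higher in energy but structurally distinct; only `C`, which is worse than `B` on both counts, is discarded.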
Training data were sourced from the Materials Project (queried April 2018, energy above hull = 0), yielding 19 835 stable compounds mapped onto 14 099 RESs as positive data, against 71 301 UESs drawn from 85 400 total single, binary, and ternary combinations among 80 laboratory-accessible elements. The machine learning model was a permutation-invariant neural network in which descriptors for each of the three constituent elements were independently processed by a shared two-layer sub-network, then max-pooled into a single order-invariant representation before a final classification layer with sigmoid output. The permutation invariance and use of coordination-geometry-derived elemental descriptors, rather than learned embeddings, make the approach interpretable by design: the model captures chemically meaningful elemental features rather than arbitrary statistical correlations, and the predicted reactivity scores are visualized as two-dimensional heatmaps across the full periodic table for any target element. As illustrated in Fig. 9 for Al-containing ternary systems, these maps allow researchers to directly identify element set combinations with high predicted reactivity but no reported compounds, with known Materials Project and ICSD entries overlaid as reference points.
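Two points in this description lend themselves to a short check: the count of element sets, and the reason a shared sub-network followed by max-pooling is insensitive to element order. The sketch below is a toy forward pass with hypothetical weights and two-dimensional descriptors; it is not the trained model of ref. 140, and the layer sizes are assumptions chosen only to keep the example small.

```python
import math

# Single, binary, and ternary element sets drawn from 80 elements
N_ELEMENT_SETS = sum(math.comb(80, k) for k in (1, 2, 3))

# Hypothetical weights: 2 descriptor values -> 3 hidden units -> 2 hidden units
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
B1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.5, 0.2], [0.7, 0.1, -0.3]]
B2 = [0.05, 0.0]
W_OUT = [0.8, -0.6]
B_OUT = 0.1


def dense_relu(vec, weights, biases):
    """One fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, biases)]


def reactivity_score(element_descriptors):
    """Shared two-layer sub-network per element, max-pool, then sigmoid output."""
    hidden = [dense_relu(dense_relu(d, W1, B1), W2, B2) for d in element_descriptors]
    pooled = [max(column) for column in zip(*hidden)]   # order-invariant pooling
    logit = sum(w * p for w, p in zip(W_OUT, pooled)) + B_OUT
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical descriptors for three elements of a ternary set
EL_A, EL_B, EL_C = [0.2, 1.1], [0.9, 0.4], [1.5, 0.7]
SCORE_ABC = reactivity_score([EL_A, EL_B, EL_C])
SCORE_CAB = reactivity_score([EL_C, EL_A, EL_B])
```

Because every element passes through the same weights and the pooling takes an element-wise maximum, `SCORE_ABC` and `SCORE_CAB` coincide, so the score depends on the element set and not on the order in which its members are listed.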
Fig. 9 Elemental reactivity map for Al-containing ternary systems predicted by Model [75%], visualized as a two-dimensional heatmap where deeper blue indicates higher predicted reactivity. Red markers denote element sets with known compounds reported in the Materials Project (stars) or ICSD (diamonds and triangles), while unmarked dark blue squares identify unreported but highly reactive element combinations as candidates for novel materials discovery. The Co–Al–Ge system, highlighted as an unexplored high-reactivity candidate, was subsequently confirmed through synthesis of two novel ternary compounds. Adapted with permission from ref. 140. Copyright 2025 American Chemical Society.
Validation against an independent external dataset of 5852 ICSD entries absent from the Materials Project confirmed that predicted reactivity scores correlate with experimental synthesizability. Model [25%] recovered 91% of high-quality ICSD entries as reactive, while Model [75%] improved precision, reaching 24% true positive rate among all predicted reactive UESs compared to 15% for Model [25%], demonstrating a controllable precision-recall tradeoff governed by the negative data threshold. Experimental validation targeted the Co–Al–Ge ternary system, identified as a high-reactivity UES (score 0.987 under Model [75%]) with no prior Materials Project entry. Simple solid-state synthesis yielded two novel compounds, B20-type Co4Ge3.19Al0.81 and Ni2Al3-like Co2Al1.26Ge1.74, confirmed by single-crystal X-ray analysis, a result that would likely have been missed under composition- or structure-constrained search strategies.
Fig. 10 Application of unsupervised ML for novel structure prediction, followed by supervised ML confirmation, and experimental validation. Adapted with permission from ref. 112. Copyright 2025 American Chemical Society.
In the unsupervised track, PCA was applied for dimensionality reduction, followed by clustering (K-means, DBSCAN, hierarchical), with cluster separability and visualization (e.g., PCA projections and confidence ellipses) guiding parameter selection. This approach identified well-segregated clusters that could be interpreted post hoc as structure families; notably, the PLS-DA scatterplot correctly assigned the TbIr3 compound (CIF input obtained from Rietveld refinement) to the PuNi3-type structure cluster (Fig. 11), demonstrating the success of the recommendation engine and the validity of the supervised ML approach. The recommendation engine was then used to propose chemically adjacent candidates for exploratory synthesis, leading to TbIr3 as the top-ranked extension of AIr3 (where A = Y, La–Nd, Sm, and Gd). Experimental validation via arc-melting and annealing (independently reproduced in two laboratories), followed by PXRD and Pawley/Rietveld refinement, confirmed the presence of TbIr3 in the PuNi3-type structure, while also revealing that multiphase mixtures are difficult to avoid, consistent with kinetically suppressed transformations in this composition range, as in the ternary polymorph problem discussed above. Finally, a supervised validation step using the same feature representation showed near-perfect agreement with the experimental assignment, with PLS-DA, SVM, and XGBoost all achieving accuracies above 96% (Table 4), supporting the robustness of the structure-type classification and the predictive value of combining unsupervised discovery with supervised confirmation. A compilation of related unsupervised works for a variety of intermetallic-related outcomes (i.e., structure clustering, design of materials) is provided in Table 5. A particular value of unsupervised models is their ability to provide counter-intuitive recommendations. In the case of TbIr3, the recommendation of a structure different from the two recently reported polymorphs146 is an excellent example of how such models can counter the bias that researchers might have against exploring well-studied domains.
| Model | Accuracy (%) |
|---|---|
| PLS-DA | 96.7 |
| SVM | 99.7 |
| XGBoost | 99.9 |
| Goal | Type | Ref. |
|---|---|---|
| Reveal separability of 7 intermetallic structure classes in latent space | Manifold learning (t-SNE) on learned structural embeddings | 147 |
| Identify materials families in Cu–S space, link clusters to structure and properties | Wasserstein distance + clustering | 148 |
| Determine crystal structures directly from powder XRD without prototype bias | Symmetry-constrained model search guided by pattern similarity | 149 |
| Decompose high-throughput XRD datasets into phase end-members | Non-negative Matrix Factorization (NMF) | 150 |
| Discover distinct atomic environments (bulk, defects, surfaces) without labels | Manifold learning + clustering on local descriptors | 151 |
| Explore vast superalloy design space to sample promising candidates | Clustering + similarity-based exploration | 152 |
| Organize Li-containing compounds via diffraction-derived representations | Unsupervised learning on digital XRD features | 153 |
| Identify structural patterns correlated with superconductivity | Descriptor-based clustering/representation learning (SOAP) | 154 |
Fig. 12 Recommendation engine workflow. Adapted with permission from ref. 113. Copyright 2025 American Chemical Society.
In contrast to the earlier STEx implementation, which constructed a global PCA chemical space from the full elemental descriptor set, the present workflow first derives a site-resolved PLS-DA space from elements experimentally observed in each crystallographic position of the target structure type. The remaining elements of the periodic table are then projected into this latent space using the extracted variable loadings, enabling expansion of candidate substitutions while preserving structure-type-specific chemical constraints.
A key development here was the implementation of three complementary explainable recommendation modes: (i) an unrestricted search that identifies elements closest to any previously occupied site, (ii) a conservative mode restricted to elements already known to occupy that crystallographic site in related compounds, and (iii) a cluster-based method that limits candidates to the PLS-DA-defined chemical space of the target structure family. Types of recommendations are displayed in Fig. 13. The final candidate is selected using a weighted consensus score across all three methods. Applied to the RE10RuCd3 structure type, this strategy identified Gd10RuCd3 as the top-ranked unexplored composition, despite the complete absence of any prior reports in the Gd–Ru–Cd system in the most recent version of Pearson's Crystal Database. Subsequent synthesis and PXRD/SCXRD confirmed the structure, validating the predictive capability of the engine. This compound also displayed unusually low thermal conductivity.
Fig. 13 Types of recommendations: (a) unrestricted method, proximity to all elements, (b) conservative method, proximity to the elements observed in a particular site in a particular structure, (c) cluster method, proximity to elements within the cluster area, (d) combined score result plotted as a recommended compound. Adapted with permission from ref. 113. Copyright 2025 American Chemical Society.
The updated iteration of CRAFT includes fixed-site substitution, alternative fixed-site combinations, and exploratory searches within the learned chemical manifold. Optional constraints, such as charge balance, are applied as post hoc filters that remove chemically implausible candidates without altering the underlying model logic. This framework represents a generalizable, interpretable approach to chemical recommendation that augments intuition while remaining compatible with experimental feasibility and domain-imposed constraints.
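A post hoc charge-balance filter of the kind mentioned here can be sketched in a few lines. This is a generic illustration and not the CRAFT code; the function name and the small oxidation-state table are assumptions made for the example, and the check assigns a single oxidation state to each element.

```python
from itertools import product

# Hypothetical, deliberately small table of allowed oxidation states
OXIDATION_STATES = {"Na": [1], "Cl": [-1], "Fe": [2, 3], "O": [-2]}


def is_charge_balanced(composition, oxidation_states=OXIDATION_STATES):
    """True if some assignment of one oxidation state per element sums to zero.

    composition maps element symbols to integer counts, e.g. {"Fe": 2, "O": 3}.
    """
    elements = list(composition)
    for states in product(*(oxidation_states[el] for el in elements)):
        if sum(s * composition[el] for s, el in zip(states, elements)) == 0:
            return True
    return False


CANDIDATES = [{"Na": 1, "Cl": 1}, {"Fe": 2, "O": 3}, {"Na": 1, "Cl": 2}]
# The filter only removes candidates; the ranking that produced them is untouched
FILTERED = [c for c in CANDIDATES if is_charge_balanced(c)]
```

Because each element receives a single oxidation state, mixed-valence compounds would be rejected by this simple check, and charge balance is not a meaningful criterion for many intermetallics, which is one reason to keep such filters optional.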
Given the small dataset size, conventional algorithms such as support vector classifiers and random forests were benchmarked but performed considerably worse than the sure independence screening and sparsifying operator (SISSO) method, which constructs low-dimensional descriptors by combining primary features through algebraic operators across up to 10¹⁰ candidate combinations. Classification was achieved in two stages: a coarse separation of open structures (clathrates, layers, channels) from denser 3D networks, followed by finer discrimination among the open classes. A key strength of the SISSO approach is its inherent interpretability: the resulting descriptors take the form of explicit mathematical expressions involving physically meaningful quantities such as differences in electronegativity or melting and boiling points, allowing the structure maps to be read and rationalized in chemical terms rather than treated as black-box outputs.
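The descriptor-construction idea can be illustrated with a heavily simplified sketch: primary features are combined through a few algebraic operators, and the resulting candidates are ranked by their correlation with the class label. This covers only a single round of feature construction and the screening step; it omits the sparsifying operator and is not the SISSO code. The feature names, values, and labels are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def build_features(primary):
    """Combine each pair of primary features with +, |-|, * and /."""
    feats = dict(primary)
    for (na, a), (nb, b) in combinations(primary.items(), 2):
        feats[f"({na}+{nb})"] = [p + q for p, q in zip(a, b)]
        feats[f"|{na}-{nb}|"] = [abs(p - q) for p, q in zip(a, b)]
        feats[f"({na}*{nb})"] = [p * q for p, q in zip(a, b)]
        if all(q != 0 for q in b):
            feats[f"({na}/{nb})"] = [p / q for p, q in zip(a, b)]
    return feats


def screen(features, target, top_k=3):
    """Rank candidate descriptors by absolute correlation with the target."""
    ranked = sorted(features, key=lambda n: abs(pearson(features[n], target)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical electronegativities of two sites and a binary structure label
PRIMARY = {
    "chi_A": [0.8, 2.4, 1.0, 2.0, 1.2, 1.8],
    "chi_B": [2.0, 1.2, 2.2, 2.1, 1.1, 1.9],
}
LABELS = [1, 1, 1, 0, 0, 0]
TOP = screen(build_features(PRIMARY), LABELS, top_k=1)
```

In this toy dataset neither electronegativity separates the two classes on its own, but the constructed descriptor `|chi_A-chi_B|` does, and it remains an explicit expression that can be read in chemical terms.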
Experimental validation confirmed three of four predicted targets: RbAlSb2, CsAlSb2 (both adopting layered tetragonal structures in space group P42/nmc), and Rb2Al2Sb3 (monoclinic, Na2Al2Sb3-type, P21/c), while Cs2Al2Sb3 could not be synthesized despite multiple attempts at varying temperatures. All confirmed structures were layered, in agreement with the ML predictions. Reinvestigation of the related compound Cs2In2Sb3 additionally revealed a low thermal conductivity of 0.64 W m−1 K−1 and p-type semiconducting behavior with the Seebeck coefficient reaching 253 µV K−1 at 300 K, demonstrating how ML-guided structural targeting can simultaneously accelerate property discovery in underexplored compositional spaces.
The training dataset was generated entirely from MD simulations using an embedded atom method (EAM) force field on FCC systems of 4000 atoms, with PSO iteratively adjusting elemental concentrations of Cu, Cr, Co, and Fe within 5 to 35 at% each, with Ni as the balancing element. After 137 PSO optimization cycles, a total of 19 728 compositions comprising 59 184 individual structures were generated. Two separate surrogate models were developed on this data. A two-layer stacked ensemble machine learning (SEML) model, combining Bayesian Ridge regression, Stochastic Gradient Descent regression, and a Multilayer Perceptron in its first layer, with a second-layer MLP aggregating their outputs, was trained on elemental concentrations as input features to predict USFE, achieving R2 = 0.87 and MAE below 2.5 mJ m−2. For bulk modulus prediction, a one-dimensional convolutional neural network (1D CNN) was developed, taking as input a spatially ordered 4000-atom array encoding element identity and local neighbour relationships across 20 atomic layers, achieving R2 ≥ 0.98 and MSE ≤ 0.16 GPa2 on both Top10K and Top25K subsets. The structural encoding of the atom array was designed to preserve local chemical short-range order (CSRO), enabling the CNN to capture neighbourhood-level compositional effects rather than relying solely on mean-field concentration descriptors. These two surrogate models were then integrated with three independent optimization algorithms, namely PSO, a Genetic Algorithm (GA), and a reinforcement learning method using temporal difference (TD) learning with SARSA, each exploring the compositional design space over 200 iterations with 128 candidates per iteration.
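The role of PSO in this workflow, moving a population of compositions through a bounded concentration space with Ni as the balance, can be sketched as follows. This is a generic global-best PSO with a hypothetical linear objective standing in for the surrogate models; the inertia and acceleration coefficients, the population size, and the handling of infeasible compositions (Ni below zero) are all assumptions for the example and are not taken from ref. 157.

```python
import random

ELEMENTS = ("Cu", "Cr", "Co", "Fe")    # Ni makes up the balance to 100 at%
LOW, HIGH = 5.0, 35.0                  # concentration bounds in at%


def ni_balance(conc):
    return 100.0 - sum(conc)


def toy_objective(conc):
    """Hypothetical stand-in for a surrogate model; rejects Ni < 0."""
    if ni_balance(conc) < 0:
        return float("-inf")
    cu, cr, co, fe = conc
    return 2.0 * fe + 1.5 * ni_balance(conc) - 0.5 * cu - 0.5 * cr - 0.2 * co


def random_feasible(rng):
    """Draw a composition within bounds that leaves a non-negative Ni balance."""
    while True:
        conc = [rng.uniform(LOW, HIGH) for _ in ELEMENTS]
        if ni_balance(conc) >= 0:
            return conc


def pso(objective, n_particles=16, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(ELEMENTS)
    pos = [random_feasible(rng) for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clip each concentration back into the allowed window
                pos[i][d] = min(HIGH, max(LOW, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val > gbest_val:
                gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


BEST, BEST_VAL, HISTORY = pso(toy_objective)
```

In the study each optimizer explored 128 candidates per iteration over 200 iterations; the much smaller defaults here only keep the example fast. Swapping `toy_objective` for a trained surrogate is the step that would connect this loop to actual property predictions.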
Explainability was addressed through SHapley Additive exPlanations (SHAP) analysis applied separately to both models, providing post hoc interpretation of feature contributions. As shown in Fig. 14a, SHAP analysis of the SEML model revealed a clear ranking of elemental importance for USFE prediction, with Fe and Ni identified as the dominant positive contributors (average absolute SHAP values of 4.09 and 2.98 mJ m−2, respectively), consistent with their higher intrinsic USFE values of 485.91 and 409.29 mJ m−2 as calculated from the EAM potential for pure FCC structures. The Pearson correlation matrix in Fig. 14b further confirms these trends quantitatively, showing moderate positive correlations of Fe and Ni with USFE (0.55 and 0.44, respectively), while Cu, Cr, and Co displayed negative correlations of −0.21, −0.20, and −0.27, attributed to weaker metallic bonding and associated electron density changes at the stacking fault interface. For the CNN model, atom-wise SHAP values demonstrated that more than 99% of Cu and Cr atoms carried negative SHAP contributions to the bulk modulus, while over 70% of Fe, Ni, and Co atoms contributed positively. Critically, the CNN SHAP analysis further revealed that the sign of an element's contribution was not determined solely by its identity but also by its local chemical environment: Co and Ni atoms surrounded predominantly by same-species neighbours were more likely to show positive SHAP values, indicating that local clustering and CSRO directly modulate the effective mechanical contribution of individual atoms.
Fig. 14 SHAP analysis of the SEML model for USFE prediction. Panel (a) shows elemental feature importance ranked by the average absolute SHAP value, where red and blue indicate high and low elemental concentrations, respectively. Panel (b) presents the Pearson correlation matrix confirming that higher Fe and Ni concentrations positively correlate with USFE, while Cu, Cr, and Co show negative contributions. Adapted with permission from ref. 157. Copyright 2025 Springer Nature.
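The quantity underlying a SHAP ranking can be illustrated with an exact Shapley computation on a small model. The sketch below enumerates all feature coalitions for a five-element composition vector and then averages absolute attributions across samples, which is the statistic used for importance rankings of the kind shown in Fig. 14a. It is an illustration under stated assumptions and not the estimator used in the cited study: `toy_model` and its weights are hypothetical, the equimolar baseline is an assumed reference point, and features absent from a coalition are simply replaced by their baseline values.

```python
from itertools import combinations
from math import factorial

ELEMENTS = ("Cu", "Cr", "Co", "Fe", "Ni")
WEIGHTS = (-0.5, -0.4, -0.3, 1.0, 0.8)  # HYPOTHETICAL, not fitted to any data


def toy_model(x):
    """HYPOTHETICAL linear surrogate mapping concentrations (at%) to a score."""
    return sum(w * c for w, c in zip(WEIGHTS, x))


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition take their baseline value.
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, samples, baseline):
    """Average absolute Shapley value per feature across samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


baseline = (20.0, 20.0, 20.0, 20.0, 20.0)   # equimolar reference (assumed)
x = (5.35, 5.37, 29.55, 31.21, 28.52)       # composition reported in the text
phi = shapley_values(toy_model, x, baseline)
importance = mean_abs_shap(toy_model, [x, baseline], baseline)
for name, p in zip(ELEMENTS, phi):
    print(f"{name}: {p:+.3f}")
```

For a linear model, each attribution reduces to the weight multiplied by the deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. Exhaustive enumeration scales as 2^n in the number of features, which is why larger models such as the 4000-atom CNN input rely on approximate estimators.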
Five candidate compositions were selected for experimental synthesis by arc melting, three derived from the PSO-guided MD training data and two from the ML and DL optimization outputs. All five produced single-phase face-centred cubic structures confirmed by powder X-ray diffraction. Measured hardness for all new MPEAs exceeded 3.0 GPa compared to 1.49 GPa for the equimolar CuCrCoFeNi reference, and measured Young's moduli ranged from 179.86 to 197.74 GPa, in qualitative agreement with MD predictions despite known discrepancies arising from strain rate differences and force field limitations. The composition Cu5.35Cr5.37Co29.55Fe31.21Ni28.52 achieved the highest experimental Young's modulus of 197.74 ± 6.03 GPa, consistent with the workflow's prediction that low Cu and Cr fractions favor enhanced stiffness.
This journal is © The Royal Society of Chemistry 2026