Open Access Article
Vinamr Jain a, Zhilong Wang a and Fengqi You *abc
a College of Engineering, Cornell University, Ithaca, New York 14853, USA. E-mail: fengqi.you@cornell.edu
b Cornell University AI for Science Institute, Cornell University, Ithaca, New York 14853, USA
c Cornell AI for Sustainability Initiative (CAISI), Cornell University, Ithaca, New York 14853, USA
First published on 15th November 2025
The development of solid-state electrolytes (SSEs) is critical for enabling safer, high-energy-density batteries. However, the discovery of new inorganic SSEs is hindered by vast chemical search spaces, complex multi-property requirements, and limited experimental data, especially for multivalent systems. This review presents the first systematic framework mapping five interconnected challenges in SSE discovery to emerging AI solutions, providing a strategic roadmap for practitioners. We comprehensively survey machine learning pipelines from data resources and feature engineering to classical models, deep learning architectures, and cutting-edge generative approaches. Key breakthroughs include: (1) machine learning interatomic potentials enabling microsecond-scale molecular dynamics simulations at near-DFT accuracy, revealing non-Arrhenius transport behavior and overturning established transport mechanisms; (2) advanced neural network architectures achieving unprecedented accuracy in ionic conductivity prediction across diverse chemical spaces, including transformer-based and graph neural network approaches; (3) generative models successfully proposing and experimentally validating novel SSE compositions through diffusion-based design frameworks; and (4) autonomous closed-loop discovery platforms integrating ML predictions with experimental synthesis, achieving order-of-magnitude efficiency gains over traditional approaches. Unlike previous reviews focused on Li-ion systems, we explicitly address the critical data gap for multivalent conductors (Mg2+, Ca2+, Zn2+, Al3+) and provide concrete strategies through transfer learning and active learning frameworks. We bridge conventional computational methods (DFT, molecular dynamics) with modern ML techniques, demonstrating hybrid workflows that overcome individual limitations. 
The review concludes with actionable recommendations for multi-objective optimization, explainable AI implementation, and physics-informed model development, establishing a comprehensive roadmap for the next generation of AI-accelerated solid-state battery materials discovery.
Wider impact
This review addresses the critical intersection of machine learning and solid-state electrolyte development, a field experiencing unprecedented growth with hundreds of publications emerging in recent years. Key developments discussed include the evolution from classical ML screening approaches to sophisticated deep learning architectures like graph neural networks, the emergence of ML interatomic potentials enabling large-scale dynamics simulations, and the transition toward generative models for de novo materials design. The field's significance extends beyond academic interest: solid-state electrolytes are essential for next-generation batteries that promise enhanced safety, energy density, and sustainability for electric vehicles and grid storage applications. The rapid pace of innovation has created both opportunities and challenges: while ML has accelerated SSE discovery timelines from decades to years, the proliferation of disparate approaches, limited data availability for non-lithium systems, and lack of standardized evaluation metrics have hindered systematic progress. This review's forward-looking perspective on autonomous discovery platforms, physics-informed generative models, and integrated experimental-computational workflows will shape the field's trajectory toward predictive materials design. By providing strategic directions for addressing current limitations, from developing universal descriptors to establishing closed-loop discovery systems, this work positions the materials science community to realize the transformative potential of AI-driven SSE innovation, ultimately accelerating sustainable energy storage technology development.
All-solid-state electrolytes are being intensively explored as a next-generation solution to overcome the limitations of liquid electrolytes.10–12 By replacing the flammable liquid with a non-combustible solid, SSE-based batteries promise vastly improved safety and thermal stability.13 Moreover, the mechanical rigidity of inorganic SSEs can suppress dendrite propagation, potentially allowing the pairing of high-capacity lithium metal anodes with high-voltage cathodes for higher energy density cells.14 SSE materials fall into two broad classes: inorganic crystalline or glassy ceramics (oxide or sulfide based) and solid polymers (or polymer–ceramic hybrids).15,16 Inorganic SSEs such as oxide “garnet” Li7La3Zr2O12 and sulfide Li10GeP2S12 have achieved room-temperature Li+ conductivities on the order of 10−3–10−2 S cm−1,17–19 approaching those of liquid electrolytes. Polymer SSEs (e.g., PEO-based systems) offer flexibility and facile processing, but typically display lower ionic conductivities (∼10−8–10−6 S cm−1 at ambient temperature) and often require heating to 60–80 °C to reach optimal conduction.20–22 Each SSE family has its own challenges: ceramic electrolytes can suffer from grain-boundary resistance and brittle interfaces, whereas polymer electrolytes tend to have narrower electrochemical stability windows and lower transference numbers.23,24 Ongoing research is addressing these issues (e.g., novel glassy sulfide compositions and composite electrolytes) to realize the full safety and performance advantages of SSEs.25,26
Prior to the rise of ML, researchers relied on first-principles computations and atomistic methods, which have been widely used to predict phase stability and Li+ chemical potentials, and to calculate migration barriers via nudged elastic band (NEB) pathways for candidate electrolytes.27,28 These calculations yield valuable atomistic insights – for example, clarifying ion conduction mechanisms in fast-ion conductors and screening for thermodynamically stable electrolyte/electrode combinations, to guide SSE discovery and optimization.29–31 DFT calculations combined with other computational approaches have proven valuable for materials discovery.32,33 Molecular dynamics (MD) simulations, both classical and ab initio (AIMD), are another important tool, enabling the computation of ionic diffusivities and conductivities in SSE frameworks.34 Indeed, AIMD simulations on prototypical superionic solids like Li10GeP2S12 and cubic Li7La3Zr2O12 have reproduced experimental ionic conductivities, confirming the capability of simulations to evaluate candidate SSE performance.35 However, DFT and MD are computationally intensive and scale poorly to the enormous compositional space of solid materials.36 High-throughput DFT screening is typically limited to evaluating hundreds of candidates at best, after preliminary filtering by simpler models.37 This bottleneck has motivated the emergence of ML approaches in electrolyte research, which can learn complex composition–structure–property relationships from data and make rapid property predictions.38 For instance, ML interatomic potentials trained on DFT data can act as surrogates to rapidly estimate ion migration barriers or perform MD simulations at a fraction of the cost.39,40 More broadly, regression and classification models have been trained to predict SSE ionic conductivity or stability from compositions and structures, enabling fast screening of thousands of unexplored chemistries.41,42 Early studies using data-driven models have already identified new Li-ion conductors that were missed by intuition or limited DFT searches,43,44 underscoring the promise of ML in accelerating materials discovery.
Despite this progress, several key research gaps and challenges remain, which form the motivation for this review. A fundamental hurdle is the limited availability of comprehensive datasets, particularly for solid conductors beyond well-studied Li+ systems, such as those for multivalent ions (Mg2+, Ca2+, Zn2+, Al3+).45,46 This scarcity impedes the ability of supervised ML models to generalize effectively.47,48 Relatedly, a significant concern is the limited transferability of models: those trained on known compounds may perform poorly when extrapolated to novel crystal structures or to different ion chemistries.47 Furthermore, designing practical materials requires a holistic, multi-objective approach. While most studies have focused on optimizing a single property like ionic conductivity,49–51 practical SSEs must simultaneously satisfy multiple criteria, including a wide electrochemical stability window and sufficient mechanical strength to suppress dendrite formation.
Another challenge is the “black-box” nature of many advanced ML models, which limits their utility when they cannot provide insights into the underlying factors governing material properties.52,53 Finally, there is a pressing need to move beyond the passive screening of predefined candidate materials toward proactive, generative design. This requires employing generative algorithms to propose novel electrolyte compositions and structures54–56 and developing closed-loop “predictive synthesis” pipelines, which iteratively couple ML predictions with DFT validation and experimental feedback to accelerate the discovery of new materials.57,58 Addressing these five interconnected challenges – data limitations, multi-criteria optimization, interpretability, model generalization, and generative design – is crucial for unlocking the next wave of breakthroughs in solid-state electrolyte development.
This review addresses several critical gaps that distinguish it from existing literature on ML-driven SSE discovery. While previous reviews have largely focused on cataloguing ML techniques applied to battery materials broadly or examining specific electrolyte systems59 within traditional experimental and computational frameworks, we provide the first systematic framework that maps specific challenges in SSE discovery to emerging AI solutions, offering a strategic roadmap for practitioners. Most existing reviews emphasize Li-ion systems exclusively, whereas we explicitly address the critical data scarcity for multivalent ion conductors and provide concrete strategies for extending ML approaches to these underexplored but technologically important systems. Importantly, we bridge the gap between traditional computational methods (DFT, MD, KMC) and modern ML techniques, demonstrating how hybrid workflows can overcome individual limitations while leveraging complementary strengths. Rather than merely surveying available techniques, we provide actionable guidance for data collection priorities, validation strategies, and implementation of explainable AI methods specifically tailored to solid-state electrolyte discovery. Finally, we emphasize emerging paradigms like autonomous discovery platforms and physics-informed machine learning that represent the next frontier in AI-accelerated materials discovery, going beyond conventional property prediction to enable true generative design of novel SSE materials.
We begin by examining the traditional computational methods that have historically guided SSE discovery, including NEB, molecular dynamics, and kinetic Monte Carlo simulations. We then detail the data resources and feature engineering strategies critical to enabling ML in this domain, followed by a survey of classical and deep learning models, including graph neural networks and ML-based interatomic potentials. We explore how these models have been applied to predict key properties (such as ionic conductivity, phase stability, and electrochemical compatibility), perform high-throughput screening to discover promising SSE candidates, and model ion diffusion mechanisms. Next, we address key challenges in ML-driven SSE discovery, including data scarcity, limited model transferability, and multi-objective optimization. We then discuss emerging solutions such as active and transfer learning, explainable AI, and physics-informed models. Finally, we highlight opportunities for autonomous discovery through generative design, ML interatomic potentials, and closed-loop pipelines integrating computation and experiments. Through this synthesis, we aim to clarify the evolving role of machine learning in SSE development and highlight strategic directions for the field's continued advancement.
The method has evolved from characterizing single materials to enabling high-throughput discovery. Early work mapped anisotropic Li-ion diffusion pathways in β-Li3PS4,61 while automated path search methods have efficiently evaluated activation energies.62 Automated high-throughput DFT workflows integrated with materials databases like the Materials Project, AFLOW, OQMD, and NIST-JARVIS have transformed materials discovery, allowing systematic exploration of thousands of potential SSE compositions with standardized protocols for convergence and property extraction.63 Recent integration of NEB into high-throughput workflows enables screening of entire material classes like antiperovskites.64 Modern implementations incorporate ML-guided path initialization using graph neural networks to generate superior initial guesses, dramatically improving convergence rates and reducing spurious local minima,65 alongside adaptive sampling techniques with Gaussian process regression for efficient high-dimensional configuration space exploration. NEB can be combined with different levels of theory. DFT-NEB provides high accuracy but is computationally expensive, while classical NEB using empirical potentials offers computational efficiency at the cost of accuracy dependent on force field quality. Critical implementation challenges are discussed in detail in the SI, Section S1.1.
Recent methodological advances have significantly enhanced KMC capabilities for materials simulations. Adaptive kinetic Monte Carlo (aKMC) methods such as the kinetic activation-relaxation technique (k-ART)66 and self-evolving atomistic kinetic Monte Carlo (SEAKMC)67 eliminate the need for pre-defined event catalogs by identifying transitions on-the-fly, enabling simulations of complex disordered systems. Accelerated techniques including the mean rate method and first passage time analysis have been developed to overcome kinetic trapping in superbasins,68 extending the accessible timescales for materials with complex energy landscapes. Applications include active learning integration with KMC to explore SEI formation reaction barriers69 and ab initio-based KMC investigating polyanion mixing effects on Na-ion transport in NASICON electrolytes.70 Implementation considerations are discussed in the SI, Section S1.2.
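The core loop shared by the KMC variants above can be sketched as a rejection-free (residence-time) step: pick one event with probability proportional to its rate, then advance the clock by an exponentially distributed waiting time. This is a minimal illustration, not the implementation used in any cited study; the function names, the attempt frequency of 1e13 s^-1, and the two barrier values are assumptions chosen only for demonstration.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def hop_rate(barrier_ev, temp_k, attempt_freq=1e13):
    """Arrhenius hop rate; the attempt frequency is an assumed typical value."""
    return attempt_freq * math.exp(-barrier_ev / (K_B_EV * temp_k))


def kmc_step(rates, rng):
    """One rejection-free KMC step: returns (chosen event index, time increment)."""
    total = sum(rates)
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for i, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = i
            break
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


# Hypothetical two-event catalog: hops with 0.30 eV and 0.35 eV barriers at 300 K.
rng = random.Random(0)
rates = [hop_rate(0.30, 300.0), hop_rate(0.35, 300.0)]
counts = [0, 0]
elapsed = 0.0
for _ in range(10000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    elapsed += dt
```

In this toy setting the lower-barrier hop is selected far more often, and `elapsed` accumulates the simulated physical time. Adaptive schemes such as k-ART differ mainly in how the event catalog (`rates` here) is discovered on the fly.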
The primary limitation is that accuracy hinges entirely on force field quality and transferability—the “force field bottleneck”. Classical force fields do not explicitly treat electrons, precluding description of electronic phenomena like charge transfer or bond breaking/formation unless specialized reactive force fields are used. Applications include studying ion transport in polymer–argyrodite interfaces using newly developed OPLS-AA based force fields,71 analyzing how Li vacancies or interstitials in β-Li3PS4 enhance conductivity by facilitating three-dimensional diffusion pathways,72 and examining Li+ transport in dilithium ethylene dicarbonate (Li2EDC), a primary SEI component.73 Software packages and implementation considerations are provided in the SI, Sections S1.3–S1.5.
However, AIMD is extremely computationally expensive. This restricts simulations to small system sizes (typically a few hundred atoms) and very short physical timescales (picoseconds to a few nanoseconds). Consequently, to observe sufficient diffusion events for calculating transport properties, AIMD simulations of SSEs are often run at very high temperatures, with room-temperature properties extrapolated via the Arrhenius relation. This extrapolation can be unreliable if diffusion mechanisms change or phase transitions occur between the simulated and target temperatures. The accuracy of AIMD also remains dependent on the approximations within the underlying DFT calculation (e.g., the exchange–correlation functional). Applications include investigating lithium-ion diffusion in garnet-type materials74 and studying chemical processes at the Li/Li6PS5Cl interface at different temperatures.75 Sampling considerations are discussed in the SI, Section S1.6.
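The high-temperature extrapolation described above amounts to a linear fit of ln D against 1/T. The sketch below shows that fit and the extrapolation to 300 K using only the standard library. The diffusivities are synthetic values generated from an assumed D0 and Ea, not data from any study, and the function names are illustrative.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temps_k, diffusivities):
    """Least-squares fit of ln D = ln D0 - Ea / (kB T); returns (D0, Ea in eV)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(d) for d in diffusivities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope * K_B_EV


def extrapolate(d0, ea_ev, temp_k):
    """Evaluate the fitted Arrhenius law at a new temperature."""
    return d0 * math.exp(-ea_ev / (K_B_EV * temp_k))


# Synthetic "high-temperature" diffusivities (assumed D0 = 1e-3 cm^2/s, Ea = 0.25 eV).
temps = [600.0, 800.0, 1000.0, 1200.0]
diffs = [1e-3 * math.exp(-0.25 / (K_B_EV * t)) for t in temps]

d0, ea = fit_arrhenius(temps, diffs)
d_300 = extrapolate(d0, ea, 300.0)
```

The fit recovers the assumed parameters here only because the synthetic data are perfectly Arrhenius; with real AIMD data, a change of mechanism below the sampled temperatures would not be visible in the fit, which is exactly the caveat raised above.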
Materials Project (MP): the most prominent open-source database with DFT-calculated properties for hundreds of thousands of inorganic compounds.63 MP provides formation energies, band gaps, elastic tensors, and crystal structures, all accessible via the web interface and API. Its integration with pymatgen76 and matminer77 facilitates automated data retrieval and feature generation for ML workflows. MP is frequently used to identify Li-containing structures as initial SSE candidates.
Inorganic Crystal Structure Database (ICSD): contains over 300 000 experimentally determined crystal structures,78 providing reliable crystallographic information that serves as a starting point for DFT calculations or structural descriptor generation.
AFLOW, OQMD, and NIST-JARVIS: these repositories offer additional DFT-calculated properties across millions of materials. AFLOW provides extensive electronic, thermodynamic, and mechanical properties via its REST API (AFLOWLIB).79 OQMD focuses on thermodynamic stability through formation energies relative to the convex hull.80 JARVIS offers comprehensive properties including elastic tensors, dielectric constants, and phonon properties for tens of thousands of materials.81
Other computational repositories: additional databases contribute to the materials data ecosystem. The Computational Materials Repository (CMR) aggregates electronic structure data from various projects, including C2DB and QPOD.82 Materials Cloud supports reproducible computational workflows and integrates with AiiDA for provenance tracking.83 The Crystallography Open Database (COD) aggregates over 520 000 crystal structures of organic, inorganic, and metal-organic compounds.84 GNoME, developed by DeepMind, has used deep learning to predict the stability of over 2 million inorganic crystals.85 The Alexandria database provides DFT-calculated properties for millions of materials and is used to train large-scale ML models.86
• LiIon dataset: an expert-curated collection focusing on lithium-ion conductors, containing 820 entries from 214 literature sources.87 Each entry includes chemical composition, an assigned structural label (e.g., garnet, LISICON), and AC impedance-measured ionic conductivity at specific temperatures. With 403 unique compositions having near-room-temperature conductivity data, it has been instrumental in training ML classifiers (like CrabNet) to distinguish between high and low conductivity compositions.87
• OBELiX dataset: a more recent effort specifically designed for benchmarking ML models for SSE conductivity prediction. It comprises approximately 600 synthesized solid electrolyte materials with experimentally measured room-temperature ionic conductivity, along with composition, space group, lattice parameters, and, for about half the entries, full crystallographic information files (CIFs).88
• Literature-mined datasets: several studies have employed natural language processing (NLP) and text mining techniques to automatically extract relevant data (e.g., ionic conductivity values, synthesis parameters, structural types) directly from the vast body of scientific literature. While powerful for data aggregation, these approaches face challenges related to the heterogeneity of reported data, inconsistencies in experimental conditions, and the accuracy of automated extraction.89 An example includes the work by Shon and Min (2023), which extracted over 4000 conductivity measurements from nearly 1500 papers.90
The landscape of data resources reveals a complementary relationship between large-scale computational databases and smaller, targeted experimental datasets. Computational databases like MP, AFLOW, OQMD, and JARVIS provide the necessary breadth for initial high-throughput screening, enabling the filtering of millions of hypothetical compounds based on fundamental properties like thermodynamic stability (formation energy, energy above hull), electronic insulation (band gap), and potentially relevant structural or mechanical characteristics. However, accurately predicting ionic conductivity, the key performance metric for an SSE, directly from first principles is computationally demanding, often requiring expensive MD simulations. This is where curated experimental datasets like LiIon and OBELiX become critical. Although smaller in size, they contain the direct experimental measurements needed to train and validate ML models specifically designed to predict ionic conductivity. This often leads to a multi-stage ML workflow: initial screening using models trained on large computational datasets to identify stable and electronically suitable candidates, followed by conductivity prediction for the down-selected candidates using models trained on experimental data. Table S2 provides a summary of prominent datasets commonly used in machine learning studies for solid-state electrolyte research, including their primary data sources, key material properties covered, accessibility, and relevant references. The development of accurate and efficient machine learning interatomic potentials (MLIPs, discussed in Section 3.4) represents a significant effort to bridge this gap, aiming to enable faster calculation of dynamic properties like ionic conductivity for the vast number of candidates identified through computational screening.
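The multi-stage workflow described above can be sketched as two functions: a stability and band-gap filter, followed by ranking with a conductivity model. Everything concrete here is an assumption for illustration: the thresholds (0.05 eV per atom above hull, 3.0 eV band gap), the candidate records, and `toy_log_sigma`, which is a placeholder standing in for a model trained on experimental data such as LiIon or OBELiX.

```python
def screen_candidates(candidates, max_e_hull=0.05, min_band_gap=3.0):
    """Stage 1: keep near-stable, electronically insulating candidates.

    Thresholds are illustrative assumptions (eV/atom and eV respectively).
    """
    return [
        c for c in candidates
        if c["e_hull"] <= max_e_hull and c["band_gap"] >= min_band_gap
    ]


def rank_by_conductivity(candidates, predict_log_sigma):
    """Stage 2: rank survivors by predicted log10 ionic conductivity (descending)."""
    scored = [(predict_log_sigma(c), c["formula"]) for c in candidates]
    return sorted(scored, reverse=True)


def toy_log_sigma(candidate):
    """Placeholder predictor; a real workflow would call a trained ML model."""
    return -6.0 + 5.0 * candidate["li_fraction"]


# Hypothetical candidate records (not real materials or real property values).
candidates = [
    {"formula": "cand-A", "e_hull": 0.00, "band_gap": 4.2, "li_fraction": 0.30},
    {"formula": "cand-B", "e_hull": 0.12, "band_gap": 5.0, "li_fraction": 0.40},
    {"formula": "cand-C", "e_hull": 0.02, "band_gap": 1.1, "li_fraction": 0.35},
    {"formula": "cand-D", "e_hull": 0.03, "band_gap": 3.6, "li_fraction": 0.45},
]

survivors = screen_candidates(candidates)
ranked = rank_by_conductivity(survivors, toy_log_sigma)
```

Here `cand-B` fails the stability filter and `cand-C` fails the band-gap filter, so only two candidates reach the conductivity stage. Keeping the two stages as separate functions mirrors the fact that they are typically trained on different data sources.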
• Compositional descriptors: these features are derived solely from the material's chemical formula (stoichiometry) and the intrinsic properties of its constituent elements. Examples include average atomic mass, mean electronegativity, variance of atomic radii, elemental fractions, and specific stoichiometric ratios. They are computationally inexpensive to generate but ignore the crucial influence of atomic arrangement and bonding. For instance, one study utilized a set of 145 “Chemical Descriptor” features based on stoichiometry and elemental properties.93 While simple, compositional descriptors alone can sometimes yield reasonable predictive performance, particularly for classification tasks or when combined with more sophisticated algorithms.
• Structural descriptors: these capture information about the geometric arrangement of atoms in the crystal lattice. They can range from simple parameters like lattice constants, cell volume, space group number, and packing fraction to more complex representations like radial distribution functions (RDFs), coordination numbers, bond angles, polyhedral volumes, local atomic environment motifs (e.g., using Voronoi analysis), and topological indices. Structural descriptors are vital as many key SSE properties, including ionic conductivity pathways and mechanical stability, are intimately linked to the crystal structure. Generating these features typically requires crystallographic information (e.g., from CIF files obtained via ICSD or MP) and specialized analysis tools. Examples include employing Voronoi tessellation features to improve graph neural networks,94 or using smooth overlap of atomic positions (SOAP) descriptors to represent local atomic environments.95
• Electronic descriptors: these features quantify aspects of the material's electronic structure, which governs electrical conductivity, electrochemical stability, and chemical bonding. Common examples include the electronic band gap (Eg), position of the valence and conduction band edges, density of states near the Fermi level, work function, electron affinity, ionization potential, and measures of bond ionicity or covalency. Electronic descriptors are crucial for screening potential SSEs, as ideal candidates must be good ionic conductors but poor electronic conductors (i.e., possess a wide band gap) and exhibit stability within the battery's operating voltage window. These descriptors are often derived from computationally intensive DFT calculations.
• Physicochemical/thermodynamic descriptors: this broad category includes various calculated or tabulated physical and chemical properties. Examples relevant to SSEs include formation energy, energy above the convex hull (Ehull) for thermodynamic stability assessment, density, ionic radii, melting point, and mechanical properties like bulk modulus (K) and shear modulus (G). These descriptors relate to a material's stability, processability, and mechanical robustness against issues like dendrite penetration. Formation energy and Ehull are standard outputs from DFT databases (MP, OQMD) used for initial stability screening, while mechanical moduli, predicted using ML or DFT, are critical for assessing dendrite suppression capabilities.
• Kinetic/dynamic descriptors: these features aim to capture aspects related to ion transport dynamics. Examples include activation energy barriers for ion migration (Eb or Ea), diffusion coefficients (D), attempt frequencies, and properties derived from phonon calculations (e.g., vibrational density of states, phonon band structure features). These descriptors are most directly related to ionic conductivity (σ), often following an Arrhenius-type relationship, $\sigma = \sigma_0 \exp(-E_a / k_B T)$, where $\sigma_0$ is a pre-exponential factor, $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature. However, they are typically challenging and computationally expensive to obtain, requiring methods like NEB calculations for migration barriers or extensive MD simulations for diffusion coefficients. Recent work has shown that phonon-related features derived from DFT phonon calculations can be important predictors for ionic conductivity in ML models.96
The different categories of descriptors, along with their generation methods and significance, are summarized in Table 1.
| Descriptor category | Specific descriptor example | Information encoded | Generation method | Pros/cons |
|---|---|---|---|---|
| Compositional | Average electronegativity | Elemental chemical bonding tendency | Formula-based | Simple; ignores structure |
| | Elemental fractions | Stoichiometry | Formula-based | Simple; basic composition info |
| Structural | Volume per atom | Packing density, free volume | Structure analysis (CIF) | Relates to ion mobility/stiffness; requires structure |
| | Space group number | Crystal symmetry | Structure analysis (CIF) | Captures overall symmetry; coarse descriptor |
| | Radial distribution function (RDF) | Average local atomic density around a central atom | Structure analysis (CIF) | Detailed local structure; computationally more intensive |
| | Coordination number | Number of nearest neighbours | Structure analysis (CIF) | Local bonding environment; definition can vary |
| Electronic structure | Band gap (Eg) | Energy required to excite an electron | DFT | Key for electronic conductivity; computationally expensive |
| | Formation energy | Thermodynamic stability relative to elemental phases | DFT | Fundamental stability metric; requires calculation |
| | Energy above hull (Ehull) | Thermodynamic stability relative to competing phases | DFT | Better stability indicator than formation energy; requires phase diagram data |
| Physicochemical | Ionic radii | Effective size of ions | Tabulated/formula | Relates to packing and channel size; simple approximation |
| | Shear/bulk modulus (G, K) | Resistance to shear/volume deformation | DFT/ML prediction | Key for mechanical stability (dendrites); requires calculation/prediction |
| Kinetic/dynamic | Migration barrier (Ea, Eb) | Energy barrier for ion hopping | DFT (NEB)/MD | Directly relates to conductivity; computationally very expensive |
| | Phonon properties | Lattice vibrational characteristics | DFT (phonon calc.) | Relates to ion dynamics/stability; computationally expensive |

Note: CIF = crystallographic information file; DFT = density functional theory; MD = molecular dynamics; NEB = nudged elastic band; ML = machine learning.
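As a concrete instance of the compositional row of the table, the sketch below derives elemental fractions and the fraction-weighted mean and variance of electronegativity from a formula string. It is a minimal illustration rather than a replacement for matminer: the parser handles only flat formulas (no parentheses), the function names are hypothetical, and the lookup table covers only the four elements needed for the example (standard Pauling values).

```python
import re

# Pauling electronegativities for the elements used in the example only.
PAULING_EN = {"Li": 0.98, "La": 1.10, "Zr": 1.33, "O": 3.44}


def parse_formula(formula):
    """Parse a flat formula such as 'Li7La3Zr2O12' into {element: count}."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def compositional_descriptors(formula, table=PAULING_EN):
    """Return elemental fractions plus weighted mean/variance of electronegativity."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    mean_en = sum(f * table[el] for el, f in fractions.items())
    var_en = sum(f * (table[el] - mean_en) ** 2 for el, f in fractions.items())
    return {"fractions": fractions, "mean_en": mean_en, "var_en": var_en}


desc = compositional_descriptors("Li7La3Zr2O12")
```

Because these features use only the formula, two polymorphs of the same composition receive identical descriptor vectors, which is the "ignores structure" limitation noted in the table.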
• Regression: used to predict continuous target variables.
  • Algorithms: simple linear regression, polynomial regression, kernel ridge regression (KRR), support vector regression (SVR), Gaussian process regression (GPR).
  • Applications: predicting ionic conductivity (log σ), activation energies, elastic moduli (K, G) for mechanical stability assessment, and formation energies. For example, Ahmad et al. used gradient boosting regressor (GBR) and KRR, trained on structural features, to predict shear and bulk moduli for over 12 000 inorganic solids in a screening study for dendrite suppression.98 Zhao et al. used GPR-based Bayesian optimization to guide the experimental synthesis of LATP electrolytes towards optimal ionic conductivity.99
• Classification: used to assign materials to discrete categories.
  • Algorithms: logistic regression (LR), naive Bayes (NB), support vector machines (SVM), decision trees (DT).
  • Applications: Xu et al. (2020) used logistic regression to classify SICON compounds as poor or good superionic conductors based on elemental descriptors.47 Chen et al. (2021) employed support vector machines to analyze relationships between manufacturing conditions and solid-state electrolyte film performance for evaluation and optimization.100 Adhyatma et al. (2022) applied a tree-based LightGBM model to classify doped LLZO compounds by their ionic conductivity levels (high or low).101
• Ensemble methods: these techniques combine predictions from multiple individual models (base learners) to improve overall performance and robustness, and reduce overfitting. They often achieve state-of-the-art results on tabular data.
  • Algorithms: random forest (RF), gradient boosting machines (GBM, including variants like XGBoost and LightGBM).
  • Applications: RF and GB variants are frequently employed for both regression (predicting conductivity, formation energy) and classification (high/low conductivity, stability) in SSE research. For instance, Pereznieto et al. (2023) utilized a random forest algorithm to analyze experimental data and discover new potential Na-ion solid electrolytes exhibiting high ionic conductivity.102 Kim et al. (2023) implemented an ensemble model of gradient boosting algorithms to classify over 3500 NASICON structures, successfully identifying promising Na superionic conductor candidates with high accuracy.103 Tang et al. (2024) applied an XGBoost algorithm to predict key properties such as band structure and stability, which enabled the screening and identification of 194 ideal solid-state electrolyte candidates from over 6000 structures.104 Zhang et al. (2024) developed random forest models alongside neural networks to predict ionic conductivity in NASICON materials and to identify influential factors, highlighting the role of Na stoichiometric count.105
• Clustering: unsupervised learning algorithms group similar data points together without relying on predefined labels.
  • Algorithms: k-means, agglomerative clustering, hierarchical density-based spatial clustering of applications with noise (HDBSCAN).
  • Applications: Park et al. (2024) used HDBSCAN to cluster over 12 000 Na-containing materials based on structural properties, identifying 12 groups and revealing shared characteristics in high-conductivity clusters.106 Laskowski et al. (2023) applied agglomerative clustering to ∼26 000 Li-containing structures to identify promising superionic conductor candidates for further screening.95 Gallo-Bueno et al. (2022) used unsupervised outlier detection models to automatically classify computed Li-argyrodite crystal structures based on their structural distortion.107
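To make the clustering entry above concrete, the following is a minimal sketch of average-linkage agglomerative clustering in plain Python. It is not the pipeline of any cited study: the six two-dimensional "descriptor vectors" are invented, the function name is illustrative, and the cubic-time loop is only practical for small examples (library implementations are far more efficient).

```python
import math


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two clusters
    with the smallest mean pairwise distance until n_clusters remain.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical 2D descriptor vectors forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = agglomerative(points, n_clusters=2)
```

No labels are used at any point, which is why this family of methods remains applicable when experimental conductivity data are sparse.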
The successful application of classical ML algorithms is heavily dependent on the process of “feature engineering” – the careful selection, transformation, and combination of descriptors to best represent the underlying material physics relevant to the target property. The frequent high performance reported for ensemble methods like random forest and gradient boosting variants (XGBoost, LightGBM)108–111 underscores the difficulty in capturing the complex, often non-linear, structure-property relationships in SSEs using single, simpler models acting on these hand-crafted features. Ensemble methods offer robustness by averaging out errors from individual base learners (like decision trees) and implicitly handling feature interactions, making them well-suited to the high-dimensional and potentially noisy descriptor spaces common in materials informatics. However, their complexity can sometimes make direct physical interpretation of the learned relationships challenging compared to simpler models like linear regression.
Despite these interpretability challenges, classical ensemble methods remain preferable in scenarios with limited training data where deep learning models would overfit, or when transparent decision-making is critical for materials design insights. For instance, decision tree models can readily identify feature importance rankings,106 while XGBoost provides built-in interpretability tools that can reveal which structural descriptors most strongly influence ionic conductivity predictions.112–114 These advantages make classical approaches particularly valuable in early-stage SSE discovery when datasets are small or when researchers need to understand and communicate the physical basis underlying model predictions to experimental collaborators. Unsupervised clustering techniques, such as HDBSCAN, provide a valuable alternative or complementary approach.106 By grouping materials based on similarities in their descriptor vectors (often structural features derived from large computational databases), clustering can reveal inherent patterns and identify promising material families even when labeled target data (like experimental conductivity) is sparse. This capability allows researchers to leverage the vastness of computational datasets to guide exploration before focusing on more data-intensive supervised prediction tasks. This reliance on feature engineering and the success of complex ensembles sets the stage for deep learning approaches (Section 3.3), which aim to automate the feature learning process itself.
The simplest deep learning architecture, feedforward neural networks (FNNs) or multi-layer perceptrons (MLPs), consists of an input layer, one or more hidden layers, and an output layer, processing information in one direction. They operate on pre-defined descriptors similar to classical algorithms (Fig. 2a) and have been used as components within ensemble models, baseline comparisons, or for property prediction based on manually selected features in SSE research.88,105,115
Graph neural networks (GNNs) represent a more sophisticated approach, naturally operating on graph representations of materials where atoms are nodes and bonds or interatomic proximity define edges. This allows GNNs to learn representations that explicitly incorporate atomic connectivity and local chemical environments, automatically identifying features relevant to predicting material properties. Capturing crystal structure nuances, such as periodicity and 3D geometry (SE(3) invariance/equivariance), is crucial for effective GNN design. Crystal graph convolutional neural network (CGCNN) represents crystals as graphs and uses convolutional layers to aggregate information from neighboring atoms and bonds to learn atom-level features, which are then pooled to predict material properties (Fig. 2b). It has been applied to predict thermodynamic stability and mechanical properties of SSEs.116,117 Improved versions like iCGCNN incorporate Voronoi tessellation information and explicit many-body interactions to enhance performance.118 Materials graph network (MEGNet) extends the graph network concept by including global state variables (like temperature or pressure) alongside atomic (node), bond (edge), and global features, allowing for more versatile property predictions (Fig. 2b). 
MEGNet and related architectures like M3GNet119 have been trained on large datasets (e.g., Materials Project) for broad applicability in materials property prediction and can be applied to predict SSE stability or mechanical properties.120 SchNet employs continuous-filter convolutional layers to model quantum interactions in atomistic systems without using explicit graph representations, and has been used to predict formation energies of bulk crystals and potential energy surfaces.121 The field continues to evolve rapidly, with newer architectures like ALIGNN (atomistic line graph neural network),122 k-NAGCN (k-nearest atom graph neural network),123 and transformer-based models like CrystalFramer (which introduces dynamic, attention-based coordinate frames)124 continuously advancing accuracy and representational power for crystal structures.
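The neighbor-aggregation and pooling steps described above for crystal graph networks can be sketched in a few lines. This is a deliberately simplified illustration: the update rule is a fixed average rather than the learned, gated convolution used in CGCNN, and the function names and toy graph are hypothetical.

```python
def message_passing_step(node_feats, edges):
    """One round of mean-aggregation message passing on an atom graph.

    node_feats: dict mapping atom index -> list of floats (atom features).
    edges: list of (i, j) undirected neighbor pairs.
    """
    neighbors = {i: [] for i in node_feats}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    updated = {}
    for i, h in node_feats.items():
        if neighbors[i]:
            # Average the features of neighboring atoms.
            agg = [
                sum(node_feats[j][k] for j in neighbors[i]) / len(neighbors[i])
                for k in range(len(h))
            ]
        else:
            agg = [0.0] * len(h)
        # Fixed 50/50 mix of self and neighbor information (no learned weights).
        updated[i] = [0.5 * (a + b) for a, b in zip(h, agg)]
    return updated


def readout(node_feats):
    """Mean-pool atom features into one crystal-level vector."""
    n = len(node_feats)
    dim = len(next(iter(node_feats.values())))
    return [sum(h[k] for h in node_feats.values()) / n for k in range(dim)]


# Toy three-atom chain with a single scalar feature per atom.
toy_feats = {0: [1.0], 1: [0.0], 2: [0.0]}
toy_edges = [(0, 1), (1, 2)]
crystal_vector = readout(message_passing_step(toy_feats, toy_edges))
```

In a trained model the averaging and mixing steps are replaced by learned functions, and the pooled vector feeds a regression or classification head.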
Distinct from structure-based approaches, some deep learning models prioritize elemental composition, offering advantages when structural information is unavailable, computationally expensive to obtain, or for rapid initial screening across vast chemical spaces. ElemNet learns material properties directly from elemental compositions represented as fractional counts, bypassing structural information for rapid composition-based screening.125 CrabNet, a transformer-based model using attention mechanisms, operates primarily on compositional data but implicitly learns interactions between elements126 (Fig. 2c). It demonstrated success when trained on the LiIon dataset for classifying compositions by their likelihood of exhibiting high lithium-ion conductivity.87 More broadly, transformer architectures—inspired by their success in natural language processing and relying heavily on self-attention mechanisms—can capture long-range interactions within crystal graphs or learn complex relationships between constituent elements, as seen in CrabNet126 and CrystalFramer.124 Transformer architectures are also being used to develop powerful interatomic potentials like GPTFF.127
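To make the notion of a composition-only input concrete, the sketch below converts a chemical formula string into element fractions, the kind of representation described above for ElemNet. The parser is an assumption for illustration: it handles only flat formulas without parentheses, and composition-based models typically expand the result into a fixed-length vector over all elements.

```python
import re


def fractional_composition(formula):
    """Return {element: atomic fraction} for a flat formula such as 'Li3PS4'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    counts = {}
    for element, amount in tokens:
        # A missing amount means one atom of that element.
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


lps = fractional_composition("Li3PS4")
lgps = fractional_composition("Li10GeP2S12")
```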
While most ML models predict properties of given materials (forward problem), generative models solve the inverse problem: generating novel material structures likely to possess desired properties. Techniques like generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models are being explored for materials discovery.55,128 These models learn the underlying distribution of known stable materials and can sample this distribution or be conditioned to generate new candidates meeting specific criteria (e.g., high stability, target band gap, specific crystal structure). MatterGen, a diffusion model operating on 3D crystal geometry, has demonstrated the ability to generate novel, stable materials with target properties by learning from large databases like MP and Alexandria.56 Such approaches hold significant promise for generating entirely new SSE candidates beyond modifications of known structures. Other generative approaches like SHAFT utilize hierarchical generation based on symmetry constraints.129
This capability is particularly transformative for SSE research. Simulating ion transport dynamics – the diffusion pathways, diffusion coefficients (D), activation energies (Ea), and ultimately ionic conductivity (σ) – requires tracking atomic motion over long timescales (nanoseconds or more) and large system sizes (thousands of atoms) to capture statistically relevant events and collective motion. Ion transport in SSEs involves rare events such as defect formation, migration, and collective rearrangements that occur over vastly different timescales: while individual atomic hops happen on picosecond timescales, macroscopic diffusion processes and phase transformations relevant to battery operation occur over seconds to minutes. Such simulations are often computationally prohibitive using traditional AIMD. MLIPs overcome this limitation, enabling routine MLMD simulations that provide direct insights into the mechanisms governing ionic conductivity in complex SSE materials.
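The link between the transport quantities named above can be illustrated with a short calculation: the tracer diffusion coefficient follows from the slope of the mean squared displacement (MSD) via the Einstein relation, and a conductivity estimate follows from the Nernst-Einstein relation. This is a minimal sketch under stated assumptions: three-dimensional diffusion, SI units, uncorrelated ion motion, and hypothetical function names and input values.

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def msd_slope(times_s, msd_m2):
    """Least-squares slope of MSD versus time (m^2/s)."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    m_mean = sum(msd_m2) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times_s, msd_m2))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return num / den


def diffusion_coefficient(times_s, msd_m2, dim=3):
    """Einstein relation: D = slope / (2 * dim)."""
    return msd_slope(times_s, msd_m2) / (2 * dim)


def nernst_einstein_conductivity(d_m2_s, carriers_per_m3, temperature_k, charge_number=1):
    """sigma = n q^2 D / (k_B T), in S/m; neglects ion-ion correlations."""
    q = charge_number * E_CHARGE
    return carriers_per_m3 * q ** 2 * d_m2_s / (K_B * temperature_k)


# Hypothetical inputs: D = 1e-12 m^2/s, 1e28 carriers per m^3, 300 K.
sigma_example = nernst_einstein_conductivity(1e-12, 1e28, 300.0)
```

Because the Nernst-Einstein form ignores correlated motion, it should be read as a first estimate; a reliable MSD slope in turn requires trajectories long enough to reach the diffusive regime, which is where MLIP-driven simulations help.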
Several MLIP frameworks have been applied to study SSEs:
• Gaussian process regression and sparse GPR (SGPR) approaches: traditional GPR methods provide a Bayesian framework for learning interatomic potentials with built-in uncertainty quantification, but their O(n³) computational scaling with dataset size becomes prohibitive for large training sets. SGPR addresses this limitation through low-rank approximations using reduced “inducing sets” of representative local environments, achieving computational scaling comparable to linear methods while retaining the probabilistic advantages of GPR.130 SGPR has been successfully applied to survey Li diffusivity across hundreds of ternary crystals and create transferable universal potentials for complex electrolytes like Li10GeP2S12.131,132
• Gaussian approximation potential (GAP): based on Gaussian process regression. A near-universal GAP was developed for the Li–P–S (LPS) material class, enabling studies of conductivity in both crystalline (e.g., Li3PS4, Li7P3S11) and glassy phases and revealing the importance of anion dynamics.133
• Deep potential molecular dynamics (DeePMD/DeePMD-kit): a deep neural network-based potential that has seen wide application.134 It has been used to model Li diffusion in amorphous Li3PO4,135 superionic conductors like Li10GeP2S12 (LGPS) and Nb-doped garnets, and importantly, to perform microsecond-long simulations revealing the lack of a significant “paddle-wheel” effect from polyanion rotations on Li diffusion in crystalline Li7P3S11 and Li2B12H12 at room temperature.136
• Crystal Hamiltonian graph network (CHGNet): a GNN-based universal MLIP pre-trained on the extensive Materials Project trajectory dataset, uniquely incorporating electronic charge and magnetic moment information.106 It has been demonstrated for charge-informed MD simulations of Li intercalation (LixMnO2) and Li diffusion in garnet SSEs.137
• M3GNet (materials 3-body graph network): another GNN-based universal potential trained on the Materials Project database, designed for broad applicability in structural relaxation and dynamics simulations.119
• GPTFF (graph-based pre-trained transformer force field): a recent transformer-based force field trained on a massive dataset (billions of force components), aiming for high accuracy and generalizability across diverse inorganic systems.127
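For the GPR and SGPR entries in the list above, the scaling argument can be written out explicitly. The expressions below are the standard textbook forms for GP prediction and for a low-rank (Nyström-type) kernel approximation with m inducing points; whether the cited SGPR implementations use exactly this approximation is an assumption, and the symbols (K, k_*, σ_n, m) are introduced here for illustration.

```latex
% Exact GPR with n training points, kernel matrix K (n x n), noise variance \sigma_n^2:
\mu(x_*) = k_*^{\top} \left( K + \sigma_n^{2} I \right)^{-1} y ,
\qquad
\operatorname{var}(x_*) = k(x_*, x_*) - k_*^{\top} \left( K + \sigma_n^{2} I \right)^{-1} k_* .
% Factorizing the n x n matrix costs O(n^3).

% Low-rank approximation with m << n inducing points:
K \approx K_{nm} \, K_{mm}^{-1} \, K_{mn} ,
% which reduces the dominant training cost to O(n m^2).
```

The predictive variance is what supplies the built-in uncertainty estimate, and it remains available in the low-rank setting at reduced cost.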
MLMD simulations driven by these potentials have provided crucial insights, such as identifying non-Arrhenius diffusion behavior in LGPS,135 elucidating specific diffusion pathways,137 and quantifying the impact of structural features like defects or anion dynamics on conductivity.133 The significant speed-up factors highlight the potential of MLIPs to dramatically accelerate the computational assessment of ionic transport.138
The progression from classical ML to deep learning marks a significant evolution in the computational toolkit for SSE discovery. GNNs, in particular, represent a paradigm shift away from manual feature engineering towards automated learning of structure–property relationships directly from the atomic graph representation. This allows models to potentially uncover more complex and subtle correlations than might be captured by human-designed descriptors. However, these advances come with important practical considerations. GNN architectures like CGCNN and MEGNet require high-quality crystal structure files (CIFs) with precise atomic positions as inputs, as they construct graph representations directly from atomic arrangements and bonding information.117,120 The incorporation of both atomic and bond-level descriptors introduces numerous hyperparameters, necessitating larger training datasets (typically >10³ samples) and substantial computational resources compared to classical ML approaches that rely on pre-computed scalar descriptors.139 In contrast, SGPR-based approaches can achieve comparable accuracy with smaller training datasets due to their efficient use of training data and adaptive sampling strategies, making them particularly suitable for data-scarce regimes where generating extensive DFT training sets is computationally expensive.130,140
Perhaps even more impactful is the development and application of MLIPs. While classical ML and standard GNNs often focus on predicting static properties (stability, band gap, moduli) or rely on computationally expensive methods (AIMD, NEB) to infer dynamics, MLIPs provide a computationally tractable route to directly simulate the crucial dynamic processes governing ionic conductivity. This enables the field to move beyond predicting prerequisites for good conductivity towards simulating and understanding the transport phenomenon itself over timescales reaching microseconds—a significant computational achievement.141 However, MLIPs require careful validation to ensure transferability across different thermodynamic conditions and structural motifs, as their accuracy is fundamentally limited by the quality and coverage of the underlying DFT training set. Additionally, the computational overhead of generating sufficient training data for MLIPs can be substantial, particularly for complex multi-component systems. Despite these advances, current MLMD simulations still remain far from capturing the experimentally relevant timescales (seconds to minutes) over which macroscopic ionic transport and device-relevant processes occur, and bridging to true experimental scales may require hybrid approaches combining MLMD with adaptive KMC methods.
Models trained predominantly on computational data face inherent challenges when predicting experimentally observed ionic conductivities due to systematic discrepancies between DFT calculations and experimental measurements. Effective validation strategies require testing against independent experimental datasets rather than computational holdouts, implementing cross-validation with available experimental data, and developing calibration methods that account for temperature-dependent Arrhenius behavior and experimental measurement uncertainties.142 For SGPR-based approaches, the inherent uncertainty quantification provides additional validation capabilities by identifying regions where model predictions may be unreliable, enabling more robust assessment of model confidence and guiding iterative improvement through active learning protocols.140 Furthermore, hybrid training approaches that incorporate both computational and experimental data during model development can significantly improve predictive accuracy for experimental properties. As computational materials discovery matures, adopting rigorous experimental validation protocols will be critical for establishing ML models as reliable tools for guiding experimental synthesis efforts. Generative models represent a further step, shifting the focus from predicting properties of existing or hypothetical materials to designing entirely new structures optimized for target performance.
Furthermore, the emergence of large-scale, pre-trained models signifies a trend towards developing more universal and transferable tools in materials informatics. Models like MEGNet, M3GNet, CHGNet, and GPTFF, trained on vast and diverse datasets such as the Materials Project calculation database, encapsulate a broad understanding of chemical bonding and structural stability across the periodic table. This pre-training allows these foundational models to be potentially fine-tuned for specific downstream tasks, such as predicting properties within a particular class of SSEs, using smaller, task-specific datasets. This strategy leverages the massive amounts of existing computational data to build general knowledge, which can then accelerate research on specific material systems by reducing the burden of generating extensive training data for every new problem. Nevertheless, practitioners should be aware that even pre-trained universal models may require domain-specific fine-tuning and validation, particularly when applied to novel chemistries or extreme conditions not well-represented in the original training data. The success of these approaches ultimately depends on careful consideration of data quality, model selection criteria, and rigorous benchmarking against experimental observations. This approach promises to significantly enhance the efficiency of ML-driven materials discovery pipelines.
The foundational work by Sendek et al. (2017) established the viability of ML-driven conductivity screening through a logistic regression classifier trained on 40 lithium-containing compounds.143 Despite the limited training set, their model effectively distinguished fast from slow Li-ion conductors using atomistic descriptors including Li–Li coordination numbers, sublattice bond ionicity, and anion coordination environments. The practical validation of this approach emerged when high-throughput screening of 12 000 Materials Project compounds identified 21 fast-conductor candidates, with subsequent DFT-MD simulations confirming superionic behavior in several materials, notably Li3InCl6, which was later verified experimentally.143,144 This early success demonstrated that even simple ML models, when coupled with physically meaningful features, could effectively navigate vast chemical spaces.
Building on these classification successes, recent efforts have focused on regression-based conductivity prediction with enhanced accuracy. The comparative analysis by Mishra et al. (2023) systematically evaluated eight predictor models including random forest regressor, support vector machine, and shallow neural networks using activation energy, operating temperature, and lattice parameters as features.110 Their findings highlighted the superior robustness of ensemble methods like random forest, while demonstrating that model stacking prevents overfitting, a critical insight for conductivity prediction where data scarcity remains a persistent challenge.
The transition toward more sophisticated approaches is exemplified by studies targeting specific electrolyte chemistries with optimized algorithms and novel descriptors. Jaafreh et al. (2024) developed a targeted framework for Mg-ion electrolytes by leveraging phonon density of states (PhDOS) data to calculate “total phonon band center” as a conductivity proxy.145 Their systematic comparison of extra random trees, gradient boosting, and extreme gradient boosting algorithms revealed that extra random trees achieved superior performance (R² = 0.964), enabling predictions across ∼9000 Mg compounds. The chemical insights derived from this model, particularly the identification of Mg–Se systems as exhibiting the lowest median band centers (27.5 meV) compared to Mg–S (40.5 meV) and Mg–O (55.5 meV), demonstrate how ML can simultaneously accelerate screening and provide mechanistic understanding.145
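A band center of a density of states is commonly defined as its first moment, that is, the DOS-weighted mean energy. The sketch below computes this quantity by trapezoidal integration. The exact definition used in the cited study is not given here, so this first-moment form, the function name, and the toy input are all assumptions for illustration.

```python
def band_center(energies_mev, dos):
    """DOS-weighted mean energy (first moment), via trapezoidal integration.

    energies_mev: increasing phonon energies in meV.
    dos: phonon density of states sampled at those energies.
    """
    weighted = 0.0
    norm = 0.0
    for k in range(len(energies_mev) - 1):
        de = energies_mev[k + 1] - energies_mev[k]
        # Trapezoid for the integral of E * g(E).
        weighted += 0.5 * de * (
            energies_mev[k] * dos[k] + energies_mev[k + 1] * dos[k + 1]
        )
        # Trapezoid for the integral of g(E).
        norm += 0.5 * de * (dos[k] + dos[k + 1])
    return weighted / norm


# Toy flat DOS between 0 and 20 meV: the center lies at the midpoint.
flat_center = band_center([0.0, 10.0, 20.0], [1.0, 1.0, 1.0])
```

Under this definition a lower band center indicates a softer lattice, consistent with the Mg–Se < Mg–S < Mg–O ordering reported above.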
Addressing the critical data gap for multivalent systems, Dong et al. developed a generalizable ML framework specifically designed for screening Na, Mg, and Al garnet electrolytes.146 Utilizing carefully designed chemical descriptors, their XGBoost models achieved 94% accuracy for thermal stability and 89% for band gap prediction across 43 732 compounds. The framework identified 1764 compounds meeting both thermal stability and electronic criteria, which were further filtered to yield 44 economically viable candidates with high performance potential. Interpretability analysis revealed that mean electronegativity is the most critical factor for thermal stability, while atomic radius range governs band gap properties, providing actionable design principles for multivalent conductor development.
Kharbouch et al. (2024) reported R² = 0.85 for ionic conductivity prediction in LLZO-type garnets, obtained through meticulous data curation and hyperparameter optimization using CatBoostRegressor tuned with the Optuna framework.147 Their emphasis on rigorous preprocessing, including stoichiometric verification and KNN imputation, underscores the critical importance of data quality in achieving reliable conductivity predictions.
Recent developments have integrated pre-trained graph neural network potentials to generate physics-informed descriptors. Maevskiy et al. (2025) employed M3GNet to analyze potential energy surfaces under the frozen framework approximation, deriving heuristic descriptors correlated with lithium mobility.148 This approach was approximately 50× faster than MLIP-driven MD and >3000× faster than AIMD, with eight of the ten highest-ranked materials confirmed as superionic conductors through first-principles calculations.148 The significance of this work lies in its demonstration of how powerful, pre-trained “foundation” models can be adapted to generate specialized, physically meaningful features for predicting properties like ionic conductivity, enabling rapid and reliable large-scale screening.
Models trained predominantly on computational data face inherent challenges when predicting experimentally observed ionic conductivities due to systematic discrepancies between DFT calculations and experimental measurements. Effective validation strategies require testing against independent experimental datasets rather than computational holdouts, implementing cross-validation with available experimental data, and developing calibration methods that account for temperature-dependent Arrhenius behavior and experimental measurement uncertainties.142 Furthermore, hybrid training approaches that incorporate both computational and experimental data during model development can significantly improve predictive accuracy for experimental properties.149 As computational materials discovery matures, adopting rigorous experimental validation protocols will be critical for establishing ML models as reliable tools for guiding experimental synthesis efforts.
The critical importance of accurate structural sampling for stability predictions is demonstrated by Ataya et al., who revealed that conventional Coulomb methods fail to identify the most stable, low-energy LLTO configurations after DFT geometry relaxation.150 This structural misrepresentation led to overestimated electrochemical stability windows (3.1 V versus the correct 2.5 V), with prediction errors reaching 0.67 eV. To address this sampling challenge, the authors developed a SOAP-KRR machine learning model trained on only 40 DFT-relaxed structures that accurately predicts energy rankings, providing a computationally efficient alternative for sampling disordered materials.150
Complementing these structural considerations, comprehensive screening approaches have emerged that integrate stability assessments within broader materials discovery pipelines. Chen et al. (2025) developed a hierarchical screening strategy starting with 20 717 Li-containing compounds from the Materials Project database.51 Their multi-stage process applied thermodynamic stability and electronic band gap pre-screening, followed by ML classification and regression models trained on 468 samples to identify high-conductivity candidates. After electrochemical stability window assessment and AIMD validation, this approach identified three promising candidates (Li3BiS3, Li5BiS4, and Li10ZnP4S16) with high room-temperature ionic conductivities, low activation energies, and favorable interfacial compatibility with common cathodes.51
The relationship between composition, structure, and electrochemical performance has been further elucidated through targeted studies of specific electrolyte families. Kireeva et al. investigated garnet-structured solid electrolytes by combining experimental data analysis with machine learning, identifying an optimal lattice constant range of 12.950–12.965 Å for maximum ionic conductivity in LLZO-type garnets.151 Their quantitative regression models using SVM, LSTM, GP, and XGBoost algorithms revealed that Li and La content, atomic scattering factors at the C site, and Shannon ionic radii of dopants were the most influential parameters affecting ionic conductivity, providing quantitative guidance for compositional optimization.151
Early applications of graph neural networks for mechanical property prediction established the feasibility of high-throughput screening approaches. Ahmad et al. employed a CGCNN trained on 2041 crystal structures with DFT-calculated elastic moduli to predict mechanical properties for over 12 000 inorganic solids.98 These ML-predicted moduli were then integrated with the Monroe-Newman stability parameter (χ) framework to assess dendrite initiation propensity at Li metal/SSE interfaces, identifying over 20 mechanically anisotropic interfaces involving six solid electrolytes predicted to suppress dendrite growth.98
The challenge of limited training data has been systematically addressed through active learning strategies that optimize data acquisition. Choi et al. trained a LightGBM model on 14 238 elasticity structures, initially achieving modest performance (R² = 0.633 for shear modulus prediction).152 However, their active learning approach, which iteratively added materials with high prediction uncertainty to the training set, improved the R² score to 0.802 with only 1600 strategic additions compared to 2800 required for random selection.152 This efficiency gain highlights the critical importance of intelligent data acquisition strategies, particularly given the computational expense of DFT elasticity calculations.
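The selection step of such an uncertainty-driven active learning loop can be sketched as follows. Here uncertainty is approximated by the disagreement among a small committee of predictors, which is one common choice; the uncertainty estimate actually used in the cited work is not specified above, so the committee, the function name, and the toy pool are assumptions.

```python
from statistics import pstdev


def select_most_uncertain(pool, committee, batch_size):
    """Return indices of the pool entries on which the committee disagrees most.

    pool: list of candidate inputs (e.g. descriptor values).
    committee: list of callables, each mapping an input to a prediction.
    """
    scored = []
    for idx, x in enumerate(pool):
        predictions = [model(x) for model in committee]
        # Spread of predictions serves as the uncertainty proxy.
        scored.append((pstdev(predictions), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:batch_size]]


# Toy committee of three hypothetical models that diverge for large inputs.
toy_committee = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.5 * x]
toy_pool = [0.0, 1.0, 10.0, 3.0]
next_batch = select_most_uncertain(toy_pool, toy_committee, batch_size=2)
```

In a full loop, the selected candidates would be labeled (for example by DFT elasticity calculations), added to the training set, and the models retrained before the next selection round.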
Building on these methodological advances, comprehensive screening workflows have emerged that integrate mechanical property prediction with other critical SSE characteristics. Sun et al. developed a two-stage ML workflow starting with LGBM-based mechanical property screening of 5329 LLZO-derived candidates, followed by superionic conductor classification and AIMD validation.50 This hierarchical approach successfully identified 10 new tetragonal-phase materials combining superior mechanical properties with high ionic conductivity.50
The interpretability of mechanical property predictions has been enhanced through feature analysis techniques that provide physical insight into structure-property relationships. Wang et al. developed an optimized LGBM model achieving R² ≈ 0.86–0.87 for both shear and bulk modulus prediction using 8920 Materials Project samples.153 Their integration of SHAP analysis revealed that volume per atom and valence band maximum are critical predictors, while extrapolation experiments to datasets containing elements (Mg, Al, K, Ni) absent from training demonstrated that model transferability to new chemical spaces can be significantly improved with strategic addition of diverse samples.153
The scale and sophistication of modern HTVS campaigns are exemplified by ultra-large screening efforts that combine multiple ML models in hierarchical filtering approaches. Chen et al. (2024) demonstrated this approach by screening over 32 million candidates for solid-state electrolytes.154 Structure candidates generated via iso-valent substitutions were reduced to ∼589 000 stable materials using ML potentials (M3GNet) for thermodynamic phase stability assessment. Subsequent funnel-based screening applied ML models for band gap (>3 eV) and electrochemical stability filters, followed by higher-accuracy DFT calculations, yielding 18 final candidates with new compositions. The top candidates, the NaxLi3−xYCl6 series, were synthesized and experimentally validated, confirming both structure and conductivity predictions.154
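The funnel logic described above, in which inexpensive filters run first so that costly calculations see only a small remainder, can be expressed as an ordered list of predicates. The sketch below is illustrative only: the dictionary keys, candidate labels, and the 0.05 eV per atom stability cutoff are hypothetical, while the 3 eV band gap threshold follows the text.

```python
def funnel(candidates, stages):
    """Apply screening stages in order and record how many candidates survive each.

    candidates: iterable of dicts holding predicted properties.
    stages: list of (stage name, predicate) pairs, cheapest first.
    """
    survivors = list(candidates)
    history = []
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        history.append((name, len(survivors)))
    return survivors, history


# Hypothetical candidates with ML-predicted properties.
toy_candidates = [
    {"id": "A", "e_hull": 0.00, "band_gap": 4.0},
    {"id": "B", "e_hull": 0.20, "band_gap": 5.0},
    {"id": "C", "e_hull": 0.01, "band_gap": 1.0},
]
toy_stages = [
    ("phase stability", lambda c: c["e_hull"] <= 0.05),  # assumed cutoff, eV/atom
    ("band gap", lambda c: c["band_gap"] > 3.0),         # >3 eV, as in the text
]
toy_survivors, toy_history = funnel(toy_candidates, toy_stages)
```

Recording the survivor count per stage makes it easy to see which filter removes the most candidates and where a more accurate model would matter most.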
Complementing these massive screening approaches, targeted studies of specific material families have employed sophisticated multi-property optimization strategies. Lee et al. (2025) computationally screened 4375 hypothetical Na-based argyrodites using DFT calculations to evaluate energy above hull, formation energy, band gap, and electrochemical stability window.155 Their 4-dimensional Pareto sorting technique narrowed the field to 15 top candidates, with AIMD simulations ultimately identifying five promising virtual compositions, including Na6SiS4Cl2 and Na7.75SiS5.75Cl0.25.155 This approach demonstrates how multi-objective optimization can efficiently navigate complex property trade-offs in materials design. Similarly employing multi-dimensional optimization, Lee et al. (2024) combined genetic algorithms with Bayesian optimization using GPR surrogate models to screen 18 133 hypothetical antiperovskite electrolytes. Their active learning framework reduced the computational burden to just 144 strategically selected DFT calculations while constructing a 4-dimensional Pareto frontier for thermodynamic stability, band gap, electrochemical window, and ionic conductivity, ultimately identifying 22 promising candidates with seven exhibiting superior room-temperature conductivity (>4 mS cm⁻¹).156
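The Pareto concept underlying these multi-objective studies can be made concrete with a short non-dominated filter. In this sketch every objective is minimized, so properties to be maximized (such as band gap or conductivity) would be negated first; the function names and the two-objective toy data are assumptions, and the cited works use four objectives.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of points not dominated by any other point (all objectives minimized)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Toy objectives, e.g. (energy above hull, negative band gap), both minimized.
toy_objectives = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(toy_objectives)
```

Repeatedly removing the current front and recomputing it yields successive Pareto ranks, which is one way to sort a whole candidate set.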
The integration of experimental insights with computational screening has enabled more targeted materials design strategies. Sewak et al. trained a logistic regression model on 170 experimental NASICON materials, using PCA to identify 9 key features governing ionic conductivity.157 The model revealed that low dopant electronegativity and increased Li occupancy at M2 sites are critical for high conductivity, insights that guided dopant selection for the LiGe2(PO4)3 system. Bond valence sum energy calculations further screened dopants by migration barrier estimation, leading to the design of Li2Mg0.5Ge1.5(PO4)3 with a DFT-validated migration barrier of 0.261 eV.157
Advanced ML architectures have been developed specifically for ionic conductivity screening, leveraging physics-informed descriptors to enhance prediction accuracy. Xie et al. performed high-throughput screening of nearly 50 000 Li-containing compounds using bond-valence kinetic Monte Carlo simulations, identifying 329 materials meeting stability and conductivity thresholds.158 Their graph convolutional network, trained to predict conductivity directly from bond valence energy landscapes, outperformed models learning from atomic structure alone and accelerated screening of 979 additional candidates generated via isovalent substitution, identifying 239 potential superionic conductors.158
Specialized neural network architectures have also emerged for targeted chemical space exploration. Wan et al. (2024) developed DopNetFC, which outperformed conventional ML approaches including random forest and GBDT for screening atom substitution schemes.159 Applied to over 2208 potential substitutions in Li10GeP2S12, the most promising ML-identified candidates were validated through multi-step DFT calculations assessing thermodynamic, electronic, and mechanical stability.159 This approach demonstrates the effectiveness of task-specific neural architectures for exploring well-defined chemical modification spaces.
Multivalent conductor screening has been advanced through comprehensive ML platforms addressing critical data gaps beyond Li-ion systems. Wang et al. developed AI-IMAE based on CGCNN, a platform providing real-time activation energy predictions across nine ionic species (Li+, Na+, Mg2+, Zn2+, Al3+, Ag+, Cu2+, F−, O2−) with ∼10⁵× speedup over traditional methods.160 Screening 144 595 compounds identified 316 SSE candidates and 129 cathode materials across the different ionic species. Similarly, Cai et al. used XGBoost algorithms to screen spinel structures for Mg/Zn cathodes, achieving 91.2% prediction accuracy and identifying six candidates (MgNi2O4, MgMo2S4, MgCu2S4, ZnCa2S4, ZnCu2O4, ZnNi2O4) with ionic diffusion coefficients >1 × 10⁻⁹ cm² s⁻¹ and volume expansions <22%.161 These targeted approaches demonstrate ML's potential for accelerating discovery in underexplored multivalent systems.
This capability has profound implications. MLIPs allow for the simulation of complex SSE systems, such as amorphous phases, grain boundaries, and interfaces, which are often intractable with AIMD due to their size and disorder. Furthermore, the extended simulation times accessible with MLIPs are crucial for capturing rare diffusion events, accurately calculating diffusion coefficients, and observing collective ionic motion, leading to unprecedented insights into ion transport pathways and the role of structural dynamics. Beyond these mechanistic studies, MLIPs also enable the high-throughput computational screening of vast design spaces to accelerate the discovery of entirely new SSE materials (Fig. 4).
The theoretical foundation for this field was established by Behler and Parrinello (2007), who introduced high-dimensional neural network potentials using symmetry functions to describe local chemical environments in a rotationally and translationally invariant manner.162 This pioneering approach laid the groundwork for modern MLIPs that enable DFT-accuracy simulations at significantly reduced computational cost.
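A radial symmetry function of the kind used in such descriptors can be sketched as a sum of Gaussians of interatomic distance, smoothly switched off at a cutoff radius. The form below is a commonly used radial variant with a cosine cutoff; the parameter names (eta, r_s, r_c) and the toy coordinates are assumptions for illustration. Because it depends only on distances, the value is unchanged when the whole environment is rotated or translated.

```python
from math import cos, dist, exp, pi


def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    return 0.5 * (cos(pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_symmetry_function(center, neighbors, eta, r_s, r_c):
    """Sum over neighbors of exp(-eta * (r - r_s)^2) * cutoff(r, r_c)."""
    total = 0.0
    for position in neighbors:
        r = dist(center, position)
        total += exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
    return total


# Toy environment: two neighbors inside the cutoff, one far outside.
g_original = radial_symmetry_function(
    (0.0, 0.0, 0.0),
    [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 10.0)],
    eta=0.5, r_s=0.0, r_c=6.0,
)
# Same environment after a rotation and a rigid shift by (1, 1, 1).
g_moved = radial_symmetry_function(
    (1.0, 1.0, 1.0),
    [(1.0, 2.0, 1.0), (-1.0, 1.0, 1.0), (1.0, 1.0, 11.0)],
    eta=0.5, r_s=0.0, r_c=6.0,
)
```

A full descriptor stacks many such functions with different eta and r_s values, together with angular terms, to form the input vector of each atomic neural network.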
Applications of MLIPs in SSE research have progressed from validating known properties to discovering new transport phenomena and challenging established mechanisms. Gigli et al. (2024) exemplified this evolution by investigating charge transport in all known phases (α, β, and γ) of Li3PS4 using three separate potentials trained on different DFT reference levels (PBEsol, r2SCAN, and PBE0).163 Their large-scale (768-atom) and long-timescale (up to 6 ns) simulations revealed that superionic behavior results from a structural transition from γ to mixed α-β phases, driven by thermal activation of correlated PS4 flips that reduce Li-ion diffusion activation energy by up to 6-fold.163 Crucially, they refuted the “paddle-wheel” mechanism by demonstrating that PS4 flip timescales (nanoseconds) and Li-ion hopping (picoseconds) are separated by orders of magnitude, while also showing that the commonly used Nernst-Einstein approximation underestimates conductivity by more than a factor of two.163
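To make the Nernst-Einstein comparison concrete, the following minimal sketch computes the uncorrelated (Nernst-Einstein) conductivity from a tracer diffusion coefficient and the resulting Haven ratio. The function names and the numerical inputs are illustrative assumptions and are not taken from the cited study.

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def nernst_einstein_sigma(n_carriers_m3, d_tracer_m2_s, temperature_k, charge_number=1):
    """Conductivity in S/m, assuming mobile ions move independently.

    sigma_NE = n * q^2 * D_tracer / (k_B * T)
    """
    q = charge_number * E_CHARGE
    return n_carriers_m3 * q ** 2 * d_tracer_m2_s / (K_B * temperature_k)


def haven_ratio(sigma_ne, sigma_true):
    """Ratio of the uncorrelated estimate to the fully correlated conductivity."""
    return sigma_ne / sigma_true


# Hypothetical inputs, chosen only to exercise the functions.
sigma_ne = nernst_einstein_sigma(n_carriers_m3=2e28, d_tracer_m2_s=1e-11, temperature_k=500)
# If the correlated conductivity were 2.5 times larger, the Haven ratio would be 0.4.
ratio = haven_ratio(sigma_ne, 2.5 * sigma_ne)
```

A Haven ratio below 0.5 corresponds to the situation described above, in which the Nernst-Einstein estimate falls short of the true conductivity by more than a factor of two.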
The power of MLIPs in elucidating complex transport behaviors extends to understanding non-Arrhenius temperature dependence in garnet systems. Dai et al. (2022) studied LixLa3Zrx−5Ta7−xO12 garnets using MLIPs trained on DFT-MD trajectories, achieving superior accuracy compared to other computational models.164 Their simulations revealed that ionic conductivity follows Vogel–Tammann–Fulcher rather than Arrhenius behavior, with maximum conductivity occurring at Li content between 6.6 and 6.8.164 This work demonstrates how MLIPs can capture subtle temperature-dependent transport phenomena that require extensive sampling.
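The distinction between the two temperature dependences can be sketched as follows. The functional forms use a simple exponential prefactor convention, which is an assumption here; the cited work may use a different prefactor (for example, one that includes a power of T), and all parameter values are placeholders.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius(temperature_k, sigma0, ea_ev):
    """sigma = sigma0 * exp(-Ea / (k_B * T))"""
    return sigma0 * math.exp(-ea_ev / (K_B_EV * temperature_k))


def vtf(temperature_k, sigma0, b_ev, t0_k):
    """sigma = sigma0 * exp(-B / (k_B * (T - T0))), valid only for T > T0."""
    if temperature_k <= t0_k:
        raise ValueError("VTF form requires T > T0")
    return sigma0 * math.exp(-b_ev / (K_B_EV * (temperature_k - t0_k)))


def vtf_apparent_activation_energy(temperature_k, b_ev, t0_k):
    """Local Arrhenius slope, -k_B * d ln(sigma) / d(1/T) = B * T^2 / (T - T0)^2."""
    return b_ev * (temperature_k / (temperature_k - t0_k)) ** 2
```

With T0 = 0 the VTF form reduces to the Arrhenius form. For T0 > 0 the apparent activation energy decreases as temperature rises, which is why a single Arrhenius fit over a wide temperature range can misrepresent such systems.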
MLIPs have proven particularly valuable for studying amorphous systems and interfaces, where structural disorder demands large simulation cells and long equilibration times. Seth et al. (2025) investigated Li+ transport in amorphous LiPON and at Li||LiPON interfaces using a neural equivariant interatomic potential (NequIP) trained on over 13 000 DFT structures.165 Their simulations accurately reproduced experimental room-temperature conductivity in bulk LiPON while revealing that interfacial transport is one order of magnitude slower than bulk transport.165 Similarly, Yang et al. (2025) combined AIMD with DeePMD MLIPs to study amorphous LixAlOyCl3+x−2y electrolytes, revealing that Li+ transport is facilitated by Cl atom rotation within tetrahedral frameworks and that oxygen doping enhances glass-forming ability while reducing mobile Cl atoms, requiring optimization of the O/Cl ratio for maximum conductivity.166
The integration of MLIPs with materials discovery workflows has enabled the exploration of composition–structure–property relationships across extended chemical spaces. Guo et al. (2022) demonstrated this approach by mapping the phase diagram of glass-ceramic lithium thiophosphate electrolytes using neural network potentials coupled with genetic algorithms to explore amorphous structures along the (Li2S)x(P2S5)1−x composition line.167 Through unsupervised structure-similarity analysis, they identified that local Li environments resembling superionic β-Li3PS4 are energetically favorable around x ≈ 0.725, leading to the design of a new candidate composition with predicted ionic conductivity exceeding 10−2 S cm−1.167
Beyond solid-state electrolytes, MLIPs have also provided valuable insights into ionic transport mechanisms in battery electrode materials. Ha et al. (2022) demonstrated the application of SGPR-accelerated molecular dynamics to investigate the effect of aluminium doping on Li-ion transport in Li-excess layered oxide cathodes.168 Their nanosecond-timescale simulations of Li1.22Ru0.61Ni0.11Al0.06O2 revealed that Al-doping reduces the Li-ion diffusion activation energy from 0.48 eV to 0.40 eV, demonstrating enhanced ionic transport alongside improved structural stability. This reduction in activation energy resulted in approximately twice the Li-ion diffusion coefficient at elevated temperatures. The study showed how strategic dopant selection can simultaneously optimize both transport properties and electrochemical stability, with strengthened Al–O bonding suppressing oxygen oxidation while facilitating Li-ion mobility.
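The link between an activation energy change and a diffusion coefficient ratio can be illustrated with a short calculation. This sketch assumes simple Arrhenius behavior with identical prefactors for the doped and undoped materials, which is a simplification introduced here and not a claim about how the cited study obtained its ratio.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def diffusion_ratio(ea_before_ev, ea_after_ev, temperature_k):
    """D_after / D_before for Arrhenius diffusion with equal prefactors."""
    return math.exp((ea_before_ev - ea_after_ev) / (K_B_EV * temperature_k))


def temperature_for_ratio(ea_before_ev, ea_after_ev, ratio):
    """Temperature (K) at which the equal-prefactor ratio takes a given value."""
    return (ea_before_ev - ea_after_ev) / (K_B_EV * math.log(ratio))


# Activation energies quoted in the text: 0.48 eV (undoped) and 0.40 eV (Al-doped).
t_double = temperature_for_ratio(0.48, 0.40, 2.0)
```

Under this assumption the ratio grows as temperature falls, so a factor close to two is reached only at high temperature, while the same 0.08 eV reduction implies a much larger enhancement near room temperature.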
Despite their transformative potential, MLIP-based MD simulations require careful validation to ensure reliable predictions, particularly given inherent uncertainties in force predictions and energy errors.169 Best-practice validation strategies extend beyond simple energy and force comparisons to include systematic benchmarking against AIMD for key properties such as diffusion coefficients, phase stability, and thermal transport.170 Uncertainty quantification through ensemble methods, gradient-based approaches, or committee models provides essential error estimates during simulations, enabling active learning protocols that iteratively improve MLIP reliability.171,172 Furthermore, domain-specific validation tests, including rare event prediction and long-timescale dynamical properties, are crucial for establishing confidence in MLIP extrapolation beyond training domains.173 As the field matures, standardized validation protocols and uncertainty reporting will be essential for establishing MLIP credibility in high-stakes materials discovery applications.
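As one illustration of committee-based uncertainty quantification, the sketch below measures how strongly a set of independently trained potentials disagree on atomic forces and flags frames for new DFT labels. The data layout, function names, and threshold are hypothetical; production workflows would typically rely on the uncertainty tools of the MLIP package in use.

```python
import math


def max_force_deviation(committee_forces):
    """Largest per-atom standard deviation of the force vector across models.

    committee_forces[m][i] is the (fx, fy, fz) force on atom i from model m.
    """
    n_models = len(committee_forces)
    n_atoms = len(committee_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [
            sum(committee_forces[m][i][k] for m in range(n_models)) / n_models
            for k in range(3)
        ]
        variance = sum(
            sum((committee_forces[m][i][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(variance))
    return worst


def select_for_labeling(frames, threshold):
    """Indices of MD frames whose committee disagreement exceeds the threshold."""
    return [
        idx for idx, forces in enumerate(frames)
        if max_force_deviation(forces) > threshold
    ]


# Two-model committee, one atom per frame (toy data).
frame_agree = [[(0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0)]]
frame_disagree = [[(0.0, 0.0, 1.0)], [(0.0, 0.0, 3.0)]]
to_label = select_for_labeling([frame_agree, frame_disagree], threshold=0.5)
```

Frames selected in this way would be recomputed with DFT and added to the training set, which is the iterative loop that active learning protocols formalize.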
Table 2 summarizes these seminal contributions, illustrating how MLIPs have advanced our understanding of ion dynamics in SSEs.
| Study/MLIP development (primary citation) | MLIP type/focus | SSE system(s) investigated | Key insights into ion dynamics/mechanisms | Significance/impact |
|---|---|---|---|---|
| Behler and Parrinello (2007)162 | HDNNPs using atom-centered symmetry functions | Bulk silicon (as proof-of-concept for general condensed matter systems) | Decomposes total energy into local atomic contributions, enabling simulations of arbitrarily sized systems with DFT accuracy by learning the potential energy surface (PES) | Foundational theoretical and methodological work that established the modern framework for atomistic MLIPs, making large-scale, long-timescale simulations of SSEs feasible |
| Guo et al. (2022)167 | ANN potential combined with a genetic algorithm (GA) for AI-aided sampling | Glass-ceramic lithium thiophosphate (LPS) systems: (Li2S)x(P2S5)1−x | Discovered that local Li environments similar to the superionic β-Li3PS4 phase are energetically favored around composition x ≈ 0.725. Mapped the amorphous phase diagram and identified miscibility gaps | Demonstrated a powerful workflow combining MLIP-accelerated sampling and structural analysis to design novel, high-conductivity amorphous SSE compositions |
| Gigli et al. (2024)163 | GAPs trained on multiple DFT levels (PBEsol, r2SCAN, and PBE0) | All known polymorphs (α, β, γ) of lithium thiophosphate (Li3PS4) | Showed superionic behavior is driven by a structural transition activated by correlated PS4 flips, not a “paddle-wheel” effect. The Nernst-Einstein approximation underestimates conductivity by over a factor of 2 due to strong ionic correlations | Resolved a long-standing controversy over the transport mechanism in Li3PS4 and highlighted the necessity of using higher-accuracy functionals (PBE0) and correlation-aware analysis for predictive simulations |
| Dai et al. (2022)164 | Artificial neural network (SIMPLE-NN) using atom-centered symmetry functions | Lithium garnet oxides: LixLa3Zrx−5Ta7−xO12 | Revealed that ionic conduction in garnets follows a non-Arrhenius temperature dependence, better described by the VTF equation. Calculated Haven ratio of 0.1–0.4 indicates strong concerted motion of Li-ions | Provided a highly accurate potential for the garnet family, resolving ambiguity around the optimal composition for conductivity (x = 6.6 to 6.8) by combining simulations with experimental data |
| Seth et al. (2024)165 | NequIP, an E(3)-equivariant GNN | Amorphous lithium phosphorus oxynitride (LiPON) and the Li\|\|LiPON interface | Accurately modelled the amorphous LiPON structure and bulk Li+ conductivity. Found that Li+ transport across the Li\|\|LiPON interface is one order of magnitude slower than in bulk LiPON | |
| Yang et al. (2025)166 | DeePMD | Amorphous oxychloride electrolytes: LixAlOyCl3+x−2y | Uncovered that Li+ transport is facilitated by the rotation of Cl atoms within a structural skeleton of Al-chains. Found that O-doping enhances amorphization (enabling Cl rotation) but reduces mobile Cl atoms, creating an optimal O/Cl ratio for conductivity | Elucidated a novel transport mechanism in an emerging class of amorphous oxychloride SSEs and provided a clear design principle based on balancing glass-forming ability with mobile anion concentration |
| Ha et al. (2022)168 | SGPR with on-the-fly training | Al-doped Li-excess layered oxide cathodes: Li1.22Ru0.61Ni0.11Al0.06O2 | Demonstrated that Al-doping reduces Li-ion diffusion activation energy from 0.48 eV to 0.40 eV, enhancing ionic transport while strengthened Al–O bonding suppresses oxygen oxidation and improves structural stability | Demonstrated how dopant-induced electronic structure modifications can simultaneously enhance ionic transport and suppress degradation mechanisms, providing design principles for stable high-energy-density electrode materials with improved Li-ion mobility |
The data quality problem compounds this scarcity. SSE datasets aggregate information from disparate experimental protocols, computational methods with varying theoretical rigor, and literature reports lacking standardized metrics.179 This heterogeneity introduces systematic noise, missing values, and conflicting measurements that undermine model reliability. The absence of centralized, standardized databases for multivalent SSE properties forces fragmented, redundant curation efforts across research groups,87 impeding collaborative progress.
000 Na-containing materials, revealing that high-conductivity candidates consistently shared specific structural characteristics, such as the abundance of certain polyhedral motifs (XO4 tetrahedra) and the presence of spacious ion channels.106 This finding suggests a path for methodological transfer to beyond-lithium systems. While the optimal structural features for a Mg2+ conductor will differ from those for Na+, the types of descriptors identified as critical, such as coordination environments, polyhedral packing, and framework connectivity, are likely to be fundamentally important across different ion systems. An effective strategy, therefore, involves using unsupervised learning on large Li+ or Na+ datasets to identify these critical feature classes, which can then guide the engineering of more targeted descriptors for the subsequent supervised modeling of multivalent systems.
000 lithium-containing compounds and subsequently annotated the resulting clusters using a limited set of experimental conductivity measurements.95 This methodology successfully identified a cluster exhibiting high probability for superionic conduction, which led to the experimental confirmation of Li3BS3 as a novel ionic conductor. The success of this approach provides a template for a targeted discovery pipeline in underexplored chemical spaces, such as those for multivalent conductors. Such a workflow would involve first clustering the vast space of hypothetical multivalent host structures using reliable structural descriptors. Following this, a small and diverse set of compounds from different clusters could be strategically synthesized to serve as initial “seed” labels. Subsequent experimental efforts could then be prioritized on the unlabeled materials within or adjacent to clusters containing the most promising initial results, thereby maximizing the value of each experiment and accelerating the identification of novel beyond-lithium SSEs.
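The prioritization step of such a workflow can be sketched in a few lines. The example assumes that clustering has already been performed and that a handful of seed measurements are available; the scoring rule (a Laplace-smoothed hit rate per cluster) and all names are illustrative choices, not the method of the cited study.

```python
from collections import defaultdict


def rank_unlabeled(cluster_of, seed_labels):
    """Order unlabeled materials by the seed hit rate of their cluster.

    cluster_of:  material name -> cluster id (from any clustering method)
    seed_labels: material name -> True if measured as a good conductor
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for material, is_hit in seed_labels.items():
        cluster = cluster_of[material]
        counts[cluster] += 1
        hits[cluster] += int(is_hit)

    def score(cluster):
        # Laplace smoothing: clusters without seed labels score 0.5.
        return (hits[cluster] + 1) / (counts[cluster] + 2)

    unlabeled = [m for m in cluster_of if m not in seed_labels]
    # Highest cluster score first; ties broken alphabetically for determinism.
    return sorted(unlabeled, key=lambda m: (-score(cluster_of[m]), m))


# Toy example with placeholder material names.
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
seed_labels = {"A": True, "C": False}
queue = rank_unlabeled(cluster_of, seed_labels)
```

Here material B shares a cluster with a positive seed and is ranked first, the unexplored cluster containing E comes next, and D, which shares a cluster with a negative seed, is ranked last.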
Successful liquid electrolyte platforms like the Electrolyte Genome181 demonstrate the value of systematic property correlation mapping and automated screening workflows beyond simple high-throughput calculation. These liquid-phase systems also offer opportunities for cross-domain learning: ion transport patterns in liquid and polymer electrolytes including solvation dynamics, coordination environment effects, and structure-transport correlations can inform descriptor engineering and mechanistic understanding for solid electrolytes, particularly for data-scarce multivalent systems where liquid-phase computational studies are more prevalent. Adapting these methodologies to solid-state systems could establish not only standardized data specifications but also automated multi-property optimization pipelines that integrate atomic-scale MLIP predictions with mesoscale grain boundary and interface modelling.
The synergy between HTP-DFT and ML creates a self-reinforcing cycle: computational data trains ML models, which subsequently accelerate screening by reducing computational bottlenecks.
This approach has demonstrated practical success in optimizing doping strategies for LLZO electrolytes.57 By combining ML models with uncertainty quantification, the active learning framework efficiently navigated the vast compositional space, identifying promising dopant combinations while minimizing required simulations and experiments.57
However, the effectiveness of these data-centric approaches depends critically on establishing clear prioritization criteria for data collection efforts. Future experimental and computational campaigns should prioritize: (1) multivalent systems with intermediate ionic radii (Mg2+, Zn2+) that bridge the gap between monovalent and highly charged species, (2) materials exhibiting mixed ionic-electronic conductivity where transport mechanisms remain poorly understood, and (3) interfacial properties and degradation pathways that are systematically underrepresented in current databases. Computationally, emphasis should be placed on generating temperature-dependent transport data and correlated ionic motion descriptors, as these are essential for capturing the non-Arrhenius behavior observed in many superionic conductors yet remain scarce in existing datasets. The choice among these strategies or, more likely, a combination thereof will depend critically on the specific SSE system under investigation, the target property, and the nature of the available data. For instance, while transfer learning might be effective for predicting properties of Na-ion conductors based on Li-ion data due to their chemical similarities, discovering novel multivalent conductors might necessitate more extensive de novo data generation via HTP-DFT, guided by active learning, to capture their unique physics. A universal solution to data scarcity is improbable; instead, a versatile toolkit of these data-centric approaches is essential for continued progress. Table 3 summarizes the key data challenges encountered in the application of ML to SSE discovery and outlines potential mitigation strategies.
| Data challenge | Impact on ML model development | Key mitigation strategies and supporting evidence |
|---|---|---|
| Overall scarcity for SSEs | Poor generalization, difficulty modelling complex phenomena, bias towards well-studied systems | HTP-DFT data generation,180 development of curated databases,87 active learning,57 semi-supervised learning95 |
| Specific scarcity for multivalent ion conductors | Inability to model distinct physics (e.g., stronger coulombic interactions, sluggish diffusion) accurately, poor extrapolation from Li-ion systems | Targeted HTP-DFT for multivalents, transfer learning,47 physics-informed ML,183,184 unsupervised learning for feature discovery106,185 |
| Data heterogeneity/quality (multi-source, noise, missing values) | Reduced model reliability, inconsistent predictions, difficulty in training robust models | Rigorous data curation & preprocessing,179 standardized data reporting protocols, robust ML algorithms tolerant to noise |
| Small sample sizes for truly novel chemistries | High risk of overfitting, poor predictive power for unexplored chemical spaces | Generative models for candidate proposal,186 transfer learning from broader chemical domains,187 LOGO-CV for realistic performance assessment188 |
• High ionic conductivity (σ): typically targeted to be ≥10−4 S cm−1 at room temperature, approaching or exceeding that of liquid electrolytes, to enable high power densities.
• Wide electrochemical stability window (ESW): the electrolyte must remain stable against both highly reducing (anode) and highly oxidizing (cathode) potentials, ideally >5.5 V vs. Li/Li+ for high-voltage applications.
• Good electrode compatibility: minimal chemical and electrochemical reactivity with both anode (especially Li metal) and cathode materials to prevent detrimental interfacial layer growth and impedance rise.
• Sufficient mechanical strength and appropriate moduli: the SSE should possess adequate mechanical robustness to suppress lithium dendrite penetration and withstand the stresses induced by electrode volume changes during cycling, while also maintaining good interfacial contact.
• High Li+ transference number (tLi+): ideally close to unity, indicating that Li+ ions are the primary charge carriers, which minimizes concentration polarization and improves rate capability.
• Other considerations: factors such as ease of processing, scalability, low cost, and environmental impact also play crucial roles in practical viability.
These requirements, however, must be contextualized within the distinct challenges posed by different battery chemistries. Li-ion systems prioritize dendrite suppression and require stable solid electrolyte interphases (SEI) compatible with graphite anodes, necessitating optimization for both mechanical strength and interfacial stability.189 Na-ion systems face fundamentally different constraints, requiring compatibility with hard carbon anodes due to graphite's incompatibility with Na+ ions, which shifts the optimization focus toward different voltage windows and interfacial chemistries.190 Mg-ion systems naturally avoid dendrite formation due to the divalent nature of Mg2+, but face critical challenges from sluggish ion transport kinetics caused by strong solvation effects and higher activation energies, requiring optimization strategies that prioritize conductivity enhancement over mechanical dendrite suppression.191 Al-ion systems present additional complexity, demanding electrolytes compatible with limited cathode options while managing the high charge density effects of trivalent Al3+ ions.192 Silicon-based Li systems introduce further complications through large volume changes (>300%) that destabilize conventional SEIs, requiring electrolytes optimized for mechanical flexibility and stable interfacial reformation rather than static interfacial stability.193
The interplay between these properties is complex; materials with very high ionic conductivity might exhibit poor mechanical properties or a narrow electrochemical stability window. Traditional single-objective ML approaches, predominantly focused on maximizing ionic conductivity,101,109,194 fail to capture these trade-offs and produce materials unsuitable for practical applications. A critical limitation lies in the lack of frameworks that account for the distinct physics governing different ionic species and their corresponding electrode compatibility requirements. Additionally, the computational expense of evaluating multiple properties for every candidate material during multi-objective optimization searches can be substantial, even when using ML-based surrogate models for property prediction.
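One simple way to expose such trade-offs is to retain only non-dominated (Pareto-optimal) candidates instead of ranking by a single property. The sketch below is a brute-force filter over a small candidate set; the objective tuple (log10 conductivity, stability window in V, shear modulus in GPa) and the values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    All objectives are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# (log10 sigma in S/cm, stability window in V, shear modulus in GPa): placeholder values.
candidates = {
    "A": (-3.0, 2.0, 20.0),  # fast conductor, narrow window
    "B": (-4.0, 5.0, 30.0),  # slower, but wider window and stiffer
    "C": (-4.5, 4.0, 25.0),  # worse than B on every objective
}
front = pareto_front(candidates)
```

Candidates A and B both survive because neither is better on every objective, whereas C is discarded. The quadratic cost of this filter is negligible next to the property evaluations themselves, which remain the expensive step noted above.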
Emerging approaches promise to advance beyond feature importance quantification toward mechanistic discovery. Causal machine learning methods can distinguish genuine causal relationships from spurious correlations in structure–property data, revealing which structural modifications directly influence ionic conductivity versus those that merely correlate.204 Symbolic regression techniques, which search for explicit mathematical equations governing material properties, offer an alternative path to interpretability by automatically discovering closed-form expressions that relate compositional and structural descriptors to transport properties or rediscover interatomic potentials.205 These physics-discovering approaches could uncover governing equations analogous to how the Arrhenius relation describes temperature-dependent conductivity, potentially revealing universal scaling laws across different ionic systems.
This iterative refinement, guided by explainability, produces models that are both accurate and grounded in scientifically meaningful parameters, representing a shift from ML merely predicting outcomes to actively contributing to fundamental understanding of solid-state ionics.
Despite these promising developments, successful implementation of XAI in SSE research requires awareness of key methodological limitations. SHAP values exhibit instability in highly correlated feature spaces typical of materials datasets, where structural descriptors often show strong interdependencies.206 LIME's local approximations may inadequately represent global model behavior, particularly problematic for complex structure-property relationships.207 Both approaches assume feature independence, which conflicts with the intrinsically coupled nature of atomic positions, coordination environments, and bonding in crystalline materials. Best practices include validating XAI outputs through multiple complementary methods, examining feature correlation matrices before interpretation, and systematically cross-checking computational insights against experimental observations and established physical principles.
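The recommendation to examine feature correlations before interpreting attributions can be implemented with a short pre-check. The following sketch flags descriptor pairs whose absolute Pearson correlation exceeds a chosen threshold; the descriptor names, values, and the 0.9 cutoff are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlated_pairs(features, threshold=0.9):
    """List (name_a, name_b, r) for descriptor pairs with |r| >= threshold."""
    names = sorted(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Placeholder descriptor columns for four hypothetical materials.
features = {
    "lattice_volume": [1.0, 2.0, 3.0, 4.0],
    "li_li_distance": [2.0, 4.0, 6.0, 8.1],
    "anion_electronegativity": [3.0, 1.0, 4.0, 1.0],
}
flagged = correlated_pairs(features)
```

Attributions split between flagged descriptors are better read as a joint contribution of the correlated group than as evidence about either descriptor alone.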
The core issue is that ML models excel at interpolation within their training data domain but struggle with extrapolation to chemically distinct regions. Conventional cross-validation techniques, which randomly split data into training and test sets, often overestimate a model's true extrapolative power because test sets usually contain materials chemically similar to training data. More rigorous “leave-one-group-out cross-validation” (LOGO-CV), where entire chemical families are held out for testing, has demonstrated that conventional ML methods can fail when predicting properties of completely novel compound classes.188 This presents a critical concern for SSE discovery, where the goal is often to identify entirely new material families with breakthrough properties. While universal interatomic potentials like M3GNet are trained on vast databases (e.g., the Materials Project) and aim for broad applicability across diverse chemical spaces,208 achieving reliable extrapolation remains a frontier challenge.
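The difference from random splitting is easiest to see in code. This minimal sketch generates leave-one-group-out splits from a list of chemical-family labels; the family names are placeholders, and the grouping criterion itself (structure type, anion chemistry, and so on) is a modeling choice that the sketch does not prescribe.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# One chemical-family label per material in the dataset (placeholder labels).
groups = ["garnet", "garnet", "argyrodite", "NASICON", "argyrodite"]
splits = list(leave_one_group_out(groups))
```

Each split trains on every family except one and tests only on the held-out family, so the reported error reflects extrapolation to an unseen compound class. scikit-learn offers the same splitting logic as `LeaveOneGroupOut` in `sklearn.model_selection`.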
• VAEs learn a compressed, continuous latent representation of materials, from which new candidates can be generated by sampling points in this latent space and decoding them back into material structures or compositions. Noh et al. applied a VAE-based framework to the inverse design of solid-state materials, efficiently exploring chemical compositional spaces to generate novel candidates with desired properties.55
• GANs employ a two-network architecture: a generator that creates new material candidates and a discriminator that tries to distinguish these synthetic candidates from real materials in a training dataset. Through this adversarial training, the generator learns to produce increasingly realistic and potentially novel materials.
• Diffusion models are an emerging class of powerful generative models that operate by learning to reverse a gradual noise-adding process. They have shown significant promise for generating high-quality samples in various domains, like crystal structure generation.212 The MatterGen model, for example, can generate stable, diverse inorganic materials and can be fine-tuned to steer generation towards specific property constraints, including chemistry, symmetry, and various physical properties, with one generated structure successfully synthesized and validated.213
A hybrid approach combining a VAE with a genetic algorithm, termed the evolutionary variational autoencoder for perovskite discovery (EVAPD), has been developed to discover new perovskite materials.196 This framework leverages the VAE's ability to generate diverse candidates from a learned latent space and the GA's strength in optimizing these candidates based on a defined fitness function (e.g., predicted stability). Such hybrid generative approaches hold considerable potential for SSE discovery if adapted with relevant property targets.
The success of these generative models is critically dependent on the quality and relevance of the design rules or property targets they are given. If these targets are ill-defined, incomplete (focusing only on ionic conductivity without considering stability or synthesizability), or do not capture all essential practical constraints, the generated candidates may be theoretically interesting but practically irrelevant or impossible to realize. The ability of models like MatterGen to be fine-tuned for a broad range of property constraints, and its subsequent experimental validation, underscores the importance of multi-faceted and accurate guidance for generative design.213
Several pioneering efforts exemplify this approach:
• The CAMEO system is a real-time, closed-loop autonomous materials exploration platform that uses Bayesian active learning integrated with synchrotron beamline experiments for on-the-fly phase mapping and property optimization.216
• The “Electrolytomics” initiative describes an AI-guided approach that combines data science, robotic experimentation for validation, and computation, leading to the discovery and experimental confirmation of novel high-performance liquid electrolytes.217
• A computational-experimental pipeline successfully combined AI models, physics-based simulations on cloud HPC for large-scale screening, and subsequent experimental synthesis and characterization to discover promising new SSE compositions like NaxLi3−xYCl6.218
• The DiffMix model, a differentiable GDL model, has been used to guide robotic experimentation for optimizing fast-charging liquid battery electrolytes, achieving significant conductivity improvements in a few experimental steps.211
• An integrated high-throughput robotic platform combined with active learning has been developed to accelerate the discovery of optimal liquid electrolyte formulations. This approach efficiently identifies high-solubility redox-active molecules by evaluating a small fraction of candidates, demonstrating the effectiveness of closed-loop frameworks in materials discovery.219
• Iterative training of universal MLPs, where DFT calculations are performed on structures where the MLP shows high uncertainty, also represents a form of closed-loop learning to refine the potential across a wide chemical space.210
Fully autonomous closed-loop systems, often termed “self-driving laboratories”, represent the apex of accelerated materials discovery. However, their widespread adoption for SSE research faces significant hurdles. Beyond the continued advancement of ML algorithms and robotic platforms, a major challenge lies in the development of standardized, automatable, and rapid synthesis and characterization protocols suitable for a diverse range of solid-state chemistries. The synthesis of inorganic solids often involves high temperatures, controlled atmospheres, and multi-step processes that are not as easily automated as liquid-phase formulations. Furthermore, critical to the success of these frameworks is the implementation of robust validation workflows that prevent costly experimental efforts on unfeasible materials. Effective validation protocols should include thermodynamic stability screening via DFT hull distance calculations, with chemistry-dependent thresholds based on the metastability scales established for different material classes,220 kinetic accessibility assessment through thermodynamic upper bounds such as the amorphous limit for polymorph synthesizability,221 and rapid experimental validation using automated characterization techniques222 such as XRD phase identification and impedance spectroscopy.223,224 These multi-tier filters ensure that generative models guide experimental efforts toward genuinely promising candidates rather than thermodynamically unstable or synthetically inaccessible compositions.
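The first two computational tiers of such a validation workflow can be expressed as a simple filter, sketched below. The class-dependent hull thresholds and the candidate energies are placeholder numbers and not literature values; both energies are assumed to be given in eV per atom relative to the same ground-state reference. The third, experimental tier is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    formula: str
    chemistry_class: str
    e_above_hull_ev: float     # DFT energy above the convex hull, eV/atom
    amorphous_limit_ev: float  # energy of the amorphous phase above the hull, eV/atom


def passes_screen(candidate, hull_thresholds, default_threshold):
    """Apply the two computational tiers in order of increasing cost."""
    threshold = hull_thresholds.get(candidate.chemistry_class, default_threshold)
    # Tier 1: thermodynamic stability with a chemistry-dependent tolerance.
    if candidate.e_above_hull_ev > threshold:
        return False
    # Tier 2: a crystalline phase above the amorphous limit is treated as inaccessible.
    if candidate.e_above_hull_ev > candidate.amorphous_limit_ev:
        return False
    return True


# Hypothetical thresholds and candidates, for illustration only.
HULL_THRESHOLDS = {"oxide": 0.03, "sulfide": 0.05}
pool = [
    Candidate("hypothetical-1", "oxide", 0.01, 0.10),
    Candidate("hypothetical-2", "oxide", 0.04, 0.10),    # fails tier 1
    Candidate("hypothetical-3", "sulfide", 0.04, 0.02),  # fails tier 2
]
shortlist = [c.formula for c in pool if passes_screen(c, HULL_THRESHOLDS, 0.025)]
```

Ordering the checks from cheapest to most expensive means that only candidates passing both computational tiers would proceed to automated synthesis and characterization.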
The cost and complexity of establishing and maintaining such highly integrated experimental and computational platforms, combined with the need for standardized validation protocols, require substantial investment and interdisciplinary expertise.
Table 4 provides a comparative overview of different generative model approaches and their potential in the context of novel SSE discovery.
| Generative model type | Core working principle | Strengths for SSE design | Limitations/challenges in SSE context | Key examples/potential |
|---|---|---|---|---|
| Variational autoencoders (VAEs) | Learns a continuous latent representation of data; new samples generated by decoding points from this latent space | Smooth latent space allows for interpolation and generation of similar but novel structures/compositions; can be conditioned on properties | Quality of reconstructed/generated materials can be an issue; ensuring chemical validity and stability of generated crystal structures | Inverse material design55 |
| Generative adversarial networks (GANs) | A generator network creates candidates, and a discriminator network tries to distinguish them from real data; adversarial training improves generator | Capable of generating highly novel and diverse candidates; can learn complex data distributions | Training can be unstable (mode collapse); ensuring generated crystal structures are physically realistic and stable is challenging | Crystal structure prediction;128 Inverse design of materials (MatGAN)54 |
| Evolutionary algorithms (EAs)/genetic algorithms (GAs) | Population-based search; applies operators (mutation, crossover, selection) guided by a fitness function (target properties) | Robust global search capabilities; can explicitly handle multiple objectives and complex constraints (e.g., stability, synthesizability) | Can be computationally expensive if fitness evaluation (e.g., DFT calculation) is slow for each candidate; defining effective representations and evolutionary operators for crystal structures | Crystal structure prediction (XtalOpt);214 Guiding phase field exploration for Li-ion conductors215 |
| Diffusion models | Learns to reverse a noise-adding process; new samples generated by iterative denoising from a random starting point | Can generate very high-quality, realistic samples; emerging as state-of-the-art in many generative tasks | Can be computationally intensive for sampling; developing effective conditioning mechanisms for specific material properties and crystal symmetries | General crystal structure generation;212 MatterGen (fine-tuneable generative model)213 |
| Hybrid models (e.g., VAE-GA) | Combines strengths of different generative approaches, e.g., VAE for generation and GA for optimization | Potential to overcome limitations of individual methods; e.g., VAE explores broadly, GA refines promising candidates | Increased model complexity; requires careful integration of components | EVAPD for perovskites196 |
| Integrated closed-loop frameworks | ML proposes candidates → computational validation (DFT) → experimental synthesis/characterization → feedback to refine ML models in iterative cycles. | Combines theoretical prediction with experimental validation; continuous model improvement; reduces experimental waste through guided exploration | Requires substantial infrastructure investment; standardized synthesis protocols needed; complex integration of computational and experimental platforms; slower iteration cycles | CAMEO system;216 Electrolytomics;217 NaxLi3−xYCl6 discovery;218 DiffMix for electrolyte optimization211 |
The data scarcity challenge is particularly acute for multivalent systems (Mg2+, Ca2+, Zn2+, Al3+), where solid-state battery research remains in its early stages both experimentally and computationally. The quantitative disparity is stark: Li-ion SSE databases contain thousands of compounds, whereas Mg2+, Ca2+, Zn2+, and Al3+ conductors each number in the tens to low hundreds.225 Beyond this disparity, these systems exhibit fundamentally different physics that cannot be addressed through simple data augmentation. Multivalent ions face stronger Coulombic interactions with the host lattice due to their higher charge densities, leading to sluggish diffusion kinetics and significantly higher activation energies than in monovalent systems.177 The migration mechanisms also differ qualitatively: while Li+ transport often proceeds via direct hopping between tetrahedral sites, Mg2+ migration typically requires concerted structural relaxation or even temporary coordination changes to overcome the strong cation-anion binding. Additionally, defect chemistry and strain accommodation mechanisms vary substantially: multivalent dopants introduce different charge compensation schemes and elastic distortions that alter migration pathways in ways not captured by Li-based training data. These mechanistic distinctions mean that ML models trained predominantly on Li-ion data lack the physical descriptors and feature representations needed to capture the governing principles in multivalent systems. The result is a critical bottleneck for advancing beyond lithium-ion technologies, one that transfer learning alone cannot resolve without substantial new data generation and physics-informed constraints.
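To give a sense of scale for the activation-energy penalty, the short calculation below compares thermally activated hop rates under a simple Arrhenius picture. The barrier values (0.3 eV and 0.6 eV) are illustrative assumptions rather than data from the cited studies, and equal prefactors are assumed for both ions.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_rate_ratio(ea_high_ev, ea_low_ev, temperature_k):
    """Ratio rate(ea_high) / rate(ea_low) for rate ~ exp(-Ea / (kB * T)),
    assuming identical prefactors for the two migrating ions."""
    return math.exp(-(ea_high_ev - ea_low_ev) / (K_B_EV * temperature_k))


# Illustrative barriers only: 0.3 eV (monovalent-like) vs 0.6 eV (multivalent-like).
ratio_300k = arrhenius_rate_ratio(0.6, 0.3, 300.0)
ratio_600k = arrhenius_rate_ratio(0.6, 0.3, 600.0)
print(f"rate ratio at 300 K: {ratio_300k:.2e}")
print(f"rate ratio at 600 K: {ratio_600k:.2e}")
```

Under these assumptions, a 0.3 eV increase in barrier height slows hopping by roughly five orders of magnitude at room temperature, which illustrates why a model calibrated on low-barrier Li+ conductors is operating far outside its training regime when applied to multivalent hosts.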
Encouragingly, the research landscape is actively addressing these challenges. Strategies such as transfer learning, unsupervised learning, and advanced data augmentation techniques are being developed to combat data limitations. Physics-informed machine learning and the pursuit of universal descriptors and interatomic potentials aim to enhance model transferability and generalization. Explainable AI methods are beginning to shed light on the complex structure-property relationships learned by ML models, fostering trust and guiding scientific intuition. Furthermore, generative models, including VAEs, GANs, EAs, and diffusion models, are showing increasing promise in proposing novel SSE candidates from scratch, while sophisticated multi-objective optimization algorithms are helping to navigate the intricate trade-offs inherent in materials design. The most transformative advances, however, are emerging from hybrid frameworks that tightly integrate ML predictions with high-fidelity computations (like DFT) and, crucially, experimental validation, often within automated, closed-loop “predictive synthesis” pipelines.
This review provides several distinctive contributions that advance the field beyond existing literature. We present the first systematic mapping of five interconnected challenges with corresponding emerging solutions, providing a strategic roadmap for practitioners. Unlike previous reviews that predominantly focus on Li-ion systems, we emphasize the critical data gap for multivalent systems and provide specific strategies for addressing this limitation through transfer learning and physics-informed approaches. We uniquely bridge conventional computational methods with cutting-edge ML techniques, demonstrating how hybrid workflows can overcome individual limitations while leveraging complementary strengths. Rather than merely surveying techniques, we provide actionable recommendations for data collection priorities, validation strategies, and best practices for applying explainable AI methods to materials discovery.
To further propel AI-accelerated SSE innovation, future research should prioritize several key directions. The development of next-generation multi-objective optimization algorithms that can simultaneously optimize ionic conductivity, electrochemical stability, mechanical properties, and synthesizability while incorporating real-world constraints represents a critical need. Physics-informed universal models that embed fundamental physical laws governing ionic transport and electrochemical stability directly into model architecture require immediate attention. These must learn temperature-dependent behavior, incorporate many-body interactions, and predict interfacial stability through first-principles constraints.
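The core operation behind most multi-objective screening is a non-dominated (Pareto) filter, sketched below. The candidate names and scores are hypothetical, all objectives are assumed to be expressed so that larger is better, and real workflows would add constraints and an optimizer on top of this filter.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# Hypothetical scores: (log10 ionic conductivity in S/cm,
#                       electrochemical stability window in V,
#                       synthesizability score in [0, 1]).
candidates = {
    "cand_A": (-3.0, 4.5, 0.8),
    "cand_B": (-2.5, 3.0, 0.6),
    "cand_C": (-3.5, 4.0, 0.7),  # worse than cand_A on every objective
    "cand_D": (-2.5, 3.0, 0.5),  # ties cand_B twice, loses on synthesizability
}

front = pareto_front(candidates)
print(sorted(front))
```

Here `cand_A` and `cand_B` survive because each wins on at least one objective; choosing between them is the trade-off that the filter exposes but does not resolve.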
Robust uncertainty quantification methods for ML predictions, particularly when extrapolating to novel chemical spaces, represent another urgent priority. Cross-domain transfer learning protocols must be established to enable knowledge transfer between different ion types and between computational and experimental domains. Several fundamental research questions require immediate investigation: How can we systematically quantify and improve model transferability across different crystal structure families and ionic species? What are optimal strategies for incorporating experimental uncertainty into ML training datasets? How can we develop models that predict long-term degradation and interfacial evolution beyond static property prediction?
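One common way to flag unreliable extrapolation is to use disagreement within an ensemble as an uncertainty estimate. The sketch below uses a bootstrap ensemble of straight-line fits on synthetic data; the descriptor, the data, and the choice of a linear model are all assumptions made for illustration, and the review does not prescribe this particular method.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares line through (x, y) points; None if degenerate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return None  # every resampled point shares the same x
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_ensemble(points, n_models=200, seed=0):
    """Fit one line per bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        fit = fit_line(rng.choices(points, k=len(points)))
        if fit is not None:
            models.append(fit)
    return models


def predict_with_uncertainty(models, x):
    """Ensemble mean and standard deviation of the predictions at x."""
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


# Synthetic training data on a hypothetical descriptor confined to [0, 1].
data_rng = random.Random(1)
train = [(i / 19, 2.0 * (i / 19) - 1.0 + data_rng.gauss(0.0, 0.1)) for i in range(20)]

models = bootstrap_ensemble(train)
mean_in, std_in = predict_with_uncertainty(models, 0.5)    # inside training range
mean_out, std_out = predict_with_uncertainty(models, 5.0)  # far outside it
print(f"in-domain std: {std_in:.3f}, extrapolation std: {std_out:.3f}")
```

The ensemble spread widens as the query moves away from the training range, which is the qualitative signal a screening pipeline can use to route a candidate to DFT or experiment instead of trusting the prediction. Ensemble spread is not automatically calibrated, so it should be checked against held-out errors before being used as a confidence interval.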
The practical implementation of these advances requires immediate action across multiple fronts. A concerted community-wide effort is essential to build FAIR (findable, accessible, interoperable, reusable)226 databases that encompass multivalent systems and include comprehensive interfacial property data with standardized metadata. The integration of automated synthesis platforms specifically designed for SSE discovery represents a transformative opportunity, requiring real-time characterization capabilities and automated feedback loops. Comprehensive validation workflows for generative models must include thermodynamic stability screening, kinetic accessibility assessment, and rapid experimental validation using automated characterization techniques.
Future experimental and computational campaigns should prioritize multivalent systems with intermediate ionic radii, materials exhibiting mixed ionic-electronic conductivity, and interfacial properties that remain underrepresented in current databases. The establishment of industry-academic partnerships will be crucial for scaling promising discoveries to commercial applications, while advanced generative models must be refined to ensure chemical validity, thermodynamic stability, and practical synthesizability of the proposed candidates.
The path forward for revolutionizing SSE development lies in a deeply synergistic approach where machine learning realizes its transformative potential through intimate integration with fundamental domain knowledge from physics and chemistry, rigorous computational modeling, and iterative experimental validation. As these integrated intelligence frameworks mature, particularly those enabling autonomous closed-loop discovery, the pace of innovation in solid-state electrolytes is poised for significant acceleration, bringing the promise of safer, more energy-dense, and longer-lasting battery technologies closer to reality.
This journal is © The Royal Society of Chemistry 2026