Diab Khalafallah †*ab, Fuming Lai †c, Hao Huang c, Jue Wang a, Xiaoqing Wang d, Shengfu Tong *c and Qinfang Zhang *a
aSchool of Materials Science and Engineering, Yancheng Institute of Technology, Yancheng, 224051, P. R. China. E-mail: qfangzhang@gmail.com
bMechanical Design and Materials Department, Faculty of Energy Engineering, Aswan University, P.O. Box 81521, Aswan, Egypt. E-mail: diab_khalaf@energy.aswu.edu.eg
cSchool of Sustainable Energy Materials and Science, Jinhua Advanced Research Institute, Jinhua, 321015, Zhejiang, China. E-mail: sftong@nju.edu.cn
dCollege of Materials and Chemical Engineering, Chuzhou University, Huifeng West Road 1, Chuzhou 239000, China
First published on 12th June 2025
Electrochemical energy conversion and storage have attracted widespread interest as green and sustainable technologies. In particular, research on water electrolysis and supercapacitors (SCs) has experienced significant growth, focusing on novel electrodes/electrocatalysts with prominent performances. Recently, computational frameworks employing machine learning (ML) algorithms have revitalized the targeted design of advanced nanomaterials as electrodes/electrocatalysts with tunable electronic configurations and superior reactivity. Descriptor-based analysis has proven efficient in elucidating the structure–property (e.g., activity, selectivity, and stability) relationships, addressing the complex interactions between the catalytic surface and reactant species and predicting enormous data sets. In this contribution, we present an overview of ML-driven electrode/electrocatalyst design, highlighting several novel algorithms and descriptors. The latest advancements in ML approaches are presented to efficiently screen a wide range of metal-based materials. Leveraging recent achievements, this review describes the application of ML for the discovery of active and durable nanomaterials, including identifying active sites, manipulating compositions at the atomic level, predicting the structure/performance, and optimizing thermodynamic properties as well as kinetic barriers. Moreover, recent milestones and state-of-the-art progress in ML integration strategies-materials informatics to stimulate the design of highly efficient electrode/electrocatalyst systems for the hydrogen evolution reaction (HER), oxygen evolution reaction (OER), and SCs are discussed. Finally, we highlight potential future directions for uncovering the revolutionary potential of ML in boosting sustainability and prediction efficiency in the electrochemical energy conversion and storage sector. 
This review intends to reinforce the junctions between industry and academia and merge endeavors from fundamental understanding to technological execution.
Efficient electrocatalysts are key components in electrochemical water splitting, enhancing reaction kinetics and improving energy efficiency.17–20 Cost-effective, highly stable, and highly reactive compounds significantly contribute to electrocatalytic systems.21–25 Complexes based on precious metals remain the most effective electrocatalysts for numerous electrolysis processes, including HER (e.g., platinum “Pt”-based catalysts), OER (e.g., ruthenium “Ru”-, rhodium “Rh”-, and iridium “Ir”-based catalysts), ORR (e.g., Pt-based catalysts), and carbon dioxide reduction reaction (CO2RR, e.g., gold “Au”- and silver “Ag”-based catalysts).26–30 Nevertheless, the scarcity and high cost of noble metal electrocatalysts severely hinder their widespread application. Therefore, it is necessary to explore cost-effective and high-efficiency catalysts as alternatives to precious metal-based electrocatalysts. Electrochemical supercapacitors (ESCs) represent a novel category of energy storage technologies characterized by an extended cycle life and high energy density.31–35 They can swiftly accumulate charges electrostatically or faradaically at the electrode/electrolyte contact. ESCs are classified into three primary categories: electrostatic double-layer capacitors (EDLCs), which store charge through an electrostatic mechanism; electrochemical pseudo-capacitors, which rely on faradaic charge transfer; and hybrid SCs. Carbon, transition metal oxide and hydroxide, and conducting polymer-based materials have been commonly introduced as electrodes for SC applications. However, traditional methodologies develop candidate electrodes/electrocatalysts by trial and error, which entails high preparation costs, low efficiency, and extended manufacturing durations.
Furthermore, rapidly achieving enhanced electrodes and electrocatalysts from a diverse range of materials remains challenging, hindering widespread breakthroughs in sustainable green energy technologies. The reactivity and catalytic centers of metal- or alloy-based electrodes/electrocatalysts with various facets and Miller indices are significant.36 Furthermore, defects, vacancies, and localized dopants are often recognized as co-catalytic sites.23,37
Solid electrocatalysts and heterogeneous electrocatalysis significantly contribute to chemical engineering, renewable energy, and catalytic water decomposition. Understanding the reaction pathway in electrolysis is highly challenging due to the structural complexity of the catalyst and the close interaction between the catalytic surface and molecules during the reaction. Accordingly, researchers have conducted fundamental studies to establish structure–activity relationships, which can provide hypotheses for the development of novel electrocatalysts. However, the in situ screening of the structure of catalysts under the reaction conditions is challenging due to the constraints of characterisation approaches. In this case, atomic simulations utilizing quantum mechanical (QM) computation tools have gained prominence in supplementing experimental analyses and generating invaluable information to strengthen catalyst research.38–40 Over the past decade, artificial intelligence (AI)-assisted computational methodologies and their core principles have attracted widespread attention in both heterogeneous electrolysis and solid electrocatalysts, elucidating the reactive catalytic surfaces during operation, describing the underlying reaction mechanisms and rate-determining steps, predicting the reaction kinetics, uncovering insights for enhanced electrode/electrocatalyst design, and presenting prospects for well-established water electrolyzers.41–46 Advancements in computational technology enable high-throughput computations and theories to facilitate the rational design of advanced electrocatalysts via the inverse engineering of candidate material structures at low cost, with little need for expert input.
ML is a viable method for automating the development, processing, and interpretation of intricate electrode/electrocatalyst datasets, exhibiting enhanced attributes compared to conventional statistical methods. ML can effectively replace density functional theory (DFT) calculations, substantially minimizing expenses and shortening the development cycle.47–55 Moreover, ML algorithms identify sophisticated data-driven models to ascertain critical correlations between the characteristics of electrodes/electrocatalysts and the overall electrochemical efficiency (e.g., activity, specific capacitance, rate capability, stability, and selectivity).56–60 This progression has led to the prosperous establishment of efficacious design and screening guidelines for heterogeneous solid electrocatalysts with distinct properties. For instance, Singh et al. employed ML-driven high-throughput screening of metal atom (M = Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn)-intercalated g-C3N4/transition-metal dichalcogenide (TMD = MoS2, MoSe2, MoTe2, WS2, WSe2, WTe2) heterostructures to boost the HER activity.61 The ML models and comprehensive computations elucidated the correlation between catalyst characteristics and HER activity, as well as identified HER active spots. The strategically confined metal atoms within the heterostructure can significantly optimize the electrocatalytic activity. Qin and colleagues examined the ML random forest algorithm model to predict the OER activity of hydroxide catalysts with extensive doping capacity. The newly synthesized extremely reactive hydroxide catalyst Ni0.77Fe0.13La0.1 was experimentally developed and systematically optimized, achieving an ultralow overpotential of 226 mV at 10 mA cm−2.62 Additionally, Zhang et al. investigated the electronic structure and dynamic properties of electric double layer (EDL) microporous carbon via ML force field accelerated molecular dynamics. 
The obtained findings indicated that the electrode potential for the Na+ intercalation process can be reduced by improving the solvation configuration of ions at the micropore/electrolyte interface.63 The field of sustainable energy conversion and storage is developing at a rapid rate. As a result, the number of studies pertaining to water catalysis and electrochemical systems published has increased dramatically to date. This area of research should not forfeit the scope of a comprehensive evaluation of significant advancements and methodologies.
This review presents critical and thorough guidelines for materials scientists and engineers to enhance their understanding of ML, particularly in the development of high-efficiency nanomaterials for use as electrodes/electrocatalysts in HER, OER, and SCs. It emphasizes material-centric insights, illustrating the relation between material systems and their extensive applications and elaborating on the essential material discoveries. Specifically, we delve into prevalent ML algorithms and elucidate significant descriptors obtained from theoretical simulations or experimental outcomes, which serve as inputs for modeling various nanomaterials. This will enable materials researchers, engineers, and chemists to determine the critical criteria for predicting the overall performance of nanomaterials. It will also support hands-on practice for electrode/electrocatalyst researchers lacking ML knowledge by guiding their dataset processing. In this context, ML applications will become more accessible, assisting scientists, chemists, and engineers in conducting and understanding ML techniques efficiently. Leveraging the recent literature, our in-depth discussions delineate current advancements in the ML-aided design of electrodes/electrocatalysts for HER, OER, and SCs across varying proficiency levels. We also highlight future opportunities for the expansion of this rapidly evolving research domain, while identifying challenges that may reveal potential innovative solutions. Ultimately, this article examines the transformative impact of ML on the design of electrodes/electrocatalysts for HER, OER, and SCs, while also providing practical guidance tailored for nanomaterials scientists. It aims to offer an accessible, material-based focus and to support the paradigm shift to data-driven research methodologies in this domain and beyond.
In recent years, the emergence of large language models (LLMs), which employ advanced neural network architectures to analyze and produce human-like text, has elevated AI capabilities to unprecedented levels. ML algorithms can be classified into several categories based on the characteristics of the data they utilize and the types of problems they address. The key categories of ML algorithms include supervised, unsupervised, semi-supervised, reinforcement, and deep learning.66 A summary of the most commonly used ML algorithms in electrocatalysis and their comparative performance is presented in Table 1.61,67–70 Random forest (RF) performs robustly on small to medium electrocatalyst datasets with minimal hyperparameter tuning, while convolutional neural networks (CNNs) learn complex non-linear relationships but require much larger data volumes and careful regularization to avoid overfitting. Support vector regression (SVR) remains highly effective in data-scarce settings, and Gaussian process regression (GPR) matches or exceeds the accuracy of SVR, while also providing principled uncertainty estimates. However, the training cost of both SVR and GPR grows steeply (up to cubically) as the dataset size increases. Gradient-boosting decision trees (GBDT) strike a compromise by often delivering accuracy comparable to more complex models with built-in feature-importance measures for interpretability, although they demand longer training times and more extensive hyperparameter optimization.
Algorithm | Typical performance (MAE/RMSE/R²) | Application scenario | Advantages | Disadvantages | Ref. |
---|---|---|---|---|---|
Random forest (RF) | MAE ≈ 0.118; R² ≈ 0.957 | ΔGH adsorption-energy prediction, catalyst screening. | • Simple to implement, naturally resistant to overfitting. • Stable on small–medium datasets. | • Poor at extrapolating beyond training distribution. • Limited interpretability of complex patterns. | 61 |
GBDT | MAE ≈ 0.25; R² ≈ 0.87 | Single-atom/carbon-based SAC catalysis HER activity and stability prediction. | • Balances predictive performance with interpretability. • Robust to heterogeneous feature sets. | • Long training times. • Requires extensive hyperparameter tuning. | 67 |
CNN | R² ≈ 0.93 | Overpotential prediction, multi-property modeling. | • Powerful at fitting highly non-linear relationships. • Deep layers extract complex features. | • Requires large, labeled datasets. • Prone to overfitting; needs careful hyperparameter tuning. | 68 |
SVR | RMSE ≈ 0.24 eV | Band-gap prediction on small datasets. | • Excellent in high-dimensional, small-sample settings. • Flexible via kernel choice. | • Runtime scales poorly with dataset size. • Sensitive to hyperparameters. | 69 |
GPR | RMSE ≈ 0.14 eV; MAE ≈ 0.11 eV; R² ≈ 0.83 | GW-level band-gap prediction of functionalized MXenes. | • Built-in uncertainty quantification. • Highest accuracy on limited data. | • Training cost ∝ O(N³). • Kernel hyperparameters are critical. | 70 |
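The error metrics quoted in Table 1 (MAE, RMSE, and R²) can be computed from paired reference and predicted values. The following is a minimal sketch using only the Python standard library; the function names and the four sample values are illustrative placeholders and are not data from any of the cited studies.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Placeholder values for illustration only (e.g., energies in eV).
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.2, 2.8]

print(f"MAE = {mae(y_true, y_pred):.3f}, "
      f"RMSE = {rmse(y_true, y_pred):.3f}, "
      f"R2 = {r2(y_true, y_pred):.3f}")
```

MAE and RMSE carry the units of the predicted quantity, whereas R² is dimensionless, which is why values reported for different target properties are not directly comparable across rows of the table.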
Supervised learning algorithms are designed to learn a mapping between input data and the associated output labels.71 The training procedure uses labeled data to develop models capable of predicting outcomes for unseen data. Prominent supervised learning methods include linear regression, logistic regression, support vector machines (SVMs), decision trees, random forests, and k-nearest neighbors (KNN). Linear regression predicts a continuous value from independent variables by determining the line that minimizes the error between predicted and actual values. Logistic regression is used for binary classification tasks, calculating the probability that a given input belongs to a designated class using a sigmoid function. SVMs seek the hyperplane that separates data points of different classes with the maximum margin. Decision trees are fundamental models that split data into nodes according to feature values, whereas random forests constitute an ensemble of decision trees that enhances the predictive accuracy by mitigating overfitting. KNN is a non-parametric algorithm that determines the class of a data point by identifying the predominant class among its nearest neighbors.
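As a small illustration of the supervised setting described above, the sketch below implements a KNN classifier with a majority vote in plain Python. The two-dimensional feature vectors and the "low"/"high" labels are hypothetical stand-ins for descriptors and activity classes; they are not taken from the literature reviewed here.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data (two descriptors per sample).
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (1.0, 0.9)))
```

Because KNN stores the training set rather than fitting parameters, its prediction cost grows with the number of labeled samples, which is one reason tree ensembles are often preferred for larger datasets.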
Unsupervised learning algorithms function on unlabeled data, discerning intrinsic patterns and structures. These techniques are frequently designed for clustering and dimensionality reduction tasks.72 Prominent unsupervised learning algorithms include k-means clustering, hierarchical clustering, principal component analysis (PCA), and autoencoders. K-means is a widely utilized clustering technique that divides the dataset into k clusters according to the distance between data points and cluster centroids. Hierarchical clustering constructs a hierarchy of clusters, which can be represented as a dendrogram, facilitating study at multiple degrees of granularity. PCA is a dimensionality reduction method employed to decrease the number of features in a dataset, while preserving the maximum variance, hence facilitating visualization and minimizing computational expenses. Autoencoders are a type of neural network employed for unsupervised representation learning; they compress data into a latent space, and subsequently rebuild it, allowing the acquisition of essential properties of the input data. Semi-supervised learning occupies a position between supervised and unsupervised learning.73 It leverages a minimal amount of labeled data in conjunction with a substantial volume of unlabeled data to construct more efficient models. This method is advantageous when acquiring labeled data is costly or labor-intensive. Graph-based approaches and self-training are often utilized in semi-supervised learning.
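The k-means procedure described above alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its members. The following is a minimal sketch of that loop (Lloyd's algorithm), assuming caller-supplied initial centroids and using made-up two-dimensional points; practical work would typically rely on a library implementation with better initialization.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Made-up points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```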
Reinforcement learning (RL) draws inspiration from behavioral psychology, wherein an agent acquires knowledge through interaction with the environment and receives rewards or penalties.74 The objective is to acquire a policy that optimizes cumulative reward over time. Prominent algorithms in reinforcement learning include Q-learning, deep Q-network (DQN), and policy gradient techniques. Q-learning is an off-policy reinforcement learning method that evaluates the value of executing a specific action in a given state, depending on the rewards obtained. DQN enhances Q-learning by employing deep neural networks to estimate Q-values for various actions, enabling its application in intricate, high-dimensional situations. Policy gradient approaches such as REINFORCE and proximal policy optimization (PPO) directly optimize the policy by estimating gradients that enhance the probability of actions, resulting in substantial rewards.
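To make the Q-learning update concrete, the sketch below trains a tabular agent on a toy five-state chain in which only reaching the right-most state yields a reward. The environment, the reward, and the hyperparameters are assumptions chosen for illustration. The agent explores with a purely random behaviour policy, which is permissible because Q-learning is off-policy: the update target always uses the greedy value of the next state.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action_idx):
    """Toy deterministic chain: reward 1.0 only on reaching the last state."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)  # random behaviour policy (exploration only)
            nxt, reward, done = step(state, a)
            # Off-policy target: greedy value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q_table = q_learning()
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the learned greedy policy moves right in every non-terminal state, and the Q-value of moving right decays by the discount factor with each additional step from the goal.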
Deep learning, a branch of ML, emphasizes the utilization of multi-layered neural networks to acquire intricate data representations.75 Convolutional neural networks (CNNs),76 recurrent neural networks (RNNs),77 long short-term memory (LSTM) networks,78 and transformers79 are prominent architectures employed in many applications, including image classification, speech recognition, and natural language processing (NLP). LLMs have significantly transformed how machines comprehend and produce human-like text. These models are constructed using sophisticated deep learning architectures, particularly transformer-based neural networks. In the following section, we examine LLMs and their importance in the domain of AI. Transformers, presented by Vaswani et al. in 2017,80 provide the backbone of modern LLMs. The transformer architecture is founded on the principle of self-attention, enabling the model to concurrently focus on various segments of an input sequence. This mechanism is very effective for managing long-range dependencies in text. Transformers have replaced RNNs and LSTMs in many NLP tasks due to their superior ability to capture contextual relationships.
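The self-attention principle mentioned above can be sketched as scaled dot-product attention, in which each token's output is a softmax-weighted average of value vectors, with weights derived from query-key dot products. The example below is a simplified, single-head illustration in plain Python. As an assumption made for brevity, the queries, keys, and values are the raw input vectors themselves, whereas a real transformer obtains them through learned linear projections.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, weights

# Three toy "token" vectors; identity projections are assumed (Q = K = V = X).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(X, X, X)
for row in weights:
    print([round(w, 3) for w in row])
```

Every row of the weight matrix sums to one, and every token attends to all others in a single step, which is what allows the mechanism to capture long-range dependencies without recurrence.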
Generative pre-trained transformers (GPT)81 are a category of LLMs created by OpenAI. These models undergo pre-training on extensive text corpora by unsupervised learning, followed by fine-tuning for particular tasks. The GPT series, comprising GPT-2,82 GPT-3,83 GPT-4,84 and GPT-4o,85 has exhibited exceptional proficiency in text production, language translation, question answering, and other tasks. GPT-3, with 175 billion parameters, signifies a substantial progression in natural language processing. It is capable of zero-shot, one-shot, and few-shot learning, making it exceptionally adaptable for many applications. GPT-3 has been used in numerous real-world scenarios such as chatbots, content creation, and programming support. GPT-4o, the most recent and sophisticated model in the series, is anticipated to feature trillions of parameters, surpassing the scale of GPT-4. Utilizing state-of-the-art training methodologies and an even larger and more diverse dataset, GPT-4o is anticipated to excel in various domains, including sophisticated reasoning, real-time data assimilation, and multi-modal functionalities (e.g., image, video, and text processing). This model is expected to demonstrate exceptional fluency in both natural and technical language, hence improving its capacity to aid in intricate research, innovative problem-solving, and advanced professional activities. Bidirectional encoder representations from transformers (BERT), created by Google, is an important LLM that has profoundly influenced NLP.86 Unlike GPT, which is autoregressive and processes text unidirectionally, BERT is bidirectional, allowing it to simultaneously examine both left and right contexts. This bidirectional methodology enables BERT to understand the complete context of a word within a sentence, making it exceptionally proficient for tasks such as sentiment analysis, named entity recognition, and question answering.
LLMs have been utilized across various domains, ranging from customer service and healthcare to creative writing and software development. Nonetheless, these models also come with challenges such as the need for extensive computational resources, the risk of producing biased or detrimental information, and the complexities associated with assuring interpretability and transparency.
Alongside quantum mechanical insights, thermodynamic considerations are essential for improving materials used in SCs. SCs accumulate energy by electrostatic charge at the electrode–electrolyte interface, with their performance dictated by the thermodynamic relationship between electrode potential and ion concentration, as articulated by the Nernst equation.93 Recent advancements have focused on refining the porosity and surface chemistry of these materials to facilitate ion transport, therefore augmenting energy storage.94–96 Pseudocapacitive materials such as transition metal oxides and conducting polymers have garnered interest owing to their capacity to store charge through faradaic redox reactions, providing substantially greater capacitance than the conventional EDLC mechanism of typical carbon-based materials.97 The amalgamation of these pseudocapacitive materials with carbon-based electrodes has yielded hybrid SCs exhibiting enhanced energy and power densities, paving the way for developing more efficient energy storage solutions.
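The Nernst relationship between electrode potential and ion concentration referred to above can be evaluated numerically. The sketch below assumes the standard form E = E° − (RT/nF) ln Q, where Q is the reaction quotient (activities approximated by concentrations); the function name and the example inputs are illustrative only.

```python
import math

R = 8.314462618      # molar gas constant, J mol^-1 K^-1
F = 96485.33212      # Faraday constant, C mol^-1

def nernst_potential(e_standard, n_electrons, reaction_quotient, temperature=298.15):
    """Return E = E0 - (R*T / (n*F)) * ln(Q) in volts."""
    if n_electrons <= 0 or reaction_quotient <= 0:
        raise ValueError("n_electrons and reaction_quotient must be positive")
    return e_standard - (R * temperature) / (n_electrons * F) * math.log(reaction_quotient)

# Illustrative inputs only: a one-electron couple with E0 = 0 V and Q = 10.
shift_per_decade = nernst_potential(0.0, 1, 10.0)
print(f"{shift_per_decade * 1000:.1f} mV")
```

For a one-electron process at 25 °C, a tenfold change in the reaction quotient shifts the potential by roughly 59 mV, which gives a sense of how strongly ion concentration at the interface influences the electrode potential.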
Solid-state physics has greatly enhanced the development of materials for electrocatalysis and energy storage by manipulating their electrical conductivity and structural characteristics. Conductivity is a crucial determinant of electron transfer efficiency in electrochemical reactions, especially in water splitting, where efficient electron transport is vital for optimal catalyst performance. Metal oxides such as ruthenium oxide (RuO2) and iridium oxide (IrO2) are frequently employed as OER catalysts owing to their exceptional conductivity and stability in harsh electrochemical environments.98–101 However, at elevated anodic potentials, the dissolution of IrO2 may result in performance impairment (Fig. 2A).
Fig. 2 (A) Free-energy profile (ΔG) of Ir dissolution from the IrO2(110) surface under OER conditions. Reproduced with permission from ref. 98, copyright 2022, the American Chemical Society. (B) Polarization curves, (C) Tafel plots, (D) Nyquist plots, and (E) chronopotentiometry measurements at 10 mA cm−2 of La-doped RuO2 and RuO2. Reproduced with permission from ref. 99, copyright 2020, Elsevier.
Surface-bound intermediates, such as IrO2OH and IrO3, generated during the dissolution process markedly affect the catalytic activity, underscoring the intrinsic trade-offs between activity and stability. These observations highlight the significance of stability optimization in catalyst design. Doping procedures have been utilized to improve the conductivity and durability of catalysts in response to these problems. For instance, La-doping of RuO2 has demonstrated a reduction in charge transfer resistance, leading to enhanced reaction kinetics and prolonged performance under OER conditions (Fig. 2B–E). Likewise, the design of supercapacitor (SC) electrodes has benefited from solid-state principles, especially through the advancement of nanostructured materials, including nanowires, nanosheets, and nanorods. These structures enhance the surface area and ion accessibility, promoting accelerated charge/discharge rates and greater energy storage capacities. Furthermore, two-dimensional (2D) materials, such as graphene and molybdenum disulfide (MoS2), demonstrate distinctive electronic characteristics and high surface areas, making them versatile options for water electrolysis and SCs.102 Their structural flexibility enables functionalization, permitting the further optimization of their surface chemistry for improved electrochemical interactions. These integrated methodologies offer theoretical and empirical insights into the advancement of efficient and durable materials for energy applications.
The primary advantage of combining ML with DFT lies in leveraging the high accuracy of DFT computations with the fast inference and generalizability of ML models. The complete workflow for this strategy is illustrated in Fig. 3, which depicts the end-to-end process from DFT data generation and ML model training to iterative optimization through active learning. Researchers typically begin by generating reliable initial datasets using DFT, which include electronic structures, adsorption free energies, and reaction kinetic parameters.104,105 Based on this dataset, essential physical and chemical features are extracted, such as elemental composition, electronegativity, atomic radius, and coordination environment. Then, these features are used to train high-performance ML models, including random forest, support vector regression, and deep neural networks.106,107 These models capture complex nonlinear relationships between material descriptors and catalytic performance, enabling the accurate predictions of a wide range of unexplored materials.2
Fig. 3 Workflow for automating the discovery of theoretical materials using a combination of ML and DFT: (A) conventional experimental workflow based on testing and analysis; (B) automated DFT-based workflow incorporating scaling relationships and structure selection; (C) manual DFT screening process relying on expert intuition; and (D) ML-driven automated workflow including motif selection, structure generation, and candidate prediction within a closed-loop system. Reproduced with permission from ref. 36, copyright 2018, Springer Nature.
Subsequently, trained ML models are applied to predict the properties of large candidate material libraries. This enables the rapid identification of promising candidates, while significantly reducing the number of costly DFT computations required.108
The predicted top-performing candidates are further validated by DFT to ensure the reliability of the predictions. The new DFT data generated during this validation phase are used to update and retrain the ML models, establishing a closed-loop active learning framework that progressively enhances the model accuracy and generalization.109 In addition to computational improvements, experimental feedback is incorporated to validate ML predictions and synthesize promising candidates. Then, experimental results are used to refine both the DFT computations and ML models. This closed-loop interaction between theory and experiment substantially increases the success rate of materials discovery and accelerates advancements in electrocatalyst research.
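The closed-loop procedure described above (train a surrogate, predict over a candidate library, validate the most promising candidate with the expensive method, and retrain) can be sketched in a few lines. Everything in the example below is a hypothetical stand-in: `expensive_oracle` is a toy analytic function playing the role of a DFT calculation, the surrogate is a simple nearest-neighbour lookup rather than a trained ML model, and the acquisition rule is purely greedy.

```python
def expensive_oracle(i):
    """Toy stand-in for a costly DFT evaluation; lower is better."""
    x = i / 20                      # hypothetical one-dimensional descriptor
    return (x - 0.3) ** 2

def surrogate_predict(i, labeled):
    """1-nearest-neighbour surrogate; ties go to the smaller labeled index."""
    nearest = min(labeled, key=lambda j: (abs(i - j), j))
    return labeled[nearest]

def active_learning(pool, initial, rounds=5):
    # Seed the training set with a few oracle evaluations.
    labeled = {i: expensive_oracle(i) for i in initial}
    for _ in range(rounds):
        unlabeled = [i for i in pool if i not in labeled]
        if not unlabeled:
            break
        # Acquisition: pick the candidate the surrogate predicts to be best.
        pick = min(unlabeled, key=lambda i: (surrogate_predict(i, labeled), i))
        # Validation: evaluate it with the oracle and add it to the training set.
        labeled[pick] = expensive_oracle(pick)
    best = min(labeled, key=labeled.get)
    return best, labeled

best_idx, labeled = active_learning(range(21), initial=[0, 10, 20])
print(best_idx, len(labeled))
```

Only eight of the 21 candidates are evaluated with the oracle in this toy run, which mirrors the cost saving that motivates the approach. Practical implementations usually replace the greedy rule with an acquisition function that also accounts for the surrogate's uncertainty, so that unexplored regions are not neglected.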
ML methodologies have significantly contributed to revealing concealed relationships among structure, composition, and catalytic efficacy. By combining ML with DFT computations, researchers can efficiently evaluate an extensive array of candidate catalysts, predict their adsorption energies, and enhance their electronic architectures. ML algorithms have effectively discerned synergistic effects in multi-metal systems, wherein the optimum composition promotes the electronic conductivity and optimizes the intermediate adsorption energies.114,115 These studies expedite the discovery of high-performance catalysts and elucidate essential structural characteristics, including bond lengths, defect density, and ionization energies, that influence catalytic activity. The standard procedure for ML-guided catalyst tuning involves three main stages including data generation, model training, and feature importance analysis, as depicted in Fig. 4.114 The advancement of defect-engineered catalysts, especially those including oxygen and cation vacancies, has significantly enhanced catalyst optimization. Oxygen vacancies significantly reduce the reaction barrier by improving the adsorption of intermediates, as evidenced in systems such as CoFe oxides and NiFe-LDHs.116,117 These materials demonstrate enhanced electronic conductivity and tunable electron transport resulting from vacancy-induced modifications in their electronic structure. Multi-vacancy systems containing both cation and anion defects offer increased electrochemically active surface areas and diminished overpotentials, facilitating elevated current densities and prolonged stability under the reaction conditions. Analyses powered by ML have consistently predicted and validated these defect configurations, demonstrating their potential to further improve the catalytic performance.118
Fig. 4 ML approach for catalyst optimization: (A) workflow of the ML process including data generation, model training and testing, and feature analysis; (B) comparison between DFT-calculated Gmax values at η = 0.3 V and predictions from the best-performing gradient boosting regressor (GBR) model after four-fold cross-validation; (C) feature importance analysis of the GBR model for Gmax at η = 0.3 V. Reproduced with permission from ref. 114, copyright 2024, Elsevier.
Appropriate ML algorithms and high-throughput screening of materials are essential for overcoming the complexity of large search spaces and exploring them systematically. Lunger and colleagues introduced an ML approach to precisely predict the per-site characteristics of perovskite oxides for the OER, utilizing site-projected O 2p-band centers, Bader charges, and d-band centers, where faceting and elemental substitution were found to alter the local electronic structure.122 To achieve this objective, they developed a graph-based neural network model to investigate site-dependent characteristics, and subsequently predict the binding energies of the OER intermediates. They reported that per-site descriptors throughout the dataset exhibited considerable variation based on surface coordination and composition, and could be inexpensively predicted from the structure. The site-specific features that exhibit a linear correlation with OER binding energies could be adopted to modulate OER energetics. Through a comparison of the OER energetics from the created dataset with prior studies, the authors demonstrated the potential to tailor per-site characteristics of the active site via their theoretical framework. In a separate study, a covalency competition model guided by a random forest algorithm was developed to predict highly efficient OER electrocatalysts. A dataset of more than 300 spinel oxides was evaluated for structural and elemental characteristics.123 Using this ML model, researchers identified a promising spinel, [Mn]T[Al0.5Mn1.5]OO4, as an exceptionally effective OER catalyst (Fig. 5).
Fig. 5 (A) Comparative experimental and calculated electrocatalytic activities of spinel oxides; (B) random forest algorithm predicted Max(DT, DO) against DFT-calculated Max(DT, DO). The inset in (B) displays the deviation between both models. (C) Random forest model screened Max(DT, DO) of M0.5 (M = Zn, Al, Li, and Cu)-substituted CoCo2O4, MnMn2O4, and FeFe2O4. Reproduced with permission from ref. 123, copyright 2020, Springer Nature.
Chen and coworkers employed three distinct ML methods to explore high-performing HER catalysts. Various parameters, including catalyst, additive, electrolyte type, and support, were analyzed to determine their impact on the overpotential. The data analysis revealed that Pt and Mo metals at a ratio of 0.5 were the most reactive constituents, whereas heteroatomic N, S and nickel foam served as the ideal non-metallic elements and support, respectively.124 Considering this, the ML model predicted the HER performance of N, S-doped Pt@Mo2S3 in alkaline electrolyte, demonstrating a minimal overpotential of 33 mV. A high-throughput screening of 2D transition-metal dichalcogenides (TMDs) was conducted to identify high-performance HER electrocatalysts, considering the thermodynamic stability between phases, vacancy formation energy, hydrogen adsorption energy, and zero bandgap. In comparison to Pt(111), monolayer VS2 and NiS2, transition metal ion vacancies in ZrTe2 and PdTe2, as well as chalcogenide ion vacancies in CrSe2, TiTe2, VSe2, and MnS2 exhibited superior HER catalytic activity.125 The ML model provided an attractive catalytic descriptor for quantitatively evaluating the HER activity of 2D TMDs based on their local electronegativity and valence electron number, indicating a novel approach for catalyst engineering and activity optimization.
Another strategy for creating high-performance catalysts is to optimize their active sites and geometric structures.48,53 However, catalytic material simulations and experiments generally rely on trial and error, which makes them expensive, time-consuming, and inefficient. To boost catalytic performance through structural design, Ryan et al. constructed a deep neural network based on existing crystal structures, demonstrating its usefulness as an ML tool for examining huge crystallographic datasets.48 The model was applied to the Mn–Ge system after being trained on 51723 binary and ternary crystal structure templates. As a result, 36 more compositions in the Li–Mn–Ge ternary system were discovered, together with four compositions with high-probability structures. This method facilitates the synthesis and discovery of novel materials, especially for complex multi-element systems.
Using only readily available intrinsic properties, a universal and interpretable descriptor model (called ARSC) was adopted to unify activity and selectivity prediction for multiple electrocatalytic reactions (i.e., O2/CO2/N2 reduction and O2 evolution reactions). This model decoupled the atomic property (A), reactant (R), synergistic (S), and coordination (C) effects on the d-band shape of dual-atom sites, verifying the importance of the degree of d-orbital overlap in reactions at dual-atom sites. Owing to this descriptor, the authors could quickly identify highly active dual-atom sites for a variety of reactions and products instead of performing more than 50000 high-throughput DFT computations. The universality of this model was supported by abundant published research and subsequent experiments, and it predicted Co–Co/Ir–Qv3 to be the optimal bifunctional oxygen reduction reaction (ORR)/OER catalyst. Indeed, the as-synthesized Co–Co/Ir–Qv3 experimentally achieved a low overpotential of 330/340 mV at 10 mA cm−2 for OER, in addition to a remarkable half-wave potential of 0.941/0.937 V for ORR.55
To date, the origins of the activity of single-atom catalysts (SACs) are still obscure, which makes their rational design difficult. To address this, Liu et al. applied supervised learning to examine synchrotron spectroscopic data and interpret the catalytic mechanism of graphene-supported Co SACs for HER.53 In line with DFT predictions, the ML model showed that the active centers were the Co edge sites, which include zigzag and armchair topologies. Guided by these observations, the researchers created edge-rich Co SACs, which ultimately outperformed standard Pt/C catalysts at high current densities. This study highlights the potential of ML in catalyst characterization and structure–performance correlation analysis.
Additionally, Fe@MoS2 was proposed as a potential electrocatalyst candidate for the nitric oxide reduction reaction by combining ML techniques with DFT, obtaining a low limiting potential of −0.52 V. Under local alloying conditions, Fe@MoS2 exhibited high resistance to electrochemical corrosion with desirable thermodynamic stability. More significantly, by exploiting the inherent steric barrier of the defective MoS2 caused by the surrounding S atoms, competing complex electroreduction reaction pathways were steered in the desired direction. Using a random forest regression (RFR) model, the relationship between the atomic configurations and the macroscopic characteristics of the materials was evaluated.56
2D TM carbides and nitrides have emerged as potential HER electrocatalysts owing to their plethora of active surface functional groups and tunable electrical conductivity. However, their single type of active site and poor reaction kinetics restrict their extensive usage in catalytic reactions. To tackle this issue, TM dopants can be introduced to change the catalytic characteristics and produce highly reactive compounds. Given this, a series of TM atom-doped Ti3CNO2 and Zr2HfCNO2 was subjected to high-throughput computations to examine the effects of the local structure and corresponding electronic structure modifications on their HER properties. Additionally, the site identification features were used to train a multisite prediction model that predicted the trend of the hydrogen adsorption Gibbs free energy (ΔGH*), reaching a final model accuracy of R2 = 0.97. The results demonstrated the role of Nb, Sc, Rh, W, Ti, and V dopant atoms in boosting the catalytic activity of M′2M′′CNO2, yielding ΔGH* < 0.2 eV for more than 38 M′2M′′CNO2 structures. Hence, this study confirmed the importance of dopant atoms in reinforcing the catalytic activity of M′2M′′CNO2.60
Using computational algorithms, ML analysis for SCs estimates the performance of various compounds based on their inherent characteristics. The reversible redox reactions of pseudo-capacitive components enhance the specific capacitance and overall efficiency. In addition, heteroatomic dopants further boost the charge storage capability of carbonaceous framework electrodes by providing additional active sites through synergistic interactions. With the help of ML, Wang et al. identified oxygen (O)-rich active porous carbon electrodes for aqueous SCs using an artificial neural network (ANN) model. The overall percentages of nitrogen (N) and O dopants served as the compositional features in this investigation, while the surface areas of micropores and mesopores served as the structural features. SC performance data for N/O co-doped activated carbon-based electrodes in both 6 M KOH and 1 M H2SO4 electrolytes were gathered to build the training database. The ANN predicted a large capacitance for the N/O co-doped activated carbon electrode in 1 M H2SO4, arising from the combined contribution of 1502 m2 g−1 micropore surface area, 687 m2 g−1 mesopore surface area, 20 at% O doping, and 0.5 at% N doping. The high O doping level in 1 M H2SO4 could considerably boost the specific capacitance, which was ascribed to the advantageous electrode surface wetting and regulated electronic conductivity.34
Fig. 6 Closed-loop data-driven electrocatalyst design workflow. Reproduced with permission from ref. 127, copyright 2024, the Royal Society of Chemistry.
Generative adversarial networks trained on the ICSD database expand the chemical search space by proposing new inorganic compositions; over 92% of generated structures satisfy validity checks, and more than 84% maintain charge neutrality, thereby enriching candidate libraries.128 Models pretrained on DFT data are refined with limited experimental measurements via transfer learning, while active learning algorithms identify the most uncertain predictions quantified by Monte Carlo Dropout or ensemble variance to guide additional experiments or high-level calculations. This targeted sampling strategy can reduce overall prediction error by approximately 20% with only a 10% increase in data volume.36 Robust model evaluation employs stratified k-fold cross-validation to preserve the proportions of material classes and performance ranges across folds. Class imbalance is addressed through SMOTE oversampling or cost-sensitive weighting, and descriptor reduction techniques such as LASSO regression or principal component analysis eliminate redundant or collinear features, thereby improving the interpretability and stability.129
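The uncertainty-guided sampling described above can be sketched in a few lines. This is a minimal illustration, not the workflow of the cited studies: it assumes a random forest as the ensemble, uses the variance across its trees as the uncertainty estimate (in place of Monte Carlo Dropout), and runs on synthetic descriptors. The function name `select_most_uncertain` and all data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_most_uncertain(model, X_pool, n_select):
    """Return indices of the pool samples with the largest ensemble variance."""
    # Predictions of every tree in the forest: shape (n_trees, n_pool)
    per_tree = np.stack([tree.predict(X_pool) for tree in model.estimators_])
    variance = per_tree.var(axis=0)
    # Sort by variance, largest first, and keep the top n_select
    return np.argsort(variance)[::-1][:n_select]


rng = np.random.default_rng(0)

# Synthetic stand-ins: a small labeled set and a larger unlabeled pool
X_labeled = rng.uniform(-1, 1, size=(40, 3))
y_labeled = X_labeled[:, 0] ** 2 + 0.5 * X_labeled[:, 1]
X_pool = rng.uniform(-2, 2, size=(200, 3))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_labeled, y_labeled)

# Candidates to send to experiment or higher-level calculation next
query_idx = select_most_uncertain(model, X_pool, n_select=20)
print(query_idx[:5])
```

In a closed loop, the selected candidates would be labeled, appended to the training set, and the model refitted before the next selection round.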
Fig. 7 (A) GBDT model. Reproduced with permission from ref. 67, copyright 2022, the Royal Society of Chemistry; (B) optimized atomic configuration of single-atom-anchored MXenes. Neither Cr nor Mn was investigated for single-atom implanting, and C was not studied for the surface termination. Reproduced with permission from ref. 131, copyright 2022, Wiley-VCH.
Presently, ML approaches enable the construction of reliable surface Pourbaix diagrams for realistic nanoparticle sizes. Bang et al. developed a bond-type embedded crystal graph convolutional neural network trained on DFT adsorption-energy differences for O and OH on Pt nanoparticles up to 6525 atoms.132 Their BE-CGCNN model reproduced experimental surface-phase boundaries for Pt–O and Pt–OH coverages within 0.1 eV and delivered Pourbaix diagrams of real-scale nanoparticles at a fraction of the DFT cost. Data-driven kinetic prediction has also been applied in accelerated degradation testing. Wang et al. combined a mechanistic OER degradation model with Bayesian data assimilation of current–time data. By assimilating only the first 300 h of electrolysis, their framework accurately predicted the catalyst lifetime approaching 1000 h with less than 10% error, reducing a multi-thousand-hour test to a fraction of the time.133
Highly reactive oxide electrocatalysts typically demonstrate instability due to rapid ion exchange and defect formation. These materials may undergo dynamic compositional and structural alterations under severe operational conditions. Thus, balancing their performance and stability has emerged as a significant area of research. In this regard, Jeong et al. proposed the weighted mean of cation electronegativities to characterize the covalency of AB2O4 spinel oxides, confirming its role as a valuable shortcut for catalyst design and a descriptor for assessing stability, catalytic activity, and predicting reaction mechanisms (Fig. 8A–G).49 Compositions with a descriptor value exceeding 0.96 demonstrated good stability. Those with a value below this threshold revealed partial breakdown, which severely impairs their performance. Conversely, compositions with a descriptor value close to 0.96 could fulfil ideal requirements, providing suitable structural flexibility to maintain their active sites and stability. The adsorbate evolution mechanism (AEM), which includes the reaction intermediates HO*, O*, and HOO*, was proposed for oxide electrocatalysts facilitating the OER in alkaline media through four sequential single-electron charge-transfer steps.
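The four single-electron steps of the AEM are commonly turned into a theoretical overpotential with the computational hydrogen electrode analysis. The sketch below assumes that standard treatment, which the text above does not spell out: adsorption free energies of HO*, O*, and HOO* are given in eV relative to H2O and H2, the overall reaction costs 4.92 eV, and the equilibrium potential is 1.23 V. The function name and the numerical inputs are hypothetical.

```python
EQUILIBRIUM_POTENTIAL_V = 1.23  # O2/H2O equilibrium potential (assumed reference)
TOTAL_FREE_ENERGY_EV = 4.92     # 4 x 1.23 eV for the overall four-electron reaction


def oer_overpotential(dg_oh, dg_o, dg_ooh):
    """Return (overpotential in V, index of the limiting step, 1-based).

    dg_oh, dg_o, dg_ooh are adsorption free energies (eV) of HO*, O*, and HOO*.
    """
    steps = [
        dg_oh,                          # * -> HO*
        dg_o - dg_oh,                   # HO* -> O*
        dg_ooh - dg_o,                  # O* -> HOO*
        TOTAL_FREE_ENERGY_EV - dg_ooh,  # HOO* -> * + O2
    ]
    limiting = max(steps)
    return limiting - EQUILIBRIUM_POTENTIAL_V, steps.index(limiting) + 1


# Hypothetical free energies for one active site
eta, step = oer_overpotential(dg_oh=1.0, dg_o=2.8, dg_ooh=4.2)
print(f"overpotential = {eta:.2f} V, limiting step = {step}")
```

An ideal catalyst in this picture has all four steps equal to 1.23 eV, which gives zero overpotential.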
Fig. 8 (A) Predicted stability and catalytic activity of spinel oxide electrocatalysts versus …
Reaction kinetics, which include the reaction rates and rate constants, are essential for optimizing electrocatalytic processes. Conventional methods of kinetic modeling typically necessitate comprehensive experimental datasets and depend on microkinetic models that demand considerable manual intervention. These techniques are limited in their capacity to consider the intricate interdependencies of environmental parameters, including surface charge distribution, pH, and electric fields, which influence catalyst activity.138 ML mitigates these issues by automating the prediction of the reaction kinetics via data-driven methodologies. Yue et al.139 conducted first-principles calculations to examine the influence of defect charges on the electrocatalytic performance of transition metal (TM = Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Ru, Rh, Pd, Ag) single atoms coupled with a PtSe2 monolayer (TM@PtSe2), identifying the rate-determining steps of OER and ORR (Fig. 9). The findings showed that Pt-rich environments could strengthen the confinement of TM atoms on PtSe2 and confirmed the stability of 29 types of TM@PtSe2 in various charge states. The Pd˙@PtSe2 (ηOER/ORR = 0.31/0.43 V) and Pd×@PtSe2 (ηOER/ORR = 0.36/0.74 V) configurations demonstrated superior electrocatalytic performance owing to their low overpotentials, low formation energies, and stable structures across a wide Fermi level range. The charge states of TM@PtSe2 significantly influenced the establishment of bifunctional OER/ORR catalysts with lower overpotentials by optimizing the interaction strength between the TM@PtSe2 catalysts and oxygenated intermediates. The ML models elucidated the origin of activity in OER/ORR processes. Primary descriptors such as ionization energy, electronegativity, number of TM d-electrons, d-band center, and electron affinity were utilized to assess the adsorption behavior.
This highlights the impact of defect charges on electrochemical processes, offering a theoretical framework for exploring efficient bifunctional OER/ORR electrocatalysts.
Fig. 9 Reaction mechanisms of OER and ORR on transition metal-based catalysts. Reproduced with permission from ref. 139, copyright 2024, the Royal Society of Chemistry.
Xu et al.140 highlighted the use of ML techniques to simulate reaction rate constants for HER and OER processes. Their research illustrated how ML algorithms can capture variations in catalyst performance across different environmental conditions, offering theoretical insights for catalyst improvement. Moreover, ML methods have been employed to automate the enumeration of reaction pathways, thereby considerably reducing the dependence on human intuition. The application of ML in reaction kinetics goes beyond the prediction of rate constants. Graph neural networks (GNNs) have been employed to model the interactions between reactants and the catalyst surface, considering structural and electronic aspects that affect the reaction kinetics.141 These models have demonstrated significant efficacy in HER and OER investigations, where surface morphology and electronic environments are critical for catalytic performance.
ML techniques such as neural networks, support vector machines (SVMs), random forests, and GNNs are powerful tools for predicting electronic properties from extensive datasets. Among them, GNNs are particularly adept at modeling intricate atomic interactions and electronic characteristics (e.g., bandgap and DOS). Recent findings indicate that oxygen functional groups (OGs) markedly improve the catalytic efficiency of M–N4–C SACs for HER and OER. Specifically, incorporating functional groups such as OH, COOH, and C–O–C into M–N4 structures (e.g., Fe or Co centers) modifies essential features such as the d-band center and electron affinity, which are vital for promoting the catalytic activity (Fig. 10A and B).143 The integration of ML with DFT allows an in-depth examination of these effects, yielding significant insights into HER/OER activity and aiding in the development of high-performance catalysts. Moreover, manipulating metal centers and substituents together with doping 5d transition metals into graphitic carbon nitride underscores the significance of the d-band center, M–O/M–H interactions, charge states, and defect formation energies in enhancing bifunctional OER/ORR catalysts (Fig. 10C and D).144 Likewise, combining DFT and ML for TM-doped C3B monolayers has proven effective in identifying key descriptors such as the number of d electrons, electronegativity, first ionization energy, and atomic radius, which are vital for enhancing the HER and OER activities (Fig. 10E and F).145 By doping 3d metals (Ti, V, Cr, Mn, Fe, Co, Ni, Cu, and Zn), 4d metals (Zr, Nb, Mo, Ru, Rh, Pd, and Ag), and 5d metals (Ta, W, Re, Os, Ir, Pt, and Au) with diverse structural and electronic properties into a C3B monolayer structure, numerous metal boride catalysts were created and their possible use for HER/OER was further investigated. Fe-, Ag-, Re-, and Ir-doped C3B showed outstanding HER performances with their ΔGH* values approaching 0.00 eV.
In addition, the Ni- and Pt-doped C3B exhibited remarkable OER activities, with overpotentials of less than 0.44 V, among the TM atoms considered. In conjunction with their low overpotentials for the HER process, Ni/C3B and Pt/C3B were suggested as possible bifunctional electrocatalysts for water splitting. These achievements demonstrate the collaborative potential of ML and DFT in accelerating catalyst development and discovery.
Fig. 10 Integration of ML and DFT in catalyst design. (A) and (B) Impact of oxygen functional groups (OH, COOH, and C–O–C) on the d-band center and catalytic performance of M–N4–C catalysts. Reproduced with permission from ref. 143, copyright 2024, the Royal Society of Chemistry. (C) and (D) Effect of 5d transition metal doping on OER and ORR overpotentials and key electronic properties. Reproduced with permission from ref. 144, copyright 2024, Elsevier. (E) and (F) Key descriptors influencing HER/OER activities in TM-doped C3B monolayers identified through ML and DFT. Reproduced with permission from ref. 145, copyright 2023, the American Chemical Society.
Materials databases such as OQMD,146 Materials Project,147 and AFLOWlib148 are essential resources that offer extensive information on the electronic structures of thousands of materials. When combined with ML algorithms, these databases enable the identification of materials with optimal electronic structures for specific applications, such as strengthening the catalytic properties for water splitting and improving the performance of SCs. For example, studies on dual transition metal Janus-MXenes have shown that ML-assisted models can accurately determine key factors influencing their ORR and OER performance, such as cohesive energy, phonon dispersion, and electronic stability. This method not only reduces research costs but also accelerates the discovery of high-performance catalysts.149 The application of ML in materials discovery has advanced the boundaries of materials innovation, enabling the rapid development of more efficient and sustainable energy technologies. Bandgap prediction and optimization constitute a significant domain for ML in materials design. The bandgap not only dictates the conductivity of a material but is also intrinsically linked to its catalytic efficiency and energy storage capability. By employing ML models, researchers can rapidly predict the bandgap based on attributes such as atomic composition, crystal structure, and electron density. This approach has proven to be a robust alternative to conventional DFT-based techniques, which frequently encounter challenges such as high computational cost and underestimation of the bandgap. Priyanga et al.150 emphasized the capability of random forest models in predicting the nature of band gaps in perovskite oxides (ABO3) based on chemical composition, ionic radius, ionic character, and electronegativity. A Random Forest algorithm predicted the relationship between the bandgap characteristics and the composition of the perovskite oxide, achieving an accuracy of approximately 91%. 
The confusion matrices produced for various random states (Fig. 11A–E) demonstrated the reliability of this model in differentiating between direct and indirect band gaps, offering a robust framework for understanding the bandgap properties in complex oxide systems. ABO3 perovskites containing alkali metals such as Li, Na, K, and Rb exhibit the greatest likelihood of possessing a direct bandgap, regardless of whether these elements occupy the A or B site. The incorporation of alkaline earth elements such as Be and Mg, or d metals such as Sc and Fe, significantly increased the predicted likelihood of a direct bandgap. According to the presented ML approach, NaPuO3 and VPbO3 were identified as promising candidates for solar cells owing to their optimized band gaps.
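Checking a classifier across several random states, as in the confusion matrices discussed above, can be sketched as follows. This is an illustrative setup on synthetic data, not the dataset or hyperparameters of the cited work: the four feature columns stand in for descriptors such as ionic radius or electronegativity, and the label rule is invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples = 400

# Synthetic descriptors and labels (1 = direct gap, 0 = indirect gap)
X = rng.normal(size=(n_samples, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n_samples) > 0).astype(int)

matrices = []
for seed in range(5):
    # A new stratified split and a new forest for every random state
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    matrices.append(confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1]))

for seed, matrix in enumerate(matrices):
    accuracy = np.trace(matrix) / matrix.sum()
    print(f"random_state={seed}: accuracy={accuracy:.2f}")
```

Similar matrices across seeds suggest that the reported accuracy does not hinge on one favorable train/test split.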
Fig. 11 Comprehensive analysis of ML models for bandgap prediction. (A)–(E) Confusion matrices of random forest for perovskite oxides. Reproduced with permission from ref. 150, copyright 2022, Elsevier. (F) and (G) SVR model performance on bandgap predictions of inorganic solids. Reproduced with permission from ref. 151, copyright 2018, the American Chemical Society. (H)–(O) GPR-based bandgap prediction of functionalized MXenes. Reproduced with permission from ref. 70, copyright 2023, the American Chemical Society.
Zhuo et al.151 demonstrated that SVR, applied after an initial classification of metals and nonmetals, can yield predictions markedly closer to experimental values than traditional DFT calculations. Their training set comprised 3896 experimentally reported band gaps covering 2458 distinct compositions, derived from diffuse reflectance, resistivity, surface photovoltage, photoconduction, and UV-vis measurements. Their research revealed a robust correlation between ML-predicted bandgaps and both experimental and DFT-calculated values, highlighting the effectiveness of ML in bandgap prediction (Fig. 11F and G). The SVR-based ML approach could forecast the bandgap utilizing a descriptor set based on the elemental properties of the constituent elements, which pertained to the relative location of the atom in the periodic table, its electronic structure, and its physical characteristics. These results underscore how the SVR model can overcome the conventional limitations of DFT, such as the underestimation of bandgap values. Building upon these advancements, Rajan et al.70 applied kernel ridge regression (KRR), SVR, GPR, and bootstrap aggregating regression algorithms to functionalized MXenes. A database of computed optimized structural and electronic attributes was established for this purpose. In addition, a metal–semiconductor classification model with 94% accuracy was designed. Readily accessible parameters of MXenes, including boiling and melting points, atomic radii, phases, and bond lengths, served as input features, attaining GW-level precision in bandgap predictions. In particular, the GPR model forecasted the bandgap within seconds with the lowest RMSE value of 0.14 eV. The models were more accessible given that they did not require prior knowledge of the Perdew–Burke–Ernzerhof bandgap or the positions of the conduction band minimum and valence band maximum as predictive variables.
The mean boiling point and standard deviation of melting point demonstrated a robust positive correlation with the GW bandgap, revealing the influence of constituent elements. This work effectively leveraged primary and compound features, as illustrated in scatter plots comparing the ML-predicted bandgaps with the true GW values. Consequently, these results underscore the accuracy and reliability of GPR in tackling the challenges of bandgap prediction across diverse material systems (Fig. 11H–O).
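The classify-then-regress idea described above (first separate metals from nonmetals, then regress the gap only for nonmetals) can be sketched as a small two-stage model. This is an illustrative arrangement under stated assumptions, not the published implementation: the class name `TwoStageBandgapModel`, the default SVC/SVR settings, and the synthetic descriptors are all hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR


class TwoStageBandgapModel:
    """Stage 1 classifies metal vs. nonmetal; stage 2 regresses nonmetal gaps."""

    def __init__(self):
        self.classifier = make_pipeline(StandardScaler(), SVC())
        self.regressor = make_pipeline(StandardScaler(), SVR())

    def fit(self, X, gap):
        is_nonmetal = gap > 0
        self.classifier.fit(X, is_nonmetal)
        # The regressor only sees compounds with a finite gap
        self.regressor.fit(X[is_nonmetal], gap[is_nonmetal])
        return self

    def predict(self, X):
        predicted = np.zeros(len(X))  # metals keep a gap of exactly zero
        nonmetal = self.classifier.predict(X).astype(bool)
        if nonmetal.any():
            predicted[nonmetal] = self.regressor.predict(X[nonmetal])
        return predicted


rng = np.random.default_rng(7)

# Synthetic descriptors; roughly half of the samples are "metals" (gap = 0 eV)
X = rng.normal(size=(300, 5))
gap = np.clip(1.5 * X[:, 0] + X[:, 1], 0.0, None)

model = TwoStageBandgapModel().fit(X, gap)
pred = model.predict(X)
rmse = float(np.sqrt(np.mean((pred - gap) ** 2)))
print(f"training RMSE = {rmse:.2f} eV")
```

Separating the two tasks keeps the many zero-gap entries from dominating the regression loss; a real study would report the error on held-out data rather than on the training set used here.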
Fig. 12 Relationship between ΔGOH* and overpotential for (A) OER and (B) ORR and (C) correlation among ΔGOH*, ΔGO*, and ΔGOOH*; (D) linear relationship between charge transfer (Ne) and ΔGOH*; (E) calculated d-band centers of different MBenes and (F) scaling relationship between εd and ΔGOH*. Reproduced with permission from ref. 152, copyright 2023, Elsevier.
Interpretable ML models offer further insights into variables affecting adsorption processes. Techniques such as SHapley Additive exPlanations (SHAP) and feature importance analysis help identify key features such as the d-band center and number of d-orbital electrons, which are essential for understanding and enhancing the catalytic activity. Liu et al.154 established that these descriptors are crucial for elucidating the adsorption behavior of bifunctional electrocatalysts, thereby offering a theoretical basis for the design of high-performance catalysts. This understanding has enabled the rational design of novel materials with tailored adsorption properties, hence improving the catalytic efficiency. The integration of ML models with high-throughput testing and autonomous laboratories has significantly accelerated catalyst discovery. Through a closed-loop optimization strategy, ML predictions are validated and progressively refined via experimental feedback, thereby considerably decreasing development time. The synergy between ML, DFT, and experimental methods establishes a robust framework for optimizing the adsorption energies and enhancing the performance of electrocatalysts. The advances in ML-assisted DFT simulations for evaluating adsorption energies in bifunctional catalysts highlight the importance of adopting a hybrid methodology for catalyst improvement, bridging computational predictions with experimental implementations.155
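Feature importance analysis of the kind mentioned above can be illustrated with permutation importance, used here as a simpler stand-in for SHAP because it ships with scikit-learn. The feature names, the synthetic target, and its coefficients are assumptions made for the example only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

feature_names = ["d_band_center", "n_d_electrons", "electronegativity", "noise"]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
# Synthetic adsorption energy that depends only on the first two features
y = 0.8 * X[:, 0] - 0.4 * X[:, 1] + 0.05 * rng.normal(size=300)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Shuffle one column at a time and record the average drop in the R2 score
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:>18s}: {importance:.3f}")
```

Unlike SHAP, permutation importance gives one global score per feature rather than a per-sample attribution, so it answers which descriptors matter overall but not how they act on an individual catalyst.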
ΔGH* is a crucial descriptor for evaluating the activity of HER electrocatalysts, given that it indicates the binding strength of hydrogen on the catalyst surface, a vital element for optimal catalytic performance. Research has consistently demonstrated that an ideal ΔGH* value close to zero balances hydrogen adsorption and desorption. The correlation between ΔGH* and HER activity in transition metal-doped metal phosphides (Fig. 13A and B) exemplifies this principle, indicating that catalysts with ΔGH* values approaching zero demonstrate an enhanced HER performance.165 The same trend is observed for TM-doped MXenes (Fig. 13C).158 In transition metal atom-intercalated g-C3N4/TMD heterostructures, DFT-calculated ΔGH* has been employed to identify excellent hydrogen adsorption sites. Jyothirmai et al.61 illustrated the use of RFR with a low MAE of 0.118 eV and a high R2 of 0.957 for precise ΔGH* prediction, markedly improving the accuracy and efficiency of HER activity prediction (Fig. 13D–F). This study demonstrated the substantial influence of chalcogenide choice and electron configurations on the stability of heterostructures and the interaction properties of substrates by introducing ML-driven high-throughput screening of TM atom-intercalated g-C3N4/MX2 (M = Mo, W; X = S, Se, Te). The essential parameters affecting the HER activity were clarified by the SHAP technique, including hydrogen adsorption on the C site, MX layer, S site, and the intercalation of TM atoms at the N site.
Important information for strategic catalyst design and optimization was provided by the ML model, which revealed that the hydrogen adsorption energies on the N site of the CN layer resulted in exceptional HER performances with high exchange current densities, especially in Sc and Ti-intercalated heterostructures.
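Screening by ΔGH* can be made concrete with a short sketch. It assumes the widely used approximation ΔGH* ≈ ΔEH + 0.24 eV, in which a constant lumps together zero-point energy and entropy corrections; the appropriate correction is system dependent and is not taken from the studies above. The 0.2 eV tolerance, the site labels, and the adsorption energies are likewise hypothetical.

```python
ZPE_ENTROPY_CORRECTION_EV = 0.24  # assumed constant correction, system dependent


def delta_g_h(delta_e_h):
    """Approximate hydrogen adsorption free energy (eV) from adsorption energy."""
    return delta_e_h + ZPE_ENTROPY_CORRECTION_EV


def screen_her_sites(adsorption_energies, tolerance_ev=0.2):
    """Keep sites with |dG_H*| below the tolerance, closest to zero first."""
    free_energies = {site: delta_g_h(e) for site, e in adsorption_energies.items()}
    selected = [(site, g) for site, g in free_energies.items() if abs(g) < tolerance_ev]
    return sorted(selected, key=lambda pair: abs(pair[1]))


# Hypothetical DFT hydrogen adsorption energies (eV) for three sites
candidates = {"site_A": -0.30, "site_B": -0.70, "site_C": 0.10}
shortlist = screen_her_sites(candidates)
print(shortlist)
```

In an ML-driven workflow, the adsorption energies would come from a trained regressor rather than from individual DFT runs, with the same thermoneutral criterion applied to the predictions.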
Fig. 13 Role of hydrogen adsorption free energy (ΔGH*) in HER electrocatalyst performance evaluation: (A) and (B) ΔGH* as a universal descriptor for HER activity in TM-doped metal phosphides. Reproduced with permission from ref. 165, copyright 2023, Elsevier; (C) trends of ΔGH* in TM-doped MXenes. Reproduced with permission from ref. 158, copyright 2023, Elsevier; (D)–(F) RFR-based ΔGH* predictions in TM-intercalated g-C3N4/TMD heterostructures. Reproduced with permission from ref. 61, copyright 2024, the American Chemical Society.
Alloy electrocatalysts can modulate the adsorption Gibbs free energy of hydrogen to efficiently lower the HER overpotential. The combination of various metallic elements modifies the density of d electronic states, thus attaining an almost ideal ΔGH* value, which is necessary for effective proton adsorption and hydrogen desorption. Catalytic activity improves and energy consumption diminishes when the HER overpotential is decreased. The active site availability and intrinsic activity are further enhanced by the synergistic effects (e.g., strain effects and electronic structure modulation) of alloy components. In this context, Zhou and coworkers established a high-throughput pathway to evaluate the adsorption energy of a catalyst surface and predict its final configuration, identifying 43 appropriate alloys as possible HER electrocatalysts. This can significantly expedite the identification of high-performance HER electrocatalysts (about 100 times faster than DFT). A promising AgPd candidate, beyond the ML dataset, was randomly chosen and rigorously examined using ab initio simulations in a realistic electrocatalytic environment, therefore validating the precision of the ML model and facilitating the identification of suitable structures for calculations.51
Electronic structural characteristics, including the d-band center and Fermi level, are essential for predicting the catalytic activity by describing the interaction between the catalyst and reactants. The electronic structural properties, such as local charge distribution and bonding states, are crucial in influencing the HER activity in SACs (Fig. 14A).160 The d-band center, a commonly utilized parameter for transition metal catalysts, elucidates the distribution of electronic states and affects the adsorption intensity of reactants on the catalyst surface (Fig. 14B).156 Charge transfer is a crucial element for understanding the electronic characteristics of catalysts during catalytic processes. In high-entropy alloys, the combined effects of charge transfer and mixing entropy enhance their stability and HER activity, underscoring their potential as effective HER catalysts (Fig. 14C).166 Additionally, in MBene materials, Bader charge analysis offers quantitative insight into their charge distribution and electron transfer during catalytic reactions. ML investigations have revealed a direct correlation between charge transfer and catalytic performance, as illustrated in the charge maps and activity volcano plots (Fig. 14D–F).167 In recent years, the combination and optimization of ML-based descriptors have emerged as an important direction in catalyst research, especially for the development and validation of descriptors utilizing DFT data. 
The integration of ML and DFT allows the efficient identification and optimization of descriptors.168 Research on doped MBenes and phosphides utilized structural and elemental characteristics as inputs, employing models such as support vector machine (SVM) and gradient boosting tree (GBT) to attain precise predictions of ΔGH*, thereby accelerating the screening of potential electrocatalysts.167 Recent research has established a multi-step ML workflow for forecasting the HER performance of 4500 varieties of MXenes, effectively selecting the most catalytically active materials.169 In Mo2C MXenes, electrostatic repulsion was recognized as a crucial element influencing HER kinetics, establishing a novel theoretical foundation for catalyst design.161 High-throughput screening employing random forest regression has demonstrated the superior HER performance of transition metal-intercalated g-C3N4/MX2 heterostructures.61 Furthermore, the influence of various metal compositions and cluster sizes on the HER performance in nanocluster electrocatalysts was elucidated by ML, helping to identify the optimal nanocluster combinations.162 Research through ML on supported heteroatom-doped metal compounds demonstrated the synergistic influence of electrolyte type, catalyst shape, and combinations of metals and nonmetals on their HER performance.164
Fig. 14 Integrative analysis of electronic structure descriptors, charge transfer, and catalytic performance for HER over diverse catalyst systems. (A) Electronic structure characteristics and HER activity over SACs. Reproduced with permission from ref. 160, copyright 2021, the Royal Society of Chemistry; (B) free energy profile and active site analysis for HER on the Z2-βGyNR system. Reproduced with permission from ref. 156, copyright 2023, Elsevier; (C) impact of charge transfer and mixing entropy on HER activity of high-entropy alloys. Reproduced with permission from ref. 166, copyright 2023, Wiley-VCH; (D)–(F) charge transfer distribution and catalytic performance correlation of MBene materials. Reproduced with permission from ref. 167, copyright 2023, Wiley-VCH.
Numerous case studies combining ML and DFT simulations demonstrate the significance of descriptors in the screening and optimization of HER catalysts. For example, in N-doped graphene-based dual-atom catalysts (NG-DACs), descriptors such as ΔGH* and local average electronegativity were employed to swiftly identify high-performance catalysts (Fig. 15A).170 Research on Ni-doped MXenes159 (Fig. 15B) and Pt-doped graphene systems160 demonstrated that descriptor-guided design markedly improved the catalytic efficacy. Furthermore, research on bimetallic MXenes has demonstrated the considerable impact of the outer metal atoms on hydrogen binding strength, with Mo2NbC2O2 exhibiting the lowest overpotential, indicating substantial promise as an HER electrocatalyst (Fig. 15C).163 ML-assisted screening of nanocluster catalysts has demonstrated the influence of various metal atom combinations on their performance, particularly at the nanoscale, where the synergistic effects of size and metal composition markedly impact their catalytic activity.162 These investigations illustrate that ML-enhanced high-throughput screening has substantial potential in catalyst design, markedly reducing the consumption of experimental and computational resources while expediting the development of effective HER catalysts.
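Descriptor-based HER screening of this kind often reduces to ranking candidates by how close ΔGH* lies to zero. The sketch below assumes the widely used approximation ΔGH* ≈ ΔEH + 0.24 eV for the zero-point and entropic correction; the site names, adsorption energies, and the ±0.2 eV acceptance window are hypothetical and chosen only for illustration.

```python
# Commonly used correction (zero-point energy and entropy) in eV; an assumption here.
ZPE_ENTROPY_CORRECTION_EV = 0.24

def delta_g_h(delta_e_h):
    """Approximate hydrogen adsorption free energy from the DFT adsorption energy (eV)."""
    return delta_e_h + ZPE_ENTROPY_CORRECTION_EV

def screen(candidates, window=0.2):
    """Keep candidates with |dG_H*| <= window (eV), best (closest to zero) first."""
    scored = [(name, delta_g_h(e_ads)) for name, e_ads in candidates.items()]
    hits = [(name, dg) for name, dg in scored if abs(dg) <= window]
    return sorted(hits, key=lambda pair: abs(pair[1]))

# Hypothetical adsorption energies (eV) for four candidate sites.
candidates = {"site_A": -0.30, "site_B": -0.85, "site_C": 0.40, "site_D": -0.20}
shortlist = screen(candidates)
for name, dg in shortlist:
    print(f"{name}: dG_H* = {dg:+.2f} eV")
```

An ML surrogate would replace the dictionary of DFT energies with model predictions, so that only the shortlisted sites need to be verified by DFT.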
Fig. 15 Comprehensive visualization of descriptor-based screening and optimization strategies for HER catalysts. (A) ML and DFT-assisted workflow for descriptor-based screening of NG-DACs. Reproduced with permission from ref. 170, copyright 2025, Elsevier; (B) adsorption free energy diagram (ΔGH*) of Ni-doped MXenes for HER performance enhancement. Reproduced with permission from ref. 159, copyright 2023, Wiley-VCH; (C) hydrogen binding strength and overpotential analysis of Mo2NbC2O2 and bimetallic MXenes with excellent HER activity. Reproduced with permission from ref. 163, copyright 2024, the American Chemical Society.
Fig. 16 (A) Conventional AEM and (B) LOM OER electrolysis in alkaline solutions; (C) dissolution of surface cations represents another OER electrolysis pathway caused by thermodynamic instability under OER catalysis conditions. Reproduced with permission from ref. 171, copyright 2021, the Royal Society of Chemistry.
Generally, electrocatalytic activity is linked to the catalyst composition and the experimental conditions. The oxidation states and structures of the catalyst also influence its efficacy, making electrolysis a multifaceted problem and the investigation of electrocatalysts a formidable challenge. Designing active catalysts by trial and error demands considerable prior knowledge and intuition. Catalysis informatics offers AI-driven, data-centric design to help researchers extract information, implicit guidelines, and concealed patterns associated with electrocatalysts and electrolysis, hence expediting the design of electrocatalysts. This section emphasizes the use of ML screening methods to identify high-efficiency electrocatalysts for the OER. Fig. 17A illustrates six essential factors to consider when applying X-ides (e.g., transition metal borides, carbides, pnictides, and chalcogenides) as electrocatalysts for OER electrolysis. Objectives must be specified before conducting the investigation. Materials databases and the available literature can be used to select the most suitable composition of TM X-ides. Researchers should predict the stability of TM X-ide compositions using Pourbaix diagrams and appropriate computational assessments. Thorough physicochemical characterization is essential for evaluating the composition of the material after conditioning and for monitoring its reconstruction during long-term testing (Fig. 17B).17
Fig. 17 (A) Proposed design considerations for evaluating TM X-ide-based electrocatalysts for OER catalysis; (B) analysis workflow for examining physicochemical transformations of TM X-ide electrocatalysts before, during, and after OER catalysis. Reprinted with permission from ref. 17, copyright 2023, the American Chemical Society.
The performance of 18 perovskites for OER catalysis was assessed at various current densities in an alkaline solution, yielding 1080 data points, with each measurement performed three times to ensure reproducibility. Symbolic regression revealed a linear relationship between the OER activity of the perovskites and the ratio of their octahedral and tolerance factors (Fig. 18A and B).176 On this basis, around 3000 theoretical structures were evaluated. Thirteen promising perovskites were synthesized, of which five were obtained in pure form, and four exhibited greater catalytic activity than previously identified perovskites, with no significant loss in activity over time. This work demonstrated both the experimental utility and the data-intensive nature of tightly coupled experiment/ML feedback loops.
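For reference, the two geometric quantities behind the μ/t descriptor can be computed directly from ionic radii using the standard Goldschmidt definitions, t = (r_A + r_X)/[√2 (r_B + r_X)] and μ = r_B/r_X. The sketch below uses SrTiO3 with approximate Shannon radii as an illustrative input; these radii are assumptions of this example and are not values reported in ref. 176.

```python
import math

def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t for an ABX3 perovskite (radii in the same unit)."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))

def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = r_B / r_X."""
    return r_b / r_x

# Approximate Shannon ionic radii in angstrom (illustrative assumption):
# Sr2+ (12-coordinate), Ti4+ (6-coordinate), O2- (6-coordinate).
r_sr, r_ti, r_o = 1.44, 0.605, 1.40

t = tolerance_factor(r_sr, r_ti, r_o)
mu = octahedral_factor(r_ti, r_o)
print(f"t = {t:.3f}, mu = {mu:.3f}, mu/t = {mu / t:.3f}")
```

Because both factors need only tabulated radii, a descriptor of this form can be evaluated for thousands of hypothetical compositions at negligible cost before any synthesis or DFT calculation.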
Fig. 18 (A) Density profile and Pareto front of mean absolute error versus the complexity of 8640 mathematical formulas; (B) OER catalysis onset potential against the octahedral/tolerance factors ratio (μ/t). The red and black dots mark the discovered and previously known perovskites, respectively. Reprinted with permission from ref. 176, copyright 2023, Springer Nature.
Furthermore, Hong et al. examined 14 descriptors of the metal–oxygen bond strength in perovskite materials through a literature review and additional analysis. Statistical methods, including linear regression and factor analysis, confirmed that employing multiple descriptors enhances the predictive accuracy and highlighted the critical influence of d-electron density, charge transfer energy, and structural characteristics (e.g., M–O–M bond angle and tolerance factor) on the electrocatalytic activity for OER. Compared with traditional single-descriptor methods, the resulting predictive models were better able to identify electron occupancy and covalency as the primary factors influencing the OER activity.177 Breaking the correlations among the adsorption energies of the OH*, O*, and OOH* intermediates could accelerate the sluggish OER process and aid the identification of new electrocatalysts. Recently, ML algorithms have been used to explore a variety of valuable and abundant metal oxides, including 2D compounds and perovskites, with promising results.178–181 Rohr et al. evaluated several ML models on a diverse range of pseudo-quaternary metal oxide datasets derived from high-throughput synthesis and electrochemical tests of OER activity. The linear ensemble, random forest, and Gaussian process algorithms behaved differently across three distinct research objectives, showing that performance depends on the research target rather than on a particular model.
The examination of various learning schemes, each comprising 2121 catalysts over four chemical windows, enabled a 20-fold research acceleration relative to random acquisition.182 Flores and colleagues investigated the electrochemical stability of iridium oxide polymorphs in acidic OER via an active learning-accelerated algorithm.183 From a dataset of 38000 entries, 196 polymorphs of IrO2 and 75 polymorphs of IrO3 were examined, with a focus on α–IrO3; the approach suggested a discovery rate roughly double that of random searching. The resulting structures indicated that octahedral local coordination environments are the low-energy configurations, and the Ir–H2O Pourbaix analysis identified α–IrO3 as the most stable phase.
Additionally, diverse TMD (MoX2/WY2, where X/Y = S, Se, Te) heterostructure electrocatalysts were predicted by integrating AI, specifically the LASSO method, with quantum mechanical calculations, employing PL (λ, θ, d, and l) as a universal activity descriptor. The PL descriptor combines the layer distance, rotational angle, bandgap ratio, and bond length to predict the catalytic efficacy of TMD heterostructures. Analyses of the free energy and binding energy can identify the more stable heterojunction configurations and elucidate the fundamental mechanisms of water-splitting electrocatalysis. The findings indicated that the MoTe2/WTe2 heterostructure rotated at 300° had exceptional electrocatalytic performance, achieving overpotentials of 0.03 V and 0.17 V for HER and OER, respectively, surpassing existing water-splitting electrocatalyst systems.91 LASSO regression identified the critical descriptors influencing the adsorption behavior, thereby improving the computational efficiency and reducing the dependence on time-consuming DFT calculations.
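The descriptor-selection role of LASSO follows from its L1 penalty, which drives the coefficients of uninformative descriptors exactly to zero. The following is a minimal pure-Python sketch of LASSO by cyclic coordinate descent; it is not the implementation used in ref. 91, and the two-descriptor dataset is hypothetical (the first column determines the target, the second is irrelevant).

```python
def soft_threshold(rho, alpha):
    """Soft-thresholding operator that produces exact zeros under the L1 penalty."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_fit(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes centred descriptors and target, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of descriptor j with the partial residual
            z = 0.0    # squared norm of descriptor j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w

# Hypothetical centred data: y depends only on the first descriptor (y = 2 * x0).
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
weights = lasso_fit(X, y, alpha=0.1)
print(weights)  # the second coefficient is driven exactly to zero
```

The surviving non-zero coefficients indicate which descriptors carry predictive weight, at the cost of a slight shrinkage of their magnitudes (here 1.95 instead of 2).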
The recent literature indicates that single- and dual-atom catalysts are highly valued for next-generation anodes and cathodes, achieving performance levels comparable to noble-metal-based electrocatalysts while remaining cost-effective. Furthermore, single- and dual-atom catalysts outperform their bulk counterparts owing to their quantum size effect and extensive electrochemically active surface area.184 Moreover, significant research endeavors have focused on enhancing atomic catalyst–support interactions using both traditional carbonaceous and transition metal supports. A novel carbonaceous electrocatalyst based on graphdiyne was developed, exhibiting advantageous electrical and structural stability due to its abundant sp– and sp2–hybridized carbon framework derived from diacetylene and benzene motifs, respectively, which enhanced its mechanical and chemical robustness.185 Lin et al. presented an effective ML framework for 104 graphene-modified metal–nitrogen–carbon (M–N–C) SACs to analyze the intrinsic patterns linking their physical properties and limiting potentials (UL), enabling accurate predictions of the HER/OER/ORR UL for 260 additional graphene-SACs containing metal-NxCy active centers.179 The DFT training data were obtained via an open-source RF technique from a prior study.186 Six descriptors of the OER catalysts were established: the number of d-orbital electrons “d”, the Pauling electronegativity of the metal atom “Em”, the average pKa value of the surrounding atoms “pKa”, the formation energy of the oxide “Hof”, the hydride formation enthalpy “Hxf”, and the cumulative Pauling electronegativities of the metal atoms “Es”.130 Among them, descriptor d was the most significant for the OER, followed by the Hxf and Hof descriptors.187 The accuracy analysis based on the mean square error evaluation produced a value of 0.021 for the OER after
confirming the available data points. The ML framework was then used to predict the performance of 260 M–N–C SACs by evaluating their UL values. It is important to note that the existing data for metal- and non-metal-based OER electrocatalysts, characterized by different descriptors, can be used to systematically design fully optimized OER electrocatalysts. A universal descriptor may be obtained with AI algorithms ranging from heuristic to meta-heuristic methods. The primary task is then to establish a procedure for selecting the appropriate algorithm.
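For context, the OER limiting potential and theoretical overpotential targeted by such models are commonly derived from the adsorption free energies of the OH*, O*, and OOH* intermediates in the four-step adsorbate evolution mechanism. The sketch below assumes the computational hydrogen electrode convention, in which the limiting potential is the largest step free energy divided by e and the equilibrium potential is 1.23 V; sign conventions for UL differ between studies, and the input energies are hypothetical. The second example uses the frequently cited scaling relation ΔGOOH ≈ ΔGOH + 3.2 eV.

```python
EQUILIBRIUM_POTENTIAL_V = 1.23
# Total free energy of 2 H2O -> O2 + 4 (H+ + e-) at standard conditions, in eV.
TOTAL_FREE_ENERGY_EV = 4 * EQUILIBRIUM_POTENTIAL_V

def oer_steps(dg_oh, dg_o, dg_ooh):
    """Free energy changes (eV) of the four proton-coupled electron transfer steps."""
    return [
        dg_oh,                          # * + H2O -> OH*
        dg_o - dg_oh,                   # OH* -> O*
        dg_ooh - dg_o,                  # O* + H2O -> OOH*
        TOTAL_FREE_ENERGY_EV - dg_ooh,  # OOH* -> * + O2
    ]

def oer_overpotential(dg_oh, dg_o, dg_ooh):
    """Theoretical overpotential (V): largest step divided by e, minus 1.23 V."""
    limiting_potential = max(oer_steps(dg_oh, dg_o, dg_ooh))  # one electron per step
    return limiting_potential - EQUILIBRIUM_POTENTIAL_V

# Ideal catalyst: all four steps equal 1.23 eV.
ideal = oer_overpotential(1.23, 2.46, 3.69)
# Hypothetical catalyst obeying dG_OOH = dG_OH + 3.2 eV with O* placed midway.
scaled = oer_overpotential(0.86, 2.46, 0.86 + 3.2)
print(f"ideal: {ideal:.2f} V, scaling-limited: {scaled:.2f} V")
```

The second case illustrates why breaking the OH*/OOH* scaling relation is attractive: as long as the 3.2 eV offset holds, the overpotential cannot fall below about 0.37 V in this simple picture.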
To boost the electrochemical behavior, it is essential to choose the proper electrode material and electrolyte. Novel electrode materials can be proposed theoretically by predicting their properties, but determining these properties via standard DFT analysis or experiments is costly and difficult. ML algorithms offer a feasible alternative for materials exploration owing to their ability to capture complex patterns and correlations from existing results. Although screened electrode materials may exhibit high performance and optimized reaction kinetics, the safety of ESCs is an additional concern, and predicting the degradation of SCs with ML algorithms is therefore essential for the entire system. The fundamental aim of ML models in ESCs is thus to construct structure–activity correlations via inexpensive and precise predictions. This section focuses on recent applications of ML models for monitoring material properties and designing materials for ESCs. For carbon-based SC electrodes, factors such as heteroatom doping and surface area should be considered when optimizing the electrode material. In addition, the ion diffusion efficiency affects the rate performance of SCs.
In this context, Zhao and coworkers introduced a data-driven approach, employing ML tools to optimize the capacitive properties of carbon-based electrodes by guiding material synthesis and electrolyte selection (Fig. 19).188 The authors utilized a tree-based pipeline optimization technique (TPOT) and a GBR model, finding that N dopants and specific surface area positively impact the specific capacitance of carbon network electrodes. Interestingly, the N motifs had the greatest influence on the capacitive properties because of their tunable electrical conductivity and regulated charge transfer dynamics at the electrode/electrolyte interface. Furthermore, N-doping reinforced the stability of the carbonaceous electrode in high-voltage electrolytes. Importantly, given that the voltage window had a relatively minor impact on the storage performance, safety should be given high priority in electrolyte selection.
Fig. 19 (A) Schematic of data-driven construction of SCs; (B) dataset of carbon-based SCs adopted by the ML algorithm; (C) algorithm analysis by the optimized TPOT technique; (D) Shapley additive analysis of parameters, and (E) average of the absolute Shapley value of parameters. Reproduced with permission from ref. 188, copyright 2023, Elsevier.
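The Shapley analysis named in panels (D) and (E) attributes a prediction to individual input parameters by averaging each parameter's marginal contribution over all orderings. A minimal exact implementation is sketched below for illustration; the surrogate model `toy_capacitance`, its coefficients, and the feature values are hypothetical, and practical tools approximate this sum because exact enumeration scales factorially with the number of features.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition are held at their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for idx in order:
            current[idx] = x[idx]         # add feature idx to the coalition
            updated = model(current)
            phi[idx] += updated - previous  # marginal contribution of idx
            previous = updated
    return [value / factorial(n) for value in phi]

def toy_capacitance(features):
    """Hypothetical linear surrogate for specific capacitance (F/g)."""
    n_doping, surface_area, voltage_window = features
    return 20.0 * n_doping + 0.05 * surface_area + 2.0 * voltage_window

x = [5.0, 1500.0, 1.0]         # N content (at%), surface area (m2/g), window (V)
baseline = [2.0, 1000.0, 1.0]  # reference electrode (assumed)
phi = shapley_values(toy_capacitance, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the property that makes mean absolute Shapley values a convenient ranking of parameter importance.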
The combination of transition metal oxides (TMOs) and graphene oxide (GO) has become an important class of hybrid electrode materials for SCs. The temperature-dependent capacitance of Co–rGO hybrid electrodes was modeled using an RF model combined with X-ray photoelectron spectroscopy (XPS) findings for the Co ion ratio in Co–rGO complexes fabricated at various temperatures, with an excellent accuracy of 99.9% (Fig. 20A).189
Fig. 20 (A) Schematic of the project; (B)–(D) XPS spectra of Co 2p core level of the Co–rGO electrode fabricated at 200 °C, 400 °C, and 600 °C; (E) actual and predicted Co3+/Co2+ ratio, (F) capacitance at various synthesis temperatures and 16 wt%; (G) actual and predicted specific capacitance value, and (H) resistance measured at various temperatures and 16 wt%. Reproduced with permission from ref. 189, copyright 2023, Elsevier.
The outcomes demonstrated that the Co3+/Co2+ ratio of the Co–rGO hybrid increased consistently as the temperature rose from 200 °C to 600 °C, verifying the ability of the RF model to predict the XPS results and to shorten the time-consuming XPS analysis (Fig. 20B–E). The measured capacitance of the Co–rGO electrode followed the order of 600 °C (176.2 F g−1) > 200 °C (160.9 F g−1) > 400 °C (16 F g−1) (Fig. 20F and G). Moreover, the results showed that the capacitance was strongly correlated with the ion diffusion kinetics. The electrodes obtained at 200 °C and 600 °C exhibited a higher Warburg slope than that obtained at 400 °C, implying a greater ion transfer rate (Fig. 20H). It could be inferred that the lower ion transfer resistance of the electrodes prepared at 200 °C and 600 °C accounted for their higher capacitance compared with the electrode obtained at 400 °C, which had a large ion transfer resistance.
Transition metal carbides/nitrides (MXenes) are emerging 2D materials for electrochemical energy conversion and storage due to their novel structural/compositional characteristics, modified ionic/electronic conductivities, outstanding chemical stability, and surface functionalities. To achieve superior performance, researchers are studying their structure–performance relationships by manipulating their microstructure and constituent elements. Wang's group explored the structure–property relationship of 600 MXenes such as M2XT2 (T = bare, O, S) and their doped configurations. The authors screened the metallicity and stability via hydrogen ion adsorption using DFT. The ML-derived sure independence screening and sparsifying operator (SISSO) model yielded pseudocapacitance formulas for 200 MXenes (e.g., M2X, M2X-m, M2XO2(-m), M2XS2 and M2XS2-m) based on their key features. Among the M2X systems, Sc2N and Ti2N showed the best pseudocapacitance values, which were ascribed to their stronger hydrogen ion binding. Statistical analyses indicated that the elements contributing most to the high pseudocapacitance of group-free, O-functionalized, and S-functionalized MXenes were positioned in the upper left, lower left, and upper right of the periodic table, respectively.190
2D layered TMDs are potential candidates for nanoelectrochemical and flexible electronic systems. These materials offer new chances to achieve extraordinary functionalities owing to their quantum confinement and the occurrence of a direct bandgap within their monolayer scheme. Understanding the mechanical properties of 2D TMDs in different environments is crucial to ensure their prolonged operation in flexible electronics. In this context, ML algorithms (long short-term memory “LSTM” and feed-forward neural network “FFNN”) paired with molecular dynamics simulations were established to predict the mechanical behavior of MX2 (M = Mo, W and X = S, Se) TMDs with more than 95% accuracy. The LSTM model could predict the stress–strain response with an accuracy close to 1 for the training and validation samples. With a similar accuracy level, the FFNN model could predict the Young's modulus, fracture stress, and fracture strain. More importantly, both ML models estimated the mechanical properties of 2D TMDs under varying operating conditions.191
Metal oxynitrides, with compositions ranging between pure oxides and nitrides, have favorable physicochemical properties, including chemical inertness, elevated melting points, and high thermal stability. The presence of nitrogen in the oxide phase leads to strong faradaic redox reactions, yielding high rate capability and long cycle life. Thus, advanced metal oxynitride materials warrant further exploration. As a proof-of-concept study, the specific capacitance and cyclic stability of a cerium oxynitride electrode were predicted by ML models (multilayer perceptron (MLP), RF, APRF, and APMLP) (Fig. 21A). A specific capacity of ∼26.6 mA h g−1 at a current density of 2 A g−1 with a capacity retention of >90% over 10000 cycles could be obtained under specific material characteristics (morphology, composition, and surface area) and operational conditions (e.g., current density and applied potential window) (Fig. 21B–D). The experimental findings (∼26.6 mA h g−1 and ∼100% capacity retention) matched well with the predictions (Fig. 21E).192 Based on the above discussion, the prediction of electrode materials requires merging proper descriptors with ML methods. Correlation strategies (e.g., the sequential backward selection algorithm and contribution analysis) and embedded analysis techniques have been employed to identify the key parameters that affect the electrode material properties, allowing researchers to rationally engineer advanced electrodes. However, the correlation between the targeted properties of electrode materials and the selected descriptors is complex in the majority of cases. Therefore, combining multiple ML algorithms and generating large datasets through virtual simulation may be helpful for optimal prediction.
Fig. 21 (A)–(D) Predicted against actual values of specific capacitance calculated by multiple value-prediction models; (E) plot of error values compared to actual values of specific capacitance obtained by RF, APRF, MLP and APMLP models. Reproduced with permission from ref. 192, copyright 2021, Elsevier.
The effectiveness of these descriptors has been demonstrated in multiple studies. Su et al. demonstrated that an artificial neural network trained on graphene-based electrodes with descriptors covering surface chemistry, pore structure and i_EDLC/i_pseudo could predict capacitance with R2 = 0.88 on unseen data.195 Zhu et al. applied a feed-forward neural network to a broad library of carbon materials and showed that including ΔEp and redox-site density improved the prediction accuracy by 15% over models using only textural features.196 In hybrid systems, Yogesh et al. used the TPOT AutoML framework on graphene-oxide nano-ring electrodes employing descriptors such as i_EDLC/i_pseudo, mesopore volume and interlayer spacing to discover formulations that raised the energy density by ≈25% under 1 A g−1.197 Moreover, Nanda et al. developed ML models correlating hybrid-device cyclic stability with descriptors spanning redox-site density, interface synergy and σ, achieving an R2 of 0.90 for cycle-life prediction.198 Together, these works illustrate how physically grounded, hybrid-specific descriptors and explainable ML can establish a closed-loop workflow for the discovery and optimization of next-generation hybrid SCs.
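The R2 values quoted above are coefficients of determination evaluated on held-out data. As a reminder of what the metric measures, a minimal sketch is given below; the measured and predicted capacitance values are hypothetical and serve only to exercise the function.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out capacitances (F/g) and model predictions.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
score = r2_score(measured, predicted)
print(f"R2 = {score:.3f}")
```

An R2 of 1 indicates perfect prediction, whereas a model that always predicts the mean of the measured values scores 0; values should be compared only when computed on data not used for training.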
Recent advances demonstrate that ML-guided strategies can substantially reduce both the number of experiments and material consumption in materials discovery workflows. For example, active learning reduces DFT simulations by over 70%, reducing the computational time and energy use.188 Building on this efficiency, Ceder's A-Lab platform combines robotic synthesis with ML heuristics and DFT energetics to achieve a success rate of 71%, synthesizing 41 of 58 predicted solids in 17 days and delivering more than two new materials per day.202 Moreover, adaptive Bayesian optimization further extends these gains to the laboratory by reducing the number of physical trials by about 70% when targeting specific properties.203
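The A-Lab figures quoted above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in the text (41 of 58 targets in 17 days); the variable names are illustrative.

```python
# Figures quoted in the text for the A-Lab campaign.
predicted_targets = 58
synthesized = 41
campaign_days = 17

success_rate = synthesized / predicted_targets   # fraction of targets realized
materials_per_day = synthesized / campaign_days  # average daily throughput

print(f"success rate = {success_rate:.1%}")
print(f"throughput = {materials_per_day:.2f} materials per day")
```

Both derived values agree with the reported success rate of 71% and the throughput of more than two new materials per day.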
However, although ML can improve material usage and minimize waste, it raises certain economic and sustainability issues. Running large-scale ML models requires substantial resources and energy, leading to significant costs. Although high-throughput synthesis is effective in generating massive datasets, it produces material waste if not properly managed. To overcome these drawbacks, scientists should explore more energy-efficient ML algorithms and adopt sustainable practices in experimental configurations. Reusing and recycling electrodes/electrocatalysts in high-throughput tests may reduce waste. The gap between the idealized environments usually represented in DFT simulations and the complex operational conditions of electrodes/electrocatalysts is substantial. Future research should prioritize high-throughput experimental routes that investigate device-scale synthesis and analysis. Such experiments are realistic, enabling the practical deployment of ML-optimized electrode/electrocatalyst configurations and the creation of advanced theoretical models for feasible operational systems. Collaboration between industry and academia can effectively address these challenges. Industrial partners can offer support and real-world requirements that direct researchers toward more applicable opportunities. Collaborative initiatives may involve shared datasets, industry-sponsored projects, and joint research enterprises. The benefits include resource sharing, increased innovation, and the practical utilization of research findings; thus, teamwork between industry and academia can address the economic and sustainability challenges in electrochemical energy storage and conversion.
Such collaboration will allow the rapid progress of ML applications in electrode/electrocatalyst design and optimization, resulting in the advancement of more sustainable and efficient energy conversion and storage technologies.
Footnote
† Diab Khalafallah and Fuming Lai contributed equally to this work.
This journal is © the Partner Organisations 2025