Open Access Article
Chandra
Chowdhury†
a,
Matteo
Lovato†
ab,
Giovanni
Di Liberto
a,
Francesc
Viñes
b,
Francesc
Illas
b,
Gianfranco
Pacchioni
a and
Livia
Giordano
*a
aDepartment of Materials Science, University of Milano-Bicocca, Via Cozzi 55, 20125 Milano, Italy. E-mail: livia.giordano@unimib.it
bDepartament de Ciència de Materials i Química Física & Institut de Química Teòrica i Computacional (IQTCUB), Universitat de Barcelona, c/ Martí i Franquès 1-11, 08028 Barcelona, Spain
First published on 24th December 2025
The efficient design of single-atom catalysts (SACs) for hydrogen evolution reaction (HER) requires rapid and accurate screening methods capable of handling a large number of candidate materials. Traditional density functional theory (DFT) calculations, while reliable, are computationally expensive, necessitating alternative approaches for high-throughput screening. In this work, we create our own database based on DFT calculations using MXene-based SACs for HER and develop an interpretable machine learning (ML) model that predicts hydrogen adsorption free energy (ΔGH) with an accuracy of 0.17 eV, relying solely on simple, non-DFT-calculated features. Our approach systematically transitions from a black-box model to a grey-box model, incorporating feature importance analysis to identify key descriptors influencing ΔGH. This ultimately leads to the development of a glass-box model, where we derive a transparent analytical equation that allows for direct prediction of ΔGH using raw feature values. Importantly, the final model does not require data standardization or complex computational simulations, making it highly accessible for experimental researchers and enabling rapid, first-order screening of MXene-based SACs for HER.
SACs have received great attention in recent years, due to their potential to minimize the critical raw metal loading and improve the catalytic activity and selectivity.17,18 SACs are made by transition metal atoms atomically dispersed in a solid matrix, maximizing the utilization of the active phase.19 Additionally, the catalytic activity of SACs can be tailored and eventually engineered by modifying the nature of the support and the local coordination of the metal atom.20 The typical supporting matrices are oxides and 2D materials, most often involving carbon-based supports, such as graphene, N-doped graphene, and carbon nitride.21 In recent years, MXenes, a new family of 2D materials,22 have attracted attention as supporting templates of SACs to catalyze a variety of reactions, including the HER.23–28 MXenes are two-dimensional (2D) transition-metal carbides, nitrides, and carbonitrides, and they are promising in electrochemistry owing to their electronic conductivity and ability to anchor metal atoms and eventually to host alkaline metals in accumulators.22,29,30
MXenes are synthesized from MAX phases, a material family with the general formula of Mn+1AXn (n = 1–3), where M, A, and X correspond to early d-transition metals, main group p-elements, and C and/or N, respectively. Starting from these materials, MXenes can be obtained by selective chemical etching of the A element.29,31 Using selective etchants, it is possible to break the M–A bonds that are weaker than the M−X bonds, resulting in highly active MXene sheets. Depending on the etchants and the subsequent chemical treatment, the thus obtained MXene sheets might be terminated with surface groups, such as fluorine (–F), hydroxyl (–OH), hydrogen (–H), or oxygen (–O).29,32 By selecting the M and X elements, the number of atomic layers in the sheet, and the type of surface terminations, a huge number of possible MXene-based materials can be synthetized, affording the possibility to customize their properties.31,33 Additionally, recent work has shown that the surface termination can be removed or substituted in a rational, predetermined way.34–36 Recently, Oschinski et al.37 used density functional theory (DFT) to study the interaction of first-row transition metals with M2C (M = Ti, Zr, Hf, V, Nb, Ta, Cr, Mo, and W), aiming to provide a rational guide to the design of new SAC catalysts based on MXenes and facilitate the experimental work to synthesize stable systems.
The improvement of computational power is promoting computational (electro)catalysis studies towards an even more ambitious goal, the development of descriptors and data-driven statistical models for the prediction of catalytic activity.38,39 In the case of the HER, this translates to seeking models capable of predicting ΔGH based on a set of defined physical properties (features) of the catalyst. Nevertheless, there are multiple ways to pursue this goal, and one can adopt different data-driven approaches to find relevant correlations between the target variable (ΔGH) and a number of features of varied complexity levels. The drawback of this approach is that it typically results in a complex mathematical expression for ΔGH as a function of the input variables, where the functional form is difficult to interpret in physical or chemical terms. The input parameters often require the full DFT calculations of adsorbed H to be determined, raising some questions on the real predictive power of the schemes and their usefulness. There are several recent studies showing the usefulness of machine learning models in predicting performances across several electrocatalytic applications based on MXene systems.40,41 For instance, Zheng et al.40 utilized a high-throughput screening approach combined with a machine learning (ML) model to predict the hydrogen binding energy and then the HER activity of a total of 299 MXene materials. Their methodology incorporates features such as the d-band centre and Bader charges, which necessitate DFT calculations of the H adsorption on the active site. They found that the random forest algorithm predicts hydrogen binding energy with a root mean square error of 0.27 eV. Likewise, Umer et al. incorporated a comprehensive set of 29 features in their study of rationally designed 364 catalysts by embedding 3d/4d/5d TM single atoms on diverse substrates, like graphitic carbon nitride (g-C3N4), π-conjugated polymer, pyridinic graphene and hexagonal boron nitride, including complicated descriptors such as the Coulomb matrix. Additionally, they integrated DFT-derived features to further refine their predictive model.40 At this stage, we recognize a gap in the availability of a straightforward and interpretable model that relies solely on simple features. Such a model should provide a clear, user-friendly equation for predicting the target using raw feature values, without the need for data standardization. This would enable broader accessibility, allowing anyone to utilize the derived equation for initial screening with ease.
In this work, we performed a computational screening of MXene-based SACs for HER with the aim of providing a straightforward and interpretable model for the prediction of ΔGH from simple variables that do not require any DFT calculations of the SAC or hydrogen adsorption on the SAC. We expanded the dataset of transition metals reported in ref. 37 to the second and third rows, thereby constructing a comprehensive dataset covering the complete transition-metal series. We then considered hydrogen adsorption on the resulting SACs, obtaining a dataset of 208 ΔGH values. We trained statistical models with different numbers of features and levels of complexity to predict ΔGH, by paying close attention to the simplicity of the descriptors and the interpretability of the model in physically meaningful terms. We began with a black-box model and gradually transitioned to a grey-box approach41 by conducting feature importance analysis across models ranging from linear to complex, which essentially provided us with the reason behind the noted differences in model performance. Ultimately, we developed a glass-box model41 by formulating a simple equation for the target using the interpretable and meaningful attributes, relying solely on non-DFT calculated features. Our results show that it is possible to predict a target variable with a linear multivariate model based on tabulated or easy-to-get features with a reasonable accuracy (about 0.17 eV), without the need to invoke complex models or not easily interpretable features. Please note that we do not claim this as the first or only interpretable model; instead, we emphasize its significance as a physically grounded, computationally efficient surrogate that connects chemical intuition with data-driven learning. The provision of a comprehensive analytical expression (eqn (5)), necessitating no preprocessing or scaling, aims to improve accessibility for researchers, especially in experimental contexts where implementing complete ML pipelines may be impractical.
The results of this study could provide a first-order screening of MXene-based SACs for the HER, aimed at identifying promising candidates for more elaborate calculations, including electronic structure and geometrical relaxation analysis, as well as the consideration of other effects, such as multiple hydrogen loading, solvation, pH and applied voltage, to ultimately guide the experimental synthesis and characterization. We remark that the present calculations were conducted using bare M2C (0001) MXene surfaces, which serve as an idealized structural model. Under the experimental conditions, MXenes are generally terminated with surface anions, such as –O, –OH, or –F, or a mixture of them,42 depending upon the synthesis and etching methods employed. These terminations can partially passivate the surface, hence reducing the metal binding strength compared to the uncoated surface. We chose to work with bare surfaces due to the need to identify inherent trends in metal–support interactions and to maintain uniformity across various MXene substrates. While absolute binding energies may be reduced, the comparative trends and feature–property correlations established herein are anticipated to remain valid. Future research will incorporate terminated MXenes to specifically assess the impact of surface functionalization on single-atom stability and hydrogen adsorption energetics.
MXenes with the M2C stoichiometry, comprising three atomic layers and exhibiting the (0001) surface, were modelled by fully optimizing the atomic structure of the supercell representing the nanosheets using previously optimized lattice parameters.37 Thus, the atomic coordinates of the nanosheet atoms and the adsorbed TM atom were all fully relaxed. For the bare MXene, the atomic structure of the supercell is the same as that of the primitive unit cell. We calculated the binding energy of the anchored transition metal (TM) adatoms, as follows:
| Eb = ESAC − Es − ETM | (1) |
![]() | (2) |
We employed permutation feature-importance analysis for feature selection and conducted a Pearson correlation analysis for the considered features. We also employed various supervised machine learning techniques to predict the features, including traditional Multiple Linear Regression (MLR), Ridge Regression (RD), LASSO, Random Forest models (RF), Extra Trees (ET), Decision Trees (DT), Gradient Boosting (GB), Extra Gradient Boosting (XGB) and AdaBoost (ADB). To evaluate the generalization capability of our ML models, we implemented a careful data splitting approach. Initially, we randomly partitioned 15% of the full dataset as a strict external test set, entirely removed from any aspect of model training, scaling, or feature analysis. The external test set (15% of the full dataset) was extracted once at the beginning of the workflow using a fixed random seed for reproducibility. This held-out subset was kept constant and used consistently across all machine learning models. This ensured that all performance comparisons were made on the same test data, providing a fair and controlled evaluation of the generalization performance. The remaining 85% of the data was subsequently used for training and validation. All model development, excluding the leave-one-out procedure, was performed exclusively on the 85% dataset using 10-fold cross-validation. All data preprocessing, including feature scaling using StandardScaler within Pipeline, and model fitting were conducted solely on the training set within the cross-validation loop. Identical scaling values were utilized for the validation fold and external test sets to prevent leaking. Furthermore, for the tree-based models (e.g., RF, GB), we avoided any early stopping or information leakage from the test set by depending exclusively on internal splits or designated validation subsets. For models that facilitate internal validation (e.g., GB), the validation_fraction was confined to the training set exclusively. In the leave-one-out procedures, the training set for each excluded support or metal was fixed and included all remaining data. Although the held-out subset remained fixed in each case, the models were retrained with internal randomization (e.g., bootstrapping, feature selection), leading to slightly different predictions across runs. The reported error values reflect the average across these 1000 randomized training iterations. We used a ten-fold cross-validation method with a fixed random seed (seed = 42) to provide a fair and robust evaluation while preventing the overfitting of our model. Mean absolute error (MAE) was used as a metric for the ML models. MAE defines the average magnitude of errors between the values predicted by our ML model and those obtained from DFT calculations. For all the ML model training and testing calculations, we used the Scikit-learn library.60 The importance analysis was based directly on the intrinsic parameters of each model type, rather than on permutation scores. For linear models (LR, RD, LASSO), we calculated importance using the absolute standardized regression coefficients, which measure each descriptor's contribution to the target following feature scaling. For tree-based and ensemble models (Decision Tree, Random Forest, Extra Trees, Gradient Boosting, AdaBoost, and XGBoost), we utilized the inherent mean decrease in impurity (MDI) values, which indicate the average variance reduction related to each feature across all decision nodes. To guarantee robustness, all importances were averaged across 100 randomized 80/20 train-test splits and normalized for each model prior to aggregation.
The interaction of the different first-row adatoms on each of the MXene surfaces considered has been investigated by considering the preferred adsorption sites found by Oschinski et al.37 Regarding the second and third TM row atoms, we investigated four different high-symmetry sites: the atop site (T) where the adatom is placed directly above an M surface atom, the bridge site (B) corresponding to the midpoint of the M–M bond, and two hollow sites that correspond to carbon (hcp) and empty (fcc) sites (see Fig. S1 of SI). After geometry optimization, we found that the hcp and fcc sites are preferred in most cases, regardless of the adatom or substrate. We found that metal adsorption is favourable for all systems considered, with binding energies ranging from −2.6 to −8.2 eV, as shown in Fig. 1a. As already discussed in ref. 37, the energetic trends depend not only on the adatom but also on the MXene substrate, Fig. 1a. Three similar patterns are observed for the three TM rows (3d, 4d, 5d), with a W-shaped behaviour consisting of two local minima and three local maxima for the binding energy, known to be related to the special stability of d5 and d10 transition metals in vacuum.61,62 Moving along each of the series (Sc–Cu, Y–Ag, Hf–Au), the value of Eb decreases until the first minimum (Ti, Zr/Nb, Ta), after which it starts to increase, reaching the first maximum (Cr/Mn, Tc, and W/Re, which have a half-occupied nd shell). Eb then decreases, reaching the second local minimum (Ni, Ru, Os/Ir), and finally increases to reach the second maximum (Cu, Ag, Au, which have a fully occupied nd shell). In other words, elements with nearly half-filled or filled nd shells are characterized by a weak interaction with the surface. The effect of the substrate can be observed as a change in the values of Eb, which follow the same pattern along the series. In every series, this effect is more marked in the early elements of the row, where the Eb differences between the same TM in different substrates are more pronounced and can reach several eV, while for the late elements of the row, the effect of the substrate is less noticeable, typically within 1 eV. Interestingly, the order of stability for the adsorption of the same TM atom on different substrates is maintained for the most part, with a few exceptions. In general, Mo2C and W2C bind the adatoms with the most negative energy values, while Hf2C and Zr2C form the weakest bonds. The absolute maximum values, corresponding to weak bindings, for the different TMs are achieved by the following systems: Cr@Ta2C, Mn@Ti2C, Ag@Hf2C and Ag@Ta2C, while the absolute minimum values, corresponding to strong bindings, are achieved by Ru@V2C, Os@V2C and Ir@Ti2C. A similar trend (especially along the 3d series) is observed for the average distance between the TM and the three nearest neighbours (cf. Fig. S2a), indicating that very negative binding energies correspond to short distances.
The degree of charge transfer between the transition metal and the substrate has been analysed to gain more insight into the interactions between the TM and the eight substrates. Fig. 1b reports the excess TM charge (ΔQ) calculated for the supported TM atom and obtained from the analysis of the electron density through the QTAIM developed by Bader.55–58 Interestingly, the same pattern is reproduced on every substrate, and a similar parabolic trend repeats for the three TM series (see Fig. 1b). Another clear tendency is observed depending on the type of metal constituting the substrate. In fact, it can be observed that the charges on the adatoms adsorbed on d2 M2C (Ti2C, Zr2C, Hf2C), d3 M2C (V2C, Nb2C, Ta2C), and d4 M2C (Mo2C, W2C) follow similar well-separated trends along the d series. Furthermore, the sign of the excess charge ΔQ depends on the type of adatom–substrate combination. For example, the values of ΔQ for the d2 M2C substrates are negative in most cases, except for Sc and Y, Fig. 1b. For d4 M2C, the values of ΔQ are mainly positive with some exceptions, while for d3 M2C, the ΔQ values are more TM-dependent. Clearly, the oxidizing character of the d4 M2C MXenes is larger than that of their d3 M2C and d2 M2C counterparts. As expected from the electronegativity trend, the early TMs in each d series tend to donate charge to the substrate or remain neutral species, while electrons are transferred from the substrate to the late TMs, which become partially reduced.
The magnetization of the TMs anchored on MXene substrates also follows a periodic trend, as shown in Fig. S3 of SI. Early and late TMs in the d-series expose a small number of unpaired electrons, while the atoms in the middle of the series exhibit large spin values, regardless of the MXene substrate.
Once the trends in the properties of the TM@MXene systems have been obtained, the adsorption free energy of H on the SAC, considered as H*, becomes a key intermediate in the HER. For each TM@MXene, a hydrogen atom was adsorbed on top of the TM. Interestingly, the optimized distances between the hydrogen adatom and the TM of the SAC follow a periodic trend (Fig. 2a), with a parabolic shape similar to the one obtained for the TM charge reported in Fig. 1b. For each d-series, the TM–H distance is the largest for the first atom in the series and then decreases, reaching a minimum for Group X metals (Ni, Pd and Pt), except for Ti2C, Zr2C and Hf2C substrates, where the minimum occurs earlier in the series. We also noticed a strong correlation between the TM–substrate distance before and after hydrogen adsorption, indicating that the adsorption of H affects the interaction of TM with the substrate similarly for all considered systems (see Fig. S2 of SI).
The obtained values for the H adsorption free energy are reported as a function of the TM (Fig. 2b). Unlike the case for the other quantities discussed above, here, it is not possible to extrapolate a specific trend or relationship with the substrates or the d-series. The energy results are distributed in a range of values between −0.3 and 1.2 eV, and the adsorption Gibbs free energy is negative only for a few systems. The lack of trend for ΔGH with the TM or the substrate raises the question of whether any physical property of the SAC can be used as a descriptor for this quantity. To address this, we are trying to design an interpretable machine learning model that utilizes simple, easily accessible features, aiming to provide a transparent and intuitive predictive framework.
The first group contains information describing the properties of the TM constituting the SAC. The second group includes information about the supports and their interaction with the TM. The last group accounts for the interaction of SAC with hydrogen. The features of the first group are the electronegativity of the TM (χTM), the number of TM valence electrons (NVAL), the TM covalent radius (RCOV), the first ionization energy of the TM (IE), and the electronegativity of M in the MXene substrate (χS). The data of this group of features are tabulated63 and do not need any simulation. The second group contains the following features: binding energy of TM to the support (Eb); mean distance between the adsorbed TM and the nearest neighbouring atoms of the substrate (d); and charge accumulation or depletion of the metal from the substrate (ΔQ), calculated by taking the difference between the number of valence electrons in the isolated TM atom and the number of electrons in the anchored state, as provided by the Bader method. This quantity represents the charge acquired by the TM upon adsorption. The data of these features require the DFT geometry optimization of the bare MXene support and of the TM adsorbed on the support. The third group of features includes the binding energy of isolated diatomic H–TM molecules (ΔEisoH) and the corresponding bond length (disoH). These two variables involve calculations of isolated TM atoms and their interaction with a hydrogen atom, and therefore, they do not involve the simulation of specific supported SACs. Considering all three groups, we have a total of 10 features for designing the model. An increase in the number of features occasionally improves the accuracy of machine learning models by supplying additional information for predictions, but at the same time, it adds complexity to the model and possibly generates overfitting issues.
To check the reliability of our predictive model, we incorporated supplementary properties that, although not directly associated with hydrogen evolution activity, may nonetheless provide significant insights. These properties encompass electron affinity, atomic mass, melting point, boiling point, the principal quantum number of the valence shell, and TM density, as it is known that in SACs, the regulation of atomic density can affect the catalytic performances.64 The resulting expanded dataset currently has a total of 16 features. We aim to compare the performance of our model utilizing both the original collection of 10 features and the combined set of 16 characteristics, enabling us to assess the impact of these new properties on the model's accuracy and predictive efficacy.
![]() | ||
| Fig. 4 (a) Correlation matrix for the features considered in this work, defined in Fig. 3. (b) Performance of individual features for predicting the target property, namely ΔGH by a linear regression model, which is quantified as MAE. Features are ranked according to their MAE values, with lower values signifying better predictive performance. | ||
We notice a relatively strong relation between the TM properties in the upper-left corner (from χTM to IE), indicating that they are not completely independent. It can also be seen that ΔQ has a significant correlation with the TM properties. Furthermore, χS has no significant correlation with any other variable. Similarly, the properties of SAC@MXene, i.e., Eb and d, do not correlate strongly with each other or with the substrate electronegativity. Finally, the properties of the TM–H diatomic molecule, ΔEisoH and disoH, do not show any strong correlation with other properties, other than the weak correlation of ΔEisoH with Eb. Metrics such as ΔQ, IE, and distance-related measures exhibit various degrees of correlation, illustrating the complex interplay of structural, electronic, and magnetic features that affect the hydrogen adsorption energetics. The association among these properties offers insight into their combined effect on hydrogen evolution, potentially informing the selection of catalysts for enhanced performance.
The predicted values with the multivariate linear method against the original DFT data are reported in Fig. 5a. Overall, the model provides a MAE of 0.159 eV, which is comparable to the intrinsic error of the DFT methodology. This result is quite remarkable since it indicates that calculating the adsorption free energy from the multivariate linear model would not induce a loss of accuracy. Fig. 5a demonstrates the superior prediction power of the multivariate linear regression model relative to the single-feature-based linear regression model. Although the individual features had a nearly linear, low-correlation relationship with the target variable (cf. Fig. S5 of the SI), the integration of several features in a linear regression framework shows a robust correlation with the target, as indicated by the close clustering of points around the parity line (see Fig. 5a). The enhanced correlation indicates that the multivariable model effectively captures complex interactions between features and the target variable, which individual features could not accomplish independently.
![]() | ||
| Fig. 5 Multivariate linear regression. (a) Parity plot of the multivariate linear regression for ΔGH as a function of the features considered in this work, defined in Fig. 3. The shaded grey area corresponds to ± 0.2 eV, which corresponds to the typical errors of DFT calculations. (b) Analysis of MAE for several machine learning models, namely, Linear Regression (LR), Ridge (RD), LASSO, Extra Trees (ET), Random Forest (RF), Decision Tree (DT), Gradient Boosting (GB), AdaBoost (ADB), and XGBoost (XGB). | ||
As shown in Fig. 4a, some of the tabulated transition-metal properties, particularly electronegativity (χTM), ionization energy (IE), valence electron count (NVAL), and covalent radius (RTM), exhibit moderate to strong pairwise correlations (∼0.8), reflecting their shared periodic origin. However, the single-feature regression results presented in Fig. 4b show that none of these descriptors alone can accurately predict ΔGH (all single-feature MAEs > 0.18 eV), indicating that, despite their interdependence, they capture complementary physicochemical information rather than redundant relationships. To ensure that these correlations do not compromise the model's robustness, we applied regularization techniques (Ridge and LASSO regression), which constrain large coefficient magnitudes and stabilize correlated inputs. The regularized models produced nearly identical MAE values (0.16–0.18 eV) compared to the unregularized linear regression, confirming that multicollinearity does not significantly affect predictive reliability. Furthermore, to verify this conclusion, we performed an additional analysis by removing the most strongly correlated features (IE, χTM, NVAL, and RTM) and re-evaluated the linear regression model. The resulting MAE increased slightly from 0.159 eV (all features) to 0.179 eV (after feature removal), demonstrating that the correlated descriptors contribute synergistically to the model's predictive accuracy without introducing instability. These results confirm that the inclusion of these physically meaningful descriptors enhances the model's performance while maintaining interpretability.
First, we performed a feature-importance analysis to further investigate the findings of the multivariate linear regression model. This is essential for identifying the features that most significantly affect the model predictions, elucidating the key factors that influence the target variable. Understanding feature importance allows us to explain the model with greater clarity and transparency. The feature importance has been evaluated by the absolute values of the coefficients, as reported in Fig. 7a. Interestingly, from the feature-importance analysis, we found a comprehensive overview that connects the multivariate linear regression outcomes with the insights gained from individual feature performance. In the single-feature analysis, NVAL and IE were identified as the features exhibiting relatively low MAE values, indicating that they possessed the most robust individual correlation with the target variable. The significance of NVAL and IE is further validated in the multivariate model, where they constitute approximately 24% and 18% of the model's predictive capacity, respectively, highlighting their essential functions in both univariate and multivariate cases. Likewise, ΔEisoH and χTM, which had moderate correlations in the single-feature analysis, became significant in the multivariate model. This indicates that although these features may individually possess inadequate predictive power, their synergistic effect with other factors improves the model's accuracy. The minimal contributions of the other features in both univariate and multivariate models indicate that they offer additional information but are not robust independent variables. This comprehensive analysis highlights the need to integrate essential features, such as IE, NVAL, and χTM, to enhance the predictive model, utilizing their distinct and complementary insights into the target variable.
![]() | ||
| Fig. 7 (a) Feature-importance analysis for multivariate linear regression (b) and gradient boost models. | ||
Similarly, to closely examine the predictive accuracy of complex models, we have performed feature-importance analysis for each model and found an interesting correlation between feature distribution and model performance. For this analysis, we considered feature-importance analysis for models like GB and compared the results with the MLR results (Fig. 7b). For all other models, the analysis is shown in Fig. S6 of SI. The feature-importance distributions for the MLR (cf.Fig. 7a) and GB (cf.Fig. 7b) models illustrate the effect of increased model complexity on feature utilization and prediction accuracy. In the simplified MLR model, importance is predominantly focused on a limited number of essential features, which are mainly from Group 1, with IE, NVAL, and χTM collectively prevailing, signifying a dependence on these primary descriptors. In the more complex GB model, feature importance is more evenly distributed across a broad array of all feature groups, indicating that the GB model identifies further interactions among descriptors and utilizes a wider spectrum of information. This distributional shift corresponds to the rather superior accuracy of GB relative to that of MLR (MAE values: 0.133 vs. 0.159 eV), demonstrating that complex models can be tailored to more effectively capture nuanced patterns in the data by assigning appropriate weights to a greater number of relevant features, hence improving predictive performance.
To gain more insight from the feature-importance analysis and to delve more into the grey-box approach, we created a heatmap (see Fig. 8a), which provides insight into the feature importance of each model. Linear models tend to prioritize a few easily accessible features (mainly from Group 2 & Group 3), whereas complex models distribute importance across a wider range of features. To better understand how each model assigns importance to feature groups, we analysed the performance of each model by evaluating each group individually, as shown in Fig. 8b. Fig. 8b presents a comparison analysis of the feature-group importance across various models, demonstrating the distribution of importance among Group 1, Group 2, and Group 3 features for each model. Simple models, such as LR, RD, and LASSO, mostly focus on Group 1 and Group 2 features, indicating their dependence on basic properties for predictive power. With the enhancement of model complexity, tree-based models (e.g., RF, ET, DT) and boosting techniques (e.g., XGB, GB, ADB) tend to allocate importance more uniformly among all groups, with substantial emphasis placed on features from Group 2 and Group 3.
This transition signifies that complex models gain from a wider range of feature interactions, compiling further information beyond the basic features in Group 1. This distributional pattern reinforces the notion that simpler models emphasize essential aspects, but more advanced models utilize a broader array of properties across groups, hence improving their capacity to identify complex interactions within the data. For all complex models, we computed the correlations between each group of features and the target (see Fig. S6–S13 of SI). In complex models, such as tree-based and boosting models, features from Group 2 and Group 3 demonstrate non-linear correlations with the target variable. In these models, data points are concentrated around the parity line, signifying robust prediction capability. Conversely, in linear models, the features of Group 2 and Group 3 exhibit almost linear plots (see Fig. S14 of SI), with negligible connection to the target.
Next, we trained the model on a subset of the dataset, which included all data except those associated with a single support. The testing set was created using SAC supported by the newly excluded support, employing the leave-one-out method. The schematic diagram for this leave-one-out method is shown in Fig. 9a for clarity. This method enables the assessment of the model's predicted performance for each substrate without prior exposure to that particular substrate, thus evaluating the model's predictive capacity across various substrates. We iterated the method for all eight substrates 1000 times, obtaining an averaged MAE of 0.176 eV and a maximum MAE of 0.250 eV (W2C).
The violin plot (Fig. 9b) illustrates the distribution of MAE values for each substrate across 1000 experiments, offering insight into the variability and consistency of the model's predictions. Substrates such as Nb2C and V2C have low and closely grouped MAE distributions, with mean values of about 0.141 and 0.144 eV, respectively. Thus, the model predicts these substrates with precision and consistency over iterations. Conversely, W2C and Mo2C have relatively elevated average MAE values (0.250 and 0.223 eV, respectively) and wider dispersion, indicating reduced predictive reliability. The shape and spread of each violin plot show the model's stability across many substrates, with narrower distributions and lower MAEs indicating more accurate performance. Upon analyzing the reason behind the different prediction accuracies for different substrates, we found that substrates like Nb2C and V2C have a smaller distribution of target variables (approximately 0.13–1.07 eV), whereas for substrates like W2C, Mo2C, the distributions of target variables are higher (approximately −0.26 to 1.17 eV). Finally, the same leave-one-out procedure was carried out by including all substrates and iteratively excluding one adsorbed metal at a time, so that the model is tested on a single metal atom never seen before. The averaged MAE is 0.232 eV, slightly higher than the previous values. The corresponding violin-plot for the leave-one-out experiment for metals is shown in Fig. S15 of SI. We identified Ru and Rh as being critical, as excluding them from the average calculation decreased the MAE to 0.174 eV. It is important to note that in this last assessment of the predictive power of the model, the testing dataset is quite small (only eight data points), which could affect the accuracy. These assessments show the robustness and portability of the resulting model. Overall, the model can predict the hydrogen adsorption energy of unknown systems, and the method could be a starting point to screen new catalysts based on similar substrates.
It is to be noted that to evaluate the generalization and robustness of our model, we conducted a series of validation procedures, including ten-fold cross-validation and leave-one-out tests. In the tests, each substrate and metal was systematically excluded from the training set and subsequently predicted as novel systems (see Fig. 9 and S15, SI). These supplementary methods guarantee that the model's predictive power is not limited to a particular subset of the dataset but encompasses various substrates and transition metals throughout the examined domain. While validation against an independent external dataset would yield the strictest assessment of transferability, such data are presently inaccessible for SAC@MXene systems with equivalent DFT accuracy and descriptor definitions. Nevertheless, the uniform results achieved in all internal evaluations, along with the model's physically interpretable parameters, indicate its reliable applicability for screening novel and compositionally analogous systems.
Further, we performed analysis with Group 2, Group 3 and simultaneously Group 2 & 3 features, separately (see Fig. S16 of SI). Interestingly, we found that only Group 1 features correlate well with the prediction of the target, whereas other features exhibit poor correlation. This finding corresponds well with the outcomes of the feature-importance analysis (cf.Fig. 6b), which showed the considerable significance of features like IE and NVAL, all of which are classified as Group 1 features.
To construct a glass-box model, we first formulated a reduced equation by employing feature-importance analysis only for Group 1 features. This method enabled us to establish a clear, interpretable connection between the chosen features and the ΔGH target variable. The coefficients in the equation were established according to their impact on the model's predictive efficacy, ensuring that each term accurately represents the influence of its corresponding feature on ΔGH. At first, we formulated a clear equation using all the features used in the study, i.e. using all 10 features, where the MAE is 0.16 eV, and the corresponding equation is as follows:
| ΔGH = 1.39 + 0.78·RTM − 0.74·disoH + 0.35·χTM − 0.29·χs + 0.24·ΔQ − 0.23·IE + 0.16·ΔEisoH + 0.08·NVAL + 0.06·d − 0.04·Eb | (3) |
After that, we employed standardized features (see Fig. S17a of SI for feature-importance analysis), whereby each feature was normalized to possess a mean of zero and a standard deviation of one. The simple equation is given as follows:
| ΔGH = 0.91 + 0.41·RTM − 0.20·IE + 0.10·χTM − 0.09·χS + 0.07·NVAL | (4) |
This equation offers a straightforward, interpretable, and computationally efficient method to estimate ΔGH on SACs at the considered MXenes, utilizing a small set of readily available features with a MAE of 0.17 eV (see Fig. S17b of SI for parity plot). To further simplify our model, we explored the possibility of using non-standardized features to assess the predictive efficiency while ensuring ease of application. By utilizing raw feature values, we aimed to create a model that eliminates the need for data preprocessing, implying that no prior standardization is required. This is important because no standardization coefficients are required, allowing anyone to perform an initial screening without requiring specialized programming or statistical transformations. This approach makes the model even more accessible, particularly for experimental researchers and practitioners looking for a quick and intuitive way to estimate ΔGH. Interestingly, when we switched to non-standardized features, we observed a slight shift in the MAE values (0.18 eV, see Fig. S18 of SI for parity plot) as in the standardized case, and the corresponding simplified equation is given as follows:
| ΔGH = 0.34 + 0.76·RTM − 0.19·IE − 0.15·χS + 0.11·NVAL + 0.01·χTM | (5) |
This finding is particularly valuable, as it confirms that the simplified, non-standardized model can be used directly with raw data, making it a practical tool for initial screening and decision-making in material selection. It is essential to understand that, although the glass-box model (eqn (4) and (5)) offers a highly interpretable analytical connection grounded purely upon tabulated atomic properties, this simplification inherently imposes constraints on the prediction accuracy. The parity plots (Fig. S17b and S18 of SI) indicate that specific systems, especially those at the extremities of the ΔGH range, display significant dispersion in relation to the parity line. This dispersion may result from localized geometric and electronic properties that are not explicitly captured by the basic atomic descriptors employed in this glass-box model. Nevertheless, the glass-box equation effectively captures the trends within the dataset, attaining a MAE of roughly 0.17 eV, which is analogous to the intrinsic error of DFT-calculated adsorption free energies. This accuracy is sufficient for fast, preliminary evaluation and for differentiating between distinctly promising and less-effective catalyst candidates. In contrast to the single-feature models (e.g., solely ionization energy or valence count) and models utilizing Group 2 or Group 3 features, which demonstrate minimal predictive capability and produce parity plots nearly parallel to the x-axis (Fig. S5, S16 of SI), the Group 1-based model displays a distinct correlation with the DFT reference values (Fig. S18). This indicates that, despite its simplicity, the model effectively captures significant multivariate structure–property correlations. Consequently, we emphasize that the glass-box model should be considered a physically significant, predictive framework that prioritizes transparency and accessibility over high-precision quantitative estimations. This result validates the efficiency and simplicity of our model, demonstrating that precise predictions may be achieved without supplementary DFT calculations, thereby reinforcing our aim of developing a computationally efficient and interpretable model. Note also that to assess if interpretability undermines prediction performance, we compared our linear models with various sophisticated, high-capacity algorithms, including RF, ET, GB, ADB, and XGB (refer to Fig. 5b). The top-performing black-box models, including GB and ET, attained MAE values of approximately 0.133–0.135 eV, somewhat lower than the 0.159 eV produced with the multivariate linear regression model. The minor discrepancy of 0.02–0.03 eV falls within the intrinsic error of the DFT-calculated adsorption free energies (about 0.10–0.20 eV), suggesting that our interpretable methodology maintains almost the same predictive capability. Thus, the linear model presents an advantageous compromise—attaining DFT-level precision while establishing transparent, physically significant correlations between descriptors and ΔGH. This illustrates that interpretability may be achieved without much compromise in prediction performance, rendering the suggested model both scientifically accessible and computationally efficient for expedited catalyst screening.
Conversely, the properties of Group 2 and Group 3 necessitate additional DFT calculations, which may be computationally expensive, especially for large datasets. Thus, although complex models provide enhanced accuracy, they require greater processing resources owing to their dependence on features from Group 2 and Group 3. The LR model attains satisfactory accuracy while reducing computing expenses, offering an attractive trade-off between accuracy and efficiency. This trade-off indicates that, based on the application requirements, the LR model may be the ideal selection for scenarios when computational efficiency is paramount.
We finally underline that the final model predicts the hydrogen binding energy with an accuracy of 0.170 eV using only quantities that do not require any atomistic simulation of the SACs. In this respect, the model could be of help to provide a first-order approximation for the screening of SACs for the HER based on M2C materials like MXenes, followed by more accurate calculations on promising candidates. The MAE values for the best-performing models (0.13–0.18 eV) are comparable to the intrinsic accuracy limits of the DFT calculations, often ranging from 0.10 to 0.20 eV for the hydrogen adsorption free energies, depending upon the functional and dispersion corrections employed. From a practical viewpoint, this level of accuracy suffices for the initial screening of the MXene-based SACs, facilitating the dependable identification of interesting candidates with near-optimal ΔGH values (ΔGH < 0.2 eV). The model is designed not to substitute comprehensive DFT computations but to assist in their prioritization, delivering DFT-level performance at a significantly reduced computational expense. Note also that even if the conclusions are extracted from the SACs involving M2C materials like MXenes, it is likely that the presented model works reasonably well for M2C3 and M3C4 MXenes, as the number of atomic layers has a small influence on the adsorption properties of molecules such as CO2.65 Further work will be dedicated to the extension of the approach to other relevant supports.
The data supporting this article have been included as part of supplementary information (SI). Supplementary information: adatom adsorption sites, TM–substrate distance before and after H-binding, magnetization value of the TM adsorbed on the substrate, correlation with DFT-calculated features, parity plot for single-variable linear regression models, feature-importance analysis for different models, parity plots for different models, leave-one-out experiment with individual metals, and comparison of the MAE values among several models utilizing 10 and 16 features. See DOI: https://doi.org/10.1039/d5ta07143g.
Footnote |
| † These authors contributed equally. |
| This journal is © The Royal Society of Chemistry 2026 |