Open Access Article
Thantip Roongcharoen and
Alessandro Fortunelli
*
CNR-ICCOM, Consiglio Nazionale delle Ricerche, via Giuseppe Moruzzi 1, Pisa 56124, Italy. E-mail: thantip.roongcharoen@cnr.it; alessandro.fortunelli@cnr.it
First published on 18th March 2026
We present a computational workflow, the Conformal Sampling of Catalytic Processes (CSCP) approach, and its application to the case of heterogeneous hydrogenation/reduction of carbon dioxide (CO2) on copper and nickel catalysts. CO2 activation is of critical importance for a sustainable global future and one of the major reactions for which sustainable routes must be found urgently. We use the fcc(100) facet of Cu as a worked-out case, and exhaustively derive at the DFT level its reaction mechanisms. We then apply CSCP to first derive a Machine Learning Interatomic Potential (MLIP) for this initial system, and then carry over the knowledge derived on Cu(100) to a pure monometallic facet, Ni(100), to rapidly derive a MLIP for this different system, so as to test the transferability of the approach. The accuracy of the so-derived MLIPs is excellent, predicting both reaction energies and energy barriers for all the mechanistic steps of this complex reaction diagram with consistent and uniform accuracy, with a maximum discrepancy of 0.05 eV for Cu and 0.03 eV for Ni. In the exception cases in which this discrepancy is larger, we show and rationalize that this is due to a change in the reaction mechanism, where the MLIP simulations explore pathways different from the reference DFT ones, without however prejudicing their absolute accuracy. The CSCP-MLIPs are thus shown to be able to assure a transferability en par with the best physics-based models, provide alternative atomistic mechanisms of catalytic processes, and offer themselves as a tool for catalyst rational design on a process societally relevant and exhibiting significant catalytic complexity.
Deriving accurate MLIPs for heterogeneous catalysis is the theme of the present contribution, grounded on our recently developed Conformal Sampling of Catalytic Processes (CSCP) approach.8,11 CSCP proposes a practical route to exploit accumulated information on catalytic mechanisms of worked-out systems to accelerate sampling of the same or similar mechanisms of new systems (note that the reaction mechanisms need not to be identical, since CSCP seems to be able to cover all chemically equivalent paths, en par with the best physics-based interatomic potentials11). A rigorous theorem12 is rephrased within the context to state that structures (atomistic geometries) are invariant up to 2nd order in a perturbation theory expansion to a change in the catalyst stoichiometry. This implies that the “catalytic funnel” along which a reaction occurs for a given catalyst can be conformally transferred into the catalytic funnel of a different (novel, unknown) catalyst. The structural database derived to train the reactive MLIP for the worked-out system can thus be carried over to the new system as a first robust approximation (modulus affine transformation to account for distance rescaling8,11), and then quickly refined with a few steps of active learning (1–2 are sufficient in the cases treated so far), to achieve an absolute accuracy of ∼0.03 eV on barrier prediction with respect to the generating first-principles approach. Such sub-chemical accuracy corresponds to the level of accuracy targeted to ensure that the MLIP surrogate model can be employed with confidence in operational catalyst screening. Additionally, since the training structural database is available a priori, the single-point calculations, generating energies and forces over the database geometries as needed to train the MLIP pertaining to the new system, are fully disconnected, thus perfectly parallelizable on HPC machines, maximizing computational throughput8,11,13,14 to an estimated ∼1 hour for screening a new system on a properly equipped HPC system. Importantly, CSCP has been extended with uniform performance to bimetallic systems,10 a necessary step toward the design of multi-metallic catalysts.
The practical details of the CSCP approach, including the latest advances, such as the use of extrapolation techniques derived from the theory of convergence of numerical series and the adjustments for its extension to bimetallic catalysts, can be found in the original ref. 8, 10 and 11 (selected information can be found in Sections 2 and 5). In our original work, we investigated methanol (CH3OH) decomposition on Pt fcc(100) and fcc(111) facets as worked-out catalysis, and then conformally transferred its reaction paths onto the same facets of eight different catalysts (Fe, Co, Ni, Cu, Pd, Ag, Au, and Pt itself at a different adsorbate coverage). In the present work, we test and demonstrate the CSCP potentialities focusing on a different heterogeneous catalytic process, the thermal hydrogenation/reduction of carbon dioxide (CO2), or t-CO2RR, of critical importance for a sustainable global future. We will use the fcc(100) facet of Cu as a worked-out case, and exhaustively derive at the DFT level its reaction mechanisms. We will then apply CSCP to carry over the knowledge derived on Cu(100) to a pure monometallic facet, Ni(100), to test the transferability of the approach. We thus go beyond our original work in terms of catalytic complexity, and demonstrate that the CSCP approach works equally well on such a societally relevant process.
The article is organized as follows. In Section 2, we provide computational details. In Section 3, we derived the reaction energy diagram for CO2RR on the (100) facet of fcc Cu, and then discuss how to apply CSCP to transfer the so-derived information to a transition metal (TM = Ni). In Section 3, conclusions are summarized.
We then examined the reaction mechanism and energetics of the reaction by placing six H atoms and CO2 on a Cu surface, and allowing the reaction to proceed from CO2 + 6H to CH3OH + H2O. We note that in our previous studies8,11 we focused on CH3OH decomposition, a key pathway in the CH3OH aqueous-phase reforming (APR) reaction for hydrogen production25 [CH3OH + H2O → CO2 + 3H2]. The CSCP scheme was evaluated along the C–H cleavage path [CH3OH → CH2OH + H → CHOH + 2H → COH + 3H → CO + 4H], as well as the C–O and O–H cleavage paths. This motivated us to apply the CSCP procedure to CO2 hydrogenation [CO2 + 3H2 → CH3OH + H2O], the reverse process of CH3OH-APR, to provide a comprehensive assessment of CSCP performance for both C–H and C–O bond cleavage and hydrogenation reactions.
The CSCP workflow in Fig. 2 illustrates the sequential steps for constructing the database-DX and MLIP-X potentials for CO2 hydrogenation. As mentioned, we select the Cu-fcc(100) facet as the worked-out system, for which we also derive a database and a MLIP using the CSCP strategy as a test of the quality and accuracy of the approach, and the Ni-fcc(100) facet as the newly investigated system to evaluate the performance of the CSCP protocol to transfer information from Cu and generate a MLIP for an unknown system. The first step, shown in the pink block in Fig. 2, involves constructing the first database (Database-D1) for the Cu worked-out case using DFT calculations. As customary, we perform two types of DFT simulations: (1) optimization of intermediate species and (2) Nudged Elastic Band (NEB) simulations for transition-state searches. Details of the database-D1 for Cu-fcc(100) are provided in Section 5, Computational details. In the next step, shown in the yellow block in Fig. 2, the CSCP workflow leverages knowledge from the worked-out case, transferring information as a set of coordinates along the reaction pathways tuned via a rescaling approach to build an initial database for the new Ni catalytic system. The rescaled structures are then divided into two sets. The first set (Database A) contains rescaled coordinates of optimized intermediate species [e.g., CO2, COOH, CHOH, HCOO, CH2OH–OH, OH] (see Scheme 1). These coordinates are then DFT-optimized to determine the local minima for Ni. The second dataset (Database B) contains rescaled coordinates along the elementary reaction steps of CO2 hydrogenation, extracted from NEB@DFT calculations performed on the Cu worked-out case. NEB@DFT is highly demanding in terms of computational and human resources, a step which we bypass for the Ni system by exploiting knowledge from the Cu case. The resulting rescaled NEB coordinates are processed through single-point DFT calculations to obtain energies and forces. Database A and Database B are then combined to form Database D1 for Ni, and fed into the parametrization of MLIP-1.
The CSCP employs an equivariant message-passing neural network (MPNN) in the form of the machine-learning MACE MLIP model26 that is trained on the given database, which consists of the XYZ reaction coordinates together with their forces and energies. Details of the MACE parameters are provided in Table S1. After the first MLIP parametrization (producing MLIP-1), the procedure proceeds through two steps of Active Learning (in the magenta, light and dark blue and dark green blocks in Fig. 2) to generate new databases and retrain more accurate MLIP-X potentials, increasing reliability and accuracy. The sampling techniques to produce newly relevant coordinates in the active-learning steps are: (1) running NEB@MLIP-X (X = 1 or 2) for all elementary reaction steps (see Scheme 1) and properly extracting the coordinates from the simulations, and (2) performing extrapolation techniques to generate new coordinates along the reaction pathway. The extrapolation operates on two sets of XYZ coordinates via the equation: N = B + (B − A) = 2B − A, where N are the extrapolated coordinates, and A and B are XYZ coordinates generated from different NEB@MLIP-X iterations. We applied two types of extrapolations: vertical extrapolation, which operates on NEB coordinates obtained at different iterations using the same MLIP-X potential, and horizontal extrapolation, which operates on NEB coordinates generated from different MLIP-X potentials (X = 1 or 2) at the same iteration. Details of the extrapolation technique are described in our previous work.11 We applied extrapolation across all NEB@MLIP-X steps. We note here that the databases (D1, D2, D3) built within CSCP are incremental: new datasets are gathered and merged into the datasets from the earlier stages. After two Active Learning steps and the parametrizations of MLIP-2 and MLIP-3, we assess the performance of the final MLIP-3 for the Cu and Ni catalysts in terms of both internal technical accuracy (RMSE, internal test) and energy prediction with respect to the generating first-principles (DFT) model (external test). The results are presented in Section 3.
Note that we define a threshold of 0.03 eV as our targeted accuracy. This definition corresponds to sub-chemical accuracy (<1 kcal mol−1), or ≈kBT at room temperature, or a factor of ≈3 in kinetic rates at room temperature (a factor <2 at 600 K), and is below the usual accuracy of DFT (around 2 kcal mol−1) as well as typically also of post-Hartree-Fock or post-DFT computational methods. Empirically, we found that in the cases so far investigated (methanol decomposition and CO2 reduction) two active learning steps are sufficient to achieve the targeted accuracy. This is consistent with the principle that the initial, conformally derived database (D1) already contains the essential information on the reaction path on the new catalyst, so that the number of AL steps is reduced to a minimum. In the CSCP protocol in productions runs, control parity plots are performed to test that the quality of the CSCP MLIPs does not degrade for the given catalytic process.
| Cu (100) | Ni (100) | |||
|---|---|---|---|---|
| ERMSE | FRMSE | ERMSE | FRMSE | |
| MLIP-3 | Train: 0.5 | Train: 9.1 | Train: 0.5 | Train: 9.4 |
| Valid: 0.7 | Valid: 10.3 | Valid: 0.6 | Valid: 10.0 | |
| Step | Reaction step | 12-DFT | 12-MLIP-3 | 24-MLIP-3 | 48-MLIP-3 |
|---|---|---|---|---|---|
| a Exception cases where ΔEa > 0.05 eV, for which parity plots were further analyzed. | |||||
| (a) | CO2 + H → HCOO | 0.803 | 0.859 (0.06) | 0.915 (0.11) | 0.820 (0.02) |
| (b) | HCOO + H → H2COO | 1.096 | 1.155 (0.06) | 1.148 (0.05) | 1.137 (0.04) |
| (c) | H2COO + H → H2COOH | 0.810 | 0.870 (0.06) | 0.832 (0.02) | 0.854 (0.04) |
| (d) | H2COOH + H → HO–H2C–OH | 0.917 | 0.880 (0.04) | 0.832 (−0.09) | 0.801 (−0.12)a |
| (e) | CO2 + H → COOH | 1.936 | 2.089 (0.15)a | 1.788 (−0.15)a | 1.863 (0.07)a |
| (f) | COOH + H → HCOOH | 0.521 | 0.539 (0.02) | 0.531 (0.01) | 0.510 (0.01) |
| (g) | HCOOH + H → H2COOH | 0.565 | 0.739 (0.17) | 0.658 (0.09) | 0.575 (0.01) |
| (h) | H2COOH → CH2O + OH | 0.738 | 0.750 (0.01) | 0.736 (0.00) | 0.714 (0.02) |
| (i) | CH2O + OH → CH3O + OH | 0.414 | 0.321 (0.09)a | 0.340 (0.07)a | 0.340 (0.07)a |
| (j) | CH3O+2H + OH → CH3OH + H + OH | 0.670 | 0.704 (0.03) | 0.655 (0.02) | 0.656 (0.01) |
| (k) | HCOO + H → HCOOH | 0.846 | 0.827 (0.02) | 0.814 (0.03) | 0.810 (0.04) |
| (l) | HCOOH + H → H2COOH | 1.033 | 1.024 (0.01) | 1.038 (0.01) | 1.035 (0.00) |
| (m) | HO–CH2OH → OH + CH2OH | 1.403 | 1.482 (0.08) | 1.414 (0.01) | 1.314 (0.09)a |
| (n) | CH3OH + OH + H → CH3OH + H2O | 0.823 | 0.768 (0.06)a | 0.729 (0.09)a | 0.705 (0.12)a |
| (o) | COOH + H → HO–COH | 0.716 | 0.794 (0.08) | 0.788 (0.07) | 0.761 (0.05) |
| (p) | HO–COH + H → HO–CHOH | 0.667 | 0.764 (0.10) | 0.698 (0.03) | 0.710 (0.04) |
| (q) | HCOOH + H → HO–CHOH | 1.160 | 1.071 (0.09) | 1.276 (0.12) | 1.179 (0.02) |
| (r) | H2COOH + H → HO–CH2OH | 0.974 | 0.860 (0.11) | 0.926 (0.05) | 0.916 (0.06) |
| (s) | HO + CH2O + H → CH2OH + OH | 0.605 | 0.568 (0.04) | 0.616 (0.01) | 0.604 (0.00) |
| (t) | HO–COH → HO + COH | 1.313 | 1.397 (0.08) | 1.347 (0.03) | 1.305 (0.01) |
| (u) | HO + COH → OH + CHOH | 0.344 | 0.311 (0.03) | 0.316 (0.03) | 0.306 (0.04) |
| (v) | OH + CHOH + H → OH + CH2OH | 0.150 | 0.154 (0.00) | 0.172 (0.02) | 0.157 (0.01) |
| (w) | OH–CHOH → OH + CHOH | 0.305 | 0.320 (0.02) | 0.298 (0.01) | 0.298 (0.01) |
| (x) | HO–CH2OH + H→ H2O + CH2OH | 1.327 | 1.346 (0.02) | 1.211 (−0.12) | 1.116 (−0.16)a |
| (y) | HCOOH → HCO + OH | 1.063 | 1.094 (0.03) | 1.036 (0.03) | 1.008 (0.05) |
| (z) | HCO + OH → HCOH + OH | 0.605 | 0.507 (0.01) | 0.553 (0.05) | 0.574 (0.03) |
| (aa) | CH2OH + OH → CH2OH + H2O | 0.687 | 0.698 (0.01) | 0.672 (0.02) | 0.675 (0.01) |
| (bb) | CH2OH + H2O → CH3OH + H2O | 0.501 | 0.738 (0.24) | 0.608 (0.11) | 0.545 (0.04) |
| (cc) | CH3O + OH → CH3O + H2O | 0.869 | 0.661 (0.21) | 0.787 (0.08) | 0.835 (0.03) |
| (dd) | H2O + CH3O + H → CH3OH + H2O | 0.798 | 0.731 (0.07) | 0.767 (0.03) | 0.739 (0.06) |
| Step | Reaction step | 12-DFT | 12-MLIP-3 | 24-MLIP-3 | 48-MLIP-3 |
|---|---|---|---|---|---|
| a Exception cases where ΔEa > 0.05 eV, for which parity plots were further analyzed. | |||||
| (a) | CO2 + H → HCOO | 0.928 | 1.099 (0.17) | 1.036 (0.11)a | 0.979 (0.05)a |
| (b) | HCOO + H → H2COO | 1.206 | 1.231 (0.02) | 1.234 (0.03) | 1.231 (0.02) |
| (c) | H2COO + H → H2COOH | 1.443 | 1.438 (0.00) | 1.417 (0.03) | 1.431 (0.01) |
| (d) | H2COOH + H → HO–H2C–OH | 1.126 | 1.186 (0.06) | 1.120 (0.01) | 1.117 (0.01) |
| (e) | CO2 + H → COOH | 1.504 | 1.420 (0.08) | 1.494 (0.01) | 1.475 (0.03) |
| (f) | COOH + H → HCOOH | 0.00 | 0.00 | 0.00 | 0.00 |
| (g) | HCOOH + H → H2COOH | 0.940 | 1.082 (0.14) | 0.974 (0.03) | 0.811 (0.13)a |
| (h) | H2COOH→CH2O + OH | 0.824 | 0.843 (0.02) | 0.814 (0.01) | 0.814 (0.01) |
| (i) | CH2O + OH → CH3O + OH | 0.561 | 0.557 (0.00) | 0.573 (0.01) | 0.583 (0.02) |
| (j) | CH3O+2H + OH → CH3OH + H + OH | 0.922 | 0.907 (0.01) | 0.941 (0.02) | 1.109 (0.19)a |
| (k) | HCOO + H → HCOOH | 0.939 | 0.949 (0.01) | 0.952 (0.01) | 0.956 (0.02) |
| (l) | HCOOH + H → H2COOH | 1.018 | 1.086 (0.01) | 1.105 (0.01) | 1.114 (0.02) |
| (m) | HO–CH2OH → OH + CH2OH | 1.010 | 0.989 (0.02) | 0.974 (0.04) | 0.964 (0.05) |
| (n) | CH3OH + OH + H → CH3OH + H2O | 1.014 | 1.059 (0.05) | 1.063 (0.05) | 1.045 (0.03) |
| (o) | COOH + H → HO–COH | 1.230 | 1.190 (0.04) | 1.180 (0.05) | 1.199 (0.03) |
| (p) | HO–COH + H → HO–CHOH | 1.019 | 1.082 (0.06) | 1.043 (0.02) | 0.973 (0.05) |
| (q) | HCOOH + H → HO–CHOH | 1.219 | 1.239 (0.08) | 1.295 (0.02) | 1.337 (0.02) |
| (r) | H2COOH + H → HO–CH2OH | 1.050 | 0.992 (0.06) | 1.076 (0.03) | 1.073 (0.02) |
| (s) | HO + CH2O + H → CH2OH + OH | 0.750 | 0.816 (0.07) | 0.781 (0.03) | 0.801 (0.05) |
| (t) | HO–COH → HO + COH | 0.705 | 0.741 (0.04) | 0.680 (0.02) | 0.663 (0.04) |
| (u) | HO + COH → OH + CHOH | 1.516 | 1.514 (0.00) | 1.503 (0.01) | 1.477 (0.04) |
| (v) | OH + CHOH + H → OH + CH2OH | 0.581 | 0.612 (0.03) | 0.613 (0.03) | 0.617 (0.04) |
| (w) | OH–CHOH → OH + CHOH | 0.113 | 0.111 (0.00) | 0.086 (0.03) | 0.071 (0.04) |
| (x) | HO–CH2OH + H → H2O + CH2OH | 1.529 | 1.722 (0.19) | 1.359 (0.16) | 1.300 (0.23) |
| (y) | HCOOH → HCO + OH | 1.231 | 1.166 (0.07) | 1.177 (0.05) | 1.202 (0.03) |
| (z) | HCO + OH → HCOH + OH | 1.059 | 1.011 (0.05) | 1.074 (0.02) | 1.077 (0.02) |
| (aa) | CH2OH + OH → CH2OH + H2O | 0.926 | 0.965 (0.04) | 0.950 (0.02) | 0.939 (0.01) |
| (bb) | CH2OH + H2O → CH3OH + H2O | 0.591 | 0.565 (0.03) | 0.620 (0.03) | 0.624 (0.03) |
| (cc) | CH3O + OH → CH3O + H2O | 1.023 | 1.154 (0.13) | 1.138 (0.12) | 1.092 (0.07) |
| (dd) | H2O + CH3O + H → CH3OH + H2O | 0.893 | 0.804 (0.09) | 0.898 (0.00) | 0.866 (0.03) |
We first assess the MLIP performance for the Cu worked-out case. The energy differences in Ea predicted by NEB@DFT and NEB@MLIP (ΔE) for each reaction step are listed in parentheses in Table 2. The Ea simulated by CI-NEB@DFT for each reaction step is listed in the third column (NEB@DFT). The Ea values predicted by NEB@MLIP using 12, 24, and 48 intermediate images are listed in the columns titled 12-MLIP-3, 24-MLIP-3, and 48-MLIP-3, respectively. The NEB@MLIP-3 simulations for the Cu worked-out case show excellent performance, with ΔE values below 0.05 eV for most steps.
Let us now discuss the exception steps, where ΔE exceeds 0.05 eV. The exception steps occur at steps (d), (e), (i), (m), (n) and (x); the corresponding reaction formulas are provided in Table 2.
We can distinguish two classes of discrepancies:
In the first class, we observe that NEB@MLIP predicts lower barriers than NEB@DFT. To further examine the accuracy of the approach, in these cases we produced further test data as reported in Fig. 3. In particular, we performed single-point DFT calculations on the reaction coordinates generated from the NEB@MLIP-3 simulations and generated parity plots. Taking reaction step (d) as an example, this step corresponds to the hydrogen protonation of the H2COOH species [H2COOH + H→ HO–H2C–OH]. The parity plots comparing the energies obtained from NEB@DFT and NEB@MLIP are shown in Fig. 3a–c. These plots exhibit clear linear trends between the two sets of energies, with R2 values of 0.998–0.999 for the three NEB@MLIP simulations using 12, 24, and 48 images (see Fig. 3a–c), respectively, indicating high accuracy in the energy predictions of coordinates along the reaction pathways. The optimized structures along the reaction pathway, as shown in Fig. 3d and e, illustrate that this reaction step involves both the bending and migration of the bulky H2COOH species across the adsorption sites of the Cu surface (from bridge to hollow), together with the migration of a single hydrogen atom that subsequently forms a bond with the O atom of the H2COOH species. Given the efficiency of NEB@MLIP simulations, using 48 images is very fast and provides a more precise optimization of the energy profile along the reaction pathway, with a clear advantage over NEB@DFT, which relies on a constrained number of images due to computational resource limitations. For the other exception cases (e, i, m, n, x), their parity plots are shown in Fig. S2–S7 of the SI. We always found linear correlation values R2 of 0.999–0.998 between MLIP and DFT energies for all NEB simulations, regardless of the number of intermediate images used (12, 24, or 48), indicating the high accuracy of MLIP in energy predictions.
To gain deeper insight into the exceptional cases in the MLIP-3 accuracy analysis, we illustrate in Fig. S8 the reaction energy profile of CH3OH + OH + H → CH3OH + H2O on Cu(100) as a function of reaction coordinates. We found that the water formation step on Cu(100), in the presence of adsorbed CH3OH, corresponds to a composite reaction pathway. First, CH3OH migrates on the surface, induced by nearby OH species, due to hydrogen bonding between the H atom of CH3OH and the OH species (see Fig. S8(d)–(f)). The OH species then reacts with a single H atom adsorbed on the surface to form H2O, while CH3OH acts as a spectator throughout the reaction. Such composite reaction paths often occur on metal surfaces, especially at high coverage. Note the energies in the range −9.1 to −8.8 eV in Fig. S4b–c may appear redundant, or highly correlated, with a risk of data leakage. However, the configurations with similar energies along the reaction pathway reflect the migration of species on the surface and are structurally distinct, in keeping with our rigorous construction of the fitting database. The energy profiles shown in Fig. S8a–c indicate that the final MLIP-3 potential for Cu accurately reproduces the DFT energies, regardless of the number of intermediate images used in the calculations. We did not find examples of the second class of discrepancies for Cu, so we discuss them in the next paragraphs about Ni.
We next evaluate the performance of MLIP-3 for the Ni catalyst. Table 3 lists the Ea values predicted by NEB@MLIP-3 for all the reaction steps, with ΔE values in parentheses representing the energy differences between the NEB@DFT and NEB@MLIP-3 calculations. The results demonstrate that MLIP-3, generated within the CSCP scheme, exhibits outstanding performance in NEB@MLIP calculations and shows excellent agreement with the DFT energy benchmark, with ΔE maximum values as low as 0.03 eV.
The four exception cases are the reaction steps (a, g, j, and x). The most interesting case is the reaction step j [CH3O + 2H + OH → CH3OH + H + OH], which is an example of the second class of discrepancies we mentioned above, that we now discuss. We thus produced further test data as reported in Fig. 4. In particular, in Fig. 4a–c we report the parity plot for the reaction step j in Table 3. The ΔE values in columns 12-MLIP and 24-MLIP are 0.01 and 0.02 eV, respectively; however, a difference of +0.19 eV in the energy barrier is observed when NEB@MLIP is performed using 48 intermediate images. In other words, the energy barrier found by NEB@MLIP using 48 images is higher than the energy barrier found by NEB@DFT using 12 images or NEB@MLIP using 12 or 24 images. Examination of the parity plots indicates that all three cases exhibit R2 of 0.998–0.999, implying that MLIP accurately reproduces the DFT energies of the coordinates along the reaction pathway for the hydrogen protonation of CH3O to generate CH3OH on Cu with OH and H residues adsorbed on the surface. Further inspection of the reaction profiles and coordinates reveals that NEB@MLIP using 48 intermediate images gives a different reaction path compared with NEB@DFT (whereas NEB@MLIP simulations using 12 and 24 images produce reaction energy pathways that closely resemble those of NEB@DFT). In the case of NEB@MLIP with 48 images, at transition state 2 (TS2), the CH3O species first moves from the bridge site of (a and b) to the adjacent bridge site (b and c) in Fig. 4, while the activated hydrogen atom approaches the migrated CH3O species. The reaction thus occurs at a different active site on the Cu surface compared with TS1 from NEB@DFT, where only the hydrogen atom migrates toward the CH3O species at the site of (a and b) in Fig. 4. Fig. 5 compares the energy profile along the two alternative NEB paths, showing that they are clearly different. In our previous work on CH3OH decomposition,8,11 we observed in some cases a change in the reaction mechanism with a lower energy barrier when employing the final MLIP with a larger number of intermediate images than the NEB@DFT simulations, i.e., as in the Cu case above we only found examples of the first class of discrepancies. In contrast, the present example, reaction j [CH3O+2H + OH → CH3OH + H + OH], shows that using NEB@MLIP simulations with a larger number of intermediate images leads MLIP to explore alternative pathways with higher energy barriers. The MLIP is accurate, but the NEB simulation starting with a larger number of images (we remind that they are initially interpolated linearly from initial and final configurations) may converge to a different reaction path, usually with a lower energy barrier, but sometimes (counterintuitively) with a higher barrier if the initial and final configurations are too distant in the space of atomic coordinates. This discrepancy is thus not connected to the accuracy of the MLIP, but rather to the need to improve the NEB simulations using configurations with a lower number of images as the starting point of NEBs with a larger number of images, or better by performing more systematic reaction path searches than simple NEB simulations, e.g. using Reactive Global Optimization (RGO) approaches.27 Note in this connection that recent advances in Hessian estimates28 enable a very significant speed-up of reaction path search approaches such as RGO.
![]() | ||
| Fig. 5 Reaction energy profiles for the hydrogen protonation of CH3O on Cu(100) with OH and H species on the metal surface, obtained using NEB@MLIP with (a) 12 and (b) 24 intermediate images. | ||
To further assess the MLIP-3, we performed ramped-temperature molecular dynamics (MD) simulations in the canonical (NVT) ensemble to evaluate the performance of MLIP-3 as a function of temperature. The adsorption configuration of HO–CH2OH + 2H on the Ni(100) surface was selected as a representative case. The MD simulation began with a heating phase, during which the temperature was gradually increased from 0 to 200 K over 80 ps. This was followed by an equilibration phase at a constant temperature of 200 K for an additional 50 ps. Subsequently, additional heating cycles were performed, in which the temperature was further increased in increments of 50 K over 80 ps, each followed by a 50 ps equilibration phase at the corresponding constant temperature reached in the preceding heating step. This procedure was repeated up to the final temperature of 350 K, as shown in Fig. S9. The two bottom layers of the Ni(100) slab were frozen, while the top three layers were allowed to move freely, consistent with the setup used in the NEB@MLIP-3 and DFT calculations. The input parameters used for the MD simulations are listed in Section S3 in the SI. The results demonstrate that the potential energy of the HO–CH2OH + 2H complex on the Ni(100) surface gradually and steadily increased as the temperature rose to 350 K, indicating the physical stability of the system and consistent energy behavior. From the trajectories sampled every 8000 steps during the heating phase and every 5000 steps during the equilibration phase, we observed that the adsorbate species moved physically on the catalytic surface. The H atoms, having a small mass, migrated faster, while the heavier HO–CH2OH species moved more slowly. The metal surface atoms also exhibited physical motion but remained within the plane without a significant distortion. The results suggest that MLIP-3 can serve as a reliable initial potential to study the effects of temperature on the energetics and catalytic behavior of the CO2 hydrogenation reaction under different temperature conditions.
Finally, we evaluate the reaction energies (Ereaction) using Ereaction = EFS − EIS where Ereaction is the reaction energy, and EFS and EIS are the total energies of the final and initial states in the NEB calculations, respectively. We list Ereaction for all elementary steps of CO2 hydrogenation on Ni(100) in Table S2. The results show that MLIP-3 accurately predicts the reaction energies, with differences in reaction energies (ΔE), less than 0.03 eV when compared with the values computed from NEB@DFT calculations.
Point (i) is related to the actual “resting state” of the catalytic surface under the given conditions. To make an example, under conditions of aqueous phase reforming of oxygenates (such as methanol), Pt-bases catalyst surfaces are typically covered by hydrogen adatoms.25 Similar phenomena have been reported in electrocatalysis, e.g., CO coverage in CO2 electro-reduction (CO2RR).29 Under these conditions, not the bare surfaces but surfaces exhibiting a full coverage of, e.g., hydrogen species should therefore be considered as the configurations in the worked-out case or in the predicted reaction energy diagrams to enable a proper, quantitative comparison with experiment. This is due to two reasons: first, there will be an energy price associated with making empty sites for incoming reactants different from the dominant adsorbate species; second, the presence of other adsorbates, even if they are “chemically spectators”, will modify the energetics of the system via many-body effects. The inclusion of this point within CSCP should not present excessive difficulties. One shall only need to generate worked-out or rescaled configurations in a proper way, but there is no intrinsic reason why the conformal principle should fail. Indeed, in our original work,8 we purposely used, as a worked-out case, configurations for aqueous phase reforming of methanol on Pt(100) and Pt(111) under conditions of high coverage of hydrogen, i.e., facets fully covered with hydrogen adatoms on all top sites except two for Pt(111) or on all bridge sites except two for Pt(100). Then, we conformally translated these into configurations for methanol decomposition at low coverage (no spectator hydrogens) on (111) and (100) facets of Au, Ag, Pd, Cu, Ni, Co, Fe, and Pt itself by simply removing the spectator hydrogens. In other words, we used a worked-out case of a different reaction and under completely different coverage conditions within the CSCP strategy and we obtained excellent results.
Clearly, care should be taken to account for high-coverage many-body effects, but in principle without any insurmountable obstacles. Indeed, within this same present study we found a case in point that rationalizes one example of discrepancy between DFT and MLIP-3 larger than 0.05 eV. For the C–O bond dissociation step (HO–CH2OH + H → HO + CH2OH + H) on Cu(100), the difference in energy barrier between DFT and MLIP-3 reads 0.16 eV. Analysis of parity plots shows that MLIP-3 does not reproduce the DFT energies accurately in the high-energy region, and we found that this is due to a many-body coverage effect. One hydrogen is in fact close to the HO–CH2OH adsorbate, and this interaction produces a difference in the energetics of the reaction as predicted by MLIP-3 (in whose parametrization high-coverage effects were not included) with respect to DFT (which obviously includes high-coverage effects), generating a discrepancy especially notable in the energy barrier. This is unwanted because in this phase of the CSCP development we would like to limit ourselves to the low-coverage case. To explore whether CSCP can reproduce this effect, we have parametrized a new MACE@MLIP-3′ using a database composed of the Database-D3 plus the newly generated reaction coordinates for the step HO–CH2OH + H→HO + CH2OH + H obtained via the NEB@MLIP-3. Now, although in doing so we have included in the database a minimal number of configurations, without performing the full extraction and extrapolation procedures foreseen by the CSCP protocol, the parity plot of the new NEB@MLIP -3′ shows a much more homogeneous agreement with respect to DFT, particularly in the higher-energy region, as shown in Fig. S3 of the SI. Moreover, the energy barrier is now predicted to be 1.29 eV, which compares very well with the DFT barrier of 1.33 eV for this step.
A more delicate situation occurs when high-coverage species significantly perturb the catalyst surface, as in the case of subsurface interstitials: this case when the high-coverage perturbation drastically modifies the catalyst is better seen under point (ii). Point (ii) is related to a basic assumption of the CSCP approach in its present version, that is, the use of single-crystal facets as structural models of the catalyst, which excludes less conventional materials or phenomena, such as nanostructured configurations or non-adiabatic processes. Both of these non-conventional factors are likely to play a role in CO2 hydrogenation on the industrial catalyst (CO2-Hydr). Indeed, it is known that the ZnO support drastically promotes the catalytic activity of Cu nanoparticles in CO2-Hydr, such that the catalytically active sites are thought to lie at the interface with the support, whence an ensemble of non-classical configurations have been proposed to account for and rationalize experimental observations,16,18–21,24 including partial ZnO reduction to Zn and successive alloying with Cu.30 It is possible to include these exotic configurations as worked-out systems in a “nanostructured” CSCP approach and compare with experiment (we are indeed working along these lines), but clearly the energetics of nanostructured systems will be different from that of the regular ones considered in the present work. Along another perspective, non-adiabatic thermal phenomena have also been argued to play a role in high-temperature reactions, e.g., in ethylene oxidation on Cu catalysts as monitored via operando transmission electron microscopy.31 These phenomena may play a role also in CO2-Hydr. Using an argument by analogy, switching to the electrochemical (non-thermal) CO2RR, several studies indicated that non-adiabatic bias-induced phenomena give rise to under-coordinated32–34 or oxygen-containing35–37 Cu surfaces, see a recent detailed investigation of CO2RR reaction intermediates.38 These undercoordinated or oxygen-containing sites are critical to favor CO2RR with respect to HER on regular Cu facets, which per se exhibit a very low faradaic efficiency. The robustness of the CSCP MLIPs in MD simulations as reported above gives us confidence that non-adiabatic phenomena can be included in our approach using appropriate configurations to generate the MLIP training database, but the assumption of regular and static facets held so far in CSCP precludes a direct comparison with experiment at this stage.
An indirect validation comes from comparison with previous computational simulations. Apart from technical details, our DFT@PBE-plus-dispersion approach is equivalent (and also quantitatively very close) to that employed in previous theoretical work32,34,38 which, after inclusion of implicit solvation and bias effects, compared accurately with experimental observations. The DFT@PBE-plus-dispersion approach (after including dispersion and implicit solvent) has thus been shown to give reasonably accurate results in comparison with experiment.39–41 Additionally, the same DFT approach has been employed in simulations including explicit water molecules, and used to predict reaction mechanisms42,43 and reaction intermediates,44–46 then validated by in situ experimental vibrational spectroscopy.38,47
Despite these present limitations of the approach, one important question is whether we can nevertheless derive some experimental implications or suggestions or testable predictions for the new (Ni) system. In Fig. 6 we compare the reaction energy diagrams for the most favorable path of CO2-Hydr for Cu(100) derived from our DFT search and for Ni(100) derived from the CSCP approach. Note that the diagrams of Fig. 6 are divided into two sections, where on the left before the dashed vertical line we consider CO2 adsorption on a metal surface with only two hydrogen adatoms and two H2 molecules in the gas phase (CO2 + H + H + 2H2 → HCOO + H + 2H2), whereas on the right after the dashed vertical line all the hydrogen molecules are dissociated and the surface contains a number of H adatoms ranging from zero (CH3OH + H2O) to 5 (HCOO + 5H). Clearly, the diagrams for Cu and Ni are quite different. First, it is apparent that the energy of the states on the Ni(100) catalyst increases much more quickly than on the Cu(100) system. This is due to the fact that Ni interacts much more strongly with hydrogen. This explains why we have a drop of 1.18 eV when we adsorb 4 hydrogens on the Ni surface, accounted for by a break between the HCOO states before and after the dashed vertical lines. This means that Ni(100) must escape from a thermodynamic dip to hydrogenate CO2, thus being a worse catalyst than Cu(100). Note also that in Fig. 6 we report energies, to be converted to free energies under the actual conditions, so naturally high H2 pressures and high temperatures would decrease substantially the thermodynamic penalty associated with first generating the H2COOH intermediate and then dissociating it to CH2O + OH on Cu(100). The stronger interaction of Ni with formaldehyde with respect to the methoxy and methanol intermediates instead contributes to making Ni(100) a worse catalyst than Cu(100). Finally, to the best of our knowledge, there are no experimental results that could be compared with a possible Ni–Cu alloyed catalyst for thermal CO2-Hydr (that we could derive as a straightforward follow-up of the present study10), although Ni–Cu systems have been demonstrated as efficient dual atom catalysts in electrochemical CO2RR.48
Naturally, as discussed in Section 3.1, the present contribution represents a proof-of-principle study. The industrial working catalyst for CO2 thermal hydrogenation is made of Cu nanoparticles on a ZnO support, where the intimate interaction between the metal and the oxide components has been shown to play a critical role,18–24 a catalyst/support phenomenon that we do not consider here. We believe however that this type of phenomena are within the realm of an appropriately modified CSCP approach, which we defer to future studies. We leave for a future investigation also the introduction of solvation and bias effects, as needed to study CO2 electrocatalytic hydrogenation, for which the Cu fcc(100) surface is a well studied model of a catalytic facet. The structural database we have derived here should be a robust starting point for such an investigation, once the physics of reactant/electrode interactions49 is appropriately included within the CSCP framework.
Spin-unrestricted calculations were employed in both DFT optimizations and NEB@DFT simulations. The face-centered cubic (fcc) bulk structures of Cu and Ni were optimized using vc-relax calculations in the QE suite to obtain the lattice parameters of Cu and Ni in their bulk forms. The calculations were performed using GBRV ultra-soft pseudopotentials52 together with the Perdew–Burke–Ernzerhof (PBE) exchange-correlation functional.53 A plane-wave basis set with a cut-off energy of 40 Ry was used for the wave functions, while the charge density was expanded with a cut-off of 200 Ry. A 3 × 3 × 1 k-point mesh was used for both structural optimizations and NEB calculations. An electronic energy convergence threshold of 10−6 Ry was applied.
The Cu(100) and Ni(100) surfaces were modeled using five-layer slabs composed of 45 metal atoms, with a 15 Å vacuum space applied along the z-direction to eliminate interactions between periodic images. The initial magnetization parameter (starting_magnetization) was optimized and set to 0.01 for Cu(100) and 0.3 for Ni(100). Cu and Ni atoms in the top three layers of the surface were allowed to move freely, while the bottom two layers were fixed during the optimization. Details of the input parameters for structural optimization and NEB calculations are provided in Section S1 of the SI.
We first carried out structural optimization of the intermediates (e.g., COOH, HCOO, CHOH, CH2OH, OH, CH2OH–OH, and CHOH–OH; see Scheme 1) on several adsorption sites of the Cu(100) surface including top, bridge, and 4-fold sites to identify the most stable adsorption configurations. These optimized structures were then used as the initial and final states for the subsequent NEB calculations. At this step, we exploit information obtained from several iterations of the DFT optimization by extracting XYZ coordinates together with their forces and energies through an algorithmic sampling technique. Based on the convergence series of the optimization, the structures change markedly during the very first iterations. Our CSCP approach targets this period when sampling for database construction, capturing coordinates around the local-minima basins rather than only the final optimized structures. Coordinates with their corresponding forces and energies in the neighborhood of the local minima are valuable for training the MLIPs, as they may guide the models to navigate these regions correctly. The algorithmic sampling scripts select 20 XYZ coordinates, primarily from the initial iterations, as well as the final structure obtained from the optimization. All extracted coordinates from the optimization tasks are gathered and combined with the NEB coordinates (details provided in the next following section) to form Database 1 of the Cu worked-out case.
We applied NEB and Climbing-Image Nudged Elastic Band (CI-NEB) calculations54,55 to identify the transition-state structure for each reaction step in Scheme 1 and to determine the corresponding energy barriers, enabling construction of the overall energy profile of CO2 hydrogenation on the Cu (100) surface. We first carried out the TS search with 12 intermediate images in the NEB calculation to estimate the activation energy (Ea) and obtain approximate reaction coordinates along the reaction pathway. It is worth noting that CO2 hydrogenation involves bulky steric fragments such as CHOH–OH, CH2OH–OH, and CH3OH + H2O on the metal surfaces, so in several cases the number of intermediate images was increased to 18 to achieve convergence in the NEB search. We then performed the CI-NEB calculation using the reaction coordinates from the initial NEB runs to accurately locate the transition structure and its energy barrier. The energy barrier (Ea, eV) was computed using:
| Ea = ETS − Ereactant |
After obtaining MLIP-3 for both the Cu and Ni systems, we assessed their performance by performing NEB calculations using the MLIP potentials (NEB@MLIP). For technical implementation, the MACE model, as implemented in the LAMMPS software,56 was interfaced with the OPTIM code.57 We applied MLIP-3 as the force field in conjunction with the OPTIM code to perform NEB calculations. We note that the input parameters at the MLIP level, such as the energy threshold, supercell dimensions, 15 Å vacuum spacing along the z-direction, and fixation of the two bottom layers (see details of the metal slab models), were kept consistent with those used in the DFT optimizations and NEB calculations to ensure technical accuracy and reliability. The input parameters for NEB@MLIP simulations were listed in Section S2 in the SI.
In each reaction step, we carried out three NEB@MLIP simulations using different numbers of intermediate images (12, 24, and 48). The accuracy of the MLIP-predicted energy barriers was evaluated using the equation as ΔE = ECI-NEB@DFT − ENEB@MLIP where ECI-NEB@DFT is the barrier obtained from CI-NEB@DFT, ENEB@MLIP is the barrier predicted by NEB@MLIP-3. The resulting trends in barrier height and convergence behavior are summarized in Table 2 for Cu and Table 3 for the Ni catalyst.
The NEB@MLIP simulations were also employed to generate new datasets during the active-learning stages of the CSCP framework. They serve two purposes: (1) performing NEB calculations and extracting the final set of coordinates to be added to the database, and (2) providing input structures for extrapolation to generate additional datasets (see Methodology section for details). NEB@MLIP-x simulations (x = 1 for active-learning cycle 1 and x = 2 for cycle 2) were performed for all reaction steps using 48 intermediate images. Employing a large number of images helps capture coordinates that reflect not only bond breaking and forming but also rotations and reorientations of adsorbed species, which is valuable for extending the accuracy range of the MLIP model. Note that we do not perform fine-tuning of a previous MACE MLIP for a new system. For example, we did not fine-tune the MLIP of a Ni system using the previously developed MLIP for Cu. Instead, we generate an initial MLIP (MLIP-1) for Ni by training it from scratch on a database for Ni which is constructed by properly rescaling configurations from the Cu database. The validity range of MLIP-1 for Ni systems is then expanded through two active-learning iterations, resulting in MLIP-2 and MLIP-3. Additional database entries from the active-learning cycles, obtained by (1) extracting structures from geometry relaxation and NEB@MLIP simulations and (2) applying extrapolation techniques on NEB@MLIP coordinates, are incorporated into the dataset on which we train MLIP-2 and MLIP-3. To complete the database, on the so-generated database configurations, we perform the single-point calculations using the Quantum ESPRESSO suite to obtain their energies and forces at the DFT level.
Supplementary information: Table S1 lists the input parameters used for MACE parametrization of MLIP. Table S2 lists the reaction energy simulated from NEB@DFT and NEB@MLIP. Fig. S1 shows the possible adsorption sites for intermediates on the metal (100) surface. Fig. S2–S8 present parity plots comparing single-point DFT energies with NEB@MLIP-3 energies for coordinates along the reaction steps: (e), (i), (m), (n), (x) for the Cu case and (g), (x) for the Ni case (see the full reaction paths in Tables 2 or 3). Section S1 describes the DFT calculation parameters, and Section S2 provides the NEB@MLIP calculation parameters. See DOI: https://doi.org/10.1039/d5fd00132c.
| This journal is © The Royal Society of Chemistry 2026 |