Open Access Article
Calvin Pietersa and
Alon Grinberg Dana
*abc
aWolfson Department of Chemical Engineering, Technion – Israel Institute of Technology, Haifa 3200003, Israel. E-mail: alon@technion.ac.il
bGrand Technion Energy Program (GTEP), Technion – Israel Institute of Technology, Haifa 3200003, Israel
cBoeing-Technion SAF Innovation Center, Technion – Israel Institute of Technology, Haifa 3200003, Israel
First published on 10th June 2026
Accelerating the discovery of complex chemical systems, from sustainable aviation fuels to atmospheric models, requires the rapid determination of thousands of elementary rate coefficients, a task fundamentally bottlenecked by traditional, low-throughput transition-state searching. Here we develop a high-throughput digital pipeline and a reaction-aware geometric message-passing framework for predicting the three parameters of the modified Arrhenius equation directly from molecular structure. A dataset of ∼1800 hydrogen-abstraction reactions was generated using automated workflows and high-level electronic-structure calculations. By incorporating reactive-atom distance (RAD) features – a novel data representation that solves the “spatial blindness” of standard molecular graphs – the model achieves a cross-validated median error of 0.285 dex (∼1.9×) in k(T) across 300–3000 K. While accuracy is modestly lower in heteroatom-rich environments, the framework robustly captures the underlying structural trends and directly yields the complete Arrhenius parameter triplet, ensuring a rigorous, continuous temperature dependence across the entire evaluated range. These results establish reaction-aware representation learning as a scalable strategy to replace weeks of quantum chemical compute with near-instantaneous inference, providing a clear path for the data-driven acceleration of kinetic modeling.
Determining the rate coefficients with high confidence is not trivial. These reactions often involve transient radicals18 and are important at extreme conditions,19–21 complicating direct experimental kinetic measurements and limiting the available data to narrow temperature ranges. While experimental determination of rate coefficients remains the gold standard, particularly for validating critical reactions, it is often impractical to characterize all important individual steps in complex kinetic networks. The time, cost, and difficulty of radical reaction experiments—especially those involving short-lived intermediates like ˙OH or H˙ radicals—restrict experimental efforts to a limited set of major reactions.22–24
As experimental determination of rate coefficients remains limited by cost and scope,25 quantum chemical methods combined with transition state theory (TST) have become the primary means of determining reaction kinetics.2,26–28 In principle, this enables the construction of kinetic models purely from ab initio parameters, providing predictive insight even in the absence of experimental data. In practice, however, the generation of accurate rate coefficients across large reaction networks remains computationally demanding. Modern tools such as CREST,29 EStokTP,30 and ARC31 have automated many steps—conformer generation, transition-state (TS) searching, and refinement—but the exploration of the vast TS conformational space continues to be a major bottleneck. Even for reactions with only a few degrees of freedom (e.g., two key dihedral angles adjacent to the reaction zone), identifying the correct TS conformer can require dozens of high-level quantum calculations and still often demands expert supervision, making the process time-consuming, especially when scaled. TS searches typically involve iterative optimization or heuristic guidance and must be validated via intrinsic reaction coordinate (IRC) calculations.32 Vibrational frequency analysis is then used to confirm a single imaginary mode and to compute partition functions for TST rate evaluation. For flexible molecules, multiple candidate TS geometries arising from distinct rotamers may each require separate optimization and validation, adding to the computational burden. To refine energetic accuracy, high-level single-point energy calculations such as CCSD(T) are usually performed on lower-level optimized geometries, but their steep scaling, ∼O(N7), confines their use to small systems or a limited subset of benchmark reactions. Consequently, in detailed kinetic mechanisms—such as those used in combustion—only a small fraction of reactions are computed at the highest levels of theory, while most are estimated using more approximate quantum chemical or empirical approaches.2,33,34 This imbalance introduces substantial uncertainty, as missing or inaccurate rate coefficients can propagate unpredictably through reaction networks.35,36 Thus, while ab initio kinetics provide a rigorous theoretical foundation, their broad application remains constrained by computational cost, scalability, and the complexity of exploring the TS landscape.
The low-throughput nature of the quantum computations involved creates an iterative bottleneck that prevents the rapid exploration of the thousands of individual rate coefficients required to model complex chemical systems, such as sustainable aviation fuels (SAF) or atmospheric oxidation pathways. To meet these demands, the field requires a transition from manual, case-by-case mechanistic studies toward scalable, data-driven kinetic modeling that can instantly generalize across diverse chemical spaces.
To bridge this gap, chemists have turned to systematic data-driven and knowledge-based methods to fill in missing kinetic data. For decades, rate rules and correlation formulas have served as the workhorses of reaction rate estimation. Early empirical relationships such as the Evans–Polanyi correlation approximate activation energies as linear functions of reaction enthalpy.37,38 Another foundational method is group additivity, pioneered by Sidney Benson, which estimates thermochemical properties—and later kinetic parameters—by summing contributions from molecular substructures.39,40 Notably, Saeys et al. developed an ab initio group additivity scheme for hydrogen-abstraction reactions,41 while Sumathi et al. applied group-based rules to estimate rate coefficients for alkane abstractions by H˙ and ˙CH3 radicals.42
Building on these ideas, the Reaction Mechanism Generator (RMG)43 utilizes a rule-based rate estimation approach, whereby a new reaction is mapped onto the most specific applicable node in a hierarchical kinetic tree; if no exact match exists, RMG falls back to parent nodes or averages over sibling rules. The rules are organized in decision trees whose branching reflects increasing structural specificity. This scheme enables rapid and interpretable estimation of missing rate coefficients at negligible computational cost.43,44 However, its accuracy ultimately depends on the coverage and granularity of the underlying training reactions and the expressiveness of the group definitions; extrapolation beyond the represented chemical space, or into environments where important non-local interactions are not explicitly encoded, can lead to unreliable estimates.43,45 To address these challenges, RMG incorporates automated tree generation methods that construct and update rate-rule hierarchies directly from training reactions using the Subgraph Isomorphic Decision Tree (SIDT) algorithm.45,46 This automation improves scalability and consistency across reaction families, though the estimator's performance remains limited by sparse data coverage and the local nature of the group-based representation. These ongoing limitations motivate a shift toward more flexible, data-driven approaches capable of learning reactivity patterns directly from the molecular structure.
Machine learning (ML) has rapidly emerged as a powerful tool to overcome the limitations of traditional kinetics calculations, enabling fast and accurate predictions of reaction parameters directly from molecular structure. Recent studies have demonstrated that graph-based neural networks (NNs), including message passing neural networks (MPNNs), can predict activation energies with errors as low as 2 kcal mol−1 to 3 kcal mol−1, rivaling DFT-level accuracy.47,48 For example, Grambow et al. applied a variant of the MPNN model, called DMPNN (Directed Message Passing Neural Network) to over 12
000 reactions and achieved remarkably accurate activation energy predictions across multiple reaction types.47 More recently, Spiekermann et al. showed that the same DMPNN trained on ∼24
000 reactions could predict barrier heights to within ∼2.6 kcal mol−1 of coupled-cluster reference values – outperforming even high-quality DFT – while being many orders of magnitude faster to evaluate.49 These successes demonstrate the game-changing potential of ML: once properly trained, such models can generalize to new reactions and instantly return chemical parameters that would otherwise require expensive, time-consuming quantum chemical calculations.
Recent studies have also focused on improving ML performance in data-scarce regimes, a common challenge in chemistry. Chang et al.48 demonstrated that incorporating global molecular descriptors, such as overall charge or electron count, into graph neural networks (GNNs) can modestly improve barrier height predictions (though often loosely referred to as activation energies in the ML literature) when high-level training data are limited. They also explored strategies such as Δ-learning, where the model learns the difference between a low-level and a high-level calculation, achieving high accuracy with far fewer expensive computations.
Hydrogen-abstraction reactions have become a prime testing ground for these approaches because of their central role in combustion, atmospheric oxidation, and radical chemistry, coupled with the difficulty of obtaining accurate rate coefficients experimentally. While the Spiekermann et al.49 work contains many hydrogen-abstraction reactions, it focuses on barrier heights (Ea), leaving the Arrhenius pre-exponential factor (A) and temperature exponent (n) in the modified (three-parameter) Arrhenius expression largely unexplored. A few recent studies have begun to address this gap, particularly for specific hydrogen-abstraction families.
Yu et al.8 developed a deep NN model to predict rate coefficients for hydrogen-abstraction reactions between alkyl esters and atomic hydrogen, successfully reproducing temperature-dependent rate coefficients from high-level calculations. In a separate study, Yu et al.50 proposed a hybrid ML approach for ˙CH3 + alkane abstractions, combining gradient boosting decision trees (XGBoost) with a feed-forward NN to predict A, n, and Ea from bond dissociation energies and steric/electronic descriptors. Using six optimized descriptors, this hybrid model reduced the cross-validated RMSE from 0.239, Feedforward Neural Network (FNN) alone, to 0.152 and the average deviation from 74% to 42% on the prediction set, and achieved residual standard errors of 0.07–0.16.
Xia et al.51 extended the hybrid XGB–FNN framework to
+ alkane abstractions, training on thousands of TST-derived rate coefficients with descriptors including bond dissociation energies, steric factors, and molecular symmetry numbers. For ˙OH + alkane hydrogen abstraction, this hybrid model achieved median leave-one-out deviations mostly below 100% with training, validation and prediction set errors of 46.2%, 45.4%, and 89.1% respectively – meeting the “accurate” threshold by combustion-modeling standards.52 Performance was weaker for
+ alkane reactions, with corresponding deviations of 148–191%, reflecting the smaller, less diverse training set and the need to extrapolate to larger, more complex molecules. Importantly, the ˙OH model reproduced well-known site-specific reactivity patterns (primary < secondary < tertiary barrier heights), consistent with prior ab initio studies, whereas the
model predicted more uniform reactivity across sites. Both models generalized reasonably to unmeasured alkanes up to C16H34 and produced temperature trends that fit smoothly to a modified three-parameter Arrhenius form, indicating physically plausible extrapolation beyond the training domain.
Despite this progress, four persistent challenges limit the deployment of ML-predicted H-abstraction rate coefficients in large-scale kinetic models: (i) severe data scarcity, with most datasets restricted to a single radical and/or substrate class; (ii) incomplete prediction of the Arrhenius triplet, with many studies focusing on Ea alone; (iii) limited generalization across radical–substrate combinations, with accuracy dropping sharply outside the training domain; (iv) lack of physics-constrained rate prediction frameworks enforcing Arrhenius consistency. Addressing these challenges is essential for producing physically reliable, broadly applicable rate predictions that can replace costly high-level calculations in combustion and atmospheric chemistry.
Here, we present a graph neural network (GNN) framework for predicting the full Arrhenius triplet (A, n, Ea) for hydrogen-abstraction reactions, trained on a new database of ∼1800 diverse reactions. TS geometries were optimized at the ωB97X-D/def2-TZVP level of theory, followed by single-point energy refinements using DLPNO-CCSD(T)-F12/cc-pVTZ-F12 to obtain high-accuracy barrier heights. The resulting dataset spans a broad range of radicals and substrates, including carbon-, oxygen-, nitrogen-, and sulfur-centered radicals and both saturated and unsaturated donor substrates.
While topological representations provide a strong foundation, standard 2D molecular graphs often fail to capture the subtle electronic and spatial effects governing TS structures and energetics. To bridge this gap, recent studies have successfully integrated quantum-mechanical (QM) descriptors such as on-the-fly predicted atomic charges, bond orders, and Fukui indices as explicit node or edge features in GNNs to improve barrier prediction and regioselectivity.53–55 Furthermore, alternative reaction representations, such as Condensed Graphs of Reaction (CGR) and unified reaction-level graphs have been shown to consistently outperform reactant-only baselines by explicitly encoding the mechanistic transformation into the graph structure.56,57 Concurrently, the rise of 3D-aware architectures, which incorporate explicit reactant/product coordinates or extract learned descriptors from generated TS geometries, has demonstrated the critical importance of spatial information for activation energy prediction.58,59 Building on these methodological foundations, the present approach provides a highly targeted, reaction-aware geometric representation that captures essential TS physics without the substantial computational overhead or noise sensitivity associated with full 3D equivariant message passing or on-the-fly coordinate generation.
The model is based on the Directed Message Passing Neural Network (DMPNN) architecture, incorporating a physics-informed Arrhenius output head that predicts (A, n, Ea) and deterministically computes ln
k(T) over a sampled temperature range during training, thereby enforcing internal curve-fit consistency. We also explore the Communicative MPNN (CMPNN)60 as an alternative encoder, enabling a direct comparison of different message-passing strategies. By combining molecular graph representations with a multi-target kinetic loss, our framework predicts the complete Arrhenius parameter set and captures continuous structure–reactivity relationships across diverse radical–substrate families.
For each reaction, ARC identified the lowest-energy conformers of all reactant and product species using a search algorithm that uses random RDKit61 MMFF94s62 conformers, and performed geometry optimizations and harmonic vibrational frequency calculations at the ωB97X-D/Def2-TZVP level of theory. All Density-Functional Theory (DFT) calculations were carried out using Gaussian 09.63 These vibrational frequencies were used to compute zero-point energy correction.
![]() | (1) |
Transition-state (TS) guesses were generated using two independent workflows, ARC heuristics and AutoTST.64 Conformational sampling was performed with CREST29,65–69 while constraining reactive-region distances. All TS candidates were optimized at the ωB97X-D/Def2-TZVP level and verified by a single imaginary frequency corresponding to hydrogen transfer. Details of the ARC hydrogen-abstraction heuristic are provided in Section S2.
For all optimized reactants, products, and transition states (TSs), high-level single-point energies were computed using the DLPNO-CCSD(T)-F12 method70 with the cc-pVTZ-F12 orbital basis set,71 as implemented in ORCA (version 5.4).72,73 The RIJCOSX74 approximation was used to accelerate the Hartree–Fock exchange and Coulomb integrals. For density fitting, the aug-cc-pVTZ/C auxiliary basis75 (for correlation) and the def2/J auxiliary basis76 (for Coulomb fitting) were employed. The cc-pVTZ-F12-CABS complementary auxiliary basis77 was used for the resolution of the identity in the F12 integrals and for the CABS singles correction to the Hartree–Fock energy.
Confirmed TS geometries, together with all optimized species, DFT vibrational frequencies, and DLPNO-CCSD(T)-F12 single-point energies, were subsequently processed using Arkane.78 Arkane combines the high-level electronic energies with DFT-derived zero-point and thermal corrections to compute thermochemical properties and transition-state-theory (TST) rate coefficients. Internal rotors were not treated explicitly; for practical high-throughput workflow robustness, all species were modeled within the rigid-rotor harmonic-oscillator (RRHO) approximation. Quantum tunneling effects were incorporated using the Eckart correction.79 External symmetry numbers were determined by Arkane.78
TST rate coefficients k(T) were computed and fitted to the modified Arrhenius expression to obtain the pre-exponential factor A, temperature exponent n, and activation energy Ea via nonlinear least-squares regression over the 300–3000 K temperature range. The resulting dataset, including optimized geometries, vibrational frequencies, high-level electronic energies, and fitted Arrhenius parameters, is publicly available on Zenodo80
Each hydrogen-abstraction reaction was represented as a two-component molecular graph: the hydrogen donor (R1H) and the acceptor (R2H), both taken here in the H-atom bearing form (see eqn (1)). Optimized 3D geometries from ARC were stored in Structure Data Files (SDF format), ensuring consistent atom and bond indexing. Baseline atom and bond features followed the Chemprop81 D-MPNN featurization scheme, implemented using RDKit.61 Our training and evaluation pipeline builds on the Chemprop codebase and extends it to multi-component reaction inputs and Arrhenius-parameter prediction. Atom features included atomic number, number of directly bonded neighbors, formal charge, chirality, hydrogen count, hybridization, aromaticity, and normalized atomic mass. Bond features comprised null-bond indicators, bond order, conjugation, ring membership, and stereochemistry. These features served as inputs to the MPNN.
Beyond the standard Chemprop atom features, we considered an atom-augmented representation that incorporated per-atom electronic descriptors derived from the quantum chemical computations. Specifically, we included the Mulliken atomic charge, the atomic polar tensor charge, and the Mulliken spin-density magnitude for open-shell species, which together provided a local, geometry-independent description of charge distribution and radical character.
In addition, atoms were annotated with binary role indicators encoding the known reaction center for hydrogen abstraction. The donor heavy atom, acceptor heavy atom, and transferring hydrogen were explicitly identified; the hydrogen was defined as Hdonor in the reactants and as Hacceptor in the products. First-neighbor atoms of the donor and acceptor centers were also labeled. These role flags were concatenated into the atom feature vector. By explicitly flagging the breaking and forming bonds, this geometry-free baseline topological representation fully specifies the reaction mapping. It functions similarly to the atom-mapped CGR approach used in reaction property prediction models.82 Consequently, performance gains observed in our geometry-enhanced configurations can be attributed to the inclusion of 3D spatial information rather than to the mere identification of the reaction center.
Seven feature configurations were evaluated, organised along two orthogonal design axes—RAD treatment (none/full-graph/path-restricted) and edge-level geometry (off/on)—together with a Baseline reference. (i) Baseline: standard DMPNN representation with atom and bond features but no continuous geometric information. (ii) Atom: extends the Baseline with per-atom physicochemical descriptors and role-anchored binary indicators. (iii) RA: adds full-graph reaction-anchored RAD geometric descriptors (radial, angular, dihedral) computed for every atom reachable from Href. (iv) Path: same as RA but with RAD geometric quantities defined only along simple covalent paths between Href and the focus atom. (v) Geom: omits RAD descriptors and instead provides edge-level continuous geometric features (bond distances, bond angles, dihedral angles) derived from the optimized reactant structures. (vi) RA + Geom: the union—full-graph RAD plus edge-level continuous geometry, with RAD values and existence indicators masked beyond four graph hops from Href. (vii) Path + Geom: path-restricted RAD plus edge-level continuous geometry, again with RAD masked beyond four graph hops. Table 1 summarises these configurations.
| Config. | Mode | Atom-augmented | Node-level RAD | RAD scope | Edge-level geom. |
|---|---|---|---|---|---|
| (i) | Baseline | — | — | — | — |
| (ii) | Atom | ✓ | — | — | — |
| (iii) | RA | ✓ | ✓ | Full graph | — |
| (iv) | Path | ✓ | ✓ | Path-restricted | — |
| (v) | Geom | ✓ | — | — | ✓ |
| (vi) | RA + Geom | ✓ | ✓ | Full graph (4-hop masked) | ✓ |
| (vii) | Path + Geom | ✓ | ✓ | Path-restricted | ✓ |
The RAD descriptors encode local abstraction geometry in a reaction-centered reference frame. The reactive hydrogen, Href, is identified via reaction mapping. For each atom v in the molecular graph, geometric quantities are computed relative to Href. Pivot1 is the heavy atom covalently bonded to Href, and Pivot2 is a heavy neighbor of Pivot1 selected by minimal graph distance to Href. For each focus atom v, we compute the radial distance
| r = ‖Href − v‖, |
| θ = ∠(Href, Pivot1, v), |
| τ = ∠(Href, Pivot1, Pivot2, v). |
Bond angles θ are retained in their raw angular form. Dihedral angles τ are represented using sine and cosine components to preserve periodicity and avoid discontinuities at ±π.
In the path-restricted RAD variant, geometric quantities are defined only along simple covalent paths between Href and v. Let P denote the simple path between these atoms. The radial distance r is retained only if |P| = 2, the bond angle θ only if |P| = 3, and the dihedral τ only if |P| = 4, using intermediate vertices in canonical order as geometric pivots. Quantities that do not satisfy these path-length constraints are set to zero and accompanied by binary existence indicators.
To assess the spatial extent of reactive geometry, we introduce a locality-masked RAD variant. Let dgraph(Href, v) denote the shortest-path bond distance between Href and atom v. For atoms satisfying
| dgraph(Href, v) > h, |
In the edge-geometric configurations, continuous geometric features are constructed for each directed bond i → j from the 3D reactant geometry (Fig. 1). The Euclidean bond length dij is expanded using a radial basis function (RBF) representation,
| ϕk(dij) = exp[−γ(dij − µk)2], |
Local angular structure is incorporated by aggregating bond angles θijk = ∠(i, j, k) around the terminal atom j using mean-pooled sine–cosine encodings over neighboring atoms k. Torsional structure about the central bond (i, j) is incorporated analogously by mean pooling sine-cosine encodings of dihedral angles τkijl = ∠(k, i, j, l) over valid (k, l) combinations. A detailed mathematical specification of the neighbor sets and pooling operations is provided in Section S4.
Two classes of global representations were considered. First, circular Morgan fingerprints (ECFP) were computed with radius r = 2 and length 2048 using RDKit, evaluated in both binary (bit) and count (occurrence) variants.83 Second, we used 200-dimensional RDKit2D descriptors generated via Descriptastorus.84 The RDKit2D features were normalized using empirical cumulative distribution functions to ensure consistent scaling across heterogeneous descriptors. This design enabled a direct evaluation of whether whole-molecule context provides complementary information beyond local reactive and geometric descriptors.
From the resulting donor and acceptor embeddings, denoted hd and ha, respectively, we constructed bidirectional composite vectors h
a = [hd; ha] and h
d = [ha; hd]. Because the hydrogen-abstraction reaction family is formally reversible yet chemically asymmetric with respect to radical localization and bond polarity, we evaluated both order-aware and order-invariant treatments of the donor–acceptor pairing. Both DMPNN and CMPNN encoders were evaluated under each order treatment.
![]() | ||
| Fig. 3 Order-invariant architecture. Directional concatenations are symmetrically averaged before Arrhenius prediction. | ||
Each head predicted the three modified Arrhenius parameters (A10, n, Ea), where A10 = log10
A. These outputs were passed through a differentiable Arrhenius layer to compute temperature-dependent rate coefficients ln
k(T) over a fixed temperature grid. Specifically,
![]() | (2) |
The model therefore predicts Arrhenius parameters rather than rate coefficients directly; temperature-dependent rates are obtained deterministically from the predicted parameters. This design constrains predictions to the modified Arrhenius functional form, embedding physical structure directly into the learning objective.
, Êa). These parameters are passed through a differentiable Arrhenius layer to obtain temperature-dependent rate coefficients. Because A10 = log10
A while rate coefficients are modeled in natural-log space, conversion of the prefactor introduces a factor of ln
10.Rates are evaluated on a fixed temperature grid from 300 to 3000 K in 100 K increments:
This formulation enforces cross-temperature consistency between the predicted Arrhenius parameters and the resulting rate trajectory, reducing compensating errors among (A10, n, Ea).
The loss combines parameter regression and trajectory fidelity terms. Because training is performed in ln
k space, additive errors correspond to multiplicative deviations in rate constants. Parameter errors are evaluated using the Huber86 (smooth L1) loss,
k corresponds to a fixed factor error in k, independent of magnitude. Since all reactions in the dataset are reversible, metrics are computed separately for the forward and reverse directions and their average is reported,Within each outer fold, model selection uses a pair-aware Kennard–Stone (KS) procedure.87 Group representatives are embedded as binary Morgan fingerprint vectors x, y ∈ {0,1}2048 (radius 2) and compared using the generalized Tanimoto similarity,88
| Dpair = min(DDD + DAA, DDA + DAD), |
The search space spanned both architectural and training parameters, including message-passing depth and width, feedforward head width, dropout rate, learning-rate schedule, order-handling mode, and optional global features.
Following hyperparameter selection, the top-ranked configurations were evaluated using a fixed 10-fold, donor–acceptor pair – grouped cross-validation procedure. For each fold, the model was trained on nine folds and evaluated on the held-out fold, and out-of-fold (OOF) predictions were aggregated so that each reaction was evaluated exactly once as unseen data. We report mean ± standard deviation across test folds. Identical optimization protocols were applied to all encoder and feature variants.
k(T) was standardized across T ∈ [300, 3000] K. For RAD features, radii were expanded using radial basis functions and dihedrals were encoded as (sin, cos). A single input scaler was fit on training data within each fold. Experiments used fixed random seeds and deterministic kernels, while split signatures and KS seeds were logged. Full hyperparameter search ranges, selected configurations, and training scripts are provided in the public repository (see Code availability).
(1) Reactant optimization. Optimize the hydrogen donor (R1H) and hydrogen acceptor (R2H) structures at the ωB97X-D/def2-TZVP level of theory, consistent with the training data.
(2) Electronic descriptor extraction. Extract per-atom Mulliken charges and spin densities from the DFT output. Utility scripts provided in the repository parse Gaussian or ORCA log files and generate a tabular (CSV) file containing atom-indexed electronic descriptors.
(3) Structure export. Export the optimized reactant geometries in Structure Data File (SDF) format, preserving atom indexing and connectivity.
(4) Model inference. Provide the donor and acceptor SDF files together with the corresponding electronic-descriptor CSV files to the pretrained model. During featurization, atomic descriptors are merged with graph-based and geometric features to construct the full node and edge representations. The model predicts the modified Arrhenius parameters (A10, n, Ea) and deterministically computes temperature-dependent rate coefficients ln
k(T).
Because the workflow requires only DFT geometry optimizations and descriptor extraction, the computational cost is substantially lower than a full transition-state-theory treatment involving TS optimization, vibrational frequency calculations, intrinsic reaction coordinate analysis, and high-level single-point refinements.
Donor-labeled sites are predominantly carbon-centered, with
as the most frequent atom type and
as the principal secondary class;
and
environments also contribute appreciably. Acceptor-labeled sites exhibit a broadly similar composition, likewise dominated by carbon with oxygen and nitrogen contributions, but include a modest
subset and a slightly longer low-frequency tail (e.g.,
,
, and halogens). Fig. 4 summarizes these empirical role-labeled distributions.
These site-level statistics indicate that within this dataset, hydrogen abstraction most frequently involves saturated (sp3) centers on both sides of the labeled reaction role, with a substantial minority of cases involving sp2 hybridized sites (Fig. 5A). Atoms labeled as acceptors are modestly more likely to be adjacent to π systems compared to donor-labeled sites (Fig. 5B). This difference reflects the statistical composition of the curated reactions rather than an intrinsic mechanistic asymmetry. Overall, the donor- and acceptor-labeled sites share largely similar hybridization profiles, with contextual differences confined primarily to π-adjacent environments.
vs.
), and (iii) core architectural and training hyperparameters (e.g., message-passing depth and width, aggregation type, dropout, and learning-rate schedule). To prevent leakage, the outer folds were grouped by unordered donor–acceptor InChIKey pairs so that both reaction orientations remained within the same fold. Candidate configurations were ranked by validation MAE(ln
k), and the top-ranked settings were re-evaluated using repeated group-preserving cross-validation. Full cross-validation results across feature sets and training objectives (including runs with and without the Arrhenius-layer consistency term) are reported in the Table S2.Table 2 summarizes model-selection results across feature and architecture variants. We denote node-level reactive annotations as RA (reaction-anchored features), edge-level continuous geometry as Geom, and their combination as RA + Geom. OA and OI refer to order-aware and order-invariant donor–acceptor handling, respectively, while M-bin and M-cnt indicate binary and count-based Morgan fingerprint global features. Across all geometry- and annotation-enhanced configurations, performance differences are relatively modest. The lowest cross-validated error is achieved by a DMPNN using combined node-level and edge-level geometry (RA + Geom), order-aware pairing, and Arrhenius-layer supervision. Several alternative augmented representations—most notably those using only edge-level geometry (Geom) or atom-augmented descriptors without explicit geometry—exhibit statistically indistinguishable performance within one standard deviation. Thus, while RA + Geom yields the numerically best result, multiple geometry- and annotation-enhanced configurations provide comparably reliable predictive accuracy.
k(T) (300–3000 K), ranked by MAElnk,avg.a
| Modeb | Globalc | Orderd | Sup.e | MAEf | MSE | R2 |
|---|---|---|---|---|---|---|
a Metrics are computed on physical ln k(T) and averaged over forward and reverse directions (MAElnk,avg).b Mode denotes the geometry and annotation channels provided to the encoder. Atom = atom-augmented baseline without explicit geometry; RA = role-anchored annotations with full-graph node-level RAD descriptors but without edge-level geometry; path = role-anchored annotations with path-restricted node-level RAD descriptors but without edge-level geometry; Geom = edge-level continuous geometric features (bond distances, angles, and dihedrals) without node-level reactive geometry (RAD); RA + Geom = full-graph node-level RAD combined with edge-level continuous geometry; RAD geometric values and existence flags are masked beyond 4 graph hops from the transferring hydrogen. Path + Geom is the analogous path-restricted variant.c Global features: M-bin/M-cnt = Morgan fingerprints (binary/count).d OA/OI denote order-aware and order-invariant donor–acceptor handling (see Fig. 2 and 3).e All models supervise (log10 A, n, Ea); “Sup.” indicates whether the additional Arrhenius-layer consistency loss on ln k(T) is applied during training.f Bold indicates the best-performing configuration within the DMPNN family.g Baseline rows are shown as a reference point and are not part of the MAE-ranked top six. The full Modes × MPNN result table including all evaluated configurations is provided as Table S2. |
||||||
| DMPNN | ||||||
| RA + Geom | — | OA | On | 0.952 ± 0.072 | 2.856 ± 0.682 | 0.952 ± 0.011 |
| Geom | — | OA | On | 0.979 ± 0.078 | 2.956 ± 0.765 | 0.950 ± 0.012 |
| Atom | — | OA | On | 0.999 ± 0.135 | 3.039 ± 1.004 | 0.948 ± 0.017 |
| RA + Geom | M-bin | OI | On | 1.005 ± 0.092 | 3.305 ± 0.822 | 0.944 ± 0.014 |
| RA + Geom | M-bin | OA | Off | 1.014 ± 0.113 | 3.294 ± 1.039 | 0.944 ± 0.017 |
| RA | — | OA | On | 1.015 ± 0.100 | 3.206 ± 0.817 | 0.945 ± 0.014 |
| Baselineg | — | OI | On | 1.364 ± 0.138 | 5.583 ± 1.465 | 0.909 ± 0.023 |
![]() |
||||||
| CMPNN | ||||||
| RA + Geom | M-cnt | OI | On | 0.954 ± 0.028 | 2.784 ± 0.121 | 0.931 ± 0.003 |
| Path | M-cnt | OI | On | 0.957 ± 0.017 | 2.969 ± 0.118 | 0.926 ± 0.003 |
| Path + Geom | — | OI | On | 0.974 ± 0.031 | 2.954 ± 0.157 | 0.927 ± 0.004 |
| RA + Geom | — | OI | On | 0.975 ± 0.033 | 2.821 ± 0.158 | 0.930 ± 0.004 |
| RA + Geom | M-bin | OI | On | 0.986 ± 0.029 | 2.881 ± 0.144 | 0.928 ± 0.004 |
| RA | M-bin | OI | On | 0.991 ± 0.015 | 2.938 ± 0.160 | 0.927 ± 0.004 |
| Baselineg | M-cnt | OI | On | 1.121 ± 0.013 | 3.524 ± 0.104 | 0.912 ± 0.003 |
CMPNN achieves similar cross-validated error across feature modes. However, we adopt DMPNN as the primary architecture due to its bond-localized inductive bias, established benchmarking history, and consistent stability across data regimes. For hydrogen abstraction, an edge-focused message-passing scheme provides a chemically natural bias, as the reaction is governed by localized bond cleavage and radical-centered rearrangements. We therefore select the DMPNN configuration with RA + Geom features, order-aware handling, and Arrhenius-layer supervision (without global molecular descriptors) as the reference model for all subsequent analyses.
In contrast, geometry-free baseline models perform substantially worse, with MAElnk values in the range 1.36–1.49, irrespective of donor–acceptor order handling or inclusion of global molecular fingerprints (see Table S2 in the SI). The systematic gap between baseline and augmented models demonstrates that enriching graph representations with reaction-aware geometric information yields a substantial and reproducible improvement in kinetic accuracy. While no single geometric encoding is uniquely decisive, the inclusion of local chemical context beyond topology alone is essential for accurate prediction of hydrogen-abstraction rates.
A, n, Ea] separately for the forward and reverse directions, which are then de-standardized to their physical units before analysis. During training we also pass the predicted parameters through an Arrhenius layer to compute ln
k(T) over 300–3000 K as an auxiliary supervision signal.We first summarize predictive accuracy in ln
k(T), where k is expressed in units of cm3 mol−1 s−1. Fig. 6 presents a parity plot of predicted versus observed mean ln
k(T), averaged over the temperature range 300–3000 K and both reaction directions, using the aggregated out-of-fold (OOF) predictions. Predictions are tightly clustered along the y = x line across the full dynamic range, demonstrating accurate reproduction of absolute rate magnitudes with minimal systematic bias. The high density of points near parity indicates consistent performance across chemically diverse reactions rather than reliance on a narrow subset of the rate spectrum.
The shaded envelope represents the binned ±MAE in ln
k, computed per reaction and averaged within bins of true ln
k. This band remains narrow over most of the populated kinetic range, indicating stable predictive accuracy across multiple orders of magnitude in rate. A slight broadening is observed in the intermediate ln
k regime, suggesting modest heteroscedasticity associated with chemically heterogeneous reactions in this region rather than a global degradation in model fidelity. Overall, the parity plot confirms that the selected model provides well-calibrated rate predictions suitable for downstream mechanistic and comparative analyses.
Analyzing prediction errors versus the local reaction environment (Table 3) shows that abstractions from sp3-hybridized donor atoms are predicted relatively accurately (MAE below 0.9 in ln
k), while errors increase for sp2 donors.
k stratified by donor hybridization (RDKit assignment). Shown are the number of reactions (n), the mean absolute error (MAE), median absolute error, and 95% confidence interval (CI) of the mean
| Donor hybridization | n | MAE (ln k) |
Median (ln k) |
95% CI |
|---|---|---|---|---|
| sp3 | 1227 | 0.880 | 0.633 | ±0.052 |
| sp2 | 380 | 1.137 | 0.780 | ±0.116 |
The largest errors are associated with reactions involving heteroatom-centered radicals and strongly polarized donor–acceptor pairs, with MAEs in the range 1.5–1.9 in ln
k. In contrast, hydrocarbon-dominated abstractions are predicted relatively accurately. Alkyl–alkyl reactions, constituting the largest single dataset class, achieve an MAE of 0.75 in ln
k (n = 617). These chemically homogeneous reaction classes dominate the central portion of the rate spectrum and contribute to the tight clustering observed near parity (Fig. 6).
On the aggregated OOF predictions from the 10-fold cross-validation, the model attains an overall error of MAE = 1.082 in ln
k, corresponding to 0.470 dex (decimal exponent) or a multiplicative uncertainty factor of approximately 2.95. Performance is well balanced between reaction directions, with MAEfwd = 1.048 and MAErev = 1.117 in ln
k, indicating no systematic directional bias. Table 4 reports mean absolute error (MAE), mean squared error (MSE), coefficient of determination (R2), and bias (mean residual, predicted–true; positive indicates over-prediction) by reaction direction.
| Direction | Parameter | MAE | Bias | R2 |
|---|---|---|---|---|
| Forward | log10 A |
0.902 | −0.0130 | 0.789 |
| n | 0.268 | −0.0012 | 0.662 | |
| Ea | 4.736 | −0.9786 | 0.856 | |
| Reverse | log10 A |
0.885 | −0.0033 | 0.760 |
| n | 0.264 | −0.0029 | 0.657 | |
| Ea | 5.393 | −0.6005 | 0.909 |
For the forward direction, biases are small across all parameters, indicating negligible systematic over- or under-prediction. For the reverse direction, accuracy remains comparable for the prefactor and temperature exponent. Reverse-direction activation energies exhibit a modest increase in MAE (5.39 kJ mol−1 versus 4.74 kJ mol−1 forward) while showing a slightly higher coefficient of determination (R2 = 0.909 versus 0.856). Taken together, the comparable MAE, bias, and R2 values across directions indicate balanced predictive performance without systematic directional asymmetry.
Parity plots (Fig. 7) show tight clustering around the y = x line for log10
A and n in both directions. For Ea, both forward and reverse predictions align closely with the diagonal across the range of barriers, and no systematic directional deviation is apparent by visual inspection.
Biases are small in magnitude for all parameters and both directions, indicating limited systematic error. For the forward direction, biases are −0.013 for log10
A, −0.001 for n, and −0.98 kJ mol−1 for Ea; for the reverse direction they are −0.003, −0.003, and −0.60 kJ mol−1, respectively. Interpreted multiplicatively, the prefactor biases correspond to average scaling factors of 10−0.013 ≈ 0.97 (forward) and 10−0.003 ≈ 0.99 (reverse).
Residual distributions (Fig. 8) are centered near zero with approximately symmetric tails for all parameters. The broadest residual spread is observed for Ea, consistent with its wider dynamic range and greater sensitivity to small structural variations. Overall, these results indicate near-linear calibration for log10
A and n in both directions, and well-ranked but moderately noisier predictions for activation energies. Despite this asymmetry, systematic biases remain small, and the resulting factor-of-few uncertainties are appropriate for downstream kinetic modeling when propagated explicitly. A quantitative mapping of parameter MAEs to multiplicative rate uncertainties is provided in Sec. S7.
k(T) between the predicted and reference Arrhenius curves over T ∈ [300, 3000] K. For reversible systems, forward and reverse directions are averaged to yield a single, balanced measure of model performance. Across the test set, the distribution of per-reaction MAE(log10k) values over T ∈ [300, 3000] K (28-point grid) is tightly concentrated (Table 5).
k) over T ∈ [300, 3000] K (28-point grid)
| Median (dex) | Q1 | Q3 | Robust σ | % within 1σ | % within 2σ |
|---|---|---|---|---|---|
| 0.285 | 0.182 | 0.485 | 0.194 | 69% | 86% |
The median error is 0.285 dex, with an inter-quartile range of 0.182–0.485 dex. Using a robust normal-equivalent scale estimate (σ ≈ 1.48 MAD, where MAD denotes the median absolute deviation), we obtain σ ≈ 0.194 dex; 69% of reactions lie within ± 1σ of the median and 86% within ± 2σ. In multiplicative terms, a median error of 0.285 dex corresponds to a factor of 100.285 ≈ 1.9 in k(T) across the full temperature range.
In the best-fit example, Fig. 9, the predicted and reference Arrhenius curves are nearly indistinguishable across 300–3000 K. Both the magnitude and curvature of k(T) are reproduced, confirming that the learned (A, n, Ea) parameters capture the physical behavior with quantitative accuracy. The RMG estimates provide a reasonable physical baseline, though they deviate more notably than the ML predictions in this instance. This is expected, as the specific chemical environment in this reaction may not be explicitly represented in the current RMG rate-tree hierarchy, whereas the ML model generalizes across these structural nuances during training.
The median case, Fig. 10, illustrates the typical predictive behavior of the model. Across the full temperature range, deviations remain well below one order of magnitude, and the predicted Arrhenius curves closely track the reference slope in both directions. This indicates that the model reliably captures the underlying activation energy trend while allowing for modest shifts in the overall rate prefactor. By comparison, the template-based RMG(TRR) estimates show larger systematic offsets and slight slope mismatches, highlighting the improved quantitative fidelity of the ML approach for representative reactions.
Even in the worst-performing example (Fig. 11), the model reproduces the correct qualitative Arrhenius behavior: in both directions, the predicted rates increase smoothly with increasing temperature, consistent with physically meaningful activation barriers. However, the predicted curves are systematically offset from the reference by several orders of magnitude, resulting in extremely large MAE(log10
k) values. The template-based RMG predictions exhibit even larger deviations in the forward direction, including noticeable slope mismatches, underscoring the limitations of rule-based extrapolation in this challenging regime.
Fig. 12 illustrates a regime in which the ML model substantially outperforms RMG's template-based rate rule (TRR) estimate. The predicted Arrhenius curves closely track the reference in both magnitude and temperature dependence across the full temperature range. In contrast, the RMG(TRR) estimate deviates by several orders of magnitude and exhibits a slightly shallower temperature dependence. Such behavior typically arises when the local chemical environment lies outside the effective coverage of existing RMG training data, forcing the hierarchy to interpolate from sparsely populated or weakly matched rules. In these cases, the learned model benefits from continuous feature representations and is able to recover accurate kinetics where discrete rule-based extrapolation fails.
Fig. 13 illustrates a case in which the RMG(TRR) estimate achieves lower error than the ML model. In the forward direction, the ML prediction exhibits a large systematic offset from the reference and a noticeably weaker temperature dependence, leading to substantial error across the full temperature range. By contrast, the RMG prediction more closely follows both the magnitude and slope of the reference Arrhenius curve.
A similar pattern is observed in the reverse direction, where the ML model again deviates by several orders of magnitude and underestimates the temperature sensitivity. The RMG(TRR) estimate better captures the overall Arrhenius behavior in this regime, demonstrating that rule-based kinetics can remain competitive for specific reaction environments that are well represented within the existing template hierarchy.
Taken together, these representative reactions illustrate that the ML model typically achieves lower error than template-based estimates while maintaining smooth, physically consistent Arrhenius behavior across a wide temperature range. In the majority of cases, discrepancies between predicted and reference rates appear primarily as approximately uniform shifts in magnitude rather than distortions of the temperature dependence. This indicates that the model has learned continuous structure–kinetics relationships in the (A, n, Ea) parameter space.
A small number of reactions exhibit lower error for RMG than for the ML model. These cases are often associated with specific chemical environments that are well represented within existing RMG template hierarchies but sparsely sampled in the training data used here. Such failures therefore appear to arise from limited coverage of particular regions of chemical space, rather than from an intrinsic inability of the model to represent the underlying kinetics. This suggests that targeted data expansion or focused active-learning strategies in these regimes could further improve model robustness.
Fig. 14 shows a reliability analysis of log10
k(T) obtained by pooling forward and reverse predictions over a 300–3000 K temperature grid and partitioning the predicted values into 20 equal-frequency bins. Bin-wise means of predicted and observed log10
k(T) are nearly collinear, with fitted relation ȳobs = −0.09 + 1.008ȳpred, indicating minimal global bias. The slope remains very close to unity and the intercept is small relative to the dynamic range of the data, demonstrating that the model preserves rate magnitudes without systematic compression or expansion across several orders of magnitude. The absence of structured deviation at low or high rate extremes indicates that residual errors are primarily reaction-specific rather than attributable to global miscalibration.
To examine how predictive accuracy varies across local chemical environments, we performed a site-resolved error analysis stratified by donor and acceptor RMG atom types and their donor–acceptor pairings; full results and figures are reported in Sec. S11.
k(T) relative to the NIST reference.For visualization, we report a representative (median) matched reaction in each direction, selected as the reaction whose MAE(log10
k) relative to the corresponding NIST fit is closest to the median across the matched subset, computed over the overlapping NIST temperature window. When multiple NIST entries were available for a reaction, we used the most recent recommended/evaluated fit with maximal overlap in the comparison range.
In the forward example (Fig. 15), Arkane exhibits a systematic magnitude offset from the critically evaluated Arrhenius fit,95 yielding MAE(log10
k) = 0.794 dex, whereas the model deviation is 0.201 dex. This closer agreement arises from the model drifting relative to its Arkane training target rather than from explicit experimental calibration. Because training is performed exclusively on Arkane-derived kinetics, improved alignment with experiment in individual cases should be interpreted as incidental.
![]() | ||
| Fig. 15 Forward-direction comparison with NIST kinetics. Arkane (blue, solid), model (orange, dashed), and NIST Arrhenius fit (green) over 300–3000 K. | ||
The reverse example (Fig. 16), derived from the same evaluated source,95 shows the complementary situation: Arkane lies slightly closer to experiment (0.753 dex) than the model (0.851 dex). In both directions, discrepancies relative to experiment are dominated by magnitude shifts rather than curvature distortions.
![]() | ||
| Fig. 16 Reverse-direction comparison with NIST kinetics. Arkane (blue, solid), model (orange, dashed), and NIST Arrhenius fit (green) over 300–3000 K. | ||
Taken together, these representative reactions demonstrate that the model neither systematically improves nor degrades agreement with experiment relative to Arkane. Its objective is fidelity to the quantum-chemical reference, not correction of underlying electronic-structure errors.
To quantify trends across the full NIST-matched subset, we computed the temperature-averaged absolute deviation
|Δ| = 〈|log10 k(T)pred − log10 k(T)NIST|〉T∈[300,3000]K. |
For the forward direction (Fig. 17), approximately 70% of reactions lie within one order of magnitude of NIST for both Arkane and model predictions. The ECDFs are nearly overlapping, indicating no systematic dominance and only reaction-specific differences.
![]() | ||
Fig. 17 Forward-direction ECDF of deviation from NIST. Absolute log10 k error (dex) vs. cumulative fraction of reactions. Model (blue) and Arkane (orange) show comparable agreement. | ||
The reverse-direction ECDF (Fig. 18) shows similar behavior: roughly 70–75% of reactions fall within 1 dex of experiment, and the model and Arkane curves remain closely aligned across the full deviation range. Overall, despite the limited size of the NIST-matched subset, the model preserves the experimental kinetic scale encoded in the Arkane reference dataset without introducing systematic bias. Agreement with NIST is governed primarily by the underlying quantum-chemical reference quality rather than by the model approximation.
k (0.470 dex; ∼3×), with balanced forward/reverse performance, minimal global bias, near-linear scaling across several orders of magnitude in rate, and a tightly concentrated per-reaction Arrhenius-curve error distribution (median 0.285 dex; ∼1.9× over 300–3000 K). Geometry-free baselines already achieve strong predictive accuracy, indicating that topological and role-anchored information captures much of the kinetic signal. However, incorporating localized three-dimensional geometric context yields consistent and statistically robust improvements in error and variance across folds, demonstrating that explicit geometry provides complementary information beyond topology alone.
Errors are structured by chemical environment: sp3 donor abstractions and hydrocarbon-dominated classes are predicted most reliably, whereas heteroatom-centered radicals and strongly polarized donor–acceptor pairs contribute the largest deviations, primarily as near-uniform magnitude shifts rather than distortions in temperature dependence. Relative to template-based RMG rate rules, the learned model typically achieves lower error while preserving smooth, physically consistent Arrhenius behavior. Comparison to NIST-matched reactions indicates that the model preserves the kinetic scale encoded in the Arkane quantum-chemical reference without introducing systematic bias, implying that residual disagreement with experiment is dominated by reference-level uncertainties rather than model artifacts. In addition, predicted forward/reverse equilibrium constants agree with the textbook DFT-derived reference to within half an order of magnitude despite no explicit detailed-balance constraint during training, indicating that thermodynamic consistency is largely recovered from the kinetic targets alone rather than being enforced by construction.
The present analysis also identifies clear routes for further improvement. Explicit one-dimensional hindered-rotor and large-amplitude-inversion treatment of the training labels, beyond the current rigid-rotor/harmonic-oscillator approximation, would improve fidelity for chemistry classes such as peroxide-mediated abstractions, multi-rotor ethers, and benzylic-tertiary radicals, where rotor corrections can exceed an order of magnitude in k(T). Augmenting the loss with a detailed-balance-consistent term that draws on the Arkane-derived Gibbs free energies ΔGrxn(T), released alongside the dataset as NASA polynomials, would promote the implicit forward/reverse equilibrium-constant agreement observed here into an explicit training constraint. Transfer learning on a curated subset of experimental NIST rate coefficients would further recalibrate the present DLPNO-CCSD(T)-F12 anchored predictions to measured reality, complementing rather than replacing the systematically generated quantum-chemical training set.
Overall, our results demonstrate that this framework can significantly reduce quantum-chemical computation effort for reaction-mechanism development. The trained graph neural network provides the factor-of-few accuracy required for large-scale mechanism generation without the iterative bottleneck of manual transition-state searching. By identifying the specific chemical environments that bound current accuracy, this framework establishes a clear and actionable path toward systematically improving data-driven kinetic prediction through targeted expansion of underrepresented reaction regimes.
| This journal is © The Royal Society of Chemistry 2026 |