Open Access Article
Leen Fahoum
a and
Alon Grinberg Dana
*ab
aWolfson Department of Chemical Engineering, Technion – Israel Institute of Technology, Haifa 3200003, Israel. E-mail: alon@technion.ac.il
bGrand Technion Energy Program (GTEP), Technion – Israel Institute of Technology, Haifa 3200003, Israel
First published on 26th February 2026
Transition-state (TS) identification for bimolecular liquid-phase reactions is notoriously sensitive to the initial spatial arrangement of reactants, making automated searches difficult, especially in solvation where conformational effects dominate barrier heights. We address this gap with a fully automated, heuristic framework integrated into the Automated Reaction Calculator (ARC) software tool that generates TS guesses for neutral hydrolysis, as a model reaction type, by positioning water using atom-centered internal-coordinate rules derived from representative DFT-optimized cases. The approach is parameterized for three hydrolysis families: carbonyl-based (esters, amides, and acyl halides), ethers, and nitriles, and operates from reactant/product SMILES alone. Validation across 91 diverse reactions shows that chemically guided internal-coordinate placement yields relatively high success rates under SMD-water conditions: 96.9% for carbonyl-based substrates, 86.2% for nitriles, and 72.4% for ethers, consistent with the greater conformational variability and weaker intrinsic directionality of ether substrates. An ablation study highlights that small, targeted reactant-dihedral adjustments and ±φ sign-sampling are essential to robustly align the water nucleophile, while the electronegativity-based neighbor ranking primarily fine-tunes local orientation. By automating the classically manual step of water placement and orientation, and producing chemically faithful geometry initializations, this framework enables scalable, high-throughput TS searches for neutral hydrolysis reactions. It provides a practical foundation for mechanistic studies and kinetic modeling in condensed-phase organic chemistry. The methodology is readily extensible to additional solvents and catalytic regimes, and to other bimolecular liquid-phase reactions where directed fragment placement is the key bottleneck.
Hydrolysis can proceed under neutral, acidic, or basic conditions, each with distinct mechanistic characteristics. Acidic hydrolysis accelerates bond cleavage by protonating key functional groups, while base-catalyzed hydrolysis improves the nucleophilic attack to facilitate bond scission. Hydrolysis under neutral conditions is generally slower and often requires elevated temperatures.9,10 For this reason, acid- and base-catalyzed hydrolysis has been studied more extensively. However, neutral hydrolysis remains highly relevant in systems where extreme pH conditions are undesirable or impractical, such as buffered aqueous media, prebiotic chemistry, specific drug formulations, and environmentally mild synthetic protocols. The study of hydrolysis under neutral conditions provides insight into intrinsic reaction pathways free of external catalytic effects, which is essential for understanding the baseline reaction behavior and capturing transient intermediates that may be obscured under catalyzed conditions.
Recent research demonstrates that hydrolysis significantly affects drug stability even under neutral conditions. For instance, hydrolysis of peptide bonds in antibody-based drugs has been observed during long-term storage at neutral pH, leading to structural degradation and potential immunogenicity.11 In particular, IgG1 antibodies incubated in neutral aqueous media undergo hydrolysis in the hinge region without enzymatic assistance,11 highlighting that even relatively slow, uncatalyzed neutral hydrolysis reactions can significantly reduce therapeutic effectiveness over time.
More broadly, hydrolysis represents a major degradation pathway for pharmaceuticals, affecting the stability and shelf-life of drug products, especially if the active pharmaceutical ingredient (API) contains an ester or an amide group. Drug formulation strategies must consider hydrolytic stability, employing moisture-resistant packaging and appropriate excipients to minimize degradation.12 Furthermore, hydrolysis reactions play a key role in prodrug activation, where ester hydrolysis is commonly used to release the API at the target site, while the formation of amide bonds is a fundamental step in the construction of peptides and small-molecule drugs.13
Understanding hydrolysis in the context of synthesis and degradation is important for modern drug design and synthesis, from the design of stable pharmaceuticals and drug performance optimization to prediction of degradation pathways.7,14 To address these challenges, predictive chemical kinetic models are essential, as they facilitate our understanding of reaction mechanisms and enable design and optimization of reactive systems. Such models can be instrumental in predicting drug shelf life7 and validating proposed synthetic pathways.
Refining these quantitative models by computing first-principles rate coefficients requires identification of proper reaction transition state (TS) configurations in 3D space. While conformer search of stable reactants is comparatively routine, even in solution, locating TSs remains the bottleneck for barrier-height estimation, e.g., via quantum chemical computations. Neutral hydrolysis reactions present unique computational challenges due to the significant impact of molecular conformations on reaction energetics. Conformational changes in TSs can lead to substantial activation energy variations, with rate coefficient differences of up to 350×.
Traditional string-based methods for TS searches15–18 are computationally intensive and require significant manual intervention in the initial steps of mapping the reactant and product atoms in 3D and orienting the reacting fragments. Several automated methods have been developed to address these challenges. KinBot,19 for example, identifies unimolecular gas-phase reaction families and generates TS guesses (TSGs) based on family-specific heuristics, iteratively altering its geometry toward TSGs for reaction classes such as intramolecular hydrogen migration, Diels–Alder, and cycloadditions. AutoTST20 employs decision trees to suggest TSGs for specific gas-phase reactions by analyzing reactive centers from predefined chemical rules. Graph Convolutional Networks (GCN)21 apply machine-learning to propose TSGs geometry for isomerization reactions.
Beyond targeted search algorithms, the field of automated chemical discovery has expanded to include global exploration methods that identify novel reaction pathways and products without prior mechanistic knowledge. Approaches such as the ab initio nanoreactor22 and metadynamics-based simulations23,24 allow for the blind discovery of molecules and mechanisms. Some methods utilize atomic connectivity rules,25 stochastic surface walking,26 basin confinement with temperature-accelerated dynamics,27 or kinetics-guided exploration28 to build extensive reaction networks and predict reactive events, while others use artificial force-induced reactions (AFIR)29,30 to map potential energy surfaces. Complementary to these discovery-first methods are emerging path-search tools that leverage machine learning (ML), such as iteratively trained neural network potentials,31 learned analytical Hessians,32 and geometric flow matching,33 to predict TS geometries with reduced computational cost.
The challenge the present work focuses on when searching for a TSG is correctly positioning and orienting two distinct molecular fragments (e.g., the substrate and water) in a complex energetic landscape, a step that is often trivial or non-existent in unimolecular cases. Established methods such as QST2/QST3 (Gaussian),15 GSM,16 autodE,34 and AFIR29,30 can locate bimolecular TSs but require pre-aligned reactant/product geometries, 3D mapped reactant and product atoms (currently done manually), or parameter tuning. The current methods therefore cannot chemically guide a water molecule placement (Table 1). Despite these advances, to our knowledge no tool automatically generates TS structures for condensed-phase neutral hydrolysis, including chemically informed placement and orientation of the water nucleophile, let alone more complex bimolecular degradation pathways. As a result, computational studies of hydrolysis involving esters or other functional groups35,36 still depend heavily on manual TS searches for refining predictive chemical kinetic models, often requiring manual expert-guided orientation of the water molecule(s) near the reactive site. This manual trial-and-error workflow limits the scalability of chemical kinetic modeling for liquid-phase synthesis and degradation.
| Framework | Condensed-phase/bimolecular | Water placement | User preparation | Main limitation |
|---|---|---|---|---|
| QST2/QST3 (Gaussian) | Yes | Manual alignment of reactants and products | 3D mapped reactant and product geometries required | Needs pre-aligned endpoints |
| GSM (Growing String Method) | Yes | No defined placement of nucleophile | 3D mapped reactant and product geometries required | Computationally intensive |
| autodE | Yes (PCM/SMD) | Random orientation sampling | Reactant SMILES input; conformers may need review | Limited control of water direction; performs only a 2D atom-mapping |
| AFIR | Yes | Bias-force placement approach | User defines reactive fragments and bias parameters | May generate non-specific paths |
| Present work | Yes (SMD) | Chemically guided internal-coordinate placement | Fully automatic (reactants + products SMILES only) | Currently implemented only for neutral hydrolysis |
The present work fills this gap by providing a targeted, high-efficiency alternative for established reaction families, specifically addressing the chemically guided orientation of bimolecular fragments in solution, a challenge that remains a bottleneck for scalable kinetic modeling. Here we introduce a heuristic internal-coordinate algorithm, implemented in the Automated Reaction Calculator (ARC) software tool,37 that automatically positions the water nucleophile for a single-molecule hydrolysis event to generate TSGs for condensed-phase bimolecular reactions using an implicit solvation correction. Our approach, to our knowledge, is the first to automate this process for such systems, using a labeled-graph representation of the reactants to assemble reacting complexes by adding atoms in 3D via internal coordinates to yield high-quality initial TSGs suitable for subsequent refinement, e.g., with implicit solvent models. We implement and validate this methodology for neutral hydrolysis, demonstrating its success in orienting water near a variety of reactive sites, which are referred to as classes in this work, including esters, amides, acyl halides, ethers, and nitriles (Fig. 1). By populating key internal degrees of freedom, the algorithm yields high-quality initializations for TS searches. This establishes a robust and chemically intuitive framework that can be extended to other critical reactions, such as polymer degradation in fuel cells, where general-purpose approaches are insufficient.
![]() | ||
| Fig. 1 Example molecules representing the five hydrolysis classes (ester, amide, acyl halide, ether, and nitrile), each belonging to one of the reaction families defined in this work. | ||
![]() | ||
| Fig. 2 An integrated workflow scheme for automated TS generation and validation. SMILES: simplified molecular-input line-entry system;38 TSG: transition state guess; DFT: density functional theory; IRC: intrinsic reaction coordinate; MEP: minimum energy path. | ||
The workflow begins from hydrolysis reaction SMILES (reactants and products). The reaction family is identified, and the reactive atoms (a, b, d, e) are assigned using subgraph isomorphism. An initial reactive geometry is then prepared by applying a family-dependent elongation of the bond undergoing hydrolysis. To capture local conformational variability, multiple reactant geometry variants are generated by systematic dihedral perturbations.
For each reactant geometry variant, a reactive water molecule is introduced using sequential internal-coordinate placement. The oxygen and hydrogen atoms (o, h1, h2) are positioned using family-specific distance, bond angle, and dihedral angle definitions designed to pre-organize the nucleophilic attack geometry. The resulting TSGs are filtered to remove nonphysical structures based on atomic collision checks and connectivity validation. If no valid TSG is obtained, a single deterministic fallback step involving a 180° inversion of the reaction-center dihedral angle is applied before repeating the generation cycle.
Once at least one valid TSG is identified, the workflow proceeds to QM optimization and validation. TS optimization is performed at the Density Functional Theory (DFT) level, followed by vibrational frequency analysis to confirm the presence of a single imaginary mode. Intrinsic reaction coordinate (IRC) calculations and connectivity checks are then used to verify that the TS connects the intended reactants and products. Successful completion of these steps yields a validated TS structure.
The carbonyl-based hydrolysis family (esters, amides, and acyl halides) proceeds via nucleophilic attack at the electrophilic carbonyl carbon (Fig. 3). The ether hydrolysis family (Fig. 4A)39 serves as a high-barrier model reaction that proceeds negligibly in the absence of acid, base, or catalytic surfaces, due to the strength of the carbon–oxygen bond and the poor nucleophilicity of water.40,41 The nitrile hydrolysis family follows a three-step mechanism; our work focuses on the rate-determining first step (step a in Fig. 4B), forming an imidic acid intermediate.42–45 Although nitrile hydrolysis under neutral conditions is slow, experimental studies at elevated temperatures have confirmed that both aliphatic and aromatic nitriles can hydrolyze without acid or base catalysis, following the same stepwise sequence.44,45
For multi-functional molecules (Fig. 5), the algorithm identifies all potential reactive sites. By default, it resolves site ambiguity by matching the user-supplied products through subgraph isomorphism, ensuring the water molecule is correctly oriented toward the specific bond undergoing cleavage. This mapping procedure resolves potential ambiguities when multiple reactive sites are present, which could lead to distinct chemical products, by comparing the 2D connectivity of the user-specified products with that of all possible products of the bimolecular reaction. This graph-isomorphism-based method results in the identification of the unique bond undergoing cleavage. For instance, in a molecule containing both an ester and an amide group, providing the products of ester cleavage (a carboxylic acid and an alcohol) enables the isomorphism check to identify the unique bond undergoing hydrolysis as the C–O bond in the ester, ensuring that the water molecule is oriented specifically for the ester site rather than the competing amide group. Another possible operational mode is in the context of automated kinetic model generation tools,46,47 where all possible hydrolysis sites can be explored simultaneously if the user does not specify specific desired products.
![]() | ||
| Fig. 5 An example of a molecule containing multiple hydrolysis classes. Note that the ester group is also connected via an ether group. | ||
Atom a represents the site where the nucleophilic attack occurs. For instance, in carbonyl-based and ether hydrolysis, atom a is typically the carbon in the carbonyl group or in the C–O bond, respectively. Atom b is the atom connected to atom a by the bond that is cleaved during the hydrolysis reaction, e.g., in ether hydrolysis it is the oxygen attached to the carbon that undergoes the nucleophilic attack. Atom e is the most electronegative neighbor of atom a, excluding atom b. Atom d is the second most electronegative neighbor of atom a, if the former exists. We avoid using the label c to prevent confusion with the carbon element. Atom o is the oxygen atom in the attacking water molecule. Atoms h1 and h2 are the reactive and non-reactive hydrogen atoms in the water molecule, respectively (Fig. 6).
The identification of the key atoms in a given molecule follows a systematic procedure. First, the algorithm determines the relevant reaction family, as discussed above. Each supported hydrolysis family is implemented in ARC37 using a graph representation of the generic reacting group for each family along with a “recipe” for breaking and forming bonds that convert reactants into products. This reaction family implementation closely follows the convention implemented in the Reaction Mechanism Generator (RMG) software,46,47 yet lacks kinetic data since it is only used to identify and label reacting atoms rather than to estimate rate coefficients.
The algorithm assigns labels to the four reacting atoms (a, b, o, and h1) using subgroup isomorphism checks. To resolve e and d labeling ambiguity in complex scenarios, the algorithm ranks atoms attached to a based on their effective electronegativity. This concept is introduced to prioritize atoms involved in multiple bonds (e.g., a carbonyl oxygen). The algorithm ranks atoms directly attached to atom a (excluding atom b), as defined in eqn (1). Here, χeffi is the effective electronegativity of neighbor atom i, χi is the intrinsic electronegativity value of atom i according to the Pauling scale,48 and BOa,i is the bond order between atoms a and i. In cases where two neighboring atoms have identical χeffi values, the tie is resolved by comparing the sum of the χeffi of the next-level neighboring atoms, with appropriately adjusted atom indices in eqn (1). The neighbor with the highest χeffi is designated as atom e, establishing the primary orientation around atom a, while all additional neighboring atoms are collected in a sorted list (by decreasing χeffi values) that represents potential atoms for the d label.
| χeffi = χi·BOa,i | (1) |
The highest-ranked atom in this list is labeled by default as atom d, while the others are retained as potential alternatives for generating a more diverse set of TSGs if initial attempts fail. This approach allows flexibility in later stages of TSG generation by providing multiple options to define geometrical parameters. In nitrile hydrolysis, atom a often only has two neighbors, namely b and e. As a result, no d atom is defined in the TSG generation for this reaction family.
The atom labeling provides essential and uniform reference points for positioning and orienting the attacking water molecule in 3D. It facilitates positioning atoms o, h1, and h2 relative to atoms a, b, d, and e. Different examples for neighboring atoms labeling are provided in Fig. S1.
In the present framework, ARC was enhanced with the ability to append a new atom X to an existing Cartesian structure {ri}i=1N, using the Z-matrix module. The new atom is placed to satisfy three internal constraints relative to specified reference atoms: a bond length ρ = ‖rN − rX‖ to atom N, a bond angle between three atoms,
, and a dihedral angle involving a four-atom torsion,
(Fig. 7). The three reference tuples of atom indices in the given Cartesian coordinates, (N), (M, N), and (L, M, N), may be chosen independently. This flexibility allows each internal DOF to be anchored to the most chemically meaningful atoms, even if they are not sequentially bonded, providing more intuitive control over the placement than a standard Z-matrix construction.
Let ri = (xi, yi, zi) denote the Cartesian coordinates of atom i. Given ρ, θ, ϕ, we place a new atom X at rX = (xX, yX, zX) so that the computed values ρcalc, θcalc, ϕcalc match the independent targets.
The distance residual is
| gρ(rX) = ρcalc − ρ = ‖rX − rN‖ − ρ. |
For the angle defined by M–N–X, with NM = rM − rN and NX = rX − rN, the residual is
θcalc = a tan 2(‖NM × NX‖, NM·NX), gθ(rX) = wrap(θcalc − θ), |
tan
2 is the quadrant-aware arctangent and wrap(·) maps angles to (−π, π].
For the dihedral defined by L–M–N–X, define the plane normals
| n1 = (rL − rM) × (rN − rM), n2 = (rX − rN) × (rM − rN). |
and the dihedral residual to minimize is
| gϕ(rX) = wrap(ϕcalc − ϕ). |
Finally, we determine rX by minimizing the scaled sum of squares
We employ a sequential fallback strategy: a short list of heuristic initializations (ρ–θ directed, midpoint, perpendicular, bond-length-randomized, randomly perturbed, and shifted) is generated, and a local optimization of F(rX) is attempted from each initialization in turn. An initialization is accepted immediately if the absolute errors
ερ = |gρ| < 0.01 Å , εθ = |gθ| < 0.1 rad, εϕ = |gϕ| < 0.1 rad. |
The approach described here generates multiple structural variants for each reactive configuration. Each generated TSG undergoes a series of validation checks to remove redundant, unstable, or chemically unreasonable geometries by avoiding atomic collisions, verifying water molecule connectivity, and removing non-unique structures (see Section S2 in the SI).
To facilitate cleavage, the a–b bond is systematically stretched based on the identified family-specific parameters (Tables S1–S5). The algorithm achieves robust conformational diversity and overcomes steric hindrances through a hierarchical sampling strategy. While carbonyl-based reactions require rotations to accommodate the transition from planar sp2 toward a partially tetrahedral configuration, ether hydrolysis requires alignment opposite the leaving group to ensure proper orbital overlap. In contrast, nitriles typically preserve their linear sp geometry, with dihedral modifications implemented only as a fallback if initial placements fail.
Core dihedrals involving atoms a, b, e, or d are identified from the internal coordinates and undergo a systematic scan from 15° to 55° in 10° intervals. This process displaces angles away from unstable eclipsed or near-planar boundary values, such as 0° or 180°, while maintaining the overall molecular framework. If no valid TSG passes the filtration step, the algorithm rotates the selected dihedrals by 180° to explore the complementary side of the local torsional landscape. By sequentially modifying additional dihedrals, the framework broadens the accessible conformational space until a chemically valid initialization is generated.
Atom o is first placed using three parameters, as illustrated in step A1 (Fig. 8). The distance r1 represents the forming a–o bond, which is critical for capturing the progress of the nucleophilic attack. The b–a–o angle (α1), is defined using the leaving group (b) and the electrophilic center (a) as reference points to establish the nucleophilic approach trajectory. This angle controls the approach of atom o relative to the bond being broken, ensuring proper orbital overlap for bond formation while avoiding unfavorable steric interactions. The dihedral angle e–d–a–o (φ1), is important for avoiding steric interference with the most electronegative neighbors around the reaction center, ensuring that the water molecule approaches the reactant from an accessible direction.
The reactive hydrogen (h1) in water is positioned relative to the newly placed oxygen atom, as shown in Fig. 8, step B. The o–h1 distance, labeled as r2, is slightly elongated relative to the equilibrium O–H bond length to facilitate the proton transfer. The a–o–h1 angle (α2) uses the newly formed a–o bond as a reference to position h2 for optimal interaction with the leaving group. This arrangement, combined with the b–a–o–h1 dihedral angle (φ2), specifically pre-organizes the 4-center (4-membered ring-like) geometry characteristic of the concerted neutral hydrolysis mechanism. The dihedral configuration (φ2) establishes the spatial relationship between atoms h1 and b, ensuring reasonable positioning for the proton transfer step. By sampling multiple dihedral orientations in a single run, the algorithm provides high-quality initializations that allow the DFT optimization to converge to this concerted saddle point across diverse chemical systems.
Finally, the nonreactive atom h2 is positioned as illustrated in Fig. 8, step C. The o–h2 distance (r3) is close to the equilibrium O–H bond length. Similarly, the h1–o–h2 bond angle (α3) is defined as slightly larger than water's equilibrium angle, 104.45°. The dihedral angle a–o–h1–h2 (φ3) ensures the proper orientation of the water molecule, typically to balance the overall dipole moment in the TS.
In the absence of atom d, where other than the leaving group (atom b) atom a has only one neighbor, atom e. This typically indicates a higher bond order between atoms a and b (such as a C
N bond). Above, atom d participates in the definition of the o atom dihedral angle (φ1). In the present case, this dihedral is defined as the e–b–a–o angle
, Fig. 8 step A 2. This dihedral angle uses both the most electronegative neighbor (e) and the leaving group (b) as reference points to define the relative location of atom o.
Initial TS geometry optimizations were performed at the CBS-QB3 (ref. 52) composite level of theory (utilizing B3LYP/CBSB7 (ref. 53 and 54)) followed by a vibrational-frequency calculation at the same level to obtain zero-point vibrational energy. The CBS-QB3 method was chosen for initial optimizations due to its established record of providing reliable and relatively robust geometries and energies for a wide range of organic reactions at a manageable computational cost. Additional electronic-structure calculations, including geometry optimizations, vibrational frequency analyses, single-point energy calculations, and intrinsic reaction coordinate (IRC)55 calculations, were performed using the ωB97X-D56 functional with the jul-cc-pVTZ basis set.57 Solvation effects were modeled with the implicit Solvation Model based on Density (SMD)51 with water as the solvent. While the internal-coordinate water placement algorithm is independent of the chosen solvation model, all validations reported in this work employed SMD. Other implicit or explicit solvation treatments are compatible within ARC but were not benchmarked here. For the high-throughput validation of the TSG engine, subsequent hindered rotor scans were omitted to reduce computational cost, as the primary objective was to confirm the successful and efficient location of the correct TS structure.
All TSGs generated by the heuristics adapter were internally screened within ARC for automated quality control after generation to eliminate redundant or unphysical geometries, such as structures containing atomic overlaps, before geometry optimization. Each optimized structure, whether generated automatically or constructed manually, was then subjected to a systematic validation protocol to ensure its physical and mechanistic correctness. First, the algorithm was required to produce at least one TSG that successfully converged during DFT optimization, demonstrating its ability to generate chemically reasonable geometries that correspond to stationary points on the potential energy surface. Second, the optimized TS was examined for thermodynamic consistency by verifying that its electronic energy (E0) was higher than that of both the reactant and product wells, as expected for an energy barrier between two minima. Third, harmonic frequency analysis was performed to confirm the presence of exactly one imaginary vibrational mode, indicative of a first-order saddle point. Fourth, the normal-mode displacement (NMD) associated with this imaginary frequency was visually inspected to ensure that the atomic motions corresponded to the intended bond-breaking and bond-forming events characteristic of the hydrolysis step. Finally, intrinsic reaction coordinate (IRC) calculations were carried out to confirm that the forward and reverse trajectories connected smoothly to the designated products and reactants, respectively.58 IRC calculations were attempted for all TSs, but in cases where the calculation was as numerically unstable or computationally prohibitive, and the NMD clearly represented the expected bond-forming and bond-breaking motions, the TS was still considered validated.
| χeffO = 3.44 × 2 = 6.88 (carbonyl oxygen) | (2) |
| χeffC = 2.55 × 1 = 2.55 (methyl carbon) | (3) |
The a–b bond in the lowest-energy 3D conformer of methyl acetate in water is stretched by ∼ x1.3 of its original length (Fig. 9C), in this example to 1.72 Å (Table S6). Next, the algorithm explores several placement branches. In the first branch, water is added to the stretched but otherwise unmodified conformer; the resulting TSG is shown from two viewpoints in Fig. 9D1.1 and 1.2. In the other branch, near-planar dihedrals around the reaction center are perturbed by series of angular offsets (here, the dihedral φ in Fig. 9D2.0), after which the water atoms are placed again shown from two viewpoints in Fig. 9D2.1 and 2.2. This branch strategy systematically samples both sterically hindered and accessible approach geometries to maximize the chance of generating a viable TSG.
The generated variants then undergo filtration according to the predefined criteria. As shown in Fig. 9D1.2, the variant generated without dihedral adjustments was eliminated during this step due to a collision between atoms o and one of the hydrogen atoms attached to atom d. The dihedral adjustment, shown here as an out-of-plane motion of atom e (Fig. 9C and D2.0), resulted in a valid TSG (Fig. 9D2.2). To assess the accuracy of the generated TSG, its geometry was compared to the respective DFT optimized structure using internal coordinates. The optimized TS in the methyl acetate example (Fig. 9E) showed a high degree of agreement with the initial guess with mean similarities of 95.75% for bonds, 97.74% for angles, and 98.35% for dihedral angles, yielding an overall geometric similarity of approximately 97.28% (Table S6).
The dataset used for the study of the transition state (TS) geometries comprised a wide range of molecules for each class. This was done to ensure that the chosen parameter values would be applicable across different molecular motifs rather than being tuned to a single molecular genre. To confirm coverage over a wide chemical space, structural diversity was quantified using Morgan circular fingerprints (ECFP)50,60 and the Tanimoto similarity coefficient.61 These fingerprints encode atom-centered environments within a two-bond radius into a fixed-length binary vector, enabling robust comparison of substructural features.
Analysis of within-class Tanimoto similarity distributions (Fig. 10) shows that all classes include both closely related and structurally distinct molecules. The median within-class similarity values ranged from 0.13 for ethers and amides, to 0.24 for acyl halides, indicating an overall high structural diversity. Ethers displayed the widest spread of similarity values, consistent with their structural flexibility.
![]() | ||
| Fig. 10 Within-class similarity distributions of pairwise Tanimoto similarity values (Morgan fingerprints, radius = 2) for the study cases jobs. Circles represent outlier similarity values. | ||
To visualize the diversity of the dataset in physicochemical property space, Principal Component Analysis (PCA) was performed on nine molecular descriptors (Fig. 11). The broad and overlapping distribution across hydrolysis families provides a secondary confirmation that the benchmark set spans a wide property space. Detailed descriptor definitions and loading analyses are provided in Section S3, while complete descriptor Tables are provided in Tables S7 and S8 and datasets are available in Databases S1 and S2 of the SI.
For each molecule, the water was positioned using internal coordinates defined relative to the atoms in the other reactant as described in Section 2.5.2 and illustrated in Fig. 8. Here, we report the mean and standard deviation of the nine geometric features extracted from each optimized TS (Table 2). The complete set of parameter values extracted from every study case job, alongside the median [IQR] values for each class is provided in Tables S1–S5.
| Parameter | Esters | Amides | Acyl halides | Ethers | Nitriles |
|---|---|---|---|---|---|
| a The a–b bond stretch values represent a multiplicative factor relative to the optimized reactant bond length (e.g., 1.34 corresponds to a 34% increase in bond length). | |||||
| r1 | 1.83 ± 0.09 | 1.91 ± 0.08 | 1.79 ± 0.07 | 2.14 ± 0.14 | 1.83 ± 0.03 |
| α1 | 76.18 ± 3.44 | 80.54 ± 2.06 | 75.43 ± 1.62 | 66.29 ± 3.84 | 97.23 ± 1.18 |
| φ1 | 142.59 ± 4.32 | 127.05 ± 2.10 | 152.24 ± 2.32 | 101.78 ± 13.34 | 174.29 ± 1.50 |
| r2 | 1.21 ± 0.05 | 1.39 ± 0.05 | 1.04 ± 0.01 | 1.10 ± 0.04 | 1.32 ± 0.01 |
| α2 | 73.29 ± 4.09 | 66.54 ± 1.94 | 88.43 ± 1.16 | 72.80 ± 0.90 | 57.89 ± 0.62 |
| φ2 | 1.57 ± 3.23 | 2.40 ± 5.90 | 0.95 ± 2.45 | 0.30 ± 2.49 | −0.02 ± 1.88 |
| r3 | 0.97 ± 0.00 | 0.96 ± 0.00 | 0.97 ± 0.00 | 0.96 ± 0.00 | 0.97 ± 0.00 |
| α3 | 110.42 ± 1.98 | 120.03 ± 5.85 | 106.15 ± 1.01 | 107.20 ± 2.98 | 114.28 ± 0.72 |
| φ3 | 105.19 ± 3.86 | 97.59 ± 3.90 | 106.59 ± 3.45 | 102.19 ± 1.74 | 103.77 ± 0.70 |
| a–b stretcha | 1.34 ± 0.06 | 1.17 ± 0.02 | 1.41 ± 0.04 | 1.55 ± 0.10 | 1.04 ± 0.00 |
During TS optimization for hydrolysis reactions, two of the dihedral angles used to position the water molecule, specifically φ1 and φ3 (Fig. 8), which were used to position the oxygen atom (atom o) and the non-reactive hydrogen atom (atom h1) in the water, respectively, showed consistent magnitudes within each class, yet their signs varied across TSs. In many cases only a specific combination of signs across both dihedrals produces a physically reasonable TS geometry, whereas the opposite sign leads to less favorable geometry. An example that illustrates the effect of dihedral sign combinations on water placement is given Fig. S2. While certain sign combinations yielded successful TSs for specific substrates, these trends are coordinate-convention and conformer dependent rather than mechanistically universal. Therefore, the mean dihedral value was calculated from absolute magnitudes, and both the +φ and −φ variants were considered when generating TSGs. For the third dihedral, φ2 which was used to position the reactive hydrogen (atom h2) in the water molecule, the optimized values across the TSs were consistently small and centered near 0°. In this range, the sign inversion reflects only minor fluctuations in a common geometry rather than distinct mirror image structures, so the signed mean was used directly.62
To improve statistical robustness and leverage mechanistic similarities, esters, amides, and acyl halides were grouped into a single carbonyl-based hydrolysis family. This grouping is chemically justified as all three classes undergo hydrolysis through a nucleophilic attack at the electrophilic carbonyl carbon, proceeding through a partially tetrahedral TS characteristic of the concerted 4-center mechanism observed under neutral conditions. The averaged geometric parameters derived from pooling the optimization results of these three classes are presented in Table 3.
| Parameter | Carbonyl-based family |
|---|---|
| a The a–b bond stretch value represents a multiplicative factor relative to the optimized reactant bond length. | |
| r1 | 1.85 ± 0.09 |
| α1 | 77.39 ± 3.33 |
| φ1 | 140.63 ± 10.96 |
| r2 | 1.21 ± 0.15 |
| α2 | 76.09 ± 9.66 |
| φ2 | 1.64 ± 4.03 |
| r3 | 0.97 ± 0.00 |
| α3 | 112.20 ± 6.86 |
| φ3 | 103.12 ± 5.40 |
| a–b stretcha | 1.31 ± 0.11 |
Analysis of parameter distributions reveals both the strengths and limitations of the family-based approach. The consolidation of carbonyl-based classes demonstrates mixed results; while the oxygen positioning distance r1 shows improved precision compared to individual class variations, other parameters such as φ1 exhibit increased standard deviation after combination. This indicates that φ1 is class sensitive, although the resulting variance remains within acceptable ranges for generating initial guesses. As expected, the most consistent parameter across all functional groups is r3, representing the non-reactive o–h2 bond length in the approaching water molecule. In contrast, the o–h1 bond distance, r2, shows significant variations; carbonyl-based hydrolysis family exhibits intermediate approach distances (1.21 ± 0.15 Å), and ether hydrolysis family require the shortest distances (1.10 ± 0.04 Å).
The dihedral angles reveal important insights into water molecule orientation preferences. The values of φ2 remain consistently small in all families (ranging from −12° to +2.4°), indicating a genuine chemical preference that likely minimizes steric interactions while improving the initial geometry guess of the approach. Notably, the ether hydrolysis family exhibits larger standard deviations specifically in the dihedral angle φ1. This larger deviation suggests a lower geometric consistency and may indicate potential convergence challenges.
Despite these variations, the overall parameter distributions fall within acceptable ranges for automated TSG search. Bond lengths maintain standard deviations below ± 0.1 Å, and most angular parameters remain within ± 10°, meeting the established criteria for reliable initial guess generation.
To ensure diversity and capture different kinds of molecules undergoing hydrolysis across the defined classes, both the Tanimoto similarity coefficient and the PCA based on physicochemical descriptors were performed using the same definitions and methodological specifications applied to the case studies. These analyses provide a complementary confirmation of dataset breadth. Analysis of within-class Tanimoto similarity values shows that acyl halides and amides exhibit the highest internal molecular similarity, with median similarities of ∼0.26–0.27, while esters and nitriles lie in an intermediate range (∼0.22–0.24). In contrast, ethers display the lowest median similarity (∼0.14) and the widest overall spread, indicating substantially higher structural diversity within this class. PCA further supports these findings, confirming that all five classes span a broad and overlapping chemical property space. Complete similarity data for all classes are provided in Fig. S3, S4, Tables S9, S10, and Databases S3 and S4.
Within the carbonyl-based hydrolysis family, a total of 33 jobs were executed: 9 for the ester class, 12 for the amide class, and 12 for the acyl halide class. In addition, 30 jobs were performed for the ether hydrolysis family and 30 for the nitrile hydrolysis family.
A validation job was considered successful only when a converged TS satisfied the validation checks described in Section 2.6. This comprehensive framework confirmed that the automated TSG generation reliably identifies correct TSs across all defined hydrolysis families. In addition to validating the correctness of the identified TSGs, we report the number of TSGs generated by the heuristics adapter.
Table 4 presents the success rate (defined as the percentage of jobs with at least one converged TS that met the validation criteria), as well as the average number of generated TSGs for each reaction family. In Section S1 a complete summary of all jobs is provided, including detailed evaluations against each of the five validation criteria along with the number of TSGs generated (Tables S11–S13).
| Family | Success rate(%) | Average number of generated TSGs |
|---|---|---|
| Carbonyl-based | 96.9 | 5.8 |
| Ether | 72.4 | 14.3 |
| Nitrile | 86.2 | 4.9 |
The results demonstrate that the carbonyl-based hydrolysis family achieved a remarkably high success rate of 96.9%, with an average of ∼6 generated TSGs. This strong performance correlates with the low variation observed during parameterization, and also arises from the highly polarized C
O bond, which defines a clear electrophilic carbon center and provides a directional π orbital that naturally guides the approach of the water molecule. As a result, the generated geometries align well with the electronic structure of the reactive site, producing stable and convergent TSs during optimization. Consequently, a relatively small number of TSGs were needed to undergo DFT optimization to yield a converged TS, with an overall high success rate (Table 4).
The nitrile hydrolysis family (as defined here, corresponding to step (a) in Fig. 4B) also exhibited high performance, with a success rate of 86.2% and ∼5 generated TSGs, consistent with the linear and strongly polarized carbon–nitrogen triple bond that creates a well-defined attack direction for the water molecule.
The lowest success rate in this study, 72.4%, was obtained for the ether hydrolysis family, consistent with the larger geometric variability in the parametrization step, particularly in the φ1 dihedral. Because ethers lack a polarized π system and possess two freely rotatable σ bonds around oxygen, the local environment is more flexible and less directive. As a result, significantly more TSGs are needed on average, and many converge into non-reactive minima during the TS optimization, where the water molecule is stabilized through hydrogen bonding to the substrate rather than attacking the reactive center.
While string methods like NEB18 or GSM16 offer robust path refinement, they are not fully automated for bimolecular systems, as they require manual 3D alignment and atom mapping. In contrast, our workflow operates from SMILES alone. Although CPU cost scales exponentially with system size, our method typically requires less than 10 independent TS DFT optimizations per reaction, on average (Table 4). Since these guesses are independent, they can be executed in parallel with zero communication overhead, significantly reducing wall-clock time compared to coupled reaction-path algorithms.
The ablation results reveal that the geometric heuristics act cooperatively rather than independently. Removing either the dihedral-modification step or the ±φ sign sampling led to near-complete failure across all families, demonstrating that explicit control over torsional accessibility and attack direction is essential for placing the water molecule in a chemically meaningful orientation. In contrast, removing the d/e electronegativity ranking had a smaller, structure-dependent effect, with many carbonyl-based systems still converging successfully, while ethers showed greater sensitivity. Overall, this analysis confirms that dihedral modification and ±φ sampling are indispensable for robust TS generation, whereas the d/e ranking primarily improves consistency and reproducibility in asymmetric environments.
Across 91 diverse reactions spanning three hydrolysis families, the method achieved relatively high success rates: 96.9% for carbonyl-based reactions (esters, amides, and acyl halides), 86.2% for nitriles, and 72.4% for ethers, while requiring only a modest number of TSGs per case. These outcomes reflect the electronic “guidance” available at polarized or linear centers (C
O and C
N), and they clarify why ethers remain more challenging.
An ablation study demonstrates that two ingredients are essential for robust performance in solution: (i) small, targeted dihedral adjustments around the reaction center before water placement, and (ii) explicit ±φ sign sampling for dihedrals that control the approach of the water molecule from either side of the reactive plane of the substrate. Removing either step causes near-universal failures, whereas the electronegativity-based ranking of neighboring atoms (d/e) mainly fine-tunes orientation and improves reproducibility and robustness across asymmetric environments.
Although family parameters were extracted from CBS-QB3 case studies, they transferred well to ωB97X-D/jul-cc-pVTZ with SMD water for validation, indicating that the heuristics capture geometric, not method-specific, features of nucleophilic approach. By replacing manual trial-and-error water placement with these reproducible encoded internal-coordinate rules, the procedure turns condensed-phase TSG initialization into an automated, scalable step that can feed high-throughput kinetics workflows in pharmaceutical degradation and broader liquid-phase reactivity. This suggested approach bypasses the 3D orientation bottleneck that has historically required extensive expert intervention. With an average computational requirement of only ∼10 TS DFT optimizations in parallel per reaction, the workflow is well-suited for high-throughput kinetics in pharmaceutical degradation and broader liquid-phase reactivity.
Looking ahead, the same internal-coordinate atom-addition strategy is readily extensible to alternative solvation treatments and microsolvation, where additional explicit water molecules can be positioned sequentially. The method could therefore be extended to address additional challenging reaction families, including acid- or base-catalyzed hydrolysis and other multi-fragment pathways relevant to polymer degradation and complex condensed-phase chemistry. This modular approach allows for the automated construction of complex, multi-fragment TSs that have historically required extensive manual manipulation. The suggested chemically informed internal-coordinate placement approach provides a practical foundation for automated TS discovery in solution and opens a path to general, high-throughput TS search for bimolecular liquid-phase reactions.
The code for the ARC repository can be found at https://github.com/ReactionMechanismGenerator/ARC with DOI: https://doi.org/10.5281/zenodo.3356849.
| This journal is © The Royal Society of Chemistry 2026 |