The influence of model building schemes and molecular dynamics sampling on QM-cluster models: the chorismate mutase case study

Donatus A. Agbaglo; Thomas J. Summers; Qianyi Cheng; Nathan J. DeYonker

doi:10.1039/D3CP06100K

View PDF VersionPrevious ArticleNext Article

DOI: 10.1039/D3CP06100K (Paper) Phys. Chem. Chem. Phys., 2024, 26, 12467-12482

The influence of model building schemes and molecular dynamics sampling on QM-cluster models: the chorismate mutase case study†

Donatus A. Agbaglo , Thomas J. Summers‡ , Qianyi Cheng and Nathan J. DeYonker *
Department of Chemistry, University of Memphis, Memphis, TN 38152, USA. E-mail: ndyonker@memphis.edu

Received 15th December 2023 , Accepted 2nd April 2024

First published on 5th April 2024

Abstract

Most QM-cluster models of enzymes are constructed based on X-ray crystal structures, which limits comparison to in vivo structure and mechanism. The active site of chorismate mutase from Bacillus subtilis and the enzymatic transformation of chorismate to prephenate is used as a case study to guide construction of QM-cluster models built first from the X-ray crystal structure, then from molecular dynamics (MD) simulation snapshots. The Residue Interaction Network ResidUe Selector (RINRUS) software toolkit, developed by our group to simplify and automate the construction of QM-cluster models, is expanded to handle MD to QM-cluster model workflows. Several options, some employing novel topological clustering from residue interaction network (RIN) information, are evaluated for generating conformational clustering from MD simulation. RINRUS then generates a statistical thermodynamic framework for QM-cluster modeling of the chorismate mutase mechanism via refining 250 MD frames with density functional theory (DFT). The 250 QM-cluster models sampled provide a mean ΔG^‡ of 10.3 ± 2.6 kcal mol⁻¹ compared to the experimental value of 15.4 kcal mol⁻¹ at 25 °C. While the difference between theory and experiment is consequential, the level of theory used is modest and therefore “chemical” accuracy is unexpected. More important are the comparisons made between QM-cluster models designed from the X-ray crystal structure versus those from MD frames. The large variations in kinetic and thermodynamic properties arise from geometric changes in the ensemble of QM-cluster models, rather from the composition of the QM-cluster models or from the active site-solvent interface. The findings open the way for further quantitative and reproducible calibration in the field of computational enzymology using the model construction framework afforded with the RINRUS software toolkit.

Introduction

Through multiscale QM/MM or QM-only “cluster model” studies, stationary points along a reaction mechanism can be optimized, which allows a structural probe of the enzyme kinetics that is impossible to directly observe experimentally.¹ As the reliability of computational enzymology and the tractable size of QM-regions increase, a greater focus on cyberinfrastructure is required for building consistent and reproducible atomic-level enzyme models. Our group has developed the Residue Interaction Network ResidUe Selector (RINRUS) software toolkit to facilitate studying the reaction mechanisms of enzymes with quantum chemistry.^2–4 Instead of relying on chemical intuition or distance-based criteria to prioritize the critical fragments within the enzyme active site, RINRUS algorithmically constructs enzyme models based on several possible qualitative and quantitative criteria. RINRUS infrastructure was first developed to build QM-cluster models of enzymes, but adapting the code to also build QM/MM enzyme models is in progress. In this work, we explore the enzyme chorismate mutase in conjunction with a proof-of-concept expansion of RINRUS capabilities: interfacing QM-cluster modeling with molecular dynamics (MD) techniques.

Chorismate mutase (CM) catalyzes the reaction of chorismate to prephenate, participating in the shikimate pathway that biologically produces phenylalanine and tyrosine amino acids (Scheme 1).^5–14 The shikimate pathway does not occur in the animal kingdom, and thus provides a target for the development of new antibiotics, fungicides, and herbicides.¹⁵ While chorimsate mutase has been widely studied experimentally and computationally, there are still mysteries to be unraveled with respect to the extraordinary kinetic enhancement of its active site. The chorismate mutase enzymatic reaction promotes a 10⁶-fold rate acceleration of prephenate production through a Claisen rearrangement in the catalytic elementary step.^16,17 This Claisen rearrangement is one of the few known examples of a naturally-occurring catalyzed pericyclic reaction.¹⁸ In 1993, Lipscomb and coworkers published an X-ray crystal structure of Bacillus subtilis chorismate mutase (BsCM, PDB: 2CHT) at 2.2 Å resolution that forms the basis of most theoretical works.¹⁹ This structure contains an endo-oxabicyclic transition state analogue (TSA), 8-hydroxy-2-oxa-bicyclo[3.3.1]non-6-ene-3,5-dicarboxylic acid, which offered structural insight into the enzyme mechanism. Since the pericyclic reaction does not involve covalent substrate-protein bonding or acid–base chemistry, CM makes an intriguing, and in some respects, simplified case study of enzyme catalysis.^20,21


	Scheme 1 Representation of the Claisen rearrangement of chorismate to prephenate.

Mutagenesis, computational enzymology, and biochemical kinetics have been indispensable tools to study the mechanism of the CM reaction, especially for exploring the transition state stabilization (TSS) and near-attack conformation (NAC) hypotheses or for describing manifestations of their complementary kinetic and thermodynamic behavior.^9,18,22–25 Mutagenesis experiments of Escherichia coli chorismate mutase (EcCM)^26–30 and BsCM revealed the catalytic importance of many charged active site residues for establishing hydrogen bonding with the negatively-charged substrate. For example, replacement of Arg90 with a positively-charged lysine still decreases the catalytic efficiency by at least three orders of magnitude in BsCM.^19,31

Theoretical studies of chorismate mutase with QM/MM-MD first emphasized the importance of a near attack conformation (NAC) as the main catalytic driving power behind the proposed mechanism.^20,32,33 Studies done by Bruice and co-worker showed that NAC rearrangement of chorismate structure is a result of activated carbon and oxygen ligand atoms approaching within the van der Waals contact distance at very small bond angles, creating a favorable orientation of π-orbital overlap.^20,32 The proponents of the NAC hypothesis focus on geometric distortion of the substrate in the active site. However, those who argue for the TSS hypothesis indicate that positively charged residues like Lys39 in EcCM and Arg90 in BsCM stabilize the developing negative charge during bond breaking at the ether oxygen.^25,31,34 Bond-breaking then leads to electrostatic stabilization of active site residues, lowering the activation energy. Subsequent QM/MM and QM-cluster model calculations have provided evidence that catalysis is due to both near attack conformation and transition state stabilization, but with TSS being the main driving force of the proposed mechanism.^21,22,35

While computational enzymology has advanced rapidly over the last two decades,^6,36–38 one persistent challenge in this research area is designing effective QM-regions that reliably predict catalytic activity, with kinetic and thermodynamic properties that can converge quickly with respect to model size. Ad hoc methods of selecting residues for inclusion in the QM regions of QM/MM models or in QM-cluster models are poorly reproducible and not well calibrated. One technique for QM region selection is to include all residues that are within a specific radial distance from the center of the active site or from the center of mass of the substrate. This construction paradigm is based on the idea that spherical active site models are appropriate. Several studies by our group and others reveal that this is not always the case,^3,4,39–46 though CM active sites are known to be fairly compact and spherical.

To facilitate improved benchmarking in computational enzymology, our group has created the RINRUS software toolkit to automate the process of generating QM-cluster models. Our goal is to address various community-wide challenges in computational enzymology, such as standardizing QM-cluster (and eventually QM/MM) model construction, lowering the learning curve for new users, and reducing trial and error caused by ad hoc model-building schemes. RINRUS uses an automated approach to trim and cap the active site fragments. With a given protein structure and a user-defined “seed”, which consists of the substrate and any active site fragments necessary to describe the chemical reaction, RINRUS identifies proximal fragments that have important non-covalent interactions with the seed using the graph theory concept of the residue interaction network (RIN).^47,48

To summarize the RINRUS procedure, a protein structure is converted into a RIN graph composed of only a subset of the fragments (referred to as “nodes” in graph theory) that have an identifiable electrostatic and/or steric interaction (referred to as “edges” in graph theory) with the seed nodes.^47,48 The RIN is then processed using one or more user-selected schemes that identify qualitative interaction types (structural interaction fingerprints, SIFs)⁴⁹ or quantitative schemes that utilize first-principles interaction energies like symmetry-adapted perturbation theory (SAPT or F/I-SAPT, see below).^50–53 RINRUS can also be used to rank fragments via distance-based criteria. Once a ranking scheme is chosen and fragment rank is enumerated, RINRUS will algorithmically construct QM-cluster models and provide input files formatted appropriately for several commercial and open-source quantum chemistry software packages.

This work has two major objectives. First, we analyze how specific residues influence the enzymatic reaction and contribute to the convergence of RINRUS-built QM-cluster models of chorismate mutase. Multiple fragment ranking schemes are explored and compared, with models built incrementally, growing by one fragment at a time. Second, we explore QM-cluster modeling in a quasi time-dependent fashion by sampling MD snapshots with refined QM-cluster models to account for conformational averaging. Thermally stable conformational change is one of the most important aspects of regulating protein structure and activity, and conformational sampling of enzymes is typically probed on the micro-second time scale via MD simulations.⁵⁴ We have selected 250 snapshots from a 20 ns MD simulation of BsCM and processed each with RINRUS to obtain 250 different QM-cluster models. The catalytic transition state for each of the 250 QM-cluster models is optimized, and via computation of the connected reactant and product structures, kinetic and thermodynamic data is obtained.

Methods

All computations were based on the X-ray crystal structure of the Bacillus subtilis chorismate mutase taken from PDB entry 2CHT. The 2CHT enzyme is trimeric with three active sites formed at the interface of adjacent monomer chains. The active site of the crystallographic A/C chain was used for QM-cluster model construction in this work. Further justification of using the chain A/C interface is provided in the ESI.† Hydrogen atoms were added to the enzyme using the reduce program.⁵⁵ For all QM-cluster models and MD simulations, the TSA found in the crystallographic active sites was replaced with the native substrate (chorismate).

Incremental QM-cluster model building with RINRUS

RINRUS identifies and ranks inter-residue interactions based upon two existing packages that compute the RIN and output node/edge information in machine and human-readable formats. Probe⁵⁶ rolls a small sphere over the internal van der Waals surface of a protein structure to identify and classify non-covalent interatomic interactions between fragments of a protein structure; arpeggio⁵⁷ uses interatomic distance and angle criteria to identify and classify interactions. Throughout this work, “seed”, “substrate”, and “ligand” synonymously refer to the chorismate molecule shown in Fig. S1 (ESI†).

A good fragment ranking scheme is needed to design reliable QM-cluster models, which is a core feature of the open-source RINRUS package.⁵⁸ There are three different fragment ranking schemes being tested in this work. The RINRUS-probe workflow ranks the importance of active site fragments based on the number of contact counts between each fragment and the seed. When incrementally building models, fragments (categorized as residue side chains, residue main chains, or solvent water molecules) are added to the model one at a time in order from the fragment with the highest number of contacts with substrate to the lowest. While probe parses interaction types into five simple SIF categories, arpeggio classifies fourteen different chemical interaction type, based on the CREDO set of protein–substrate interactions.⁵⁹ While arpeggio also accounts for typical interaction types like hydrogen bonding and hydrophobic contacts, it can also more flexibly account for weaker inter-residue interactions such as aromatic π-stacking or less common interactions such as halogen bonds. It should be noted that the proximal interactions computed by arpeggio are ignored in this study because the focus is on fragments that have recognized intermolecular forces with the chorismate substrate, rather than distance-based metrics.

Symmetry adapted perturbation theory (SAPT) has become an increasingly popular approach for computing non-covalent interaction energies between two molecules or fragments.^{50–52,60–63} SAPT calculations are especially useful in that the interaction energies are readily decomposed into electrostatic, exchange–repulsion, induction, and dispersion components. Functional-group SAPT (F-SAPT)⁵² is an extension of SAPT that provides an effective secondary two-body partition of the SAPT components. This additional partitioning allows computation of interaction energy between a fragment A (in this case study, the chorismate ligand) and user-defined sub-fragments of a fragment B (the various side chain and backbone fragments of the active site). F-SAPT is leveraged to decompose the interaction energy between chorismate and individual residue main chains or side chains, without cutting or capping fragments differently from what is used in the parent QM-cluster models. We will use the F-SAPT interaction energies between chorismate and surrounding residue fragments to rank incremental QM-cluster model building. This work uses the zeroth-order formulation of F-SAPT, F-SAPT0, described by the equation:


E_int = E⁽¹⁾_elec + E⁽¹⁾_exch + [E⁽²⁾_ind + E⁽²⁾_exch–ind + δE⁽²⁾_HF]_ind + [E_disp + E⁽²⁾_exch–disp]_disp	(1)

F-SAPT0 computations employed the jun-cc-pVDZ basis set^51,52 for all atoms and frozen core electrons via the PSI4 v1.3 package.⁶⁴ The jun-cc-pVDZ basis set has been demonstrated to provide reliable SAPT interaction energies.⁶⁵

In recent work, a poor correlation between number of probe contacts and F-SAPT interaction energies was observed.^66,67 We then hypothesized that F-SAPT interaction energies will be a more quantitatively reliable metric for ranking the importance of active site residues. However, SAPT calculations are computationally expensive (days of CPU time) compared to the near-negligible effort required to compute and parse a RIN from probe or arpeggio ranking (<20 seconds of CPU time).

QM-cluster models were generated using the RINRUS software.⁵⁸ Trimming of residue fragments is performed algorithimcally by RINRUS depending on if the backbone NH, backbone CO, and/or side chain of a residue has interatomic contacts with chorismate. Where covalent bonds are broken in the trimming procedure (typically across C_α atoms), RINRUS automatically adds hydrogen atoms to satisfy carbon valency. We refer throughout to the QM-cluster model that contains all fragments with a quantifiable interaction with the chorismate ligand as a “maximal model”. Trimming details for the maximal model of the X-ray crystal structure active site are shown in Table S1 (ESI†). To maintain the general shape and mimic the semi-rigid character of the protein tertiary structure, all C_α atoms, along with the C_β atoms of any Arg, Lys, Glu, Met, Trp, and Phe side chains were frozen to their crystallographic positions (if obtained from the X-ray crystal structure) or frozen at their positions in the respective MD frame (if obtained from MD simulation). All chorismate atoms were unconstrained in the QM-cluster model computations.

The QM computations were carried out using the Gaussian16 software package.⁶⁸ The geometries of the models were optimized using density functional theory (DFT) with the B3LYP exchange–correlation functional.^69,70 The 6-31G(d′) basis set was used for N, O, and S,⁷¹ and the 6-31G basis set was used for C and H atoms.⁶⁰ The Grimme D3 (Becke–Johnson) dispersion correction (GB3BJ) was also included,⁷² along with a conductor-like polarizable continuum model (CPCM) using UAKS sets of atomic radii, a non-default electronic scaling factor of 1.2, and default cavity parameters for water but with an attenuated dielectric constant of ε = 4.^73,74 Transition states were located for the elementary step of the proposed mechanism, and the reactants and products were then confirmed by following the intrinsic reaction coordinate (IRC).§ The zero-point energies (ZPE) and thermal enthalpy/free energy corrections were calculated at 1 atm and 298.15 K.

MD trajectory-based QM-cluster models

For the MD simulations, some pre-processing of the X-ray crystal structure was necessary.⁶⁷ Missing residues in the 2CHT X-ray crystal structure were added from the C-terminus using PDB entry 1DBF,⁷⁵ a BsCM structure without substrate or TSA in complex with the protein. The two structures were globally aligned and atomic coordinates from 1DBF were added to the 2CHT structure based upon the point where the two structures begin a common structural alignment. Specifically, residues 1 and 116–127 from 1DBF were added to 2CHT for chain A, residues 1 and 115–127 were added for chain B, and residues 1–2 and 115–127 were added for chain C (with residues 2 and 115–119 of 2CHT chain C being replaced with the corresponding coordinates from 1DBF). Hydrogen atoms were added to this structure via the H++ server using default parameters.⁷⁶ The native substrate chorismate in a pre-reactive conformation was used in MD simulations instead of the TSA. The AMBER18 MD package⁷⁷ was used to run the MD simulations, and the AMBER force field ff14SB was used with periodic boundary conditions and a cutoff value of 9 Å for non-bonded interactions. The Antechamber package was employed to parameterize the chorismate substrate with the Generalized Amber Force Field (GAFF).^77,78 The protonated structure with chorismate was solvated in a cubic 10 Å box of water with the explicit solvent model TIP3P.⁷⁹ The MD model charge was neutralized by adding 9 Na⁺ ions.⁸⁰

An energy minimization of the system was first carried out with protein heavy atoms constrained to their crystallographic coordinates using a harmonic positional restraint (k_pos) of 200 kcal mol⁻¹ Å⁻² allowing the solvent bath to be initially relaxed and the hydrogen bonding networks to be established. The protein heavy atom constraints were then iteratively relaxed over five 20 ps simulations using Langevin dynamics under constant-temperature, constant-pressure (NPT) conditions at 300 K and 1 atm; the SHAKE algorithm⁸¹ was used to constrain all bonds involving hydrogen atoms for the initial equilibration simulation. The protein was then allowed to move freely for a 20 ns production-level run. The timescale of each frame was 1 ps, for a total of 20 [thin space (1/6-em)] 000 frames. The protein RMSDs of MD trajectories were calculated using the cpptraj module of AMBER18.⁸²

Schemes for selection of frames for the QM-cluster models from MD trajectories

Designing QM-cluster models from a large number of MD frames will allow consideration of conformational influence on kinetic and thermodynamic quantities. Eight schemes are considered in an attempt to cover a diverse sampling of conformations and non-equilibrium structures. From each scheme, 20 to 40 MD frames are selected and then used to construct a QM-cluster model of the active site. The first scheme considered (S₁) is perhaps the most common scheme for MD simulation sampling, and involves selecting MD frames at equal intervals over the course of an equilibrated simulation. This approach is effectively random and unbiased. For the next set of schemes (S₂, S₃, and S₄) we chose frames similar to the X-ray crystal structure. Furthermore, it may be better to consider only the structural variations of the active site residues rather than of the whole protein, and this idea is incorporated into S₃, S₄, S₆, S₇, and S₈. For the final set of schemes (S₅, S₆, S₇, and S₈) frames were grouped by a specific metric and then k-means clustering divided the frames into 3 or 4 clusters. These schemes should increase the structural diversity of QM-cluster model refinement. Again, note that the Chain A/C interface was used to construct the QM-cluster models from each selected MD frame. Detailed frame selection criteria are as follows.

S ₁ – Twenty frames were selected from the MD simulation at equal intervals of 1000 ps over the entire 20 ns equilibrated simulation.

S ₂ – The RMSD of the backbone atoms (C, O, C_α, N, and H) of the entire protein structure compared to the X-ray crystal structure was measured for each frame. Frames with an RMSD within ±1 standard deviation (0.76 Å) of the mean RMSD (2.66 Å) were isolated, and a random number generator was used to select 30 frames from this data set.

S ₃ – The RMSD of the backbone atoms of a selection of active site residues compared to the X-ray crystal structure was measured for each frame. The subset of active site residues was defined as all residues present in any of the QM-cluster models obtained from S₁: Arg7, Glu78, Arg90, Tyr108, Leu115, Phe57, Ala59, Lys60, Arg63, Val73, Thr74, and Cys75. Frames with an active site backbone RMSD within ±1 standard deviation (0.09 Å) of the mean RMSD (0.84 Å) were isolated, and a random number generator was used to select 30 frames from this data set.

S ₄ – This scheme used the RMSD of the side chain atoms of the active site residues (listed in S₃) compared to the X-ray crystal structure. Frames with a side chain backbone RMSD within ±1 standard deviation (0.16 Å) of the mean RMSD (1.66 Å) were isolated, and a random number generator was used to select 30 frames from this data set.

S ₅ – The RMSD of all heavy (non-hydrogen) atoms of protein and chorismate compared to the X-ray crystal structure was measured for each frame. k-Means clustering was used to group the frames into three distinct clusters based on the gap statistic and elbow plots shown in Fig. S2 (ESI†), and a random number generator was used to select 10 frames from each of the three clusters.

S ₆ – The RMSD of the backbone atoms of only the active site residues (from S₃) compared to the X-ray crystal structure was measured for each frame. Based on analysis of the RMSD using the gap statistic and elbow plots in Fig. S3 (ESI†), it became apparent that there is only one unique k-means cluster. We then subdivided the data into four clusters and randomly selected 10 frames from each of the four clusters.

S ₇ – This scheme used the RMSD of the side chain atoms of the active site residues compared to the X-ray crystal structure instead of the backbone atoms. Similar to S₆, k-means clustering with the active site side chain atom RMSD values was not a useful technique (Fig. S4, ESI†). The MD frames were still split into another four arbitrary clusters and randomly selected to provide an unbiased sampling of 40 additional MD frames.

S ₈ – The number of probe contacts between chorismate and surrounding residues was measured for each frame of the MD trajectory. k-means clustering grouped the frames into distinct clusters. However, the gap statistics and elbow plots shown in Fig. S5 (ESI†) indicate our MD frames are not easily clustered into less than 10 sets, so the clustering is truncated at k = 3. A random number generator was used to randomly select 10 frames from each of the three clusters.

From the eight selection schemes, a total of 250 unique MD frames were chosen and then refined into QM-cluster models generated by RINRUS. Note that the composition of the QM-cluster models is not uniform. Each QM-cluster model constructed from MD includes all fragments recognized by the probe software as having inter-residue interactions with chorismate for that specific MD frame. Interestingly, nearly adjacent and even adjacent frames that were selected by the various schemes showed non-uniform RIN composition [frames 159 (S₈) and 161 (S₇); 1218 (S₆) and 1221 (S₇); 2473 (S₈) and 2475 (S₆); 7603 (S₆) and 7607 (S₅); 9748 (S₆) and 9750 (S₂); 12378 (S₆) and 12379 (S₅), 19719 (S₂) and 19721 (S₇)]. Active site RIN composition of the adjacent frames 12 [thin space (1/6-em)] 378 and 12379 is shown in Table S2 (ESI†).

Results and discussion

Building QM-cluster models with different ranking schemes

We began by examining how different schemes to prioritize residue interactions affect the construction of QM-cluster models and the convergence of predicted reaction properties. Full information about model size, model charge, and kinetic and thermodynamic properties are provided for all iterative building schemes in Tables S1 and S2 (ESI†). Using the X-ray crystal structure, 12 QM-cluster models were built by incrementally adding one residue based on their cumulative probe contact counts with chorismate within the X-ray crystal structure. The maximal probe-derived model, which includes all residues with any probe contact with the chorismate, contains 203 atoms and is shown in Fig. 1.


	Fig. 1 3D representation of the RINRUS maximal model, from the X-ray crystal structure of Bacillus subtilis chorismate mutase, using the probe ranking scheme. Substrate carbon atoms are colored in magenta. Except for those of the crystallographically resolved water molecule, hydrogen atoms are omitted for clarity.

The computed ΔG^‡ and ΔG_rxn values for the Claisen rearrangement reaction are plotted in Fig. 2 for the probe-based model building scheme. First, the highest probe-ranked fragment, Phe57 side chain, is added to the chorismate to constitute the first QM-cluster model, containing 42 atoms and a total model charge of −2. The second QM-cluster model is generated by adding the Arg7 side chain to the first model. The second model now contains 64 atoms and a total model charge of −1. The list of fragments provided in Table S1 (ESI†) gives details about subsequent models, culminating in our “maximal model” of 203 atoms.


	Fig. 2 Kinetics and thermodynamics of the iteratively grown QM-cluster models using the probe ranking scheme. Computed ΔG^‡ values are represented by circles and ΔG_rxn values by triangles. The black dashed line shows the experimental ΔG^‡ value from ref. 86.

For this study, we define models as being converged for a given building scheme if both ΔG^‡ and ΔG_rxn of a model and all subsequent larger models are within ±1 (tight convergence criteria) or ±3 kcal mol⁻¹ (loose convergence criteria) of their reference values (ΔG^‡ and ΔG_rxn of the maximal model), respectively. The maximal probe-based RINRUS-designed QM-cluster model has values of ΔG^‡ = 9.1 kcal mol⁻¹ and ΔG_rxn = −16.3 kcal mol⁻¹. As the size of the model increases, the predicted ΔG^‡ and ΔG_rxn become converged at the 155-atom model within the defined metric of convergence of ±3 kcal mol⁻¹, after Arg63 was added to the 133-atom model. None of the probe models converge both ΔG^‡ and ΔG_rxn to within 1.0 kcal mol⁻¹ of the maximal QM-cluster model. Overall, the ΔG_rxn value qualitatively agrees with other QM-cluster model and QM/MM studies of the chorismate mutase catalytic step that exhibited strongly exergonic reaction free energies.^{5,12,16,22,35,83,84} Below, we will explain why the computed ΔG^‡ of converged and maximal QM-cluster models is significantly lower than the known experimental value.

The arpeggio interaction classification may be more robust than probe in that it is not inherently limited to only the local interatomic contacts. Indeed, all residues identified in the probe ranking scheme are included in the arpeggio ranking scheme in addition to Glu78, as well as the side chains of Val73, where only backbone atoms had been included in the probe-based models as shown in Table S4 (ESI†). Due to differences in which specific residue atoms interact with chorismate, some fragments in the arpeggio-based models are trimmed and capped differently. Additional details of the arpeggio trimming scheme are shown in Table S1 (ESI†). The maximal arpeggio-based model has 245 atoms, which makes it somewhat larger than the maximal probe-based model (203 atoms). Fig. 3 shows the computed values of ΔG^‡ and ΔG_rxn when employing the arpeggio-based RIN to construct the QM-cluster models. The maximal arpeggio-based model (used as the reference for convergence tests) has ΔG^‡ = 10.2 kcal mol⁻¹ and ΔG_rxn = −16.1 kcal mol⁻¹.


	Fig. 3 Kinetics and thermodynamics of the iteratively grown QM-cluster models using the arpeggio ranking scheme. Computed ΔG^‡ values are represented by circles and stars, and ΔG_rxn values by crosses and triangles. The original ranking is given in magenta, while values from the tie-breaking scheme are given in brown. The black dashed line shows the experimental ΔG^‡ value from ref. 86.

Arpeggio-based models predict satisfactory convergence for ΔG^‡ and ΔG_rxn (Fig. 3, magenta plot) once QM-cluster models are larger than 200 atoms. However, we see a dramatic disruption of convergence in the reaction free energy when Thr74 is added to form the 136-atom model. The computed ΔG_rxn of −36.3 kcal mol⁻¹ is artificially too negative because the chorismate translates far out of the active site in the optimized product structure. Once Arg90 is added to form the 158-atom model, the chorismate is properly posed; all substrate-arginine hydrogen bonds seen in the maximal model are accounted for. While no arpeggio-based models have both ΔG_rxn and ΔG^‡ converged within ±1 kcal mol⁻¹ of the maximal model, convergence to the looser ±3 kcal mol⁻¹ threshold appears once the 177-atom model is constructed.

One limitation with the arpeggio scheme is the more frequent occurrence of tie scores for the number of interaction counts. While the number of probe contacts can vary over 3–4 orders of magnitude as it is linked to the continuous inter-residue surface area, arpeggio interaction count scores will be much smaller as values arise from summing the categorical presence/absence of interaction types. The RINRUS code does not yet preferentially discern between fragments with tied rankings, so there is no chemical significance to the output ordering for those residues. However, depending on which residues are selected in a tie situation, the convergence of ΔG^‡ and ΔG_rxn values can be affected.

In situations where there is a tie in the number of arpeggio contact counts, we have manually reordered the RINRUS ranking list. First, the number of arpeggio contact types are used to break the tie. However, if there are fragments where the number of contact types is also tied, the following convention was used to manually prioritize ranking: charged residues > polar > non-polar residues. In situations where there is still a tie between residues of the same category, the probe-based contact count ranking was used to break the tie, as in the case of Thr74 being added before Tyr108. Improvements to the RINRUS code to automatically account for tie-breaking in either probe or arpeggio rankings are currently in development. Further details about probe, arpeggio, and tie-broken arpeggio QM-cluster models is given in Table S1 (ESI†).

The tie-broken arpeggio-based models (Fig. 3, brown plot) show quicker convergence to the ΔG^‡ value of the maximal model (10.2 kcal mol⁻¹) than in the original arpeggio building scheme. Adding Arg90 before Thr74 via the tie-breaking scheme also eliminates the odd disruption of ΔG_rxn convergence. However, there is still no model where both the ΔG_rxn and ΔG^‡ are converged to within ±1 kcal mol⁻¹ of the maximal model. Tie-broken arpeggio-based models have kinetic and thermodynamic values converging to the loose ±3 kcal mol⁻¹ threshold starting with the 191-atom model. The tie-broken models do not have a significant effect on kinetic or thermodynamic convergence beyond fixing the spurious ΔG^‡ value. However, avoiding random ordering of fragments that have tied contact count or contact type values seems prudent until an automated approach is available.

A third scheme using quantitative chorismate-residue interaction energies as a ranking method was evaluated. As observed in previous work,^66,67 the F-SAPT interaction energies prioritize important charged residues which play a key role in transition state stabilization. Our analysis of several proteins (including chorismate mutase) indicated no apparent correlation between number of probe contact counts and E_int between noncovalently interacting biochemical fragments, raising concern that probe may de-emphasize residues that have a strong, but directional electrostatic interaction with seed fragments. The substrate-residue interaction energies were computed using F-SAPT0, and a series of 11 QM-cluster models were first constructed by adding fragments ranked from largest negative E_int with the chorismate substrate to the largest positive E_int value (Table S5, ESI†). It must be recognized that a negative total F-SAPT interaction energy signifies a favorable interaction between a residue fragment and chorismate, while a positive total F-SAPT interaction energy describes a repulsive interaction. Given a dianionic chorismate substrate, it was expected that positively charged residues will be ranked first, then polar residues, then nonpolar residues, then negatively charged residues. The initial F-SAPT scheme ranked the four positively charged residues highest; Arg7 is first (E_int = −140.5 kcal mol⁻¹), followed by Arg63 (E_int = −133.2 kcal mol⁻¹), then Arg90 (E_int = −113.0 kcal mol⁻¹), and Lys60 (E_int = −78.1 kcal mol⁻¹). For comparison to a few polar residues, the E_int of Tyr108 and Thr74 are −15.5 kcal mol⁻¹ and +10.9 kcal mol⁻¹, respectively.

Matching literature precedence, the probe and arpeggio schemes for constructing QM-cluster models frequently de-prioritize charged residues compared to F-SAPT.⁶⁷ While Arg7 is ranked first or second in all three schemes, Arg90 is ranked 3rd by F-SAPT, 5th by probe, and 6th by arpeggio, as illustrated in Tables S3 and S4 (ESI†). Arg63 (1st by F-SAPT) was ranked 8th by probe and 3rd by arpeggio. A visual relationship between probe contacts and the orientation of important charged active site arginine residues can be seen in Fig. S6 (ESI†). Phe57 is ranked first in the probe ranking scheme with a total of 288 contacts with chorismate (highlighted in grey in Fig. S6, ESI†), but only has an F-SAPT E_int of −2.0 kcal mol⁻¹ and is ranked 10th. Arg63 has only 97 probe (highlighted in yellow) interaction counts, making it the 8th ranked fragment, but again has the second largest negative F-SAPT interaction energy. Charged active site amino acid residues are crucial for both NAC and TSS of the chorismate substrate. Yet Arg7 is the only one of four positively charged residues in the BsCM active site that is ranked consistently high in the probe, arpeggio, and F-SAPT schemes. Our F-SAPT results strongly suggest large residue side chains can be oriented in such a way that they provide strong hydrogen bonds within an active site, but have low RIN contact count values.

In a recent analysis of glycine-N-methyltransferase,⁸⁵ we recognized that residues with strongly unfavorable (positive) interaction energies should be ranked higher than residues with near-zero F-SAPT interaction energies. Ranking fragments by |E_int| will thus prioritize negatively charged active site fragments that have a large, but unfavorable interaction with the dianionic substrate before fragments that have a small or negligible interaction with the substrate. Semantically, the difference between F-SAPT schemes is subtle, but the quality of QM-cluster models could be substantially affected by this choice. The ΔG^‡ and ΔG_rxn values for the two F-SAPT ranking schemes (signed in magenta and unsigned in brown) are overlaid in Fig. 4. Both schemes overlap until the 139-atom model, where Ala59 is next added in the signed scheme and Thr74 is added in the unsigned scheme.


	Fig. 4 Kinetics and thermodynamics of the iteratively grown QM-cluster models using the F-SAPT ranking scheme. Computed ΔG^‡ values are represented by circles and stars, and ΔG_rxn values by crosses and triangles. The signed ranking order is given in magenta, while the unsigned ranking order is given in brown. The black dashed line shows the experimental ΔG^‡ value from ref. 86.

As the F-SAPT calculations were derived from the maximal probe model, the probe, signed and unsigned F-SAPT schemes will all have an equivalent maximal model (ΔG^‡ = 9.1 kcal mol⁻¹ and ΔG_rxn = −16.3 kcal mol⁻¹) that does not need to be recomputed. The unsigned F-SAPT ranking scheme exhibits slightly improved convergence over the signed scheme, as the last three unsigned models were within ±3 kcal mol⁻¹ of the maximal model for both ΔG^‡ and ΔG_rxn values (Table S3, ESI†). Despite the expectation that QM-cluster models derived from F-SAPT rankings would be optimal, none of the truncated F-SAPT models are within ±1 kcal mol⁻¹ of both the ΔG^‡ and ΔG_rxn values. Thus, there is little qualitative difference between the largest QM-cluster models built with the F-SAPT, probe, or arpeggio ranking schemes. The F-SAPT scheme is also quite computationally expensive on the front end compared to probe and arpeggio schemes. Generally, only QM-cluster models of chorismate mutase that closely resemble the maximal models are reliable. To ascertain how more liberally truncated models can appropriately reproduce NAC or TSS phenomena, a brute force or combinatorial approach (like the RINRUS-based investigation of catechol-O-methyltransferase)⁴ would need to be carried out on the chorismate mutase active site.

Previous eznymology studies done by our group have shown that B3LYP generally underestimates free energies of activation compared to experiment.^2–4,85 Accordingly, all ranking schemes had maximal QM-cluster models of the chorismate mutase active site that exhibited ΔG^‡ values significantly lower than the experimental value⁸⁶ of 15.4 ± 0.5 kcal mol⁻¹. The maximal F-SAPT/probe-based model predicted an activation free energy of 9.1 kcal mol⁻¹, while the maximal arpeggio-based model predicted 10.2 kcal mol⁻¹. QM-cluster models reported by Burschowsky and coauthors at the B3LYP/6-31G(d)//B3LYP/6-311+G(d,p) level of theory arrived at an even lower ΔG^‡ value for the chorismate mutase catalysis (8.6 kcal mol⁻¹).³⁵ It is important to stress that this work is not concerned with accuracy of the QM-cluster models, but focused on understanding how kinetics and thermodynamics are influenced by the decisions involved in QM-cluster model construction.

Our lab (and others) are exploring much-needed benchmarks of one-electron basis set and density functional on enzyme models.^{45,46,87–92} To avoid model construction contributing to kinetic and thermodynamic errors, the current study demonstrates that QM-cluster models require, at minimum, over ∼150 atoms. This lower bound to model size unfortunately guarantees that employing large basis sets and double-hybrid density functionals will be intractable for most production-level exploration of enzyme chemical mechanisms. Ideally, the community will arrive at a consensus on methodological best practices in QM-cluster modeling to accurately and efficiently compare to experimental observation. Until then, dispersion-corrected B3LYP with small Pople-style basis sets is an efficient and mostly reliable level of theory for calibrating the error arising from QM-cluster model composition.

Building QM-cluster models from MD frames

Next, we explore the impact that fluctuations of residue and substrate positioning can have on both the design of QM-cluster modeling and the resulting kinetic and thermodynamic properties. First, 250 frames from a 20 ns MD simulation of solvated chorismate mutase were sampled to construct maximal QM-cluster models of the active site using probe contacts. Structures from MD simulations can be advantageous over crystallographic structures in their unambiguous hydration shells and energy relaxation of the active site structure based on in vivo substrates rather than inhibitors or transition state analogues. However, building QM-models from MD simulations will incorporate statistical uncertainty, as sampling many MD frames are required to represent the diversity of structural conformations.^93–95 In particular, we examine three features particularly relevant for QM-cluster modeling that are expected to cause variation in the predicted reaction properties: (1) the number and identity of residues included in the model, (2) the number of waters included in the model, and (3) the statistical ensemble of sampled frames.

In plotting the activation and reaction free energies for all 250 MD-derived QM-cluster models (Fig. 5, Table 1, and Fig. S7, ESI†), there is a wide range of values wherein the mean activation free energy is 10.3 ± 2.6 kcal mol⁻¹ and the mean free energy of reaction is −15.4 ± 3.4 kcal mol⁻¹. These ranges encompass the converged values observed for QM-cluster models built from the X-ray crystal structure, though this is unsurprising given the large standard deviation observed in the ensemble of refined MD frames. The size of the maximal QM-cluster models ranges from 158 to 240 atoms, with the five smallest models containing only 8 residues and 5 or 6 waters and the largest model containing 13 residues and 3 waters.


	Fig. 5 Computed values of ΔG^‡ (circle) and ΔG_rxn (triangle) for the 250 maximal QM-cluster models plotted against the select frame number (each representing a time scale of 1 ps). The black dashed line at the top is the experimental value from ref. 86.

Table 1 Mean free energies of activation and reaction for the various MD frame selection schemes. k-Means clusters are labelled with a C (all in kcal mol⁻¹)

Scheme	Cluster	# of frames	ΔG^‡	σ	ΔG_rxn	σ
S ₁		20	10.07	±2.87	−16.23	±3.90
S ₂		30	10.12	±2.39	−16.28	±3.82
S ₃		30	10.03	±2.83	−14.99	±3.06
S ₄		30	10.06	±1.88	−15.92	±2.86
S ₅		30	10.29	±3.05	−15.57	±3.09
	C₁	10	9.90	±2.74	−16.78	±3.12
	C₂	10	11.03	±3.98	−15.54	±3.48
	C₃	10	9.95	±1.92	−14.40	±2.02
S ₆		40	10.23	±2.38	−15.24	±3.04
S ₇		40	10.74	±2.69	−15.35	±3.52
S ₈		30	10.96	±2.69	−13.82	±3.36
	C₁	10	10.83	±1.95	−13.85	±2.30
	C₂	10	10.80	±3.94	−13.57	±4.15
	C₃	10	11.49	±1.41	−14.06	±3.34
Combined		250	10.34	±2.62	−15.38	±3.40

Using probe to identify active site fragments, a total of 22 residues were identified as having at least one contact interaction with the substrate in at least one frame over the course of the entire MD simulation. Table S6 (ESI†) shows the mean interaction counts of each identified residue with chorismate. There is precedence that crystal packing leads to an increase in protein–substrate contact counts.^67,96 However, replacement of the TSA with chorismate in the X-ray crystal structure without a subsequent geometry relaxation does not create steric clashes with the protein, which, if present, might have nonphysically amplified the contact counts. As expected, the Arg90 and Arg7 residues have the highest mean contact counts, 116.3 and 78.3, respectively. Several residues appear in RINs during the entire MD run with very low mean interaction counts (<0.02) such as Ala9, Pro117, and residues 242–245. None of these residues have inter-residue contacts with the TSA in the X-ray crystal structure. Pro117 is the only “rare” residue from the entire MD simulation that also appears in the 250 selected frames that were refined to QM-cluster models. The mean interaction counts of residues modeled in the 250 QM-cluster models is similar to those observed in the 20 [thin space (1/6-em)] 000 RINs of the MD simulation (Table S6, ESI†). This similarity affirms that the selection schemes used to refine MD frames into QM-cluster models are representative of the entire MD simulation. From Tables S6 and S7 (ESI†), we find that consistently high-ranking active site residues common to probe, arpeggio, and F-SAPT schemes can occasionally be missing entirely from specific MD frames.

Surprisingly, QM-cluster models with atypical composition do not necessarily create kinetic or thermodynamic outliers. Frame 394 is the only member of the 250-frame subset to not have any probe contacts with the Arg90 side chain. It also does not contain an Arg63 fragment, making it the only QM-cluster model with net −2 charge. The missing fragments result in a spuriously high free energy of activation (see below). The QM-cluster models made from frames 9464, 14 [thin space (1/6-em)] 007, 16450 are the only three of the 250 that have no probe contacts between substrate and Leu115, yet all three have kinetic/thermodynamic properties within the uncertainty range of the total set. Frames with rare residues have a small impact on the overall kinetic and thermodynamic values. For example, the five QM-cluster models that contain Pro117 have mean ΔG^‡ and ΔG_rxn values of 11.2 ± 2.9 kcal mol⁻¹ and −15.1 ± 4.3 kcal mol⁻¹, respectively.

Mean probe contact counts of the 250 QM-cluster models arising from MD sampling emphasize charged residues more than the X-ray crystal structure, but interestingly, Fig. S8 (ESI†) still shows a lack of correlation with F-SAPT |E_int| values computed at the X-ray crystal structure. MD-averaged probe counts rank the first five residues as Arg90, Arg7, Leu115, Ala59, and Arg63. The Lys60 residue has a mean contact count of only 2.9, but as demonstrated earlier, has the 4th-largest |E_int| with the substrate. The mean probe contact counts for Leu115 are large (72.3), but it has the smallest absolute F-SAPT interaction energy. Of the uncharged side chain fragments, Tyr108 has the smallest mean probe count (28.8) and the largest |E_int| value. These conflicting results demonstrate how various schemes rank residue importance differently. Great challenges remain in quantifying the impact of specific amino acid fragments on protein–substrate reactivity.

The catalytic activity of chorismate mutase is particularly driven by charge stabilization interactions, which might be susceptible to differences in net model charge. Thus, it is of interest to examine whether differences in model charge of QM-cluster models refined from individual MD frames can account for the broad range of activation and reaction free energies observed. Fig. 6 shows the distribution of the net model charges for the 250 QM-cluster models compared to the range of ΔG^‡ and ΔG_rxn values for each model. The net charge of our 250 QM-cluster models varies from −2 to +2, with the majority (200 models) having an overall neutral charge. QM-cluster models with a neutral model charge had mean ΔG^‡ and ΔG_rxn values of 10.1 ± 2.4 and −15.7 ± 3.3 kcal mol⁻¹, respectively. Only one MD-based QM-cluster model (frame 394) has a −2 net charge model and it provides anomalously high values of ΔG^‡ and ΔG_rxn, 20.0 and −7.7 kcal mol⁻¹, respectively. The outlying energetics of frame 394 are likely due to missing Arg90 and Arg63 fragments, which have proven to be critical for the enzyme catalysis.^19,31 The 33 QM-cluster models with net +1 charge show the largest range of ΔG^‡ values, encompassing the highest (19.1 kcal mol⁻¹, frame 4114) and lowest (4.1 kcal mol⁻¹, frame 8310) values. However, the mean energetic values are in reasonable agreement with the complete set, 11.1 ± 3.6 kcal mol⁻¹ for ΔG^‡ and −14.0 ± 3.3 kcal mol⁻¹ for ΔG_rxn. The net charge of the QM-cluster models does not systematically influence the ΔG^‡ and ΔG_rxn values.


	Fig. 6 Charge distribution for the 250 QM-cluster models refined from MD frames. The corresponding number of QM-cluster models for each net model charge is: charge −2 = 1 QM-cluster model, charge −1 = 14 QM-cluster models, charge 0 = 200 QM-cluster models, charge +1 = 33 QM cluster models, and charge +2 = 2 QM-cluster models.

We have shown the maximal QM-cluster models based on the X-ray crystal structure, from any of our building schemes, are expected to provide kinetics and reaction thermodynamics that are reliably converged at a given level of theory (Fig. 2). The 250 maximal QM-cluster models derived from MD will have significant variations in the residues that are included in each RIN. This heterogeneity opens the question: when comparing QM-cluster models with the same fragment composition but with different active site conformation and/or relative frozen atom positions, will the computed reaction kinetics and thermodynamics show consistent values or large variance? To disentangle model composition from model structure, the dataset is trimmed to only include MD-derived QM-cluster models that have an identical composition. This data filtering ignores distinguishing models with different water molecule positioning. The subset contained 144 total models in 37 different bins (Fig. S9, ESI†). Among the groups of models with identical designs but taken from different snapshots, the groups still show a wide distribution of ΔG^‡ and ΔG_rxn values, with ranges from 4.1 to 16.4 kcal mol⁻¹ for ΔG^‡ and −28.8 to −6.7 kcal mol⁻¹ for ΔG_rxn. No patterns seem to emerge from this data. If the bins in Fig. S9 (ESI†) showed a narrow distribution of kinetics and thermodynamics, we would conclude that the observed wide distribution of values in the 250 QM-cluster models manifested from differences in active site fragment composition. However, data in Fig. S9 (ESI†) match the large variation of the total set of QM-cluster models refined from the MD simulation. The variation must be due to conformational fluctuation of active site residues and water molecules during the course of the MD trajectory.

The active site RIN from the X-ray crystal structure contains only a single crystallographically resolved water molecule shown to have interactions with the substrate captured by probe. The chorismate mutase active site is small and quite solvent-exposed, but the lack of crystallographically resolved water molecules is unsurprising (though rarely quantified in the literature). The 3D protein structure is typically of greater interest than the poorly resolved oxygen nuclei of the bulk solvent. In contrast, the QM-cluster models generated from the MD simulation encompass a comprehensive hydration shell. In the 250 MD frames selected for QM-cluster model refinement, 2 to 10 water molecules are identified by probe as having an interaction with chorismate (Fig. 7). Intriguingly, the RINRUS-built QM-cluster models of chorismate mutase derived from MD frames have on average 5.6 water molecules interacting with the substrate. Frame 6981 is the only QM-cluster model with 2 waters in the active site, and ΔG^‡ is predicted to be 10.8 kcal mol⁻¹. At the other extreme, the two QM-cluster models with 10 waters have a mean ΔG^‡ value of 11.3 kcal mol⁻¹. Only 29 models total have 2, 3, 8, 9, or 10 water molecules in the RIN. Despite low occurrence in the sampled MD frames, these models have mean predicted ΔG^‡ and ΔG_rxn values of 10.9 ± 3.0 kcal mol⁻¹ and −14.8 ± 2.9 kcal mol⁻¹, respectively; kinetics and thermodynamics are within uncertainties of the total set of 250 models. The 221 QM-cluster models with 4 to 7 water molecules are qualitatively similar, 10.2 ± 2.6 kcal mol⁻¹ for ΔG^‡ and −15.4 ± 3.4 kcal mol⁻¹ for ΔG_rxn. Clearly, the number of waters in the BsCM active site has minimal influence on the kinetic and thermodynamic properties of QM-cluster models. However, the inclusion of any type of water network at the active site-solvent boundary in our MD-derived QM-cluster models may be a factor in the ∼2 kcal mol⁻¹ higher free energies of activation observed compared to models constructed from the X-ray crystal structure.


	Fig. 7 Mean activation free energy (brown), reaction free energy (magenta), and number of QM-cluster models with a given number of explicit water molecules (cyan) identified as having interatomic contacts with the chorismate for the 250 QM-cluster models built from selected MD frames.

Finally, we analyze groupings of the statistical ensemble of QM-cluster models (Table 1), which showed minimal statistical difference with the overall mean kinetic and thermodynamic values (ΔG^‡ = 10.3 ± 2.6 kcal mol⁻¹ and ΔG_rxn = −15.4 ± 3.4 kcal mol⁻¹). Schemes labeled XS₂ to XS₈, are expanded versions of S₂ to S₈, and include all frames from the 250 QM-cluster models that fit the criteria of each Scheme. For example, XS₂ includes the 30 frames from S₂ and the additional 118 frames from the 250 frame set that have an RMSD within 0.76 Å of the mean backbone atom RMSD. Kinetic and thermodynamic results for the expanded schemes are given in Table 2.

Table 2 Mean free energies of activation and reaction for the expanded schemes. The individual k-means clusters are labelled XC (all in kcal mol⁻¹)

Scheme	Cluster	# of frames	ΔG^‡	σ	ΔG_rxn	σ
S ₁ + S ₆ + S ₇		100	10.40	±2.63	−15.48	±3.44
XS ₂		148	10.17	±2.75	−15.64	±3.44
XS ₃		173	10.35	±2.70	−15.46	±3.50
XS ₄		186	10.39	±2.64	−15.36	±3.37
XS ₅
	XC₁	92	10.25	±2.63	−16.16	±3.92
	XC₂	89	10.30	±2.85	−15.00	±3.05
	XC₃	69	10.50	±2.28	−14.85	±2.83
XS ₈
	XC₁	77	10.42	±2.52	−15.38	±3.54
	XC₂	81	10.27	±2.86	−15.05	±3.39
	XC₃	92	10.32	±2.49	−15.69	±3.24
Combined		250	10.34	±2.62	−15.38	±3.40

The first scheme, S₁, contains 20 frames and should be representative of a random and unbiased distribution of activation and reaction free energies over the course of the entire MD simulation. Mean ΔG^‡ and ΔG_rxn values of the 20 frames used in S₁ are lower than the total set, but in reasonable agreement. Establishing that k-means clustering of S₆ and S₇ was invalid, these two schemes also represent a random selection of frames. We combined the frames of S₁, S₆, and S₇ (100 total) into an expanded Scheme (S₁ + S₆ + S₇) in Table 2. Interestingly, the kinetic and thermodynamic values of S₁ + S₆ + S₇ are within 0.10 kcal mol⁻¹ of the entire data set. This improved agreement suggests 20 randomly selected frames (8% of the total data set) may not be a robust amount. Since most of the expanded schemes have mean kinetic and thermodynamic values very similar to the total set of 250 MD frames, then a sample of 100 frames (40% of the data points in total set) may be an upper bound needed to emulate the total set.

The next sets of schemes (S₂, S₃, and S₄), take into account the fluctuation of the active site residues and discard MD frames geometrically dissimilar to the X-ray crystal structure. All three schemes predict mean ΔG^‡ values slightly lower than the entire dataset. S₂ and S₄ mean ΔG_rxn values are lower than the total mean, while the S₄ mean is slightly higher than S₂ and S₃. The extended XS₃ and XS₄ schemes (Table 2) are closer to the total mean statistics than XS₂.

The S₅ scheme used k-means clustering of the RMSDs (ranging from 1.46 to 4.22 Å shown in Table S8, ESI†) of the active site residues to group similar frames into clusters. The three clusters for S₅ are ordered from largest centroid RMSD value (S₅–C₁) to the lowest (S₅–C₃). The (S₅–C₁) and (S₅–C₃) clusters have nearly the same mean ΔG^‡ value, below the mean ΔG^‡ value of the total data set. The (S₅–C₂) cluster in contrast, is higher (11.0 kcal mol⁻¹) than the total data set. Values of ΔG_rxn become less negative as the centroid RMSD value decreases from C₃ to C₁, and the extended Scheme XS₅–C₃ to XS₅–C₁ follows the same pattern. Scheme S₅ gives a mean ΔG^‡ value closest to that of the total data set. The mean ΔG_rxn value for S₅ is also quite close, but effectively random sampling in S₆ and S₇ give a slightly better match to the total set.

The last scheme (S₈) classified MD frames with k-means clustering according to probe interatomic contacts between the chorismate ligand and surrounding residues. All three clusters of S₈ predicted the mean ΔG^‡ value to be 0.46–1.15 kcal mol⁻¹ higher than the mean of the total dataset. The statistics of the expanded clusters of XS₈ are much closer to the total dataset. Notwithstanding, the largest magnitude differences between any frame selection scheme and mean values of the 250 QM-cluster models are 0.62 kcal mol⁻¹ for ΔG^‡ and 1.56 kcal mol⁻¹ for ΔG_rxn. For the expanded schemes, the largest absolute differences decrease to 0.17 kcal mol⁻¹ for mean ΔG^‡ and 0.78 kcal mol⁻¹ for mean ΔG_rxn.

In summary, efforts to find a subset of MD frame selection schemes that best reflect the kinetic and thermodynamic values of a large statistical ensemble were inconclusive, yet promising. All eight schemes shown in Table 1, with 20–40 MD frames in each refined to QM-cluster models, give reasonable approximations to the larger set of 250 MD frames. Expanded schemes with 69–186 selected MD frames give mean values even closer to the larger data set. Schemes employing k-means clustering to partition frames via structural metrics did not perform better than schemes with completely random selected MD frames. However, the QM-cluster models were built from one of three trimeric BsCM active sites (the Chain A/C interface) that exhibited the least conformational fluctuation during the course of the 20 ns MD simulation. Machine-learned selection procedures like k-means clustering may be more beneficial for enzymes with more disordered regions or that undergo substantial conformational changes during the simulation time.

Conclusions

Over 50 QM-cluster models of Bacillus subtilis chorismate mutase based on the X-ray crystal structure, and an additional 250 QM-cluster models obtained from sampling MD frames were extensively tested with the RINRUS software package being developed by our group. RINRUS automatically identifies and trims fragments that interact with a substrate, allowing quantitative and reproducible analysis of how the active site fragments affect enzyme catalysis.

The smallest QM-cluster models built with probe, arpeggio and F-SAPT schemes showed critical differences in how the kinetic and thermodynamics were altered by subsequent addition of residues. Once model building schemes approach the size of the “maximal” model, all three iterative schemes behaved similarly. We have seen some methodological issues with the arpeggio ranking scheme where ties can occur in the number of contact counts or contact types. The tie issue in arpeggio was resolved manually, and fixed an outlying reaction free energy that was observed in one of the smaller QM-cluster models. The solution to tied interaction counts or types will require more automation to be incorporated into RINRUS functionality.

The F-SAPT-based interaction energies highlight the importance of active site charged residues. We recommend always using absolute values of F-SAPT interaction energies to rank active site fragments in QM-cluster model construction. Rankings via signed interaction energies may de-prioritize important active site fragments that exhibit electrostatic repulsion with a substrate. The unsigned F-SAPT ranking scheme showed slight improvement of convergence compared to probe and arpeggio schemes, but no truncated models in any of the schemes converged to within 1 kcal mol⁻¹ of the respective maximal models. We again validate that there is no correlation between the number of probe contact counts and E_int obtained from F-SAPT computations. More case studies are required to determine if the small performance differences between schemes is related to the compact size of the BsCM active site. Nevertheless, probe-based models, arpeggio and F-SAPT maximal models are similar, providing evidence that the largest RINRUS-generated QM-cluster models are robust and reliable.

As is widely known in the community and seen in our previous studies, B3LYP-GD3BJ with small Pople-style basis sets and implicit solvation with CPCM systematically underestimates the free energies of activation of enzyme mechanisms compared to the experimental kinetic value. A focus on the quality of the quantum chemical level of theory is purposefully avoided in this work, to instead efficiently provide insight about QM-cluster model building approaches.

The crystallographic protein structure was then solvated within an explicit water bath and, over a 20 ns equilibrated MD simulation, 250 frames were selected to construct 250 QM-cluster models of the active site. The proposed catalyzed Claisen rearrangement mechanism was computed for all QM-cluster 250 models, and the reaction thermodynamics are observed to fluctuate, with the activation free energy spanning 10.34 ± 2.62 kcal mol⁻¹ and the reaction free energy spanning −15.38 ± 3.40 kcal mol⁻¹. The variation is shown to be primarily due to the changes in residue/solvent/ligand positioning and conformation that occur over the MD simulation, rather than differences in residue composition among the models. For example, we noted that some active site residues highly ranked in the probe, arpeggio, and F-SAPT schemes can be absent from specific MD frames when the residues shift to different placements, but the computed kinetic and thermodynamic properties of those complexes can still be reasonable given the QM-cluster model is suitably constructed. Furthermore, while the catalytic mechanism is largely derived from charge stabilization interactions, we might expect the QM-cluster models to be very sensitive to changes in net model charge. The results show most of the variation in ΔG^‡ and ΔG_rxn values is largely among models with neutral net charge and a general insensitivity in predicted values with net charge between ±1 was observed. The active site interface with bulk solvent is shown to influence kinetics and thermodynamics of the QM-cluster models. However, the number of explicit water molecules included in the models appear to be inconsequential.

Collectively, results from the MD to QM-cluster model refinement point to the changing molecular positioning rather than model composition as the main source for changing reaction thermodynamics over the sampled times. We attempted to trace the thermodynamic differences to simple, easily quantifiable structural differences among the models, specifically by grouping models based upon RMSDs in backbone or side chain atoms. Ultimately, none of the metrics were better than random selection for acceptably sampling a statistical ensemble of structures. A more multifaceted technique will be required to efficiently cluster MD frames for QM-cluster model refinement, especially if the enzyme undergoes major conformational changes during the MD simulation.

This study exemplifies diverse features of the RINRUS toolkit by comparing the structural variation between X-ray crystal structure-based models and MD-based models of bacterial chorismate mutase. Composition of QM-cluster models, or the QM region of a QM/MM model is an essential part of reliability and accuracy in computational enzymology. For far too long, a lack of automated model building techniques and software has hampered advancement of the field as well as the reproducibility of seminal work. Here, QM-cluster modeling provided insight into the enzymatic activity of chorismate mutase by connecting the model composition, the contribution of charged residues, the influence of explicit solvent water molecules, and positioning and orientation of active site residues to the computed kinetic and thermodynamic values. Accompanying data can be easily used to perform further cheminformatic analysis or to calibrate accuracy with more reliable quantum chemistry methodologies; RINRUS was designed with reproducibility as a core feature.

Author contributions

Donatus A. Agbaglo: conceptualization (equal); formal analysis (lead); investigation (lead); methodology (equal); visualization (lead); writing – original draft (lead); writing – review & editing (equal). Thomas J. Summers: conceptualization (equal); formal analysis (supporting); methodology (equal); visualization (supporting); writing – original draft (supporting); writing – review & editing (equal). Qianyi Cheng: writing – review & editing (equal); investigation (supporting). Nathan J. DeYonker: conceptualization (equal); resources (lead); writing – original draft (supporting); writing – review & editing (equal).

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This work is supported by the National Science Foundation (NSF) CAREER BIO-1846408 (for NJD and DAA), the Department of Energy (DOE) BES SBIR DE-SC0021568 (for NJD), the National Institute of General Medical Sciences of the National Institutes of Health under award number 1R35GM145206-01 (for QC), and the NSF Graduate Research Fellowship Program under Grant No. 1451514 (for TJS). The authors are grateful for the support provided by the University of Memphis High Performance Computing Center.

References

D. A. Kraut, K. S. Carroll and D. Herschlag, Challenges in enzyme mechanism and energetics, Annu. Rev. Biochem., 2003, 72, 517–571 CrossRef CAS PubMed.
Q. Cheng and N. J. DeYonker, Acylation and deacylation mechanism and kinetics of penicillin G reaction with Streptomyces R61 DD-peptidase, J. Comput. Chem., 2020, 41, 1685–1697 CrossRef CAS PubMed.
Q. Cheng and N. J. DeYonker, QM-Cluster Model Study of the Guaiacol Hydrogen Atom Transfer and Oxygen Rebound with Cytochrome P450 Enzyme GcoA, J. Phys. Chem. B, 2021, 125, 3296–3306 CrossRef CAS PubMed.
T. J. Summers, Q. Cheng, M. A. Palma, D.-T. Pham, D. K. Kelso III, C. E. Webster and N. J. DeYonker, Biophys Cheminformatic quantum mechanical enzyme model design: A catechol-O-methyltransferase case study, Biophys. J, 2021, 120, 3577–3587 CrossRef CAS PubMed.
H. L. Woodcock, M. Hodošcek, P. Sherwood, Y. S. Lee, H. F. SchaeferIII and B. R. Brooks, Exploring the quantum mechanical/molecular mechanical replica path method: a pathway optimization of the chorismate to prephenate Claisen rearrangement catalyzed by chorismate mutase, Theor. Chem. Acc., 2003, 109, 140–148 Search PubMed.
Y. S. Lee, S. E. Worthington, M. Krauss and B. R. Brooks, Reaction mechanism of chorismate mutase studied by the combined potentials of quantum mechanics and molecular mechanics, J. Phys. Chem. B, 2002, 106, 12059–12065 CrossRef CAS.
O. Wiest and K. Houk, On the transition state of the chorismate-prephenate rearrangement, J. Org. Chem., 1994, 59, 7582–7584 CrossRef CAS.
O. Wiest and K. Houk, Tabilization of the transition state of the chorismate-prephenate rearrangement: An ab initio study of enzyme and antibody catalysis, J. Am. Chem. Soc, 1995, 117, 11628–11639 CrossRef CAS.
P. D. Lyne and A. J. Mulholland, and W. G. Insights into chorismate mutase catalysis from a combined QM/MM simulation of the enzyme reaction, J. Am. Chem. Soc., 1995, 117, 11345–11350 CrossRef CAS.
M. Davidson, J. Guest, J. Simon Craw, I. Hillier and M. Vincent, Conformational and solvation aspects of the chorismate–prephenate rearrangement studied by ab initio electronic structure and simulation methods, J. Chem. Soc., Perkin Trans. 2, 1997, 1395–1400 RSC.
R. J. Hall, S. A. Hindle, N. A. Burton and I. H. Hillier, Aspects of hybrid QM/MM calculations: the treatment of the QM/MM interface region and geometry optimization with an application to chorismate mutase, J. Comput. Chem., 2000, 21, 1433–1441 CrossRef CAS.
N. A. Khanjin, J. P. Snyder and F. Menger, Mechanism of chorismate mutase: Contribution of conformational restriction to catalysis in the Claisen rearrangement, J. Am. Chem. Soc., 1999, 121, 11831–11846 CrossRef CAS.
S. Madurga and E. Vilaseca, SCRF study of the conformational equilibrium of chorismate in water, Phys. Chem. Chem. Phys., 2001, 3, 3548–3554 RSC.
A. Crespo, D. A. Scherlis, M. A. Mart, P. Ordejón, A. E. Roitberg and D. A. Estrin, A DFT-based QM-MM approach designed for the treatment of large molecular systems: Application to chorismate mutase, J. Phys. Chem. B, 2003, 107, 13728–13736 CrossRef CAS.
P. M. Dewick, The biosynthesis of shikimate metabolites, Nat. Prod. Rep., 1995, 12, 101–133 RSC.
P. Andrews, G. D. Smith and I. Young, Transition-state stabilization and enzymic catalysis. Kinetic and molecular orbital studies of the rearrangement of chorismate to prephenate, Biochemistry, 1973, 12, 3492–3498 CrossRef CAS PubMed.
H. Gorisch, Mechanism of chorismate mutase reaction, Biochemistry, 1978, 17, 3700–3705 CrossRef CAS PubMed.
M. Freindorf, Y. Tao, D. Sethio, D. Cremer and E. Kraka, New mechanistic insights into the Claisen rearrangement of chorismate–a Unified Reaction Valley Approach study, Mol. Phys., 2019, 117, 1172–1192 CrossRef CAS.
Y. M. Chook, H. Ke and W. N. Lipscomb, Crystal structures of the monofunctional chorismate mutase from Bacillus subtilis and its complex with a transition state analog, Proc. Natl. Acad. Sci. U. S. A., 1993, 90, 8600–8603 CrossRef CAS PubMed.
T. C. Bruice, A view at the millennium: the efficiency of enzymatic catalysis, Acc. Chem. Res., 2002, 35, 139–148 CrossRef CAS PubMed.
F. Claeyssens, K. E. Ranaghan, N. Lawan, S. J. Macrae, F. R. Manby, J. N. Harvey and A. J. Mulholland, Analysis of chorismate mutase catalysis by QM/MM modelling of enzyme-catalysed and uncatalysed reactions, Org. Biomol. Chem., 2011, 9, 1578–1590 RSC.
D. Burschowsky, A. van Eerde, M. Ökvist, A. Kienhöfer, P. Kast, D. Hilvert and U. Krengel, Electrostatic transition state stabilization rather than reactant destabilization provides the chemical basis for efficient chorismate mutase catalysis, Proc. Natl. Acad. Sci. U. S. A., 2014, 111, 17516–17521 CrossRef CAS PubMed.
X. Zhang, X. Zhang and T. C. Bruice, A definitive mechanism for chorismate mutase, Biochemistry, 2005, 44, 10443–10448 CrossRef CAS PubMed.
A. L. Lamb, Pericyclic reactions catalyzed by chorismate-utilizing enzymes, Biochemistry, 2011, 50, 7476–7483 CrossRef CAS PubMed.
M. Štrajbl, A. Shurki, M. Kato and A. Warshel, Apparent NAC effect in chorismate mutase reflects electrostatic transition state stabilization, J. Am. Chem. Soc., 2003, 125, 10228–10237 CrossRef PubMed.
C. C. Galopin, S. Zhang, D. B. Wilson and B. Ganem, On the mechanism of chorismate mutases: clues from wild-type E. coli enzyme and a site-directed mutant related to yeast chorismate mutase, Tetrahedron Lett., 1996, 37, 8675–8678 CrossRef CAS.
D. R. Liu, S. T. Cload, R. M. Pastor and P. G. Analysis, of active site residues in Escherichia coli chorismate mutase by site-directed mutagenesis, J. Am. Chem. Soc, 1996, 118, 1789–1790 CrossRef CAS.
S. Zhang, P. Kongsaeree, J. Clardy, D. B. Wilson and B. Ganem, Site-directed mutagenesis of monofunctional chorismate mutase engineered from the E coli P-protein, Bioorg. Med. Chem., 1996, 4, 1015–1020 CrossRef CAS PubMed.
G. Schnappauf, N. Sträter, W. N. Lipscomb and G. H. Braus, A glutamate residue in the catalytic center of the yeast chorismate mutase restricts enzyme activity to acidic conditions, Proc. Natl. Acad. Sci. U. S. A., 1997, 94, 8491–8496 CrossRef CAS PubMed.
J. K. Lassila, J. R. Keeffe, P. Kast and S. L. Mayo, Exhaustive mutagenesis of six secondary active-site residues in Escherichia coli chorismate mutase shows the importance of hydrophobic side chains and a helix N-capping position for stability and catalysis, Biochemistry, 2007, 46, 6883–6891 CrossRef CAS PubMed.
A. Kienhöfer, P. Kast and D. Hilvert, Selective stabilization of the chorismate mutase transition state by a positively charged hydrogen bond donor, J. Am. Chem. Soc., 2003, 125, 3206–3207 CrossRef PubMed.
S. Hur and T. C. Bruice, Enzymes do what is expected (chalcone isomerase versus chorismate mutase), J. Am. Chem. Soc., 2003, 125, 1472–1473 CrossRef CAS PubMed.
X. Zhang and T. C. Bruice, The proficiency of a thermophilic chorismate mutase enzyme is solely through an entropic advantage in the enzyme reaction, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 18356–18360 CrossRef CAS PubMed.
A. Shurki, M. Štrajbl, J. Villa and A. Warshel, How much do enzymes really gain by restraining their reacting fragments?, J. Am. Chem. Soc., 2002, 124, 4097–4107 CrossRef CAS PubMed.
D. Burschowsky, U. Krengel, E. Uggerud and D. Balcells, Quantum chemical modeling of the reaction path of chorismate mutase based on the experimental substrate/product complex, FEBS. Open Bio., 2017, 7, 789–797 CrossRef CAS PubMed.
A. J. Mulholland, Computational enzymology: modelling the mechanisms of biological catalysts, 2008 Search PubMed.
S. Ahmadi, L. Barrios Herrera, M. Chehelamirani, J. Hostaš, S. Jalife and D. R. Multiscale, modeling of enzymes: QM-cluster, QM/MM, and QM/MM/MD: A tutorial review, Int. J. Quant. Chem., 2018, 118, e25558 CrossRef.
H. Guo and N. Rao, Chorismate-Mutase-Catalyzed Claisen Rearrangement, Claisen Rearrange., 2007, 1–23 Search PubMed.
H. J. Kulik, J. Zhang, J. P. Klinman and T. J. Martinez, How large should the QM region be in QM/MM calculations? The case of catechol O-methyltransferase, J. Phys. Chem. B, 2016, 120, 11381–11394 CrossRef CAS PubMed.
H. J. Kulik, N. Luehr, I. S. Ufimtsev and T. J. Martinez, Ab initio quantum chemistry for protein structures, J. Phys. Chem. B, 2012, 116, 12501–12509 CrossRef CAS PubMed.
M. Karelina and H. J. Kulik, Systematic quantum mechanical region determination in QM/MM simulation, J. Chem. Theory Comput., 2017, 13, 563–576 CrossRef CAS PubMed.
S. Sumner, P. Soderhjelm and U. Ryde, Effect of geometry optimizations on QM-cluster and QM/MM studies of reaction energies in proteins, J. Chem. Theory Comput., 2013, 9, 4205–4214 CrossRef CAS PubMed.
L. Hu, J. Eliasson, J. Heimdal and U. Ryde, Do quantum mechanical energies calculated for small models of protein-active sites converge?, J. Phys. Chem. A, 2009, 113, 11793–11800 CrossRef CAS PubMed.
M. G. Delcey, K. Pierloot, Q. M. Phung, S. Vancoillie, R. Lindh and U. Ryde, Accurate calculations of geometries and singlet–triplet energy differences for active-site models of [NiFe] hydrogenase, Phys. Chem. Chem. Phys., 2014, 16, 7927–7938 RSC.
D. A. Wappett and L. Goerigk, A guide to benchmarking enzymatically catalysed reactions: the importance of accurate reference energies and the chemical environment, Theor. Chem. Acc., 2021, 140, 1–15 Search PubMed.
J. C. Kromann, A. S. Christensen, Q. Cui and J. H. Jensen, Towards a barrier height benchmark set for biologically relevant systems, PeerJ, 2016, 4, e1994 Search PubMed.
L. Di Paola, M. De Ruvo, P. Paci, D. Santoni and A. Giuliani, A. Protein contact networks: an emerging paradigm in chemistry, Chem. Rev., 2013, 113, 1598–1613 CrossRef CAS PubMed.
N. T. Doncheva, K. Klein, F. S. Domingues and M. Albrecht, Analyzing and visualizing residue networks of protein structures, Trends Biochem. Sci., 2011, 36, 179–182 CrossRef CAS PubMed.
Z. Deng, C. Chuaqui and J. Singh, Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions, J. Med. Chem., 2004, 47, 337–344 CrossRef CAS PubMed.
K. Szalewicz, Symmetry-adapted perturbation theory of intermolecular forces, Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2012, 2, 254–272 CAS.
T. M. Parker, L. A. Burns, R. M. Parrish, A. G. Ryno and C. D. Sherrill, Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency and performance for interaction energies, J. Chem. Phys., 2014, 140, 094106 CrossRef PubMed.
R. M. Parrish, T. M. Parker and C. D. Sherrill, Chemical assignment of symmetry-adapted perturbation theory interaction energy components: the functional-group SAPT partition, J. Chem. Theory Comput., 2014, 10, 4417–4431 CrossRef CAS PubMed.
R. M. Parrish, K. C. Thompson and T. J. Martinez, Large-scale functional group symmetry-adapted perturbation theory on graphical processing units, J. Chem. Theory Comput., 2018, 14, 1737–1753 CrossRef CAS PubMed.
N. Heilmann, M. Wolf, M. Kozlowska, E. Sedghamiz, J. Setzler, M. Brieg and W. Wenzel, Sampling of the conformational landscape of small proteins with Monte Carlo methods, Sci. Rep., 2020, 10, 1–13 CrossRef PubMed.
J. M. Word, S. C. Lovell, J. S. Richardson and D. C. Richardson, Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation, J. Mol. Biol., 1999, 285, 1735–1747 CrossRef CAS PubMed.
J. M. Word, S. C. Lovell, T. H. LaBean, H. C. Taylor, M. E. Zalis, B. K. Presley, J. S. Richardson and D. C. Richardson, Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms, J. Mol. Biol., 1999, 285, 1711–1733 CrossRef CAS PubMed.
H. C. Jubb, A. P. Higueruelo, B. Ochoa-Montaño, W. R. Pitt, D. B. Ascher and T. L. Blundell, Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures, J. Mol. Biol., 2017, 429, 365–371 CrossRef CAS PubMed.
Q. Cheng, N. J. DeYonker, T. J. Summers, D. A. Agbaglo, T. Suhagia, T. J. Santaloci and M. A. Palma, GitHub - natedey/RINRUS: Residue Interaction Network ResidUe Selector (RINRUS) public release. https://github.com/natedey/RINRUS (accessed 2024-04-01). 2024.
A. Schreyer and T. Blundell, CREDO: a protein–ligand interaction database for drug discovery, Chem. Biol. Drug Des., 2009, 73, 157–167 CrossRef CAS PubMed.
W. J. Hehre, R. Ditchfield and J. A. Pople, Self–consistent molecular orbital methods. XII. Further extensions of Gaussian–type basis sets for use in molecular orbital studies of organic molecules, J. Chem. Phys., 1972, 56, 2257–2261 CrossRef CAS.
A. J. Misquitta, R. Podeszwa, B. Jeziorski and K. Szalewicz, Intermolecular potentials based on symmetry-adapted perturbation theory with dispersion energies from time dependent density-functional calculations, J. Chem. Phys., 2005, 123, 214103 CrossRef PubMed.
R. Bukowski, W. Cencek, P. Jankowski, M. Jeziorska, B. Jeziorski, S. Kucharski, V. Lotrich, A. Misquitta, R. Moszynski and K. Patkowski, et al., SAPT2008: An abinitio program for many-body symmetry-adapted perturbation theory calculations of intermolecular interaction energies, University of Delaware and University of Warsaw, 2008 Search PubMed.
S. A. Spronk, Z. L. Glick, D. P. Metcalf, C. D. Sherrill and D. L. Chemey, A quantum chemical interaction energy dataset for accurately modeling protein-ligand interactions, Sci. Data, 2023, 10, 619 CrossRef CAS PubMed.
R. M. Parrish, L. A. Burns, D. G. Smith, A. C. Simmonett, A. E. DePrince III, E. G. Hohenstein, U. Bozkaya, A. Y. Sokolov, R. Di Remigio and R. M. Richard, et al., Psi4 1.1: An open-source electronic structure program emphasizing automation, advanced libraries, and interoperability, J. Chem. Theory Comput., 2017, 13, 3185–3197 CrossRef CAS PubMed.
L. A. Burns, J. C. Faver, Z. Zheng, M. S. Marshall, D. G. Smith, K. Vanommeslaeghe, A. D. MacKerell Jr, K. M. Merz Jr and C. D. Sherrill, The BioFragment Database (BFDb): An open-data platform for computational chemistry analysis of noncovalent interactions, J. Chem. Phys, 2017, 147, 161727 CrossRef PubMed.
T. J. Summers, B. P. Daniel, Q. Cheng and N. J. DeYonker, Quantifying Inter-Residue Contacts through Interaction Energies, J. Chem. Inf. Model., 2019, 59, 5034–5044 CrossRef CAS PubMed.
T. J. Summers, R. Hemmati, J. E. Miller, D. A. Agbaglo, Q. Cheng and N. J. DeYonker, Evaluating the Active Site-Substrate Interplay Between X-ray Crystal Structure and Molecular Dynamics in Chorismate Mutase, J. Chem. Phys., 2023, 158, 065101 CrossRef CAS PubMed.
M. Frisch, G. Trucks, H. Schlegel, G. Scuseria, M. Robb, J. Cheeseman, G. Scalmani, V. Barone, G. Petersson and H. Nakatsuji, et al., Gaussian 16, 2016 Search PubMed.
A. D. Beck, Density-functional thermochemistry. III. The role of exact exchange, J. Chem. Phys., 1993, 98, 5648 CrossRef.
C. Lee, W. Yang and R. G. Parr, Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density, Phys. Rev. B, 1988, 37, 785 CrossRef CAS PubMed.
G. Petersson and M. A. Al-Laham, A complete basis set model chemistry. II. Open-shell systems and the total energies of the first-row atoms, J. Chem. Phys, 1991, 94, 6081–6090 CrossRef CAS.
S. Grimme, S. Ehrlich and L. Goerigk, Effect of the damping function in dispersion corrected density functional theory, J. Comput. Chem., 2011, 32, 1456–1465 CrossRef CAS PubMed.
V. Barone and M. Cossi, Quantum calculation of molecular energies and energy gradients in solution by a conductor solvent model, J. Phys. Chem. A, 1998, 102, 1995–2001 CrossRef CAS.
M. Cossi, N. Rega, G. Scalmani and V. Barone, Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model, J. Comput. Chem., 2003, 24, 669–681 CrossRef CAS PubMed.
S. E. Worthington, A. E. Roitberg and M. Krauss, An MD/QM study of the chorismate mutase-catalyzed Claisen rearrangement reaction, J. Phys. Chem. B, 2001, 105, 7087–7095 CrossRef CAS.
J. C. Gordon, J. B. Myers, T. Folta, V. Shoja, L. S. Heath and A. Onufriev, H++: a server for estimating p K as and adding missing hydrogens to macromolecules, Nucleic Acids Res., 2005, 33, W368–W371 CrossRef CAS PubMed.
D. Case, D. Cerutti, T. Cheatham, T. Darden, R. Duke, T. Giese, H. Gohlke, A. Goetz, D. Greene and N. Homeyer, et al., Amber18, University of San Francisco, 2017 Search PubMed.
J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, Development and testing of a general amber force field, J. Comput. Chem., 2004, 25, 1157–1174 CrossRef CAS PubMed.
W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, Comparision of simple potential functions for simulating liquid water, J. Chem. Phys., 1983, 79, 926–935 CrossRef CAS.
I. S. Joung and T. E. Cheatham III, Determination of alkali and halide monovalention parameters for use in explicitly solvated biomolecular simulations, J. Phys. Chem. B, 2008, 112, 9020–9041 CrossRef CAS PubMed.
H. C. Andersen, Rattle: A “velocity” version of the shake algorithm for molecular dynamics calculations, J. Comput. Phys., 1983, 52, 24–34 CrossRef CAS.
D. R. Roe and T. E. Cheatham III, PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data, J. Chem. Theory Comput., 2013, 9, 3084–3095 CrossRef CAS PubMed.
T. Ishida, robing protein environment in an enzymatic process: All-electron quantum chemical analysis combined with ab initio quantum mechanical/molecular mechanical modeling of chorismate mutase, J. Chem. Phys., 2008, 129, 09B618 CrossRef PubMed.
T. Ishida, Effects of point mutation on enzymatic activity: correlation between protein electronic structure and motion in chorismate mutase reaction, J. Am. Chem. Soc., 2010, 132, 7104–7118 CrossRef CAS PubMed.
Q. Cheng and N. J. DeYonker, The Glycine N-Methyltransferase Case Study: Another Challenge for QM-Cluster Models?, J. Phys. Chem. B, 2023, 127, 9282–9294 CrossRef CAS PubMed.
P. Kast, M. Asif-Ullah and D. Hilvert, Is chorismate mutase a prototypic entropy trap?- Activation parameters for the Bacillus subtilis enzyme, Tetrahedron Lett., 1996, 37, 2691–2694 CrossRef CAS.
D. A. Wappett and L. Goerigk, Toward a quantum-chemical benchmark set for enzymatically catalyzed reactions: important steps and insights, J. Phys. Chem. A, 2019, 123, 7057–7074 CrossRef CAS PubMed.
L. Goerigk and S. Grimme, A thorough benchmark of density functional methods for general main group thermochemistry, kinetics, and noncovalent interactions, Phys. Chem. Chem. Phys., 2011, 13, 6670–6688 RSC.
P. Jurecka, J. Sponer, J. Cerny and P. Hobza, Benchmark database of accurate (MP2 and CCSD (T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs, Phys. Chem. Chem. Phys., 2006, 8, 1985–1993 RSC.
J. Antony and S. Grimme, Density functional theory including dispersion corrections for intermolecular interactions in a large benchmark set of biologically relevant molecules, Phys. Chem. Chem. Phys., 2006, 8, 5287–5293 RSC.
M. K. Kesharwani, A. Karton and J. M. L. Martin, Benchmark ab initio conformational energies for the proteinogenic amino acids through explicitly correlated methods. Assessment of density functional methods, J. Chem. Theory Comput., 2016, 12, 444–454 CrossRef CAS PubMed.
P. Paiva, M. J. Ramos and P. A. Fernandes, Assessing the validity of DLPNO-CCSD (T) in the calculation of activation and reaction energies of ubiquitous enzymatic reactions, J. Comput. Chem., 2020, 41, 2459–2468 CrossRef CAS PubMed.
A. J. Ribeiro, D. Santos-Martins, N. Russo, M. J. Ramos and P. A. Fernandes, Enzymatic flexibility and reaction rate: a QM/MM study of HIV-1 protease, ACS Catal., 2015, 5, 5617–5626 CrossRef CAS.
U. Ryde, How many conformations need to be sampled to obtain converged QM/MM energies? The curse of exponential averaging, J. Chem. Theory Comput., 2017, 13, 5745–5752 CrossRef CAS PubMed.
H. Dokainish and J. Gauld, Computational Approach Choice in Modeling Flexible Enzyme Active Sites, ChemRxiv, 2019, preprint DOI:10.26434/chemrxiv.10316177.v1.
Z. Mei, J. D. Treado, A. T. Grigas, Z. A. Levine, L. Regan and C. S. O'hern, Analyses of protein cores reveal fundamental differences between solution and crystal structures, Proteins, 2020, 88, 1154–1161 CrossRef CAS PubMed.

Footnotes

† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp06100k

‡ Present address: Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

§ It is important to note that our group employs the “freeze code” scheme in Gaussian16, in which all Hessian elements are zero when two frozen Cartesian coordinates are involved. The phenomenon in which several small magnitude imaginary vibrational frequencies appear in thermochemical analysis does not occur in our treatment of the Hessian matrix.

Click here to see how this site uses Cookies. View our privacy policy here.