Who's on base? Revealing the catalytic mechanism of inverting family 6 glycoside hydrolases

Unbiased simulations reveal a water wire enabling and rescuing the catalytic base of an inverting glycoside hydrolase.

simulations utilized a 1 fs timestep while MM-only simulations utilized a 2 fs timestep. The CHAMBER13 13 program was used to convert the CHARMM protein structure files, coordinate files, and force field files to Amber topology file and coordinate files.

WT Hydrolysis
As discussed in the manuscript Introduction, we used AS to collect an ensemble of transition states for glycosidic cleavage by hydrolysis. An initial trajectory connecting a reactant basin (A) to a product basin (B) is required to begin transition path sampling. The Hamiltonian used to generate the initial path can be biased, provided that equilibration runs are completed before data is collected. [14][15][16] In this section, we describe how we modeled the QM region, minimized and equilibrated the system, used targeted MD to generate candidate conformations between the reactant and product basins, performed AS to collect an ensemble of transition states, used that data to chose a reaction coordinate based on Likelihood Maximization, and used the reaction coordinate to guide Equilibrium Path Sampling (EPS) to chart the PMF for the reaction. Details on the procedure for each of these steps are provided below. Orange arrows indicate which bonds will be formed and blue lines indicate which bonds will be cleaved. The dashed black lines reflect hydrogen bonding between the D401 backbone, nucleophilic water, and S181 side chain. This scheme further reflects that the −1 glucopyranoside ring is puckered to the 2,5 B conformation in the reactant conformation due to steric interactions with Y169.
charge density functional tight-binding (SCC-DFTB) method with second-order terms in the charge density fluctuations, 19,20 which is referred to as SCC-DFTB2 to distinguish it from the implementation with third-order corrections (SCC-DFTB3). 21 The SCC-DFTB methods are approximate methods based on DFT calculations, in contrast with traditional semi-empirical methods such as AM1 and PM6, which are based on Hartree-Fock theory using empirical data from a particular chemical environment. 22 SCC-DFTB has become a popular choice for modeling biomolecules, as it provides the computational efficiency required to capture the time and length scales necessary to investigate biologically important phenomena, while providing greater accuracy than the traditional semi-empirical methods. Improvements include a more reliable description of weak-binding forces, reaction energies, geometries, and frequencies, benchmarked against biomolecules such as peptides and DNA fragments, often used in combination with a CHARMM force field. 19,[22][23][24][25][26] SCC-DFTB has known limitations, particularly evident in describing excess protons in bulk water, such as overestimating the number of solvation shell water molecules coordinating with excess protons. 27,28 The QM region in our system includes only the active site, and thus our results are not expected to be tainted by the shortcomings of modeling bulk water. Interestingly, the SCC-DFTB2 method implemented in Amber has yielded proton transfer barriers similar to Car-Parrinello molecular dynamics simulations using the BLYP functional, 29 providing further confidence that our methodology allows meaningful mechanistic analysis for proton transfer mechanisms in the enzyme active site. 30 These studies indicate that SCC-DFTB2 is well suited for the purpose of our current work: to allow extensive sampling of a highly multidimensional energy landscape that can uncover a complex reaction path that includes proton transfer events. Our primary goal is to determine the mechanism; precisely calculating final energetics is a secondary concern. Computational determination of the TrCel7A hydrolysis rate with the same procedures used 31 show remarkable agreement with experimental rates recently published, 32 providing confidence in our choice for computational methods in this work.

Minimization and Equilibration
We found that even in short (100 ps) simulations, water molecules observed in the active site (see Fig.  2 in the manuscript) exchanged with other nearby water molecules also identified within the experimental crystal structure. Keeping the identity of these two water molecules consistent across many simulations would provide simplicity without biasing the results. Thus, their positions were restrained during these equilibration and minimization steps, each of which consisted of 2500 cycles. First, only the added water molecules and ions were unrestrained during minimization with MM only, with the rest of the system held fixed by Cartesian restraints with a high force constant (10,000 kcal/mol-Å 2 ). Then, all but the ligand and noted active site waters were unrestrained during an MM-only minimization. A final MM-only minimization was performed with the entire system unrestrained. This was followed by a 100 ps MM-only equilibration run. To prevent the active site's water molecules from exchanging with nearby waters, restraints were enforced on the distances shown in Scheme S2, in the shape of a flat-bottomedwell with a force constant of 25 kcal/mol-Å 2 in the parabolic region. The QM region was then defined and a QM/MM minimization of 2500 cycles was completed without any restraints. A 100 ps QM/MM equilibration step was then completed using the settings and constraints as used during the MM equilibration. Longer equilibration simulations were not run to prevent moving far from the Michaelis complex captured in the crystal structure.

Generating TS Guesses for AS
The one-step inverting pathway requires multiple bond rearrangements. The hypothesis that D175 acts as the catalytic base mediated by a bridging water requires four new bonds to be formed (orange arrows in Fig. S1) and four bonds to be cleaved (light blue bonds in Fig. S1). To look for conformations near a transition state, we generated a series of guesses by using targeted MD to set intermediate distances for these eight bond rearrangements. Forty-eight guesses were generated by sampling from the C-O bond distances of 2. changing the eight distances from the initial to target distance in 20 steps, adjusting the distance linearly across the steps using a flat-bottomed potential with a force constant of 450 kcal/mol-Å 2 in the parabolic region. Each step was run for 2 ps.
To determine whether any of these guesses would produce a reactive trajectory, the guesses were used as seeds for AS runs of 50 trajectories each. Per the AS procedure, each trajectory began by generating random velocities from the Boltzmann distribution. An unbiased trajectory in this "forward" direction was first followed for a δ t of 0.025 ps (25 MD steps) to save the coordinates for potential use in the next trajectory, and then followed for another 750 steps to obtain an endpoint geometry for analysis. Additionally, the generated velocities were reversed so the trajectory could also be followed in the "backward" direction for 750 steps to obtain the other endpoint. The following distances were used to determine whether the endpoint in each trajectory visited the reactant or product basin ( Fig. S3): D221 O δ 2 (carboxylic oxygen) to D221 H δ 2 (acidic hydrogen); D221 H δ 2 (acidic hydrogen) to −1/ + 1 glycosidic oxygen; −1/ + 1 glycosidic oxygen to −1 C1; and −1 anomeric carbon to nucleophilic water O. If one of the two endpoints visited the reactant basin and the other visited the product basin, the trajectory was identified as reactive or "accepted". We also tracked whether both endpoints visited the reactant basin or both visited the product basin. If the distance criteria did not align with one of those three options, the trajectory was determined to be inconclusive.
Of the 48 TS guesses serving as seeds for AS runs, 37 yielded at least one reactive trajectory of the 50 trajectories run. Seven AS runs had an acceptance rate greater than 30 %, which is reasonable for this Monte Carlo type algorithm. 33 Several accepted trajectories were checked in detail to verify the distance requirements used to classify visiting the reactant or product basin. We

A.
B.

Fig. S3
A. The end of a forward or backward AS trajectory is identified as visiting the reactant basin if the distance between the acidic hydrogen and the D221 carboxylic acid is less than 1.15 Å and the glycosidic bond remained intact (defined by a distance of less than 1.7 Å). B. The endpoint of an AS trajectory is identified as visiting the product basin if the distance between the acidic hydrogen from D221 and the +1 glucopyranoside O4 is less than 1.15 Å and the nucleophilic water oxygen is less than 1.7 Å from the −1 glucopyranoside C1.
found that the endpoints reflected all eight bond rearrangements indicated in Fig. S1. Of the rejected trajectories, we found that some seed endpoints more often visited the reactant basin while other seed endpoints more often visited the product basin. This provided confidence that the range of C-O and O-H distances used to generate TS guesses spanned a range of potentially early and late transition states.

Aimless Shooting and Likelihood Maximization
We used endpoints of the AS runs to test our TS guesses as seeds for production AS production runs to explore the ensemble of TSs. We selected the five endpoints from the five runs with the highest acceptance ratios. Each endpoint was used to seed ten AS production runs, for a total of 50 AS production runs. Each production run consisted of a series of 2500 trajectories. Unlike enhanced sampling methods such as umbrella sampling 34 and metadynamics, 35 the forces on the system are not perturbed to increase sampling of high-energy regions of the potential surface. Thus, these trajectories are not biased according to a priori assumptions about which degrees of freedom (collective variables) best define the transition state. 14,36 As in the AS runs used to test the TS guesses, a δ t of 0.025 ps was used to obtain potential alternate starting points. The forward and backward trajectories were run for 750 steps each to obtain endpoints to determine whether they visited the reactant or product basins, using the same distance criteria previously described. Data from 32 of the 50 AS production runs were analyzed; the other runs were stopped due to the aimless exploration of the transition state ensemble leading the runs into regions when QM convergence was slow and/or acceptance rates became low (less than 6 %). Furthermore, the data from the first 500 trajectories in each AS run were not used to allow equilibra-tion/decorrelation from the biased, targeted MD runs used to create the initial TS guesses. Thus, 64,000 AS shooting points were used for the next step, Likelihood Maximization. This method makes use of the data collected during the AS simulations. Saved geometries of the shooting points are used to calculate the values of a postulated set of order parameters (OPs). The values of the OPs at each point are tested to determine if there is correlation with changes in those variables and whether the trajectory generated from that point was reactive, both endpoints went to the reactant basin, or both to the product basin. Thus, the OPs are searched for candidate reaction coordinates.
We chose a list of candidate OPs based on analysis of selected reactive trajectories to identify substrate and enzyme conformational changes along the trajectory. We additionally gathered proposed order parameters suggested in previous research on TrCel6A. 18 The OPs can take a variety of forms. The set of candidate OPs includes distances between atom centers, differences of distances, angles, dihedral angles, and Cremer-Pople ring puckering coordinates. The full set of tested OPs are listed in Table S1. The Likelihood Maximization results from the 32,000 AS points (32 runs using points from trajectories 501 to 1500) are reported. Following forward and backward trajectories from these points resulted in 6,910 accepted (reactive) trajectories; 17,121 trajectories with both endpoints in the A (reactant) basin; 7,071 trajectories with both end points in the B (product) basin; and 898 inconclusive trajectories.

Committor Analysis
The committor probability, p B (x), denotes the fraction of trajectories initiated from a configuration x with momenta randomly chosen from the Boltzmann distribution that will visit (commit to) the product basin B. Transition states are identified by p B (x) = 0.5. The Likelihood Maximization calculates optimal parameters to center p B (x) = 0.5 on an RC value of 0.5. To test if the putative RCs identified by Likelihood Maximization accurately describe the reaction coordinate, we generated a series of configurations with a calculated RC value near zero (abs(RC(x)) <0.01). We looked for AS points likely to be good TS candidates as defined by A) abs(RC(x)) <0.1, and B) a minimum 50 % acceptance rate for the ten trajectories run before and after the point (x). From these 30 points, we initiated 30 EPS runs to collect 91 decorrelated configurations with abs(RC(x)) <0.01, using six nodes per string, three MD steps separating each node, and a window of abs(RC(x)) <0.03. The first and then every twentieth harvested point within abs(RC(x)) <0.01 (for a total of 300 shooters) were then used to shoot 20 forward trajectories, each drawing fresh momenta from the Boltzmann distribution. The resulting data on which basin the endpoint committed to was used to create p B histograms.

Equilibrium Path Sampling
We performed five EPS runs to gather data for our hydrolysis PMF. To generate seeds for these separate runs, we wished to sample from different regions of the transition state ensemble. Thus, we selected one AS shooting point from five different runs which had acceptance rates of at least 20 %. In pursuit of highly decorrelated points, we chose points from no earlier than the 2000 of the 2500 trajectories from each run, choosing an accepted point from within a region with a high acceptance rate. From each of these five points, we collected accepted trajectories saving the configuration of each step for 750 steps in the "forward" direction and also reversed trajectories for the "backward" direction, calculating the values of the OPs and the RC from Likelihood Maximization at every step. Based on the range of the RC values for these runs, we chose to set up 25 EPS windows ranging from -8 to 6, with each window sharing 10 % overlap on each side. 10,000 strings were run in each window, with each string containing six nodes separated by three MD timesteps each (3 fs). The relative free energy in each window is calculated from the equilibrium probability distributions of visiting different regions of the RC within the window, separated into 0.06 Å bins. A PMF for the full RC range is obtained by matching free energies in the overlapping portions of each window. 37,38

Determination of −1 Glucopyranoside Puckering Conformations
The transition state conformations were harvested from the AS points that led to accepted trajectories. As described in the methods for Likelihood maximization, these points came from 32 independent runs, analyzing data from the trajectories 501 to 1500. The conformations for reactant or product were obtained by running unconstrained QM/MM simulations to harvest 2000 points separated by 0.5 ps each, gathered from five independent simulations started from the endpoints of the trajectories chosen to seed the EPS runs, after first equilibrating for 1 ps. This short equilibration run was determined to be sufficient to ensure that the sampled conformations were in the hydrolysis reactant or product basins.

Mutant Hydrolysis Simulations
For one set of TrCel6A D175A hydrolysis simulations, the QM region for the mutant was identical to the region used for the WT except for the omission of residue 175, leaving the overall charge of the QM region neutral. The results of these simulations are discussed only here in the ESI. The minimization and equilibration steps were identical, except only three restraints were used to keep the active site waters from exchanging with nearby waters, omitting the fourth restraint involving residue 175, shown in Fig. S2. Initial TS guesses were generated by using the same targeted MD strategy, with six target distances set instead of eight, S1-S18 | S4 dihedral S181 C α to S181 C β to S181 O γ to S181 H γ 24 dihedral S181 C β to S181 O γ to S181 H   as two of the original eight involved residue 175. Distances for 25 initial guesses were chosen to match distances used in the WT TS guesses with the highest resulting acceptance rates. Another 26 guesses were generated to match distances from accepted WT AS shooting points near the ends of AS runs with high acceptance percentages. The 51 TS guesses for the TrCel6A D175A mutant were used as seeds for AS runs of 50 trajectories each. A second set of TrCel6A D175A hydrolysis simulations was completed including a third water molecule in the QM region. The results of these simulations are described in the main text. As with the other two water molecules in the QM region, the third water molecule was observed in the 1QJW crystal structure. The molecule chosen for the third water had a distance of 3.4 Å between its oxygen and that of the second, "bridge" water oxygen. During minimization and equilibration, a weak forth restraint was included for the third water molecule with a wide flat-bottomed potential to allow it to stay in its original position identified in the crystal structure or move closer into the active site, with no bias penalty for either option. TS guesses were generated based on distances from accepted WT AS shooting points near the ends of AS runs with high acceptance percentages.
Production AS runs, Likelihood Maximization, committor analysis, and EPS simulations were not completed for the mutant enzyme.
Unrestrained MM-only simulations were run in NAMD 39 starting from the minimized and equilibrated mutant structure. These were conducted in the NPT at 300 K and 1.0 atm with the SHAKE algorithm used to fix all hydrogen distances. 40 A nonbonded cutoff distance of 10 Å, with a switching distance of 9 Å, and a nonbonded pair list distance of 13 Å was used. The Particle Mesh Ewald method was used to describe the long-range electrostatics 9 with a sixth order b-spline, a Gaussian distribution with a width of 0.312 Å, and 1 Å grid spacing. The velocity Verlet multiple time-stepping integration scheme was used evaluating the full nonbonded interactions every 2 timesteps, the full electrostatics interactions every 4 timesteps, and 20 timesteps between atom reassignments.

Processivity Simulations
TrCel6A processivity was investigated using umbrella sampling (US) 34 and employed the weighted histogram analysis method (WHAM) 41,42 to produce the PMF. As previously noted, initial configurations for the "slide" and "pre-slide" (Figure 3 in the main text) configurations were initially build, minimized, and heated using CHARMM. Unrestrained equilibration runs for each configuration were in NAMD 39 for over 50 ns, with the same parameters noted for the NAMD simulations for Mutant Hydrolysis Simulations. The endpoint structures from these runs were aligned using VMD's RMSD trajectory tool 43 for use as starting points in the processivity simulations (later, an additional 250 ns of simu-lation were completed for the slide conformation to observe stability of the active site loop). The "slide" configuration, in which the product sites are occupied, was used as the reference state for umbrella sampling simulations performed with Amber 12's "targeted MD" utility. As in a the recent work on TrCel7A, 44 the coordinate for umbrella sampling path was the RSMD between the reference state and active simulation non-hydrogen atoms of the leading cellobiose unit. The initial RMSD between the "slide" and "pre-slide" configurations was 10.5 Å. We defined windows for umbrella sampling separated by 0.25 Å, ranging from 0 to 12.5 Å. To create initial configurations for each window, the leading cellobiose unit was pulled to the desired RMSD over 5000 steps using a force constant of 30.0 kcal/mol-Å 2 to the desired RMSD value. To prevent the protein moving along with the substrate, it was initially constrained with a larger harmonic force constant of 200 kcal/mol-Å 2 . Umbrella sampling production runs were simulated for 40 ns with a harmonic force constant of 5.0 kcal/mol-Å 2 on the leading cellobiose non-hydrogen atoms, discarding the data from the first 2 ns to allow for equilibration. The protein was left mostly unconstrained to allow natural dynamic responses to the substrate; to prevent the whole protein from moving to allow the substrate to occupy the low-energy wells at the "slide" and "pre-slide" positions, a 20.0 kcal/mol-Å 2 restraint was placed on seven backbone C α atoms of residues S106, A150, D200, N247, A280, I330, and Q437. These residues were chosen based on a previous study that demonstrated that they show a low root mean square fluctuation 45 and because they were located at a variety of locations on the enzyme but not near the active site tunnel.
Separate PMFs for procession were calculated with a) the second glycosyl unit in the 4 C 1 conformation, and b) the second glycosyl unit in the 2 S O /B O,3 conformation. Data from windows centered at 0.00 through 9.50Å were used for the PMF for the 4 C 1 conformation, while data from windows 9.75 through 12.25Å were used to create the PMF fro the 2 S O /B O,3 conformation, with the PMFs aligned at 9.625Å as indicated by the red line on the composite PMF. Histograms of processivity CV values for each window were checked for overlap ( Figure S4) and sampling in all regions of the CV before calculating the PMFs. The WHAM bin spacing was 0.05 Å and used a tolerance of 1x10 −9 , with both parameters chosen based on convergence of the PMF. Data from the last 38 ns of the 40 ns simulations in each window were divided into four blocks. The reported PMF reflects the average and standard error from calculating the PMF on each block.

WT Hydrolysis
To better show the range of puckering conformations sampled by the reactant, TS, and product, Fig. S5 shows the same data as in Fig. 7 in the main text, with each state shown on a separate graph so no data is obscured. S1-S18 | S7  clipRMSD0_50_last38n  clipRMSD0_75_last38n  clipRMSD1_00_last38n  clipRMSD1_25_last38n  clipRMSD1_50_last38n  clipRMSD1_75_last38n  clipRMSD2_00_last38n  clipRMSD2_25_last38n  clipRMSD2_50_last38n  clipRMSD2_75_last38n  clipRMSD3_00_last38n  clipRMSD3_25_last38n  clipRMSD3_50_last38n  clipRMSD3_75_last38n  clipRMSD4_00_last38n  clipRMSD4_25_last38n  clipRMSD4_50_last38n  clipRMSD4_75_last38n  clipRMSD5_00_last38n  clipRMSD5_25_last38n  clipRMSD5_50_last38n  clipRMSD5_75_last38n  clipRMSD6_00_last38n  clipRMSD6_25_last38n  clipRMSD6_50_last38n  clipRMSD6_75_last38n  clipRMSD7_00_last38n  clipRMSD7_25_last38n  clipRMSD7_50_last38n  clipRMSD7_75_last38n  clipRMSD8_00_last38n  clipRMSD8_25_last38n  clipRMSD8_50_last38n  clipRMSD8_75_last38n  clipRMSD9_00_last38n  clipRMSD9_25_last38n  clipRMSD9_50_last38n A.   Fig. 7 of the main text but separated into panels for clarity, a top-down view of the northern-hemisphere of the Cremer-Pople sphere 46 designating the −1 glucopyranoside puckering coordinates for the reactant (red), transition state (TS; green), and product (blue) for the elementary reaction shown in Fig. 4 in the main text. Markers designate the puckering coordinates for the −1 glucopyranoside of the substrate mimic methyl 4-S-β -D-cellobiosyl-4-thio-β -D-cellobioside co-crystallized with Tr Cel6A wild-type (PDB ID: 1QK2; orange diamond); 47 Tr Cel6A Y169F mutant (PDB ID: 1QJW; dark green triangle); 47 and Humicola insolens Cel6A D416A mutant (PDB ID: 1GZ1; yellow circle). 48 The two markers for PDB IDs 1QK2 and 1QJW represent the two chains present in the deposited structures. The blue line shows an approximate, smoothed ring-puckering path sampled during fluctuations of the product ring conformation. S1-S18 | S8

Likelihood Maximization
The best-scoring OPs are list in Table S2 and indicate the importance of the distances nucleophilic and bridge water molecules. Interestingly, eight combinations of OPs for the three-parameter RC shared the same likelihood score. Note that these sets of OPs are not independent, as OPs 12 and 13 are combinations of OPs 4, 5, 6, and 7. The RC values are calculated as linear combinations of the normalized parameter (q k ) as RC(q) = α 0 + ∑ M k=1 α k q k . In the case of the RC consisting of OP6 and OP12, the associated parameter values are: α 0 = −6.392 Å −1 , α 1 = 2.937 Å −1 , and α 2 = 11.773 Å −1 . Normalization of the OPs was calculated based on the OP6 range of 0.915 to 3.600 Å, and OP12 range of -2.096 to 0.535 Å. The histogram resulting from the committor analysis test is shown in Fig. S6.  Fig. S6 is used to quantify the accuracy of the reaction coordinate. 49 Optimally, the histogram would be sharply peaked at p B =0.5, corresponding to a tight distribution around the characteristic isocommittor value for transition states. 50 Sampling errors and the complexity of the reaction coordinate contribute to deviations from ideal results. 16,31 We calculated the transmission coefficient to correct the rate calculated coefficient from this putative reaction coordinate.

D175A Mutant Hydrolysis
As previously described, two simulation systems were used to investigate hydrolysis reactions in the TrCel6A D175A mutant. The system for which an additional water molecule was included in the QM region (additional compared to the WT QM region) was described in the main text. Here, we present results for the case in which the QM region was the same as for the WT except for the exclusion of the mutated residue 175. As described in the main text for the simulations with an additional water, guesses for TS conformations were generated with biased MD simulations based on sets of distances for OP1-6 harvested from several wellequilibrated (more than 2,000 trajectories completed), accepted WT AS points for use in targeted MD to create TS guesses for the D175A mutant cases. From these guesses, we were able to obtain multiple reactive trajectories for hydrolysis of the glycosidic bond. Fig. S7 shows the hydrolysis reaction observed for this simulation system: a proton was transferred from D221 to the glycosidic oxygen and the glycosidic bond between the substrate glycosyl units in the −1 and +1 positions was cleaved. The reaction path initially closely followed that in the WT; the glycosidic bond elongated, D221 transferred a proton to the glycosidic oxygen, a water molecule attacked the anomeric carbon at the −1 substrate position, and then transferred a proton to the second water molecule in the QM region. However, the excess proton was not able to transfer beyond the bridge water due to the boundaries of the QM region used in the simulation. Instead, it exchanged between the −1 glucopyranoside ring anomeric hydroxyl group and the bridge water, as shown in the right-hand panel of Fig. S7. As described in the main text, the simulations with an additional water molecule show that further proton transfer is possible if additional potential proton acceptors are included in the QM region. S1-S18 | S9 Modeling the one-step inverting hydrolysis for the Tr Cel6A D175A mutant with a QM region as in the WT case, excluding residue 175 and not including any other molecules or residues, yielded reactant (left) and TS (middle) conformations are similar to those for the WT case. However, without a low-energy proton acceptor in the QM region, the proton shuttled between the −1 glucopyranoside ring anomeric hydroxyl group and the "bridge" water in the "product" state (right). Fig. S8 provides a comparison between configurations near the end of the 40 ns US processivity simulations with reference structures built from crystal structures deposited in the PDB. For both the "pre-slide" and "slide" conformations, the reference WT protein structure came from PDB ID: 1QK2 47 with the cellohexaose substrate from PDB ID: 4AVO 51 structure, aligned to the substrate mimic co-crystalized with WT protein in the PDB code 1QK2. As described in the Computational Methods, the "pre-slide" substrate initial conformation was created by removing the monomers in the −2 and −1 substrate binding sites and adding monomers in the +5 and +6 positions with the same initial geometry as the +3 and +4 monomers. The simulation and reference crystal structure conformations were aligned in VMD based on all protein backbone atoms. The figure shows alignment of the substrate binding sites as well as expected differences in conformation resulted from the equilibration and simulation of the system, especially for the +5 and +6 substrate subunits in the "pre-slide" conformation. As noted, the conformation of these monomers was not obtained from the crystal structure and these subunits have greater flexibility as they extend beyond the active site tunnel. Fig. S9 shows the total interaction energies of individual key protein residues with the cellohexaose substrate as a function of the CV for processivity (the RMSD in Å between the configuration in the simulation and a reference configuration in the "pre-slide" conformation). The magnitude of the energetic interactions for the charged residues shown in panels A and B (note that D221 is protonated and therefore neutral) are primarily due to electrostatic interactions, while the magnitude of the energetic interactions for the aromatic groups in panel C are primarily van der Waals interactions. Fig. 3 in the main text shows the location of the hydrophobic residues in the active site tunnel, whose interactions with the substrate are shown in panel C. Fig. S10 shows the locations of the residues highlighted in Figure S9A and B. Fig.  S11 and Fig. S12 show interactions of individual substrate glycosyl groups with residues with the strongest interaction energies. S1-S18 | S11 A.  C.

B.
A.

Fig. S9
Total interaction energies between the entire substrate and individual residues labeled in each plot: (A) aspartic and glutamic acids in the active site tunnel are negatively charged, except for D221; (B) positively charged residues in the active site tunnel; and (C) hydrophobic residues in the active site tunnel. The x-axis represents the CV for procession through the active-site containing tunnel: the RMSD of the leading cellobiose ring atom coordinates to a reference conformation in the pre-slide conformation. The dotted gray line indicates the point along the CV corresponding to the slide conformation, at which point the substrate is in position in the active site for hydrolysis. The solid red line designates a puckering dividing line; to the left of the line, the second glycosyl unit is exclusively in the 4 C 1 conformation and to the right it is only in the 2 S O /B O,3 conformation. The data was smoothed as described in the Computational Methods in the main text. S1-S18 | S13  Fig. S11 Total interaction energies between the individual substrate glycosyl units (as labeled) and key charged residues. As in Fig. S9, the x-axis represents the CV for procession through the active-site containing tunnel, the dotted gray line indicates the CV value corresponding to the slide conformation, and the solid red line designates a puckering dividing line. The data was smoothed as described in the Computational Methods in the main text. S1-S18 | S15 Fig. S12 Total interaction energies between the individual substrate glycosyl units (as labeled) and key hydrophobic residues. As in Fig. S9, the x-axis represents the CV for procession through the active-site containing tunnel, the dotted gray line indicates the CV value corresponding to the slide conformation, and the solid red line designates a puckering dividing line. The data was smoothed as described in the Computational Methods in the main text.