Formation and structure of the ferryl [ Fe Q O ] intermediate in the non-haem iron halogenase SyrB 2 : classical and QM / MM modelling agree

To rationalise mechanistically the intriguing regioand chemoselectivity patterns for different substrates of the non-haem iron/2-oxoglutarate dependent halogenase SyrB2, it is crucial to elucidate the structure of the pivotal [FeQO] intermediate. We have approached the problem by a combination of classical and QM/MM modelling. We present complete atomistic models of SyrB2 in complex with its native substrate L-threonine as well as L-a-amino butyric acid and L-norvaline (all conjugated to the pantetheine tether), constructed by molecular docking and extensive MD simulations. We evaluate five isomers of the [FeQO] intermediate in these simulations, with a view to identifying likely structures based on a simple ‘‘reaction distance’’ measure. Starting from models of the resting state, we then use QM/MM calculations to investigate the formation of the [FeQO] species for all three substrates, identifying the intermediates along the O2 activation/decarboxylation pathway on the S = 1, 2, and 3 potential-energy surfaces. We find that, despite differences in the detailed course of the reaction, essentially all pathways produce the same [FeQO] structure, in which the oxido is directed away from the substrate.

Partial charges for non-standard residues were obtained at the DFT level by fitting to the molecular electrostatic potential according to the Merz-Singh-Kollmann scheme 10 ; structures were fully optimised at the same level of theory unless indicated otherwise.For compatibility with the Amber ff03 force field 11 , charges were determined in a polarisable continuum solvent.For the singlet Fe(II) complexes, charges were taken from TPSS-D/def2-TZVP/COSMO( = 80) calculations run with TURBOMOLE [12][13][14] ; charges for the Fe(IV) complexes in the quintet state from B3LYP/def2-TZVP+/PCM( = 3.9) calculations in Gaussian 15 ; and charges for the substrates from B97-D/def2-TZVP/PCM(water) calculations with Gaussian.For ABA and NVA, a distance constraint was used to ensure that the pantetheine was in the same extended conformation as for THR.It was verified that the slightly different choices of exchange-correlation functional and value of had negligible effect on the fitted charges.
For residues cut by a QM-MM boundary, the charges were corrected such as to ensure that the total charge of the atoms that would become QM atoms summed up to an integer value; by construction, the total charge of the remaining MM atoms is then also an integer value.The corrections required to ensure charge integrity were small; the imbalance was distributed over all atoms of the residue, leading to very small (< 0.01e) changes of the original atomic charges.
1 Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics.This journal is © the Owner Societies 2017 1.3 MD simulations 1.3.1 Simulation parameters MD simulations were run under periodic boundary conditions in a rhombic-dodecahedral box with an image distance of d ≈ 7.88 nm, chosen to give a minimal simulation volume while maintaining a minimum distance of 1 nm to the solute.The simulation systems were neutralised by the addition of counterions (14 Na + for systems containing substrate).Electrostatic interactions were treated with the particle-mesh Ewald (PME) method (r C = 1.0 nm, fourth-order interpolation, tolerance 10 −5 , grid spacing 0.12 nm).Van der Waals interactions were cut off at r vdW = 1.0 nm.Energies and pressures were corrected for long-range dispersion effects.
The simulation stages and parameters are summarised in Table S2.A time step of 2 fs was used throughout.Overall equilibration was judged by convergence of the backbone RMSD according to a series of statistical tests 16,17 .The production runs were continued to about 20 ns or until conformational equilibration for the active site residues was established.Each trajectory spanned up to 40 ns in total.

Force-field parameters
For the residues containing atoms with non-standard charges, we list here the [atoms] blocks in GROMACS topology format.All atom types (and therefore van der Waals parameters) were standard atom types from either Amber ff03 or GAFF. A

QM/MM setup
For the purposes of the microiterative QM/MM optimisations 18 , the system is divided into an inner region, which is updated less frequently, and an outer region, which is optimised to convergence for every step taken in the inner region.The inner region comprised all residues with at least one QM atom (106 atoms for THR).Only atoms in the active region are allowed to move in optimisations; the active region included all residues within 8 Å of any atom in the inner region (1550 atoms for THR).The HDLC residues used for optimisations coincided with the standard amino-acid resides of the protein; non-standard components (substrate, active-site iron complex) were grouped into chemically meaningful HDLC residues (substrate, [FeCl(2OG)(O 2 )]).
The reaction coordinate was implemented by a harmonic restraint with a force constant of 3.0 E h a −2 0 .
substrate amino salt bridge was water-mediated, the two bridging water molecules were also included in the QM region for calculations involving this substrate.The inner and QM regions for THR are shown in Figure 5.4.For this system, the sizes of the QM region, inner region, active region and outer region are 62, 106, 1550 and 6862 atoms respectively.2 Supplementary Results In addition to the three substrate-A complexes, the holoprotein (A-H 2 O, containing the active-site iron complex with 2OG, Cl, and H 2 O ligands, but no substrate) and the apoprotein (i.e., the "bare" protein) were also simulated using the same protocol.In all cases, the equilibrated structures remained very close to the crystal structure of the holoprotein (PDB code 2FCT): the RMSDs (calculated for backbone atoms, averaged over the equilibrated section of the trajectory) were 1.24 Å (apoprotein), 1.15 Å (holoprotein), 1.95 Å (THR-A), 1.33 Å (ABA-A), 1.24 Å (NVA-A).

Docking
The conformation of the benzyl side-chain of the "gate-keeper" residue Phe196 is determined by the S5).The side-chain conformations for the three docked structures are nearly identical, with the phenyl ring being positioned such as to allow the substrate access to the active-site channel.In the MD simulations, we find different rotamers for the different systems, including cases where two rotamers are significantly populated (Figure S4).As the protein is fully flexible in the MD simulations, the channel and "gate" on the one hand and the bound substrate on the other are able to adjust their conformations to allow a best (induced) fit; for instance, the Phe196 side-chain adopts the "closed" conformation of the holoprotein in THR-A.The "gate" is thus more aptly characterised as a "curtain", which can flexibly wrap around, and yield to, an obstacle placed in the doorway.
Table S5: Side-chain torsions of the "gate-keeper" residue Phe196; values from docking refer to the top pose, MD values are converged averages.

Accessibility of the active site for small molecules
Over the course of the catalytic cycle, O 2 is consumed and CO 2 is generated.The crystal structure of the SyrB2 holoprotein features two channels connecting the active-site cavity to the surface: the main substrate channel (T 1 ) and an allosteric channel (T 2 ); see Figure S5.Two questions arise as to the accessibility of the active site for small molecules: (i) Do the channels identified in the crystal structure remain intact and open during the course of MD? (ii) Does the bound substrate plug the main channel, leaving only (T 2 ) as a route for small molecules to access the active-site cavity?
The MD trajectories for the holoprotein and the enzyme-substrate complexes were analysed using the program Caver. 19Caver searches the trajectory frame-by-frame for tunnels, then groups related tunnels from different frames into channels; a tunnel is represented as an "elastic hose", whose shape, length, and diameter vary over the course of the trajectory, reflecting the conformational dynamics of the protein.Each channel is assigned an importance (priority) value according to its length, width, and the duration for which it persists.This value is calculated from a cost function (Eq.1), which depends on overall length L and the varying radius along the tunnel, r(s); long, narrow tunnels have a high cost.The cost may be understood as a measure of the energy required to traverse the tunnel.The priority (Eq.2) is then calculated from the cost by Boltzmann-weighting and averaging over time (N is the number of frames).The priority value effectively measures the transport capacity of a channel.
The results of this analysis are shown in Table S6.No new channels opened during the MD simulations, but both T 1 and T 2 remained open in all simulations, even when the substrate was bound.

Hydrogen-bond coordination number of the oxido oxygen
In the Mössbauer spectrum of SyrB2 with its native THR substrate, two distinct [Fe=O] species are observed 20 .This was explained by Wong et al. 21by different numbers of hydrogen bonds to the oxido ligand, rather than by geometrical isomerism 22 .We therefore analysed the trajectories with the isomers D1 and D2 with respect to the hydrogen-bond coordination numbers of the oxido ligand (see Table S7).If we presume that the two species differ by one hydrogen bond, none of the ratios for THR (1 vs. 0 and 2 vs. 1 H-bonds, respectively) matches particularly well with the experimental ratio between the two [Fe=O] species of 4:1.For ABA, the experimental ratio is 7:1, which agrees nicely with the ratio of 8:1 for 0 vs.The pulsed-EPR (HYSCORE) experiments of Martinie et al. 23 provide distance and angle measurements between O p , Fe, and specific hydrogens of the substrate; see Figure S12.In the experiment, NO is used as a non-reactive, EPR-active structural mimick of O 2 , and substrates need to be deuterated at the position to be measured.Comparing to QM/MM-optimised structures, distances agree well, but angles do not (Table S12).

Figure 5 . 4 :
Figure 5.4: Left QM region and right: inner region of THR snapshot.Note the water mediated Glu102-threonyl ammonium salt bridge, included to allow proton transfer.
of the THR OH group

Figure 4 . 5 :
Figure 4.5: Hydrogen bonding partners of the THR -OH group.2OG O(1) and (2) refer to the two oxygen atoms of the oxoglutarate tail carboxyl group, Glu102 O(1) and (2) to the two oxygens of the Glu102 carboxyl group.The equilibrated time stretch begins at 0 ns.

Figure 4 .
Figure 4.6: Position of ABA (left) and THR (right) following MD equilibration.Note the hydrogen bond between THR's hydroxyl group and 2OG.Note the di↵erent states of the N C ↵ C C torsion.

Figure S2 :
Figure S2: Hydrogen-bonding partners of the THR-A OH group over the course of the MD trajectory.Time zero refers to the start of the equilibrated section of the trajectory.2OG O(1) and O(2) refer to the oxygen atoms of the 2OG tail carboxylate; Glu102 O(1) and O(2) to the carboxylate oxygens of Glu102.

Figure 4 . 7 :Figure 4 . 7 :Figure 4 . 8 :
Figure 4.7: 2OG hydrogen bonding network (from simulation with bound NVA, not shown)Figure S3: Representative snapshot illustrating the hydrogen-bonding network around the 2OG carboxylate tail as found in the equilibrated structures of ABA-A and NVA-A.

Figure S5 :
Figure S5: Channels in SyrB2 THR-A (T 1 shown in red, T 2 shown in green).

Figure 5 . 6 :
Figure 5.6: Oxygen adduct (C) for NVA (left) and THR and ABA (right) the same isomer, D1 (Figure 5.14).Scans on the other two spin surfaces are shown in Figures 5.10 and 5.11 for ABA and THR respectively.Energy profiles for the latter are shown in Figure 5.15.The two scans on the septet surface underwent a similar mechanism to those on the quintet surface, with the notable exception that the bicyclic intermediate G was not observed, and a short-lived intermediate J (Figure 5.9) was formed immediately after decarboxylation.J retains a charge on the CO2, corresponding to a CO2 anion coordinated to an Fe(III) centre.Following decarboxylation, species H and D were observed, with similar structures to their analogues on the quintet surface, the end product again being D1.The two scans on the triplet surface were very di↵erent from one another, as might be expected from the radically di↵erent electronic structure of 3 C formed in the presence of THR and ABA.The THR triplet scan underwent a mechanism very similar to that on the quintet surface, with G and H both observed prior to the formation of D1.For the ABA triplet scan, however, the mechanism was entirely di↵erent.No intermediate was detected between C and D, with the scan rising to over 151 kJ/mol before forming D2 (Figure5.13).This was the only scan to lead to a structure other than D1.The structure of D1 from the three scans on the quintet surface is shown in Figure5.14.Hydrogen bonds from the terminal hydrogens of Arg254 hold the ferryl and carboxyl

Figure S10 :
Figure S10: QM/MM-optimised structures of the O 2 complexes in the quintet state, 5 B. Selected bond lengths are given in Å.

Figure 5 . 5 :
Figure 5.5: Electronic structures of the oxygen adduct C

Figure S11 :
Figure S11: Schematic representation of the electronic structures of the O 2 complexes with S = 1, 2, 3.For THR and ABA (S = 3) and NVA (S = 2), the other direct donor atoms together carry another ∼0.5 majority spins, which is included in the Fe spin count in the schematic.

Figure S12 :
Figure S12: Structure of NVA-5 B, with the Fe-(C 5 )H distance and the O p -Fe-(C 5 )H angle highlighted in green and orange, respectively.

Figure S13 :
Figure S13: Energetic profiles of the reaction coordinate scans for THR on the triplet, quintet, and septet surfaces.Large, labelled symbols denote optimised intermediates.Scans and optimisations done at B3LYP-D3/def2-SVP/MM level.

Figure S14 :
Figure S14: Energetic profiles of the reaction coordinate scans for ABA on the triplet, quintet, and septet surfaces.Large, labelled symbols denote optimised intermediates.Scans and optimisations done at B3LYP-D3/def2-SVP/MM level.

Figure S16 :
FigureS16: QM/MM-optimised structures of the preferred [Fe=O] species, 5 D1, for the three substrates.In all three structures, Arg254 is hydrogen-bonding with one "arm" each to the oxido oxygen and the free carbonyl oxygen of the succinate ligand.

Table S1 :
Protonation states of histidine residues

Table S2 :
Equilibration procedure for MD simulations with Fe(II) active-site complex -H 2 O

Table S3 :
The ten docked poses of NVA with highest affinity

Table S4 :
The ten docked poses of ABA with highest affinity

Table S6 :
Channels leading to active site.r min is the bottleneck radius, L is the average length.ABA yielded two routes through the main channel, above and below the substrate.

Table S7 :
1 H-bond in ABA-D2.However, based on these limited data, we can neither support nor reject Wong et al.'s proposal.Prevalence of hydrogen-bond coordination numbers to oxido.

Table 5 . 3
: Geometric parameters of the oxygen adduct C. *THR/S=1 bond lengths come from optimisation at def2-SVP basis level

Table S9 :
Fe-H distances and O-Fe-H angles in O 2 complexes B from QM/MM optimisations and HYSCORE experiments 23 .