Switching the proton-coupled electron transfer mechanism for non-canonical tyrosine residues in a de novo protein

The proton-coupled electron transfer (PCET) reactions of tyrosine (Y) are instrumental to many redox reactions in nature. This study investigates how the local environment and the thermodynamic properties of Y influence its PCET characteristics. Herein, 2- and 4-mercaptophenol (MP) are placed in the well-folded α3C protein (forming 2MP-α3C and 4MP-α3C) and oxidized by external light-generated [Ru(L)3]3+ complexes. The resulting neutral radicals are long-lived (>100 s) with distinct optical and EPR spectra. Calculated spin-density distributions are similar to canonical Y˙ and display very little spin on the S–S bridge that ligates the MPs to C32 inside the protein. With 2MP-α3C and 4MP-α3C we probe how proton transfer (PT) affects the PCET rate constants and mechanisms by varying the degree of solvent exposure or the potential to form an internal hydrogen bond. Solution NMR ensemble structures confirmed our intended design by displaying a major difference in the phenol OH solvent accessible surface area (≤∼2% for 2MP and 30–40% for 4MP). Additionally, 2MP-C32 is within hydrogen bonding distance to a nearby glutamate (average O–O distance is 3.2 ± 0.5 Å), which is suggested also by quantum mechanical/molecular mechanical (QM/MM) molecular dynamics simulations. Neither increased exposure of the phenol OH to solvent (buffered water), nor the internal hydrogen bond, was found to significantly affect the PCET rates. However, the lower phenol pKa values associated with the MP-α3C proteins compared to α3Y provided a sufficient change in PT driving force to alter the PCET mechanism. The PCET mechanism for 2MP-α3C and 4MP-α3C with moderately strong oxidants was predominantly step-wise PTET for pH values, but changed to concerted PCET at neutral pH values and below when a stronger oxidant was used, as found previously for α3Y. This shows how the balance of ET and PT driving forces is critical for controlling PCET mechanisms. 
The presented results improve our general understanding of amino-acid based PCET in enzymes.

pKa Determinations for 2MP-α3C and E13A variant
pH titrations were performed using a Cary 5000 UV-Vis spectrophotometer and a 1 × 1 cm quartz cuvette. Equal-volume titrations were conducted by mixing a high-pH solution with a low-pH solution, each containing the same protein concentration and 20 mM phosphate buffer. The pH was measured before and after each recorded spectrum. The pKa was obtained by fitting the ΔOD [298 nm (2MP-α3C absorption maximum) − 400 nm (baseline)] vs. pH plot to a single pKa using Equation S1.
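Equation S1 is not reproduced in this text, so the following is only a minimal sketch of a single-pKa fit, assuming the usual Henderson-Hasselbalch form ΔOD(pH) = A_acid + (A_base − A_acid)/(1 + 10^(pKa − pH)). The function names, the pKa search range, and the grid-search strategy are illustrative assumptions, not the authors' fitting procedure.

```python
def deprotonated_fraction(pH, pKa):
    """Fraction of phenolate at a given pH for a single titratable site."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def fit_single_pka(pH_values, delta_od, lo=6.0, hi=13.0, step=0.001):
    """Grid search over pKa; the two plateau values are solved by linear least squares.

    Returns (pKa, A_acid, A_base). The search range lo..hi is an assumption.
    """
    n = len(pH_values)
    mean_y = sum(delta_od) / n
    best = None
    for i in range(int(round((hi - lo) / step)) + 1):
        pKa = lo + i * step
        x = [deprotonated_fraction(p, pKa) for p in pH_values]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # no variation in x, the linear solve is undefined
        slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, delta_od)) / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((intercept + slope * xi - yi) ** 2 for xi, yi in zip(x, delta_od))
        if best is None or sse < best[0]:
            best = (sse, pKa, intercept, intercept + slope)
    _, pKa, a_acid, a_base = best
    return pKa, a_acid, a_base
```

Because the model is linear in the two plateau absorbances once the pKa is fixed, only the pKa needs to be scanned, which keeps the sketch free of external fitting libraries.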

Excluding OH⁻ as Primary Proton Acceptor
Following the same argument published in the SI of ref 3, summarized below, we exclude OH⁻ as primary proton acceptor for a concerted PCET mechanism when [Ru(bpy)₃]³⁺ or [Ru(deeb)₃]³⁺ were used as oxidants.
The concerted PCET reaction from MP-α3C with OH⁻ as the primary proton acceptor can be analyzed as OH⁻ reacting with the [Ru(III)…MP-α3C] encounter complex (where Ru(III) can be either [Ru(bpy)₃]³⁺ or [Ru(deeb)₃]³⁺):

$$\mathrm{Ru(III)} + \text{MP-}\alpha_3\mathrm{C} \;\underset{k_{-d}}{\overset{k_{d}}{\rightleftharpoons}}\; [\mathrm{Ru(III)}\cdots\text{MP-}\alpha_3\mathrm{C}] \tag{S1}$$

$$[\mathrm{Ru(III)}\cdots\text{MP-}\alpha_3\mathrm{C}] + \mathrm{OH^-} \;\xrightarrow{k_{\mathrm{OH^-}}}\; \mathrm{Ru(II)} + \text{MP}^{\bullet}\text{-}\alpha_3\mathrm{C} + \mathrm{H_2O} \tag{S2}$$

This reaction is sufficiently downhill such that it can be treated as irreversible. The steady-state approximation then yields the following expression:

$$\text{rate} = k_{obs}\,[\mathrm{Ru(III)}]\,[\text{MP-}\alpha_3\mathrm{C}] \tag{S3}$$

where the pseudo-second-order rate constant k_obs is defined as:

$$k_{obs} = \frac{k_{d}\,k_{\mathrm{OH^-}}[\mathrm{OH^-}]}{k_{-d} + k_{\mathrm{OH^-}}[\mathrm{OH^-}]} \tag{S4}$$

Assuming that there is no specific driving force for formation of the complex in reaction S1, i.e., ΔG° = 0, we get k_d/k_−d = 1 M⁻¹.

If we assume that the diffusional rate constant k_diff equals 1 × 10¹⁰ M⁻¹ s⁻¹ in water, this is the maximum value that k_OH⁻ can take (it is probably smaller given that the protein is large and therefore slow to diffuse). k_OH⁻[OH⁻] is very likely much smaller than k_−d, which simplifies Eq. S4 to:

$$k_{obs} = \frac{k_{d}}{k_{-d}}\,k_{\mathrm{OH^-}}[\mathrm{OH^-}] \tag{S5}$$

where

$$\frac{k_{d}}{k_{-d}} = 1\ \mathrm{M^{-1}} \tag{S6}$$

This allows us to compare the measured k_obs = k_PCET rate constants with [OH⁻]. The results show that even at the highest pH values, where [OH⁻] is the largest, k_OH⁻ cannot account for the rate constants: k_PCET = 1.8(±0.08) × 10⁶ M⁻¹ s⁻¹ (for 2MP-α3C/[Ru(bpy)₃]³⁺ at pH 9.0) and [OH⁻](pH 9) = 1 × 10⁻⁵ M, making k_OH⁻ > 1 × 10¹¹ M⁻¹ s⁻¹, which is faster than a diffusion-controlled reaction. While this derivation is based on several assumptions, k_OH⁻ is likely much smaller than the diffusion-controlled rate constant assumed for small molecules in water (k_diff = 1 × 10¹⁰ M⁻¹ s⁻¹) given the size of the protein.

4MP-α3C structures were generated from the experimental NMR restraints listed above by simulated annealing. NOE-derived proton−proton distance restraints were grouped in distance ranges of 1.7−3.0, 1.7−4.0, and 1.7−5.0 Å corresponding to strong, medium, and weak NOE cross-peak intensities, respectively. When one or two methyl groups were involved, the upper boundary was increased by 0.5 or 1.0 Å, respectively. Backbone dihedral angle and hydrogen bond restraints were derived from the secondary structure predictions made by TALOS+. One thousand trial structures were generated and further evaluated using the CNS accept.inp script to obtain a collection of refined structures. The 32 lowest-energy structures from this collection form the deposited 4MP-α3C structural ensemble (PDB ID 8VSW, BMRB ID 31067).
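The NOE restraint binning described above can be written as a small lookup. This is an illustrative sketch only: the function and dictionary names are hypothetical, and it is not the CNS restraint format or the authors' script.

```python
# Upper distance bounds (in Å) for each NOE cross-peak intensity class, as stated in the text.
NOE_UPPER_BOUND = {"strong": 3.0, "medium": 4.0, "weak": 5.0}
NOE_LOWER_BOUND = 1.7  # common lower bound in Å for all classes

# Increase of the upper bound when one or two methyl groups are involved.
METHYL_CORRECTION = {0: 0.0, 1: 0.5, 2: 1.0}


def noe_bounds(intensity, n_methyl=0):
    """Return (lower, upper) distance bounds in Å for one proton-proton restraint."""
    upper = NOE_UPPER_BOUND[intensity] + METHYL_CORRECTION[n_methyl]
    return NOE_LOWER_BOUND, upper
```

For example, a weak cross-peak between two methyl groups maps to the range 1.7 to 6.0 Å under these rules.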

Calculation of Spin Density and Mulliken Spin Populations
The spin densities were calculated for optimized geometries of the 4MP-α3C, 2MP-α3C, and α3Y side chain analogs in their neutral and cationic radical states with unrestricted DFT using Gaussian 16 [4]. The spin densities were visualized, and the Mulliken spin population [5] values were computed. To examine whether the observed trends depend on the DFT functional or basis set, we optimized the geometries and calculated the spin populations with three different functionals, ωB97X-D [6], B3LYP-D3(BJ) [7-9], and M06-2X [10], with the 6-31G** and 6-31+G** basis sets. Additionally, the 6-31++G** basis set [11-13] was used with the ωB97X-D functional. A tight convergence criterion (Opt=Tight) was used, and the optimized geometries were confirmed to be minima because they have no imaginary frequencies. The values of the spin populations for the two sulfur atoms and oxygen atoms in the sidechains at these levels of theory are given in Table S7. The spin densities for the DFT calculations at the ωB97X-D/6-31G** level of theory are visualized in Figure S14.
The spin densities and spin populations were also determined for these species with the complete active space self-consistent-field (CASSCF) method [14,15]. The geometries used for these calculations were optimized with DFT at the ωB97X-D/6-31+G** level of theory (Table S8). The active spaces were chosen with the automated π-orbital space (PiOS) method [16], including the heavy atoms of the aromatic ring and adjacent sulfur and oxygen atoms, which resulted in a (9e,8o) active space for the 4MP-α3C and 2MP-α3C sidechain models, and a (7e,7o) active space for the Y sidechain model. An aug-cc-pVTZ basis set was used for the CASSCF calculations [17,18], and the PySCF program was used to perform the computations [19,20]. The spin densities for the CASSCF calculations are shown in Figure 6 in the main paper.
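The DFT levels of theory listed in this section (three functionals with two basis sets, plus ωB97X-D with 6-31++G**) can be enumerated programmatically. The sketch below builds one Gaussian-style route line per level. The keyword spellings (`UwB97XD`, `UB3LYP` with `EmpiricalDispersion=GD3BJ`, `UM062X`) are our assumption of how these methods are commonly requested in Gaussian 16; the authors' actual input files are not given in the text.

```python
# Functional label -> assumed Gaussian keyword(s); the "U" prefix requests unrestricted DFT.
FUNCTIONALS = {
    "wB97X-D": "UwB97XD",
    "B3LYP-D3(BJ)": "UB3LYP EmpiricalDispersion=GD3BJ",
    "M06-2X": "UM062X",
}
BASIS_SETS = ["6-31G**", "6-31+G**"]


def levels_of_theory():
    """All functional/basis combinations described in the text (seven in total)."""
    levels = [(f, b) for f in FUNCTIONALS for b in BASIS_SETS]
    levels.append(("wB97X-D", "6-31++G**"))  # extra basis set used only with wB97X-D
    return levels


def route_line(functional, basis):
    """Build a route line for a tight optimization followed by a frequency check."""
    method, _, extra = FUNCTIONALS[functional].partition(" ")
    parts = ["#", "Opt=Tight", "Freq", f"{method}/{basis}"]
    if extra:
        parts.append(extra)
    return " ".join(parts)


if __name__ == "__main__":
    for functional, basis in levels_of_theory():
        print(route_line(functional, basis))
```

The frequency keyword is included because the text states that minima were confirmed by the absence of imaginary frequencies.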

Table S7. Mulliken spin populations on key atoms of side chain analogs at various levels of theory a
a "O" refers to the hydroxyl oxygen of the sidechain, "S" refers to the sulfur atom closest to the phenol ring, and "S2" refers to the sulfur atom most distal from the phenol ring, i.e., closest to the backbone.
b "Total S" refers to the total spin population on the sulfur atoms in the molecule. Some of these results are also provided in Table 2 in the main paper.

Investigation of Donor-Acceptor Distances Between 4MP-α3C, 2MP-α3C, and α3Y Side Chain Analogs and Water
Gas phase geometries for the sidechain models hydrogen bonded to water were optimized at the DFT/B3LYP/6-311++G** level of theory, as shown in Figure S15. The geometries were confirmed to be minima because no imaginary frequencies were observed, and a tight convergence criterion (Opt=Tight) was used. Additionally, the geometries were optimized in implicit solvent using the integral equation formalism polarizable continuum model [21-23] with water as the solvent. For comparison, the calculations were also performed with the ωB97X-D functional. Distances and angles for the hydrogen bond to water are given in Table S9.

MD Simulations with AMBER ff14SB force field
For the 2MP-α3C [24] and 4MP-α3C systems, two frames of the NMR ensemble were chosen randomly to use as starting points for solvation and equilibration of two independent trajectories per system (frames 3 and 28 for each system). The starting structures were solvated with TIP3P water [25], Cl⁻ ions were added to neutralize the +2 charge, and then Na⁺/Cl⁻ ions were added to produce a salt concentration of 150 mM.
As the 4MP and 2MP sidechains are not part of the standard protein force field, partial charges were assigned by the RESP procedure [26]. In this protocol, the geometries of the sidechains and added N-methyl and acetyl blocking groups were optimized with the Hartree-Fock method and the 6-31G* basis set [11,12]. Two conformers were generated with φ and ψ angles corresponding to a β-sheet or corresponding to an α helix, following the precedent set by Ref. 27. This workflow was followed to be consistent with the existing force field parameters. The atomic charges of the backbone atoms (C, O, N, H) were fixed to the corresponding charges of the canonical cysteine in the AMBER library file during the RESP fitting procedure. A group constraint on the blocking N-methyl and acetyl groups was used to force a neutral charge for the 2MP and 4MP sidechains. Partial charges and force field parameters are provided in Table S10 according to the atom naming given in Figure S16, and the optimized geometries used for the RESP procedure are given in Table S11.

CA-S -S -2C    1     0.000     0.000   3.000   same as X -ss-ss-X
CA-CA-S -S     2     0.800   180.000   2.000   same as X -ca-ss-X
C -CA-S -S     2     0.800   180.000   2.000   same as X -ca-ss-X
CA-CA-CA-S     4    14.500   180.000   2.000   same as X -ca-ca-X

The systems then underwent the protocol described below in the same fashion as our previous paper [3] to equilibrate the added solvent and ions and provide initial minimization of the protein in the solvated environment. The restraints used during equilibration were harmonic with the force constants indicated. Bonded terms for the MP sidechains were adapted by analogy from existing parameters in the ff14SB force field [28]. No hydrogen atoms needed to be added to the structure, as the locations were already assigned by the NMR experiments.
For these simulations, electrostatic interactions were treated with the Particle Mesh Ewald method [29], and the van der Waals cut-off was set to 10 Å. The integration time step for all MD simulations was 1 fs, and the Langevin thermostat was used to control the temperature with a 2.0 ps⁻¹ collision frequency. For the NPT ensemble, the Berendsen barostat was used [30]. Bond lengths involving hydrogen were constrained by the SETTLE algorithm [31] for the water molecules and the SHAKE algorithm [26] for the protein. AMBER20 [32] was used to perform the simulations, and the ff14SB force field was used to treat the protein.

Minimization/Equilibration

Equilibration of Protein/Solvent System
9. Heat system to 300 K in NPT ensemble over 360 ps in 60 ps increments, where the system is heated by 50 K for 10 ps and then equilibrated at that temperature for 50 ps.
10. NPT equilibration at 300 K for 20 ns.
11. NVT equilibration at 300 K for 100 ns.

Production
12. NVT MD for 1 μs

A visual comparison of the NMR structures and MP-α3C systems from the production MD trajectories is provided in Figure S17. As the force field parameters to describe the sulfur-aromatic carbon bonds, angles, and dihedrals were determined by analogy from existing parameters and were not widely tested, we performed additional simulations restraining the CB-S-S1-CG dihedral in both sidechains to the value found in the NMR ensemble using a 200 kcal/(mol·rad²) force constant for the restraint. We found that the simulations with and without these restraints exhibited similar RMSDs despite sampling a much larger range of CB-S-S1-CG dihedral values compared to the NMR structures (Figures S18 and S19). The observed hydrogen-bonding interactions were also similar for the simulations with and without the dihedral restraints (Table S12).

a Hydrogen bonding criteria were set as a heavy atom donor-acceptor distance less than or equal to 3.0 Å and a donor-hydrogen-acceptor angle greater than or equal to 135°. Hydrogen-bonding interactions with water included 2MP or 4MP each acting as a hydrogen bond acceptor or a hydrogen bond donor, where a fraction greater than one is possible due to more than one hydrogen bond forming. Hydrogen bonds to glutamate residues can be made to either carboxylate oxygen, and the number provided is the sum of the two possibilities.
b The term "rest." denotes the simulations performed with a harmonic restraint on the dihedral angle CB-S-S1-CG.
c For previous MD simulations of the α3Y protein, the percentage of a given trajectory with Y32 forming at least one hydrogen bond to a water molecule was computed to be 38.2% and 27.7% for two independent trajectories. For previous MD simulations of the α3Y protein, the percentage of a given trajectory with Y32 forming at least one hydrogen bond to E13 was computed to be 24.0% and 21.6% for two independent trajectories [3].
d N.D. stands for not detected.
e Plots of the distribution of donor-acceptor distance and donor-hydrogen-acceptor angle for the 2MP-C32 hydrogen bond with E13 or E33 for the 2MP-α3C NMR ensembles as well as for MD trajectories are provided in Figures S20 and S21.
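The staged heating in step 9 of the equilibration protocol can be written out as an explicit list of ramp and hold segments. This is a minimal sketch with hypothetical names. The 0 K starting temperature is our assumption, inferred from six 50 K increments reaching 300 K in 360 ps; the text does not state the starting temperature.

```python
def heating_schedule(target_K=300.0, step_K=50.0, ramp_ps=10.0, hold_ps=50.0, start_K=0.0):
    """Return a list of (kind, t_start_ps, t_end_ps, T_start_K, T_end_K) segments.

    Each increment heats by step_K over ramp_ps and then holds for hold_ps.
    start_K = 0 is an assumption, not a value given in the protocol.
    """
    segments = []
    time_ps, temp_K = 0.0, start_K
    while temp_K < target_K:
        next_K = min(temp_K + step_K, target_K)
        segments.append(("ramp", time_ps, time_ps + ramp_ps, temp_K, next_K))
        time_ps += ramp_ps
        segments.append(("hold", time_ps, time_ps + hold_ps, next_K, next_K))
        time_ps += hold_ps
        temp_K = next_K
    return segments


if __name__ == "__main__":
    for segment in heating_schedule():
        print(segment)
```

With the default values the schedule contains six ramp segments and six hold segments and ends at 360 ps, matching the total heating time stated in step 9.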

Figure S18. Dihedral angle distributions of the CB-S-S1-CG angle for the two NMR ensembles (top row) compared to the two independent MD trajectories with a restraint on this dihedral angle (second and third rows) and the two independent MD trajectories without this restraint (fourth and fifth rows).
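A dihedral angle such as CB-S-S1-CG can be computed from four Cartesian positions with the standard atan2 formula. The sketch below is a generic geometric implementation with hypothetical function names; it is not the analysis tool used to produce Figure S18.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, e.g. CB, S, S1, CG."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Applying the function to every frame of a trajectory and histogramming the result gives a distribution of the kind shown in Figure S18.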
Figure S19. RMSD of the Cα atoms relative to the starting NMR structure for the production MD simulations. The top row shows the four trajectories propagated for the 2MP-α3C system (two with the dihedral restraint and two without this restraint), and the bottom row shows the four trajectories propagated for the 4MP-α3C system (two with the dihedral restraint and two without this restraint).

to an average of ~2.6 Å, and the hydrogen bond is stable on this time scale. Note that due to the sampling limitations with QM/MM simulations, we do not expect to observe significant sidechain motions, but if the interaction were unfavorable, the donor-acceptor distance would be expected to increase, which was not observed. Thus, the hydrogen bond between E13 and 2MP appears to be stable when treating these residues quantum mechanically.

Figure S23. Donor-acceptor distance (left) and donor-hydrogen-acceptor angle (right) for the E13:OE-2MP:OH hydrogen-bonding interaction, as obtained from a 5 ps QM/MM trajectory. This hydrogen-bonding interaction was maintained over the 5 ps trajectory when the E13 and 2MP residues were treated quantum mechanically.

Figure S1. Abs298 − Abs400 as a function of pH for 2MP-α3C. The figure shows data in black and the corresponding fit in blue. The fit yielded a pKa of 9.7(±0.1).

Figure S2. Abs298 − Abs400 as a function of pH for 2MP-α3C-E13A. The figure shows data in black and the corresponding fit in blue. The fit yielded a pKa of 9.2(±0.1).

Figure S6. Log(kPCET) for the oxidation of 2MP-α3C (orange circles with a solid line fit), 4MP-α3C (teal triangles with a dashed line fit) and 2MP-α3C-E13A (gray squares with a dotted line fit) vs. pH. The fits are to a straight line and yielded a slope of 0.84 for 2MP-α3C and 4MP-α3C, and 0.78 for 2MP-α3C-E13A. Sample conditions: See Tables S1, S2, and S5.
Because of this, the first 3-10 shots were used in analysis. The traces were fit with double exponential equations and only the fast component was used. The fast component represents the ET reaction between the oxidant and the protein, which is coupled to PT. The slower component represents what is likely a mix between different slower reactions such as [Ru(deeb)₃]³⁺ reaction with water and [Ru(deeb)₃]³⁺. It is also possible that the slow component comes from [Ru(deeb)₃]³⁺ formation by reaction with the probe light, which then reacts with any of the above-described reactants as well as the protein.

Figure S16. Atom naming used for MP sidechains. The naming is analogous to that for the Y and C sidechains when possible.

Table S10. Force field parameters and partial charges for simulations containing MP sidechains
MPC params
BOND
S -CA   317.0   1.74   !kb from CA-2C, equil distance from 2LXY structure

Figure S17. NMR ensembles of 2MP-α3C (top) and 4MP-α3C (bottom) compared to the structures obtained from molecular dynamics. (A,F) NMR structures; (B,G) MD conformations for trajectory 1; (C,H) MD conformations for trajectory 1 with dihedral restraints; (D,I) MD conformations for trajectory 2; (E,J) MD conformations for trajectory 2 with dihedral restraints. To obtain the MD conformations, ten conformations were extracted from a 1 μs production trajectory at even intervals.

Figure S20. Donor-acceptor distance histograms for the E13:OE-2MP:OH interaction. (A) NMR structure ensemble; (B) 2MP Traj. 1, which has no restraint on the CB-S-S1-CG dihedral; (C) 2MP Traj. 2, which has no restraint on the CB-S-S1-CG dihedral; (D) 2MP rest. Traj. 2, which has a restraint on the CB-S-S1-CG dihedral. The similarity of the distribution for two independent trajectories, shown in (B) and (C), indicates convergence. In these figures, the histograms are plotted separately for each carboxylate oxygen, and the cross-hatched box in panel A represents the hydrogen bond criteria used in this work. Note that the MD simulations are qualitatively consistent with the NMR structure ensemble, which does not show significant hydrogen bonding between 2MP-C32 and E13 within the hydrogen bonding criteria used in this work.

Figure S21. Donor-acceptor distance histograms for the E33:OE-2MP:OH interaction. (A) NMR structure ensemble; (B) 2MP Traj. 1, which has no restraint on the CB-S-S1-CG dihedral; (C) 2MP Traj. 2, which has no restraint on the CB-S-S1-CG dihedral; (D) 2MP rest. Traj. 2, which has a restraint on the CB-S-S1-CG dihedral. The similarity of the distribution for two independent trajectories, shown in (B) and (C), indicates convergence. In these figures, the histograms are plotted separately for each carboxylate oxygen, and the cross-hatched box in panel A represents the hydrogen bond criteria used in this work. The cross-hatched bars overlaid on the histograms in panels B-D indicate the number of conformations that satisfy the distance and angle criteria for a hydrogen bond used in this work. The MD simulations show more hydrogen bonding with E33 than the NMR structure ensemble, which does not show significant hydrogen bonding between 2MP-C32 and E33.
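The hydrogen bond criteria referenced in these captions (donor-acceptor distance ≤ 3.0 Å and donor-hydrogen-acceptor angle ≥ 135°, as given in the Table S12 footnote) can be applied to a single conformation as sketched below. The function names and the example coordinates are hypothetical; this is not the trajectory analysis code used in this work.

```python
import math


def dha_angle_deg(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_distance=3.0, min_angle=135.0):
    """Apply the distance (Å) and angle (degrees) criteria to one conformation."""
    distance = math.dist(donor, acceptor)  # heavy-atom donor-acceptor distance
    return distance <= max_distance and dha_angle_deg(donor, hydrogen, acceptor) >= min_angle


if __name__ == "__main__":
    # Made-up coordinates in Å: a linear O-H...O arrangement with a 2.8 Å O-O distance.
    print(is_hydrogen_bond((0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (2.8, 0.0, 0.0)))
```

Counting the frames for which the function returns True, separately for each carboxylate oxygen, gives quantities comparable to the cross-hatched bars described above.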

Table S3. The table headings list the mean pH value (±var) calculated as an average between the pH values recorded before and after TA, the 2MP-α3C concentration used for each specific measurement, observed pseudo-first order rate constants, and calculated second order rate constants (±std). Sample conditions: 20-30 μM [Ru(dceb)₃]²⁺, 5 mM persulfate, 100 mM KPi and 40 mM KCl.

Table S4 .
The

Table S5. The table headings list the mean pH value (±var) calculated as an average between the pH values recorded before and after TA, the 2MP-α3C-E13A concentration used for each specific measurement, observed pseudo-first order rate constants, and calculated second order rate constants (±std).

Table S8. Geometries used for CASSCF Mulliken spin population analysis in XYZ format. These geometries were obtained from DFT optimizations at the ωB97X-D/6-31+G** level of theory.

Table S9. Donor-acceptor distances and angles for the hydrogen bond of the sidechain model to water. These values were obtained in the gas phase; the values in parentheses were calculated with implicit solvation.

Table S11. Cartesian coordinates in xyz file format of geometries used for RESP fitting of partial charges.

Table S12. Percentage of MD trajectories with specified hydrogen-bonding interactions involving 2MP or 4MP. a