Computationally guided bioengineering of the active site, substrate access pathway, and water channels of thermostable cytochrome P450, CYP175A1, for catalyzing the alkane hydroxylation reaction

Understanding structure–function relationships in proteins is pivotal in their development as industrial biocatalysts. In this regard, rational engineering of protein active site access pathways and various tunnels and channels plays a central role in designing competent enzymes with high stability and enhanced efficiency. Here, we report the rational evolution of a thermostable cytochrome P450, CYP175A1, to catalyze the C–H activation reaction of longer-chain alkanes. A strategy combining computational tools with experiments has shown that the substrate scope and enzymatic activity can be enhanced by rational engineering of certain important channels such as the substrate entry and water channels along with the active site of the enzyme. The evolved enzymes showed an improved catalytic rate for hexadecane hydroxylation with high regioselectivity. The Q67L/Y68F mutation showed binding of the substrate in the active site, water channel mutation L80F/V220T showed improved catalytic activity through the peroxide shunt pathway and substrate entry channel mutation W269F/I270A showed better substrate accessibility to the active pocket. All-atom MD simulations provided the rationale for the inactivity of the wild-type CYP175A1 for hexadecane hydroxylation and predicted the above hot-spot residues to enhance the activity. The reaction mechanism was studied by QM/MM calculations for enzyme–substrate complexes and reaction intermediates. Detailed thermal and thermodynamic stability of all the mutants were analyzed and the results showed that the evolved enzymes were thermally stable. The present strategy showed promising results, and insights gained from this work can be applied to the general enzymatic system to expand substrate scope and improve catalytic activity.


Materials:
Component of bacteriological media (Tryptone, Yeast extract, Sodium chloride salt, Agar) were purchased from Himedia India.Isopropyl-β-D-1-thiogalactopyranoside (IPTG) was purchased from Bio Basic Canada Inc. Ni-NTA beads and plasmid isolation kit were purchased from QIAGEN.Ampicillin, Kanamycin, Chloramphenicol, hexadecane, octadecane, β Nicotinamide adenine dinucleotide, reduced dipotassium salt (NADPH), Lysozyme (from chicken egg white) and 4-amino luvolinic acid were purchased from Sigma-Aldrich.Hydrogen peroxide (H2O2) was purchased from SRL India.GC-MS grade solvents like ethyl acetate, chloroform and hexane were purchased from Merck.All the chemicals were used as supplied without any further purification until unless stated.

Bacterial Strain and plasmids:
E. coli XL10 Gold cells from agilent were used for plasmid DNA amplification and isolation.
E.coli BL21(DE3) codon plus RP cells from agilent were used for protein expression.
pRSFDuet-1 encoding for gene CYP175A1 from Thermus thermophilus HB27 was previously cloned and used.

Expression of wild type and mutants of CYP175A1:
BL21-DE3 codon plus RP cells were transformed with the expression plasmid pRSFDuet-1 encoding gene for CYP175A1.The transformed cells were grown overnight in 5 mL LB medium containing 34μg/mL chloramphenicol and 50 μg/mL kanamycin at 37 °C 200 rpm.The overnight grown culture was inoculated in 5 litter LB medium containing 34μg/mL chloramphenicol and 50μg/mL kanamycin and incubated at 37 °C 200rpm till the O.D. 600 reached 0.6.-0.8.Expression of proteins were induced by adding 1mM IPTG and the culture was grown at 37 °C 200rpm for 16 hours.The cells were harvested by centrifuging at 5000rpm, 4 ͦ C for 45 minutes.Pellet was stored at -20 ͦ C refrigerator.

Molecular Docking Protocol:
All the docking simulations were performed using protein crystal structure of CYP175A1 obtained from the protein databank entry code 1N97.Before starting docking calculations we have removed all the undesired residues like water, sulphate ions, ethylene glycol etc. from the crystal structure whereas additionally we have added all the hydrogen atoms in the protein.No hydrogen was added to sulphur atom of cys336 and kept as negatively charged.Mutations in the protein chain were made in Chimera using Dunbrack backbone-dependent rotamer library.
Docking was performed using GOLD 1 (Genetic Optimization for Ligand Docking), version 5.3.0, in combination with the Chemscore scoring function 2 parameterized for heme containing proteins.
All the docking poses were rescored using ChemPLP 3 scoring function to be ensure the reliability of results.The substrate binding site for docking was defined as the cavity in the distal pocket of the enzyme.The radius from this point was set to 1.5 nm to include the substrate access channel in the accessible volume for the docking.At most, 1000000 operations in the genetic algorithm were performed using a population of 500 genes.At least five independent docking simulations were performed to reach statistical significance.At least 10 poses were stored from every docking simulation, except if the best three docking poses had root-mean-square deviations smaller than 0.15 nm.To prevent an early convergence, a relative pressure of 1.1 was specified, and to account for diversity, the number of niches was set to 2. Ligand flexibility was specified as follows: The flipping of the free corners of ligand rings (if any), the flipping of amide bonds, and the flipping of planar nitrogen were allowed.Intramolecular hydrogen bonds were also allowed.To check the reliability of our docking methodology we performed control docking using CYP102A1and palmitoleic acid as well as CYP101 and 1s-Camphor.Docking results were very much similar to the crystal structure of the crystal structure of palmitoleic acid bound to CYP102A1 (PDB ID. 1FAG).Highest scoring pose had a RMSD of 0.2 Å with the experimental enzyme substrate complex (1FAG).In case of CYP101 substrate bound complex (2CPP) and camphor docking results were even more reliable with the RMSD of 0.01 Å.

Definition of ligand binding site in WT and mutants of CYP175A1:
6        The best-docked poses were selected for the MD Simulations.The missing hydrogen atoms of the enzyme (PDB id: 1N97) were added using the Leap Module of the Amber 20 MD package using the Amber ff19SB force field 4 .The heme parameters for the heme were developed by Cheatham et al, compatible with the Amber forcefiled and we used the same in our study 5 .Parameters for the substrate, hexadecane, were prepared using an antechamber module of the Amber MD package using Generalized Amber Force Field 2 (GAFF2).Partial charges of the substrate were calculated using the RESP (restrained electrostatic potential) method 6 of a QM optimized geometry at HF/6-31 G (d, p) level of theory.After system parameterization, the enzyme-substrate complex was soaked into a TIP3P water box extending up to 12Å from the protein boundary.An appropriate number of counter ions were added to neutralize the charge of the system.
After proper system setup, the complex was minimized in two steps; in the first step solvent minimization was performed and in the second step entire system was minimized without any restraint using 5000 steps of steepest descent followed by 5000 steps of conjugates gradient algorithm.The system was then gently heated from 0 to 300 K for 50 ps using the NVT ensemble, followed by 1 ns simulation at the NPT ensemble at 300 K and 1.0 atm using the Langevin thermostat 7 and Berendsen barostat 8 with a collision frequency of 2 ps and a pressure relaxation time of 1 ps.The equilibrated system was further subjected to a further productive run for 100ns.The Monte Carlo barostat was used during all production MD simulations.Moreover, replica simulations were performed to ensure the consistency of the obtained results.The SHAKE algorithm 8 was employed to constrain the hydrogen bonds, while particle mesh Ewald (PME) 9 and appropriate cutoff distances (12 Å) were used to treat the long-range electrostatic and van der Waals forces, respectively.All MD simulations were carried out in the GPU version of the AMBER20 package 10 .The CPPTRAJ module of the AMBER 20 was used to analyze all the results.A similar protocol was used for all mutants of the CYP175A1 enzyme.

QM/MM Calculations protocol:
The C-H activation mechanism was investigated using QM/MM calculations.QM-region includes heme porphyrin with ligated cysteine, substrate, and a water molecule.For cysteine, we used S-H and therefore we cut the S-C bond as QM/MM interface.All protein residues and water molecules within 8Å of the heme were included in the active region of the QM/MM calculations.The atoms in the active region interact with the QM atoms through electrostatic and van der Waals interactions and the corresponding polarization effects were considered in the subsequent QM energy.All QM/MM calculations were performed with ChemShell 11 , by combining Turbomole 12 for the QM part, and DL_POLY for the MM part.The MM region was treated using the Amber ff19SB force field and the electronic embedding scheme was used to account for the polarizing effect of the enzyme environment on the QM region.
The QM/MM boundary was treated using hydrogen link atoms with the charge shift model 13  (10).During QM/MM geometry optimizations and frequency calculations, the QM region was treated using the hybrid UB3LYP functional 14 with a def2-SVP basis set.All of the QM/MM transition states were located by relaxed potential energy surface (PES) scans followed by full TS optimizations using the P-RFO optimizer implemented in the HDLC code.The zero-point energies (ZPE) for all the species were further corrected and values were added with dispersion correction 15 .The final energy was further corrected by single-point energy calculations with a higher basis set at UB3LYP/def2-TZVP+ZPE level of theory.The basis set and QM method were chosen on the basis of several similar previous studies in P450 chemistry. 16The Cpd I (RC) mediated P450 reactions can be observed in two different spin states that is doublet and quartet.It is noteworthy that our calculations suggest that the reaction will not proceed through the quartet spin state since the barrier is higher compared to that in the doublet state., therefore all results were taken from the doublet states, which is the favoured one.However, the calculations for the quartet spin state have been added in Figure S6 for comparison.Provided geometry are for truncated QM zone.

Protein purification protocol:
The cell pellet was taken out from -20 °C and re-suspended in lysis buffer (10 mL of 50mM tris with 150mM NaCl, 1%Triton X100, 1mg/mL Lysozyme, 1mM EDTA, 1mM AEBSF-for pellet of 2 litter culture) and kept on rotary at 4 °C for 1 hour.The cell lysate was disrupted by sonication with 40% amplitude for 1 hour with 30 sec on -30 sec off on ice bath.The cell debris was removed by centrifugation at 17000rpm, 4 °C for 60 minutes.The supernatant was poured into 7 mL equilibrated Ni-NTA beads.Kept beads on rotary for 10-12 hours at 4 °C.
The beads were then poured into Ni-NTA affinity column and flow through was collected in a 50 mL falcon.Beads were washed with 40 mL binding buffer (50 mM Tris, 150 mM NaCl, 1% Glycerol pH8).Again beads were further washed with 30mL wash1 buffer containing binding buffer with 30mM imidazole.Further washed beads with 30mL wash2 buffer containing binding buffer with 50mM imidazole.Finally protein was eluted in 15 mL elution buffer containing binding buffer with 250mM imidazole.Purified protein solution was concentrated using 10 KDa membrane filter by centrifuging 2-3 times at 3500 rpm for 10 minutes.Protein was further purified by size exclusion chromatography using Superdex 75PG column.

PCR reaction protocol:
The site directed mutagenesis of the CYP175A1 gene were carried out using Platinum Superfi DNA polymerase PCR kit from Invitrogen.The plasmid was isolated from the freshly transformed XL10-Gold E.coli cells and the purity was checked by running agarose gel electrophoresis (1% agarose).Ethidium Bromide was used for staining and 1kb DNA ladder

CO Binding spectra:
CO-binding spectrum (Figure S9) was recorded according to earlier published reports. 17Briefly, Carbon monoxide was prepared by reacting sulphuric acid (H 2 SO 4 ) with formic acid (HCOOH).
Fumes of carbon monoxide were purged into CYP175A1 protein solution in a cuvette and few grains of sodium dithionite were added.This solution was then analysed for shifting of soret band to 450 nm by UV-VIS spectroscopy.

Catalytic reaction condition:
Reaction conditions for alkane hydroxylation were same of wild type and all the mutants of CYP175A1.In a typical reaction system 10 μM of enzyme was incubate with 1mM hexadecane solution for 20min.Monooxygenation reaction was started by adding 10mM H 2 O 2 .Stock solutions of the substrates was prepared using ethanol as solvent.Total reaction volume was 1ml was added in a 20ml screw cap scintillation vial and stirred with magnetic bead in it.
Reaction vials were incubated at 50 °C with constant stirring and products were analysed after 24 hours using GC-MS and GC-FID.All the experiments were performed at least in triplicate.
All the catalytic reactions were performed under the similar conditions until otherwise stated.

Control experiments:
Three sets of reactions as the control experiments were performed.In first case, enzyme (10 uM) was mixed with substrate (1mM) and the reaction mixture was incubated at 50 °C for 24 hours and the reaction was analysed using GC-MS and GC-FID.In the second set of control experiment, substrate (1mM) was mixed with H 2 O 2 (10mM) and the mixture was incubated at 50 °C for 24 hours.In the third control experiment, enzyme (1uM) was mixed with H2O2 (10mM) and the mixture was incubated at 50 °C for 24 hours.All the control experiments were analysed by GC-MS and GC-FID.

Determination of substrate consumption and product yield:
Calibration curve for area v/s concentration for the substrate (Hexadecane) and internal standard (octadecane) were plotted as shown in Figure S11 and the substrate consumption was determined by comparing the relative peak area ratios of substrate and internal standard.This

Figure S2 :
Figure S2: a) active site cavity of the wild-type CYP175A1 (PDB ID 1N97) showing a "U" type conformation (shape modelled in red colour for representation) b) Active site cavity of CYP102A1 (PDB ID 1FAG) showing the substrate bounds in the cavity.

Figure S4 :
Figure S4: Highlighted residues represents the binding site defined in the crystal structure of wild type CYP175A1 for the docking studies.

Figure S3 :
Figure S3: Control docking experiment performed on CYP101A1 (cyan colour represents docked camphor and golden colour represents crystal structure of camphor).

Figure S5 .
Figure S5.Hexadecane (maroon colour) binding conformation in the binding site of various mutants of CYP175A1.Data showing the pose of highest scoring docking solution.Figures were made in chimera.

Figure S7 :
Figure S7: RMSD (in Å) for protein backbone od the wild-type and mutants of CYP175A1 during MD simulations.

( 15 .Figure S8 :
Figure S8: Raw data of Plasmid sequencing.A) Forward primer for the W269F/I270A mutation (above) and corresponding plasmid sequencing data (below).B) Reverse primer for the W269F/I270A mutation (above) and corresponding plasmid sequencing data (below).C)Forward primer for the V220T mutation (above) and corresponding plasmid sequencing data (below).D) Reverse primer for the V220T mutation (above) and corresponding plasmid sequencing data (below).E) Forward primer for the Q67L/Y68F mutation (above) and corresponding plasmid sequencing data (below).F) Forward primer for the Q67L/Y68F mutation (above) and corresponding plasmid sequencing data (below).

Figure S9 :
Figure S9: CO binding spectrum of LF and LFFTFA mutant of CYP175A1.a) CO binding spectrum of LF mutant showing characteristic peak shift from 417 nm to 450nm.b) CO binding spectrum of LFFTFA mutant showing characteristic peak shift from 417 nm to 450nm.c) CO binding difference spectrum of LFFTFA mutant showing characteristic peak maxima at 450nm.
All the reactions were analysed using gas chromatography instrument coupled with flame ionisation detector (GC-FID).Catalytic reactions were stopped at different time interval by adding 20μl of 2M HCl and 10uL of 0.4mM Octadecane solution (in acetonitrile) was added as internal standard.Reaction products were isolated by solvent extraction method using CHCl 3 .CHCl 3 layer was taken and dried over anhydrous Na 2 SO 4 and subjected to GC-MS analysis.All samples were injected at a volume of 1.0 μl, and analyses were performed at least in triplicate.All analysis were performed on Trace GC ultra instrument by Thermo electron corporation system having GsBP-5MS, 30m x 0.25mm x0.25μm column fitted into the machine.A typical GC temperature program for separating the reactant and products was 250°C PTV injector with split flow rate of 50ml/min, 50 °C oven for 2 min, then 15 °C/min gradient to 220 °C hold at 220 °C for 3min, 15 °C/min gradient to 300 °C, and then 300 °C for 6 min.FID parameters were as follows: Flame temperature was 250 °C, zero air= 350ml/min, hydrogen= 35ml/min, Nitrogen=30ml/min.

2 AreaFigure S10 :
Figure S10: GC chromatographs (a-d) for the substrate (hexadecane) and internal standard (octadecane) and peak area calculation for the calibration curve.

Figure S11 :
Figure S11: Calibration curve for the hexadecane for the determination of substrate conversion..

Figure S12 :Figure S14 :
Figure S12: GC-FID chromatograph for the reaction product analysis.Upper trace shows the analysis of control reaction and the lower trace shows the analysis of catalytic reaction catalysed by final mutant LFFTFA of CYP175A1.

Figure S13 :
Figure S13: Mass spectrum of the reaction products identified using GC-MS.a) Mass spectrum of 7-Hexadecanol b) Mass spectrum of 8-Hexadecanol c) Mass spectrum of 6-Hexadecanol.

Figure S15 :
Figure S15: GC-FID chromatogram for the reaction product analysis.Chromatogram shows the product distribution of the reaction catalysed by a) QYLV, b) QYWI, c) QYVWI and d) QYLVWI mutants of CYP175A1.

Figure S16 :
Figure S16: Thermal induced unfolding profiles for secondary structure of the mutants of CYP175A1 obtained from CD at 221 nm in 50 mM Tris buffer, pH 8. ΔH T , ΔS T and ΔG T curves plotted for secondary structure of mutants of CYP175A1.Each point in the thermodynamic parameter curves represents the value at corresponding temperature.The solid line represents nonlinear curve fits to Gibbs-Helmholtz equation.

Table S2 :
Virtual library of double mutant of CYP175A1.Q67 and Y68 residues have been replaced by the non-polar amino acids.

Table S1 :
Amino acids present with in 5Å distance from the hexadecane from the highest scoring docking pose categorised according to their polarity

Table S3 :
Results of hexadecane docking in the virtual library of double mutant of CYP175A1.* docking score represent the Chem score scoring function 1 .# docking pose in the lowest energy docked complex.

Table S4 :
Spin densities of the stationary points included in the QM/MM calculations for the doublet spin states.Calculations were done using B3LYP/def2-TZVP level of theory.

Table S5 :
Spin densities of the stationary points included in the QM/MM calculations for the quartet spin states.Calculations were done using B3LYP/def2-TZVP level of theory.