Inhibitor binding influences the protonation states of histidines in SARS-CoV-2 main protease

The main protease (Mpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an attractive target for antiviral therapeutics. Recently, many high-resolution apo and inhibitor-bound structures of Mpro, a cysteine protease, have been determined, facilitating structure-based drug design. Mpro plays a central role in the viral life cycle by catalyzing the cleavage of SARS-CoV-2 polyproteins. In addition to the catalytic dyad His41–Cys145, Mpro contains multiple histidines including His163, His164, and His172. The protonation states of these histidines and the catalytic nucleophile Cys145 have been debated in previous studies of SARS-CoV Mpro, but have yet to be investigated for SARS-CoV-2. In this work we have used molecular dynamics simulations to determine the structural stability of SARS-CoV-2 Mpro as a function of the protonation assignments for these residues. We simulated both the apo and inhibitor-bound enzyme and found that the conformational stability of the binding site, bound inhibitors, and the hydrogen bond networks of Mpro are highly sensitive to these assignments. Additionally, the two inhibitors studied, the peptidomimetic N3 and an α-ketoamide, display distinct His41/His164 protonation-state-dependent stabilities. While the apo and the N3-bound systems favored Nδ (HD) and Nϵ (HE) protonation of His41 and His164, respectively, the α-ketoamide was not stably bound in this state. Our results illustrate the importance of using appropriate histidine protonation states to accurately model the structure and dynamics of SARS-CoV-2 Mpro in both the apo and inhibitor-bound states, a necessary prerequisite for drug-design efforts.


Parametrization of ketoamide
The initial parameters were taken from CGenFF website, which also provides penalties for assigned partial charges and parameters. At first, only charges with penalties higher than 10 were optimized by matching the energies and distances of interactions with water molecules at QM and MM levels, as advised for CHARMM compatible parameters. A few atoms with penalties less than 10 were added for the final charge optimization in order to improve the quality of the fit. The force field Toolkit (ffTK) was used for all charge and parameter optimizations and Gaussian16 was used for all QM calculations. Because the ketoamide molecule was too large for calculations at the MP2 level of theory, we used a smaller model (ketoamide model I) that still contained all atoms needing optimization ( Figure S1). All geometries were optimized at MP2/6-31G* level of theory and same level of theory was used for frequency calculations and dihedral scans. Water interactions were optimized at the HF/6-31G* level of theory, in accordance with CGenFF guidelines for these calculations.
The HF energies were scaled by a factor of 1.16 as recommended for neutral compounds.
For most atoms, a reasonable fit (error < 0.7 kcal/mol) was achieved (Table S1). It should be noted that atoms with larger errors in QM interaction energies, namely C24, N15 and C40 ( Figure S2), do not interact with the protein. Figure S1: Chemical structures of N3, ketoamide and ketoamide models used for charge optimization. Names of atoms with optimized charges are also shown.

Calculations of the pocket volume
In the Epock software package, the pocket cavity is defined by the user using a number of spheres. The centers of these spheres are based on centers of specified selections, and radius sizes are controlled by the user. The selections and sphere sizes used to define the pocket in Figure S2: Comparison of QM (black) and MM (red) energies of dihedral scans performed for ketoamide in order to optimize the dihedral parameters.
each monomer are given in Table S2 and Figure S4.    Figure S5: RMSF of C α atoms (left) and RMSD of whole residues (right) for all simulations of apo structure 6WQF. Figure S6: Distributions of pocket volume (left) and distance between NE and S in the catalytic residues His41 and Cys145 (right) for apo structure 6WQF.  Figure S7: Hydrogen Bonding occupancies and standard deviations in 6WQF based trajectories. A) His163-Tyr161, B) His172-Glu166, C) inter-monomer Glu166-Ser1 ′ with blue/green indicating the serine side chain/backbone respectively, D) the Arg40-Asp187 salt bridge/charge reinforced hydrogen bond, E) inter-monomer interaction Phe140-Ser1 ′ with blue/green indicating Ser1 ′ acting as an acceptor/donor, and F) the catalytic dyad residues with blue/green indicating the sulfur acts as donor/acceptor. In all figures, unless otherwise stated, His163 and His172 are HSE and standard deviations are indicated with bars. Also note, occupancies can be greater than 100% in cases where more than one hydrogen bond can be formed.  Figure S9: RMSF of C α atoms (left) and RMSD of whole residues (right) for all simulations of apo structure 6YB7. Figure S10: Distributions of pocket volume (left) and distance between NE and S in the catalytic residues His41 and Cys145 (right) for apo structure 6YB7. Figure S11: Hydrogen Bonding occupancies and standard deviations in 6YB7-based trajectories. A) His163-Tyr161, B) His172-Glu166, C) inter-monomer Glu166-Ser1 ′ with blue/green indicating the serine side chain/backbone respectively, D) the Arg40-Asp187 salt bridge/charge reinforced hydrogen bond, E) inter-monomer interaction Phe140-Ser1 ′ with blue/green indicating Ser1 ′ acting as an acceptor/donor, and F) the catalytic dyad residues with blue/green indicating the sulfur acts as donor/acceptor. In all figures, unless otherwise stated, His163 and His172 are HSE and standard deviations are indicated with bars. Also note, occupancies can be greater than 100% in cases where more than one hydrogen bond can be formed. Figure S12: RMSD of the active site for each separate run and each monomer (A and B) for the extended apo simulations. Figure S13: A) RMSF of C α atoms for the extended simulations starting from apo structure 6WQF. B) RMSD for each residue from the same simulations. Figure S14: Hydrogen Bonding interactions for the HE41-HE164 and HD41-HE164 systems based on five independent trajectories of 250 ns each. A) His163-Tyr161, B) inter-monomer interaction Phe140-Ser1 ′ with blue/green indicating serine acting as an acceptor/donor respectively, C) His172-Glu166, D) inter-monomer Glu166-Ser1 ′ with blue/green indicating the S1 side chain/backbone, and E) the Arg40-Asp187 salt bridge/charge reinforced hydrogen bond. Standard deviation indicated with bars. Figure S15: RMSF of C α atoms (left) and RMSD of whole residues (right) for all simulations of the N3-bound structure.
Figure S16: Distributions of pocket volume (left) and distance between NE and S in the catalytic residues His41 and Cys145 (right) for N3 bound structure.
Figure S17: N3 hydrogen bonding and hydrophobic contacts. A-C) Total hydrogen bonds from the ligand to the protein and D-F) total hydrophobic contacts. Figure S18: Hydrogen Bonding interactions in the N3-bound (PDB 7BQY) trajectories. A) His163-Tyr161, B) His172-Glu166, C) inter-monomer Glu166-Ser1 ′ with blue/green indicating the serine side chain/backbone, D) the Arg40-Asp187 salt bridge/charge reinforced hydrogen bond, E) inter-monomer interaction Phe140-Ser1 ′ with blue/green indicating Ser1 ′ acting as an acceptor/donor, and F) the catalytic dyad residues. In all figures, unless otherwise stated, His163 and His172 are HSE and standard deviations are indicated with bars. Also note, occupancies can be greater than 100% in cases where more than one hydrogen bond can be formed. Figure S19: RMSF of C α atoms (left) and RMSD of whole residues (right) for all simulations of ketoamide-bound structure 6Y2G. Figure S20: Distributions of pocket volume (left) and distance between NE and S in the catalytic residues His41 and Cys145 (right) for ketoamide-bound structure 6Y2G. Figure S21: Ketoamide hydrogen bonding and hydrophobic contacts. A-C) Total hydrogen bonds from the ligand to the protein and D-F) total hydrophobic contacts. Figure S22: Hydrogen bonding interactions in the ketoamide-bound (PDB 6Y2G) trajectories. A) His163-Tyr161, B) His172-Glu166, C) inter-monomer Glu166-Ser1 ′ with blue/green indicating the serine side chain/backbone, D) the Arg40-Asp187 salt bridge/charge reinforced hydrogen bond, E) inter-monomer interaction Phe140-Ser1 ′ with blue/green indicating Ser1 ′ acting as an acceptor/donor, and F) the catalytic dyad residues. In all figures, unless otherwise stated, His163 and His172 are HSE and standard deviations are indicated with bars. Also note, occupancies can be greater than 100% in cases where more than one hydrogen bond can be formed.