Modelling the active SARS-CoV-2 helicase complex as a basis for structure-based inhibitor design

The RNA helicase (non-structural protein 13, NSP13) of SARS-CoV-2 is essential for viral replication, and it is highly conserved among the coronaviridae family, thus a prominent drug target to treat COVID-19. We present here structural models and dynamics of the helicase in complex with its native substrates based on thorough analysis of homologous sequences and existing experimental structures. We performed and analysed microseconds of molecular dynamics (MD) simulations, and our model provides valuable insights to the binding of the ATP and ssRNA at the atomic level. We identify the principal motions characterising the enzyme and highlight the effect of the natural substrates on this dynamics. Furthermore, allosteric binding sites are suggested by our pocket analysis. Our obtained structural and dynamical insights are important for subsequent studies of the catalytic function and for the development of specific inhibitors at our characterised binding pockets for this promising COVID-19 drug target.


List of Homologous PDB Structures
The helicase model was used as starting point for MD simulations. The system consists of the helicase, three Zn 2+ ions, ATP, Mg 2+ and ssRNA with eight uracil bases. The MD simulations were performed using NAMD 2.13, 1 using CHARMM36 force field. 2 The system was solvated by 50,000 -70,000 TIP3P water molecules resulting in a box of 120 Å per side. To neutralize the system and account for a 0.15 M KCl solution we added 171 K + and 189 Clions. 3 Periodic boundary conditions (PBC) were used in all the simulations and the particle mesh Ewald (PME) method was used for long-range electrostatic interactions. SHAKE algorithm was deployed to constraint the covalent bonds involving hydrogen atoms. A cutoff 12 Å was used to treat nonbonding interactions. The energy of the system was minimized using a standard protocol via steepest descent algorithm for a total number of 10,000 steps, followed by 50 ns equilibration with restrained heavy atoms (heavy atom of the backbone of the protein and the nucleic acid with an isotropic force of 1000 kJmol -1 nm -1 ) in constant pressure and temperature (NPT) and constant volume and temperature (NVT; up to 1ns) at 303.15 K via standard MD procedure with a time step of 2 fs. To maintain the tetrahedral coordination of the three zinc ions in the ZBD domain, we applied a combination of angle and distance restrain during all the MD simulations. To help equilibrate the complexes, we used a harmonic constraint on selected contacts with a force constant of 10 kcal/mol for 15 ns in our preliminary MD simulations to maintain relevant contacts. These constraints were subsequently progressively reduced and removed during the next 20 ns, using the colvar function implemented in NAMD.

Amber/Gromacs
To compare simulation results obtained with MD, we also carried out MD simulations using GROMACS 2018 [4][5][6][7] with the Amber ff99+ parmbsc0+chioL3 force field 8,9 for ssRNA and Amber14SB 10 for the helicase. To maintain the coordination of the Zn 2+ ions, the ZAFF model was used 11 . The molecular systems were placed in a cubic box and solvated with TIP3P water molecules. 3 The distance between the solute and the box was set to at least 14 Å. The solute was neutralized with potassium cations and then K + Clion pairs were added to reach the salt concentration of 0.15 M. We used the ion corrections of Joung et al. 12 as this force field has been shown to produce stable RNA structures. 13 The parameters for Mg 2+ are taken from Ref. 14 . Long-range electrostatic interactions were treated using the particle mesh Ewald method 15,16 with a real-space cut-off of 10 Å. The hydrogen bond lengths were restrained using P-LINCS, 5,17 allowing a time step of 2 fs. 18 Translational movement of the solute was removed every 1000 steps to avoid any kinetic energy build-up. 19 After energy minimization of the solvent and equilibration of the solvated system for 10 ns using a Berendsen thermostat (τT = 1 ps) and Berendsen pressure coupling (τP = 1 ps), 18 simulations were carried out in an NTP ensemble at a temperature of 300 K and a pressure of 1 bar using a Bussi velocityrescaling thermostat 20 (τP = 1 ps) and a Parrinello-Rahman barostat (τP = 1 ps). 21 During minimization and heating, all the heavy atoms of the solute were kept fixed using positional restraints. The restraints on the RNA and the protein backbone were relaxed slowly during the equilibration from 1000 kJmol -1 nm 2 to 10 kJmol -1. nm 2 .
Amber/Amber-GPU Additional MD simulations, constructed using the AmberTools20 building package, were performed with the GPU version of Amber18 using the ff14SB force field to represent the protein, 10 the ff99OL3 force field for the RNA, 8,9 ATP parameters from Meagher et al 22 and parameters for Mg 2+ are taken from Ref. 23 . The tetrahedral coordination state of the zinc was maintained using the ZAFF bonded force field. 23 Note that additional parameters were required for the HIS-33 that interacted with the zinc via its epsilon nitrogen by reference to comparable parameters in the ZAFF using a hybrid of the center ID 4 and 6 models. 24 For structures where ATP is bound, the octahedral coordination of the Mg 2+ (which involves bonds to the ATP β and γ phosphate oxygen atoms, one with oxygen of the Ser289 hydroxyl group and three structural water molecules) was constructed using the Chimera metal center builder. 25 The solute was neutralized with potassium cations, then the protein was immersed in a box of TIP3P water molecules extending a minimum of 10 Å from the protein surface, and K + Clion pairs were added to achieve a salt concentration of 0.14 M. MD simulations were performed in the NTP ensemble, with Berendsen temperature and pressure coupling. SHAKE was applied to all bonds involving hydrogen, allowing an MD integration timestep of 2 fs. Long-range electrostatic interactions were treated using the particle mesh Ewald method 15,16 with a real-space cut-off of 12 Å. To equilibrate the protein and nucleo-protein complexes, the systems was initially energy minimized with positional restraints placed upon the solute, followed by minimization of both solvent and solute. The system was then heated to 300 K in the presence of positional restraints upon the solute, which were gradually reduced from 50 kcal/mol Å 2 to 1.0 kcal/mol Å 2 over a timescale of 100 ps. For the apo-helicase structure, all restraints were then removed. For the ATP-ssRNA helicase complex which included the coordinated Mg 2+ ion, an additional 50 ns of equilibration was performed with harmonic distance restraints (set at 2.1 Å with a spring constant of 20 kcal/mol Å 2 ) to maintain the positions of coordinated atoms, and angle restraints imposing the octahedral geometry around the Mg 2+ ion. An additional restraint was imposed to maintain the orientation of Asp374 and Glu375 to the adjacent coordinated water molecule, as observed in the MutS-ATP complex (PDB ID 1w7a 26 ). Three 1 µs simulations of the apo-structure at a salt concentration of 140 mM, and one 1.5 µs simulation in neutralizing salt were performed. We have also obtained 1 µs simulations of the ATP-helicase (two replicas), the RNA-helicase and the ATP-ssRNA-helicase complex. For all coordinated ATP Mg 2+ metal centers, these equilibration protocols provide stable octahedral geometries, including the complexed water molecules, during unrestrained MD over 1 µs timescales.
List of MD Runs, Force Fields and Software.  Figure S1. RMSD of the protein backbone along unbiased MD trajectories for the apo (a) and holo (b) models.

Supplementary Note 2: Redundancy in the Similar Sequences from UniProtKB
The multipe sequence alignment produced for the representative 796 sequences is available here.
List of clusters determined by cd-hit, using 90% sequence identity as a cluster criterium. Sequences are identified by their UniProt ID ATP Binding Distances Figure S3. Histograms of nine key interactions in the ATP pocket has been monitored during the multiple MD simulations. For each distance (see title above each plot), we represent the distribution along multiple replicas for the CHARMM (blue) and Amber (orange) force field. The structural representation of each distance is shown in the insets. Figure S4. a: Dihedral angles involved in the conformational description. b: Envelope conformers of the ribose ring associated to puckering angle values. c: Schematic example conformers of C3'-endo and C2'-endo highlighting the out-of-plane atoms.

Supplementary Note 3: Puckering Analysis
To take the ribose conformation into account, we described the sugar ring using pseudorotation parameters ( Figure S4). Although there are four possible pseudorotation parameters for a five-membered ring, 27 two in particular are useful to characterize the sugar conformation: the phase (Pha) and the amplitude (Amp). While the amplitude describes the degree of ring puckering, the phase describes which atoms are most out of the mean ring plane. We calculated these parameters using the expression: 28 , with vi the ring dihedral angle i. This approach has the advantage of processing the five ring dihedrals from v1 (C1-C2-C3-C4) to v5 (O4-C1-C2-C3) in an equivalent manner. Conventionally, sugar ring puckers are divided into 10 families described by the atom which is most displaced from the mean ring plane (C1 , C2 , C3 , C4 or O4) and the direction of such displacement (endo for displacements on the side of the C5 atom and exo for displacements on the other side). Using the Curves+ program 29 for each simulation trajectory, for each nucleotide we computed the percentage of appearance for each family. To understand the interplay between the sugar conformation and the chemical reactivity, we grouped the sugar puckers into two large families. The sugar puckers C1'-exo, C2'-endo, C3'-exo, C4'-endo belong to the B-like family, while C1'-endo, C2'-exo, C3'-endo, C4'-exo belong to the A-like family.   Figure S8. DCC maps showing the correlation between the interatomic displacements over the motion described by the first principal component.

Supplementary Note 4: Apo Structures
All replicas of the apo structure show low flexibility and no major changes of the backbone structure of the dimer. We analyze the overall flexibility of the dimers and compare our results with the experimental b-factor obtained in 6jyt and 6zsl ( Figure S9 in the main text). Our model, in common with the two crystal structures, shows higher flexibility on the external shell of RecA2 domain, while the ATP and the RNA pockets appear to be more conserved. The ZBD shows low flexibility, in agreement with the b-value of 6jyt, but not with 6zsl (especially chain A), in which the temperature factor is higher, due to the different dimerization of the crystal structures ( Figure S9).

Pocket Analysis
Along the simulation trajectories, we define pocket distances to every 10th alpha carbons, and use these internal coordinates for identification. A representative pocket is selected from one frame and the similarity of other pockets from all frames are calculated via the pocket distances. A pocket is deemed identical to the reference, if it is below a threshold for the Cartesian distance within the space of the internal coordinates. This threshold is determined using the distribution of pocket distances. The analysis is implemented into our version of pyvol, and available together with the thresholds used via the link: https://github.com/bertadenes/pyvol." Table S5. Extended statistics of pocket volumes in different simulations. Pockets are named as depicted in Figure 7 in the main text. Holo and apo refers to whether the ATP and RNA substrates were bound to the helicase in the simulation. CH and Am are CHARMM and Amber Force Fields (FFs  Figure S11. Distribution of pockets grouped by force fields, data collected from holo simulations. Pockets are named as depicted in Figure 7 in the main text. Figure S12. Distribution of pockets grouped by force field, data collected from holo simulations. Pockets are named as depicted in Figure 7 in the main text. Figure S13. Example for a merged pocket along the domain interfaces in apo simulations (cyan surface). ATP (sticks) and ssRNA (cartoon) are included as visual aid for determining protein orientation. Figure S14. Allosteric pockets and residues near them. Sidechains within 4 Å of the pocket are depicted as sticks.

Supplementary Note 5: Correlation of Trajectory Features and Pockets
We defined residue-residue distances by the shortest non-hydrogen distance between the two residues. To filter for contacts that are producing significant variations in the trajectory, we selected those distances that are below 4.5 Å in 40-60% of the trajectory. These interactions were tested against the pocket volume data for correlation defined by Pearson's coefficient. Distances presented in Figure 9 of the main text were selected according to the high correlation/anti-correlation to the pocket data.