Drug repurposing and computational modeling for discovery of inhibitors of the main protease (Mpro) of SARS-CoV-2

The main protease (Mpro or 3CLpro) is a conserved cysteine protease from the coronaviruses and started to be considered an important drug target for developing antivirals, as it produced a deadly outbreak of COVID-19. Herein, we used a combination of drug reposition and computational modeling approaches including molecular docking, molecular dynamics (MD) simulations, and the calculated binding free energy to evaluate a set of drugs in complex with the Mpro enzyme. Particularly, our results show that darunavir and triptorelin drugs have favorable binding free energy (−63.70 and −77.28 kcal mol−1, respectively) in complex with the Mpro enzyme. Based on the results, the structural and energetic features that explain why some drugs can be repositioned to inhibit Mpro from SARS-CoV-2 were exposed. These features should be considered for the design of novel Mpro inhibitors.


Introduction
Coronaviruses (CoVs) are enveloped viruses with a diameter of 60 to 130 nm that contain a single-stranded positive ribonucleic acid (RNA) genome and belong to the Coronaviridae family, which is divided into four genera (a, b, g e d), in which SARS-CoV belongs to the genus Betacoronavirus. 1,2 COVID-19 (COronaVIrus Disease 2019) is a disease caused by the SARS-CoV-2 coronavirus that has spread rapidly around the world since it was reported in late 2019 in Wuhan, China. 3 SARS-CoV-2 is the seventh coronavirus known to infect humans and that can cause serious diseases, such as SARS-CoV and the Middle East respiratory syndrome (MERS-CoV), 4 while HKU1, 5 NL63, 6 OC43 (ref. 7) and 229E 8 are associated with mild symptoms. SARS-CoV-2 is the one with the highest transmission rate, as it includes asymptomatic carriers, a long latency period, and a high infection rate. 9 COVID-19 causes fever, fatigue, dry cough, muscle pain, shortness of breath, and in some cases lead to pneumonia. 10 In addition, a small proportion of those infected ($5%) develop diseases that can progress to severe acute respiratory syndrome (SARS), which can lead to death. 9 Since the initial SARS-CoV-2 outbreak, the World Health Organization (WHO) declared on March 11, 2020, COVID-19 as a pandemic disease due to the spread of the coronavirus. 11,12 Until April 26, 2021, the total number of conrmed cases of COVID-19 was 146 841 882 with over 3 104 743 deaths in at least 223 countries, 13 which indicates the epidemic is a serious public health problem.
The 3-chymotrypsin-like protease enzyme (3CL pro ), also known as the "main protease" (M pro ), is considered an attractive target in drug design for the treatment of coronavirus infection. 14,15 The role of this protease involves the proteolytic processing of viral polyproteins, being considered essential in the process of replication and maturation of the virus. 16 The M pro cysteine protease contains three domains, domain I being formed by residues 1 to 101, and domain II consisting of antiparallel barrels that are formed by residues 102 to 184, and for enzyme activity domain III, composed mainly of alpha helices, formed by residues 201 to 301. 17,18 The substrate-binding region is located in the gap between domains I and II and consists of a catalytic dyad formed by the residues Hys41 and Cys145, in which cysteine acts as a nucleophile and histidine as a proton acceptor. 14,19 In addition, as M pro does not have a homologous human enzyme, it presents itself as an ideal target for drug design. 15,19,20 A traditional drug discovery process is expensive and timeconsuming and can take decades to complete. 21 Currently, there are no drugs available for the treatment of COVID-19, and in the absence of effective treatment, the redirection of drugs becomes an attractive solution, due to the reduction in development time and cost, as the drugs have already been tested to what concerns safety in humans, which means they should quickly enter the most advanced clinical stages. 22,23 Structure-Based Virtual Screening (SBVS) is a method used to identify and prioritize ligands for subsequent in vitro and in vivo proling and, therefore, is an attractive way to identify new compounds for more effective treatment of COVID-19. 18,24 Therefore, in this study, a virtual screen based on the structure of the receptor in a set of molecules of approved drugs was used to identify promising candidates that could be tested against the SARS-CoV-2 enzyme. Then, molecular dynamics and binding free energy calculations were carried out to evaluate the inhibition potential of the selected commercial drugs against the M pro enzyme of SARS-CoV-2. Overall, this work shines a light on the M pro -inhibitor complex interactions, which is paramount for the development of new drugs against COVID-19.

Molecular docking, libraries description, and preparation
The crystallographic structure of the hydrolase enzyme was recovered from the Protein Data Bank (PDB) database under code 6LU7 with a resolution of 2.16Å. 15 The enzyme has a chain with a total of 306 amino acid residues. Molecular docking can be used to explore possible conformations of the ligand inside the receptor's binding pocket as well as estimation of the strength of ligand-protein interaction. 25 The molecular docking simulations were performed using the GOLD 2020.1 program (Cambridge Crystallographic Data Centre -CCDC, Cambridge, UK). 26,27 GOLD program uses a genetic algorithm to generate and select conformations of exible compounds that bind to a protein's receptor site. 28 Compounds were docked by applying the GoldScore scoring function with a search efficiency of 100%. The binding site was dened as a 10Å sphere centered on the N3 crystallographic ligand. Before the preparation of the enzyme for docking, where the covalent bond between the Cys145 residue and the N3 crystallized ligand was removed. Finally, water molecules were removed from the structure of the enzyme, and hydrogen atoms were added by the GOLD program.
Before starting the virtual screen with all compounds chosen from the database, a re-docking of the N3 crystallographic ligand was carried out. Thus, the best conformation obtained by the program is compared with that of the ligand co-crystallized in the enzyme. In this way, an analysis is made by comparing the values of the root-mean-square deviation (RMSD). RMSD values below 2Å show that the re-docking was successful according to data in the literature. 29 The PoseView online server 30 was used to analyze the interactions of hydrogens as well as hydrophobic interactions between the selected ligands and the amino acids of the enzyme, derived from molecular docking.
The two-dimensional structures of the 2998 compounds that were retrieved from the Repurposing hub (http:// www.broadinstitute.org/repurposing) and built-in the Marvin-Sketch program, 31 as well as all structures, were optimized in the OpenBabel program. 32 For the selection of compounds, the total drug-score parameter calculated by the OSIRIS Property Explorer program was also evaluated. 33 This measurement is a score generated by the combination of the parameters of drugsimilarity, hydrophobicity (c log P), aqueous solubility (log S), molar mass, and toxicity risk data. 34 System setup for MD simulations The molecular docking structures were submitted to QM optimization at the HF/6-31G* level by using the Gaussian09 program. 35 Then, at the same QM level, partial atomic charges for all docked compounds were obtained by applying the RESP method 36 as implemented in the antechamber module of the AmberTools19 package. 37 The general AMBER force eld (GAFF) 38 and AMBER ff14SB 39 were used in describing the MM parameters for the ligands and enzyme, respectively. The PROPKA3.0 (ref. 40) was used to compute the protonation states of the amino acid residues at neutral pH. Particularly, the catalytic dyad (Cys145 and His41) was considered in neutral form, as reported for the SARS-CoV-2 M pro . 41 Then, the absent protons were added by using tleap module of the AmberTools18 package. 37 Each system was solvated in a truncated octahedron box of TIP3P 42 water molecules with a buffer region of at least 10 A from the solute atoms. Then, Na+ ions were added to neutralize the overall charge of the complexes. The SHAKE algorithm 43 was applied to constraint the covalent bonds involving hydrogen atoms. The particle mesh Ewald approach 44 was adopted to compute the long-range electrostatic interactions, a cut-off distance of 10Å was used.
Each resulting system was minimized using 10 000 steps of the steepest descent method followed by 5000 steps of the conjugate-gradient method until the root-mean-square of the gradient was below 10 À4 kcal (mol À1 $Å À1 ). Aer, each minimized system was heated up from 0 to 300 K with solute restrained using a harmonic potential with a force constant of 100 kcal (mol À1 $Å À2 ) during 500 ps. Along the equilibration stage, the positional restraint force was reduced from 50 to 2 kcal (mol À1 $Å À2 ) during 1.2 ns of MD simulations. Then, 200 ps of density equilibration without any restraint under the NPT was performed. Finally, 100 ns of NPT simulation at 300 K was performed with a 2 fs time step. The GPU version of the pmemd 45 module of the Amber18 package was used to run all MM MD simulations. The structural analysis of each system was performed by computing the root-mean-square deviations (RMSD) of the main chain atoms concerning average coordinates structures. Similarly, the root-mean-square uctuations (RMSF) of Ca atoms of each amino acid residue were computed. Besides, MD runs for M pro -triptorelin system were also performed with a different starting structure (random seed). In addition to that, additional replicas starting from different sets of atomic coordinates and velocities were also chosen to generate completely independent trajectories. Details are presented in the ESI † les. The enabled us to calculate the error for these free energy results.

Binding free energy and residual decomposition analysis
For the binding free energy calculations, an average ensemble of structures from MD simulations, as described above, was extracted by using the CPPTRAJ module 46 of the AmberTools18 package. A total of 1000 snapshots were extracted from the last 10 ns trajectory of the production MD stage (at 10 ps intervals). Next, MM/GBSA 47,48 approach was employed to determine the binding free energy) M pro -ligand complexes using the MMPBSA.py, 49 according to the eqn (1): where, DE MM , gas-phase MM energy is computed by eqn (2): where, DE int , DE ele and DE vdW are the changes in the internal (bond, angles, and dihedral energies), electrostatic, and van der Waals energies, respectively. The DG SOLV is the sum of the electrostatic solvation energy (DG GB ) (polar contribution) and the nonpolar contribution (DG SA ), according to eqn (3): Particularly, DG SA is estimated by using the solventaccessible surface area (SASA) method. 50 As the entropic term (ÀTDS) has a heavy computational cost, it has been neglected for binding free energy calculations. 51 Finally, since the MM/PBSA 48,50 approach is computationally time-consuming, 52 MM/GBSA has been used to compute the contributions from individual residues by free energy decomposition analysis. 51 This method has been frequently applied in drug design studies. [53][54][55][56][57][58] Results and discussion

Virtual screening and molecular docking calculation
It should be highlighted that SARS-CoV-2 M pro is a crucial target for potential repurposing of known clinical drugs 41,59-64 as well as for designing specic protease inhibitors. 15,[65][66][67] Although clinical drugs are not available for use against SARS-CoV-2 M pro , others protease inhibitors have been designed to inhibit the very closely related SARS-CoV M pro . 1,68-70 Therefore, in this study, a computational protocol was applied to provide new insights for structureassisted and computational drug design against SARS-CoV-2 M pro .
Structure-based virtual screening was carried out in the gold program 2020.1, which uses a genetic algorithm to generate and select conformers of the database of compounds. From the results of re-docking, it was found that re-docking of the crystallographic ligand with the gold program gives an RMSD value of 2Å. Thus, it is observed that the GOLD program was able to reproduce the conformation of the crystallographic ligand and the main interactions obtained at the M pro enzyme site. 29 Besides, molecular docking calculations have been successfully applied in recent SARS-CoV-2 M pro studies. [71][72][73][74][75] Initially, 75 of the nearly 3000 drug molecules that had the best scores in the GOLD program were selected. It is noteworthy that the selection criterion was based on the score of remdesivir, as this has a potential effect in the treatment of COVID-19. 9,29,76 Aer, a visual inspection of the docking results was carried out on the Poseview online server, 30 taking into account mainly the interactions with the catalytic dyad residues of the M pro enzyme, in addition to the drug-score values to qualify the best drug candidates against COVID-19. 77 Thus, from the 75 structures, ten hit compounds were selected for further analysis (Table 1), considering the highest gold score values with M pro enzyme. Besides, they were selected using as criteria hydrogen interactions, complementarity through visual analysis at the active site of the M pro enzyme and the drug-score values. In addition to the dyad, the enzyme M pro has a catalytic region formed by the residues Phe140, Asn142, Gly143, Ser144, Cys145, Met165, Glu166, Gln189 and Thr190. 78 As a protease, the catalytic site contains different subsites where interactions occur with the amino acids of the substrate whose peptide bond will be hydrolyzed. The S1 subsite is formed by the residues Phe140, Leu141, Asn142, His163 and Glu166, the S2 subsite is formed by the residues Met49, Tyr54, His164, Asp187 and Arg188, the S3 subsite is formed by the residues Met165, Leu167, Gln189, Thr190 and Gln192, in addition to the S1 0 , formed by the residues His41, Gly143, Ser144 and Cys145. 75 Analyzing Table 1, it was possible to observe that the selected ligands interact with the enzyme through hydrogen bonds with the Thr24, Thr26, Hys41, Ser46, Met49, Asn119, Ser139, Phe140, Asn142, Gly143, Ser144, Cys145, His163, His164, Met165, Glu166, Pro168, Gly170, Asp187, Gln189, Thr190 and Gln192. In addition, among the ten main compounds selected, the interactions of residues His41, Asn142, Cys145, Glu166 and Gln189 can be highlighted, in which they make two, three, ve, six and six hydrogen bonds, respectively. As mentioned earlier, His41 and Cys145 are part of the catalytic dyad, and a computational study of Wang's drug repurposing showed that His41 and Gln189 residues were considered important. 79 These residues are mainly part of the S1 subsite (Asn142, Cys145 e Glu166) and S2 (His41 e Gln189), in addition, studies by Koulgi and collaborators suggest that the inhibitory effect is observed when interactions with these key residues occur at the active site of the enzyme. 80 Apart from remdesivir, application of darunavir against Covid-19 should be highlighted, especially since it has been studied as a possible treatment for SARS-CoV-2, and due to in vitro evidence that supports its ability to ght the disease. 81

Stability of MD simulations
Initially, it should be noted that experimental evidence reveals that in the biological environment, M pro enzyme acts in the dimer form instead of the monomeric one. 82 However, for computational proposals, it is possible to speed up computeraided drug design by using only the monomeric form of SARS-CoV-2 M pro . 73 Therefore, a total of 100 ns production simulations were performed for each repurposed drug on the M pro enzyme from molecular docking procedures. Initially, we compare the representative snapshots chosen from the last 10 ns MD trajectory of each system. As illustrated in Fig. 1, the binding modes of all repurposed drugs in the active sites of the M pro from MD simulated results are nearly the same place.
The root-mean-square deviations (RMSDs) of each system concerning the average structure were plotted in Fig. 2. As the plots show, the RMSDs of each system tend to converge aer 20 ns simulation time, indicating the system is stable and equilibrated. Besides, all systems show very small RMSDs, which change from 0.97 AE 0.23Å (pentagastrin) to 1.40 AE 0.20Å (octeotride), which suggests that they are in a similar conformation. Additionally, for the M pro -triptorelin, which shows the best binding free energy (as shown below), two more 100 ns of MD runs were performed using random seed with different atomic velocities. These results suggest that their average structures have very high 3D conformations and similar RMSD plots (Fig. S1 in the ESI † le).
To explore the uctuations of the residue of system-wise, the root-mean-square uctuation (RMSF) analysis was carried out. The RMSF of the Ca atoms of the protein for each residue during whole MD simulations for the complexes is shown in Fig. 3. The highest RMSF values are attributed to the N-and Cterminals, and loop regions. As shown in Fig. 3, the region closer to the active site formed by the Cys44-Asn53 residues (represented in red), that presented greater uctuation to the terlipressin compound. Based on this observation, Bzówka et al. suggested that the presence of an inhibitor in this region might stabilize the loops surrounding the active site. 72 Another highlight is that residues are formed by the Asp153-Cys156 (represented in orange). In addition, a study by Shekh and colleagues describes the importance of Cys156 in the accessibility of external agents in the M pro enzyme. 83 Another region that  presented high values of RMSF (Fig. 3), for all compounds, was the region formed by the Gly275-Thr292 residues (represented in yellow).

Binding free energy analysis
The binding free energies were computed using MM/GBSA method 48 as implemented in MMPBSA.py 49 are presented in Table 2. As a single MD trajectory of the bound complexes was used to compute DG bind , the DE int term can be canceled once the energy difference between complex systems and their components are computed using the same congurations. 51 The energy components of the binding free energies were also listed in Table 2. As can be seen from Table 2, the DE ele , DE vdW and DG SA terms are favorable contributors to most of the repurposed drugs, except for the pentagastrin and pepstatin drugs. Interestingly, in these systems, the term DG GB is favorable, which does not occur on other systems. Then, in most of the systems, the favorable electrostatic interactions are equilibrated by the unfavorable polar solvation upon binding. As a result, the total electrostatic interaction contributions have detrimental effects on the binding of repurposed drugs. Except for the pentagastrin and pepstatin, the positive solvation contribution indicates that the repurposed drugs were exposed to water interaction in their respective complexes. If we consider the contributions of different binding free energy components of M pro -ligand binding, the most important determinant of difference in the binding affinity are DE vdW and DG GB contributions.
Interestingly, according to molecular docking results, the triptorelin was pointed with the second-highest value of gold score (Table 1) and aer 100 ns of MD simulations has become the most promising M pro inhibitor among the selected compounds. On the other hand, terlipressin, which has the fourth-highest value of gold score (Table 1), has been identied as the weakest M pro inhibitor. This evidence can be observed by comparing the binding free energy values of the selected repurposed drugs (Table 2) it is possible to observe two compounds with distinct energy values. The calculated values of the terlipressin and triptorelin compounds have DG bind (À25.39 AE 0.21 and À77.28 AE 0.20 kcal mol À1 , respectively) and DE vdW contribution (À54.24 AE 0.30 and À92.42 AE 0.19 kcal mol À1 , respectively). These results suggest the importance of applied computational strategies as highlighted in previous studies. 25 To better understand the individual contribution of each residue for the binding free energy for the lowest DG bind , a residual decomposition analysis of DG bind was performed using MM/GBSA approach. 51 Here, any residue that contributes to the binding free energy values below À1.50 kcal mol À1 was considered an important residue to the binding process. The energy contribution of each residue for the complexes formed by terlipressin (weakest binding free energy) and triptorelin (strongest binding free energy) obtained by the MM/GBSA method are shown in Fig. 4, once they have shown the different binding free energy values. The residue contributions to binding energies using the plugin CHEWD 84 with Chimera 85 soware are shown on the right side of Fig. 4. From this, the favourable interactions are at the blue of the colour scale, that is, those that contribute to the stabilization of ligand into complex, while the red colour represents interaction by residue with positive values and corresponds to unfavourable interaction values.
Particularly, we can highlight the results for the darunavir, which shows a moderate binding free energy value (À63.70 AE 0.25 kcal mol À1 ). This drug has shown evidence in vitro studies involving a possible treatment for SARS-CoV-2. 86 Therefore, our results suggest that M pro would be a potential target for the darunavir drug.

Conclusion
Drug repurposing is the well-established safety strategy of the development of novel therapeutic compounds, where the principal advantage is reducing the time and cost of new candidates in a future clinical trial. The COVID-19 caused by the SARS-CoV-2 has become a pandemic health crisis and the discovery of effective treatment against the virus remains an outstanding challenge. The results of this study showed that the use of molecular modelling approaches, such as molecular docking and molecular dynamics simulations, can be used to design new drug leads that specically target the M pro enzyme from SARS-CoV-2. The molecular docking results allowed the identi-cation of key interactions in the M pro active site that make hydrogens bonds involving His41, Asn142, Cys145, Glu166 and Gln189 residues. The MD simulations of the best top 10 compounds indicated that systems were stable and equilibrated along the trajectory in the active site of the M pro enzyme, as well as a favourable value for the prediction of the binding energy with the MM-GBSA method. It is worth mentioning that the darunavir and triptorelin drugs exhibited the strongest binding free energies in complex with M pro enzyme. Thus, darunavir and triptorelin are potential inhibitors of SARS-CoV-2. In summary, all of the repurposed compounds reported here may provide insights for structure-assisted optimization to for potential application against the COVID-19 pandemic.

Author contributions
The manuscript was written through the contributions of all authors. All authors have approved the nal version of the manuscript.

Conflicts of interest
There are no conicts to declare.