Shunzhou
Wan
a,
Agastya P.
Bhati
a,
Alexander D.
Wade
a,
Dario
Alfè
bc and
Peter V.
Coveney
*ad
aCentre for Computational Science, Department of Chemistry, University College London, UK. E-mail: p.v.coveney@ucl.ac.uk
bDepartment of Earth Sciences, London Centre for Nanotechnology and Thomas Young Centre at University College London, University College London, UK
cDipartimento di Fisica Ettore Pancini, Università di Napoli Federico II, Italy
dInstitute for Informatics, Faculty of Science, University of Amsterdam, The Netherlands
First published on 18th November 2021
Although researchers have been working tirelessly since the COVID-19 outbreak, so far only three drugs – remdesivir, ronapreve and molnupiravir – have been approved for use in some countries which directly target the SARS-CoV-2 virus. Given the slow pace and substantial costs of new drug discovery and development, together with the urgency of the matter, repurposing of existing drugs for the ongoing disease is an attractive proposition. In a recent study, a high-throughput X-ray crystallographic screen was performed for a selection of drugs which have been approved or are in clinical trials. Thirty-seven compounds have been identified from drug libraries all of which bind to the SARS-CoV-2 main protease (3CLpro). In the current study, we use molecular dynamics simulation and an ensemble-based free energy approach, namely, enhanced sampling of molecular dynamics with approximation of continuum solvent (ESMACS), to investigate a subset of the aforementioned compounds. The drugs studied here are highly diverse, interacting with different binding sites and/or subsites of 3CLpro. The predicted free energies are compared with experimental results wherever they are available and they are found to be in excellent agreement. Our study also provides detailed energetic insights into the nature of the associated drug–protein binding, in turn shedding light on the design and discovery of potential drugs.
Design, System, ApplicationThe COVID-19 pandemic has led to a rush to repurpose existing drugs which can treat the disease or arrest the spread of the virus. Drug repurposing can speed up the traditional process of drug discovery because the drugs have already been proven to be safe in humans. In the current study, we use molecular dynamics simulation and an ensemble-based free energy approach to investigate the interactions of a set of existing drugs with the main protease of the SARS-CoV-2 virus. The drug–residue interaction profile elucidates the amino acids crucial to the drug binding while the detailed energetic insights into the nature of binding shed light on possible new routes to future rational drug design. |
In general, small-molecule drugs have some obvious advantages over biologics, including their oral bioavailability, pharmacological activity, stability, permeability, etc. Relentless research efforts2,3 have led to some progress to find novel small-molecule drugs, or to repurpose existing small-molecule drugs for the treatment of the ongoing COVID-19 disease. An exciting global collaboration, called the COVID Moonshot, has come together for the discovery of new, urgent drug treatment for COVID-19.4 Despite this encouraging development, the discovery and development of new drugs are still associated with a slow pace and substantial costs. Repurposing of existing drugs for the ongoing disease therefore represents an attractive proposition as the safety of these drugs have been demonstrated in clinical trials and clinical applications.
The COVID-19 drug repurposing research can be grouped into four categories: clinical trials, in vivo cell experiments, in vitro protein-binding experiments, and in silico computational studies. As of 23 August 2021, there are 1822 registered COVID-19 clinical studies listed at https://clinicaltrials.gov/ct2/covid_view which have at least one drug intervention. There is a total of 615 drugs in these trials, many of which were initially developed for other diseases. In vivo and in vitro studies have uncovered some interesting drugs but caution must be exercised as the apparent antiviral activities are frequently caused by the drugs interrupting fundamental cellular processes rather than killing the virus or preventing its entry and duplication.5In silico computational research efforts are productive for the structures and interactions of the key proteins, especially at the earliest stages when no structures have yet been reported from experimental studies. Despite an unprecedented number of studies having been published in the last one and half years on computer-aided drug discovery,6 only one drug has arisen from computational studies7 – baricitinib – which has been approved for emergency use to treat COVID-19, in combination with remdesivir. Baricitinib is a repurposed medication: it is a kinase inhibitor originally designed to treat rheumatoid arthritis. As a broad-spectrum antiviral medication, remdesivir was originally developed to treat hepatitis C, and subsequently investigated for Ebola. Clinical trial has shown that the combination of baricitinib and remdesivir reduces the recovery time for hospitalized COVID-19 patients, especially for those requiring oxygen or ventilation.8 But neither alone, nor in combination, has proved to be a curative treatment for the disease.8 The four groups of repurposing studies are not independent. Computational studies, for example, require routine validation from experiments and clinics.
The main protease of SARS-CoV-2, 3CLpro or Mpro, is a key enzyme of the coronaviruses. It plays a pivotal role in processing the polyproteins that are translated from the viral RNA.9 The enzyme has become an attractive drug target because inhibiting its activity would block viral replication. X-ray structures have identified 22 different sites to which small molecules or molecular fragments can bind.10 However, most of the fragments do not show any antiviral activity. The most interesting small molecules in these x-ray structures are those screened from drug libraries in a recent study.11 A high-throughput X-ray crystallographic screening has been performed for two repurposing drug libraries against 3CLpro. The libraries contain 5953 drugs which have been approved or are in clinical trials, from which thirty-seven drugs have been identified which bind to 3CLpro. The effective concentrations of the drugs have been measured in a cell-based assay, at which SARS-CoV-2 infectious particles are reduced by 50% (EC50) (Table 1). Some of these drugs are considered to have antiviral activities, which show ≥100-fold reduction in infectious particles and have a much higher cytotoxic concentration than EC50 values.11 The X-ray structures of these drugs show that they bind at the substrate-binding site or one of the two allosteric sites identified11 (Fig. 1).
Hit | Compound name | PDBea | Tested in antiviral assay | pdb | EC50c (uM) |
---|---|---|---|---|---|
a PDBe ligand code. b The drug binds to 3CLpro covalently and non-covalently. The non-covalent binding mode is used in the current study. c EC50 values from literature.10 “—” means no antiviral activity detected, or no cell assay performed. | |||||
Substrate-binding site | |||||
#1 | Adrafinil | RNW | + | 7ANS | — |
#11 | Fusidic acid | FUA | + | 7A1U | — |
#18 | LSN-2463359 | S8B | − | 7AWU | — |
#20 | MUT056399 | RQN | + | 7AP6 | 38.24 |
#27 | SEN1269 | S1W | − | 7AVD | — |
#34 | Tretazicar | CB1 | + | 7AK4 | — |
#35 | Triglycidyl isocyanurateb | RV8 | + | 7AQJ | 30.02 |
#36 | UNC-2327 | RV5 | + | 7AQE | — |
Allosteric site I | |||||
#15 | Ifenprodil | QEL | + | 7AQI | 46.86 |
#22 | PD-168568 | RMZ | + | 7AMJ | — |
#23 | Pelitinib | 93J | + | 7AXM | 1.25 |
#26 | RS-102895 | R6Q | + | 7ABU | 19.8 |
#32 | Tofogliflozin | RT2 | + | 7APH | — |
Allosteric site II | |||||
#3 | AT7519 | LZE | + | 7AGA | 25.16 |
Fig. 1 Chemical structures of 14 repurposing drugs to SARS-CoV-2 main protease. The drugs are grouped according to the sites they bind at: the substrate-binding site, the allosteric site I, and the allosteric site II (see Table 1). The drugs are also shown in chemical representation bound to 3CLpro (shown in cartoon) at the three binding sites. The three domains (I, II and III) of 3CLpro are shown in light blue, blue and green, respectively, along with some loops and links (white). |
In the current study, we use molecular dynamics simulation and an ensemble-based free energy approach, namely, enhanced sampling of molecular dynamics with approximation of continuum solvent (ESMACS),12,13 to investigate a subset of the aforementioned drugs. For the drug–protein complexes studied here, the binding sites of the protein are well structured, and the binding modes are resolved from crystallography experiments. It is likely that a reasonable prediction can be achieved, although the relatively large size of the substrate-binding site (Fig. 2) may pose a challenge for the conformational sampling and hence the convergence of the free energy predictions. In addition, per-residue free energy decomposition14 and close contacts between drugs and the protein have been performed to elucidate amino acids crucial to the binding and to understand the differential binding modes of drugs.
Fig. 2 Drugs bind at different subsites within the substrate-binding site, which are labelled for one of the drugs (RNW). The conformational changes of the substrate-binding site are induced upon bindings of different drugs. The structures are generated from PDB IDs 7ANS, 7A1U, 7AWU, 7AP6, 7AVD, 7AK4, 7AQJ and 7AQE (Table 1). |
Drug parameterizations were created using the general Amber force field 2 (GAFF2).16 All drugs were electrostatically neutral except FUA which has a net charge of −1e. The AM1-BCC partial charges were assigned using the Antechamber component of the AmberTools package.16 The Amber ff14SB force field was used for the protein, and TIP3P for water molecules. The protonation states of the protein residues were assigned using the reduce module of AmberTools. All systems were solvated in orthorhombic water boxes with a minimum extension from the protein of 14 Å.
The binding affinity calculator (BAC)17 software tool was used to perform ESMACS studies. We employed an ESMACS protocol, which consisted of performing 25 replicas for a total of 10 ns production runs each. The molecular dynamics simulations were conducted using the package NAMD 2.14 (ref. 18) for each of the molecular systems studied. The protocol generates precise and reliable free energy predictions, and has been adequately validated for a diverse set of protein systems.13,19–23 The systems were minimized with all heavy protein atoms restrained at their initial positions, with restraining force constants related to their β-factors in X-ray structures. Initial velocities were then generated independently for each replica from a Maxwell–Boltzmann distribution at 50 K. The systems were virtually heated to 300 K over 60 ps with the NVT ensemble, followed by 2 ns equilibration with the NPT ensemble, during which the restraints on heavy atoms were gradually removed. Finally, 10 ns production simulations were executed with snapshots derived for analysis every 20 ps. A 2 fs time step was used for all MD simulations. NPT simulations were performed with pressure and temperature maintained at 1 bar and 300 K, respectively, during the equilibration and production runs. The simulations were performed on ARCHER2 (https://www.archer2.ac.uk/) and SuperMUC-NG (https://doku.lrz.de/display/PUBLIC/SuperMUC-NG).
Ligand | ΔGset-1ESMACS | ΔGset-2ESMACS | EC50a |
---|---|---|---|
a EC50 data from literature.10 “+”: cell assay performed but no antiviral activity detected at the highest concentration (100 μM) tested; “−”: no cell assay performed. | |||
Substrate-binding site | |||
RQN | −22.16 ± 0.35 | −15.98 ± 0.64 | 38.24 |
CB1 | −21.19 ± 0.48 | −21.09 ± 0.52 | + |
RV8 | −20.70 ± 0.86 | −21.29 ± 0.75 | 30.02 |
S1W | −20.69 ± 0.30 | −18.77 ± 0.90 | − |
FUA | −18.07 ± 0.61 | −12.98 ± 0.94 | + |
RV5 | −17.77 ± 0.91 | −19.07 ± 0.49 | + |
RNW | −13.20 ± 0.44 | −13.57 ± 0.35 | + |
S8B | −12.38 ± 0.26 | −11.69 ± 0.39 | − |
Allosteric site I | |||
93J | −19.86 ± 0.18 | −19.78 ± 0.15 | 1.25 |
RT2 | −18.36 ± 0.43 | −18.59 ± 0.40 | + |
R6Q | −15.33 ± 0.46 | −14.93 ± 0.41 | 19.8 |
RMZ | −13.55 ± 0.39 | −14.05 ± 0.45 | + |
QEL | −12.18 ± 0.48 | −11.78 ± 0.53 | 46.86 |
Allosteric site II | |||
LZE | −18.36 ± 0.32 | −19.60 ± 0.30 | 25.16 |
For the drugs at the allosteric site I, excellent agreement is obtained between the two sets of simulations. The simulations generate not only the same ranking, but the same binding free energies, within error bars, for all of the compounds binding at this site (Table 2). The calculated binding free energies also agree with the experimental measurements for a subset of these drugs for which EC50 values are detected. For LZE at the allosteric site II there are no other drugs to compare with at the same binding site, but its binding free energy is very favourable. This drug has antiviral activity observed in the cell assays.11
To quantify the energetic contribution of each amino acid to the bindings, we have performed a per-residue decomposition analysis14 for the drugs at the substrate-binding site. It is observed that the residues contributing the most to the bindings are clustered with residue numbers between 25 and 27, 41 and 50, 140 and 145, 163 and 173, and 187 and 192 (Fig. 3). While most of these residues contribute to the bindings in a favourable way, some residues show unfavourable contributions to the binding energies. Although there is a total of 37 residues having a contribution of |ΔG| > 0.1 kcal mol−1, only one residue, MET49, is universally presented for all of the drugs at the substrate-binding site. Another residue, MET165, appears for 7 out of 8 drugs, and is merely missed for S8B with a contribution of −0.09 kcal mol−1. 23 of the residues have contributions (|ΔG| > 0.1 kcal mol−1) only for one or two drugs. The contributions from the same residue also vary significantly for the binding of different drugs. HIS163, for example, provides the most favourable contribution for one of the drugs, RQN, with an energy of −1.96 kcal mol−1. Its contributions for three other drugs, S1W, RV5 and CB1 are, however, negligible. It should be noted that the residues do not interact with the drugs independently. HIS163 contributes favourably for the binding of all eight drugs, while its immediately adjacent residue, HIS164, contributes unfavourably for 6 out of 8 drugs studied here.
In addition to the size and accommodation capacity of the subsites, the occurrence of mutations at the binding site should also be considered when designing new drugs or modifying existing drugs. Correlated mutation analysis (CMA) has revealed some key residues which contribute significantly to the protein stability.24 These residues are unlikely to mutate and thus can be considered as key anchoring residues for drug design. Some of these anchoring residues24 overlap with the ones which contribute significantly to the binding from the per-residue analyses. Residue G143, for example, is one of the anchoring residues identified from CMA study; it also contributes significantly to the binding energies of RV8 (−1.09 kcal mol−1) and FUA (−0.49 kcal mol−1). These interactions of drugs with the conserved residues are likely to be maintained within potential 3CLpro variability, and hence reduce the probability of resistance emergence. On the other hand, drug resistance may be present when mutations occur in the residues important to the drug–protein interactions. Mutation of residue N142, for example, has been identified from a sequencing study.25 The residue contributes significantly to the binding of S8B, RV8 and FUA, with binding free energy contributions of −0.87, −0.90 and −0.97 kcal mol−1, respectively. The mutation may confer resistance against these drugs if the drug–residue interactions become less favourable. The interference of the neighbouring residues, however, would complicate the estimations of their individual contributions to drug binding affinities. This problem can be overcome by calculating the overall contributions instead from the neighbouring residues as a whole.
To check the stability of drug binding, the close contacts of the drugs with the protein are monitored (Fig. 4). A close contact is defined when the distance is less than 4 Å between two heavy atoms arising within drug and protein. The number of contacts is a good qualitative indicator of the predicted binding free energies.26 It has a good correlation with the binding free energies from the ESMACS approach, with a correlation coefficient of −0.78 (Fig. 5), indicating that more contacts lead to more negative free energies.
In set-1 simulations, 5 out of 8 drugs have less contacts during the simulations than those identified in the initial X-ray structures, and 2 drugs have more contacts. Only one drug, RQN, maintains roughly the same numbers of contacts as these in its X-ray structure. Most of the contacts from the X-ray structures are not maintained in the simulations. The lost contacts are largely compensated by other pairs of drug and protein atoms which are brought closer during the simulations. In set-2 simulations, all drugs except S1W have similar or less contacts comparing with those in set-1 simulations. This is not unexpected as conformational differences are usually induced when small molecules bind to a protein. A single protein structure is not optimal for the binding of every drug. Consequently, the results from set-2 simulations are less reliable than these from set-1. Some drugs may even drift away from the binding sites identified in the crystallography experiments. We are currently performing very long time scale MD simulations in which binding and unbinding events are investigated. The drug FUA in set-2, for example, is observed to either bind at different binding subsites or is completely unbound in the ensemble simulations, evidenced by its contact distribution extending to 0 (Fig. 4). This is the reason for its less favourable binding free energy predicted in set-2 simulations (Table 2).
The per-residue energy decomposition provides some favourable features of the binding sites, which are important for the building of an energy-based pharmacophore model. The analysis suggests the amino acids crucial for the binding of drugs at the 3CLpro binding sites. The combination of energy-based and structure-based pharmacophore models could provide an improved virtual screening for the initial selection of promising compounds.
Although the X-ray structures have a high degree of similarity for the particular set of drugs studied here, the difference in predicted binding free energies for the drugs RQN and FUA when using individual X-ray structures compared to using repurposed X-ray structures from similar protein ligand complexes, highlights the importance of the local conformations at the binding site. The orientations of some side chains need to be adjusted to better accommodate different drugs. While treating the protein as a rigid entity is still common for most docking studies, induced fit docking approaches have been attempted in which an ensemble of protein conformations is used. The conformations are collected from multiple experiments or more commonly are generated by MD simulations. The close contacts between the drugs and the proteins in the simulations differ significantly from those present in the x-ray structures. Our study provides detailed energetic insight into the nature of drug–protein binding, which may be used to shed light on the design and discovery of potential drugs.
This journal is © The Royal Society of Chemistry 2022 |