Eugene N. Muratov [a],
Rommie Amaro [b],
Carolina H. Andrade [c],
Nathan Brown [d],
Sean Ekins [e],
Denis Fourches [f],
Olexandr Isayev [g],
Dima Kozakov [h],
José L. Medina-Franco [i],
Kenneth M. Merz [j],
Tudor I. Oprea [k, l, m],
Vladimir Poroikov [n],
Gisbert Schneider [o],
Matthew H. Todd [p],
Alexandre Varnek [q, w],
David A. Winkler [r, s, t],
Alexey V. Zakharov [u],
Artem Cherkasov* [v] and
Alexander Tropsha* [a]
[a] UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA. E-mail: alex_tropsha@unc.edu
[b] University of California in San Diego, San Diego, CA, USA
[c] Department of Pharmacy, Federal University of Goias, Goiania, GO, Brazil
[d] BenevolentAI, London, UK
[e] Collaborations Pharmaceuticals, Raleigh, NC, USA
[f] Department of Chemistry, North Carolina State University, Raleigh, NC, USA
[g] Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
[h] Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
[i] Department of Pharmacy, National Autonomous University of Mexico, Mexico City, DF, Mexico
[j] Department of Chemistry, Michigan State University, East Lansing, MI, USA
[k] Department of Internal Medicine and UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, USA
[l] Department of Rheumatology and Inflammation Research, Gothenburg University, Sweden
[m] Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
[n] Institute of Biomedical Chemistry, Moscow, Russia
[o] Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich, Switzerland
[p] School of Pharmacy, University College London, London, UK
[q] Department of Chemistry, University of Strasbourg, Strasbourg, France
[r] Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia
[s] School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Australia
[t] School of Pharmacy, University of Nottingham, Nottingham, UK
[u] National Center for Advancing Translational Science, Bethesda, MD, USA
[v] Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada. E-mail: acherkasov@prostatecentre.com
[w] Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Japan
First published on 2nd July 2021
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. A massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130,000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, drug repurposing does not appear to have produced rapid and global solutions. However, several known drugs have been used in the clinic to treat COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
The response of the research community to the pandemic, measured by the number of publications, has been substantial. As of April 2021, nearly 125,000 research papers on COVID-19 had been indexed in PubMed2 and more than 14,500 preprints had been deposited by the scientific community in medRxiv or bioRxiv,3 with many more appearing in other preprint servers. Many of these publications reported on extensive structural and proteomic studies of SARS-CoV-2 components, biological screening of chemical libraries, and other experimental investigations that provided valuable data to support multiple computational approaches to COVID-19 drug discovery. Conversely, many computational studies proposed candidates for drug repurposing as well as novel drug candidates, but the overwhelming majority of these publications reported no supporting experimental evidence. The number of such manuscripts has become so large that even preprint servers have stopped accepting purely computational submissions.4 However, comprehensive studies combining computational investigations with experimental validation have emerged as well.
Due to this unprecedented number of studies by both specialists and novices in computer-aided drug discovery (CADD) who embarked on virtual searches for COVID-19 drug candidates, we considered it extremely timely to critically review computational approaches employed in CADD for COVID-19 and the results of their application. We felt it was important to summarize the strategies and best practices of computational drug discovery that have emerged from the analysis of the most impactful publications. We have focused on small molecule drugs, as vaccine development has been reviewed elsewhere.5,6 It is worth noting that although multiple effective COVID-19 vaccines have been developed, tested, and distributed with unprecedented speed, their long-term efficacy, side effects, and coverage of rapidly emerging SARS-CoV-2 variants are not fully understood. In addition, none of the vaccines developed thus far offers 100% protection to all vaccinated people. It is also important to state that small-molecule direct-acting antiviral (DAA) agents and vaccines correspond to fully complementary therapy- and prevention-oriented approaches, both aiming to contain the COVID-19 pandemic. Thus, as emphasized in the recent Nature editorial7 and argued in a recent historical survey on antiviral drug discovery,8 efforts to develop new antiviral medications should not only continue but accelerate.
In this review, we provide a critical summary of research efforts that emerged in the CADD community in response to the pandemic. The overall flow of this review is shown in Fig. 1.
We start by providing a brief overview of small molecule drug discovery and repurposing efforts and of key data-rich resources that have been developed in the last year with a focus on SARS-CoV-2 and COVID-19. We follow with a detailed consideration of SARS-CoV-2 proteins critical to the virus’ life cycle and a critical overview of the computational drug discovery studies, which can be classified into three major categories: structure-based approaches including molecular docking, molecular dynamics (MD) and free energy perturbations (FEP) (reviewed, in part, recently9); ligand-based methods such as Quantitative Structure–Activity Relationship (QSAR) modeling; and knowledge-mining approaches, including Artificial Intelligence (AI), that led to data-supported nomination and testing of several repurposed drug candidates and drug combinations. In reviewing these approaches and their applications, we emphasize the importance of reliable experimental validation of computational hits and describe the advantages of open drug discovery to accelerate the discovery of novel therapeutics against both the current and possible future pandemics.
We can summarize our analysis of the CADD research literature for COVID-19 as follows:
– The magnitude and urgency of the research response to the COVID-19 pandemic highlight the ability of CADD to capture and transform both pre-existing and new data of relevance to the pandemic into actionable drug discovery hypotheses.
– CADD provides a robust framework for open science including knowledge exchange, open-source software implementation, and data sharing, as the nature of the field embodies collaboration between computational, experimental, and clinical scientists, and convergence of multi-disciplinary, goal-oriented approaches toward discovery and development of novel and powerful medicines.
– The expert use of methods and adherence to the best practices of CADD catalyze faster experimental success and enable rapid emergence of valid, experimentally confirmed drug candidates.
We trust that our observations and summaries of the best practice approaches to CADD in times of pandemic are helpful to all investigators working on COVID-19 as well as other important drug targets. We hope this critical review will prove valuable not only for researchers but also for journal editors by helping them to assess the quality and impact of manuscript submissions and media stories on COVID-19 drug discovery.
Name | Target | SARS-CoV-2 activity in Vero cells | SARS-CoV-2 activity in other cell types |
---|---|---|---|
Remdesivir | RNA-dependent RNA polymerase | EC50 0.77 μM (ref. 2); EC50 1.65 μM (ref. 5) | Human epithelial cell culture (EC50 0.01 μM); Calu-3 (EC50 0.28 μM) (ref. 10) |
Apilimod | PIKfyve | EC50 0.023 μM (ref. 11); IC50 < 0.08 μM (ref. 13) | 293T cells (EC50 0.012 μM) (ref. 12); Huh-7 cells (0.088 μM) (ref. 12); A549 cells (IC50 0.007 μM) (ref. 14) |
GC376 | Mpro (Ki 12 nM) (ref. 15) | EC50 0.91 μM (ref. 15) | Not tested |
EIDD-1931 | RNA-dependent RNA polymerase | IC50 0.3 μM (ref. 16) | Calu-3 (IC50 0.08 μM) (ref. 17) |
One of the earliest drug repurposing studies18 identified several previously known antivirals with low μM activity against SARS-CoV-2 in Vero cells and a selectivity index (SI) greater than 10. Those included the FDA-approved drugs nitazoxanide (EC50 2.12 μM), remdesivir (EC50 0.77 μM), and chloroquine (EC50 1.13 μM). Although subsequent clinical trials did not deliver a ‘silver bullet’ for COVID-19, remdesivir was eventually authorized for clinical use.19 Other notable repurposing examples included lumefantrine (EC50 23.50 μM) and the natural products lycorine (EC50 0.31 μM) and oxysophoridine (EC50 0.18 μM); the latter two demonstrated Vero cell activity superior to gemcitabine (EC50 1.24 μM) and chloroquine (EC50 1.38 μM). Another repurposing screen identified niclosamide (IC50 0.28 μM), ciclesonide (IC50 4.33 μM), and tilorone (IC50 4 μM), previously shown to be active against MERS and Ebola. Pyronaridine (IC50 31 μM) was also identified as a SARS-CoV-2 candidate inhibitor, and both tilorone and pyronaridine have progressed into clinical trials.20 The FDA-approved antiparasitic ivermectin (IC50 2.8 μM) also demonstrated significant in vitro activity in Vero cells, leading to broad discussions in the literature21 and eventual nomination for clinical trials.22
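The selectivity index mentioned above is conventionally the ratio of a compound's cytotoxic concentration (CC50) to its effective antiviral concentration (EC50), measured in the same cell line. The sketch below shows how such a filter might be applied; the compound names and all concentration values are hypothetical placeholders, not data from the cited study.

```python
# Selectivity index (SI) = CC50 / EC50, with both values in the same units.
# All compounds and numbers below are HYPOTHETICAL, for illustration only.

def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """Return CC50/EC50; a larger value means a wider safety window in vitro."""
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um

# name: (EC50 in uM, CC50 in uM)
compounds = {
    "cmpd_A": (0.5, 50.0),   # SI = 100
    "cmpd_B": (2.0, 10.0),   # SI = 5
    "cmpd_C": (1.0, 10.0),   # SI = 10, not strictly greater than the cutoff
}

SI_CUTOFF = 10.0  # the "greater than 10" criterion quoted in the text

selective = sorted(
    name
    for name, (ec50, cc50) in compounds.items()
    if selectivity_index(cc50, ec50) > SI_CUTOFF
)
print(selective)
```

Because the cutoff is applied as a strict inequality, a compound sitting exactly at SI = 10 is excluded; whether a given study used a strict or inclusive threshold is an assumption here.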
Progressive growth of assay and robotic capabilities has enabled large-scale screening campaigns against SARS-CoV-2. For example, a recent study23 used biological activity-based modeling to identify 311 chemicals, of which 99 demonstrated in vitro activity against the virus. In another notable large-scale study, 12,000 clinical stage or FDA-approved compounds from the ReFRAME library were evaluated in a Vero cell assay.12 As a result, twenty-one hits were identified with promising dose–response readouts. Of those, clofazimine (EC50 0.31 μM) and the kinase inhibitor apilimod (EC50 0.023 μM) were of particular interest. Apilimod was subsequently tested in infected 293T and Huh-7 cells, where it demonstrated striking potency (12 and 88 nM, respectively);12 the drug entered clinical trials for COVID-19 in June 2020.24 In July 2020, clofazimine also advanced into clinical trials as part of a combination therapy.25
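To put the screening numbers quoted above on a common footing, the following short sketch computes the two hit rates and converts the apilimod potencies from nM to μM so they can be compared with the EC50 values reported elsewhere in μM. The function names are illustrative only.

```python
# Worked arithmetic for the screening figures quoted in the text.

def hit_rate_percent(hits: int, tested: int) -> float:
    """Percentage of tested compounds that were confirmed as hits."""
    return 100.0 * hits / tested

def nm_to_um(value_nm: float) -> float:
    """Convert a concentration from nanomolar to micromolar."""
    return value_nm / 1000.0

# 99 active compounds out of 311 model-selected chemicals
model_guided = hit_rate_percent(99, 311)
# 21 hits out of roughly 12,000 ReFRAME library compounds
library_screen = hit_rate_percent(21, 12000)

print(f"model-guided selection: {model_guided:.1f}% confirmed")
print(f"library-wide screen:    {library_screen:.3f}% confirmed")

# Apilimod potencies in 293T and Huh-7 cells, expressed in uM
print(nm_to_um(12), nm_to_um(88))
```

The two rates are not directly comparable as measures of method quality, since the studies differ in assay, library, and hit definition; the calculation only makes the orders of magnitude explicit.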
In another study,26 the authors demonstrated that SARS-CoV-2 can rewire phosphorylation signaling in infected Vero and A549 cells, which also suggests the use of kinase inhibitors, including apilimod. A drug repurposing study14 used a protein interaction map to identify approved and experimental drugs that bind to sigma-1 and sigma-2 receptors (acting as host factors); the most potent compound, PB28, demonstrated an IC90 of 280 nM in Vero cells.
According to DrugBank, more than 680 medications have been evaluated in over 3,300 clinical trials, including remdesivir, hydroxychloroquine, chloroquine, lopinavir, ritonavir, camostat, ivermectin and baricitinib, among others.27 Unfortunately, despite significant effort toward finding COVID-19 drugs among approved therapeutics, most repurposing studies (including clinical trials) have proved unsuccessful. The FDA recently published a summary of trends observed across several thousand COVID-19 clinical trials of drug products and antibody-based agents, with a total enrolment of over 500,000 patients.11 The study came to the rather sobering conclusion that “the vast majority of trials of therapeutics for COVID-19 are not designed to yield actionable information; low randomization rates and underpowered outcome data render matters of safety and efficacy generally uninterpretable”. This observation, however, does not obviate the need for carefully designed and executed trials involving evidence-supported drug candidates. For instance, Pfizer's SARS-CoV-2 main protease (Mpro) inhibitor PF-00835231 (ref. 13) continues to be clinically evaluated and still provides hope. Moreover, Pfizer recently announced the start of clinical trials of another new oral antiviral agent, PF-07321332, designed as a specific SARS-CoV-2 Mpro inhibitor in less than a year.
Along with experimental repurposing screening campaigns, there has been an avalanche of computational drug repurposing studies, especially against the SARS-CoV-2 main protease (Mpro), which was the first viral protein with an X-ray resolved structure.28 Shortly after the first structure of Mpro was deposited into the Protein Data Bank,29 numerous research groups from all around the world started submitting manuscripts describing docking experiments with SARS-CoV-2 Mpro, in which various drugs, natural products, nutraceuticals, etc., were annotated as putative hits.
In our observation, many researchers started to use molecular modeling and cheminformatics tools for the first time. Consequently, many were unaware of the best practices of CADD and the rigorous protocols required for data preparation, curation, and proper validation of predictions. Arguably, the most common issue was the absence of chemical standardization and curation, leading to the use of incorrect protonation states in the ligands, missing hydrogen atoms, presence of salts, duplicates, inconsistent representations of chemical moieties and tautomers, etc.30 Additionally, some studies employing molecular docking omitted key steps of protein structure preparation, including removal of water molecules, addition of explicit hydrogens and assignment of accurate protonation states for residues, identification and addition of missing side chains or loops, removal of overlapping atoms, and energy minimization of side chains, among others. Some papers apparently docked their library “directly from SMILES strings”, strongly suggesting neglect of proper compound curation and preparation, which are critical.30 Another common shortcoming was the use of rigid docking, which has significant limitations and may require additional post-processing steps.31 Unfortunately, as mentioned above, many such papers (frequently accompanied by press releases) made misleading claims about the discovery of COVID-19 cures based solely on computational model predictions.32 Clearly, such statements can only be made after robust experimental and, ideally, clinical validation of computer-generated drug candidates.
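As a rough illustration of the first curation checks discussed above, the sketch below triages a small, hypothetical compound list: records whose SMILES string is empty or contains a '.' (the SMILES separator for disconnected components, as in a salt and its counter-ion) are set aside for review, and exact duplicate strings are dropped. This is deliberately a string-level pre-filter under simplifying assumptions; it does not canonicalize structures, assign protonation states, or enumerate tautomers, all of which require a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    name: str
    smiles: str

def triage(records):
    """Split records into (clean, flagged).

    flagged: empty structure strings, or multi-component SMILES
             (a '.' separates disconnected fragments such as counter-ions).
    clean:   everything else, with exact duplicate SMILES strings removed.
    """
    clean, flagged, seen = [], [], set()
    for rec in records:
        smi = rec.smiles.strip()
        if not smi or "." in smi:
            flagged.append(rec)
            continue
        if smi in seen:
            continue  # exact duplicate of a structure already kept
        seen.add(smi)
        clean.append(Record(rec.name, smi))
    return clean, flagged

# Hypothetical toy library
library = [
    Record("acetic acid", "CC(=O)O"),
    Record("sodium acetate", "CC(=O)[O-].[Na+]"),
    Record("acetic acid (duplicate)", "CC(=O)O "),
    Record("ethanol", "CCO"),
    Record("missing structure", ""),
]

clean, flagged = triage(library)
print([r.name for r in clean])
print([r.name for r in flagged])
```

Note that string comparison misses duplicates written differently (for example `OCC` and `CCO` denote the same molecule), which is one reason proper standardization cannot be skipped before docking or modeling.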
Most of the promising drug candidates that were not FDA-approved drugs were previously known or well-advanced experimental DAA agents. For example, the SARS-CoV Mpro inhibitor GC376 also showed excellent potency against SARS-CoV-2 Mpro (Ki 12 nM) and demonstrated significant activity in Vero cells (EC50 0.91 μM). Another important example is EIDD-1931, a broad-spectrum antiviral that targets RNA viruses and causes mutations to accumulate in viral RNA. It was shown to inhibit SARS-CoV-2 in Vero (IC50 0.3 μM) and Calu-3 cells (IC50 0.08 μM), and a prodrug version of this molecule was previously reported to be active against SARS-CoV and MERS-CoV in mouse models. Notably, recent clinical trials33 of Pfizer's SARS-CoV Mpro inhibitor PF-07304814 (a prodrug form of the aforementioned PF-00835231) generated promising initial results warranting the continuation of the study.13 There is a growing understanding that future computational and experimental studies need to place greater focus on the development of novel chemical entities with targeted, tailored activity against SARS-CoV-2. As mentioned above, clinical studies of another Pfizer compound, PF-07321332, have begun; if approved, it could become the first DAA drug developed specifically against SARS-CoV-2. This compound is an example of the focused drug discovery approach enabled by knowledge of the specific viral target. Thus, continuously evolving knowledge of these targets, along with the expert use of current and novel computational approaches to antiviral drug discovery that draw on constantly emerging SARS-CoV-2 and COVID-19 knowledge bases, is critical for guiding DAA efforts, as discussed in the next sections of this review.
In support of structure-based drug discovery, the Diamond Synchrotron source has made available a set of ∼1,500 resolved crystal structures of low-molecular weight fragments bound to SARS-CoV-2 Mpro, along with their experimentally estimated binding affinities.37 This and similar efforts have resulted in more than 1,100 protein structures deposited into the Protein Data Bank (PDB) to date, covering most of the proteins translated from SARS-CoV-2 RNA.29 Furthermore, the Diamond fragment collection was used as a starting point for collaborative, community-sourced de novo ligand design led by PostEra.38 As a result, more than 1,800 specifically designed compounds have been proposed, synthesized, and screened to date, and the results were publicly disclosed.
Unstructured data repositories offer another source of valuable information on the virus and the infection. For example, most scientific publishers agreed to make all COVID-19 related papers freely available to the public. Kaggle has made available the COVID-19 Open Research Dataset (CORD-19) containing about 200,000 scholarly articles on new and related coronaviruses, including over 100,000 full-text items.15 Similarly, Elsevier released the free Coronavirus Information Center encompassing more than 30,000 papers and book chapters.39
Experimental information on protein–protein interactions (PPIs) relevant to SARS-CoV-2 represents another invaluable knowledge source. Such PPI networks have been reconstructed for proteins encoded by genes whose expression is altered in SARS-CoV-2-infected human cells, organs, model organoids and cell lines. These networks make it possible to identify hubs (highly connected protein nodes) and bottlenecks (proteins exclusively connecting distinct modules) that represent potentially valuable drug targets for COVID-19.16 A powerful Coronavirus Discovery Resource to visualize such networks was developed by the Institute of Cancer Research in the UK.40
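The notions of hubs and bottlenecks can be made concrete on a toy network. In the sketch below, a hub is any node whose degree meets a chosen threshold, and a bottleneck is approximated as a node whose removal splits the network into more connected components. Both the network (node labels A to G and X) and this removal-based definition are illustrative assumptions; published analyses often rank bottlenecks by betweenness centrality instead.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency map from a list of (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def n_components(graph, removed=None):
    """Count connected components, optionally ignoring one node."""
    seen, count = set(), 0
    for start in graph:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in graph[node]:
                if nb != removed and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count

def hubs(graph, min_degree):
    """Nodes with at least min_degree interaction partners."""
    return sorted(n for n, nbs in graph.items() if len(nbs) >= min_degree)

def bottlenecks(graph):
    """Nodes whose removal disconnects part of the network."""
    base = n_components(graph)
    return sorted(n for n in graph if n_components(graph, removed=n) > base)

# Hypothetical network: two tightly knit modules joined only through X
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),   # module 1
    ("E", "F"), ("E", "G"), ("F", "G"),   # module 2
    ("C", "X"), ("X", "E"),               # the only path between modules
]
graph = build_graph(edges)
print("hubs:", hubs(graph, min_degree=3))
print("bottlenecks:", bottlenecks(graph))
```

In this toy case X has only two partners, so it is not a hub, yet it is the sole connection between the modules; this is the kind of node that a degree-only analysis would overlook.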
A similar approach has been used to construct drug–protein interaction networks, such as the Connectivity Map, which has been extensively employed to flag potential COVID-19 therapeutics.16 This approach identifies compounds (including known drugs) that upregulate human genes that are suppressed in cells invaded by SARS-CoV-2. These chemically induced gene expression profiles can be obtained from the LINCS L1000 database, which contains information on thousands of perturbed genes at various time points, doses, and cell lines. This approach can be used separately or together with network-based applications to identify possible anti-COVID drugs.16 Examples of such studies are summarized in Table 2, illustrating that COVID-19 targets can be identified from PPI networks, from compound-target interactions, at transcription levels, as well as from pathways and biological processes.
Study | Cava et al.41 | Hazra et al.42 | Karakurt et al.43 | Zhou et al.44 |
---|---|---|---|---|
a NA – Not applicable. | ||||
Source of network | Human PPI network subnetwork from the genes, which are co-expressed with ACE2. Human PPIs were obtained using SpidermiR tool (PMID: 28134831) | Human PPI network from STRING (https://string-db.org) | Metabolic network of bronchus respiratory epithelial cell based on Recon2 (PMID: 23455439), human PPI network from STRING (https://string-db.org) | SARS-CoV-2-human PPIs,41 viral-human PPIs for other coronaviruses, human PPIs from 18 public databases. |
Source of compound-target interactions | Drug-target interactions were obtained from Matador (http://matador.embl.de) and DGIdb (https://www.dgidb.org) databases | STITCH (http://stitch.embl.de) | NAa | Drug–target associations from DrugBank (https://www.drugbank.com), Therapeutic Target Database (http://db.idrblab.net/ttd), ChEMBL (https://www.ebi.ac.uk/chembl), PharmGKB (https://www.pharmgkb.org), BindingDB (https://www.bindingdb.org/bind/index.jsp), Guide To Pharmacology (https://www.guidetopharmacology.org) |
Transcription dataset | Data on transcription in normal lungs was obtained from Cancer Genome Atlas (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga), Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) and Genotype-Tissue Expression (https://gtexportal.org) databases. | Transcription profiles of peripheral blood mononuclear cells from SARS-CoV-1 infected patients (GEO ID: GSE1739) | Transcription profiles from SARS-CoV-2 infected human lung epithelial cells (GEO ID: GSE147507) | Transcription profiles from SARS-CoV-2 infected human lung epithelial cells (GEO ID: GSE147507). Protein expression profile from human Caco-2 cells infected with SARS-CoV-2 (PRIDE ID: PXD017710) |
Pathways and biological processes | Genes correlated with ACE2 are mainly enriched in the sterol biosynthetic process, aryldialkylphosphatase activity, adenosylhomocysteinase activity, trialkylsulfonium hydrolase activity, acetate-CoA and CoA ligase activity | MMP9 showed functional annotations associated with neutrophil mediated immune-inflammation | Matrix metalloproteinase 2 (MMP2) and matrix metalloproteinase 9 (MMP9) with keratan sulfate synthesis pathway may play a key role in the infection. | Co-expression of ACE2 and TMPRSS2 was elevated in absorptive enterocytes from the inflamed ileal tissues of Crohn's disease patients compared to uninflamed tissues, revealing shared pathobiology by COVID-19 and inflammatory bowel disease. COVID-19 shared intermediate inflammatory endophenotypes with asthma (including IRAK3 and ADRB2) |
Potential targets | NAa | Hub-bottleneck node MMP9 | IL-6, IL6R, IL6ST, MMP2, MMP9 | NAa |
Potential drugs | 36 potential anti-COVID drugs, among which the authors highlighted nimesulide, fluticasone propionate, thiabendazole, Photofrin, didanosine and flutamide | Chloroquine and melatonin targeting MMP9. Melatonin appears to be the more promising repurposed drug against MMP9 for better immune-compromising action in COVID-19 | MMP9 inhibitors may have potential to prevent “cytokine storm” in severely affected patients | 34 potential anti-COVID drugs. Among them, melatonin was supported by an observational study of 18,118 patients from a COVID-19 registry. Melatonin was associated with a 64% reduced likelihood of a positive laboratory test result for SARS-CoV-2 |
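The signature-reversal idea behind the Connectivity Map approach described above can be sketched with a simple sign-based score: a compound scores highly when it moves genes in the direction opposite to the infection-induced change. The gene names, expression changes, and the scoring rule itself are hypothetical simplifications; the actual Connectivity Map scoring is rank-based and considerably more involved.

```python
def reversal_score(disease, drug):
    """Average sign opposition over genes present in both signatures.

    +1 for each shared gene the drug shifts against the disease change,
    -1 for each shared gene shifted in the same direction, 0 otherwise.
    Returns a value in [-1, 1]; 0.0 if no genes are shared.
    """
    shared = [g for g in disease if g in drug]
    if not shared:
        return 0.0
    total = 0
    for gene in shared:
        product = disease[gene] * drug[gene]
        if product < 0:
            total += 1
        elif product > 0:
            total -= 1
    return total / len(shared)

# Hypothetical log fold changes in infected vs. uninfected cells
disease = {"GENE1": -2.1, "GENE2": -1.4, "GENE3": 1.8, "GENE4": -0.9}

# Hypothetical compound-induced expression changes
drugs = {
    "drug_X": {"GENE1": 1.5, "GENE2": 0.8, "GENE3": -1.0, "GENE4": 0.4},
    "drug_Y": {"GENE1": -1.0, "GENE2": 0.5, "GENE3": 1.2},
    "drug_Z": {"GENE5": 2.0},  # no overlap with the disease signature
}

scores = {name: reversal_score(disease, sig) for name, sig in drugs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A score near +1 flags a candidate that reverses the infection signature, whereas a negative score indicates a compound that mimics it; such rankings are hypotheses for experimental testing rather than evidence of antiviral activity.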
In summary, data accumulated in multiple databases and repositories enable the application of ligand-based, structure-based, and knowledge-mining approaches in support of COVID-19 drug discovery that we discuss below. The role of SARS-CoV-2 targets in guiding DAA drug discovery efforts is discussed in the next section of this review.
The relatively small SARS-CoV-2 genome suggests that most of its 29 encoded proteins should play important roles in host invasion and/or viral replication. Hence, successful inhibition of many of them could lead to useful therapeutics. An insightful recent study examined the variability of these targets across 58 coronaviruses (CoVs) to support the search for broad-spectrum antivirals.47 The authors also established an interactive web portal48 displaying the 3D structures available for 15 of the SARS-CoV-2 proteins, with 19 putative drug binding sites mapped on these structures; this set of binding sites was collectively called the SARS-CoV-2 pocketome. This portal is very useful for scientists interested in analyzing these binding sites as part of future structure-based drug discovery efforts. Computer-aided discovery of drug candidates targeting key coronaviral proteins is discussed in subsequent sections. Herein, we summarize relevant information about the SARS-CoV-2 protein targets that can be explored by computational modeling.
At the whole-genome level, SARS-CoV-2 exhibits 79% sequence identity to SARS-CoV and about 50% identity to MERS-CoV. In spite of the relatively modest levels of sequence conservation, CoVs share essential (more conserved) genomic targets. This suggests that repurposing of existing antivirals and/or rational development of novel DAAs using the wealth of information collected from previous drug discovery efforts both represent promising avenues; both approaches are discussed below in greater detail.
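For readers unfamiliar with how a figure such as "79% sequence identity" is defined, the sketch below computes percent identity for two pre-aligned sequences. The sequences are short hypothetical strings, and the convention used (matches divided by the number of alignment columns without gaps) is only one of several in common use; genome-scale values such as those quoted above come from dedicated alignment software.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments (not real viral sequence)
seq_1 = "ATGCTAGC-A"
seq_2 = "ATGTTAGCGA"

identity = percent_identity(seq_1, seq_2)
print(f"{identity:.1f}% identity")
```

Here 8 of the 9 ungapped columns match; counting the gapped column in the denominator instead would give a lower value, which is why the convention should always be stated alongside the number.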
Viral proteins can be grouped into three main functional categories: attachment and penetration into host cells; viral replication and transcription; and suppression of the host immune response. Although the SARS-CoV-2 replicative and host invasion mechanisms are not yet fully understood, rapid structure determination of many SARS-CoV-2 proteins from all three groups enables structure-based drug discovery. Table 3 summarizes druggable sites in experimental structures of SARS-CoV-2 proteins that can be exploited using a wide range of CADD methods. A brief functional description of the SARS-CoV-2 proteins listed in Table 3 follows.
ID | Protein | Target | PDB | Ligand | Ref. |
---|---|---|---|---|---|
1 | Nsp1 | Nsp1/ribosome 40S interaction interface | 6ZLW | NA | 49 |
2 | Phosphatase | ADP-ribose binding site | 6W02 | ADP-ribose | 50 |
3 | PLpro (nsp3) | Active site | 7JIW | PLP_Snyder530 | 51 |
4 | Mpro (nsp5) | Active site | 6W63 | X77 | 52 |
5 | Mpro (nsp5) | Dimerization interface | 5RFA | Fragment x1187 | 53 |
6 | Primase (nsp7) | nsp7/nsp8 interaction interface | 6XIP | NA | |
7 | Nsp9 | Peptide binding site | 6W9Q | NA | 54 |
8 | Nsp10 | Predicted pocket, not annotated | 6ZCT | NA | 55 |
9 | RdRp (nsp12) | NiRAN domain | 6XEZ | ADP-Mg2+ | 56 |
10 | RdRp (nsp12) | Active site | 7BV1 | NA | 57 |
11 | RdRp (nsp12) | NTP entry site | 7CTT | NA | 57 |
12 | RdRp (nsp12) | Nsp12-Nsp7/Nsp8 interaction site | 7BV1 | NA | 58 |
13 | Helicase (nsp13) | ATP/ADP binding site | 6XEZ | NA | 56 |
14 | Helicase (nsp13) | DNA/RNA binding site | 6ZSL | NA | 59 |
15 | Endoribonuclease (nsp15) | Catalytic site | 6WXC | Tipiracil | 60 |
16 | 2′-O methyltransferase (nsp16) | RNA binding site | 6WKS | RNA cap | 55,61 |
17 | 2′-O methyltransferase (nsp16) | Active site | 6YZ1 | Sinefungin | 55 |
18 | 2′-O methyltransferase (nsp16) | Allosteric site | 6WKS | Adenosine | 61 |
19 | Spike (post-fusion) | HR2 linker motif | 6M3W | HR2 motif | 62 |
20 | Spike (post-fusion) | S2 HR1/HR2 bundle fold | 6M3W | NA | 62 |
21 | Spike (pre-fusion) | S2 U-turn loop | 6NB6 | NA | 62 |
22 | ORF3a | Predicted pocket, not annotated | 6XDC | NA | 63 |
23 | ORF8 | Predicted pocket, not annotated | 7JTL | NA | 64 |
24 | ORF9b | Lipid binding site | 6Z4U | PEG lipid | 65 |
25 | Nucleoprotein | RNA binding site | 6M3M | NA | 66 |
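Table 3 lends itself to a simple machine-readable form. As a minimal sketch, the rows can be held as tuples and grouped to see which proteins offer several candidate sites and which PDB entries cover more than one site; the variable and function names are illustrative, and the data are transcribed from the table as given.

```python
from collections import defaultdict

# (ID, protein, site, PDB entry), transcribed from Table 3
TABLE3 = [
    (1, "Nsp1", "Nsp1/ribosome 40S interaction interface", "6ZLW"),
    (2, "Phosphatase", "ADP-ribose binding site", "6W02"),
    (3, "PLpro (nsp3)", "Active site", "7JIW"),
    (4, "Mpro (nsp5)", "Active site", "6W63"),
    (5, "Mpro (nsp5)", "Dimerization interface", "5RFA"),
    (6, "Primase (nsp7)", "nsp7/nsp8 interaction interface", "6XIP"),
    (7, "Nsp9", "Peptide binding site", "6W9Q"),
    (8, "Nsp10", "Predicted pocket, not annotated", "6ZCT"),
    (9, "RdRp (nsp12)", "NiRAN domain", "6XEZ"),
    (10, "RdRp (nsp12)", "Active site", "7BV1"),
    (11, "RdRp (nsp12)", "NTP entry site", "7CTT"),
    (12, "RdRp (nsp12)", "Nsp12-Nsp7/Nsp8 interaction site", "7BV1"),
    (13, "Helicase (nsp13)", "ATP/ADP binding site", "6XEZ"),
    (14, "Helicase (nsp13)", "DNA/RNA binding site", "6ZSL"),
    (15, "Endoribonuclease (nsp15)", "Catalytic site", "6WXC"),
    (16, "2′-O methyltransferase (nsp16)", "RNA binding site", "6WKS"),
    (17, "2′-O methyltransferase (nsp16)", "Active site", "6YZ1"),
    (18, "2′-O methyltransferase (nsp16)", "Allosteric site", "6WKS"),
    (19, "Spike (post-fusion)", "HR2 linker motif", "6M3W"),
    (20, "Spike (post-fusion)", "S2 HR1/HR2 bundle fold", "6M3W"),
    (21, "Spike (pre-fusion)", "S2 U-turn loop", "6NB6"),
    (22, "ORF3a", "Predicted pocket, not annotated", "6XDC"),
    (23, "ORF8", "Predicted pocket, not annotated", "7JTL"),
    (24, "ORF9b", "Lipid binding site", "6Z4U"),
    (25, "Nucleoprotein", "RNA binding site", "6M3M"),
]

def group_by(rows, key_index):
    """Group table rows by the value in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)

by_protein = group_by(TABLE3, 1)
by_pdb = group_by(TABLE3, 3)

# Proteins with more than one listed site
multi_site_proteins = {p: len(r) for p, r in by_protein.items() if len(r) > 1}
# PDB entries that appear for more than one site
shared_pdb_entries = sorted(p for p, r in by_pdb.items() if len(r) > 1)

print(multi_site_proteins)
print(shared_pdb_entries)
```

Grouping by PDB entry shows, for instance, that a single deposited complex can supply coordinates for sites on two different proteins, which is convenient when preparing structures for docking against several pockets at once.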
Following release of the viral RNA into the host cytoplasm, two open-reading frames translate the viral RNA into two overlapping co-terminal polyproteins, pp1a and pp1ab, containing the non-structural proteins (nsps) (1–16), involved in immune suppression, replication, and transcription of the RNA.
Nsp1 suppresses host gene expression, thus weakening cellular antiviral defense mechanisms, including the interferon response.49 A recently resolved crystal structure of nsp1 bound tightly to the mRNA entry channel of the 40S ribosomal subunit (PDB: 6ZLW) suggests that blocking this interaction (ID = 1 in Table 3) could help reactivate the host immune response against SARS-CoV-2.
Nsp3 is a multifunctional protein comprising several distinct domains, some with papain-like protease (PLpro) activity, while others play important complementary roles. Thus, the phosphatase domain of nsp3, also referred to as MacroD (ID = 2), is believed to interfere with the immune response by acting as an ADP-ribose phosphatase to remove ADP-ribose from host proteins and RNAs.50 The recently reported crystal structure of a liganded MacroD (PDB: 6W02) provides an important avenue for rational development of MacroD-directed inhibitors that could restore host immune capabilities.
PLpro (nsp3) (ID = 3) and the main protease Mpro (nsp5) (ID = 4) are enzymes that carry out the critical upstream function of cleaving mature nsps from the pp1a and pp1ab polyproteins following their initial translation. Both protease targets are under intensive investigation, as discussed in great detail below. It is important to note that Mpro is a stable homodimer and its dimerization interface (ID = 5) may be an important site for targeting this critical viral enzyme.53
Nsp7 and nsp8 form a primase complex that is involved in the RNA synthesis pathway and required for enhanced functionality of the RNA-dependent RNA polymerase RdRp.67 A recently published crystal structure of nsp7 complexed with the C-terminus of nsp8 (PDB: 6XIP) identified several potentially druggable pockets in the dimerization interface (ID = 6) that could be used to design interaction inhibitors capable of suppressing SARS-CoV-2 replication.
The exact role of nsp9 in SARS-CoV-2 biology is not yet fully defined, but its structural homolog in SARS-CoV species suggests that the protein may be essential for viral replication. To be functional, nsp9 needs to form an obligate homodimer via its conserved “GxxxG” motif (ID = 7). Notably, disruption of key residues in this motif in related coronaviruses resulted in reduction of viral replication.54
The replication-transcription complex (RTC) represents the major viral assembly responsible for RNA synthesis, replication, and transcription. The RTC consists of RNA-dependent RNA polymerase (RdRp, nsp12), the primase complex (nsp7–nsp8), and helicase (nsp13) that combine to maintain optimal functioning of the replication machinery. The RTC provides numerous opportunities to inhibit SARS-CoV-2 replication. In particular, RTC activity relies heavily on RdRp, an indispensable enzyme in the life cycle of all RNA viruses.68 RdRp supports the transcription and replication of viral RNA genome by catalyzing the synthesis of viral RNA templates to produce genomic and subgenomic RNAs.69 SARS-CoV-2 RdRp contains an extended N-terminal nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain (ID = 9), and, although its exact role is still unknown, its enzymatic activity is considered critical for viral propagation.56 Recent cryo-EM structures of the NiRAN domain (PDB: 6XEZ) revealed a potential allosteric site that may be a suitable target for drugs that disrupt the function of NiRAN and RdRp. In the resolved structure of the complex, ADP is located in the active site of the NiRAN domain, highlighting a potentially druggable area, although further investigations will be required to determine the exact NiRAN activity as well as its preferred substrate.
The core and main structural motifs of RdRp are highly conserved between SARS-CoV-2 and SARS-CoV species (96% sequence identity), including key residues in their active sites.70 An apo-structure of RdRp has pockets in the catalytic chamber (active site) (ID = 10) where the RNA template needs to bind for replication (PDB: 7BV1). These pockets could be targeted by small molecules to impede RNA binding and disrupt RNA replication by the RTC. Situated next to the catalytic chamber, the NTP entry tunnel (ID = 11) guides new NTP into the extending RNA primer. A recently resolved structure of RdRp (PDB: 7BW4) demonstrates that the tunnel could be blocked to interrupt elongation of the RNA duplex.68 As previously mentioned, the primase complex (nsp7–nsp8) also interacts with RdRp to significantly enhance polymerase activity of the RTC.58 Thus, the critical interaction between the nsp7–nsp8 complex and RdRp (ID = 12) (PDB: 7BV1) represents another rational drug target. In the early phase of the pandemic, remdesivir (RDV) was considered a potential treatment for COVID-19, as studies reported that it was highly effective in inhibiting growth of SARS-CoV-2.18 RDV targets the RdRp to arrest RNA synthesis, thus highlighting the key role of RdRp in replication of SARS-CoV-2 and its potential as a druggable target.71
The helicase (nsp13) unwinds RNA helices to prepare a template strand for replication and hydrolyzes various NTPs.72 Two major functional sites of the helicase could be therapeutically targeted: the ATP binding site (PDB: 6XEZ) and the RNA binding site (PDB: 6ZSL).56
The function of a nidoviral RNA uridylate-specific endoribonuclease NendoU (nsp15) in the viral replication cycle is also not fully understood. Nsp15 may be involved in interfering with host response and/or in viral replication by processing RNA. Nonetheless, the role of NendoU protein is considered essential,73 and its crystal structure (with the active site occupied by citrate, PDB: 6WXC) provides an attractive starting point for structure-based drug design.
2′-O-RNA methyltransferase protein (MTase, nsp16) is also involved in viral RNA replication. MTase ensures the integrity of viral RNA by adding to its 5′ end a cap fragment consisting of an N-methylated GTP and a C2′-O-methyl-ribosyladenine moiety. The cap ensures adequate RNA integrity and stability for translation.55 Based on the available crystal structure of MTase, several potentially targetable sites have been identified: a positively charged RNA binding canyon (capping site, PDB: 6WKS), an S-adenosylmethionine binding site (ID = 17, PDB: 6YZ1) and a unique allosteric site (ID = 18) found to be occupied by adenosine in the 6WKS crystal structure.61
The spike glycoprotein (S protein) initiates the attachment and penetration of SARS-CoV-2 into host cells and consists of two subunits: S1 and S2. The former is responsible for binding the virus particle to the host's angiotensin-converting enzyme 2 (ACE2) receptor, while the latter facilitates the fusion of the viral and host cellular membranes. The receptor-binding domain (RBD) of S1 is the only exposed part of the virus and therefore represents an exceptional targeting opportunity (described in detail in the following sections).
The S2 subunit of S protein also presents opportunities to inhibit attachment of SARS-CoV-2 to host cells. S protein undergoes significant structural rearrangements upon binding to ACE2 to allow fusion of host and viral membranes. Thus, disrupting S protein from reaching its stable fusion conformation could be a viable therapeutic approach. A linker needs to bind in a cavity upstream of the heptad repeat 2 (HR2) in S2, and this positively charged cavity represents a rational surface target (ID = 19). Moreover, small pockets along the HR1–HR2 six-helix bundle in the post-fusion state (ID = 20) could be targeted to prevent S protein from forming its fusion core (PDB: 6M3W). Similarly, the S2 U-turn loop in the pre-fusion state (ID = 21) could also be targeted by small molecules to hamper S protein's appropriate refolding (PDB: 6NB6).62
The SARS-CoV-2 virus evades the host immune system through an intricate network of interfering proteins. Thus, accessory protein 9b (ORF9b) is another virulence factor that may suppress type I interferon responses by associating with TOM70 human protein (translocase of outer membrane 70). This reduces the development of innate and adaptive immunity.65 A recently resolved crystal structure of ORF9b (PDB: 6Z4U) revealed the presence of a lipid binding site (ID = 24) that could be relevant for targeting with small molecules.
Finally, the nucleocapsid protein (N protein) of SARS-CoV-2 virus plays a structural role in protecting viral RNA. The N protein enhances the efficiency of virion assembly by binding to viral RNA to form functional ribonucleocapsid (PDB: 6M3M).74 Therefore, blocking RNA binding to the N protein may disrupt the critical RNA packing event.66
The valuable structural information on SARS-CoV-2 proteins generated to date has identified up to 25 potential target sites for rational drug discovery campaigns. Notably, this list is constantly evolving with more viral proteins and protein complexes qualifying as potential targets. Additional potentially targetable sites in SARS-CoV-2 proteins and complexes are being discovered and researched by CADD methods. Furthermore, various cryptic target sites on SARS-CoV-2 proteins represent another important class of targets for structure-based drug discovery.75 So far, these have been identified in the nsp10, ORF3a, and ORF8 proteins from Table 3 (ID = 8, 22, 23), which do not exhibit distinct druggable sites on their surfaces.63 By combining molecular dynamics (MD) simulations with target site prediction tools, one could identify such cryptic protein pockets and develop inhibition strategies for SARS-CoV-2 proteins that are otherwise deemed non-targetable. A more detailed discussion on this topic will be presented in a later section on molecular dynamics simulations for discovery of cryptic pockets.
Although this section has focused on SARS-CoV-2 protein targets for drug discovery, substantial efforts are underway to repurpose or discover drugs acting on human proteins that play significant roles in SARS-CoV-2 infection. Thus, in a seminal work by Gordon et al.,14 UCSF researchers expressed 26 of the 29 SARS-CoV-2 proteins and used them as baits in a mass-spectral proteomics experiment to identify 332 critical interactions with human proteins. Subsequently, cheminformatics and text-mining tools identified 66 human proteins that could be targeted by 69 approved and experimental drugs. A subset of those identified by docking experiments was assessed in multiple viral assays. Ultimately, two series of host-directed pharmacological agents (inhibitors of mRNA translation and the sigma-1,2 receptor regulators) demonstrated significant antiviral activity.14
In summary, detailed structural information on both static and dynamic pockets in viral proteins and PPIs provides significant opportunities for structure-based drug discovery.
While no repurposed or novel SARS-CoV-2 inhibitors have yet been identified with SBDD tools, an important trend has emerged that involves applying supercomputing resources to COVID-19 drug discovery. Early work by Smith and Smith76 employed the world's largest supercomputer, the IBM SUMMIT, to screen the SWEETLAND library consisting of 8000 drugs and natural products against the complex of SARS-CoV-2 Spike protein and human ACE2 receptor. The computationally demanding replica-exchange MD simulations were combined with ensemble docking and resulted in identification of 77 candidate drugs, of which five were approved therapeutics (pemirolast, isoniazid pyruvate, nitrofurantoin, ergoloid, and cepharanthine) that constituted putative treatment options for COVID-19. While the work by Smith and Smith has received broad coverage,32 the proposed repurposing candidates have been neither properly validated nor confirmed by experiments. Moreover, ergoloid is a mixture of three different compounds, but there was no indication of which of them was identified. Such a modest outcome from 200 petaflops of computational power, together with a notable lack of validation of the repurposing hits, might suggest that it would be more effective to simply screen relatively small drug libraries (e.g., a few thousand compounds) in a wet lab. In support of this notion, recent high-throughput repurposing campaigns conducted by NCATS and leading academic groups12 resulted in a number of attractive repurposing candidates that demonstrate potent inhibition of SARS-CoV-2 virus, as described in earlier sections of this review. However, computational resources are still very important in virtual screening campaigns that aim to identify novel chemical entities as potential COVID-19 therapeutics.
This scenario, which promises to design or discover bespoke drugs with greater potency than repurposed drugs, needs to work in much larger chemical spaces that are currently inaccessible to experimental screening methods.
Below we summarize expert SBDD approaches that have been applied to SARS-CoV-2 targets and discuss recent trends in SBDD that aim at more rigorous, computationally efficient, and effective COVID-19 drug discovery.
On the other hand, the use of rigorous SBDD tools has significantly advanced our knowledge of SARS-CoV-2 target proteins, including their dynamic behavior, induced ionization states and plasticity, among other major factors potentially influencing ligand binding. The SARS-CoV-2 pocketome portal48 mentioned above can be used to visualize the details of the binding sites within individual target structures described below.
Thus, SARS-CoV-2 PLpro active site is centered on the catalytic triad of C111-H272-D286, which cleaves the replicase polyproteins at three specific sites featuring a conserved LXGG motif.77 The motif residues are labeled based on their relative position within the cleavage site. Position P1 is closest to the cleavage site, followed by P2, P3, and P4 at the end of the site, as shown in Fig. 2.
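The LXGG recognition motif described above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a plain one-letter amino acid string as input; the function name and the toy sequence are hypothetical and are not taken from the cited work. The motif spans positions P4 to P1, so the reported position is that of the P1 glycine, after which cleavage would occur.

```python
import re
from typing import List, Tuple

# Lookahead pattern so that overlapping motif occurrences are all reported.
LXGG = re.compile(r"(?=(L.GG))")


def find_lxgg_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the P1 glycine, motif) for each LXGG match."""
    sites = []
    for match in LXGG.finditer(sequence):
        motif = match.group(1)
        # The motif covers P4, P3, P2, P1; P1 is its fourth residue.
        p1_position = match.start() + 4
        sites.append((p1_position, motif))
    return sites


# Hypothetical toy sequence, not a real polyprotein fragment.
toy_sequence = "MKTLNGGAYTRELKGGAPTK"
print(find_lxgg_sites(toy_sequence))  # [(7, 'LNGG'), (16, 'LKGG')]
```

A motif match is only a necessary condition for cleavage in this sketch; it does not model structural accessibility or any residue preferences beyond the four motif positions.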
The catalytic site of PLpro can be divided into different sub pockets identified by the residue recognized at each position. Flexibility in the PLpro active site complicates rational SBDD. Notably, the loop formed by Tyr268/Gln269 is highly flexible and adopts a closed conformation via an induced-fit mechanism by interaction with specific inhibitors.78 Thus, the active site cavity can change from an open to closed state depending on the co-crystallized ligand.
Detailed structural information on the active site of PLpro has been used to design a potent inhibitor, GRL-0617 (PDB: 7JIW), which binds to the active site of PLpro and inhibits its enzymatic activity. Due to the proximity of the active site to the S1 ubiquitin binding site, it was suggested that GRL-0617 could also inhibit the interaction of PLpro with ubiquitin-like protein ISG15 responsible for regulating host innate immune response.51 Other inhibitors with similar modes of action, such as the PLP_Snyder series and VIR250/VIR251, are also currently under investigation.79 Although GRL-0617 and the PLP_Snyder series do not explicitly interact with the P1 catalytic site, both bind to the active site at P3-P4, and GRL-0617 exhibited potent activity (IC50 of 2.2 μM).
The active site of Mpro, perhaps the most prominent SARS-CoV-2 target protein, is centered on the catalytic dyad of Cys145-His41. It cleaves the replicase polyproteins at 11 specific positions, using core sequences in the polyproteins to determine the cleavage sites.80 The recognized residues on the polyproteins are named depending on their relative position to the cleavage site (see Fig. 3). Position P1 corresponds to the residue immediately before the cleavage site, with numbering continuing toward the N-terminus (P2, P3, P4, P5, etc.), while position P1′ corresponds to the residue immediately after the cleavage site, with numbering continuing toward the C-terminus (P2′, P3′, P4′, P5′, etc.).81 Therefore, the active site of Mpro can be partitioned into different pockets, depending on the residue occupancy at each position. Rational drug design must again take into consideration the flexible nature of the Mpro active site. Structural rearrangements of Met49 and Gln189 in the P2 position affect the size of the pocket for optimal occupancy. Therefore, special care must be taken when designing Mpro-specific inhibitors to fit the highly flexible P2 pocket in either the open or closed state. The latest study indicated that the ionization state of Mpro active site residues could also be context-dependent, which further complicates SBDD efforts with this protein.82
Nonetheless, based on the recognition sequence of the polyproteins, ligands can be effectively designed to occupy the same pockets in the SARS-CoV-2 Mpro active site. Recent X-ray structures of Mpro with potent inhibitors revealed key features of various binding modes to the enzyme. Thus, the P1 and P2 pockets must be occupied to inhibit Mpro activity, as all current ligands interact with Mpro via those two sites. Early SBDD efforts with the Mpro target site allowed the development of a potent covalent inhibitor, 11b (partly occupying the P1′ pocket), that exhibited IC50 values of ∼50 nM in vitro.83 The same molecular scaffold was later combined with different covalent warheads to generate more soluble Mpro inhibitors such as GC376, occupying pockets P1 and P2. This candidate had an in vitro IC50 ∼ 500 nM and is undergoing more extensive evaluation. More recently, a more potent derivative, PF-00835231 (with a prodrug PF-07304814), was developed with nanomolar potency against Mpro. Notably, it also exhibited potent in vitro suppression of SARS-CoV-2 as a single agent and in combination with remdesivir.33
De novo drug design efforts exploiting the Mpro active site received a significant boost in early 2020 when scientists in the UK Diamond Center made available >70 experimentally resolved structures with diverse chemical fragments non-covalently bound to the Mpro active site.37 Importantly, these structures have been crowdsourced for de novo design of fragment-based Mpro inhibitors. This resulted in >10000 submissions from all around the world.84 Of these, ∼1000 compounds have been synthesized and experimentally tested, resulting in several low- to sub-micromolar hits that await rigorous evaluation. The most active fragment-derived derivatives are being further refined using another computational crowdsourcing campaign, Folding@home.85
A recent seminal work by Lyu et al.86 demonstrated that expanding virtual screening to include large ‘make-on-demand’ chemical libraries yields highly potent compounds and new scaffolds not present in available chemical libraries. Importantly, this study used extensive computational resources but could only process 170 million molecules, whereas the number of accessible small molecules is now in the billions. Potential synergy between massive chemical libraries, such as ZINC15,87 and supercomputing facilities, such as SUMMIT at the Oak Ridge National Laboratory, has been identified in a recent study.88 Enhanced sampling MD and ensemble docking with AutoDock-GPU were applied to eight SARS-CoV-2 target proteins. This achieved exhaustive docking of 1 billion compounds against the 8 targets in under 24 hours. Unfortunately, as noted above, this extensive computational study was not followed by experimental evaluation, so the practical value of the identified hits is yet to be determined. However, this study highlights the previously unattainable boundaries of molecular docking that have emerged in the time of pandemics.
Template-based ligand docking, which has been shown to often outperform conventional docking, was used to discover novel inhibitors of SARS-CoV-2 Mpro. Thus, LigTBM89 was employed to obtain a model of the SARS-CoV-2 Mpro active site in complex with a low μM noncovalent inhibitor characterized crystallographically in the COVID Moonshot initiative.89 Unlike conventional docking, template-based methods are particularly useful because they do not require detailed binding site information. They also provide measures of model quality based on the similarity between the target and the template. Template-based approaches can be readily applied to modeling interactions between inhibitors and SARS-CoV-2 viral and human targets relevant to COVID-19, as their structural coverage in PDB is on an exponential trajectory.
As already outlined, the massive increase in SARS-CoV-2 structural information provides valuable inputs for MD simulations. In mid-February, just when cases in the US were very low, the McLellan group developed the first cryoEM dataset of the SARS-CoV-2 main infection machinery, the spike protein.91 Its early release set the stage for the first SBDD efforts using that key target. Subsequent work by several groups established strong methodological frameworks for the construction and simulation of the glycosylated spike protein,92 including the need for long MD runs (μs) in order to reveal the active participation of glycans in the spike opening motions.93 Additionally, a large-scale simulation of a patch of viral membrane containing four spikes, coupled with data from cryoEM, indicated that the spike stalk has joints that enable it to undergo hinge bending motions.94
In addition to the ability to explore orthosteric and allosteric binding pockets, an interesting recent application of MD simulations is the analysis of hidden (cryptic) binding sites. These sites are particularly useful for the design of compounds that have enhanced selectivity or resistance profiles.95 Because MD simulations explore the low-lying energy landscape around the minimum-energy, high-resolution static structure from X-ray crystallography or cryoEM, they are increasingly being used to discover these cryptic pockets.96,97 MD has identified cryptic pockets for both SARS-CoV-2 spike protein (at or near joints or hinges in the protein)98 and Mpro (near the active site and at the allosteric site), though these have not yet been experimentally validated.
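The subsite naming convention described above can be made concrete with a short sketch. The function below is hypothetical and only encodes the labeling rule: residues on the N-terminal side of the scissile bond are numbered P1, P2, ... moving away from the bond, and residues on the C-terminal side are numbered P1′, P2′, ... in the same way. The octapeptide is used purely for illustration, and the plain ASCII apostrophe stands in for the prime symbol.

```python
from typing import Dict


def label_subsites(peptide: str, n_before: int) -> Dict[str, str]:
    """Map P/P' labels to residues.

    peptide  : one-letter residues written N-terminus to C-terminus
    n_before : number of residues on the N-terminal side of the cleaved bond
    """
    if not 0 < n_before < len(peptide):
        raise ValueError("the cleavage site must lie inside the peptide")
    labels = {}
    for index, residue in enumerate(peptide):
        if index < n_before:
            # N-terminal side: the residue adjacent to the bond is P1.
            labels[f"P{n_before - index}"] = residue
        else:
            # C-terminal side: the residue adjacent to the bond is P1'.
            labels[f"P{index - n_before + 1}'"] = residue
    return labels


# Illustrative octapeptide, cleaved between the 4th and 5th residues.
for label, residue in label_subsites("AVLQSGFR", n_before=4).items():
    print(label, residue)
```

With this input, glutamine is assigned to P1 and leucine to P2, which are the two positions whose pockets are discussed as essential for Mpro inhibition in the text that follows.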
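To put the reported throughput in perspective, the arithmetic can be sketched as follows. This assumes that each of the 1 billion compounds was docked against each of the eight targets and that the full 24 hours were used; since the run finished in under 24 hours, the result is a lower bound on the average rate. The function name is hypothetical.

```python
def dockings_per_second(n_compounds: float, n_targets: int, hours: float) -> float:
    """Average docking throughput needed to finish within the given wall time."""
    return n_compounds * n_targets / (hours * 3600.0)


# Assumption: 1e9 compounds x 8 targets completed in 24 h.
rate = dockings_per_second(1e9, 8, 24)
print(f"{rate:,.0f} ligand-target dockings per second")  # about 92,593
```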
Finally, MD simulations can identify potentially useful pharmacophores or targetable epitopes. For example, simulations of truncated human ACE2 in complex with the spike receptor binding domain generated a topological map of the key interactions. It further suggested the importance of rigidity at the binding interface,99 molecular details that can inform on the design of peptidomimetics, for example. MD simulations of the full length ACE2 embedded in the host cell membrane indicated an unexpectedly large degree of flexibility in the linker domain. This may provide another avenue of exploration for small molecules that disrupt mechanical processes related to the virus-cell fusion.100
Another potentially impactful study used MD to explore details of molecular complexes between the Spike protein and nicotinic acetylcholine receptors in the muscle and brain.101 These simulations provided support for the nicotinic hypothesis and provided a molecular basis for receptor subtype specificity. These findings may facilitate development of compounds selective for the α7 subtype as a way of blocking the interaction. Undoubtedly there will be many additional simulation-based studies that contribute to therapeutic programs against COVID-19.
The explicit use of quantum mechanical (QM) methods can help solve the scoring problem and can provide more accurate estimates of binding affinity.102 This is especially important in cases involving metal ions, covalent bond formation, strong polarization and charge transfer effects, halogen bonding, etc.102 However, accurate QM calculations are very computationally demanding. The conventional Density Functional Theory (DFT) method scales nominally as O(N³), N being a measure of the system size. Wave-function based post-Hartree–Fock methods can scale even worse: O(N⁴) to O(N⁷). The most popular strategies for addressing this challenge include hybrid methods that partition the protein–ligand system such that only a small, most important region is treated with QM (e.g., ONIOM or QM/MM), and semiempirical and tight-binding methods that are applicable to thousands of atoms but need parametrization to overcome their inaccuracies.103
To date, QM studies related to COVID-19 have focused on reaction mechanisms and substrate specificity of the SARS-CoV-2 Mpro enzyme. Thus, Ramos-Guzmán et al.104 presented a detailed QM/MM analysis of the proteolysis reaction catalyzed by Mpro, modelling different states along the reaction pathway. These calculations were consistent with recently reported kinetic data for SARS-CoV-2 Mpro. Both studies presented a detailed analysis of key protein interactions and the critical importance of the P1/P1′ pockets in the design of potent and specific inhibitors.
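The practical consequence of these nominal scaling laws can be illustrated with a few lines of arithmetic. The sketch below assumes pure power-law scaling with no prefactors or linear-scaling tricks, which is a simplification; the function name is hypothetical.

```python
def cost_ratio(size_factor: float, exponent: int) -> float:
    """Relative cost increase when the system size grows by size_factor,
    assuming the cost scales as N**exponent."""
    return size_factor ** exponent


methods = [("DFT, cubic", 3), ("post-HF, quartic", 4), ("post-HF, N^7", 7)]
for name, exponent in methods:
    doubled = cost_ratio(2, exponent)
    tenfold = cost_ratio(10, exponent)
    print(f"{name}: x{doubled:g} for 2x atoms, x{tenfold:g} for 10x atoms")
```

Under these assumptions, doubling the system size raises the cost 8-fold for cubic scaling but 128-fold for seventh-power scaling, which is why only a small region is typically treated at the QM level.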
Hatada et al.105 employed a fragment molecular orbital (FMO) interaction analysis of the complex between the SARS-CoV-2 Mpro and its peptide-like inhibitor N3 (PDB ID: 6LU7). They computed the contributions of different residues and elucidated the nature of interactions in this complex. Similarly, Ramos-Guzmán et al.104 identified the important role of His41 and Cys145 in the design of covalent inhibitors of SARS-CoV-2 Mpro. Furthermore, Khrenova et al.106 used hybrid QM/MM MD simulations to derive a simple descriptor, based on the Laplacian of the electron density and the electron localization function, that discriminated between covalent and non-covalent complexes. Cavasotto et al.107 used semiempirical PM7 calculations to rescore docking to the SARS-CoV-2 Mpro, PLpro, and spike glycoprotein, while Adhikari et al.108 used large-scale DFT calculations to analyze interactions in the RBD domain of the spike protein.
The modest contribution of QM studies to the body of literature covered in this review highlights the slow pace of these approaches. Therefore, the ability of QM methods to contribute to the development of therapies for COVID-19, under the time constraints of the pandemic, is quite limited. However, very recent and exciting developments in AI and ML have the potential to greatly enhance the role of QM methods in drug discovery and development. Substantial progress has been made in the development of general-purpose atomistic potentials using ML, in particular using deep neural networks (DNN).109
The ANAKIN-ME (or ANI for short) method110 is one example of transferable DNN-based molecular potentials. The key components of ANI models include the selection of diverse training data with active learning, non-equilibrium sampling of 3D conformations, and atom-centered descriptors to represent molecules for learning.111 The ANI-1ccx model was built from energies and forces of ∼60000 small organic molecules (constituted of C, H, N and O atoms), considering non-equilibrium molecular conformations, using 5 million DFT (wB97x-D/DZ) and 0.5 million DLPNO-CCSD(T)/CBS calculations. These benchmark studies demonstrated the ANI-1ccx model to be within 1–2 kcal mol−1 of the reference (and extremely computationally demanding) Coupled Cluster calculations and to exceed the accuracy of DFT in multiple applications.112 The Atoms-In-Molecules neural Network or AIMNet improves the performance of ANI models for charged states and continuum solvent effects.113
The recently-developed ANI-2x model supports three additional chemical elements: S, F, and Cl. ANI-2x underwent torsional refinement training to better predict molecular torsion profiles.114 These new features open a wide range of new applications, including receptor–ligand systems, as they now cover 90% of drug-like molecules. Consequently, Lahey et al.115 demonstrated that by using the ANI potential to represent intramolecular interactions of ligands in protein pockets, both binding poses and conformational energies could be accurately calculated.
The NSF Molecular Sciences Software Institute (MolSSI), in collaboration with BioExcel, has set up a centralized hub and file sharing service for COVID-19 applications. It will connect scientists across the global biomolecular simulation community. The COVID-19 Molecular Structure and Therapeutics Hub also improves connection and communication between simulation, experimental, and clinical data investigators.90
The ANI-2x model was used to generate two public datasets, ANI–FDA Drugs and ANI–CAS Antiviral, for SBDD research for COVID-19.116 ANI–FDA Drugs contains low-energy conformers, tautomers, and dipole-consistent partial atomic charges for 6433 FDA approved and investigational drugs. It consists of 32036 tautomeric structures and approximately 3 million conformers. ANI-CAS Antiviral contains 67167 tautomeric structures and ∼6.6M conformers for 20306 molecules from the CAS Antiviral database.117 Axelrod and Gomez-Bombarelli118 used semi-empirical tight-binding density functional theory (GFN2-xTB) to compute minimal conformers for 278622 molecules that have been tested for in-vitro inhibition of SARS-CoV-related assays in PubChem. These recent developments are bound to improve the accuracy of both ligand representation and scoring functions used in virtual screening of chemical libraries against SARS-CoV-2 targets.
Consistent with the Best Practices of CADD,120 consensus filtering and post-processing of GLIDE hits with a 3-feature pharmacophore generated from the active site of Mpro was employed (see Fig. 4).
This protocol enabled the identification of 211 compounds highly ranked by both GLIDE and the pharmacophore model, which were selected for experimental evaluation. A continuous fluorescence resonance energy transfer (FRET)-based assay using recombinant Mpro allowed reliable and fast identification of small molecule inhibitors.121 Recombinant his-tagged protein was purified from E. coli lysates by Ni²⁺ binding chromatography following protocols for SARS-CoV-2 Mpro.108 Samples of the 211 selected compounds (>95% pure) were acquired from vendors and tested in the FRET assay with serial dilutions. Ultimately, 25 molecules were confirmed as active, with IC50 values in the range 10–100 μM, a respectable 12% hit rate for the Deep Docking method.
Notably, eight top-scoring ZINC compounds from the original Deep Docking paper were also evaluated by a third-party group, which resulted in the identification of two low micromolar hits for SARS-CoV-2 Mpro.122
Importantly, these results identify the need for the use of stringent methods and consensus protocols, relying on a larger number of more diverse CADD and experimental approaches discussed below.
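The consensus filtering step and the reported hit rate can be sketched in a few lines. This is a minimal illustration, not the protocol used in the study: the compound identifiers, scores and cutoff are hypothetical, and it is assumed that more negative docking scores are better and that the pharmacophore screen yields a simple set of matching compounds.

```python
from typing import Dict, List, Set


def consensus_hits(docking_scores: Dict[str, float],
                   pharmacophore_matches: Set[str],
                   score_cutoff: float) -> List[str]:
    """Keep compounds that pass the docking cutoff AND match the pharmacophore,
    ordered from best (most negative) to worst docking score."""
    passed = {cid for cid, score in docking_scores.items() if score <= score_cutoff}
    return sorted(passed & pharmacophore_matches, key=lambda cid: docking_scores[cid])


def hit_rate(n_active: int, n_tested: int) -> float:
    """Percentage of experimentally tested compounds confirmed as active."""
    return 100.0 * n_active / n_tested


# Hypothetical toy data.
scores = {"cpd_A": -9.1, "cpd_B": -7.2, "cpd_C": -8.5, "cpd_D": -6.0}
matches = {"cpd_A", "cpd_C", "cpd_D"}
print(consensus_hits(scores, matches, score_cutoff=-7.0))  # ['cpd_A', 'cpd_C']

# Numbers from the text: 25 confirmed actives out of 211 tested compounds.
print(f"{hit_rate(25, 211):.1f}%")  # 11.8%
```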
These NCATS datasets were consequently processed within this workflow using three different families of descriptors: fingerprints; pharmacophores; and physico-chemical properties. Starting from 22 different ML algorithms, six best performing algorithm/descriptor/assay combinations were selected, and voting-based consensus models were implemented on the REDIAL-2020 server for most of the assays (except AlphaLISA and ACE2). When tested on the external data, the REDIAL-2020 models correctly predicted 24 out of 39 published compounds for the CPE assay,126 15 out of 21 CPE actives from the ReFRAME library,12 and four out of the six Mpro inhibitors.127
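A voting-based consensus of the kind mentioned above can be sketched as a simple majority rule. The sketch assumes binary active/inactive votes from the individual models and treats ties as inactive; both choices are assumptions made for illustration and may differ from the rule implemented on the server. The second helper only restates the external-validation fractions quoted in the text as percentages.

```python
from typing import List


def consensus_prediction(votes: List[int]) -> int:
    """Return 1 (active) if strictly more than half of the models vote active."""
    if not votes:
        raise ValueError("at least one model vote is required")
    return int(sum(votes) > len(votes) / 2)


def fraction_recovered(n_correct: int, n_total: int) -> float:
    """Percentage of external compounds predicted correctly."""
    return 100.0 * n_correct / n_total


print(consensus_prediction([1, 1, 0]))  # 1
print(consensus_prediction([1, 0, 0]))  # 0

# Fractions quoted in the text for the external sets.
external_sets = [("CPE, literature", 24, 39), ("CPE, ReFRAME", 15, 21), ("Mpro", 4, 6)]
for name, correct, total in external_sets:
    print(f"{name}: {fraction_recovered(correct, total):.1f}%")
```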
Comparisons of a large number of anti-SARS-CoV-2 active compounds from the literature126 highlight frequent inconsistencies and discrepancies between different experimental measurements. For instance, out of 9 compounds tested in 6 published CPE assays,20,128,129 only remdesivir was active across all studies. As noted in the beginning of this review, the rush to publish initiated by the urgency of the COVID-19 pandemic has resulted in an unprecedented number of communications in peer-reviewed sources and media.130 Hence, it is particularly important to obtain an independent confirmation of anti-SARS-CoV-2 activities using alternative approaches. A recent study131 provides an example of such confirmatory evaluation of SARS-CoV-2 DAAs predicted by REDIAL-2020 with an independent ligand-based virtual screen. From an initial set of 9 “chloroquine-like” drugs, zuclopenthixol, a typical antipsychotic, and nebivolol, an antihypertensive beta-adrenergic blocker, were identified as efficient inhibitors of SARS-CoV-2 infection with EC50 values in the low micromolar range (see Table 4). The anti-SARS-CoV-2 activity of the antimalarial drug amodiaquine20,129 was also confirmed, and its metabolite, N-mono-desethyl amodiaquine, also appeared active and had a notable half-life of 21 days. Furthermore, two additional independent experimental evaluations were conducted, both of which confirmed zuclopenthixol and nebivolol as potential therapeutic agents for the treatment of incubation and early stage COVID-19 infections. The REDIAL-2020 platform123 can be accessed from any web browser. It accepts SMILES, drug names (e.g., generic or trade names), or PubChem IDs as input and generates predictions against 11 assays, together with the top compounds from the NCATS training set ranked by their chemical similarity; an applicability domain was estimated for each assay.
Compound | EC50 (μM) | EC50 (μM) | C max (μM) | % oral | t 1/2 (hours) |
---|---|---|---|---|---|
Amodiaquine | 5.4 | 0.13 | 0.13 | 29 | 7.9 |
N-Mono desethyl amodiaquine | 4 | N/A | 2.5 | N/A | 500 |
Nebivolol | 2.8 | 2.72 | 0.02 | 12 | 10 |
Zuclopenthixol | 0.015 | 1.35 | 0.03 | ∼50 | 20 |
In the early days of drug discovery for SARS-CoV-2, no experimental data were available. The initial studies were therefore based on prior data for related pathogens, and chemography helped develop a global overview of the coronaviral DAA agent landscape. For instance, Horvath et al.134 have prepared several GTMs representing previous medicinal chemistry efforts to target CoVs. All CoV-associated molecules and antiviral DrugBank135 entries were projected onto seven maps hosting over 700 predictive activity landscapes.136 The list of approved or pending drugs associated with an “antiviral” label in DrugBank annotated the maps and fixed specific residence areas corresponding to compounds under clinical evaluation against SARS-CoV-2 (see Fig. 5). This framework, presenting the density distribution of CoV DAA agents, helped to highlight structural relatedness between compounds of different categories. Thus, similarity between umifenovir and SARS-CoV Mpro-inhibiting indole esters raised a new hypothesis that umifenovir might also act on viral proteases.
Fig. 5 Pool of 1000 compounds predicted to inhibit the 3CL proteinase of the novel SARS-CoV-2 (red) mapped against the SARS-CoV (betacoronavirus) compounds (blue). Location of several “antiviral” DrugBank molecules color-coded by their approval status (not-yet approved in red) is shown. Reproduced from ref. 134 with permission from Wiley, copyright 2021.
Contemporary generative approaches usually build on deep neural networks (DNN),139 aiming to model the underlying distribution of a given set of molecules and, by sampling from the modelled distribution, construct novel chemical entities.140 Recurrent neural networks (RNNs) with long short-term memory (LSTM),141 as well as variational autoencoders,142 generative adversarial networks (GANs),143 graph neural networks (GNNs),144 and other network architectures145 have been explored. These methods are trained using algorithms that are successful for language analysis. Accordingly, for the purpose of molecular design, the training molecules are represented in terms of string notations, most often as simplified molecular input line entry systems (SMILES strings). Importantly, generative DL models automatically derive internal representations of SMILES, without relying on human-engineered molecular descriptors or reaction schemes. The generative model captures the syntax of these training molecules and generates new SMILES-encoded molecules that satisfy the constraints of the training set. This RNN-LSTM approach previously resulted in prospective discovery of novel compounds with desired bioactivities.146
As an example of generative de novo design, RNA-dependent RNA polymerase (RdRp) of SARS-CoV-2147 was targeted, aiming to obtain new potent DAA agents. An RNN-LSTM model, trained in two steps, was employed for molecule generation.141 Firstly, a generalized model (‘virtual medicinal chemist’) was developed by learning the syntax of approximately 400000 SMILES strings of known bioactive compounds.148 Secondly, the model was fine-tuned with four nucleoside analogues that were effective against SARS-CoV-2 RdRp: approved favipiravir and ribavirin; investigational galidesivir; and the active component GS-5734 of the remdesivir prodrug. These four template compounds biased the model toward nucleoside analogues. Subsequently, new SMILES were sampled by the tuned model, and the computer-generated molecules were ranked according to their topological pharmacophore similarity to the four RdRp inhibitor templates. Notably, the de novo generated structures contained several substructures of known RdRp inhibitors, but also carried novel chemical moieties, especially among the lower ranking designs (data not shown). We anticipate that these computer-generated molecules could serve as prospective templates rather than elaborated DAA designs because of limitations of the approach. For example, no background information about nucleoside interaction in RNA was considered during RNN-LSTM training. Neither target selectivity, pharmacokinetic and pharmacodynamic properties, nor the synthesizability of the designs was explicitly considered. Consequently, the suggested molecules will benefit from careful checking by human experts and other computational tools. The selected designs then have to be synthesized and tested before any claim of pharmacological activity can be made.
Nonetheless, some of the de novo generated molecules appear chemically feasible and attractive, contain innovative molecular scaffolds and deserve further consideration, illustrating the potential of generative models for rapid delivery of testable chemical designs and concepts.149
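The ranking step described above can be sketched as follows. This is a simplified illustration only: it assumes each molecule has already been reduced to a set of pharmacophore-pair features and uses the Tanimoto coefficient as the similarity measure, whereas the descriptor and metric actually used in the cited work are not specified here. All feature labels and molecule names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_template_similarity(
    designs: Dict[str, Set[str]], templates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Score each design by its best similarity to any template, best first."""
    scored = [
        (name, max(tanimoto(features, t) for t in templates.values()))
        for name, features in designs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical feature sets; labels such as "D-A:2" stand for
# "donor and acceptor separated by two bonds" and are invented for this sketch.
templates = {
    "template_1": {"D-A:2", "A-R:4", "D-D:3"},
    "template_2": {"D-A:2", "A-A:5"},
}
designs = {
    "design_1": {"D-A:2", "A-R:4", "L-L:6"},
    "design_2": {"D-A:2", "A-A:5"},
    "design_3": {"L-L:6"},
}

ranking = rank_by_template_similarity(designs, templates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Under such a scheme, lower-ranked designs share fewer features with the templates, which is consistent with the observation that novel moieties appear mainly among the lower-ranking structures.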
One of the most notable examples of a KG was developed for drug repurposing against COVID-19 by BenevolentAI. This KG integrated a vast repository of structured medical information, including numerous connections extracted from scientific literature by various ML algorithms.154 To find a drug effective against COVID-19, a custom graph was created and a subgraph relating to SARS-CoV-2 extracted to permit inspection by experts.16 This KG revealed that the virus binds the host cells via the ACE2 receptor expressed on the surface of lung AT2 alveolar epithelial cells. ACE2 is involved in clathrin-mediated endocytosis, which in turn is promoted by members of the numb-associated kinase (NAK) family, including AAK1 and GAK. Baricitinib, a drug approved for the treatment of rheumatoid arthritis, was identified as a NAK inhibitor with sufficient plasma concentration to inhibit AAK1. It was therefore submitted for clinical testing.16 Furthermore, baricitinib is a JAK–STAT signaling inhibitor and was predicted to be effective against the elevated levels of cytokines (cytokine storm) observed in people with COVID-19. It was also predicted to have a tolerable side effect profile and low risk of interactions with other drugs based on the KG.154
These predictions were verified in vitro: baricitinib inhibited signaling of cytokines implicated in COVID-19 infection, showed high affinity for several members of the NAK family, and reduced viral infectivity in human primary liver spheroids.16 Initial clinical data have shown that baricitinib treatment was associated with clinical and radiologic signs of recovery, and a rapid decline in viral load and inflammatory markers in patients with bilateral COVID-19 pneumonia. A randomized clinical trial, ACTT-II, was initiated by Eli Lilly and NIAID to study the effectiveness of baricitinib for serious COVID-19 infections and resulted in the drug's authorization for emergency use in combination with remdesivir.155
Taking advantage of publicly available information, a network of universities and biotechnology companies in China have created a KG for target-drug interactions, protein–protein interactions, drug molecular similarities, and protein sequence similarities. The KG was queried, using a network-based knowledge mining algorithm, for suitable drugs. These were identified as hit candidates if another NLP relation extraction model found a bag of sentences from the PubMed abstracts corpus describing a relation between the drug and a target in the coronavirus of interest. This method identified a PARP1 inhibitor, CVL218, which subsequently exhibited effective inhibitory activity against viral replication with no apparent signs of toxicity in rats and monkeys. It also possessed anti-inflammatory effects.
Researchers from Amazon Web Services (AWS) and a network of organizations in China and the USA have created a KG with 15 million edges (interactions) across 39 types of relationships connecting drugs, diseases, genes, pathways, and expressions, from a large scientific corpus of 24 million PubMed publications, the GNBR data set, and the DrugBank database.156 The RotatE algorithm was used to generate a low dimensional embedding of the KG that suggested 41 drug candidates for repurposing. These were supported by a high score in the treatment space, their proximity in the low dimensional embedding, and gene-set enrichment analysis from transcriptomic and proteomic data. AWS has also generated a similar biological knowledge graph, called DRKG, to fight COVID-19. It included information from six databases (DrugBank, Hetionet, GNBR, String, IntAct and DGIdb), and data collected from recent publications particularly related to COVID-19, containing nearly 6 million edges between 100 thousand entities of 13 entity types.157
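For readers unfamiliar with RotatE, the core idea is that each relation acts as an element-wise rotation in the complex plane, so that a plausible triple (head, relation, tail) satisfies tail ≈ head ∘ relation. The minimal sketch below computes the corresponding distance for toy two-dimensional embeddings; the embedding values are invented for illustration, and the training procedure and hyperparameters of the cited study are not reproduced.

```python
import cmath
import math
from typing import Sequence


def rotate_distance(
    head: Sequence[complex], phases: Sequence[float], tail: Sequence[complex]
) -> float:
    """RotatE-style distance: rotate each head component by the relation
    phase and sum the complex moduli of the differences to the tail.
    Smaller distances indicate more plausible triples."""
    total = 0.0
    for h, theta, t in zip(head, phases, tail):
        rotated = h * cmath.exp(1j * theta)  # unit-modulus rotation
        total += abs(rotated - t)
    return total


# Toy embeddings (hypothetical values, chosen so the arithmetic is easy to check).
drug = [1 + 0j, 0 + 1j]
treats = [math.pi / 2, math.pi]          # relation expressed as phase angles
disease_match = [0 + 1j, 0 - 1j]         # exactly the rotated head
disease_other = [1 + 0j, 0 + 1j]         # identical to the unrotated head

print(rotate_distance(drug, treats, disease_match))  # close to zero
print(rotate_distance(drug, treats, disease_other))  # clearly larger
```

Candidate drugs for a disease can then be ranked by this distance, which is one way to read the "proximity in the low dimensional embedding" mentioned above.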
Other open source COVID-19 KGs include the extension of ROBOKOP to COVID-KOP by researchers at the University of North Carolina,158 and KG-COVID-19 by investigators at Berkeley, California.159 The ROBOKOP biomedical KG was enriched with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. Sentence-by-sentence co-occurrence analysis added 800000 new edges to the COVID-KOP graph, and co-occurrence counts at the paper level led to 4.5 million new edges. Gene ontology data for viral proteins and symptom data were also added to the KG. The authors demonstrated the utility of the new KG by retrieving the pathway serving as a rationale for the linagliptin clinical trial against COVID-19 and suggesting new inferences.158 KG-COVID-19, in turn, was created by incorporating the latest data extracted from several biomedical databases and literature, including drug, protein–protein interaction, SARS-CoV-2 gene annotation, concept, and publication data from the CORD-19 data set in an ontology-aware way. It contains about 16 million edges between nearly 300 thousand entities. The KG can be queried using SPARQL and the authors provide example queries to ease entry.
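The co-occurrence analysis used to enrich COVID-KOP can be illustrated with a short sketch. It assumes that each text unit (a sentence or a whole paper) has already been annotated with the entities it mentions; the annotation step, the entity names, and the function name are hypothetical and do not reflect the actual COVID-KOP pipeline.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_edges(annotated_units: Iterable[List[str]]) -> Counter:
    """Count how often each unordered entity pair is mentioned together.

    Each element of `annotated_units` lists the entities found in one
    text unit; passing sentences gives sentence-level edges, passing
    whole papers gives paper-level edges."""
    counts: Counter = Counter()
    for entities in annotated_units:
        # Deduplicate and sort so (a, b) and (b, a) map to the same edge.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Toy annotations invented for this sketch.
sentences = [
    ["ACE2", "SARS-CoV-2"],
    ["ACE2", "SARS-CoV-2", "TMPRSS2"],
    ["linagliptin", "DPP4"],
]

edges = cooccurrence_edges(sentences)
for (a, b), n in edges.most_common():
    print(f"{a} -- {b}: {n}")
```

Because a paper contains many sentences, paper-level counting links far more entity pairs than sentence-level counting, which is consistent with the larger number of edges reported for the paper-level analysis.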
Another recent example of relevant KG construction is provided by Neo4COVID-19,160 a knowledge mining workflow inspired by SmartGraph161 and Hetionet,162 which served to assemble a Neo4j network with essential ingredients such as virus-host protein–protein interactions (VHPPIs), human protein–protein interactions (hPPIs), and drug-target interactions (DTIs). Its purpose is to better evaluate network-pharmacology-driven hypotheses and accelerate anti-SARS-CoV-2 drug repositioning. VHPPI sources included two proteomic studies,14,127 the SARS-CoV-2 subset from the viral-human interactions atlas163 and a genome-wide CRISPR screen for host genes related to SARS-CoV-2 infection.164 To streamline these non-overlapping VHPPIs with hPPIs,14,127,164 the authors used a KG-based machine learning step (described elsewhere in the context of autophagy),165 using the “positive” (known) interactions against true negatives (from the above experiments) in the context of data aggregated from 17 distinct machine-learning ready sets from TCRD/Pharos.166 For the pharmacology component of the network, DTIs were extracted from the DrugCentral database.167 DrugCentral currently includes 4642 drugs, of which 2549 have regulatory approval dates. DrugCentral DTI annotations include 19959 human DTIs and 2570 non-human DTIs; of these 2752 are mode-of-action DTIs.167
In summary, recent efforts in knowledge graph construction and data mining illustrate the immense amount of research already performed on COVID-19, and the utility of KG approaches for drug repurposing is illustrated by the stellar example of baricitinib.
Open-source implementations are also available for anyone wanting to extend this work. It is important to note, however, that proper clinical validation of suggested candidates will require strong collaborations between academic, industrial, and government partners, and will take much longer than a KG query. It is a testament to the urgency of the pandemic that such a huge amount of data has been released to the community, and a vast array of AI and ML approaches has been brought to bear on the challenge of discovering effective treatments for COVID-19.
A detailed description of such a study design is outlined in Fig. 6, where the initial step corresponds to the combined use of text mining (using Chemotext),169 knowledge mining (using ROBOKOP/COVID-KOP knowledge graphs),158,170 and machine learning (QSAR)171 tools to identify existing drugs with possible activities against SARS-CoV-2.172 Based on the initial findings, 76 individual drug candidates were identified as components of possible combinations.
Fig. 6 Study design for identifying drug combinations. Reproduced from ref. 173 with permission from Cell Press, copyright 2021.
These drugs can generate 2850 unique binary combinations; to increase the probability of synergy, pairs of drugs with different mechanisms of action and/or targeting the virus at different lifecycle stages174 were prioritized. Consequently, 281 binary combinations of 38 drugs, and 95 ternary combinations of 15 drugs were chosen for further consideration. The in silico pipeline incorporating Chemotext169 along with the recently developed COVID-KOP,158 and QSAR models of major drug-drug interactions175 was then used to determine whether selected compounds had been previously tested together and whether negative drug-drug interactions could be anticipated. The resulting prioritized list included 32 drugs and their 73 selected binary combinations for testing in vitro against SARS-CoV-2.173
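The size of the combination space at each stage follows from simple counting of unordered selections, as the short check below illustrates. The drug counts are those quoted in the text; the variable names are ours.

```python
from math import comb

# Unordered pairs that can be formed from the 76 initial candidates.
pairs_from_76 = comb(76, 2)      # 76 * 75 / 2 = 2850

# Upper bounds for the later, mechanism-filtered stages.
pairs_from_38 = comb(38, 2)      # 703 possible pairs among 38 drugs
triples_from_15 = comb(15, 3)    # 455 possible triples among 15 drugs
pairs_from_32 = comb(32, 2)      # 496 possible pairs among 32 drugs

print(pairs_from_76)
print(f"binary stage:  281 of {pairs_from_38} possible pairs kept")
print(f"ternary stage: 95 of {triples_from_15} possible triples kept")
print(f"final list:    73 of {pairs_from_32} possible pairs tested")
```

The figure of 2850 thus corresponds to all pairwise combinations of the 76 candidates, and each subsequent filter retains only a fraction of the combinations that the surviving drugs could in principle form.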
Selected combinations were then experimentally screened in a 6 × 6 dose matrix format, involving two biological batches (cell and SARS-CoV-2 virus) and two assays (cytopathic effect and cytotoxicity against Vero-E6 cells) across 42 384-well plates, including replicates. Each batch was assessed with five known DAAs used as positive controls. The batch readouts were highly reproducible, emphasizing the importance of using a dose matrix, instead of a single dose combination, to enhance the confidence of synergism/antagonism findings. The highest single agent (HSA) synergy model was subsequently applied to the screening outcomes and revealed that among the 73 binary combinations of 32 compounds, there were 16 synergistic and 8 antagonistic pairs, with 4 displaying both synergistic and antagonistic interactions at different concentrations.176
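The HSA model compares the effect of a combination at each dose pair with the larger of the two single-agent effects at the same doses; a positive excess suggests synergy and a negative excess suggests antagonism. The sketch below applies this to a toy 3 × 3 matrix (the study used 6 × 6). The effect values, the percent-rescue scale, and the ±10 threshold are assumptions made purely for illustration and are not the criteria used in the cited work.

```python
from typing import List


def hsa_excess(
    single_a: List[float], single_b: List[float], combo: List[List[float]]
) -> List[List[float]]:
    """Excess over the highest single agent for every dose pair.

    single_a[i] is the effect of drug A alone at dose i, single_b[j] the
    effect of drug B alone at dose j, combo[i][j] the combined effect."""
    return [
        [combo[i][j] - max(single_a[i], single_b[j]) for j in range(len(single_b))]
        for i in range(len(single_a))
    ]


def classify(excess: List[List[float]], threshold: float = 10.0) -> str:
    """Label a dose matrix; the threshold is a hypothetical choice."""
    values = [v for row in excess for v in row]
    synergy = any(v > threshold for v in values)
    antagonism = any(v < -threshold for v in values)
    if synergy and antagonism:
        return "mixed"
    if synergy:
        return "synergistic"
    if antagonism:
        return "antagonistic"
    return "additive or inconclusive"


# Toy data: percent rescue of CPE (invented numbers).
single_a = [10.0, 30.0, 50.0]
single_b = [5.0, 20.0, 40.0]
combo = [
    [12.0, 25.0, 45.0],
    [35.0, 60.0, 80.0],
    [55.0, 85.0, 100.0],
]

excess = hsa_excess(single_a, single_b, combo)
print(excess)
print(classify(excess))
```

Evaluating the whole matrix rather than one dose pair is what allows a combination to be flagged as "mixed", that is, synergistic at some concentrations and antagonistic at others.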
Notably, these results demonstrated a strong antagonistic effect between remdesivir and the antimalarial drugs hydroxychloroquine, mefloquine, and amodiaquine (Fig. 7). Remarkably, the most striking antagonism was observed in the combination of the only two drugs approved with FDA Emergency Use Authorization (EUA) to treat COVID-19: hydroxychloroquine and remdesivir (the EUA for hydroxychloroquine has since been withdrawn by the FDA).177
Fig. 7 Activity and synergy/antagonism matrices for selected drug combinations (A: Remdesivir + Hydroxychloroquine; B: Remdesivir + Amodiaquine; C: Nitazoxanide + Remdesivir; D: Nitazoxanide + Amodiaquine). Reproduced from ref. 173 with permission from Cell Press, copyright 2021.
Among the 16 synergistic combinations identified, a significant enrichment for nitazoxanide (an FDA-approved broad-spectrum antiviral and antiparasitic drug) was also observed. The three most synergistic combinations were: nitazoxanide with remdesivir; nitazoxanide with umifenovir; and nitazoxanide with amodiaquine. A complete rescue of CPE was observed when 0.6–5 μM of nitazoxanide was combined with remdesivir, umifenovir, or amodiaquine, while any of these drugs alone achieved only 40–60% rescue. It is important to note that amodiaquine, one of the 32 drugs identified as described above and found active in the CPE assay,172 was subsequently found to have antiviral activity against SARS-CoV-2 in vitro131 and in vivo.178
These findings demonstrate the importance of preclinical research on antiviral drug combinations, as well as the utility of data and text mining approaches to explore modes of action (MoA) underlying synergism/antagonism in the context of COVID-19. These results also signal that the paucity of preclinical studies on drug combinations, prior to their use in patients, may significantly increase risks of undesirable side effects and poor outcomes. Furthermore, the developed matrix screening platform173 represents an efficient, data-driven means for prioritizing synergistic combinations of COVID-19 therapies and flagging undesirable drug interactions. All the results were made publicly available via the NCATS Open Data Platform.125
Consensus approaches and/or post-docking processing have been used to a greater or lesser extent in the majority of VS campaigns on SARS-CoV-2 targets reported to date. As we have mentioned, the vast majority of these studies have not provided experimental validation for predictions.107,185–187 Only a few reported VS campaigns on Mpro resulted in identification of confirmed hits.28,52,188 While these represent rare cases of experimentally validated inhibitors of SARS-CoV-2 targets, the levels of activity achieved were not sufficient for direct therapeutic use. As we have described above, such hits may conceivably be improved by conventional medicinal chemistry-driven hit-to-lead optimization; however, any NCEs that arise would have to follow the lengthy drug development pipeline. It is likely, therefore, that the flexible and chemically active enzymatic site of Mpro requires the use of more diverse and accurate CADD tools and more stringent and sophisticated consensus protocols.
To address the question of whether more rigorous scoring schemes could lead to more accurate VS performance, various consensus docking approaches were investigated using four major programs: Autodock-GPU,189 FRED,190 GLIDE,191 and ICM.192 They were applied to the consensus protocol in sequential order (noting the decrease in the respective program's efficiency). A closely related SARS-CoV main protease (PDB: 4MDS193) was employed, for which many validated, diverse non-covalent inhibitors have been reported.194 From the literature, 81 such non-redundant inhibitors were identified,194 and for each, up to 50 molecular decoys were generated using the Directory of Useful Decoys–Enhanced (DUD-E) server.195 The resulting test set included 81 active and about 4000 inactive molecules, corresponding to a rather optimistic 2% background (random) hit rate.
For all poses generated by different docking programs for the same ligand, pairwise RMSD values were calculated. Molecules that were docked by different programs with an RMSD < 2 Å were then considered to have been predicted by the consensus. The generated docking scores were subsequently ranked by the last docking protocol used. The performance of this consensus approach was evaluated by the enrichment factor (EF).196 Other common scoring criteria used were the receiver operating characteristic (ROC) curve and the area under the curve (AUC)120 metrics that illustrate the general quality of the ranking schemes. The resulting EF and ROC metrics estimated for the four VS strategies are presented in Fig. 8.
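The pose-agreement criterion can be sketched as follows. The example assumes that all poses are expressed in the same receptor coordinate frame with identical atom ordering, so that no superposition is needed, and it treats a ligand as a consensus prediction only if every pair of programs agrees within the cutoff; both points are simplifying assumptions, as are the program labels and coordinates. Symmetry-equivalent atoms are ignored.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Pose = List[Tuple[float, float, float]]


def pose_rmsd(pose_a: Pose, pose_b: Pose) -> float:
    """Heavy-atom RMSD between two poses in the same coordinate frame."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same atoms in the same order")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def is_consensus(poses_by_program: Dict[str, Pose], cutoff: float = 2.0) -> bool:
    """True if every pair of programs places the ligand within `cutoff` angstroms."""
    return all(
        pose_rmsd(a, b) < cutoff
        for a, b in combinations(poses_by_program.values(), 2)
    )


# Toy two-atom "ligand" with invented coordinates (angstroms).
agreeing = {
    "program_1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "program_2": [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
}
disagreeing = {
    "program_1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "program_2": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
}

print(is_consensus(agreeing))
print(is_consensus(disagreeing))
```

Ligands that pass this filter are then ranked by the score of the last program in the sequence, as described above.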
These results demonstrate that consensus prediction by two or more docking programs results in significantly better ROC statistics (with improvements in both initial slope and AUC values). When all four docking programs were used, the AUC value was as high as 0.96 using the ICM scoring function, indicating a strong capability to distinguish between active and inactive compounds in the test set (Fig. 8). Similarly, EF values consistently increased with the number of programs combined in the consensus strategy, clearly indicating that consensus discarded decoys at a significantly higher rate than active molecules.
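For reference, both metrics can be computed from a ranked list of active/decoy labels. The sketch below uses one commonly used definition of the enrichment factor (the hit rate in the top fraction of the ranking divided by the hit rate in the whole set) and the pairwise-ranking interpretation of the ROC AUC; the exact definitions applied in the cited analysis may differ, and the ten-compound ranking is invented for illustration.

```python
from typing import List


def enrichment_factor(ranked_labels: List[int], fraction: float) -> float:
    """EF at a given top fraction; labels are 1 (active) or 0 (decoy),
    ordered from best-ranked to worst-ranked."""
    n_total = len(ranked_labels)
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_labels[:n_top])
    hits_total = sum(ranked_labels)
    return (hits_top / n_top) / (hits_total / n_total)


def roc_auc(ranked_labels: List[int]) -> float:
    """Probability that a randomly chosen active is ranked above a
    randomly chosen decoy (assumes a strict ranking without ties)."""
    n_actives = sum(ranked_labels)
    n_decoys = len(ranked_labels) - n_actives
    actives_seen = 0
    correctly_ordered_pairs = 0
    for label in ranked_labels:
        if label == 1:
            actives_seen += 1
        else:
            # Every active already seen outranks this decoy.
            correctly_ordered_pairs += actives_seen
    return correctly_ordered_pairs / (n_actives * n_decoys)


# Toy ranking: 3 actives among 10 compounds, best-ranked first.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(ranked, 0.2)
auc = roc_auc(ranked)
print(f"EF(20%) = {ef_20:.2f}")
print(f"AUC     = {auc:.3f}")
```

With a background hit rate of about 2%, as in the test set described above, a random ranking gives an EF near 1 and an AUC near 0.5, so values well above these baselines indicate that decoys are being discarded preferentially.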
In another conceptually similar study by Ghahremanpour et al.,188 consensus docking approaches also led to notable success. The authors employed Glide, AutoDock Vina, and two protocols with AutoDock 4.2 for concurrent virtual screening of ∼2000 existing drugs against the Mpro active site to arrive at 42 top-scoring consensus hit compounds. Then, taking into account intermolecular contacts, conformation, stability in molecular dynamics (MD) simulations, and potential for synthetic modification, 17 compounds were selected for purchasing. Remarkably, 14 out of these 17 tested compounds were found to be micromolar inhibitors of Mpro with IC50 values of 5–10 μM. This investigation suggests that rigorous approaches to molecular docking and consensus hit selection afford very high experimental hit rates. While compounds demonstrating micromolar activities in vitro are unlikely to be potent enough to be stand-alone drug candidates, these compounds were expected to be very useful for conventional hit-to-lead medicinal chemistry optimization. Indeed, in a recent exciting sequel to the aforementioned study,188 using the Free Energy Perturbation (FEP) approach, Zhang et al. redesigned the weak hit perampanel to yield multiple noncovalent, nonpeptidic inhibitors with ca. 20 nM IC50 values in a kinetic assay.
In summary, the studies described in this section demonstrate that systematic reduction of a molecular database through consensus VS could represent a rational strategy to find elusive, potent noncovalent SARS-CoV-2 Mpro inhibitors. They also show the importance of rigor in evaluating computational hits and the power of the experimental confirmation of hits selected by computational protocols to increase the impact and recognition of CADD methods. We additionally reflect on the importance of rigorous execution of both molecular simulations and confirmatory experimental bioactivity testing in the next sections of this review.
Experienced CADD users know that very flexible ligands suffer from entropic penalties that can affect their binding affinities. Notably, some important candidate hits that have emerged from virtual screening against the SARS-CoV-2 Mpro and RdRp are very flexible, with large numbers of rotatable bonds that make significant conformational entropy contributions to the ligand binding free energies. Studies that combine docking calculations with MD simulations of the best scoring hits often employ the Poisson–Boltzmann or Generalized Born and surface area continuum solvation (MM/PBSA and MM/GBSA) methods to estimate the free energy of the binding of small ligands to their targets. These popular methods are intermediate in accuracy and computational effort between empirical docking scores and strict alchemical perturbation methods. While they do a reasonable job of accounting for the entropic contributions of solvent, they ignore or approximate the conformational entropy of ligands due to the high computational cost of normal mode analysis.198 For example, Alamri et al. reported the results of a combined AutoDock Vina and MM/GBSA study of the binding of libraries of covalent inhibitors and antiviral compounds against the SARS-CoV-2 Mpro.199
Researchers using MD simulations of molecules with the most favorable docking scores to calculate absolute binding energies need to be aware of the approximations inherent in the popular MM/PBSA and MM/GBSA methods and in the use of thermodynamic cycle methods with insufficient conformational sampling. There are several recent developments that allow ligand entropies to be accounted for in more computationally efficient ways.200 We expect that the use of such corrections will improve the accuracy of docking calculations as applied to SARS-CoV-2 targets, whereas approaches considered in the following sections will help improve their computational efficiency.
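For orientation, the end-point estimate used by these methods is commonly written in the following generic form. This is the standard textbook decomposition and is not taken from any of the studies cited above; the angle brackets denote averages over MD snapshots.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle \Delta E_{\mathrm{MM}} \rangle
  + \langle \Delta G_{\mathrm{solv}} \rangle
  - T \langle \Delta S \rangle ,
\qquad
\Delta G_{\mathrm{solv}} =
  \Delta G_{\mathrm{polar}}^{\mathrm{PB/GB}}
  + \Delta G_{\mathrm{nonpolar}}^{\mathrm{SA}}
```

The first two terms are comparatively inexpensive to evaluate, whereas the $-T\langle \Delta S \rangle$ term requires normal mode analysis or a similar procedure. It is this term that is frequently omitted or approximated, and for ligands with many rotatable bonds its neglect can bias the predicted affinities.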
Different types of assays can assess activities in different ways and can be used orthogonally to increase the confidence in hits. For example, Hanson et al.201 developed a proximity-based AlphaLISA assay to measure binding of SARS-CoV-2 spike RBD protein to the ACE2 receptor that can be used to find small molecules disrupting this critical interaction. These researchers screened 3384 drugs and pre-clinical candidates and identified 25 hits with IC50 values ranging from 0.1 to 29 μM.
Identification of false positives during any HTS campaign is similarly crucial. There are many assay components that can cause non-specific compound interference, such as readout type, signal generation or detection, platform automation, assay conditions, etc. To eliminate such potential false positives, Hanson et al. used the AlphaLISA TruHits kit as a counter-screen. This kit identifies inner filters, light scatterers (insoluble compounds), singlet oxygen quenchers and biotin mimetics interfering with the assay signal, thus eliminating false positives and helping to improve HTS outcomes.
Several cell-based live virus assays have been developed for SARS-CoV-2.125 One measures the ability of compounds to reverse the virus-induced cytopathic effect (CPE) in infected Vero E6 host cells. The CPE reduction assay202 indirectly detects the ability of a compound of interest to inhibit viral replication and/or infection through mechanisms such as direct inhibition of viral entry, suppression of enzymatic processes, and action on host pathways that modulate viral replication. The CPE reduction assay was used in many studies of individual DAA agents or drug combinations that have been discussed in the previous sections.
To summarize, a significant number of cell-based and biochemical assays have been developed to aid drug discovery for SARS-CoV-2.203 Sharing and dissemination of such assays, along with screening results and successful CADD protocols, is in high demand. To address this need, the NCATS has developed an open science data portal125 offering real-time results of various SARS-CoV-2 screening campaigns. This online resource contains readouts for more than 10000 compounds, evaluated over full dose–response ranges when possible. This portal stimulates multi-faceted collaboration between groups from different fields and represents the best practice scenario for drug discovery research against COVID-19 as described in the final section of this review.
Conspicuously, the COVID-19 crisis has done much to stimulate collaboration and greater openness in science,204 driven by the assumption that openness accelerates research. Sharing data and ideas with minimal restrictions allows all parties to capitalize on new knowledge more quickly and effectively, avoiding unnecessary duplication. Our willingness and efforts to bypass traditional scientific restrictions (e.g., the need to patent, secure research funding, or boost our academic profile) are encapsulated in three initiatives: Open access, Open data and Open source.
The Diamond Light Source has also placed many of its structures, and associated fragment screening results, into the public domain,206 preparing the ground for the community-based Moonshot initiative.
A generalized consensus on Open science has been agreed on by a number of global and national coalitions,217 research, business and regulatory consortia218 and progress-tracking initiatives committed to supporting open research, and collectively battling the deadly pandemic.
To illustrate this assertion, selected examples of de novo designed chemical compounds, drugs, or drug combinations discovered or repurposed using computational approaches are shown in Table 5. Compounds MLS000699212-03 and NCGC00100647 were discovered using a biological activity-based modeling approach, in which compound activity profiles established across multiple assays were used as signatures to predict compound activity in other assays or against a new target.219 Although the idea of using activity data in modeling is not new,220 the authors validated its utility by achieving a ∼30% success rate in the discovery of novel antivirals. Consensus docking techniques resulted in the repurposing of nafamostat and camostat,221 manidipine and boceprevir,222 and the subsequent discovery of two antioxidant polyhydroxy-1,3,4-oxadiazole compounds, CoViTris2020 and ChloViD2020, with high activities in vitro,222 and of several perampanel analogs223 (identified using the free-energy perturbation method) as novel, non-covalent Mpro inhibitors with 20 nM–5 μM IC50 values in a kinetic assay for Mpro. Several Deep Docking52 and QSAR224 hits were selected and confirmed experimentally by independent research groups, which led to the discovery of several potent Mpro inhibitors122 and the repurposing of cenicriviroc and two other drugs, among others.125 In the case of drug combinations, the AI-derived hypothesis of baricitinib as a potential treatment for COVID-1916 resulted in an Emergency Use Authorization (EUA) granted by the U.S. Food and Drug Administration (FDA) for its combination with remdesivir.155 Sixteen synergistic and eight antagonistic drug combinations, most notably nitazoxanide–umifenovir for synergy and remdesivir–(hydroxy)chloroquine for antagonism, were identified using knowledge mining approaches and QSAR and then confirmed experimentally.
Importantly, amodiaquine, identified as a potential anti-COVID-19 repurposing candidate by knowledge-mining approaches,172 was confirmed to have experimental antiviral activity in CPE172 and titer reduction131 assays as well as in animal studies.178 Given its half-life of 3 weeks, amodiaquine could be a great solution, particularly for countries lacking access to remdesivir, favipiravir and other antivirals.
Finally, Open Science and data sharing can go far in helping computational modelers discover new therapies, and computational scientists must routinely seek experimental validation of their “digital dreams” before promoting computational results. Adhering to rigorous practices of modern research may dramatically reduce the number of publications but also dramatically improve the number of computer-assisted, experimentally validated potent antivirals discovered. We hope this collective contribution will be useful for data modelling and experimental researchers wishing to expand their toolkits to include rigorous computational approaches in their efforts to combat current and future pandemics.
This journal is © The Royal Society of Chemistry 2021 |