Whole-genome sequence analysis and homology modelling of the main protease and non-structural protein 3 of SARS-CoV-2 reveal an aza-peptide and a lead inhibitor with possible antiviral properties

Arun K. Shanker *a, Divya Bhanu ab, Anjani Alluri c and Samriddhi Gupta d
aICAR – Central Research Institute for Dryland Agriculture, Santoshnagar, Hyderabad – 500059, India. E-mail: arunshank@gmail.com
bCentre for Plant Molecular Biology, Osmania University, Hyderabad, India
cAdvanced Post Graduate Centre, Acharya N. G. Ranga Agricultural University, Guntur, India
dDepartment of Biochemistry, School of Life Sciences, University of Hyderabad, India

Received 26th February 2020, Accepted 6th May 2020

First published on 8th May 2020


Viruses belonging to the family Coronaviridae consist of virulent pathogens that have a zoonotic property. Severe acute respiratory syndrome coronaviruses (SARS-CoVs) and Middle East respiratory syndrome coronaviruses (MERS-CoVs) of this family have emerged before and SARS-CoV-2 has emerged now globally. The characterization of spike glycoproteins, polyproteins and other viral proteins from viruses is important for antiviral drug development. Homology modelling of these proteins with known templates offers the opportunity to discover ligand-binding sites and explore the possible antiviral properties of these protein–ligand complexes. In this study, we performed a complete bioinformatic analysis, sequence alignment, comparison of multiple sequences and modelling of the SARS-CoV-2 whole-genome sequences, the spike protein and the polyproteins for homology with known proteins. We also analysed binding sites in these models for possible binding with ligands that exhibit antiviral properties. Our results indicated that the sequence of the polyprotein isolate SARS-CoV-2_HKU-SZ-001_2020 showed 98.94 percent identity to SARS-coronavirus NSP12 bound to NSP7 and NSP8 co-factors. The results also indicated that a part of the viral genome (residues 3268–3573 in Frame 2 with 306 amino acids) of the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3) when modelled with template 2a5i of the PDB database showed 96 percent identity to a 3C-like peptidase of SARS-CoVs, which has the ability to bind with an aza-peptide epoxide (APE) known for the irreversible inhibition of SARS-CoV main peptidase. A docking profile with 9 different conformations of the ligand with the protein model using Autodock Vina showed an affinity of −7.1 kcal mol−1. This region was conserved in 831 genomes of SARS-CoV-2. 
The part of the genome (residues 1568–1882 in Frame 2 with 315 amino acids) when modelled with template 3e9s of the PDB database showed 82 percent identity to a papain-like protease/deubiquitinase, which when complexed with ligand GRL0617 acts as an inhibitor and can block SARS-CoV replication. A docking profile with 9 different conformations of the ligand with the protein model using Autodock Vina showed an affinity of −7.9 kcal mol−1. This region was conserved in 831 genomes of SARS-CoV-2. It is possible that these ligands can be used as antivirals of SARS-CoV-2.


1. Introduction

The human coronavirus that causes severe acute respiratory syndrome (SARS-CoV) emerged in 2003, another coronavirus, the Middle East respiratory syndrome coronavirus (MERS-CoV), emerged in 2012, and SARS-CoV-2 emerged in China in 2019 and had assumed proportions of global concern by 2020. Viruses belonging to the family Coronaviridae mainly consist of virulent pathogens that have a zoonotic property, and this large family of coronaviruses is known to circulate in animals including camels, cats and bats. It has been seen in the past that SARS-CoV and MERS-CoV, which belong to this family, can be transmitted from animals to humans and can cause respiratory diseases. Human-to-human transmission of these viruses has been a concern, and therefore the search for antiviral compounds and the development of vaccines for this family of viruses have become the need of the hour.

SARS was first reported in 2002 in the Guangdong province of China; it later spread globally and affected about 8096 people.1,2 In 2012, a novel betacoronavirus associated with severe respiratory disease in humans, designated Middle East respiratory syndrome coronavirus (MERS-CoV), emerged in the Arabian Peninsula.3

The World Health Organization (WHO), China Country Office, was informed of cases of pneumonia of unknown aetiology in Wuhan City, Hubei Province, on 31 December 2019.4 A novel coronavirus, now officially designated SARS-CoV-2, was announced as the causative agent by Chinese authorities on 7 January 2020. As of 24 April 2020, the World Health Organization reported 2 544 792 confirmed cases globally.5

Coronaviruses are positive-sense RNA viruses with a large genome of 27–32 kilobases. They are of ancient origin and have non-segmented genomes with two large overlapping open reading frames (ORF1a and ORF1b) that are translated into polyproteins. These polyproteins are processed into 16 non-structural proteins (NSPs), and the remaining portion of the genome encodes the structural proteins: spike (S), envelope (E), membrane (M) and nucleoprotein (N).6

It is also known that various CoVs can undergo recombination in their genomes after infecting host cells.7 This recombination can be a significant factor for their evolution to novel types, which may have new animals as their intermediate hosts. These factors endow the CoVs with high adaptive ability and the capability to jump across species and have a relatively large host range.

The characterization of spike glycoproteins from these viruses is important for vaccine development. Designing candidates in silico, such as epitopes and peptide vaccines based on polyproteins and the spike protein, can hasten vaccine development for infectious viruses. Targeting the spike (S) protein, polyproteins and other viral proteins of SARS-CoV-2 is therefore an important approach to developing vaccines and therapeutics against infection. In SARS-CoVs, the spike protein mediates the binding of the virus to its receptor and promotes fusion between the viral and host cell membranes and entry of the virus into the host cell; hence, peptides, antibodies, organic compounds and short interfering RNAs that interact with the spike protein can have a potential role in vaccine development.8

There are multiple domain functions that are active in the replication of the coronavirus, and these domains are present in a protein designated non-structural protein 3 (nsp3), which is the largest protein in the coronavirus genome.9 The 3C-like protease (3CLpro) and papain-like protease (PLpro) enzymes are involved in the processing of viral polyproteins from the genomic RNA to individual protein components that are required structurally or non-structurally for the replication and packaging of new-generation viruses.10

The main protease of the SARS virus is the key enzyme for processing the viral polyproteins. It has been the main target for antivirals against SARS-CoVs in the past, and we hypothesize that it has high homology with the main protease of SARS-CoV-2, so that the same protein can be a target for antivirals in this virus as well. It is known that viral replication can be blocked by inhibiting this protein.11 The absence of this protein in humans makes it an even more attractive antiviral target, as inhibitors directed at it are less likely to be cytotoxic to humans.

SARS-CoV 3CLpro is essential for the replication of the virus, and hence, as potential drug targets, various efficacious anti-SARS-CoV 3CLpro compounds have been discovered from sources including laboratory synthetic methods, natural products and virtual screening.12 Peptidic and small-molecule-based inhibitors have been used as potential drugs against SARS-CoV 3CLpro.13 Similarly, molecular modelling of SARS-CoV PLpro has shown that it has deubiquitinating activities. The deubiquitination function can provide a framework for the development of antivirals to treat SARS.14,15

We hypothesised that some proteins encoded in ORF1ab of SARS-CoV-2 could have homology with the non-structural protein 3 (nsp3) of SARS-CoVs, and that these proteins could have binding sites for known ligands with antiviral properties. In this study, we performed a complete bioinformatic analysis, sequence alignment and comparison of multiple sequences of the SARS-CoV-2 whole-genome sequences, the spike protein and the polyproteins for homology with known proteins, and also analysed the binding sites for possible antiviral drug targets.

2. Materials and methods

Six complete viral genome sequences, seven polyproteins (RdRp region) and seven glycoproteins available on the NCBI portal on 4 February 2020 were taken for analysis. The sequence details and GenBank accession numbers are listed in Table S1 (ESI). Amongst the seven polyproteins, five are of Wuhan pneumonia virus isolate SARS-CoV-2 and two are of Wuhan pneumonia virus isolate SI200040-SP. The seven glycoproteins are of the same isolate, Wuhan pneumonia virus isolate SARS-CoV-2. For further analysis, 831 whole-genome sequences of SARS-CoV-2 were taken from NCBI as of 21 April 2020.

The available polyproteins (RdRp region) and glycoproteins were retrieved from GenBank, NCBI.16 These sequences were translated to amino acid sequences by sorted six frame translation using Bioedit.17 Multiple sequence alignment of the translated protein sequences was performed, and a phylogenetic tree was constructed using Mega-X.18 The alignment shows that amongst the seven polyproteins, five were identical being from the same isolate and two other polyproteins of the other isolate were identical. Similar analyses of the seven glycoproteins were performed, and all the seven glycoprotein sequences were found to be identical. Therefore, further analysis was carried out for three sequences.

1. MN938385.1 SARS-CoV-2 virus isolate SARS-CoV-2_HKU-SZ-001_2020 ORF1ab polyprotein, RdRp region, (orf1ab) gene, partial cds: 0 to 284: Frame 3 95 aa

2. MN970003.1 SARS-CoV-2 virus isolate SI200040-SP orf1ab polyprotein, RdRP region, (orf1ab) gene, partial cds: 2 to 289: Frame 2 96 aa

3. MN938387.1 SARS-CoV-2 virus isolate SARS-CoV-2_HKU-SZ-001_2020 surface glycoprotein (S) gene, partial cds: 1 to 105: Frame 1 35 aa
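The six-frame translation used to obtain these peptide sequences can be illustrated with a minimal standard-library sketch. It is not the BioEdit implementation: the function names are hypothetical, the standard genetic code is assumed, and the reverse-strand frames are labelled −1 to −3 here purely for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon from the first base; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Map frame label (1, 2, 3, -1, -2, -3) to the translated protein string."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])      # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])    # reverse strand
    return frames
```

Each frame can then be split at stop codons to list candidate open reading frames before alignment.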

The Expasy proteomics server19 was used to study the protein sequence and structural details. These peptides were studied for their physico-chemical properties using the tool ProtParam.20 The secondary structure analysis was performed using the Chou and Fasman algorithm with CFSSP.21 To generate the 3D structure from the FASTA sequence, homology modelling was performed and the templates were identified. The model was built using the template with the highest identity.
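Two of the physico-chemical indices reported by tools of this kind have simple closed forms, sketched below as an illustration rather than as the ProtParam code itself. The grand average of hydropathicity (GRAVY) is the mean Kyte–Doolittle hydropathy of the residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A positive GRAVY indicates a more hydrophobic sequence on average and a negative GRAVY a more hydrophilic one.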

The SWISS-MODEL server was used for homology modelling,22 with computation on the ProMod3 engine, which is based on OpenStructure.23 Structural information is extracted from the template, and the sequence alignment is used to define insertions and deletions. The protein–ligand interaction profile with hydrogen bonds, hydrophobic interactions, salt bridges and π-stacking was obtained using the PLIP server.24 The SWISS-MODEL server25 was used to build the 3D model, and a structural assessment was performed to validate it. PDBsum was used to validate the model with the Ramachandran plot.26

The whole-genome sequence of the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3), a linear ss-RNA of 29 903 bp, was translated in all six frames (sorted), with a minimum ORF length of 20 and any start codon allowed, and the resulting protein sequences were used for homology modelling. Homology models were built for the protein sequences encoded by positions 21 503 to 25 381 in Frame 2 (1293 amino acids), 13 450 to 21 552 in Frame 1 (2701 amino acids) and 254 to 13 480 in Frame 2 (4409 amino acids). The segment of residues 3268–3573 in Frame 2 (306 amino acids) and the segment of residues 1568–1882 in Frame 2 (315 amino acids) of SARS-CoV-2 were aligned against 831 genomes of SARS-CoV-2 and found to be identical.
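The coordinates quoted above can be checked with a short piece of arithmetic. The sketch below assumes that the nucleotide positions are 1-based and inclusive, that they lie on the forward strand, and that Frame n begins at nucleotide n; these conventions are assumptions and the helper names are hypothetical.

```python
def orf_summary(start, end):
    """Return (frame, codon_count) for a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range does not span a whole number of codons")
    frame = (start - 1) % 3 + 1   # frame 1 starts at nucleotide 1, 4, 7, ...
    return frame, length // 3


def span_length(first, last):
    """Number of residues in a 1-based, inclusive residue range."""
    return last - first + 1


# Nucleotide ranges quoted in the text
print(orf_summary(254, 13480))    # (2, 4409)
print(orf_summary(13450, 21552))  # (1, 2701)
print(orf_summary(21503, 25381))  # (2, 1293)

# Residue ranges used for the two protease models
print(span_length(3268, 3573))    # 306
print(span_length(1568, 1882))    # 315
```

Under these assumptions the frame numbers and lengths agree with the values stated in the text.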

Blind docking of the ligands was performed over the whole protein surface using AutoDock Vina, both for the homology models generated from templates 2a5i and 3e9s of the PDB database and for the templates 2a5i and 3e9s themselves. The ligand and protein molecules were supplied in the PDBQT format for docking. The ligand and protein files were prepared by converting the SDF format into the PDB format using OpenBabel. The ligands were then prepared by detecting their root and torsion tree. The proteins were prepared by deleting all heteroatoms and water molecules and adding polar hydrogens. To find where the ligand would bind optimally, the grid was specified to cover the whole protein. The docking was performed with AutoDock Vina,27 and 9 different conformations of the ligand were considered for each docking. The docking results were obtained as PDBQT and .txt files.
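A search box covering the whole protein, as used for blind docking, can be derived from the atom coordinates of the receptor. The following sketch is one possible way to do this and is not taken from the study: the 5 Å padding, the file names passed by the caller and the function names are assumptions. The configuration keys written out (receptor, ligand, center_x/y/z, size_x/y/z, num_modes) are standard AutoDock Vina options.

```python
def bounding_box(pdb_lines, padding=5.0):
    """Return (center, size) of a box enclosing all ATOM records plus padding.

    Coordinates are read from the fixed-width PDB columns 31-54.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM records found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(receptor, ligand, center, size, num_modes=9):
    """Build the text of a Vina configuration file for the given search box."""
    keys = ["center_x", "center_y", "center_z", "size_x", "size_y", "size_z"]
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"{key} = {value:.3f}" for key, value in zip(keys, center + size)]
    lines.append(f"num_modes = {num_modes}")
    return "\n".join(lines) + "\n"
```

Very large boxes make the search harder, so a higher exhaustiveness setting is often considered for blind docking; the value used in the study is not stated.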

The docking was analyzed by opening the PDB form of the protein in PyMOL along with the PDBQT file of the most suitable ligand conformation. The complex obtained was saved as a PDB file. This PDB file was then viewed in Discovery Studio Visualizer, where the ligand interactions with the corresponding amino acids were examined for the type of interaction and the distance between them. The software used for this process comprised PyMOL, AutoDock Vina and Discovery Studio Visualizer.
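The predicted affinities of the docked conformations can be read directly from the Vina output file, in which each model carries a line beginning with "REMARK VINA RESULT:" followed by the affinity in kcal mol−1. The sketch below is only an illustration of that step; the function names are hypothetical.

```python
def vina_affinities(pdbqt_text):
    """Return the affinities (kcal/mol) of all models in a Vina output PDBQT."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # The first number after the colon is the predicted affinity.
            scores.append(float(line.split(":", 1)[1].split()[0]))
    return scores


def best_affinity(pdbqt_text):
    """Most favourable (most negative) affinity among the reported models."""
    scores = vina_affinities(pdbqt_text)
    if not scores:
        raise ValueError("no Vina result lines found")
    return min(scores)
```

With nine conformations per docking run, `vina_affinities` would be expected to return nine values, and `best_affinity` the one normally quoted for the complex.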

3. Results and discussion

The physico-chemical properties and primary structure parameters of the RdRp regions of the seven polyproteins of the SARS-CoV-2 isolates are given in Table 1. RdRp is encoded by an important part of the viral genome; in RNA viruses, its function is to catalyze the synthesis of an RNA strand complementary to a given RNA template.
Table 1 Physico-chemical properties of polyproteins of SARS-CoV-2 virus isolates
Accession number MN938385.1 MN938386.1 MN975263.1 MN975264.1 MN975265.1 MN970003.1 MN970004.1
Reading frame 3 3 3 3 3 2 2
Number of amino acids 95 95 95 95 95 96 96
Molecular weight 10640.22 10640.22 10640.22 10640.22 10640.22 11239.26 11239.26
Theoretical pI 9.87 9.87 9.87 9.87 9.87 8.9 8.9
Formula C472H752N134O138S4 C472H752N134O138S4 C472H752N134O138S4 C472H752N134O138S4 C472H752N134O138S4 C516H786N132O132S9 C516H786N132O132S9
Total number of atoms 1500 1500 1500 1500 1500 1575 1575
Theoretical extinction coefficients 12950 12950 12950 12950 12950 24200 24200
Instability index 20.51 20.51 20.51 20.51 20.51 29.66 29.66
Aliphatic index 80.11 80.11 80.11 80.11 80.11 89.27 89.27
Grand average of hydropathicity (GRAVY) −0.264 −0.264 −0.264 −0.264 −0.264 0.161 0.161
Estimated half-life 1.9 hours (mammalian reticulocytes, in vitro). 1.9 hours (mammalian reticulocytes, in vitro). 1.9 hours (mammalian reticulocytes, in vitro). 1.9 hours (mammalian reticulocytes, in vitro). 1.9 hours (mammalian reticulocytes, in vitro). 1.3 hours (mammalian reticulocytes, in vitro). 1.3 hours (mammalian reticulocytes, in vitro).
>20 hours (yeast, in vivo). >20 hours (yeast, in vivo). >20 hours (yeast, in vivo). >20 hours (yeast, in vivo). >20 hours (yeast, in vivo). 3 min (yeast, in vivo). 3 min (yeast, in vivo).
>10 hours (Escherichia coli, in vivo). >10 hours (Escherichia coli, in vivo).


The isolate SI200040-SP orf1ab polyprotein and the isolate SI200121-SP orf1ab polyprotein were translated in reading frame 2, whereas the rest of the isolates were translated in reading frame 3 (Table 1). The presence of multiple reading frames suggests the possibility of overlapping genes as seen in many viral, prokaryotic and mitochondrial genomes. This could affect the way the proteins are formed. The number of amino acid residues was the same in all the polyproteins except for isolate SI200040-SP, which had one amino acid more than the others. The extinction coefficients of the two isolates, SI200040-SP orf1ab polyprotein and SI200121-SP orf1ab polyprotein, were much higher than those of the rest of the polyproteins. The extinction coefficient is important when studying protein–protein and protein–ligand interactions. The instability index of these two isolates was also higher than that of the others, indicating lower predicted stability, although all values remain below 40, the threshold above which ProtParam predicts a protein to be unstable. The regulation of gene expression by polyprotein processing is known in viruses, and this is observed in many viruses that are human pathogens.28

The isolates here like many other viruses may be using the replication strategy that could involve the translation of a large polyprotein with the subsequent cleavage by viral proteases. The two isolates SI200040-SP orf1ab polyprotein and SI200121-SP orf1ab polyprotein also showed shorter half-lives than the other isolates, indicating that they are susceptible to enzymatic degradation.

The tertiary structure analysis of the isolate SARS-CoV-2_HKU-SZ-001_2020 ORF1ab polyprotein is given in Table 2.

Table 2 Tertiary structure of the SARS-CoV-2 virus isolate SARS-CoV-2_HKU-SZ-001_2020 ORF1ab polyprotein with templates
PDB template Gene Identity
6nur.1.A NSP12 98.947
1khv.1.A RNA-directed RNA polymerase 8.97
1khv.2.A RNA-directed RNA polymerase 8.97
5z6v.1.A ABC-type uncharacterized transport system periplasmic component-like protein 19.74
6k1y.1.A ABC-type uncharacterized transport system periplasmic component-like protein 19.74
2ckw.1.A RNA-directed RNA polymerase 10.53
2uuw.1.A RNA-directed RNA polymerase 10.67
2wk4.1.A Protease-polymerase p70 10.67
2wk4.1.B Protease-polymerase p70 10.67
2yan.1.A Glutaredoxin-3 12.50
2yan.2.A Glutaredoxin-3 12.50


It was observed that the polyprotein showed 98.94 percent identity to the PDB structure 6nur.1.A, which is a hetero-1-2-1-mer. The polyprotein is an RNA-directed RNA polymerase. The protein is nearly identical to the SARS-coronavirus NSP12 bound to NSP7 and NSP8 co-factors.29 In SARS, it is a non-structural protein, with NSP12 being the RNA-dependent RNA polymerase and the co-factors NSP7 and NSP8 forming hexadecameric complexes that also act as a processivity clamp for the RNA polymerase and primase.30 As in SARS-CoVs, this structure in SARS-CoV-2 may be involved in the machinery of core RNA synthesis and can be a template for exploring antiviral properties.
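Percent identity between a target and a template is commonly computed from their pairwise alignment as the share of aligned, non-gap columns that hold the same residue. The sketch below uses that definition as an assumption; the exact convention applied by SWISS-MODEL is not described here and may differ in how gaps are counted. For orientation, one mismatch in 95 aligned positions gives 94/95, or about 98.947 percent, which would be consistent with the top entry in Table 2, although the number of mismatches is not stated in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap columns in which both residues are equal."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)
```

Low values such as the 9–20 percent entries in the same table fall in the range where alignments are generally not considered reliable evidence of homology.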

The phylogenetic tree of the seven polyproteins is shown in Fig. 1.


Fig. 1 Phylogenetic tree of the seven polyproteins of severe acute respiratory syndrome coronavirus 2 isolates.

It is seen that the glycoproteins are similar in all the isolates. Multiple alignment of the polyproteins of SARS-CoV-2 is shown in Fig. S1 (ESI).

Based on the polyprotein's function in SARS-CoV and its identity to that of SARS-CoV-2, it is possible that it has the same function in SARS-CoV-2, namely as an RNA polymerase that carries out de novo initiation and primer extension with possible exonuclease activities. The activity, being primer dependent, can be useful for understanding the mechanism of SARS-CoV-2 replication, and the protein can be used as an antiviral target.31–34

The two parts of the main protein from the whole genome of SARS-CoV-2 aligned with two SARS CoV proteins and the ligand binding sites were similar; the alignment positions, number of amino acids and ligands and the interacting residues are given in Table 3.

Table 3 Main protein with a sequence length of 4409 aa of SARS-CoV-2 viruses showing structural alignment with two other proteins of SARS-CoV
Template ID Template title Alignment positions Number of aa Ligands Interacting residues
3e9s.1 A new class of papain-like protease/deubiquitinase inhibitors blocks SARS virus replication 1568–1882 315 TTT Chain A: L.1729, G.1730, D.1731, P.1814, P.1815, Y.1831, Y.1835, Q.1836, Y.1840, T.1868
2a5i.1 Crystal structures of SARS coronavirus main peptidase inhibited by an aza-peptide epoxide in the space group C2 3268–3573 306 AZP Chain A: T.3292, T.3293, H.3308, M.3316, Y.3321, F.3407, L.3408, N.3409, G.3410, S.3411, C.3412, H.3430, H.3431, M.3432, E.3433, P.3435, H.3439, D.3454, R.3455, Q.3456, T.3457, A.3458, Q.3459


The polyprotein also has an identity of 19.74 percent to an ABC-type uncharacterized transport system periplasmic component-like protein; this protein is known to be a substrate-binding protein and possible binding can be explored here.35

The homology model developed from positions 254 to 13 480 in Frame 2 (4409 amino acids) of the complete genome sequence of the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3), a linear ss-RNA of 29 903 bp, showed interesting template alignments. In all, the model aligned with 50 templates from the PDB database, most of them being replicase polyprotein 1ab, which is a SARS-CoV papain-like protease.36 The maximum similarity of 97.3 percent was with the template structure of an Nsp9 protein from SARS-coronavirus, indicating that this novel coronavirus has a high degree of similarity with the SARS-coronavirus, and this can be used for gaining insights into vaccine development. Nsp9 is an RNA-binding protein and has an oligosaccharide/oligonucleotide fold-like fold; this protein can have an important function in the replication machinery of the virus and can be important when designing antivirals for this virus.37

Two models were developed: one of the SARS-CoV-2 3CLpro protein from residues 3268–3573 in Frame 2 (306 amino acids) and the other of the SARS-CoV-2 PLpro protein from residues 1568–1882 in Frame 2 (315 amino acids) of the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3). The models exhibited similarity to the 3C-like proteinase and papain-like protease/deubiquitinase proteins, which are known antiviral drug targets. These 3CLpro and PLpro models constitute starting points for anti-SARS-CoV-2 drug design as the corresponding SARS proteins are validated drug targets.38,39

Ligand binding with these proteins and their action on viral replication and inactivation can be useful in stopping the viral replication.40 The homology models of the 4409 amino acid residues of the SARS-CoV-2 isolate Wuhan-Hu-1 with the ligand association with templates 2a5i and 3e9s are shown in Fig. 2 and 3 respectively.


Fig. 2 Homology model of SARS-CoV-2 3CLpro derived from the SARS PDB template 2a5i with aza-peptide epoxide (APE) ligand binding.

Fig. 3 Homology model of SARS-CoV-2 PLpro derived from the SARS PDB template 3e9s with GRL0617 ligand binding.

The statistics of structural comparison with PDB templates is given in Table 4; it is seen that the proteins from SARS-CoV-2 are significantly close to the proteins of SARS-CoVs and the amino acid alignments in the binding region of both the viruses are the same.

Table 4 Statistics of structural comparison with PDB templates
Structure Template Similarity No. of equivalent positions RMSD Raw score
3e9s_SARS-CoV-2 3e9s Significantly similar 314 0.10 935.61
2a5i_SARS-CoV-2 2a5i Significantly similar 306 0.08 911.72
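The RMSD column in Table 4 summarises how far equivalent positions lie from each other after the model and template have been superposed. A minimal sketch of the calculation is given below. It assumes that the two coordinate lists are already superposed and paired position by position; the superposition step and the tool used to produce Table 4 are not covered, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

Values close to zero over roughly 300 equivalent positions, as in Table 4, are expected when a model is built directly on a highly similar template.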


The alignment of the 306 residues from 3268–3573 aa of the novel SARS-CoV-2 with the template 2a5i is shown in Fig. 4, and the alignment of the 315 residues from 1568–1882 aa of the novel SARS-CoV-2 with the template 3e9s is shown in Fig. 5.


Fig. 4 Alignment of the 306 residues from 3268–3573 aa of the novel SARS-CoV-2 with the template 2a5i.

Fig. 5 Alignment of the 315 residues from 1568–1882 aa of the novel SARS-CoV-2 with the template 3e9s.

A PSI-BLAST of the 306 amino acid residues, 3268–3573, in Frame 2 from the SARS-CoV-2 isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3) was conducted to ascertain the conservation of these amino acids in 831 genome sequences of SARS-CoV-2, and it was found that there was a complete match in these genomes of the virus. The fact that the region is conserved in all these SARS-CoV-2 sequences further emphasizes that the interaction of an aza-peptide epoxide with the protein can be exploited for an antiviral against SARS-CoV-2. Similarly, a PSI-BLAST of the 315 amino acid residues, 1568–1882, in Frame 2 from the same isolate was conducted to ascertain the conservation of these amino acids in the 831 genome sequences of SARS-CoV-2, and again there was a complete match in these genomes of the virus. The fact that this region is also conserved in all these SARS-CoV-2 sequences further emphasizes that the interaction of the ligand GRL0617 with the protein can be exploited for an antiviral against SARS-CoV-2.
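A complete match across genomes can also be cross-checked without a similarity search, by testing whether the query peptide occurs verbatim in the translated sequence of each genome. The sketch below uses this exact substring test as a simpler stand-in for PSI-BLAST; it only detects identical copies and says nothing about near matches. The function name and the dictionary layout (genome name mapped to a translated protein string) are hypothetical.

```python
def conservation_report(query, translated_genomes):
    """Count genomes whose translated sequence contains the query exactly.

    translated_genomes maps a genome identifier to a protein string.
    Returns (number_of_genomes_with_exact_match, list_of_genomes_without_one).
    """
    query = query.upper()
    missing = [name for name, protein in translated_genomes.items()
               if query not in protein.upper()]
    return len(translated_genomes) - len(missing), missing
```

For a fully conserved region, the returned list of genomes without a match would be empty.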

The important templates that aligned with these 4409 amino acid residues of the whole genome of the SARS-CoV-2 isolate Wuhan-Hu-1 were 2a5i of the PDB database, which is a crystal structure of the SARS coronavirus main peptidase inhibited by an aza-peptide epoxide in the space group C2,39 and 3e9s of the PDB database, which is a papain-like protease/deubiquitinase in complex with the ligand GRL0617, an inhibitor of SARS virus replication.38 The model with template 2a5i of the PDB database shows that an aza-peptide epoxide (APE; kinact/Ki = 1900 (±400) M−1 s−1), which is a known anti-SARS agent, can be used to develop a molecular target with irreversible inhibitor properties. The substrate-binding properties and structural and chemical complementarity of this aza-peptide epoxide could be explored as an anti-SARS-CoV-2 agent. The structure of APE, which is ethyl (2S)-4-[(3-amino-3-oxo-propyl)-[[(2S)-2-[[(2S)-4-methyl-2-phenylmethoxycarbonylamino-pentanoyl]amino]-3-phenyl-propanoyl]amino]amino]-2-hydroxy-4-oxo-butanoate, with the covalent bond formed with the catalytic cysteine and the open epoxide groups producing the hydroxyl groups, is shown in Fig. 6.


Fig. 6 Structure of the aza-peptide epoxide (APE) with covalent bonds formed between the catalytic cysteine residue and open epoxide groups producing hydroxyl groups.

The model with template 3e9s of the PDB database shows that the coronavirus PLpro can complex with the ligand GRL0617, known to be a potent inhibitor of viral replication in SARS.38

The genome of the SARS-CoV-2 isolate Wuhan-Hu-1 (MN908947.3) encodes a 4409 aa long protein along with the other glycoproteins and polyproteins. The homology modelling of this protein showed sequence and structural alignment with two SARS proteases with structural accession numbers 3e9s and 2a5i at positions 1568–1882 and 3268–3573 respectively. The results suggest that the inhibition of virus replication by the TTT ligand and an aza-peptide epoxide occurs via binding with PLpro and 3CLpro respectively. The structural similarities to these templates are 83% and 96% respectively. The multiple sequence alignment shows complete conservation of the sequence, suggesting a high degree of homology. The comparison of hydrophobic interactions, hydrogen bonds and salt bridges of the constructed model of the novel coronavirus protein from positions 3268–3573 aa with those of the template 2a5i with the ligand AZP is given in Table S2 (ESI). On comparison, it was observed that the binding properties are the same except for the presence of a water bridge in the template 2a5i.

The comparison of hydrophobic interactions, hydrogen bonds and π-stacking between the small-molecule noncovalent lead inhibitor and the constructed model of the novel coronavirus protein from positions 1568–1882 aa with those of the template 3e9s is given in Table S3 (ESI). On comparison, it was observed that the binding properties are the same except for an additional π-stacking at Tyr in the template 2a5i. This shows that there is a high possibility of these antiviral compounds binding to the regions of the novel coronavirus protein that are homologous to the SARS protein.

The comparison of the hydrophobic interaction for the binding of the ligand AZP between the SARS-CoV-2 protein and the template 2a5i of SARS-CoVs is shown in Fig. 7 and the comparison of the same between the SARS-CoV-2 protein and the template 3e9s of SARS-CoVs is shown in Fig. 8. It was observed that the interaction is the same in both proteins with the same amino acids participating in the interaction, indicating that there is a possibility that these ligands with antiviral properties can bind to the new virus.


Fig. 7 Comparison of the hydrophobic interaction of the binding of the ligand AZP between the SARS-CoV-2 protein and the template 2a5i of SARS CoVs.

Fig. 8 Comparison of the hydrophobic interaction of the binding of the ligand between the SARS-CoV-2 protein and the template 3e9s of SARS CoVs.

The protein–ligand interactions obtained via docking can provide important information on the effectiveness of the binding, in terms of antiviral properties, in the homology models obtained using the 3C-like peptidase (2A5I) and papain-like protease/deubiquitinase (3E9S) proteins of the SARS virus as templates.

We used AutoDock Vina, which employs a hybrid scoring function that is both empirical and knowledge-based; the software runs an optimized search that is iterated until an acceptable solution is found for the minimum-energy docking conformations.41 The interactions of GRL0617 with the amino acid residues of the model obtained using the template 3e9s and with those of PLpro are compared in Fig. 9a and b, respectively.


Fig. 9 (a) Interaction profile of GRL0617 with amino acid residues of the homology model of SARS-CoV-2 papain-like protease. (b) Interaction profile of GRL0617 with amino acid residues of the template 3e9s.

Both show eight interacting amino acids, a few of which exhibit multiple interactions (distances are in Å). The complex with PLpro shows very high affinity, i.e. −10.2 kcal mol−1, compared with the complex with the model, which shows lower affinity, i.e. −7.9 kcal mol−1. The comparison of conserved amino acids shows Asp1643 in the homology model and Asp165 in the template, both of which show H bonds, at distances of 2.60 and 2.07 respectively. Additionally, Asp165 shows a pi–sigma bond at a distance of 3.53 and a pi–anion interaction at a distance of 4.39, accounting for the stronger affinity in PLpro as against the homology model. In the case of Pro1644 in the homology model, there was an alkyl bond at a distance of 4.70, whereas the template shows a pi–alkyl bond at a distance of 5.04, the pi–alkyl bond being stronger than the alkyl bond. Similarly, Pro1632 in the homology model shows a pi–alkyl bond at a distance of 5.06 and PLpro shows 2 pi–alkyl bonds at distances of 4.31 and 4.72; the two pi–alkyl bonds at a close distance account for the stronger affinity of the template. Ala1635 in the homology model and Leu163 in PLpro are both hydrophobic amino acids and show alkyl bonds at distances of 3.80 and 4.25 respectively. Ala1635 additionally exhibits a pi–alkyl bond at a distance of 4.24. Thr1642 in the homology model and Gln270 in PLpro both exhibit H bonds. However, Gln270 exhibits 2 H bonds at distances of 2.83 and 2.74 via its –NH group, whereas Thr1642 exhibits 1 H bond at a distance of 2.62 via its –OH group and 1 pi–sigma bond at a distance of 3.87. Phe1636 in the homology model and Tyr269 in the template are both aromatic amino acids, and both show pi–pi interactions. Phe1636 exhibits pi–pi stacking at a distance of 5.60, whereas Tyr269 exhibits 3 pi–pi T-shaped bonds at distances of 5.06, 5.25 and 5.44. Tyr269 also exhibits an additional H bond at a distance of 3.07.

The interactions of AZP with the amino acid residues of the model obtained using the template 2a5i and with those of 3CLpro are compared in Fig. 10a and b respectively.


Fig. 10 (a) Interaction profile of the ligand AZP with the amino acid residues of the homology model of SARS-CoV-2 3C-like protease. (b) Interaction profile of the ligand AZP with the amino acid residues of the template 2a5i.

Both show five interacting amino acids. The conserved amino acids are Gln3456 in the homology model and Gln110 in 3CLpro, which show H bonds at distances of 2.40 and 2.39 Å respectively. Among the similar residues, Thr3292 in the homology model and Ser158 in 3CLpro show H bonds at distances of 2.78 and 2.71 Å respectively; both have an –OH group that participates in the H bond. Tyr3321 in the homology model and Lys102 in 3CLpro show H bonds at distances of 2.73 and 2.97 Å respectively, reflecting the electronegative group involved, i.e. –OH in the former and –NH in the latter. Cys3412 in the homology model and Val297 in 3CLpro show pi–alkyl bonds at distances of 4.82 and 5.30 Å respectively. Asp3454 in the homology model and Phe294 in 3CLpro exhibit an H bond at a distance of 1.97 Å; the latter exhibits pi–pi stacking at a distance of 4.51 Å. This difference is responsible for the slightly higher affinity of AZP for 3CLpro (−7.4 kcal mol−1) than for the model (−7.1 kcal mol−1).
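To put such differences in docking score in perspective, a score can be loosely translated into an apparent dissociation constant through the relation ΔG = RT ln Kd. The sketch below is illustrative only: it assumes a temperature of 298.15 K and treats the docking score as if it were a binding free energy, which an empirical scoring function only approximates. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def apparent_kd(score_kcal_per_mol, temperature_k=298.15):
    """Apparent dissociation constant (mol/L) from dG = RT ln(Kd).

    Assumption: the docking score is used in place of dG.
    """
    return math.exp(score_kcal_per_mol / (R_KCAL * temperature_k))


def fold_difference(score_a, score_b, temperature_k=298.15):
    """Ratio Kd(b) / Kd(a): how many times tighter complex a binds than b."""
    return apparent_kd(score_b, temperature_k) / apparent_kd(score_a, temperature_k)


# Scores quoted in the text (kcal/mol).
azp_ratio = fold_difference(-7.4, -7.1)    # AZP: 3CLpro vs. homology model
grl_ratio = fold_difference(-10.2, -7.9)   # GRL0617: PLpro vs. homology model
```

Under these assumptions, the 0.3 kcal mol−1 gap for AZP corresponds to a less than twofold difference in apparent Kd, whereas the 2.3 kcal mol−1 gap for GRL0617 corresponds to a difference of roughly fifty-fold.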

However, it is also interesting to note that, even though the alignment studies showed 82% and 96% identity of Model 1 (obtained using the template 3E9S) and Model 2 (obtained using the template 2A5I) to PLpro and 3CLpro respectively, the binding cavity interactions and milieu were very similar in the second case in spite of there being few conserved amino acid residues. In the first case, the binding cavity showed a certain similarity in terms of the cavity milieu, but the strength of the interactions varied because of multiple additional stabilising interactions.

We were able to see the difference in the protein–ligand interactions in both models by docking the ligands to the whole surface of each protein, as we had no prior knowledge of the target pocket. The docking involved several runs and energy calculations to arrive at a favorable protein–ligand complex. The ligand AZP showed an affinity of −7.1 kcal mol−1 for the homology model of the SARS-CoV-2 3C-like protease, and GRL0617 showed an affinity of −7.9 kcal mol−1 for the homology model of the SARS-CoV-2 papain-like protease.
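Docking to the whole protein surface amounts to choosing a search box that encloses every atom of the receptor. A minimal sketch of how such a box could be derived from the coordinates in a PDB file is given below. The 5 Å padding and the helper names are illustrative assumptions and not the settings used in this study; `num_modes = 9` simply mirrors the nine ligand conformations reported for the docking profile.

```python
def blind_docking_box(pdb_text, padding=5.0):
    """Bounding box around all ATOM/HETATM records of a PDB file.

    Returns centre and size per axis, in Å, using the option names of an
    AutoDock Vina configuration file (center_x ... size_z).
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns 31-38, 39-46 and 47-54 hold x, y, z.
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError("no ATOM or HETATM records found")

    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        lo, hi = min(values), max(values)
        box[f"center_{axis}"] = round((lo + hi) / 2, 3)
        # Hypothetical padding so that surface pockets are not clipped.
        box[f"size_{axis}"] = round(hi - lo + 2 * padding, 3)
    return box


def vina_config_lines(box, num_modes=9):
    """Format the box as 'key = value' lines for a Vina configuration file."""
    lines = [f"{key} = {value}" for key, value in box.items()]
    lines.append(f"num_modes = {num_modes}")
    return lines
```

A box of this kind is deliberately generous; for large proteins a more thorough search than the default may be needed to sample such a volume adequately.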

Hydrophobic interactions are short-range interactions that have an important role in the affinities of ligands for their receptors. The similarity in the amino acids involved in these interactions suggests that the proteins of SARS-CoV-2 may bind these ligands with an affinity comparable to that seen in SARS-CoVs and that the ligands may act in a similar manner, indicating that they could possibly be used as antivirals against SARS-CoV-2.

Targeting this part of the SARS-CoV-2 genome with antiviral compounds that have been shown to bind in the similar region of the SARS virus may have implications for the development of an effective antiviral compound against SARS-CoV-2. SARS-CoV-2 shows homology with the SARS coronaviral proteases, papain-like protease (PLpro) and 3C-like protease (3CLpro). PLpro processes the viral polyprotein and also strips ubiquitin and the ubiquitin-like interferon (IFN)-stimulated gene 15 (ISG15) from host proteins to facilitate coronavirus replication and help the virus evade the immune response of the host. These inhibitors can also play a role in disrupting signalling cascades in infected cells, protecting the uninfected cells.

The chemical GRL0617 is 5-amino-2-methyl-N-[(1R)-1-(1-naphthalenyl)ethyl]benzamide and is known to inhibit the papain-like protease enzyme present in SARS-CoVs. This protease is a potential target for antiviral compounds.42,43 We found that SARS-CoV-2 PLpro has homology with SARS-CoV PLpro, which forms a complex with the ligand GRL0617, and that the binding sites for this ligand in the SARS-CoV-2 protease are very similar. This compound inhibits the enzyme that is required for the cleavage of the viral polyprotein of SARS-CoVs. The enzyme also cleaves ubiquitin and has structural homology with deubiquitinases (DUBs) of the ubiquitin-specific protease family; GRL0617 binds in the S4 and S3 subsites of the enzyme, which accommodate the C-terminal tail of ubiquitin.44,45 Our results indicate that an aza-peptide epoxide, an irreversible protease inhibitor, and GRL0617, a viral replication inhibitor, can possibly be used to develop antivirals against the novel SARS-CoV-2.

Author contributions

AKS conducted the study, planned and executed the work, performed the homology modelling, analysed the data and wrote the manuscript. DB analysed the whole-genome data and performed the protein–ligand interaction studies. AA performed the sequence alignment and the analysis of the whole-genome data. SG performed the protein–ligand docking studies.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

AKS wishes to acknowledge Chitra Shanker for constant encouragement and to thank Geetha Vinodhini for English corrections. The authors wish to acknowledge the Director, ICAR-Central Research Institute for Dryland Agriculture (ICAR-CRIDA), and the Head, Department of Crop Sciences, ICAR-CRIDA, for encouragement during the study. AKS wishes to acknowledge B. Venkateswarlu, former Director, ICAR-CRIDA, for encouragement.

References

  1. World Health Organization (WHO), 2004, accessed 11 Feb 2020, https://www.who.int/csr/don/2004_05_18a/en/.
  2. E. de Wit, N. van Doremalen, D. Falzarano and V. J. Munster, Nat. Rev. Microbiol., 2016, 14, 523–534.
  3. E. de Wit, A. L. Rasmussen, D. Falzarano, T. Bushmaker, F. Feldmann, D. L. Brining, E. R. Fischer, C. Martellaro, A. Okumura, J. Chang and D. Scott, Proc. Natl. Acad. Sci. U. S. A., 2013, 110(41), 16598–16603.
  4. World Health Organization (WHO). Coronavirus. Geneva: WHO, 2020, accessed 4 Feb 2020, available from: https://www.who.int/health-topics/coronavirus.
  5. World Health Organization (WHO), 2020, accessed 27 April 2020, https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200423-sitrep-94-covid-19.pdf?sfvrsn=b8304bf0_4.
  6. D. Forni, R. Cagliani, M. Clerici and M. Sironi, Trends Microbiol., 2017, 25(1), 35–48.
  7. C. M. Luo, N. Wang, X. L. Yang, H. Z. Liu, W. Zhang, B. Li, B. Hu, C. Peng, Q. B. Geng, G. J. Zhu and F. Li, J. Virol., 2018, 92(13), e00116–e00118.
  8. L. Du, Y. He, Y. Zhou, S. Liu, B. J. Zheng and S. Jiang, Nat. Rev. Microbiol., 2009, 7(3), 226–236.
  9. Y. Chen, S. N. Savinov, A. M. Mielech, T. Cao, S. C. Baker and A. D. Mesecar, J. Biol. Chem., 2015, 290(42), 25293–25306.
  10. J. S. Morse, T. Lalonde, S. Xu and W. R. Liu, ChemBioChem, 2020, 21, 730.
  11. K. Anand, J. Ziebuhr, P. Wadhwani, J. R. Mesters and R. Hilgenfeld, Science, 2003, 300(5626), 1763–1767.
  12. T. Pillaiyar, M. Manickam, V. Namasivayam, Y. Hayashi and S. H. Jung, J. Med. Chem., 2016, 59(14), 6595–6628.
  13. P. Mukherjee, P. Desai, L. Ross, E. L. White and M. A. Avery, Bioorg. Med. Chem., 2008, 16(7), 4138–4149.
  14. N. Barretto, D. Jukneliene, K. Ratia, Z. Chen, A. D. Mesecar and S. C. Baker, J. Virol., 2005, 79(24), 15189–15198.
  15. T. Sulea, H. A. Lindner, E. O. Purisima and R. Menard, J. Virol., 2005, 79(7), 4550–4551.
  16. D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, B. A. Rapp and D. L. Wheeler, Nucleic Acids Res., 2000, 28(1), 15–18.
  17. T. Hall, I. Biosciences and C. Carlsbad, GERF Bull. Biosci., 2011, 2(1), 60–61.
  18. S. Kumar, G. Stecher, M. Li, C. Knyaz and K. Tamura, Mol. Biol. Evol., 2018, 35(6), 1547–1549.
  19. E. Gasteiger, A. Gattiker, C. Hoogland, I. Ivanyi, R. D. Appel and A. Bairoch, Nucleic Acids Res., 2003, 31(13), 3784–3788.
  20. E. Gasteiger, C. Hoogland, A. Gattiker, M. R. Wilkins, R. D. Appel and A. Bairoch, Protein identification and analysis tools on the ExPASy server, in The proteomics protocols handbook, Humana Press, 2005, pp. 571–607.
  21. T. A. Kumar, Wide Spectrum, 2013, 1(9), 15–19.
  22. A. Waterhouse, M. Bertoni, S. Bienert, G. Studer, G. Tauriello, R. Gumienny, F. T. Heer, T. A. P. de Beer, C. Rempfer, L. Bordoli and R. Lepore, Nucleic Acids Res., 2018, 46(W1), W296–W303.
  23. M. Biasini, T. Schmidt, S. Bienert, V. Mariani, G. Studer, J. Haas, N. Johner, A. D. Schenk, A. Philippsen and T. Schwede, Acta Crystallogr., Sect. D: Biol. Crystallogr., 2013, 69(5), 701–709.
  24. S. Salentin, S. Schreiber, V. J. Haupt, M. F. Adasme and M. Schroeder, Nucleic Acids Res., 2015, 43(W1), W443–W447.
  25. T. Schwede, J. Kopp, N. Guex and M. C. Peitsch, Nucleic Acids Res., 2003, 31(13), 3381–3385.
  26. R. A. Laskowski, E. G. Hutchinson, A. D. Michie, A. C. Wallace, M. L. Jones and J. M. Thornton, Trends Biochem. Sci., 1997, 22(12), 488–490.
  27. O. Trott and A. J. Olson, J. Comput. Chem., 2010, 31(2), 455–461.
  28. S. A. Yost and J. Marcotrigiano, Curr. Opin. Virol., 2013, 3(2), 137–142.
  29. R. N. Kirchdoerfer and A. B. Ward, Nat. Commun., 2019, 10(1), 1–9.
  30. A. R. Fehr and S. Perlman, Coronaviruses: an overview of their replication and pathogenesis, in Coronaviruses, Humana Press, New York, NY, 2015, pp. 1–23.
  31. A. J. Te Velthuis, S. H. van den Worm and E. J. Snijder, Nucleic Acids Res., 2012, 40(4), 1737–1747.
  32. A. J. Te Velthuis, J. J. Arnold, C. E. Cameron, S. H. van den Worm and E. J. Snijder, Nucleic Acids Res., 2010, 38(1), 203–214.
  33. L. Subissi, I. Imbert, F. Ferron, A. Collet, B. Coutard, E. Decroly and B. Canard, Antiviral Res., 2014, 101, 122–130.
  34. L. Subissi, C. C. Posthuma, A. Collet, J. C. Zevenhoven-Dobbe, A. E. Gorbalenya, E. Decroly, E. J. Snijder, B. Canard and I. Imbert, Proc. Natl. Acad. Sci. U. S. A., 2014, 111(37), E3900–E3909.
  35. J. E. Bae, I. J. Kim, K. J. Kim and K. H. Nam, Biochem. Biophys. Res. Commun., 2018, 497(1), 368–373.
  36. C. M. Daczkowski, J. V. Dzimianski, J. R. Clasman, O. Goodwin, A. D. Mesecar and S. D. Pegan, J. Mol. Biol., 2017, 429(11), 1661–1683.
  37. M. P. Egloff, F. Ferron, V. Campanacci, S. Longhi, C. Rancurel, H. Dutartre, E. J. Snijder, A. E. Gorbalenya, C. Cambillau and B. Canard, Proc. Natl. Acad. Sci. U. S. A., 2004, 101(11), 3792–3796.
  38. K. Ratia, S. Pegan, J. Takayama, K. Sleeman, M. Coughlin, S. Baliji, R. Chaudhuri, W. Fu, B. S. Prabhakar, M. E. Johnson and S. C. Baker, Proc. Natl. Acad. Sci. U. S. A., 2008, 105(42), 16119–16124.
  39. T. W. Lee, M. M. Cherney, C. Huitema, J. Liu, K. E. James, J. C. Powers, L. D. Eltis and M. N. James, J. Mol. Biol., 2005, 353(5), 1137–1151.
  40. Y. M. Baez-Santos, S. E. S. John and A. D. Mesecar, Antiviral Res., 2015, 115, 21–38.
  41. N. M. Hassan, A. A. Alhossary, Y. Mu and C. K. Kwoh, Sci. Rep., 2017, 7(1), 1–13.
  42. R. Chaudhuri, S. Tang, G. Zhao, H. Lu, D. A. Case and M. E. Johnson, J. Mol. Biol., 2011, 414(2), 272–288.
  43. H. A. Lindner, N. Fotouhi-Ardakani, V. Lytvyn, P. Lachance, T. Sulea and R. Menard, J. Virol., 2005, 79(24), 15199–15208.
  44. R. W. King and D. Finley, Nat. Chem. Biol., 2014, 10(11), 870.
  45. N. J. Schauer, R. S. Magin, X. Liu, L. M. Doherty and S. J. Buhrlage, J. Med. Chem., 2019, 63(6), 2731–2750.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/d0nj00974a

This journal is © The Royal Society of Chemistry and the Centre National de la Recherche Scientifique 2020