Yiwen
Hu
ab and
Markus J.
Buehler
*abc
aLaboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA. E-mail: mbuehler@mit.edu; Tel: +1-617-452-2750
bDepartment of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
cCenter for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
First published on 20th July 2022
As variants of the pathogen that causes COVID-19 spread around the world, estimates of infectiousness and lethality of newly emerging strains are important. Here we report a predictive model that associates molecular motions and vibrational patterns of the virus spike protein with infectiousness and lethality. The key finding is that most SARS-CoV-2 variants are predicted to be more infectious and less lethal compared to the original spike protein. However, lineage B.1.351 (Beta variant) is predicted to be less infectious and more lethal, and lineage B.1.1.7 (Alpha variant) is predicted to have both higher infectivity and lethality, showing the potential of the virus to mutate towards different performance regimes. The relatively more recent lineage B.1.617.2 (Delta variant), although contains a few key spike mutations other than D614G, behaves quite similar to the single D614G mutation in both vibrational and predicted epidemiological aspects, which might explain its rapid circulation given the prevalence of D614G. This work may provide a tool to estimate the epidemiological effects of new variants, and offer a pathway to screen mutations against high threat levels. Moreover, the nanomechanical approach, as a novel tool to predict virus-cell interactions, may further open up the door towards better understanding other viruses.
First reported in China in 2019,2–4 the novel coronavirus disease, COVID-19, has spread rapidly around the world, resulting in more than 180 million global cases as of July 2021.5,6 SARS-CoV-2, the virus that causes COVID-19, can easily transmit from human to human, compared to previous two highly pathogenic human coronaviruses—namely, SARS-CoV and MERS-CoV. Similar to other human coronaviruses, SARS-CoV-2 enters human cells through the interaction between the receptor-binding domain (RBD) of its spike protein and the human cell receptor, and it shares the same human receptor angiotensin-converting enzyme 2 (ACE2) with SARS-CoV.7 Composed of an amino (N)-terminal S1 subunit and a carboxyl (C)-terminal S2 subunit, the spike protein of coronavirus bears functional significance in the infection process. While the S2 subunit participates more in the membrane fusion, the S1 subunit, consisting of an N-terminal domain (NTD), the receptor-binding domain (RBD) and two smaller intermediate subdomains, is responsible for recognizing and binding to the host cell receptor.8 Previous literature reveals a prerequisite conformational state for receptor binding where at least one RBD should be in upward position, which makes it receptor-accessible, otherwise there would be steric clashes hindering the binding process.9 In experimental work, this specific type of receptor-accessible state has been captured for MERS-CoV, SARS-CoV and SARS-CoV-2.7,8 Given the essential role that the spike protein plays in receptor recognition, viral fusion and cell entry,7,10,11 it is also a promising target for drug and vaccine development.
While vaccination progress has been progressing well in many countries across the world, an emergence of more transmissible SARS-CoV-2 variants has caused great concern.12,13 SARS-CoV-2 D614G variant emerged in late January or early February 2020 and has quickly become the most prevalent form in the global pandemic.14 Not only does the D614G variant adopts a range of more open conformations than the original virus, but it also has added interactions to prevent premature dissociation of the trimer and increase the number of spikes that can facilitate infection.15,16 A number of experimental and statistical research have also shown that D614G variant has enhanced infectivity and transmissibility compared to the wild-type virus.14,17,18 The lineage B.1.1.7 (Alpha variant), first reported in the UK in December 2020, is predicted to be 43 to 90% more transmissible than the predecessor lineage based on statistical and dynamic modeling approaches.12 Moreover, transmission of SARS-CoV-2 between human and animals has also been observed, with the most notable mink mutation reported in Denmark causing the culling of all farmed mink in the country.19 Experimental evidence reveals that while mink-specific mutations do not increase entry into human cells, they do cause partial evasion from neutralization by a therapeutic antibody and convalescent plasma/sera.20 A concerning finding is that, an experiment suggests that mRNA-vaccine activity against SARS-CoV-2 variants that encode E484K-, N501Y- or K417N/E484K/N501Y-mutant spike may be reduced by a small—but significant—margin.21
Some recent research focuses on understanding and predicting epidemiological behavior of SARS-CoV-2 and its variants using different indicators from biological or clinical aspect. By evaluating the binding energy of SARS-CoV-2 spike protein and the host cell receptor with a machine-learning model, researchers report that most likely future mutations will make SARS-CoV-2 more infectious.22 It has also been shown that the viral load can be considered as an independent predictor of the transmissibility and mortality of SARS-CoV-2.23,24 Furthermore, with machine learning technique for natural language processing, the generated semantic landscapes for SARS-CoV-2 are able to predict mutations that could potentially evade the immune system.25
The research reported here focuses on the spike protein of coronavirus, where it makes physical interaction with the human cell receptor ACE2 and then enters human body. We conduct a normal mode analysis (NMA) of different SARS-CoV-2 variants in receptor-accessible conformation with one upward RBD, analyze their vibrational behaviors, and utilize the correlation that we reported in earlier work1 to make predictions on the infectiousness and lethality (Fig. 1).
Compared to previous predictive models,22–25 our presented predictive model is the first one to make predictions based on pure nanomechanical properties, i.e. the average fluctuation level and mobility ratio of receptor-binding domain that we identified in previous work.1 This work may provide a tool to estimate the epidemiological effects of new variants, and offer a pathway to screen mutations against high threat levels. Moreover, the nanomechanical approach, as a novel tool to predict virus-cell interactions, may offer a tool to better understand other viruses.
Variant name | WHO label | First reported | Spike mutations | Mutations implemented in this work |
---|---|---|---|---|
D614G | — | China | D614G | D614G |
Cluster 5 | — | Denmark | del69–70, Y453F, D614G, I692V, M1229I | del69 (chain A), Y453F, D614G, I692V |
Other mink associated mutations | — | Europe | F486L | F486L |
B.1.1.7 (501Y.V1) | Alpha | UK | del69–70, del Y144, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H | del69 (chain A), N501Y, A570D, D614G, T716I, S982A, D1118H |
B.1.351 (501Y.V2) | Beta | South Africa | L18F, D80A, D215G, R246I, K417N, E484K, N501Y, D614G, A701V | D215G, K417N, E484K, N501Y, D614G, A701V |
P.1 (501Y.V3) | Gamma | Brazil | L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I | D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I |
B.1.617.2 | Delta | India | T19R, G142D, 156del, 157del, R158G, L452R, T478K, D614G, P681R, D950N | G142D (chain A/C), L452R, T478K, D614G, D950N |
Double mutant | — | India | L452R, E484Q, D614G | L452R, E484Q, D614G |
![]() | (1) |
k(‖r0ij‖) = min {7424·‖r0ij‖–6,1500} | (2) |
We continue to use the similar two important factors that we identify in previous work,1i.e. flexibility level and flexibility ratio, to describe the vibrational behavior, and they are defined as:
![]() | (3) |
Fluctuation ratio = mean (F1, F2…Fn)/mean(f1,f2…fm) | (4) |
![]() | (5) |
The logarithm of global infection number and the logarithm of case fatality rate in percentage are defined as the infectivity level and lethality level in this paper, ensuring that negative values are also meaningful. When constructing the quantitative correlation model, we investigate the accuracy of polynomial model, mixture model of polynomial and exponential function and exponential model. Given the limited data, only linear dependency is explored, in the form:
y = a0 + a1f1(x1) + a2f2(x2) | (6) |
Y = a0 + a1θRBD-B + a2dRBD-B + a3θRBD-C + a4dRBD-C + a5θNTD-A + a6θNTD-B + a7θNTD-C + a8dNTD | (7) |
The fluctuation profile of coronavirus spike proteins, obtained through the use of normal mode analysis, reveals the vibrational flexibility of the spike. In previous work, we have shown that while having similar protein structure and sharing the same type of global movement—the swing of upward receptor-binding domain (RBD), MERS-CoV, SARS-CoV and SARS-CoV-2 spike proteins have quite different fluctuation profile.1
Given the small number of spike mutations compared to the total residue number near 3000, will the mutations make a difference to the flexibility profile? The answer is yes. In Fig. 2, we show the flexibility profiles of three notable SARS-CoV-2 variants that are still circulating globally—B.1.1.7, B.1.351 and P.1, identified by WHO as Variants of Concern (VOC) and labelled as Alpha, Beta and Gamma variant respectively. The fluctuation profiles of these three SARS-CoV-2 variants also follow the general characteristics that we discovered for MERS-CoV, SARS-CoV and SARS-CoV-2 spike proteins.
![]() | ||
Fig. 2 Flexibility profile and spike mutation spots of the three notable SARS-CoV-2 variants that are still circulating globally—B.1.1.7 (Alpha variant), B.1.351 (Beta variant) and P.1 (Gamma variant). (a) Fluctuation profile of SARS-CoV-2 B.1.1.7 (Alpha variant) spike protein and the flexibility difference between this variant and the original spike protein (PDB ID: 6Z97). The flexibility difference is calculated by subtracting the fluctuation of wild-type virus from that of the mutant. Labels on the top indicate the corresponding subunits S1 and S2 in chain A, B and C. The green labels located in the middle of S1 subunit labels demonstrate the locations of receptor binding domain (RBD). (b) equilibrated structure of SARS-CoV-2 B.1.1.7 (Alpha variant) spike protein, showing only the chain with upward RBD. The upward RBD is colored in green and other parts are colored in cyan. The mutation spots in this chain are colored in red and shown as spheres. The standard deviation (RMSD) that occurs during molecular dynamics (MD) relaxation is about 4.94 Å. (c and d) Fluctuation profile and equilibrated structure of SARS-CoV-2 B.1.351 (Beta variant) spike protein respectively, with RMSD around 4.51 Å during MD relaxation. (e and f) fluctuation profile and equilibrated structure of SARS-CoV-2 P.1 (Gamma variant) spike protein respectively, with RMSD around 3.85 Å during MD relaxation. Although B.1.1.7 (Alpha variant), B.1.351 (Beta variant) and P.1 (Gamma variant), also known as 501Y.V1, 501Y.V2 and 501Y.V3, share the important spike mutation N501Y, their fluctuation profiles behave differently. Compared with B.1.1.7 (Alpha variant) and P.1 (Gamma variant), B.1.351 mutant (Beta variant) has more flexible upward RBD and downward RBD. While spike substitutions generally decrease the mobility of RBDs, several residues in the upward and downward RBDs of B.1.351 (Beta variant) obtain increased flexibility due to mutations. |
We find that they all have a much more flexible S1 subunit and a comparatively rigid S2 subunit, with the slowly built-up largest flexibility locating in its upward RBD. Note that the dynamics of stalk hinge (central helix and HR in S2 subunit), which is reported in previous literature,39 is not observed in our analysis because the stalk region is not included in our model. On the other hand, the mutations generally decrease the mobility of N-terminal domains (NTD) and RBDs. No obvious flexibility changes to other regions, which indicates that the rigid regions are pretty stable and retain rigidity. Comparing B.1.1.7 (Alpha variant), B.1.351 (Beta variant) and P.1 (Gamma variant), which also known as 501Y.V1, 501Y.V2 and 501Y.V3 due to the shared N501Y mutation in RBD, there are some differences in their fluctuation profile. In general, the fluctuation difference induced by mutations in B.1.1.7 (Alpha variant) is smaller than the difference due to mutations in B.1.351 (Beta variant) and P.1 (Gamma variant), suggesting B.1.1.7 (Alpha variant) behaves more similar to the original SARS-CoV-2 spike protein. It is also important to note that compared with B.1.1.7 (Alpha variant) and P.1 (Gamma variant), B.1.351 (Beta variant) has larger mobility in both upward and downward RBD. While spike substitutions generally decrease the mobility of RBDs, several residues in the RBD of B.1.351 (Beta variant) obtain increased flexibility due to mutations, showing some specialty of lineage B.1.351 (Beta variant).
We further investigate important angles and distances in the coronavirus spike protein structure to understand the structural parameters that could influence the flexibility level and mobility ratio, the two significant vibrational factors that we identified in previous work.1 According to Fig. 3, among all structural parameters concerned, the factors associated with the upward RBD show a significant positive relationship with the flexibility level. The further upward RBD is from the central axis, the larger its standing angle is, the more “open” the spike conformation is, and thus the higher the vibrational flexibility level is. In terms of the flexibility ratio, the distance of upward RBD and the average distance of NTDs both make significant contributions. While the flexibility ratio increases with the distance of upward RBD, it scales negatively with the average distance of NTDs. Therefore, considering that the flexibility ratio is quantified as the ratio of RBDs in upward and downward conformation, the more tightly the NTDs pack against the central axis, the larger the flexibility ratio is, the more distinct the RBDs in different positions are.
Fig. 4 depicts the quantitative model to predict the infectivity and lethality level of SARS-CoV-2 variants of interest. The nanomechanical vibrational feature map of coronavirus spike proteins and SARS-CoV-2 variants is shown in panel (a). It is demonstrated that generally all variants of concern except B.1.351 (Beta variant) have decreased average fluctuation over upward RBD, which corresponds with the trend we observe with detailed fluctuation profiles in Fig. 2. Compared with the original SARS-CoV-2 spike protein, the variants are moving towards the bottom left side of the diagram, suggesting that their behavior differs even more from the behavior of SARS-CoV and MERS-CoV spike proteins.
![]() | ||
Fig. 4 Nanomechanical vibrational feature map of coronavirus spike proteins and SARS-CoV-2 variants, and analyses of correlations with viral infectivity and lethality, including the B.1.617.2 mutant (Delta variant). (a) Nanomechanical vibrational feature map of coronavirus spike proteins and SARS-CoV-2 variants. The largest sphere denotes MERS-CoV S (PDB ID: 5X5F); two middle-size spheres denote SARS-CoV S (PDB ID: 6CRZ and 6CRW); two small spheres denote SARS-CoV-2 S (PDB ID: 6Z97 and 7DDN). Points in diamond shape denote spike proteins of different SARS-CoV-2 variants. (b) Polynomial fitting model linking fluctuation level and ratio to infectivity, where infectivity level is calculated as the logarithm of global confirmed case number. (c) Polynomial fitting model relating fluctuation level and ratio to lethality, where lethality level is calculated as the logarithm of mortality rate in percentage. (d) Nanomechanical vibrational feature map indicating the possibility of more infectious and more lethal variant. Arrow in red indicates the direction where variants are of the same infectivity but more lethal; arrow in magenta indicates the direction where variants are of the same lethality but more infectious. Small area between two arrows is the phase space to generate both more infectious and more lethal variant. (e) Prediction on epidemiological features of SARS-CoV-2 variants. This analysis shows that, generally, most variants of concern are predicted to be more infectious and less lethal than the original SARS-CoV-2. Lineage B.1.351 (Beta variant) and mutant F486L are predicted to be less infectious and more lethal, while lineage B.1.1.7 (Alpha variant) is anticipated to have both higher infectivity and lethality. All containing N501Y substitution in spike, B.1.1.7 (Alpha variant), B.1.351 (Beta variant) and P.1 (Gamma variant) are predicted to have quite different epidemiological features compared to variant with single N501Y mutation. The result suggests that other spike mutations, which have not been discussed in previous literature, also have significant influence on the infectivity and lethality of SARS-CoV-2 variants. The emerging lineage B.1.617.2 (Delta variant), although contains a few key spike mutations other than D614G, behaves quite similar to the single D614G mutation in both vibrational and predicted epidemiological aspects, which might explain its fast circulation given the prevalence of D614G. From panel (d) we can see that the possibility of both more lethal and more infectious variants is quite small for SARS-CoV-2, but it changes with the vibrational features and may become a serious concern when both general flexibility level and fluctuation ratio of the coronavirus spike protein are large. |
Aside from the qualitative conclusion drawn from panel (a), we establish our quantitative model to further make predictions about the infectivity and lethality level of SARS-CoV-2 variants of interest based on the relationship observed between vibrational and epidemiological properties.1 As shown in panel (b and c), the polynomial fitting model is able to describe the quantitative relationship appropriately, with correlation coefficient R2 = 0.991 and R2 = 0.998 for infectivity and lethality fitting model respectively. The analysis indicates that most variants of interest are predicted to be more infectious and less lethal than the original SARS-CoV-2. Lineage B.1.351 (Beta variant) and mutant F486L are predicted to be both less infectious and more lethal, while lineage B.1.1.7 (Alpha variant) is anticipated to achieve higher infectivity and higher lethality simultaneously. Our model predictions on lineage B.1.1.7 (Alpha variant) not only correspond with its greater transmissibility reported in previous studies,12 but also agree with the results in literature where analysis on positive SARS-CoV-2 community tests show that B.1.1.7 may lead to increased mortality.40 Within the range of error bars, the qualitative predictions of infectiousness and lethality of most SARS-CoV-2 variants remain the same. The only exception would be mutant F486L, which lies in between the two different performance regions and could result in different qualitative predictions.
Focusing on the mutations associated with mink, we find that single mutation Y453F and cluster 5 variant, which contains Y453F and other two spike substitutions, have similar vibrational behavior and predicted epidemiological properties, and this proves the stability of our model to some extent. However, another mink-associated spike mutation, namely F486L, behaves distinctively from the previous two mutants, which might result from its special location at the receptor binding motif (RBM) of RBD.
Here, we can also further compare the predicted epidemiological characteristics of B.1.1.7 (Alpha variant), B.1.351 (Beta variant), P.1 (Gamma variant) and their shared N501Y spike substitution. It is shown that the four of them behave quite differently, with single mutation N501Y predicted to be the most infectious and the least lethal. The different behavior can be considered as the result of the competition between N501Y mutation and other spike mutations—N501Y pushes the variant to be more infectious and less lethal, and other mutations together drag the variant towards lower infectivity and higher lethality. This analysis result agrees with previous findings revealing that N501Y is associated with increased infectivity and virulence of SARS-CoV-2 in a mouse model.41 Notably, we emphasize here that other spike mutations, which have not been discussed in previous work, could also have significant impact on infectivity and lethality of SARS-CoV-2 variants, providing new aspects for further research on SARS-CoV-2 variants.
Both containing spike substitution in residue 484, B.1.351 (Beta variant) and the “double mutant” variant that has two key mutations in B.1.617 are predicted to have smaller fluctuation ratio than the original SARS-CoV-2 spike protein. Considering the comparatively higher fluctuation ratio of B.1.617.2 (Delta variant), which lacks the E484Q mutation present in the “double mutant”, we suggest that the spike substitution in residue 484 may be associated with lower fluctuation ratio in SARS-CoV-2 spike protein, leading to more similar RBDs in different conformations and larger possibility toward multiple receptor-accessible RBDs.
While all current variants of concern except lineage B.1.1.7 (Alpha variant) show single trend of either higher infectivity and lower lethality or lower infectivity and higher lethality, we demonstrate in panel (d) that that the possibility for the variants to be both more lethal and more infectious could change with different vibrational properties. The possibility is larger for MERS-CoV S compared with that of SARS-CoV S and SARS-CoV-2 S. In the top region of Fig. 4(d), the nonlinearity of epidemiological prediction becomes significant, and thus the likelihood of more lethal and more infectious variants could be quite large. Therefore, dangerous mutations could be a serious concern with future coronavirus whose spike protein has comparatively large flexibility level and fluctuation ratio.
Furthermore, noticing that three of the WHO identified Variants of Concern (VOC), namely B.1.1.7 (Alpha variant), P.1 (Gamma variant) and B.1.617.2 (Delta variant), have similar nanomechanical and epidemiological behavior compared with the single D614G variant, which has quickly become the most prevalent form since it occurred, we suggest that aside from the both more lethal and more infectious variants, there might be some area centering the D614G variant in the nanomechanical vibrational feature map where SARS-CoV-2 variants could be quite dangerous and pose serious threat to global public health.
As shown in Fig. 4, most variants of concern are predicted to be more infectious and less lethal than the original SARS-CoV-2. Lineage B.1.351 (Beta variant) and mutant F486L are predicted to be less infectious and more lethal, while lineage B.1.1.7 (Alpha variant) is anticipated to have both higher infectivity and lethality. The different behavior of B.1.1.7 (Alpha variant), B.1.351 (Beta variant) and P.1 (Gamma variant) suggests that spike mutations other than N501Y, which hasn’t been discussed in previous literature, also have significant influence on the infectivity and lethality of SARS-CoV-2 variants. Among all variants of interest, the emerging lineage B.1.617.2 (Delta variant), although contains a few key spike mutations other than D614G, behaves quite similar to the single D614G mutation in vibrational feature map and is predicted to have similar infectivity and lethality, which may help explain its fast circulation considering how quickly D614G has become the prevalent form since its occurrence.
For the SARS-CoV-2 virus, its higher infectivity and lethality performance regime appears to be located in the small area above it, as shown in Fig. 4(d). Given the connection between spike mutation at residue 484 and the lower vibrational fluctuation ratio revealed in Fig. 4, we suggest that spike mutations at residue 484 might help drive SARS-CoV-2 variants towards the special performance regime. What's more, the distinct epidemiological predictions found here for B.1.351 (Beta variant) and P.1 (Gamma variant) implies that the combination of spike mutations K417N, E484K and N501Y might be quite dangerous and could possibly lead to higher lethality. Although only lineage B.1.1.7 (Alpha variant), among all variants of interest, is predicted to have higher infectivity and lethality, the possibility of both more lethal and more infectious variants could change with vibrational features and may indeed become a serious concern when both general flexibility level and fluctuation ratio of coronavirus spike protein are large. This can be studied in more detail in future work, perhaps with the combination of advanced protein folding tools, molecular modeling, and experimental assays.
Overall, this work may provide a tool to estimate the epidemiological effects of new variants, offer a pathway to screen mutations against high threat levels and help design possible inhibitors that compensates for the motions induced by mutations. Moreover, the nanomechanical approach, as a novel tool to predict virus-cell interactions, may offer a tool to better understand other viruses. Considering the limitation of implementing mutations based on the current SARS-CoV-2 spike protein templet with several missing regions, future work may analyze certain missing spike mutations with more complete spike protein structures as templets. Other future work may assess such potential mutations from an evolutionary or genetic perspective, to assess the likelihood of their occurrence, or to identify the conditions under which they may emerge.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d1sm01181b |
This journal is © The Royal Society of Chemistry 2022 |