Inbal
Tuvi-Arad
* and
Yaffa
Shalit
Department of Natural Sciences, The Open University of Israel, Raanana, Israel. E-mail: inbaltu@openu.ac.il
First published on 15th May 2023
A preliminary step in the SARS-CoV-2 human infection process is a conformational change of the receptor binding domain (RBD) of its spike protein, characterized by a significant loss of symmetry. During this process, the residues which later on bind to the human angiotensin converting enzyme 2 (ACE2) receptor, become exposed at the surface of the protein. Symmetry analysis of a data set of 33 protein structures from experimental measurements and 32 structures from molecular dynamics simulation, show that the initial state carries clear indications on the structure of the final state, with respect to the local distortion along the sequence. This surprising finding implies that this type of analysis predicts the mechanism of change. We further show that the level of local distortion at the initial state increases with variant's transmissibility, for the wild type (WT) along with past and present variants of concern (WT ∼ alpha < beta < delta < Omicron BA.1), in accordance with the trend of their evolutionary path. In other words, the initial structure of the variant which is most infectious is also the most distorted, making its path to the final state shorter. It has been claimed that the RBD migration of the spike protein is allosterically controlled. Our analysis provides a quantitative support to a major theorem in this respect – that information about an allosteric process is encoded in the structure itself, suggesting that the path of local distortion is related to an allosteric information network.
In its closed state, the spike protein presented in Fig. 1, is a symmetric homotrimer constructed of two subunits, S1 and S2, covered by glycans. The S1 subunit is responsible for recognizing and binding to the human ACE2 receptor and stabilizing the S2 core. The S2 core contains the fusion machinery of the spike. Binding of S1 to ACE2 exposes the core of S2 and leads to substantial conformational change and membrane fusion. The S1 subunit is constructed of several domains: N-terminal domain (NTD), RBD, and two subdomains (SD1 and SD2), also referred to as C-terminal domains (CTD1 and CTD2).6,7 Short N2R (NTD to RBD) linkers connect the NTD and RBD domains.9 Table S1 in the (ESI†) specifies the residues in each domains. Here we focus on the first stage of the process in which the conformation changes from closed and symmetric (3-Down state) to open and asymmetric (1-Up state). Understanding this process is a key factor in preventing the binding of the spike protein to ACE2. Raghuvamsi et al.10 studied the interaction of the spike protein with ACE2 and suggested that locking the RBD in its closed state, by binding small molecules, would have a therapeutic value. Dokainish et al.,11 found cryptic pockets at the RBD interface. Applying virtual screening methodology, they identified several molecules that may serve as drugs. These can stabilize the protein in an intermediate conformation and prevent the conversion to the 1-Up state.
![]() | ||
Fig. 1 The 3-Down state of the SARS-CoV-2 spike protein of the Omicron BA.1 variant (PDB-ID: 7TF8). The S1 subunit is colored by domains: red-NTD, cyan-N2R, orange-RBD, silver-SD1, purple-SD2. The S2 subunit is colored light blue. |
Several studies suggested that the 3-Down to 1-Up conformational change of the Sars-CoV-2 spike protein is allosterically controlled, but the complete mechanism hasn't been fully explored.12–15 Gobeil et al. showed that the D614G mutation in the SD2 domain has an allosteric effect leading to the alteration of the Up/Down RBD conformation.15 They further claimed that variations in remote regions, i.e., the SD1 and SD2 domains, play an essential role in modulating the spike allostery and affecting the conformational change.6 Molecular dynamics (MD) simulations conducted by Verkhivker12 confirmed the role of the D614G mutation in modulating the allosteric communication, and suggested that allosteric coupling between stable regulatory centers and conformationally adaptable hotspots determine the binding affinity and long-range communications of the SARS-CoV-2 complexes with nanobodies.
In a recent review, Thirumalai et al.16 focused on the relation between symmetry and allostery and stressed that while there is high variability of pathways that connect allosteric states, the sensitivity of allosteric signaling to the structure of the specific system, implies that the ability for allostery is encoded in the structure itself. The network of the most dominant residues that transmit the allosteric signal defines an allosteric wiring diagram, that controls the information transfer between the two sites.16 While the variability of pathways that connect the binding site and the allosteric site can be high, the sensitivity of allosteric signaling to the structure of the specific system, implies that the ability for allostery is generally encoded in the apo state. In other words, the structural elements that construct the network of information could potentially be deduced by the structure of the protein in the absence of a ligand.16 Thirumalai et al. describe allosteric signaling as a strain propagation process, and postulate that specific regions in the protein must be stiff enough to absorb this strain. In addition, allosteric states must have lower symmetry than the disordered regions in order to transmit the signal across the structure.16 This view is in line with Papaleo et al.17 that summarized multiple evidences that flexible regions in the protein, such as loops and linkers play an important part in modulating allosteric processes. On the one hand, if the linkers are flexible and pre-encoded conformational sub-states of the linkers exist, they can lower the barriers for conformational transitions between connected domains and accelerate the allosteric process. On the other hand, when these linkers are stiff, they can prevent unfavorable inter-domain contacts, help separate between the domains, and affect the biological function.
Conformational modifications during an allosteric process naturally change the proteins' level of symmetry. This effect can be examined globally (for the whole protein) or locally (for domains or fragments of them). Experimental measurements by Zhang et al.2 showed that the D614G mutation, which is common to all the variants, increases the distortion of the 3-Down conformation of the spike protein, that is, creates a slightly more open conformation as compared with the WT trimer. Quantifying the distortion and the change of symmetry, may improve our ability to understand and control the allosteric mechanism. Continuous symmetry measures (CSMs)18–21 and the related continuous chirality measure (CCM)22 are particularly useful for this purpose. These molecular descriptors estimate the level of distortion of molecules and proteins with approximate symmetry by calculating the distance between a given molecule and its nearest symmetric (or achiral) structure. Recently we modified the method by utilizing the connectivity map of the molecules at hand in order to reduce the number of permutations that the code needs to scan. This modification created a fast, extensive and adequate method to calculate this set of three-dimensional descriptors, that are applicable for both molecules and proteins.21,23 These measures have the capability to capture the most subtle conformational changes during chemical processes.24–28 In recent years the method was used in various studies on protein structure. Bonjack and Avnir29 studied domain swapping with CSMs. They used a running ruler approach to calculate the deviation from C2-symmetry of homodimer fragments in order to identify the hinge region in each protein. Wang et al.30 introduced a chirality spectrum that presents the distance of each residue along the sequence from its nearest achiral structure. They showed that peaks in this spectrum indicate local distortive regions along the protein sequence due to helix kinks, β-sheets twists and secondary structure junctions. Recently, we used the CCM of residues as a conformational similarity descriptor to classify the tendency of amino acids to distort protein homodimers.31 Here we build on the above studies and apply the CSM and CCM methodology to analyze the changes of symmetry during the RBD migration of the SARS-CoV-2 spike protein, in an attempt to reveal the relation between symmetry, structure and function of this protein.
State | All spike | S1 | S2 | RBD | NTD | SD1 | SD2 | |
---|---|---|---|---|---|---|---|---|
3-Down (15) | Mean | 0.047 | 0.086 | 0.014 | 0.126 | 0.080 | 0.044 | 0.084 |
SD | 0.044 | 0.082 | 0.009 | 0.112 | 0.079 | 0.052 | 0.080 | |
1-Up (18) | Mean | 1.295 | 2.622 | 0.022 | 11.727 | 0.083 | 0.076 | 0.071 |
SD | 0.178 | 0.420 | 0.009 | 1.000 | 0.037 | 0.033 | 0.064 |
Table 1 highlights several observations. (1) Looking at the overall protein (“All spike” column), the symmetry of the 3-Down state is higher (smaller CSM values) than the 1-Up state. While this is expected, we note that the first is not zero, meaning that even at the closed state the protein is not perfectly symmetric. This result is in accordance with pervious findings regarding the CSM levels of symmetric proteins.23,31,33 (2) The S1 subunit is more distorted then the S2 subunit, and within the S1 subunit, it is the trimer of the RBD domains that exhibits the highest distortion. This is clearly evident for the open state, and to a smaller extent for the closed, 3-Down, state as well. As will be shown below, the distortion of the RBD trimer carries information about the migration process even before the process started. (3) The distortion levels of both states are similar in regions that maintain high symmetry (S2 subunit and the NTD, SD1 and SD2 domains) teaching that the symmetry of regions outside the RBD are mostly unaffected by the migration process. It should be noted that the CSM is not an additive function. Therefore, the CSM of the whole spike protein need not be equal to the sum of the CSMs of its subunits or domains. Indeed, it displays much more moderate levels of distortion, since the RBD is only a small portion of the overall structure. (4) When the CSM is small, the SD is relatively high. This is a known property of the CSM. As was demonstrated previously, the CSM distribution is commonly asymmetric around the mean, and tends to have a Gamma or Log-Normal distribution with a long tail, portraying structures with relatively high distortion.34 This by definition increases the SD of the CSM, particularly when the range of the CSM is small. Since this happens for relatively symmetric structures, it has a minor affect on the rest of our analysis.
![]() | ||
Fig. 2 A partial snapshot of the Omicron BA.1 spike protein near the RBD region. (a) 3-Down conformation (PDB-ID: 7TF8). (b) 1-Up conformation (PDB-ID: 7TO4). Orange: the migrating RBD (chain A). Residues P330 and P527 (in blue) mark the boundaries of the RBD domain on chain A (P527 is hidden in the 1-Up conformer). Pink and dark red: The other two RBD domains. Silver: SD1 domains. Cyan: N2R linkers between the NTD and RBD domains. |
We applied the CCM of each residue (including its side chain), as a three-dimensional conformational similarity descriptor, and analyzed the conformations of the residues along the RBD in order to characterize the changes due to the migrating domain. In previous studies we showed that the variability of the CCM along the sequence of a protein stems from differences in the secondary structures and the side chains.30,31Fig. 3 shows the correlation plot between the CCM per residue on chain A and chain B, for the 1-Up state of the Omicron BA.1 variant shown in Fig. 2. Chain A is the migrating chain. The Pearson correlation factor between the two chains is 0.76. Pearson correlation factors of 0.81 and 0.80 were obtained between chains A and C, and chains B and C respectively. The correlation factors for the 3-Down state were similar: 0.75, 0.76 and 0.79 for A–B, A–C and B–C respectively (see Fig. S1 in the ESI† for the correlation plots). The fact that the correlation is not perfect teaches us that the chain opening process may involve conformational change of the residues apart for the rotation of the whole domain. To draw more conclusive conclusions on this issue, we applied a statistical approach and calculated the Pearson correlation factors to our set of 33 spike proteins. The results are presented in Fig. 4 in the form of box and whisker plots. At the 3-Down state, no significant difference is shown between the different chains and the median correlation factor is 0.84. While the conformation of pairs of equivalent residues on each two chains may differ, these differences spread equally between the three chains. For the 1-Up state, this is not the case. Prior to this analysis the chain names were set such that A is always the migrating chain. Fig. 4 shows that the conformational similarity of residues on chains B and C, which do not migrate, is comparable to the 3-Down state. The correlation factor decreases when the migrating chain A is considered, teaching that the RBD migration involves local conformational changes apart for the domain rotation. CCM analysis at the backbone level, presented in Fig. S2 (ESI†), shows similar trends.
![]() | ||
Fig. 3 Correlation between the residues' CCM on chain A and chain B for the 1-Up state of the Omicron BA.1 variant. Red line represents the correlation line. Chain A is the migrating RBD domain. PDB-ID: 7TO4. |
The conformational changes are manifested in the secondary structure as well, as can be seen in Fig. S3 in the ESI† that displays similar trends to those in Fig. 4. Another perspective of the structural change comes from looking at the distribution of residues within the different secondary structures. Taking the full set of proteins in our study, regardless of the variant, we found that for the 3-Down conformer, on average, 60% of the residues in the RBD are characterized with flexible secondary structures (loops and irregular elements, bends and turns) that are more likely to deliver structural changes.17 The rest of the residues are more confined: 25% are in β-strands structures and 14% are in different types of helices. For the 1-Up the numbers changes to 65%, 24% and 11% respectively, showing that the conformational change is accompanied by increased flexibility of the RBD backbone on the expanses of decrease in helix type structures. The increase in flexibility supports our finding that the RBD opening mechanisms is more than a pure rotation over a hinge, and involves local conformational changes as well.
![]() | ||
Fig. 5 Distortion with respect to C3 symmetry of 10-residues fragments (with all atoms included) along the sequence of the RBD domain of the Omicron BA.1 variant of the SARS-CoV-2 spike protein. Black: 3-Down state, left Y scale (PDB-ID: 7TF8). Red: 1-Up state, right Y scale (PDB-ID: 7TO4). |
The nature of the CSM parameter is such that when branches of a symmetric structure drift apart as a whole, the CSM decreases, due to the normalization factor in eqn (1). In Fig. 5 we see the opposite trend – one RBD chain drifts apart from the other two, but the CSM increases, approximately by an order of magnitude. This can be explained by noting that the geometrical change is not a simple translation with respect to the original C3 rotation axis, but involves rotation (as suggested by the snapshots of Fig. 2) along with local conformational changes of the side chains as discussed above.
Several experimental studies and MD calculations suggested that the RBD domain rotates through a hinge, although its specific location is not always specified and varies between measurements and variants6,7,36,37 For the Omicron BA.1 variant, Verkhivker12 showed that several residues act as rigid hinges, i.e., F318, L387, and F429. Our calculations show that these residues are in regions of relatively conserved symmetry even at the 1-Up state. F318 is outside the RBD domain, in region of high symmetry like the rest of the protein. The other two residues are at the vicinity of the second and third minimum points on the left part of the curves in Fig. 5. The numbers on the x axis of the CSM spectrum mark the first index of a 10-residues fragment. Therefore, residues at the minimum points are not the only source for a fragment's CSM value and one shouldn't expect a perfect match. The MD study of Fallon et al.7 on the WT variant showed that a salt bridge between D389 and K528 on the RBD is broken upon RBD rotation. They defined the RBD between residues 335 and 530 which is slightly different then the range used here, but this small difference does not change the shape of the CSM spectra in Fig. 5. D389 is close to L387 found by Verkhivker,12 supporting the concept of a hinge in this region. Analysis of the 3-Down structure in Fig. 5 (PDB-ID: 7TF8) with VMD38 did not reveal a salt bridge between these or adjacent residues, suggesting that the specific locations of such bridges vary between variants. In this respect our approach of analyzing fragments of residues rather than specific residues seems more inclusive. All mentioned residues generally reside in CSM valleys, i.e., with relatively conserved symmetry. Similar trends are seen for the CSM spectra of other variants as well, as discussed below. Finally, we note that the distortion near D389 is higher than near K528, suggesting that movement around D389 on the migrating domain is more significant than around D528.
Having discussed the valleys in the CSM spectrum, we turn to explore its peaks in which the distortion is relatively high. Fig. 6 shows snapshots of the RBD domains for the two states in which residues at the tip of each peak are presented with van der Waals spheres colored by their secondary structure, as determined by VMD.38 Many of the residues in peak regions are located on the surface of the RBD domains in regions responsible for the interaction with the ACE2 receptor. Broadly considered, these regions are bound between G446 (S446 for Omicron BA.1) and V510.3 Generally, the secondary structure in peak regions is similar in both conformers, and as discussed above, they tend to be flexible. It is interesting to note that the peaks spread every ca. 30 residues that lie in close proximity to each other as shown in Fig. 6. This finding suggests that non-bonding interactions between these residues may form an information network, or an allosteric wiring diagram. Further research is needed to explore this direction.
![]() | ||
Fig. 6 Snapshots of the RBD domains of the Omicron BA.1 spike protein. Bottom: 3-Down (PDB-ID: 7TF8), residues with CSM ≥ 1 are colored. Top: 1-Up state (PDB-ID: 7TO4), residues with CSM ≥ 15 are colored. Colors follow the secondary structure: blue-310-helix, cyan-turns, purple-α-helix, ice-blue-Loops and disordered elements, yellow-β-strands. Ribbons mark the RBD chains. The migrating chain in the 1-Up model is marked with orange ribbons. |
![]() | ||
Fig. 7 CSM spectra of 10-residues fragments along the sequence of the RBD for 33 SARS-CoV-2 spike protein, averaged by variant. (A) 3-Down state. (B) 1-Up state. PDB-Ids are listed in Table S2 (ESI†). |
The CSM pattern of each variant in Fig. 7 is similar to Fig. 5 above, affirming the generality of the local distortion trend. This is to be expected since each mutation alter few residues along the RBD sequence, while the CSM spectra are based on 10-residues fragments. The 1-Up spectra display high similarity to each other, teaching that all variants eventually reach the same level of distortion which is required to bind the protein to ACE2. The most striking evident is the large variability of the 3-Down state spectra that allows excellent discrimination between the variants. The different mutations lead to small structural changes of the starting conformation of the RDB domains by creating local islands of distortion with varying levels. The differences between the variants are not arbitrary – they follow the trend of variants transmissibility. The more transmissible the variant (and the latest the variant on the evolutionary path), the higher is its local distortion at the 3-Down state. As discussed above, along with the RBD migration, distortion naturally increases. Our findings suggest that if the protein experiences higher distortion at the initial state – its path to the final state becomes shorter since a larger extent of the final distortion already occurred. This may reduce the energy barrier of the overall transition, and lead to a faster transition towards the 1-Up state.
The differences between the protein variants stems from sequence mutations, and most of them are outside the RBD domain.39 Table S4 (ESI†) lists the mutations in the RBD domain for variants included in this study. Few mutations appear at a center of a distortion peak (e.g., L452R of the Delta variant, and N501Y of the Alpha, Beta and Omicron BA.1 variants), while others appear unrelated to a CSM peak (e.g., K417N for the Beta and Omicron BA.1 variants). Direct link between the mutation in the RBD and distortion peaks was not found in our data set. It is possible that mutations outside the RBD affect the local distortion of the RBD. Further research is needed in order to explore this issue more closely, e.g., by applying a shorter CSM ruler and analyzing regions beyond the RBD domain.
![]() | ||
Fig. 8 CSM spectra of 10-residues fragments along the sequence of the RBD of the WT SARS-CoV-2 spike protein. Coordinates are based on MD analysis.7 Step 0 is the 3-Down state, and step 31 is the 1-Up state. |
The power of the approach is in its predictive nature. Analysis of both Cryo-EM and MD simulation data showed that the distortion of the 3-Down state implies on the distortion of the 1-Up state. Higher asymmetry at the 3-Down state is related with higher variant's transmissibility. Analysis of MD simulation data teaches that increased distortion appears at more advanced stages along the RBD migration process, suggesting that a higher transmissibility may stem from a shorter path between the 3-Down and 1-Up states. Along the CSM spectra, residues with extreme CSM values indicating on high distortion, are generally preserved between the two states. Their majority are located at the RDB-ACE2 binding interface, with relatively close proximity to each other that allows non-bonding interactions, indicating that they might be involved in an allosteric network of information. Further research that will extend the analysis beyond the RBD region is needed to establish this view. Our findings provide a clear and quantitative demonstration to the notion that the capacity to change is encoded in the initial structure of the protein,16 particularly in the framework of allosteric regulation. Extending this type of analysis to other protein systems can contribute to better understanding of the role of symmetry in protein structure and function.
The choice of 10 residues for the ruler size is empirical. Bonjack-Shterengartz and Avnir29 recommended to use a ruler of 10 to identify a hinge (which is a flexible and less symmetric region in a protein) for systems were nothing is known about its location. In a different study, we showed that the tendency of amino acids with long or polar side chains to distort the protein is larger compared to small and non polar residues, since the conformational freedom of the side-chains may cause distortion even if the backbone symmetry is high.31 This may cause the spectra calculated with a smaller ruler to be more noisy and more difficult to analyze. On the other hand, a large ruler tends to flatten most of the fluctuations and loses important information. After testing different ruler sizes on a small set of proteins (with 1, 3, 5, 7, 10, 15 and 20 residues), we found that for the Sars-CoV-2 spike protein, a ruler of 10 residues is a reasonable compromise, that averages over residues identity effects on the one hand while retaining important information regarding the level of local symmetry on the other hand.
We performed a symmetry analysis for 32 structures (including the initial and final states) taken from the above simultation,7 representing snapshots of the protein structure along the free energy path of the RBD opening process. Prior to the analysis, Glycan and solvent molecules were excluded, and the residues were renumbered to match the sequence numbers of the experimental data included in our study. CSM and CCM calculations were performed without the Hydrogen atoms since these do not exist in the experimental data.
![]() | (1) |
CCM = min[S(Sn)] n = 1, 2, 4, 6… | (2) |
The main challenge in CSM calculation is finding the closest symmetric structure that serves as the reference structure in eqn (1). An exact algorithm has been recently developed for small-to medium-sized molecules,21 in which only structure-preserving permutations are scanned to find the closest symmetric structure. In the case of proteins, this algorithm is not applicable due to the enormous number of possible permutations, and an approximate calculation is applied.23 This algorithm uses the Hungarian algorithm45 to efficiently solve the assignment problem and find the correct permutation. It utilizes the sequence of the peptides to reduce the size of equivalence groups of atoms, and compels the code to preserve both the sequence and the peptide structure. In this study, the approximated algorithm was used. The atoms of the backbone and the side chains were included in all CSM and CCM calculations. Hydrogen atoms are generally absent in experimental measurements of proteins, and were therefore excluded.
High B-factors do not necessarily indicate on high distortion in terms of CSM. Pervious studies showed that the CSM errors calculated based on the crystallographic B-factors in CSM calculations are negligible.33,48 No correlation between the CSM and the B-factors was found for our set of proteins. We also did not find significant differences in the shape of the CSM spectra presented in Fig. 5 and 7 between measurements with higher or lower B-factors. The CSM trend thus appear unrelated to the B-factors. We concluded that these values, although high, do not carry direct information regarding the CSM trends. The analysis presented here, based on a global descriptor of fragments of 10-residues naturally reduces the bias due to measurement errors. The fact that similar trends were obtained by analysis of several measurements of the same protein, strengthens the validity of our findings and the reliability of the final conclusions. Further support is obtained from analyzing the MD simulation by Fallon et al.,7 which was based on a high resolution measurement (PDB-ID: 6VXX, at 2.8 Å) with much smaller B factor values in the RBD domain, and a median value of 63 Å2, which is a reasonable value at this resolution.49 We note that this protein cannot be analyzed by CSM, since perfect symmetry was enforced. The shapes of the CSM spectra from the simulation files were similar to the spectra obtained from experimental data, teaching that the spectra are authentic and valid.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp00163f |
This journal is © the Owner Societies 2023 |