Interface design of SARS-CoV-2 symmetrical nsp7 dimer and machine learning-guided nsp7 sequence prediction reveals physicochemical properties and hotspots for nsp7 stability, adaptation, and therapeutic design†
Abstract
The COVID-19 pandemic, driven by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), necessitates a profound understanding of the virus and its lifecycle. As an RNA virus with high mutation rates, SARS-CoV-2 exhibits genetic variability leading to the emergence of variants with potential implications. Among its key proteins, the RNA-dependent RNA polymerase (RdRp) is pivotal for viral replication. Notably, RdRp forms dimers via non-structural protein (nsp) subunits, particularly nsp7, crucial for efficient viral RNA copying. As with the main protease (Mpro) of SARS-CoV-2, nsp7 might also undergo mutational selection events that generate more stable and adaptable versions of the nsp7 dimer during virus evolution. However, efforts to obtain such cohesive and comprehensive information are lacking. To address this, we carried out a study to decipher the molecular intricacies of nsp7 dimerization using a multifaceted approach. Leveraging computational protein design (CPD), machine learning (ML), AlphaFold v2.0-based structural analysis, and several related computational approaches, we aimed to identify critical residues and mutations influencing nsp7 dimer stability and adaptation. Our methodology involved identifying potential hotspot residues within the dimeric nsp7 interface using an interface-based CPD approach. Through Rosetta-based symmetrical protein design, we designed and modulated nsp7 dimerization, considering selected interface residues. Analysis of physicochemical features revealed acceptable structural changes and several structural and residue-specific insights emphasizing the intricate nature of such protein–protein complexes. Our ML models, particularly the random forest regressor (RFR), accurately predicted binding affinities, and ML-guided sequence predictions corroborated the CPD findings, elucidating potential nsp7 mutations and their impact on binding affinity.
Validation against clinical sequencing data demonstrated the predictive accuracy of our approach. Moreover, AlphaFold v2.0 structural analyses validated optimal dimeric configurations of affinity-enhancing designs, affirming methodological precision. Affinity-enhancing designs exhibited more favourable energetics and higher binding affinity than their counterparts. The obtained physicochemical properties, molecular interactions, and sequence predictions advance our understanding of SARS-CoV-2 evolution and inform potential avenues for therapeutic intervention against COVID-19.
- This article is part of the themed collections: PCCP 2023 Emerging Investigators and Computational protein design and structure prediction: Celebrating the 2024 Nobel Prize in Chemistry