Open Access Article
Juhong Wu,a Jiehui Sun,a Tian Liang,a Yongqi Zhang,a Han Zhang,a Tianyi Zhang,d Xianmin Feng,d Ping Gao,a Peng Xu*b and Jinyu Li*ace
aCollege of Chemistry, Fuzhou University, Fuzhou 350116, Fujian, China. E-mail: j.li@fzu.edu.cn
bCollege of Biological Science and Engineering, Fuzhou University, Fuzhou 350116, Fujian, China. E-mail: pengxu@fzu.edu.cn
cCollege of Biological and Pharmaceutical Engineering, Jilin Agricultural Science and Technology University, Jilin 132101, China
dJilin Medical University, Jilin 132013, China
eFujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, Fujian, China
First published on 23rd December 2025
Accurate assessment of the effects of mutations on protein–protein interactions (PPIs) is crucial for understanding disease pathogenesis and for developing targeted therapies. Here, we present DSSA-PPI, a hybrid deep learning framework that enhances the prediction of mutation-induced binding affinity changes (ΔΔG) by leveraging structural and sequence information through a disentangled attention mechanism. Building upon the complementary strengths of the geometric equivariant graph neural network PPIFormer and the protein language model ESM-2, our framework employs a novel representation learning strategy that integrates sequence- and structure-specific contributions, thereby improving the precision of PPI ΔΔG predictions. DSSA-PPI demonstrates robust performance across diverse mutational contexts on the standard SKEMPI v2 protein binding affinity dataset and outperforms existing methods on multiple benchmarks under identical cross-validation. In a case study of the SARS-CoV-2 receptor-binding motif (RBM) interaction with angiotensin-converting enzyme 2 (ACE2), our model accurately identifies top-ranking mutations that enhance binding affinity. Additionally, it guided the optimization of a peptide inhibitor, improving its inhibitory activity against activated factor XI (FXIa) by over 40-fold. These results highlight DSSA-PPI as a versatile and reliable tool for predicting mutation-induced perturbations in PPIs.
Over the years, numerous computational methods have been developed to predict changes in protein–protein binding affinity upon mutations.11–16 Traditional physics-based approaches, such as FoldX,14 Flex ddG,15 meta-dynamics,16 and molecular mechanics/generalized born surface area (MM/GBSA),17 rely on rigorous free-energy calculations but suffer from high computational costs due to extensive conformational sampling requirements. In contrast, machine learning (ML)-based methods offer a more efficient, data-driven alternative. These approaches can be broadly categorized into sequence-based and structure-based models, depending on their input features.
Sequence-based models leverage evolutionary information and physicochemical properties derived from amino acid sequences to infer binding affinity changes. For instance, SAAMBE-SEQ employs hand-crafted sequence features, including position-specific scoring matrices (PSSMs) and physicochemical descriptors, in conjunction with a gradient-boosted tree (GBT) predictor.18 Similarly, ISLAND integrates sequence-based features with kernel-based similarity measures to enhance prediction accuracy.19 However, these methods fail to account for the structural context essential for accurately modeling binding interactions.
Structure-based models, by contrast, utilize geometric representations of protein complexes to encode topological and spatial features, often achieving superior predictive performance. These models can be broadly divided into two categories: featurization-based methods and end-to-end methods. Featurization-based methods rely on hand-crafted or algorithmically derived features to encode the structural and physicochemical properties of proteins. A prominent line of work in this category leverages algebraic topology to simplify complex 3D structures while preserving essential biological features, and has shown promise for predicting mutational effects on protein binding affinity changes. For example, TopNetTree employs element- and site-specific persistent homology to extract topological invariants from protein–protein complexes, effectively capturing their structural complexity for mutation effect prediction.20 Among the most advanced models in this category, MT-TopLap integrates persistent Laplacian features with physicochemical descriptors and transformer-based protein language model embeddings, achieving superior performance in predicting binding affinity changes.21 In contrast, end-to-end methods directly learn structural representations from raw input without manual feature engineering. GeoPPI, for example, employs a graph attention network to learn structural representations through a self-supervised side-chain reconstruction task, followed by a GBT predictor to estimate mutational effects.22 MpbPPI extends this paradigm with an equivariant graph neural network (GNN) and multi-task pre-training to further enhance structure-aware learning.23 However, both models rely on GBT predictors, which do not explicitly model the physical mechanisms underlying PPI perturbations.
Recent advances in geometric deep learning have led to frameworks such as MuToN, which employs a geometric transformer to capture local structural perturbations and receptor–ligand interactions.24 Despite their promise, these methods remain constrained by their dependence on high-quality structural inputs for mutant complexes. While tools like AlphaFold25 and RoseTTAFold26 can predict mutant structures, their computational cost becomes prohibitive when applied to large mutational libraries.27,28 Moreover, selecting the most plausible conformations from multiple predicted structures introduces additional challenges.
To address these limitations, PPIFormer adopts a pre-training strategy inspired by masked language modeling, reconstructing masked structural elements to predict binding affinity changes without requiring explicit mutant structures.29 By computing the log-odds of probability differences between wild-type and mutant residues, PPIFormer enables predictions based solely on a single input structure. Nevertheless, existing approaches—whether sequence- or structure-based—have yet to fully exploit the complementary strengths of both modalities. For example, ProAffiMuSeq iteratively combines sequence and structural descriptors but lacks a mechanism to model their interdependencies;30 while ProBASS concatenates embeddings from ESM-2 and ESM-IF1 without explicitly capturing their interplay.31
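As an illustration of this scoring scheme, comparing wild-type and mutant probabilities from a masked model via log-odds can be sketched as follows (a minimal sketch; the function name and sign convention are our assumptions, not PPIFormer's exact implementation):

```python
import math

def log_odds_score(p_wt: float, p_mut: float, eps: float = 1e-9) -> float:
    """Log-odds between the masked model's probabilities for the
    wild-type and mutant residue at the mutated position.  Under this
    (assumed) convention, a mutant the model finds less plausible than
    the wild type receives a positive, destabilizing-leaning score."""
    return math.log(p_wt + eps) - math.log(p_mut + eps)
```

Only the wild-type structure is needed: both probabilities come from a single masked-prediction pass, which is what lets this style of model avoid generating mutant structures.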
Here, we present DSSA-PPI, a hybrid deep learning framework that integrates sequence and structural information via a cross-attention mechanism to predict mutation-induced changes in protein–protein binding affinity. Building upon PPIFormer, our model retains its advantage of bypassing explicit mutant structures while enhancing predictive accuracy through the incorporation of ESM-2-derived sequence representations. DSSA-PPI demonstrates robust performance across diverse mutational contexts in the SKEMPI v2 dataset and outperforms existing methods on standard benchmarks. Stratified analyses further demonstrate its reliability across mutations of different physicochemical types. We further validate its utility through a case study on SARS-CoV-2 spike protein variants binding to angiotensin-converting enzyme 2 (ACE2), where DSSA-PPI accurately identifies affinity-enhancing mutations and captures key mutational signatures of variants of concern (VOCs). Furthermore, DSSA-PPI successfully guides the engineering of a cyclic peptide inhibitor's binding affinity to activated factor XI (FXIa), validating its practical application. Collectively, these results establish DSSA-PPI as a powerful and versatile tool for predicting mutation-induced perturbations in PPIs.
Under the LOCO split, we limited training to 3 epochs per fold due to the high computational cost of end-to-end training across multiple folds. Despite this constraint, DSSA-PPI maintained competitive performance relative to existing methods (Table S1). For mutation-level splitting, our model achieved an overall PCC of 0.77, a Spearman's correlation of 0.73, and a root-mean-square error (RMSE) of 1.52 for ΔΔG prediction (Fig. 2A). For binary classification of binding affinity changes, the model recorded an area under the receiver operating characteristic curve (AUROC) of 0.85 (Fig. 2B). Notably, the model exhibited consistent performance across folds in mutation-level cross-validation. In contrast, larger variability was observed in complex-level evaluation, suggesting that mutation-level splitting may overestimate performance due to potential information leakage among mutations within the same complex (Table S2).
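For reference, the regression metrics reported above (PCC and RMSE) can be computed with short helper functions; this is a self-contained sketch, whereas in practice one would use `scipy.stats.pearsonr` and `sklearn.metrics`:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) between predicted and
    experimental ddG values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(xs, ys):
    """Root-mean-square error between predictions and experiments."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Spearman's correlation is the Pearson correlation of the rank-transformed values, and AUROC is computed after thresholding ΔΔG into stabilizing/destabilizing labels.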
We further evaluated the predictive performance of DSSA-PPI across varying mutational distances. While the model demonstrated consistent and robust performance on mutation-level splitting, its performance declined on the more challenging complex-level splitting, which involves testing on unseen protein complex structures (Fig. 2C). Specifically, the model exhibited reduced accuracy at higher mutational distances (mutational distance > 3). Such cases often involve extensive or highly disruptive alterations that may induce complex changes in protein–protein interactions, including significant conformational rearrangements. These dynamic structural perturbations present a fundamental challenge, as the current model implicitly assumes moderate local shifts upon mutation and does not explicitly model large-scale conformational changes. The impact of protein dynamics is a well-recognized limitation in this field.33–35 Future extensions incorporating global dynamic features, such as those captured by the Gaussian Network Model (GNM)36 and the Anisotropic Network Model (ANM),37 may help improve predictive performance in these challenging scenarios. With the rapid advancement of deep learning frameworks capable of modeling protein conformational ensembles,38–40 fully integrating dynamic information into ΔΔG prediction is expected to become increasingly feasible.
Finally, we investigated the predictive ability of DSSA-PPI across different structural regions of protein complexes. Using the SKEMPI v2 annotations, the dataset was categorized into core (COR), support (SUP), rim (RIM), surface (SUR), and interior (INT) regions based on mutation locations (Fig. 2D). Notably, despite being trained exclusively on binding interface residues, DSSA-PPI maintained consistent predictive accuracy across all structural areas (Fig. 2E).
| Model | PCC ↑ | Spearman ↑ | RMSE ↓ | AUROC ↑ |
|---|---|---|---|---|
| DSSA-PPI | 0.77 | 0.73 | 1.52 | 0.85 |
| – w/o seq. enc. | 0.74 | 0.68 | 1.72 | 0.80 |
| – w/o cross-attn. | 0.72 | 0.65 | 1.80 | 0.77 |
| – w/o R2Ra | 0.75 | 0.70 | 1.65 | 0.80 |
| – w/o R2S | 0.76 | 0.71 | 1.59 | 0.83 |
| – w/o R2Spartner | 0.77 | 0.71 | 1.54 | 0.84 |
| – w/o S2R | 0.76 | 0.71 | 1.53 | 0.83 |
| – w/o Spartner2R | 0.76 | 0.70 | 1.53 | 0.85 |

a The term "– w/o R2R" denotes removal of the structure-to-structure attention term; the other ablations are named analogously.
To further assess the contributions of individual attention terms in the disentangled attention module, we performed ablations by removing one term at a time: structure-to-structure (R2R), structure-to-sequence (R2S), structure-to-partner-sequence (R2Spartner), sequence-to-structure (S2R), and partner-sequence-to-structure (Spartner2R). These ablation results reveal that all five terms are necessary for the best overall performance. Notably, removing R2R or R2S reduces the PCC from 0.77 to 0.75 and 0.76, respectively, underscoring the importance of structural information and of allowing the structural representation to attend to its own sequence context. Removing Spartner2R has the least impact, suggesting that attention from the binding partner's sequence to the structure is comparatively less critical than the other terms.
Collectively, these findings validate our architectural design and emphasize the critical role of synergistically combining structural and sequence information, as well as the necessity of each individual attention term, for robust and accurate prediction of binding affinity changes.
However, recent studies have raised significant concerns that random mutation-level splits can lead to overly optimistic performance estimates,41,42 particularly when mutations from the same protein complex appear in both training and test sets. Consistent with these observations, we also found that performance metrics under mutation-level splitting were substantially higher than those obtained under complex-level cross-validation (Fig. 2A, B and Table S2).
To address this, we adopted a more rigorous evaluation protocol based on complex-level splitting, following the protocol of MpbPPI, which ensures that mutations from the same protein complex do not appear across different folds. We adopted predefined five-fold complex-level cross-validation splits for S1131 and S4169 from MpbPPI. For M1707, which lacks predefined partitions, we created analogous complex-level splits based on PDB identifiers to ensure methodological consistency. To ensure robust performance estimation, we repeated five-fold cross-validation five times using independent random seeds for each dataset. The same data splits were applied consistently across all baseline methods to ensure fair comparisons.
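The complex-level grouping described above can be sketched as follows (a minimal illustration; the record layout and the `"pdb"` field name are hypothetical):

```python
import random
from collections import defaultdict

def complex_level_folds(mutations, n_folds=5, seed=0):
    """Assign whole complexes (grouped by PDB id) to folds so that no
    complex contributes mutations to more than one fold, avoiding
    leakage between training and test sets."""
    by_pdb = defaultdict(list)
    for idx, mut in enumerate(mutations):
        by_pdb[mut["pdb"]].append(idx)
    pdbs = sorted(by_pdb)
    random.Random(seed).shuffle(pdbs)
    folds = [[] for _ in range(n_folds)]
    for i, pdb in enumerate(pdbs):
        folds[i % n_folds].extend(by_pdb[pdb])
    return folds

# Toy example: two multi-mutation complexes and one singleton.
muts = [{"pdb": p} for p in ["1A22", "1A22", "3BT1", "3BT1", "4KRL"]]
folds = complex_level_folds(muts, n_folds=2)
```

Repeating the split with several seeds, as done here, gives the mean ± SD estimates reported in the benchmark tables; `sklearn.model_selection.GroupKFold` offers the same guarantee off the shelf.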
Across all benchmarks, DSSA-PPI generally outperformed baseline models on both regression (PCC, RMSE) and classification (AUROC) tasks (Tables 2 and S4). On the S1131 dataset, which consists of non-redundant single-point mutations localized at protein–protein interfaces, DSSA-PPI achieved a PCC of 0.71 and an AUROC of 0.77. This performance exceeded that of PPIFormer (0.68/0.72), MT-TopLap (0.62/0.72), ESM-2 with a three-layer feedforward network regressor (ESM-2-FFN) (0.65/0.74), ProBASS (0.65/0.80), and SAAMBE-SEQ (0.49/0.70). On the S4169 dataset, which includes a more diverse mutation landscape, DSSA-PPI maintained strong performance with a PCC of 0.58 and AUROC of 0.70. For the challenging M1707 dataset—characterized by up to nine mutations across multiple chains within a single complex—DSSA-PPI achieved a PCC of 0.42 and AUROC of 0.72.
| Method | S1131 PCC ↑ | S1131 AUROC ↑ | S4169 PCC ↑ | S4169 AUROC ↑ | M1707 PCC ↑ | M1707 AUROC ↑ |
|---|---|---|---|---|---|---|
| DSSA-PPI | 0.71 ± 0.01 | 0.77 ± 0.02 | 0.58 ± 0.01 | 0.70 ± 0.02 | 0.42 ± 0.02 | 0.72 ± 0.02 |
| PPIFormer | 0.68 ± 0.01d | 0.72 ± 0.02d | 0.54 ± 0.01d | 0.60 ± 0.01d | 0.31 ± 0.03d | 0.61 ± 0.02d |
| ESM-2-FFNc | 0.65 ± 0.02d | 0.72 ± 0.02d | 0.45 ± 0.02d | 0.62 ± 0.02d | 0.25 ± 0.05 | 0.63 ± 0.02d |
| MT-TopLap | 0.62 ± 0.03d | 0.74 ± 0.01d | 0.50 ± 0.03d | 0.64 ± 0.01d | — | — |
| MuToN | 0.70 ± 0.01 | 0.74 ± 0.01 | 0.57 ± 0.01 | 0.64 ± 0.01d | — | — |
| MpbPPI | 0.48 ± 0.04d | 0.75 ± 0.02d | 0.43 ± 0.03d | 0.67 ± 0.01d | 0.34 ± 0.02d | 0.64 ± 0.02d |
| ProBASS | 0.65 ± 0.01d | 0.80 ± 0.01 | 0.51 ± 0.01d | 0.68 ± 0.03 | 0.20 ± 0.04d | 0.59 ± 0.02d |
| SAAMBE-SEQ | 0.49 ± 0.01d | 0.70 ± 0.02d | 0.29 ± 0.01d | 0.59 ± 0.01d | — | — |
| FoldX | 0.44 | 0.70 | 0.30 | 0.62 | 0.30 | 0.59 |

a Data are represented as mean ± SD (n = 5). b Bold values indicate the best results; a dash indicates that results for the corresponding method are not available. c ESM-2-FFN refers to the ESM-2 model followed by a three-layer feed-forward network, which shares the same architecture as the sequence encoder used in DSSA-PPI. d Statistically significant difference compared to DSSA-PPI (p < 0.05), determined by a one-tailed unpaired t-test.
Compared with physics-based approaches, DSSA-PPI also showed superior performance across S1131, S4169, and M1707 (Table S5). Under current benchmarking protocols, structure-based methods, including physics-based tools, are typically evaluated on single static structures because generating conformational ensembles through molecular dynamics is computationally demanding. This practice does not reflect the full capabilities of physics-based methods, which generally achieve higher accuracy when conformational sampling is incorporated.17,43 In contrast, ML-based models have been shown to exhibit greater robustness to structural variability, as reported in several studies.23,24 A systematic benchmarking framework that evaluates both physics-based and machine-learning-based methods using dynamically sampled conformations would provide a more balanced and rigorous comparison. Although such an assessment is beyond the scope of the present study, it represents an important direction for future work.
Beyond numerical performance, we conducted an in-depth architectural analysis to elucidate the sources of the predictive advantage of DSSA-PPI. We benchmarked it against five representative ML-based baseline models: PPIFormer (structure-based), ESM-2-FFN (sequence-based), ProBASS (structure–sequence hybrid), MT-TopLap (topology-based hybrid), and SAAMBE-SEQ (biophysics-informed), as summarized in Table S6.
Our comparative analysis revealed several key insights. First, end-to-end deep learning models generally outperformed methods based on handcrafted features, such as SAAMBE-SEQ, across all datasets. Interestingly, PPIFormer outperformed ESM-2-FFN in regression tasks, while ESM-2-FFN generally showed superior performance in classification (Table 2). This divergence may suggest that structural and sequence representations encode distinct but complementary aspects of protein binding. ProBASS attempts to leverage this by fusing multimodal embeddings from ESM-IF1 and ESM-2 via simple feature concatenation. While this yielded modest improvements over ESM-2-FFN on the S4169 dataset (PCC of 0.51 vs. 0.45), it failed to generalize well to more complex cases such as M1707, where performance decreased compared to the ESM-2 backbone (PCC: 0.20 for ProBASS vs. 0.25 for ESM-2-FFN). A similar pattern was observed with MT-TopLap, which combines ESM-2 embeddings with handcrafted topological features. Despite this multimodal feature fusion, MT-TopLap did not outperform ESM-2-FFN, indicating that naive feature concatenation may not be sufficient to capture the complex interplay between structure and sequence representations.
By contrast, DSSA-PPI builds upon PPIFormer and ESM-2, but employs a disentangled structure–sequence attention module specifically designed to model inter-modal interactions. This architecture consistently outperforms both backbone models across benchmark datasets. To further assess the generality of the disentangled attention design, we evaluated DSSA-PPI-PF, a variant where the structure encoder is initialized with PPIFormer pre-trained weights and the pretraining is omitted. Despite lacking co-adaptive structure–sequence pretraining, DSSA-PPI-PF still outperformed PPIFormer and achieved performance comparable to the fully end-to-end DSSA-PPI model (Table S7). This finding highlights the plug-and-play robustness and effectiveness of the disentangled attention module.
In summary, our architectural investigation indicates that the performance gains of DSSA-PPI stem from the disentangled attention mechanism, which facilitates synergistic integration of structural and sequential representations.
We first categorized mutations based on the chemical characteristics of the substituted residues: charged, polar, hydrophobic, and special cases (Gly, Cys, Pro). As shown in Fig. 3A and B, DSSA-PPI demonstrated consistent performance across most categories. For regression, all groups achieved a PCC greater than 0.4, with four categories exceeding 0.8. In the classification setting, DSSA-PPI also performed reliably, with AUROC scores above 0.6 across all categories. These results suggest that DSSA-PPI captures the general physicochemical behavior of amino acid substitutions, enabling generalization across diverse mutation types. Due to the severe class imbalance in the S4169 dataset, we did not perform further breakdown analysis at the level of individual wild-type to mutant amino acid pairs (Fig. S2).
To further investigate specific failure modes, we manually inspected the top 10 mutations with the largest absolute prediction errors (|ΔΔGExp − ΔΔGPred|). These outlier mutations often involved wild-type residues participating in intricate multichain interactions or exhibiting extreme physicochemical shifts. For instance, in PDB complex 2NZ9 (Fig. 3C), the wild-type H1064A participates in hydrogen bonding with D102D and G95C, and engages in π–π stacking with F36C. Upon mutation to alanine, DSSA-PPI correctly predicted a decrease in binding affinity but substantially underestimated the magnitude of ΔΔG, likely due to the disruption of multichain cooperative interactions that are difficult to model and may not be fully captured by the coarse-grained, residue-level representations employed in DSSA-PPI. Another failure case involves a drastic substitution from tryptophan to arginine in 1MAH. The wild-type tryptophan resides in a tightly packed hydrophobic pocket at the interface (Fig. 3D). Introducing a charged arginine residue alters the local electrostatic environment and induces desolvation penalties. DSSA-PPI failed to fully capture this physicochemical disruption, underestimating the binding destabilization.
These findings suggest that while DSSA-PPI demonstrates strong overall performance, it faces challenges in specific edge cases involving complex interaction networks. In such cases, the perturbations introduced by certain substitutions may not be fully captured by the coarse-grained, residue-level representations used in DSSA-PPI. Future improvements could include incorporating explicit side-chain modeling or surface-based physicochemical representations to better characterize the local interaction environment and enhance prediction accuracy in structurally intricate contexts.
We conducted a saturation single-point mutational analysis on approximately 1700 RBM mutations, with DSSA-PPI completing predictions within seconds, demonstrating its high computational efficiency. Compared to DMS experimental data,45 DSSA-PPI consistently outperformed all baseline methods across multiple evaluation metrics (Tables 3 and S8). Notably, DSSA-PPI predictions captured the overall mutational impact patterns observed in DMS data at both the site level (Fig. 4B) and the mutation level (Fig. S3). Additionally, we employed a hit-rate metric to assess the ability of models to prioritize top-ranking mutations. Among the top 10 high-affinity mutations identified by DMS experiments, DSSA-PPI successfully identified 4 within the top 1% (18 mutations) and 6 within the top 5% (86 mutations) of its predictions. In comparison, PPIFormer, ESM-2-FFN, and MuToN identified 2, 2, and 1 mutation(s) within the top 5%, respectively. MpbPPI, ProBASS, SAAMBE-SEQ, and FoldX failed to rank any of these mutations within the thresholds.
| Method | Spearman ↑ | Top10Hit@1% ↑ | Top10Hit@5% ↑ |
|---|---|---|---|
| DSSA-PPI | 0.31 | 40% | 60% |
| PPIFormer | 0.19 | 0% | 20% |
| ESM-2-FFN | −0.04 | 0% | 20% |
| MuToN | −0.01 | 0% | 10% |
| MpbPPI | 0.08 | 0% | 0% |
| ProBASS | −0.01 | 0% | 0% |
| SAAMBE-SEQ | −0.20 | 0% | 0% |
| FoldX | −0.65 | 0% | 0% |
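The hit-rate metric used above can be sketched as follows (a minimal illustration; the function name and the higher-is-better score convention are our assumptions):

```python
def top_hit_rate(pred_scores, exp_scores, n_top=10, frac=0.05):
    """Fraction of the top-`n_top` experimentally strongest binders
    that the model ranks within its top `frac` of predictions
    (higher score taken to mean stronger predicted binding)."""
    n = len(pred_scores)
    cutoff = max(1, round(frac * n))
    top_pred = set(sorted(range(n), key=lambda i: -pred_scores[i])[:cutoff])
    top_exp = sorted(range(n), key=lambda i: -exp_scores[i])[:n_top]
    return sum(i in top_pred for i in top_exp) / n_top
```

With ~1700 RBM mutations, `frac=0.01` and `frac=0.05` correspond to the top 18 and top 86 predictions quoted in the text.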
Further, we investigated hot-spot mutations in the RBM region that are associated with early SARS-CoV-2 variants of concern (VOCs), including N501Y, E484K, T478K, K417N, L452R, and Q493R. We compared the ranking and predicted binding fitness scores from DSSA-PPI with experimental data from DMS relative to the wild-type spike protein (Table 4). In general, DSSA-PPI ranked these hot-spot mutations in a manner consistent with experimental data, except for E484K. Interestingly, while DMS reported T478K and Q493R as mildly destabilizing mutations, DSSA-PPI predicted both to be stabilizing, aligning with their known roles in enhancing ACE2 binding in the Delta and Omicron variants.46,47 These discrepancies suggest that DSSA-PPI may capture co-evolutionary and contextual information beyond the wild-type sequence, potentially enabling the identification of future adaptive mutations. Additionally, DSSA-PPI ranked N501Y as a top affinity-enhancing mutation, consistent with its well-established role in strengthening ACE2 binding. To probe the model's underlying reasoning, we visualized the attention scores associated with residue N501. The attention map revealed that N501 predominantly attends to Q498 and T500 within the RBD, as well as Y42 and K353 on ACE2 (Fig. 4C). Notably, the inter-chain attention patterns align with structural evidence: N501Y forms a π–π stacking interaction with Y42 and a cation–π interaction with K353 (Fig. 4D), both of which are known to enhance ACE2 binding. These results underscore the efficiency and predictive accuracy of DSSA-PPI, and suggest its ability to identify mutations of high biological relevance.
| Mutation | N501Y | E484K | T478K | K417N | L452R | Q493R |
|---|---|---|---|---|---|---|
| Exp. Rank (%)a | 0.24% | 4.88% | 22.03% | 52.23% | 4.11% | 29.61% |
| Pred. Rank (%) | 0.06% | 70.56% | 13.69% | 52.35% | 12.93% | 2.29% |
| Exp. Score | 1.07 | 0.09 | −0.13 | −0.89 | 0.13 | −0.28 |
| Pred. Score | 1.85 | −1.24 | 0.13 | −0.72 | 0.16 | 0.74 |

a Experimental rankings (%) represent the percentile of each mutation's binding fitness relative to all ∼1700 single-point RBM mutations in the DMS dataset. A lower percentile indicates higher binding affinity.
Fig. 5 Reengineering of mupain-1 against FXIa guided by DSSA-PPI. (A) Structural model of the FXIa–mupain-1 complex. (B) Preferred amino acid frequencies at each position among the top 50 DSSA-PPI-predicted variants, highlighting mutation hotspots at Y7 and D9. (C) Binding affinities (Ki, nM) of the peptides toward human FXIa, shown as mean ± SD (n = 3). Data for mupain-1–16 are from Xu et al.51
To adapt DSSA-PPI for peptide–protease interactions, we fine-tuned the model on the SKEMPI v2.0 dataset, which comprises experimentally validated protein–peptide interaction data. Binding affinity measurements of mupain-1 variants targeting plasma kallikrein (PK)51—a serine protease homologous to FXIa—were used as an independent validation set to select the refined model exhibiting the highest Spearman correlation. The optimized DSSA-PPI was subsequently employed to predict and prioritize mutations that could enhance mupain-1 binding to FXIa.
A systematic single-point mutation scan revealed that hydrophobic substitutions at D9 were most likely to improve FXIa affinity (Fig. S4). However, given the intrinsically weak binding of the parental peptide (Ki > 1000 µM), we reasoned that single-site substitutions might not achieve sufficient enhancement. We therefore performed an exhaustive two-site combinatorial mutation scan, in which all possible double mutants were generated and ranked according to their predicted ΔΔG values. Analysis of the top 50 predicted variants (Fig. 5B) revealed that the cyclic architecture was strictly conserved, with cysteine residues at positions 1 and 10 retained, consistent with their established critical role in maintaining peptide binding affinity.50 Hydrophobic substitutions at D9 were strongly favored, consistent with the single-point scan, while substitutions of Y7 with alanine occurred at relatively high frequency. Guided by these predictions, two candidate peptides were designed: CPAYSRALFC (FXI-1) and CPAYSRALWC (FXI-2). To further enhance FXIa binding, the arginine residue at position 6 was replaced with a non-natural analogue (L-3-(N-amidino-4-piperidyl)alanine), previously shown to non-specifically improve affinity towards multiple serine proteases.51,52 Experimental validation confirmed that both designed peptides exhibited markedly enhanced FXIa binding affinity, with Ki values of 117 ± 4 nM for FXI-1-16 and 63.5 ± 2.2 nM for FXI-2-16 (Fig. 5C and S5), compared with 2560 ± 350 nM for the parental mupain-1–16.51 These engineered peptides therefore provide promising starting points for further optimization toward potent FXIa inhibitors for anticoagulant development.
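The exhaustive two-site scan described above can be sketched as follows (an illustration only; the parental 10-mer shown is the sequence implied by the designed variants, and ranking the variants by predicted ΔΔG is omitted):

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def double_mutants(seq, fixed=(0, 9)):
    """Enumerate every two-site variant of a peptide while keeping the
    cycle-closing cysteines (positions in `fixed`) untouched."""
    positions = [i for i in range(len(seq)) if i not in fixed]
    for i, j in combinations(positions, 2):
        for a in AMINO_ACIDS:
            if a == seq[i]:
                continue
            for b in AMINO_ACIDS:
                if b == seq[j]:
                    continue
                variant = list(seq)
                variant[i], variant[j] = a, b
                yield "".join(variant)

# Illustrative parent; eight mutable positions give 28 site pairs,
# each with 19 x 19 residue combinations.
parent = "CPAYSRYLDC"
variants = list(double_mutants(parent))
```

Each enumerated variant would then be scored by the model and the top-ranked candidates inspected, as in Fig. 5B.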
Collectively, these results demonstrate the practical utility and transferability of DSSA-PPI in guiding peptide optimization and protein engineering, enabling rapid identification of high-affinity variants.
Xi is the Cα coordinate of the i-th residue. Fi0 and Fi1 are two kinds of features: type-0 scalars (3D-invariant) Fi0 and type-1 vectors (3D-equivariant) Fi1. Feature Fi0 contains the one-hot encoded representation of the interface residue type (dimension 20) and a binary value indicating whether the residue belongs to the binding partner. Feature Fi1 represents the virtual beta-carbon orientation, calculated from the backbone atom positions (Cα, N, C) using ideal bond angles and lengths.53 To capture the interactive relationships between residues, we define E as a {0, 1}N×N matrix, where Eij = 1 if the Euclidean distance between residues i and j is less than 7 Å, and 0 otherwise.
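The 7 Å contact graph can be sketched as follows (a minimal illustration using Cα coordinates only):

```python
import math

def contact_edges(ca_coords, cutoff=7.0):
    """Build the binary residue-residue adjacency matrix E: an edge
    connects two residues whose C-alpha distance is below `cutoff` (in
    angstroms); the diagonal is left at zero."""
    n = len(ca_coords)
    E = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                E[i][j] = E[j][i] = 1
    return E
```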
In our framework, following the design of PPIFormer,29 we construct our geometric structure encoder from eight stacked SE(3)-Equiformer blocks (Fig. S1A). Each block performs message passing over the protein interface graph and updates type-specific features via an attention mechanism that respects SE(3) symmetry. The input to block l is the graph G with node coordinates X and connection matrix E, along with node feature matrices Hk,(l−1) of types k ∈ {0, 1}. The layer-wise update is defined as follows:

| H0,(1), H1,(1) = f(G(X, E, F0, F1)) | (1) |
| H0,(l), H1,(l) = f(G(X, E, H0,(l−1), H1,(l−1))), l = 2, …, 8 | (2) |
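The layer-wise update of eqs (1) and (2) amounts to threading the type-0 and type-1 features through the stacked blocks; a schematic sketch (the graph fields and block signature are hypothetical):

```python
def structure_encoder(graph, blocks):
    """Eqs (1)-(2) as a loop: block 1 consumes the raw features F0/F1,
    and each later block consumes its predecessor's type-0/type-1
    outputs, while the coordinates X and edges E stay fixed."""
    h0, h1 = graph["f0"], graph["f1"]
    for block in blocks:  # eight stacked SE(3)-equivariant blocks
        h0, h1 = block(graph["x"], graph["e"], h0, h1)
    return h0, h1
```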
Let r denote the structure representation of a protein complex interface (with N residues), and let s and sp be the sequence representations of the protein and its binding partner, respectively. To model cross-modality and cross-partner interactions between the structure (r), its sequence (s), and the binding partner sequence (sp), we apply learnable linear projections to compute queries and keys as follows:

| Qx = xWQx, Kx = xWKx, x ∈ {r, s, sp} | (3) |

These projections produce 8 attention heads, each with a dimensionality of 64. The projected tensors are then reshaped into the multi-head attention format:

| Qx, Kx ∈ ℝN×8×64, x ∈ {r, s, sp} | (4) |

The cross-attention score Ai for each residue i is computed as the sum of five directional attention terms:

| Ai = AiR2R + AiR2S + AiR2Sp + AiS2R + AiSp2R | (5) |

This design enables the structural representation to integrate information not only from itself, but also from both the intra-protein sequence and the binding partner's sequence, allowing comprehensive modeling of intra- and inter-protein interactions across both structural and sequence modalities. To stabilize training, the combined attention score is normalized by √(5d), where d = 64 is the head dimension. The final hidden state for residue i, with V the value projection of the structure representation, is:

| hi = softmax(Ai/√(5d))V | (6) |
This updated hidden state is passed through a fully connected layer (size 128), followed by a softmax layer, yielding a probability distribution over the 20 amino acid types for each residue in the protein complex (Fig. 1A).
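The five-term attention of eq (5), followed by the scaled softmax, can be sketched for a single head as follows (a minimal illustration; it assumes residue-aligned representations for r, s, and sp, and takes the values from the structure representation, which is our simplification):

```python
import numpy as np

def disentangled_scores(r, s, sp, Wq, Wk, d):
    """Combined score matrix of eq (5) for one head: the sum of the
    five directional terms R2R, R2S, R2Sp, S2R and Sp2R, scaled by
    sqrt(5 * d).  Wq/Wk hold one projection per modality."""
    Qr, Kr = r @ Wq["r"], r @ Wk["r"]
    Qs, Ks = s @ Wq["s"], s @ Wk["s"]
    Qsp, Ksp = sp @ Wq["sp"], sp @ Wk["sp"]
    A = Qr @ Kr.T + Qr @ Ks.T + Qr @ Ksp.T + Qs @ Kr.T + Qsp @ Kr.T
    return A / np.sqrt(5 * d)

def attend(A, V):
    """Row-wise softmax of the combined scores applied to values."""
    w = np.exp(A - A.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```

Removing any one of the five summands in `disentangled_scores` corresponds directly to the ablations reported in the ablation table.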
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5sc08898d.
This journal is © The Royal Society of Chemistry 2026