Open Access Article
Lianjie Xu a, Xibao Tian a and Wen-Bin Zhang *ab
aBeijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China. E-mail: wenbin@pku.edu.cn
bAI for Science (AI4S)-Preferred Program, Shenzhen Graduate School, Peking University, Shenzhen 518055, P. R. China
First published on 8th August 2025
Synthesis of nontrivial protein topologies calls for genetically encoded protein entangling motifs, especially those of heterogeneous nature, to achieve structural complexity and functional relevance. Herein, we report the systematic discovery of heterodimeric entangling motifs using criteria like Gauss linking number, buried surface area and terminal distances. These motifs were analyzed to reveal their formation mechanisms (i.e., precursor cleavage, synergistic folding and segment piercing/wrapping) and biological significance (i.e., stability enhancement crucial for executing functions like regulation and catalysis). Six premium motifs were selected for experimental validation. Upon ring closure mediated by orthogonal split inteins, all six motifs led to protein hetero[2]catenanes with varying efficiency, providing versatile templates for making mechanically interlocked protein conjugates, such as Förster resonance energy transfer pairs and bispecific binders. The study not only helps untangle the influence of chain entanglements on protein properties but also provides a modular platform to enrich the toolbox of protein topology engineering.
To date, only a limited number of heterodimeric intertwined motifs have been identified and used in protein topology engineering. They were either rationally engineered from known homodimeric precursors20 or developed by artificially splitting an entwined protein domain like lasso peptides,21 dihydrofolate reductase22 and green fluorescent protein.23 Notably, active templates were thus developed with the capacity of guiding chain entanglement upon reconstitution and catalyzing the covalent bond formation simultaneously, both of which are needed to achieve concatenation.24 The intrinsic asymmetry of these motifs can dramatically enhance the structural complexity of the resulting topological proteins, as evidenced by the successful synthesis of protein [n]catenanes (n = 3, 4, and 5) in radial configuration25 and both symmetric and asymmetric protein olympiadanes.26 It also provides a novel mechano-bioconjugation strategy for developing advanced protein therapeutics that integrate multiple functions and offer additional benefits such as aggregation resistance, prolonged circulation and enhanced antitumor efficacy.27 Nevertheless, neither approach is particularly effective in developing new heterodimeric intertwined motifs, which impedes the understanding of chain entanglements in biological systems.
Recent advancements in our research have led to a significant expansion of the symmetric entangling motif database through systematic screening and structural analysis of homomeric protein assemblies in the Protein Data Bank (PDB).16 Building on this progress, we further established deep learning frameworks capable of predicting entanglement features directly from amino acid sequences.28,29 Through this workflow, we successfully identified multiple novel entangling motifs of C2 or C3 symmetry within the vast genomic space.28,29 These achievements in searching and mining of homomeric entangling motifs provide a robust methodological foundation, which now motivates our exploration of heteromeric entangling motifs to unlock new opportunities for engineering topological complexity in protein architectures. It was found that AlphaFold sometimes mistakenly predicts topological links in heterodimeric complexes, which suggests that the topological features associated with chain entanglements may not be well captured by current protein structure prediction methods.30 Thus, from a practical perspective, we want to systematically look into the entangling heterodimeric motifs in the PDB to expand the design space of topological proteins. In this article, we report the systematic discovery as well as the feature analysis of intertwined heterodimeric motifs from the PDB (Scheme 1). The formation mechanisms of chain entanglements in these motifs, as well as their biological implications, are discussed. We also demonstrate their utility in designing intriguing topological proteins by the synthesis of protein heterocatenanes based on the selected premium motifs.
| Entry | Operation | Number of entries |
|---|---|---|
| 1 | Downloaded | 23 741 |
| 2 | 30–400 a.a. | 7013 |
| 3 | Resolution ≤3.5 Å | 6932 |
| 4 | Clustering | 1709 |
| 5 | \|GLN\| ≥ 0.4 | 155 |
| 6 | BSA ≥ 600 Å² | 143 |
| 7 | Manual curation | 20 |
The GLN calculation was adapted from the Gauss linking integral and has been applied to open curves including protein backbones, providing a convenient method to measure the extent of chain intertwining in a heterodimeric complex.15,31 A larger |GLN| value generally means a higher extent of chain intertwining. As shown in Fig. 1a, most of the heteromeric complexes have GLN around 0, and only 9.2% have |GLN| ≥0.4, which suggests that intertwined structures are rather rare in heteromeric complexes. Interestingly, highly intertwined motifs are significantly rarer in heterodimeric complexes as compared to homomeric assemblies, as represented by only 0.9% with a |GLN| ≥1 (Table S1). This implies that chain intertwining is more difficult to form in heterodimeric protein complexes than in homomeric assemblies. Unlike conventional synthetic polymers that tend to form entanglements as the chain gets longer, proteins mostly adopt well-defined folds and form entanglements through specific interactions that are chain-length-independent (Fig. 1b). Hence, special formation pathways may be required for two different protein chains to be significantly entangled. Typical examples of heterodimer proteins with varied GLN values are shown in Fig. 1c, providing an intuitive understanding of the correlation between the |GLN| value and extent of chain entanglement. The distribution of BSA of the heterodimeric proteins is shown in Fig. 1d. Notably, there is a moderate correlation between |GLN| values and BSA of heterodimeric complexes (Fig. 1e), suggesting that chain entanglements could also promote the stability of heteromeric complexes. We also calculated the terminal distance (dN–C) of each chain within the heteromeric complex (Fig. 1f). In general, motifs with smaller dN–C are easier to cyclize. However, this is not mandatory for topology synthesis.
In many cases, chain cyclization can be achieved through orthogonal ligation tools such as SpyTag–SpyCatcher reactive pairs32 and split intein pairs.33 The use of flexible linkers of sufficient length and the reconstitution of the reactive partners can bring the termini closer with high specificity and coupling efficiency. Therefore, we do not impose a strict limit on dN–C and put more emphasis on the GLN and BSA cutoffs during the selection of promising motifs.
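The GLN described above can be approximated numerically from two backbone traces. The following is a minimal sketch, assuming each chain is supplied as an ordered list of Cα coordinates and using a midpoint discretization of the Gauss double integral. The function names (`gauss_linking_number`, `circle`) are illustrative and are not taken from the pipeline used in this work.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _segments(points):
    """Return (midpoint, direction vector) for each consecutive pair of points."""
    segs = []
    for p, q in zip(points, points[1:]):
        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2)
        segs.append((mid, _sub(q, p)))
    return segs


def gauss_linking_number(chain_a, chain_b):
    """Discrete Gauss double sum over the segments of two (open or closed) curves.

    GLN ~ 1/(4*pi) * sum_i sum_j  r_ij . (dA_i x dB_j) / |r_ij|^3,
    where r_ij joins the segment midpoints.
    """
    total = 0.0
    segs_b = _segments(chain_b)
    for mid_a, d_a in _segments(chain_a):
        for mid_b, d_b in segs_b:
            r = _sub(mid_a, mid_b)
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(d_a, d_b)) / dist ** 3
    return total / (4 * math.pi)


def circle(center, u, v, n=200):
    """Closed polygon with n segments approximating a circle (toy input only)."""
    pts = []
    for k in range(n + 1):
        t = 2 * math.pi * k / n
        pts.append(tuple(center[i] + math.cos(t) * u[i] + math.sin(t) * v[i]
                         for i in range(3)))
    return pts


# Toy check: two interlocked rings (a Hopf link) and two well-separated rings.
ring_a = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))
ring_b = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))
ring_far = circle((10, 0, 0), (1, 0, 0), (0, 0, 1))
gln_linked = gauss_linking_number(ring_a, ring_b)
gln_unlinked = gauss_linking_number(ring_a, ring_far)
```

For closed, interlocked rings the magnitude is close to 1, with the sign set by the relative orientation of the two curves. For open protein chains the value is generally non-integer, which is why a cutoff on |GLN| is used rather than an exact linking number.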
A quantitative criterion (i.e., |GLN| ≥0.4 and BSA ≥600 Å2) was applied to screen candidate intertwined motifs on the heterodimeric complexes. To facilitate their usage as entangling templates, we further selected a small set of premium motifs based on empirical criteria, such as the overall size, host strain, expression yield, and predicted topologies that can be formed upon cyclization. During manual curation, we also discarded the heterodimeric proteins that were particularly unqualified for protein topology engineering, e.g., loosely associated ones, heterodimeric coiled coils, reactive split inteins and so on (Fig. S1). The overview of screening results is summarized in Table 1, and the selected premium intertwined motifs are listed in Fig. S2. To evaluate the uniqueness of our collected motifs, we compared the 20 premium entangling heterodimers with the 1873 links in LinkProt and found no entries in common, which is probably due to the fact that protein links pose an additional requirement for covalent bond formation. We also compared our motifs with those discovered via the pulling-based method by Cieplak et al. and found only 5 motifs in common (i.e., 2BYK, 2ACM, 1B0N, 3A1G, and 4CZD).19 The small number of motifs in common is probably because the latter is mostly for homomeric complexes. We attribute this uniqueness to the different detection/screening method and the distinct research focus of our work. Therefore, we have convincingly developed a reliable platform to screen and discover those useful heteromeric entangling protein motifs.
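The quantitative part of this screen can be expressed as a simple filter. The sketch below assumes that GLN and BSA have already been computed for each complex. The `Heterodimer` record and the two `HYP` entries are hypothetical and serve only to illustrate the cutoffs; the 2ACM and 4X86 values are those reported in Table 2. Manual curation is not modeled.

```python
from dataclasses import dataclass

GLN_CUTOFF = 0.4    # |GLN| >= 0.4, as stated in the text
BSA_CUTOFF = 600.0  # BSA >= 600 Å^2, as stated in the text


@dataclass
class Heterodimer:
    pdb_id: str
    gln: float  # Gauss linking number (signed)
    bsa: float  # buried surface area in Å^2


def passes_screen(entry):
    """True if the complex meets both quantitative cutoffs."""
    return abs(entry.gln) >= GLN_CUTOFF and entry.bsa >= BSA_CUTOFF


def screen(entries):
    return [e for e in entries if passes_screen(e)]


candidates = [
    Heterodimer("2ACM", -1.192, 1997.0),  # from Table 2
    Heterodimer("4X86", 0.510, 1267.0),   # from Table 2
    Heterodimer("HYP1", 0.05, 2500.0),    # hypothetical: large interface, not entangled
    Heterodimer("HYP2", 0.80, 350.0),     # hypothetical: entangled, small interface
]
hits = [e.pdb_id for e in screen(candidates)]
```

Using the absolute value of GLN keeps both left- and right-handed entanglements, since the sign only encodes the relative orientation of the two chains.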
The first mechanism, i.e., cleavage of monomeric precursors, is perhaps the most intriguing. Although there are numerous designed split proteins, most of them are not highly entangled.34 Herein, we identified several entangling heterodimers that come from the cleavage of monomeric precursors, accomplished through either auto-proteolysis or enzymatic cleavage. An example of the former is the SEA domain of mucin-1 (PDB ID: 2ACM), which undergoes spontaneous proteolysis attributed to folding-induced conformational stress at the Gly1097–Ser1098 loop (Fig. 2a).35 Similar autocatalytic cleavage was also observed in S-adenosylmethionine decarboxylases (PDB IDs: 1I7C, 1MHM, 3IWB and 5TVO),36 oxamate amidohydrolase HpxW (PDB ID: 5HFT),37 and ornithine acetyl transferase (PDB ID: 2YEP).38 There are also many examples of the latter. The Notch RR (PDB ID: 3I08), with the largest |GLN| among all the 155 entries, is formed via cleavage at the S1 site by furin-like protease during maturation (Fig. 2a).39 Human heparanase (PDB ID: 7RG8/5E8M) is activated as a heterodimer by stepwise proteolytic cleavage of a signal peptide and a linker segment from its precursor.40,41 Surprisingly, most of the cleavage sites in the above examples lie in the loops between adjacent β-strands, suggesting the importance of β-sheet structures in dictating chain entanglements. These naturally occurring intertwined split proteins would also shed light onto the topology engineering of transforming native single-domain linear proteins into their topological isoforms.22,23 It turned out that non-covalent entanglements are widespread among protein domains, which implies the vast design space of entangling motifs through chain rethreading.42,43
The second mechanism, i.e., mutual synergistic folding, involves the association and cooperative folding of two individual subunits that may otherwise be unstable or poorly folded in isolation. Therefore, preparing this class of heterodimers usually requires co-expression of the two genes. One example is the heterodimeric complex of ACTR (activator for thyroid hormone and retinoid receptors) and CBP (cAMP responsive-binding protein) (PDB ID: 1KBH), which is archived in the Mutual Folding Induced by Binding (MFIB) database.44 Both ACTR and CBP are disordered on their own but tightly associate into a stable globular fold with substantial entanglement and high affinity (Kd = 34 nM) (Fig. 2b).45 Other examples include the BAG6 (Bcl-2-associated athanogene 6)–Ubl4a (ubiquitin-like protein 4a) complex (PDB ID: 4X86)46 and a sirohaem decarboxylase AhbA/AhbB (whose subunits tend to precipitate when expressed individually, PDB ID: 4CZD).47 This class of heterodimeric entangling motifs is particularly suitable as templates for cellular synthesis of heterocatenanes since the undesired side products, i.e., cyclic monomers, are unstable and tend to form inclusion bodies, which facilitates purification.48
The third mechanism is the direct contact-induced entanglement involving chain segment piercing through or wrapping around a folded domain. This pathway is straightforward but difficult to realize for two large folded domains. As aforementioned, the folded domain has to undergo significant conformational change and even unfolding in order to get deeply entangled with the other chain in direct contact. Nevertheless, it is easier to achieve via piercing or wrapping of much smaller segments. A representative case for segment piercing is the vasohibin (VASH) 2/vasohibin binding protein (SVBP) heterodimer (PDB ID: 6JZC), where SVBP, folded as a single α-helix, threads through the N-terminal loop region of VASH2 to form a three-helix bundle (Fig. 2c).49 Similar helix threading was also observed in the complex of C. elegans HMP-1/α-catenin and HMP-2/β-catenin (PDB ID: 5XA5), where the helical HMP-2 is inserted into the N-terminal four-helix bundle of HMP-1, and a protein rotaxane could be generated upon cyclization of HMP-1.50 On the other hand, segment wrapping-induced chain entanglement is usually realized by the winding of a small protein around the surface of a large domain. For example, in the AtaR–AtaT toxin–antitoxin complex (PDB ID: 6GTQ), the intrinsically disordered C-terminal region of AtaR wraps around the surface of AtaT to block all its functional hotspots for toxin neutralization (Fig. 2c).51 A similar association pattern is also adopted by some other antitoxin systems like RelE–RelB (PDB ID: 1WMI)52 and PaaA2–ParE2 (PDB ID: 5CZF)53 and the eukaryotic initiation factor 4E (eIF4E)–eIF4G complexes (PDB IDs: 1RF8 and 6FC0).54,55
To obtain a more in-depth insight, we also analyzed the biological functions of the intertwined heterodimers in order to find out whether the entangled structures are enriched in certain functions. Out of the 1709 heterodimeric complexes, 1203 were successfully mapped to UniProtKB, matching 136 out of 198 annotation keywords of molecular function.59 The function annotations for the 100 entangling heterodimers (with |GLN| ≥0.4) are listed in Table S3. Surprisingly, similar to that observed previously in C2 assemblies, there is a strong preference for chain entanglements in DNA-binding proteins, with a total of 31 entries, accounting for 31% of all the annotated entangling heterodimers (Fig. 3a). The percentage of entangling heterodimers within each function type also exhibits such a preference. Among the top 10 molecular functions in terms of the percentage of intertwined heterodimers, 7 of them are associated with gene regulation, i.e., DNA-directed DNA polymerase, activator, DNA-binding, initiation factor, sigma factor, chromatin regulator, exonuclease and nuclease (Fig. 3b). The functional bias highlights the importance of chain entanglements within protein complexes in gene regulation and signaling. Interestingly, the preference of chain entanglements towards specific molecular functions was also observed for monomeric globular domains. It was revealed that proteins containing non-covalent lasso entanglements were enriched in lyase activity, transferase activity, catalytic activity on nucleic acid, hydrolase activity and so on, which agrees well with our observation in entangled heterodimers and may imply possible advantages of chain entanglements for enzymes.42 However, it was found that lasso entanglement-containing monomers are depleted in DNA-binding functions, while multimeric entangled protein complexes are somewhat enriched. This suggests that topological constraints may play distinct roles in different molecular functions.
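The per-function statistics discussed above (number and percentage of entangled heterodimers for each molecular-function keyword) can be tabulated as in the following minimal sketch. It assumes the annotations are available as (keyword, GLN) pairs, with a complex that carries several keywords contributing one pair per keyword. The function name and the toy records are hypothetical, and the minimum of 10 entries mirrors the threshold mentioned in the caption of Fig. 3b.

```python
from collections import Counter


def entangled_fraction_by_function(records, gln_cutoff=0.4, min_entries=10):
    """Summarize entanglement per function keyword.

    records: iterable of (function_keyword, gln) pairs.
    Returns {keyword: (n_entangled, n_total, percent_entangled)} for keywords
    with at least `min_entries` annotated complexes.
    """
    totals = Counter()
    entangled = Counter()
    for keyword, gln in records:
        totals[keyword] += 1
        if abs(gln) >= gln_cutoff:
            entangled[keyword] += 1
    return {
        k: (entangled[k], n, 100.0 * entangled[k] / n)
        for k, n in totals.items()
        if n >= min_entries
    }


# Hypothetical toy input: 10 DNA-binding complexes (3 entangled), 5 kinases.
toy_records = (
    [("DNA-binding", 0.9)] * 3
    + [("DNA-binding", 0.1)] * 7
    + [("Kinase", 0.5)] * 5
)
summary = entangled_fraction_by_function(toy_records)
```

The minimum-entry filter avoids reporting large percentages for keywords that are represented by only a handful of complexes.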
Fig. 3 Biological implications of chain entanglements in heterodimeric complexes. (a) Top 10 function types in terms of the number of intertwined heterodimeric complexes; (b) top 10 function types in terms of the percentage of intertwined heterodimeric complexes with a minimum of total 10 entries; (c) structure of the CLOCK:BMAL1 transcriptional activator complex (PDB ID: 4F3L); (d) structure of sirohaem decarboxylase AhbA/B (PDB ID: 4CZD).
The tight interacting pattern of entangled chains is likely to offer additional stabilizing effects crucially associated with the execution of various biological functions. A typical example is the mouse CLOCK:BMAL1 transcriptional activator complex (PDB ID: 4F3L), a tightly intertwined heterodimer (GLN = −0.885), that could bind E-box DNA with high affinity (Kd ∼ 10 nM) and play a crucial role in regulating the circadian clock (Fig. 3c).60 The interface-perturbing mutations revealed that the stability of the heterodimeric regulator is key to maintaining the circadian periodicity. We also noticed that 6 of the top 10 molecular functions in terms of the number of intertwined heterodimers are enzymes including transferase, hydrolase and so on, some of which are associated with nucleic acids. One class of enzymes with highly entangled structures, as aforementioned, are those generated from the cleavage of monomeric precursors. A notable case is the sirohaem decarboxylase AhbA/B from Desulfovibrio desulfuricans (PDB ID: 4CZD), where the two subunits possessing a near identical fold (with a sequence similarity of 39%) adopt a highly entangled association pattern (GLN = −0.797) much resembling that of a homodimer with C2 symmetry (Fig. 3d).47 Since both AhbA and AhbB are unstable when expressed individually, the domain swapping behavior may serve as a stabilizing strategy to form an active heterodimeric enzyme. It should be noted that the fundamental influence of chain entanglements on protein properties and functions, e.g., evolutionary advantages of entangled enzymes, remains an open question to be systematically investigated, especially with strictly controlled experimental systems. As topological differences are often mingled with constitutional differences, it is very difficult to reach a conclusion on the sole effects of entanglement or topology.
Fig. 4 Structures of the six selected premium intertwined heterodimeric motifs for experimental validation.
The protein expression and purification followed previous protocols.61 The crude products after Ni-NTA affinity purification and the samples purified by SEC (Fig. S3) were analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and liquid chromatography-mass spectrometry (LC-MS). Species with molecular weight close to the expected hetero[2]catenanes were observed in all constructs with varying amounts of single-ring side products. To prove the catenane topology, TEV protease (TEVp)-mediated cleavage experiments were conducted. Upon complete cleavage, the putative catenane bands in SDS-PAGE as well as the cyclic chain B bands in 4 of the 6 designs disappeared, leaving only cyclic chain A and linear chain B, as further confirmed by LC-MS (Fig. 5b and S4). The two exceptions are 2O97 and 3IWB. For the former, after complete cleavage, there remains a faint band with an apparent molecular weight close to the heterocatenane, which is confirmed by LC-MS to be the homo[2]catenane of HUβ. This is not surprising since HUβ is also capable of self-dimerization. For the latter, the situation is more complicated. The 3IWB heterocatenane is particularly resistant to TEVp cleavage even after long time incubation. We reasoned that it is probably caused by its tendency to oligomerize as shown in the SEC spectra, and the recognition site was hence too buried to be accessible by TEVp. We further quantified the heterocatenation efficiency (ε) by the molar percentage of heterocatenane in the purified mixture by densitometry analysis using ImageJ (Fig. S5). The results listed in Table 2 show that 2ACM and 4X86 have the highest selectivity and efficiency for heterocatenation. Therefore, all six motifs can be used to prepare hetero[2]catenanes with varying efficiency, and some of them exhibit comparable and even higher efficiency than the previously engineered heterodimeric motif derived from p53dim.20
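The efficiency ε is defined above as the molar percentage of heterocatenane in the purified mixture. The sketch below shows one way such a percentage could be derived from gel densitometry, assuming that the integrated band intensity is proportional to protein mass so that dividing by molecular weight gives a relative molar amount. This assumption, the function name and the band values are all hypothetical; the actual quantification procedure is the one described in Fig. S5.

```python
def molar_percentages(bands):
    """Convert band intensities to molar percentages.

    bands: {species_name: (integrated_intensity, molecular_weight_kDa)}
    Assumes intensity is proportional to mass, so intensity / MW is
    proportional to the number of moles in the band.
    """
    moles = {name: intensity / mw for name, (intensity, mw) in bands.items()}
    total = sum(moles.values())
    return {name: 100.0 * m / total for name, m in moles.items()}


# Hypothetical lane: one catenane band and two single-ring side products.
example = molar_percentages({
    "heterocatenane": (60.0, 40.0),
    "cyclic chain A": (20.0, 20.0),
    "cyclic chain B": (20.0, 20.0),
})
epsilon = example["heterocatenane"]
```

Normalizing by molecular weight matters because a catenane band carries roughly the combined mass of both rings, so raw intensity fractions would overestimate its molar share.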
| PDB ID | GLN | BSA (Å²) | dN1–C1 (Å) | dN2–C2 (Å) | Formation mechanism | Reported stability | Predicted −log(K) | Heterocatenation efficiency ε |
|---|---|---|---|---|---|---|---|---|
| 2ACM | −1.192 | 1997 | 28 | 26 | Cleavage of the monomeric precursor | Tm ∼ 75 °C35 | 7.46 | 91% |
| 2O97 | 0.518 | 1807 | 19 | 18 | Mutual synergistic folding | Kd ∼ 25 nM64 | 8.25 | 77% |
| 3A1G | 0.540 | 1604 | 24 | 30 | Mutual synergistic folding | Tight binding67 | 6.05 | 67% |
| 3IWB | −0.874 | 2239 | 20 | 20 | Cleavage of the monomeric precursor | — | 5.15 | 27% |
| 4CZD | −0.797 | 3341 | 20 | 9 | Mutual synergistic folding | Stable when coexpressed47 | 6.85 | 42% |
| 4X86 | 0.510 | 1267 | 30 | 22 | Mutual synergistic folding | Kd ∼ 2.2 nM46 | 8.17 | 96% |
The distinct catenation efficiency further prompts us to interrogate the influence of various structural features on topology synthesis. The relevant parameters are also listed in Table 2. Among them, the terminal distances (dN1–C1 and dN2–C2) have little effect on ε since they all have dN1–C1 and dN2–C2 within 30 Å for convenient cyclization. While motifs with larger |GLN| may afford complex topologies like Solomon link, they do not necessarily bring about higher ε, presumably due to the higher kinetic barrier for their formation (Fig. S6a). As long as |GLN| goes above a certain critical value (e.g., >0.5), the extent of chain entanglement and ε are no longer strongly correlated, as reflected by 4X86 with the smallest |GLN|, yet the highest ε. Similarly, the BSA does not seem to be essential for efficient catenation (Fig. S6b).
Three factors, including the stability, chain organization, and tendency to self-association, seem to be more influential. First of all, to compare the relative stability of the six motifs, we used the AREA-AFFINITY web server to predict the binding affinity of these motifs, which is represented as −log(K) where K is the dissociation constant.66 The positive correlation between −log(K) and ε confirms the importance of higher stability in promoting nontrivial protein topologies (Fig. S6c), which is straightforward because the more stable the template is, the more likely catenation is to occur. Second, the structural features of chain organization, such as the distribution and orientation of N- and C-termini of both subunits and the simplicity of the fold, may play a significant role because they dictate the formation of an appropriate spatial relationship for ring closure. For example, the 4CZD motif contains two multi-domain subunits threaded together. The complex structure may hinder the correct folding and association of the two subunits fused with split inteins, leading to low ε. In contrast, both 2ACM and 4X86 have quite simple folds, which helps ensure robust and synergistic association of the two subunits and thus leads to high ε. Third, the tendency to self-association could lead to unintended oligomeric species as side products, as seen in the case of 3IWB. Therefore, a good entangling motif, as exemplified by 4X86, should have a stable yet simple intertwined fold for robust assembly, possess favourable terminal orientation and distance for facile ring closure, and exhibit a low tendency to self-associate for minimal side reactions. Some of these features are difficult to describe quantitatively and accurately at this stage, which necessitates manual curation and experimental validation to assess the usefulness of the heteromeric entangling motifs.
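The relationship between predicted stability and catenation efficiency can be checked directly from the six data points in Table 2. The following sketch computes a Pearson correlation coefficient from those values. It is only an illustration on the tabulated numbers and is not necessarily the analysis behind Fig. S6c; the helper name `pearson_r` is an assumption.

```python
import math

# Values from Table 2, in the order 2ACM, 2O97, 3A1G, 3IWB, 4CZD, 4X86.
neg_log_k = [7.46, 8.25, 6.05, 5.15, 6.85, 8.17]  # predicted -log(K)
efficiency = [91, 77, 67, 27, 42, 96]             # heterocatenation efficiency (%)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(neg_log_k, efficiency)
```

With only six motifs, a coefficient of this kind indicates a trend rather than a statistically robust dependence, which is consistent with the hedged interpretation in the text.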
Compared to homodimeric entangling motifs such as p53dim used in previous studies, these heterodimeric entangling motifs offer more design space of topological proteins. As shown above, heterocatenanes with varied structural features could be readily synthesized using heterodimeric entangling motifs. Although homodimeric motifs could be potentially engineered into heterodimeric version as shown in the case of X+/X−, the overall structural symmetry remains unchanged, which greatly limits the control over the geometric features of topological proteins.20 Moreover, more complex topological proteins such as higher-order hetero[n]catenanes, which are extremely difficult to design with homomeric entangling motifs, could in theory be realized using heterodimeric ones with mutual orthogonality. Another advantage of these heterodimeric entangling motifs is their robustness (as shown in Table 2), while p53dim has only moderate binding affinity (Kd ∼ 56 μM) and much lower thermal stability (Tm ∼ 37 °C).20
Fig. 6 Design of a heterocatenane of CFP and YFP using the 4X86 motif. (a) Illustration of the construct and synthesis of cat-CFP#YFP; (b) SDS-PAGE analysis and SEC overlay of cat-CFP#YFP (E: elution from Ni-NTA resin; S: SEC-purified product); (c) TEVp-mediated cleavage of the purified product to prove its catenane topology; (d) normalized fluorescence emission spectra of cat-CFP#YFP before and after TEVp-mediated cleavage.
To explore the generality of this approach, we further inserted various therapeutic protein domains into the heterocatenation scaffold. Specifically, mechanical conjugation of two antibody mimics via an intertwined motif leads to a unique bispecific antibody mimic. As an example, we fused the 4X86 subunits with two functional domains, one affibody that binds the human epidermal growth factor receptor (AffiEGFR)68 and the other affibody that binds human epidermal growth factor receptor 2 (AffiHER2),69 to give a bispecific affibody catenane (cat-bsAffi). To illustrate the topological effect, we also constructed a linear fusion of AffiEGFR–AffiHER2 (l-bsAffi) with a long, flexible linker in between two domains. To improve the catenation efficiency and facilitate purification of the target bispecific heterocatenane, we designed a coexpression system, where one gene in the pACYCDuet-1 vector encoding NpuC-AffiHER2-4X86chB-NpuN (with His-Tag) and another gene in the pET15b vector encoding VidC-AffiEGFR-4X86chA-VidN (with TEV site) were used to co-transform the BL21(DE3) competent cell (SI sequence 3). The higher copy number of pET15b over pACYCDuet-1 ensures higher expression level of VidC-AffiEGFR-4X86chA-VidN than NpuC-AffiHER2-4X86chB-NpuN, which could presumably promote the intertwined assembly of the two components to achieve high catenation efficiency.70
Both cat-bsAffi and l-bsAffi were well expressed and readily purified via affinity chromatography and SEC (Fig. 7a and S8a). Their structures were confirmed by LC-MS with expected molecular weights (Fig. S8b), and the catenane topology of cat-bsAffi was confirmed via proteolytic cleavage by TEVp (Fig. 7b). The bispecific activity of both cat-bsAffi and l-bsAffi was then assayed. It turned out that cat-bsAffi showed activity comparable to that of l-bsAffi, implying that the additional 4X86 domain and the catenane topology did not compromise the binding capability (with Kd values of 3.84 and 6.97 nM towards EGFR and HER2, respectively) of the two affibody domains (Fig. 7c). Therefore, the successful design and synthesis of an active bispecific affibody in catenane form provides a novel platform for bispecific antibody designs, proving the feasibility of harnessing heterodimeric entangling motifs to design protein heterocatenanes with multiple functions. While the intrinsic stability of affibodies may obscure observable stability improvements arising from catenation, the topological scaffold offers additional engineering flexibility for future applications requiring controlled domain orientation or protease-responsive activation, which may otherwise be difficult to achieve in linear counterparts. The availability of versatile heterodimeric entangling motifs in different geometries should further allow fine tuning of the binding capabilities towards synergistic enhancement or mutually exclusive activity, which are topics of further investigation.
Fig. 7 Design of a bispecific affibody catenane of AffiEGFR and AffiHER2 using the 4X86 motif. (a) SDS-PAGE analysis of l-bsAffi and cat-bsAffi; (b) TEVp-mediated cleavage of cat-bsAffi to prove its catenane topology; (c) binding characterization of cat-bsAffi and l-bsAffi towards EGFR and HER2, respectively.
Here, we have systematically analyzed the chain entanglements within heterodimeric protein complexes and discovered a toolbox of heterodimeric entangling motifs that are suitable for protein topology engineering. The formation mechanisms and molecular functions of these entangling heterodimeric motifs were thoroughly analyzed, which revealed the biological implications of chain entanglements. These motifs, such as 2ACM and 4X86, are highly efficient in templating the cellular synthesis of protein heterocatenanes for function integration via mechanical interlocking. Notably, both 2ACM and 4X86 are human proteins and thus should have low immunogenicity, holding great promise in developing novel protein therapeutics. When used in combination, these motifs could further greatly expand the design space of topological proteins, leading to novel biomaterials with emergent properties such as switchable protein machines71,72 and advancing protein therapeutics.20,27,73,74 Although the study was conducted on natural heterodimeric proteins, the insights gained from this work should be broadly applicable to other topological proteins as well. For example, the entangled heterodimers generated from cleavage of monomeric precursors such as SEA domains are particularly instructive for creating topological isoforms of a single-domain protein via backbone rethreading. Therefore, our study not only establishes a platform for mining heteromeric entangling motifs for expanding the design space of protein topologies but also spurs the exploration of the evolutionary links underlying protein chain entanglements and functions. By going beyond the linear paradigm of protein backbones, the results are fundamental to understanding protein folding and offer topology as a powerful dimension in protein engineering to tailor proteins, the workhorses of life, for practical and emergent biomedical applications.
It is crucial to acknowledge that our investigation remains constrained by the limited dataset of heterodimeric protein complexes currently available in PDB. This constraint becomes particularly evident when considering the vast space of potential interaction partners within the proteome that currently lack structural characterization, some of which may exhibit novel entanglement conformations. Although the preference of chain entanglement within certain protein families may imply its fundamental roles in regulating protein functions, the underlying relationships between chain entanglement and protein function remain to be illustrated on the basis of a larger dataset with significant entanglement, hopefully aided by the more powerful computational tools like deep learning.
To conclude, our work establishes a methodological paradigm that bridges bioinformatic analysis with experimental validation and protein engineering applications, offering a novel avenue for topological protein research. The entangling motifs discovered here proved useful in synthesizing protein hetero[2]catenanes and are currently inspiring the development of higher-order topological protein architectures with integrated functions.
:
100 into 300 mL of 2xYT medium containing corresponding antibiotics. When OD600 reached 0.6 to 0.8, the culture was placed in a shaker with pre-set temperature. IPTG was then added at the concentration of 0.25 mM to induce target protein expression. The cultures were shaken for 12 hours at 16 °C before cells were harvested via centrifugation (4 °C, 5000g, and 15 min). Cell pellets were resuspended in buffer A (20 mM NaH2PO4, 500 mM NaCl, 20 mM imidazole, and 5% v/v glycerol, pH = 8.0) and lysed by ultrasonication. The supernatant was collected by centrifugation (4 °C, 12 000g, and 30 min). The clear lysate was mixed with Ni-NTA resin (GE Healthcare, Inc.), equilibrated with buffer A and agitated on a rotator at 4 °C for 1 h. The sample was then loaded into an empty column and washed with buffer A for several column volumes. The target protein sample was then eluted with buffer B (20 mM NaH2PO4, 500 mM NaCl, 250 mM imidazole, and 5% v/v glycerol, pH = 8.0).
:
1 at 30 °C until full digestion (checked by LC-MS). The completely digested products were collected and characterized by SDS-PAGE and LC-MS.
A 1:1 binding model was assumed for the binding kinetics analysis, and the BLI data were analyzed using Octet System Data Analysis software.
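For readers unfamiliar with the 1:1 binding model, the sketch below writes out the standard single-site association and dissociation expressions that such a model implies. It is a generic textbook form, not the fitting routine of the Octet software, and the rate constants used in the example are hypothetical values chosen only to illustrate the relationship Kd = koff/kon.

```python
import math


def association(t, conc, k_on, k_off, r_max):
    """Response during association for a 1:1 interaction.

    t: time (s); conc: analyte concentration (M);
    k_on: association rate constant (1/(M*s)); k_off: dissociation rate constant (1/s);
    r_max: response at full surface saturation.
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * conc / (conc + k_off / k_on)  # steady-state response
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t, r0, k_off):
    """Response decay after the analyte is removed, starting from r0."""
    return r0 * math.exp(-k_off * t)


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant Kd = k_off / k_on."""
    return k_off / k_on


# Hypothetical rate constants, for illustration only.
k_on_example = 1.0e5   # 1/(M*s)
k_off_example = 1.0e-3  # 1/s
kd_example = dissociation_constant(k_on_example, k_off_example)
# At an analyte concentration equal to Kd, the plateau is half of r_max.
plateau = association(1.0e6, kd_example, k_on_example, k_off_example, r_max=1.0)
```

In practice, kon and koff are obtained by fitting these expressions globally to sensorgrams recorded at several analyte concentrations.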
Protein sequences, supplementary figures and tables have been included as part of the SI. See DOI: https://doi.org/10.1039/d5sc03953c.
| This journal is © The Royal Society of Chemistry 2025 |