Xiyue Leng^a, Katherine I. Albanese^abc, Lia R. Golub^a, Arthur A. Norman^d, Jonathan Clayden^a and Derek N. Woolfson*^abd
^a School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK. E-mail: D.N.Woolfson@bristol.ac.uk
^b Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
^c Department of Chemistry, Wake Forest University, Winston-Salem, NC, USA
^d School of Biochemistry, University of Bristol, Medical Sciences Building, University Walk, Bristol BS8 1TD, UK
First published on 27th August 2025
Computational protein design is advancing rapidly. However, approaches and methods are needed to increase success rates and to elaborate designs. Here we describe the combination of rational and computational design to deliver three-helix bundle (3HB) peptide assemblies and single-chain proteins with control over topology and thermal stability. First, we garner sequence-to-structure relationships from antiparallel 3HBs in the Protein Data Bank. This gives core-packing rules, including layers of hydrogen-bonded polar residues, which are combined with surface-charge patterning to design complementary sequences for acidic (A), basic (B), and neutral (N) helices. By altering the design of the N helix, two sets of synthetic peptides are generated for clockwise and anticlockwise arrangements of the three-helix assemblies. Solution-phase characterisation shows that both ABN peptide mixtures form stable, heterotrimeric assemblies consistent with the targeted ‘up-down-up’ topologies. Next, AlphaFold2 models for both designs are used to seed computational designs of single-chain proteins where the helices are connected by loop building. Synthetic genes for these express in E. coli to yield soluble, monomeric, and thermally stable proteins. By systematically introducing additional polar layers within the core, the thermal stability of these proteins is varied without compromising the specificity of the helix–helix interactions. Chemical and thermal denaturation reveals comparable thermodynamic parameters to those of highly stable natural proteins. Four X-ray crystal structures confirm that the design models and AlphaFold2 predictions match to sub-Å accuracy.
To reach this point, de novo protein design has undergone several phases, which we can learn from.4–7 Historically, minimal protein design employed fundamental chemical principles—such as sequence patterns of hydrophobic and polar residues—to render mimics of simple, natural protein folds. Aided by developments in bioinformatics, rational design emerged, integrating analyses of natural protein sequence and structural databases, such as the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB),8,9 into de novo design. This has led to improved sequence-to-structure relationships to inform better designs.4 As the field has matured, computational methods have increasingly been used in design pipelines, including parametric design to sketch out backbones, and physical forcefields to assess sequences that best fit these.10,11 Now, AI-based models built from large sequence and structural datasets that capture relationships en masse are being used to generate backbones and sequences either separately or simultaneously.1,3,6,12,13
However, while AI-based methods can exploit both known and unknown relationships, they often lack interpretability, making it difficult to uncover how these relationships govern design outcomes. Despite recent success in delivering complex de novo protein scaffolds and functions,14–17 the explainability and success rates of AI-driven methods remain low. Therefore, to continue advancing the field, these gaps in understanding and applicability need to be filled. This does not necessarily mean replacing ‘black-box’ methods. Rather, they need to be augmented to add reasoning and insight to enhance their predictive power, efficiency, and robustness. In short, we would like to achieve fully programmable protein design based on understanding sequence-to-structure/functional relationships in proteins. En route to this, we advocate combining rational and computational protein design for targets where possible, leading to more explainable AI for de novo protein design.
α-Helical coiled coils (CCs) have been particularly fruitful targets for de novo protein design due to many well-established principles that link primary sequences to a variety of tertiary and quaternary structures.4,18,19 As such, they are well suited to the challenge of combining rational, computational, and AI-based approaches in protein design. Generally, CC structures have two or more right-handed α helices supercoiled around one another to form usually left-handed helical bundles. These are encoded by 7-residue (heptad) sequence repeats denoted (abcdefg)n. The a and d sites are predominantly occupied by hydrophobic residues, giving the pattern (hpphppp)n. When configured into an α helix, this produces a hydrophobic seam that drives helix association, and sets a framework for specifying helix orientation (parallel or antiparallel), oligomeric states (dimer and above), partner preferences (homo- or heteromeric), and overall topology and handedness of the tertiary or quaternary structure as described below.18,20 In addition, the introduction of electrostatic or polar interactions flanking the hydrophobic core (usually at the e and g sites for dimers to tetramers, or b and c sites for pentamers and above) can be used to define CC assemblies further.21,22 Together, established combinations of amino acids at the g-a-d-e positions can be used to generate toolkits of de novo CC assemblies.7,23,24 In turn, these peptide assemblies can be repurposed to design functional peptides for catalysis,25,26 materials assembly,27–30 and in cell and synthetic biology.31,32 Most recently, some of the peptide sequences and experimental 3D structures have been used as “seeds” to deliver more-complex, single-chain proteins through computational protein design leveraging the new AI-based methods.33,34
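The heptad bookkeeping described above can be sketched in a few lines of Python. This is an illustrative helper only, not part of the published pipeline: the function names and the toy sequence are hypothetical, and the register is assumed to be known in advance (here it starts at position a).

```python
HEPTAD = "abcdefg"


def assign_register(sequence, start="a"):
    """Pair each residue with its heptad position, beginning at `start`."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def hp_pattern(n_heptads, start="a"):
    """Idealised pattern: 'h' at the a and d sites, 'p' at all other sites."""
    offset = HEPTAD.index(start)
    return "".join(
        "h" if HEPTAD[(offset + i) % 7] in "ad" else "p"
        for i in range(7 * n_heptads)
    )


def core_residues(sequence, start="a"):
    """Residues that fall at the core-defining a and d sites."""
    return [res for res, pos in assign_register(sequence, start) if pos in "ad"]


if __name__ == "__main__":
    toy = "LAALEKK" * 2  # hypothetical two-heptad sequence, not a published design
    print(hp_pattern(2))
    print(core_residues(toy))
```

The same register assignment can be reused to pull out the flanking e and g sites when checking charge patterning.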
Here, we focus on CC assemblies of three α helices, termed 3-helix bundle (3HB) CCs. These are appealing due to their relative simplicity and small size, making them ideal for studying structure, folding, and stability, and, therefore, for developing further fundamental design rules and principles35–37 and new applications.38–41 The homotrimeric peptide coil-Ser provides an early example of an ‘up-up-down’ 3HB architecture—i.e., with one helix aligned antiparallel to the other two—discovered serendipitously in an attempt to design a parallel dimeric CC.42 Notable progress has also been made towards desymmetrizing 3HBs using attractive ion pairs at the interfacial positions, effectively guiding the specific formation of heterotrimeric assembly over alternative oligomeric states or homo- or mixed-trimers.41,43–45 DeGrado and colleagues have advanced this by developing an iterative design transitioning coil-Ser-derived sequences to native-like globular proteins.35 Recent innovations include using buried hydrogen bonds and shape-complementary packing to create highly specific heterotrimers that serve as biological scaffolds.46
We build on this foundation here to describe the design of a series of 3HB CC peptide assemblies and single-chain proteins with up-down-up arrangements of three helices but with different and complementary sequences; that is, an ABN system composed of acidic (A), basic (B), and neutral (N) strands. By strategically positioning interhelical charges, we can dictate the overall conformational handedness of the 3HBs, achieving either clockwise (CW) or anticlockwise (ACW) topologies in solution. Models for each of these peptide assemblies are then used to seed the computational design of single-chain 3HB proteins, which are expressed from synthetic genes in E. coli, and confirmed by X-ray structures with the intended topologies and handedness of the tertiary structures. By introducing polar residues into the hydrophobic core, we fine-tune thermal stability without compromising folding specificity. This study demonstrates the feasibility of designing 3HB CCs with precise control of topology and thermal stability by combining rational, computational, and AI-based de novo design, and shows how design success rates can be improved through such pipelines.
Fig. 1 The rational design of new sequences to form antiparallel CC heterotrimers. (A) Amino-acid propensities for each position of the heptad repeats for 3HB CCs pulled from the CC+ database.47 Raw counts (Table S1) were normalised using the amino-acid frequencies in SWISS-PROT to provide the propensity scale shown as a heat map (high to low: red to blue). (B) Helical-wheel diagram for a clockwise antiparallel trimeric assembly, showing heptad assignments and with the helical termini closest to the viewer labelled. (C and D) Helical-wheel representations of apCCTri-BĀN (clockwise) and apCCTri-BĀN′ (anticlockwise), respectively. Sequences have canonical heptad repeats, abcdefg. The residues at key positions of the designs, g, a, d, and e, are highlighted: grey for hydrophobic, Leu; green for polar, Thr and Asn; red for acidic, Glu; and blue for basic, Lys. (E and F) Slices through the third heptad of the AF2 models for apCCTri-BĀN and apCCTri-BĀN′, respectively, with the key side chains shown as sticks. For B–D, arrows indicate the overall handedness of the quaternary structure. This is defined as follows: with the first helix at 12 o'clock and coming out towards the viewer, the assemblies have either a clockwise (B-Ā-N) or an anticlockwise (B-Ā-N′) arrangement.
The core-defining a and d positions were made Leu, as this was the most preferred residue at both sites. The core-flanking e and g positions were made combinations of Glu and Lys, as they featured highly at these sites and offered possibilities for directing helix–helix partnering through electrostatic interactions.18,48 Building on previous work43,44 to design away from homomeric assembly and to favour heteromeric association, we designed one sequence to be acidic (A) with e = g = Glu, and a second basic (B) peptide with e = g = Lys. Typically, such AB systems are designed for even oligomers as the alternating charges can be satisfied by C2, C4 or D2 symmetry.31,49–51 Therefore, to target an ABX-type heterotrimer, the third helix was made neutral (N) with e = Glu plus g = Lys to give complementary interactions to both flanking acidic and basic helices (Fig. 1C). As noted previously by DeGrado and co-workers,35 helical wheels suggest an alternative design for the assembly with the opposite (anticlockwise) cyclic order of the three helices, which can be achieved by switching the polarity to e = Lys and g = Glu, giving an alternative neutral peptide (N′) (Fig. 1D). To help specify the antiparallel arrangement further, specifically with the A helix antiparallel (denoted Ā) to the B and N helices, we incorporated a layer of polar residues at d-a-a sites of the Ā-B-N/N′ combinations in the otherwise hydrophobic cores (Fig. 1E and F).22,52,53 This was chosen by inspecting the CC+ dataset derived above for buried, three-residue constellations of polar side chains. This revealed STAT5a (1y1u), which has an ordered, hydrogen-bonded network between Thr-155 (a), Thr-236 (d) and Asn-289 (a).54 This polar layer was compatible with both the clockwise (CW, BĀN) and anticlockwise (ACW, BĀN′) target assemblies.
The designs were completed as 4-heptad sequences with b = c = Ala to provide specificity for the assembly, and combinations of polar and aromatic residues at the f sites to enhance solubility and to introduce chromophores (Table 1 and S3).
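As a rough illustration of how these rules compose, the sketch below assembles idealised 4-heptad sequences from the stated choices (a = d = Leu, b = c = Ala, and the e/g assignments for each helix). It is not the published design: the f residue (`Q`), the register starting at position a, and the function names are assumptions made for the example, and the polar core layer and the aromatic chromophores are deliberately left out. The actual sequences are those given in Table 1 and Table S3 of the source.

```python
HEPTAD = "abcdefg"

# e and g residues for each helix, as stated in the text.
FLANKS = {
    "A": {"e": "E", "g": "E"},   # acidic helix
    "B": {"e": "K", "g": "K"},   # basic helix
    "N": {"e": "E", "g": "K"},   # neutral helix, clockwise design
    "N'": {"e": "K", "g": "E"},  # neutral helix N-prime, anticlockwise design
}


def build_helix(name, n_heptads=4, f_residue="Q"):
    """Idealised sequence for one helix; f_residue is a hypothetical placeholder."""
    heptad = {"a": "L", "b": "A", "c": "A", "d": "L", "f": f_residue}
    heptad.update(FLANKS[name])
    one_repeat = "".join(heptad[pos] for pos in HEPTAD)
    return one_repeat * n_heptads


def net_charge(sequence):
    """Formal charge at neutral pH, counting only Lys/Arg and Glu/Asp."""
    positive = sequence.count("K") + sequence.count("R")
    negative = sequence.count("E") + sequence.count("D")
    return positive - negative


if __name__ == "__main__":
    for name in FLANKS:
        seq = build_helix(name)
        print(name, seq, net_charge(seq))
```

In this idealised form, A and B carry opposite formal charges and both N variants are formally neutral, which is the pattern the design relies on to disfavour homomeric assembly of A and B.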
Ahead of experimental work, all combinations of the A, B, and N and A, B, and N′ sequences were modelled using AlphaFold2-multimer (AF2, Table S4). The predicted models were consistent with the target heterotrimeric assemblies with the designed helical topologies, i.e., mixed parallel and antiparallel helices and clockwise or anticlockwise peptide assembly. The target and alternate state models (AAA, AAB, ABB, etc.) were assessed by predicted template modelling (pTM) and local distance difference test (pLDDT) scores. However, the average pLDDT scores were all above 95%, so, effectively, AF2 did not discriminate between the models (Table S4).
First, circular dichroism (CD) spectroscopy was used to probe the folding and thermal stability of the peptides at 100 μM concentration in phosphate buffered saline at pH 7.4 (PBS; Fig. 2A). In contrast to AF2 predictions, peptide A was completely unfolded under these conditions. Peptide B was partially folded at 5 °C but unfolded readily with increased temperature (midpoint of thermal denaturation, TM < 15 °C). These behaviours are consistent with the design hypothesis, as highly charged peptides are not expected to self-associate appreciably. By contrast, the neutral peptides N and N′ were highly helical and thermally stable (TM ≈ 78 and 54 °C, respectively). The difference in the TM values can be explained by the order of Glu and Lys residues in the sequence.21 However, both mixtures BĀN and BĀN′ were also highly helical and highly thermally stable (TM ≈ 73 and 67 °C, respectively, Fig. S2). For completeness, data for the pairwise combinations were collected and compared with the respective theoretical averages, showing reduced helicity and cooperative folding (Fig. S2 and Table S6). The solution-phase oligomeric states for the stable complexes, N3, N′3, BĀN and BĀN′, were determined by sedimentation-velocity (SV) and sedimentation-equilibrium (SE) experiments in analytical ultracentrifugation (AUC) at 20 °C (Fig. 2C and D, Table S6, Fig. S3 and S4). These revealed monodisperse trimeric species as designed.
Despite multiple attempts, we could not crystallise any of the species formed. Therefore, we used a fluorescence-quenching assay to probe helix orientation in the BĀN and BĀN′ systems in solution.55 First, guided by the AF2 models, we placed spatially proximal fluorophore (4-cyanophenylalanine, 4CF) and quencher (L-selenomethionine, MSE) pairs on two different peptides. For instance, the B peptide was remade with an N-terminal MSE, apCCTri-B-nMSE; and two variants of the A peptide were made with 4CF at the N-terminal g site or the C-terminal e site, apCCTri-A-n4CF and apCCTri-A-c4CF, respectively. As expected for an antiparallel arrangement of A and B helices, no quenching of 4CF fluorescence was seen when apCCTri-B-nMSE and apCCTri-A-n4CF were mixed with the unlabelled N peptide (apCCTri-N), Fig. 2E. However, quenching did occur with the other mixture with apCCTri-A-c4CF, Fig. 2F. This demonstrated that the A helix was oriented antiparallel to B in solution as designed. To probe the orientation of the N helix, we carried out the analogous experiments using the two 4CF-labelled A peptides, unlabelled B, and with the N peptide N-terminally labelled with MSE, apCCTri-N-nMSE. This gave similar results to those shown in Fig. 2E and F indicating that the N helix also aligns antiparallel to A and, therefore, parallel to B (Fig. S5).
Following this solution-phase characterisation, and consistent with our systematic naming of de novo CC peptides,18 we named the two heterotrimeric peptide assemblies apCCTri-BĀN and apCCTri-BĀN′.
Synthetic genes for the sequences were expressed in E. coli (Table S3). As the parent peptide assemblies were thermally stable, the cell lysate was heat-shocked at 65 °C for 10 min. Immobilised metal affinity column chromatography (IMAC) and size exclusion chromatography (SEC) were used to yield purified proteins (Fig. S6). CD spectroscopy of all proteins showed that they were highly α helical and hyper-thermally stable, and AUC experiments confirmed they were monomeric in solution (Fig. 3A–E).
We solved X-ray crystal structures at 2.10 Å, 2.15 Å and 2.20 Å resolution for the two BĀN- and one BĀN′-based designs, respectively (Fig. 3F–I, Tables S7 and S8). This was done by molecular replacement using the AF2 models of the peptide assemblies as the search models. All confirmed the three-helix bundles with up-down-up topologies for the B-Ā-N/N′ helices. All had consolidated hydrophobic cores with knobs-into-holes packing between the core Leu residues confirmed by Socket2 (Fig. 3H–I).58 In addition, a water-mediated, hydrogen-bonded layer of Thr (A) – Asn (B) – Thr (N/N′) was present in a clockwise design (Fig. 3I). Moreover, the conformational handedness of the two structures was different and as designed. Therefore, we named these proteins sc-apCC3-CW (for the clockwise BĀN combination, 9rgv and 9rgw) and sc-apCC3-ACW (for the anticlockwise BĀN′, 9rgx) to form part of the growing toolbox of de novo CC peptide assemblies and single-chain proteins.18,33,34
All four modified constructs expressed well in E. coli, and were monomeric, α helical, and thermally stable (Fig. S6–S9). However, the CD spectra revealed a progressive decrease in helicity from the parent design (Fig. S7 and S10), with a ≈10% reduction for the constructs with flexible linkers and a further ≈15% drop for those with the mismatched loops (Table S9 and Fig. S10). This suggests that even with optimised core designs, which are sufficient to drive the correct assembly of the ABN-peptide assemblies, loop optimisation is important for single-chain protein design. Moreover, at least for our design target, flexible linkers are better than mismatched loops.
Despite extensive trials, crystals could not be obtained for any of the flexible or mismatched constructs. Therefore, AF2 was used to predict models for all variants. Consistently, this gave high-confidence models (Table S4). Moreover, secondary structure analysis59,60 predicted high helical contents of ≈ 73–78%, which is higher than the experimentally observed values (Fig. S7, S10 and Table S9). This indicates that, while AF2 confidently predicts local secondary and overall structure extremely well, it does not capture subtleties in sequence-to-structure relationships. That said, inspection of the predicted models revealed that some lower-ranked models for sc-apCC3-CW-mismatch had the anticlockwise topology, hinting that loop mismatching may permit access to alternative folds and that AF2 may well capture this (Fig. S11).
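For orientation, a predicted helical content of this kind can be computed as the fraction of residues assigned to helix in a per-residue secondary-structure string. The sketch below assumes a DSSP-style string in which `H` marks α-helix; the tool actually used in the study, and whether other helix codes were counted, are not stated here, so the `helix_codes` parameter and the toy string are assumptions.

```python
def helical_fraction(ss, helix_codes="H"):
    """Fraction of residues whose secondary-structure code is in helix_codes."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    n_helix = sum(1 for code in ss if code in helix_codes)
    return n_helix / len(ss)


if __name__ == "__main__":
    # Hypothetical assignment for a small helix-loop-helix fragment.
    toy_ss = "-" * 2 + "H" * 12 + "-" * 4 + "H" * 12 + "-" * 2
    print(f"{100 * helical_fraction(toy_ss):.1f}% helix")
```

A value computed this way describes a single static model, whereas helicity from CD reports on the solution ensemble, so the two need not agree.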
To alter thermal stability in our designs without compromising the 3HB fold and topology, we sought to destabilise sc-apCC3-CW1 systematically by modifying its consolidated, largely Leu-based hydrophobic core. For this section, this parent design with a single polar layer of Asn@a, Thr@d, and Thr@a (N-T-T) in the B, A, and N helices (Fig. 1) is referred to as sc-apCC3-CW-1NTT. As hydrophobic core packing is a key stabilising force in water-soluble globular proteins,62 we investigated whether introducing additional polar layers could attenuate core stability without compromising the overall 3HB structure.
Additional polar layers were introduced sequentially at the second, first, and fourth heptads to give sc-apCC3-CW-2NTT, sc-apCC3-CW-3NTT, and sc-apCC3-CW-4NTT, respectively (Fig. 4A). This progression ensures that the central regions of the protein are destabilised first and remodelled with polar interactions, while maintaining fully hydrophobic and stabilising heptad repeats at the termini. As a control, we included a variant without a polar layer, namely sc-apCC3-CW-0NTT. All sequences were evaluated using AF2, which consistently predicted clockwise 3HB topologies with high confidence (Table S4). The variants with 0 and 2 N-T-T layers crystallised, and the X-ray crystal structures aligned with the parent design, with Cα RMSDs of 1.015 Å and 0.605 Å, respectively (Tables S7 and S8; pdb ids 9rgy, 9rgz). The variants with 3 and 4 N-T-T layers could not be crystallised.
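Cα RMSDs of this kind are normally reported after optimal rigid-body superposition. A minimal sketch of one common approach, the Kabsch algorithm, is given below using NumPy. It assumes two equal-length arrays of matched Cα coordinates have already been extracted; the function name is hypothetical, and the software and residue ranges used for the values quoted above are not specified here.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and compute the root-mean-square deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Because the rotation is fitted to minimise the deviation, the result depends on which atoms are included, so values are comparable only when the same residue selection is used.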
These observations were supported by the CD spectroscopy. CD spectra showed a progressive decrease in helicity with the addition of N-T-T layers (Fig. 4C and S7, Table S6). The most extreme case was sc-apCC3-CW-4NTT, which retained only ∼50% helicity relative to the 1NTT and 0NTT variants. Variable-temperature CD measurements revealed that sc-apCC3-CW with 0, 1, and 2 N-T-T layers were hyperthermally stable and did not unfold upon heating (Fig. 4D). In contrast, sc-apCC3-CW-3NTT and -4NTT had measurable TM values of 86.5 ± 0.8 and 25.3 ± 0.7 °C, respectively (Fig. 4D and S7). Whilst sc-apCC3-CW-3NTT showed good reversibility from the pre- and post-melt CD spectra, sc-apCC3-CW-4NTT remained unfolded after heating and was excluded from further analysis (Fig. S7).
We extended the AF2 predictions of the 3HBs to these 0–4 NTT layer variants (Table S4). Interestingly, there was little difference between the predicted models and confidence metrics for any of these constructs. Specifically, secondary structure calculations from the models all gave ≈74% helical residues. Thus, apart from the 0NTT construct, which coincidentally had 75% helicity by CD spectroscopy, there was no correlation between the predicted and observed helicities (Table S9).
Returning to sc-apCC3-CW-3NTT, this is an example of a de novo designed single-chain protein with an accessible, reversible, and cooperative thermal-unfolding transition. This is unusual in contemporary designed single-chain proteins, which are often hyper-stable. Therefore, we characterised the thermodynamics of its unfolding in more detail.
The CD spectra and TM values for sc-apCC3-CW-3NTT did not change with protein concentration (Fig. S12), confirming that the protein folds as a non-associating monomer as designed. Given its appreciable thermal stability, we used guanidinium hydrochloride (GdmHCl) as a chemical denaturant to access melting transitions at lower TM values for a full thermodynamic analysis (Fig. S13–S15). This gave well-defined, sigmoidal, thermal-unfolding curves, which were fitted to a two-state model to estimate TM over a range of GdmHCl concentrations (Fig. S13–S15 and Table S10).63
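To make the two-state picture concrete, the sketch below evaluates the simplest textbook form of the model, in which the unfolding free energy is ΔG(T) = ΔH_m(1 − T/T_m) and ΔCp is taken as zero. This is not necessarily the fitting function used in the study (which may, for example, include sloping baselines), and the function and parameter names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(T, Tm, dH_m):
    """Two-state fraction unfolded at temperature T (K), assuming dCp = 0.

    Tm is the midpoint temperature (K); dH_m is the unfolding enthalpy
    at Tm (kcal mol^-1).
    """
    dG = dH_m * (1.0 - T / Tm)          # unfolding free energy at T
    K = math.exp(-dG / (R_KCAL * T))    # equilibrium constant [U]/[F]
    return K / (1.0 + K)


def cd_signal(T, Tm, dH_m, folded, unfolded):
    """Observed signal as a population-weighted mix of flat baselines."""
    fu = fraction_unfolded(T, Tm, dH_m)
    return folded + (unfolded - folded) * fu
```

In practice, `cd_signal` would be passed to a nonlinear least-squares routine to estimate Tm and ΔH_m from a measured melt; at T = Tm the model gives equal folded and unfolded populations by construction.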
The resulting TM values were linearly related to the GdmHCl concentration (Fig. 4D). Encouraged by this, we sought to estimate the thermodynamics of unfolding at 0 M GdmHCl using the linear extrapolation method (LEM). Traditional van't Hoff analysis assumes ΔCp = 0. However, this neglects the substantial structural reorganisation and exposure to solvent associated with protein unfolding, both of which contribute to a positive ΔCp,unf.5,64 Therefore, we used a global nonlinear Gibbs–Helmholtz fitting procedure to model the free energy of unfolding (ΔGunf) as a function of temperature and GdmHCl concentration under the LEM (Table S10).65,66 This gave ΔGunf of 6.16 ± 0.07 kcal mol−1 at 25 °C in the absence of denaturant (Fig. S15). This is consistent with the observed stability of the protein under those conditions. Moreover, when expressed per residue, it is comparable to the most stable natural globular proteins under similar conditions (Fig. 4E and S16),67–69 most of which are not all-α-helical proteins. The modest changes in enthalpy (ΔHunf = 24.08 ± 0.53 kcal mol−1) and entropy (ΔSunf = 0.06 ± < 0.01 kcal mol−1 K−1) reflect the small size of the 3HB. The small heat capacity change (ΔCp = 0.09 ± 0.01 kcal mol−1 K−1) indicates opposing hydration effects upon burying polar (negative ΔCp) and nonpolar side chains (positive ΔCp), which partly cancel.70–72 Nonetheless, the net value, along with its dependence on the denaturant (ΔΔCp,[GdmHCl] = 0.46 ± 0.01 kcal mol−1 K−1 M−1), is consistent with a compact native state and the progressive solvation of the largely hydrophobic core upon unfolding.
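For reference, the standard textbook forms of the relations named above are written out below. They are given as a sketch: the symbols T_m, ΔH_m and m are the usual ones and are not taken from the source, and the exact parameterisation of the global fit (including how the denaturant dependence of ΔCp enters) is the one defined in the source's supplementary material, which may differ.

```latex
% Gibbs--Helmholtz equation with a temperature-independent heat capacity change
\Delta G_{\mathrm{unf}}(T) = \Delta H_m\left(1 - \frac{T}{T_m}\right)
  + \Delta C_p\left[(T - T_m) - T\ln\frac{T}{T_m}\right]

% Corresponding enthalpy and entropy at temperature T
\Delta H_{\mathrm{unf}}(T) = \Delta H_m + \Delta C_p\,(T - T_m), \qquad
\Delta S_{\mathrm{unf}}(T) = \frac{\Delta H_m}{T_m} + \Delta C_p \ln\frac{T}{T_m}

% Linear extrapolation method: linear dependence on denaturant concentration [D]
\Delta G_{\mathrm{unf}}(T,[D]) = \Delta G_{\mathrm{unf}}(T,0) - m\,[D]
```

Setting ΔCp = 0 in the first expression recovers the van't Hoff form, ΔG_unf(T) = ΔH_m(1 − T/T_m).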
Miniprotein designs of the type delivered here provide relatively straightforward model systems to test relationships between designed sequence and structure, stability and function.73–76 This is essential for the field to move towards quantitative and fully programmable protein design. As part of this quest, here we show that in addition to sequence-to-structure relationships for the well-defined secondary and tertiary elements of the targeted fold, loop design is important. Specifically, for connecting adjacent elements of secondary structure, design-specific loops appear to be better than flexible linkers, which are better than mismatched loops. In addition, we show that the hyperstability often observed with modern de novo designed scaffolds can be attenuated through robust rationale. In our case, the core packing of the parent design consists of eight layers, with three residues in each layer. All but one of these layers comprise solely leucine residues. The remaining layer is a hydrogen-bonded constellation of one asparagine and two threonine side chains. By introducing two more of these layers, such that 3/8 of the layers and 9/24 of the core residues are polar, the protein unfolds reversibly below 100 °C in aqueous buffer. Detailed analysis of experimental unfolding data reveals that the modified design has thermodynamic parameters comparable to natural proteins of similar size. This targeted approach adds to and complements the emerging quantitative understanding of stability determinants from high-throughput studies of other de novo miniproteins.76
We posit that further systematic analyses of this and similar de novo proteins will help to uncover sequence-to-structure/stability relationships and advance fully predictive and quantitative de novo protein design.75–77 In addition, as we25,78 and others40,79 have demonstrated, small de novo peptide assemblies and single-chain proteins of the type delivered here provide robust scaffolds to generate functional de novo proteins by grafting or otherwise introducing functional residues such as binding and catalytic sites. Regardless of how functionalisation is introduced—i.e., rationally, computationally, or generatively with AI—an advantage of using predesigned and well-characterised scaffolds is that the roles and positions of most residues are known and understood at the atomistic level, providing a robust foundation for designing, introducing, and controlling functional modifications.
This journal is © The Royal Society of Chemistry 2025