Elise A. Naudin a, Katherine I. Albanese ab, Abigail J. Smith c, Bram Mylemans ab, Emily G. Baker ac, Orion D. Weiner d, David M. Andrews e, Natalie Tigue f, Nigel J. Savery *cg and Derek N. Woolfson *abcg
a School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK. E-mail: d.n.woolfson@bristol.ac.uk
b Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
c School of Biochemistry, University of Bristol, Medical Sciences Building, University Walk, Bristol BS8 1TD, UK. E-mail: n.j.savery@bristol.ac.uk
d Cardiovascular Research Institute, Department of Biochemistry and Biophysics, University of California, 555 Mission Bay Blvd. South, San Francisco, CA 94158, USA
e Oncology R&D, AstraZeneca, Cambridge Science Park, Darwin Building, Cambridge CB4 0WG, UK
f BioPharmaceuticals R&D, AstraZeneca, Granta Park, Cambridge CB21 6GH, UK
g BrisEngBio, School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
First published on 20th September 2022
The design of completely synthetic proteins from first principles—de novo protein design—is challenging. This is because, despite recent advances in computational protein–structure prediction and design, we do not understand fully the sequence-to-structure relationships for protein folding, assembly, and stabilization. Antiparallel 4-helix bundles are amongst the most studied scaffolds for de novo protein design. We set out to re-examine this target, and to determine clear sequence-to-structure relationships, or design rules, for the structure. Our aim was to determine a common and robust sequence background for designing multiple de novo 4-helix bundles. In turn, this could be used in chemical and synthetic biology to direct protein–protein interactions and as scaffolds for functional protein design. Our approach starts by analyzing known antiparallel 4-helix coiled-coil structures to deduce design rules. In terms of the heptad repeat, abcdefg—i.e., the sequence signature of many helical bundles—the key features that we identify are: a = Leu, d = Ile, e = Ala, g = Gln, and the use of complementary charged residues at b and c. Next, we implement these rules in the rational design of synthetic peptides to form antiparallel homo- and heterotetramers. Finally, we use the sequence of the homotetramer to derive in one step a single-chain 4-helix-bundle protein for recombinant production in E. coli. All of the assembled designs are confirmed in aqueous solution using biophysical methods, and ultimately by determining high-resolution X-ray crystal structures. Our route from peptides to proteins provides an understanding of the role of each residue in each design.
Over the past 4 decades, 4-helix bundles (4HBs) have been one of the go-to targets for de novo peptide and protein design.1,12–14 Historically, 4HB design began with minimal approaches employing patterns of hydrophobic (e.g., leucine) and polar (e.g., glutamate and lysine) residues to design single short amphipathic α helices that self-associate due to the hydrophobic effect, or to program libraries of single-chain 4-helix proteins that fold through hydrophobic collapse.15–18 Again, these design approaches have evolved by incorporating biological information and rational design, which have led more readily to high-resolution X-ray crystal structures.4,19–22 Most recently, interest has shifted to using computational methods that use backbone and fold parametrization, optimization of core packing, and specific interaction networks between core residues.23–28 Furthermore, 4HBs present a variety of assembly modes for protein designers to target, including: single peptides that associate to tetramers, helix-loop-helix constructs that can dimerize, and self-contained single-chain proteins.17,29–31 However, they also present pitfalls—or alternate states—that designers must learn to navigate away from using negative-design principles.22,32,33 For instance, for the tetramers, adjacent helices can have all-parallel, antiparallel, or mixed arrangements; and for helix-loop-helix and single-chain systems various topologies are possible.31,34,35 These different architectures, the relatively large hydrophobic cores, and the apparent robustness to modification, have been exploited to functionalize 4HBs and introduce small-molecule binding,36 catalysis,37–39 allostery,40 and the control of protein–protein interaction including regulation of gene expression.41,42
One specific type of 4HB forms α-helical coiled coils (CCs). In CCs, tight and regular packing between side chains of neighbouring helices, known as knobs-into-holes (KIH) packing, specifies the structure, including defining oligomer state, partner preferences, and helix orientation. This has proved extremely powerful in rational and computational design of CCs.43–45 In more detail, CCs are supercoiled assemblies of amphipathic α helices. Generally, CC assembly is programmed by sequence repeats of hydrophobic (h) and polar (p) residues, hpphppp, often called heptads and denoted abcdefg (Fig. 1A).44,46 Many sequence-to-structure relationships, especially at the hydrophobic a/d interface, have come from analyses of natural structures and empirical studies.19,47 In turn, these have been used to deliver a wide range of structured and increasingly functional CC designs.4,43,44 For example, our own basis set of de novo CCs currently comprises parallel assemblies from dimer to nonamer,20,48,49 and these are being used increasingly by us and others in various applications.41,50–55 That all said, designing antiparallel CC assemblies from first principles has been more challenging.33,56–58 Moreover, subtle changes in primary sequence or even experimental conditions can induce switches from energetically close parallel assemblies to antiparallel conformations.33,59–61 For example, recently, we reported the rational redesign of an antiparallel CC tetramer, apCC-Tet, following the serendipitous discovery of up–down–up–down tetramers adopted by point mutations in our original parallel hexamer, CC-Hex.33
Fig. 1 The rational design of new sequences to form antiparallel CC tetramers. (A) Helical-wheel representation of an antiparallel four-helix CC. Sequences have heptad repeats, abcdefg. The interfacial positions, a, d, e, and g, where our designs focused are highlighted in blue. Selected residues in our designed sequences are shown on the top-right helix. The N-to-C-terminal directions of the helices are indicated with the ‘N’ or ‘C’ with the darker font indicating that end is closer to the viewer. (B) Propensity table for each amino acid at each position of the heptad repeat for antiparallel 4-helix CCs found in CC+.65 Raw counts (Table S1†) were normalized using the amino-acid frequencies in SWISS-PROT to give the propensity scale shown as a heat map (high, red; low, blue). A propensity of 0 indicates that no examples of that amino acid were found at that position in the database. Residues identified for the design of the new antiparallel tetramer sequences are highlighted with dark square boxes. (C) Heptad-repeat slices through the AlphaFold2-multimer67–69 models for each designed sequence: pLLL (top left), pLLI (top right), pQLL (bottom left), and pQLI (bottom right). Images for panel C were generated in PyMOL (https://www.pymol.org).
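The normalization described for panel B can be sketched as follows. This is a minimal illustration only: it assumes that a propensity is the observed fraction of an amino acid at a heptad position divided by its background fraction, which is one common definition and may differ in detail from the calculation used for the figure. The function name, the counts, and the background frequencies are hypothetical placeholders, not values from Table S1† or SWISS-PROT.

```python
def propensities(counts, background):
    """Return {position: {amino_acid: propensity}}.

    counts:     {position: {amino_acid: raw count}}  (hypothetical input)
    background: {amino_acid: background fraction}    (hypothetical input)
    Assumed definition: (count / total at position) / background fraction.
    """
    table = {}
    for pos, aa_counts in counts.items():
        total = sum(aa_counts.values())
        table[pos] = {
            # An amino acid never seen at a position gets a propensity of 0.
            aa: (aa_counts.get(aa, 0) / total) / freq if total else 0.0
            for aa, freq in background.items()
        }
    return table


# Toy numbers chosen only to exercise the function.
toy_counts = {"a": {"L": 6, "I": 2, "A": 2}, "e": {"A": 5, "L": 5}}
toy_background = {"L": 0.10, "I": 0.05, "A": 0.08, "Q": 0.04}
toy_table = propensities(toy_counts, toy_background)
```

With this definition, values above 1 indicate enrichment relative to background, and a value of 0 corresponds to an amino acid that is absent at that position.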
Establishing clear principles for de novo design, such as sequence-to-structure relationships for a given target, would help navigate the complex energy landscape of helical assemblies. Moreover, it would deliver design rules to direct the assembly of different helical states to improve and expand toolkits such as the CC basis set and similar sets from others.62–64 In turn, these would provide platforms for protein redesign and applications where the impact of modifications required for functionalization could be anticipated.
Here, we elaborate a set of sequence-to-structure relationships for designing CC-based antiparallel 4HBs. By inspecting the structural database of CCs (CC+),65 we deduce clear design rules for this target. In turn, these are used to deliver three de novo structures: an antiparallel homotetramer, apCC-Tet*, a heterotetramer, apCC-Tet*3-A2B2, and a single-chain 4HB, sc-apCC-4. All three designs are characterized fully in solution, and to high-resolution by determining X-ray crystal structures. The designs are hyperstable with respect to thermal and chemical denaturation, and they fold, assemble, and function in E. coli. These properties make them ideal scaffolds to functionalize for future in vitro and subcellular applications.
Consequently, our analysis led to four distinct sequence combinations with the potential to form antiparallel tetramers: namely, L/Q-L-b-c-I/L-A-f in g → f repeats. For stable CC designs,20,33,48,49 we concatenated 4 copies of each repeat into each of 4 designed homomeric peptide sequences. We used the unspecified b and c sites to direct antiparallel assemblies further, specifically in homomers. Our rationale was to create a ‘bar-magnet’ charge pattern in the sequences by placing negatively charged glutamic acid (Glu, E) at the b and c sites of the first two heptad repeats, and positively charged lysine (Lys, K) at these sites in the two C-terminal repeats.33 The sequences were completed with the remaining 4 f sites filled with Gln, Lys, tryptophan (Trp, W), and Gln, respectively. The final sequences were capped with glycine (Gly, G) at both ends and N-terminally acetylated and C-terminally amidated (Table 1). Initially, we named the sequences after the residues at the g, a and d sites, i.e., pLLL, pLLI, pQLL, and pQLI.
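The construction rules in this paragraph can be written out as a short sketch. It assembles the four 4-heptad sequences purely from the description above (g → f register, Glu at b and c in the first two heptads, Lys at b and c in the last two, f sites of Gln, Lys, Trp, Gln, and Gly at both ends). The function name and its parameters are hypothetical, the N-terminal acetyl and C-terminal amide groups are not represented, and Table 1 remains the authoritative source for the sequences.

```python
def build_peptide(g, d, a="L", e="A", f_sites=("Q", "K", "W", "Q")):
    """Assemble a 4-heptad peptide in g-a-b-c-d-e-f register from the stated rules."""
    heptads = []
    for i, f in enumerate(f_sites):
        # 'Bar-magnet' charge pattern: Glu in the first two heptads, Lys in the last two.
        bc = "E" if i < 2 else "K"
        heptads.append(g + a + bc + bc + d + e + f)
    # Gly at both termini, as described in the text.
    return "G" + "".join(heptads) + "G"


# Names follow the residues at g, a and d: pLLL, pLLI, pQLL, pQLI.
designs = {f"p{g}L{d}": build_peptide(g, d) for g in "LQ" for d in "LI"}

for name, seq in designs.items():
    print(name, seq)
```

Under these assumptions each sequence has 30 residues (four heptads plus the two Gly residues), and the four designs differ only at the g and d positions.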
Ahead of experiments, we modelled the four new sequences using the AlphaFold2-multimer predictor (Fig. 1C and S1–S4†).67–69 Encouragingly, the AlphaFold2 predictions for both Q@g sequences, pQLL and pQLI, gave antiparallel tetramers as designed and with high confidence (Fig. 1C) even when an oligomeric state larger than 4 was provided as a target to AlphaFold2 (Fig. S3 and S4†). By contrast, although the L@g sequences, pLLL and pLLI, could be predicted to form antiparallel 4HBs by AlphaFold2 (Fig. 1C), this was not consistently observed when higher chain numbers were used; in these cases, higher-order α-helical assemblies were predicted (Fig. S1 and S2†).
In an attempt to access an unfolding transition for one of the Q@g designs, we measured CD spectra of the pQLI peptide in guanidinium hydrochloride, Gn·HCl. Surprisingly, neither the equilibrium spectra recorded at 5 °C nor the mean residue ellipticity at 222 nm (MRE222) signal recorded over 5–95 °C changed appreciably in the range of 0–6 M Gn·HCl (Fig. S17 and S18†). Thus, pQLI is another hyperstable de novo peptide assembly. To probe this further, we truncated both pQLL and pQLI to 3 heptad repeats, yielding pQLL3 and pQLI3, respectively (Table 1). The overall charge pattern was preserved, though only the first and the last heptads had charged residues at the b and c positions, and the central repeat had Gln at these sites. Both truncated designs retained stable α-helical structures by equilibrium and variable-temperature CD measurements (Fig. 2A and B). However, reducing the peptide concentrations to 5 μM accessed reversible thermal unfolding transitions, which were sigmoidal, indicative of cooperativity, with estimated midpoints of 91 °C and 76 °C for pQLI and pQLL, respectively (Fig. S19†). Moreover, tetrameric assemblies for both peptides were confirmed by SV and SE experiments in AUC, consistent with the target assemblies (Fig. 2C, S20 and S21†).
Next, we screened the 3- and 4-heptad variants of pQLL and pQLI for crystallization. Interestingly, only the pQLI peptides yielded crystals (Table S3†). Both peptides gave good-quality X-ray diffraction data. These allowed structures to be determined by molecular replacement using ideal α helices implemented in Fragon71 for pQLI, or using the AlphaFold267–69 model for pQLI3 to resolutions of 0.96 and 1.42 Å, respectively (Fig. 3A, B and Table S4†). The solved structures confirmed the pQLI designs as antiparallel CC tetramers with knobs-into-holes packing identified by SOCKET2 (Table S5†).72,73 Inspection of a one-heptad slice through either structure (Fig. 3C) illustrates this packing and immediately highlights the selection rules used in the design, namely: (i) a core of Leu@a that pack into holes on neighbouring helices; (ii) a wide helix–helix interface formed by the bulky Ile@d residues and flanked by Gln@g; and (iii) a narrow helix–helix interface with Ala@e allowing close helical contacts consistent with Alacoils33,66,95 and flanked by Glu@b → Lys@b′ salt bridges. In the two narrow interfaces of pQLI, 4 such salt bridges are made with Cδ → Nζ distances of 3.5 Å. Finally, the new X-ray crystal structures aligned closely with the AlphaFold2 models for both pQLI analogues (RMSDall-atom = 0.359 Å and 0.584 Å for pQLI and pQLI3, respectively, Fig. S22†).
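For readers unfamiliar with the all-atom RMSD values quoted above, the quantity itself can be sketched as below. This is an illustration under stated assumptions: the two coordinate sets are taken to be already superposed and to list the same atoms in the same order, and the superposition step and the software used for the reported values are not covered. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) tuples.

    Assumes the structures are already superposed and the atoms are paired
    one-to-one in the same order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in Å: every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(model, crystal)
```

In practice a least-squares superposition precedes this calculation, so the reported values depend on the atom selection and the fitting procedure as well as on the coordinates.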
Fig. 3 X-ray crystal structures for the antiparallel homotetrameric assemblies of pQLI analogues. (A) pQLI (apCC-Tet*, PDB ID: 8a3g) with 4 heptad repeats. (B) The shorter pQLI3 (apCC-Tet*3, PDB ID: 8a3i). The chains of both structures are coloured in chainbow from the N (blue) to the C termini (red). (C) (Left) Helical wheels for the heptad repeats of pQLI. (Right) Slice through a heptad of the X-ray crystal structures for pQLI. Each position of the heptad is depicted in a different colour following the SOCKET2 scheme.73 Amino acids that compose the design rules are depicted in ball-and-stick representation.
We propose that the new designs with their clear and interpretable sequence-to-structure relationships offer stable modules for future applications in protein design and for chemical and synthetic biology. Therefore, we rename pQLI as apCC-Tet* to add to our basis set of robust and fully characterized de novo CCs. To demonstrate its potential utility, next we developed the design in a number of different assemblies as described below. The pQLL sequences were not taken forward from this point.
Equilibrium and variable-temperature CD spectra revealed that the individual 4-heptad acidic and basic peptides were both folded and stable in PBS (Fig. S27†), and AUC-SV experiments showed that these isolated peptides formed tetramers like the parent homo-assembly despite the lack of complementary charges (Fig. S28†). An equimolar mixture of apCC-Tet*-A and apCC-Tet*-B spontaneously aggregated. Annealing the sample by heating up to 90 °C and then slowly cooling at room temperature resulted in soluble complexes, which were characterized as a folded and stable heterotetramer (Fig. S27 and S28†). However, the annealed mixture had a lower α-helical content than the respective isolated peptides. Overall, these properties are far from ideal for a de novo designed module that can be used in other contexts and applications. Therefore, we turned to the 3-heptad pair, apCC-Tet*3-A plus apCC-Tet*3-B. Although fully or partly folded (Fig. 4A), the individual acidic and basic peptides had accessible thermal unfolding transitions with midpoints of 61 °C and 42 °C, respectively (Fig. 4B). (N.B. The helicity of the basic peptide increased upon cooling back to below 20 °C.) When mixed at 20 °C, the acidic and basic peptides formed a partly helical assembly (Fig. 4B). Moreover, upon heating between ≈40–55 °C, the mixture folded to a more-helical and hyperthermally stable assembly without an observable melting transition up to 95 °C (Fig. 4B). AUC-SV experiments of annealed samples confirmed the presence of monodispersed tetramers in solution, consistent with an apCC-Tet*3-A2B2 design (Fig. S29†).
Fig. 4 Biophysical and structural characterization of the heterotetrameric complex apCC-Tet*3-A2B2. (A) CD spectra at 5 °C and (B) thermal response curves (ramping up, solid lines; and ramping down, dashed line) for apCC-Tet*3-A (red), apCC-Tet*3-B (blue), the pre-annealed mixture apCC-Tet*3-A2B2 (grey), and the annealed mixture apCC-Tet*3-A2B2 (green). Conditions: 50 μM peptide, PBS, pH 7.4. (C) X-ray crystal structure of the heteromeric assembly apCC-Tet*3-A2B2 (PDB ID: 8a3j) with the chains coloured from the N (blue) to the C termini (red). (D) Alignment of the crystal structures of apCC-Tet*3-A2B2 (apCC-Tet*3-A, red; apCC-Tet*3-B, blue) and the related homotetramer apCC-Tet*3 (grey). (E) Fluorescence-quenching assay for labelled apCC-Tet*3-A2B2 peptides. 4CF is the 4-cyano-L-phenylalanine fluorophore (yellow star) and MSE is the L-selenomethionine fluorescence quencher (grey triangle). ‘n’ and ‘c’ indicate mutations near the N and C termini, respectively. In this panel only, peptide names are shortened for clarity. Conditions: 50 μM concentration of each peptide in phosphate buffer (8.2 mM sodium phosphate dibasic, 1.8 mM potassium phosphate monobasic), pH 7.4.
We crystallized a mixture of apCC-Tet*3-A and apCC-Tet*3-B near neutral pH and obtained X-ray diffraction data out to 2.1 Å resolution (Tables S3 and S4†). The resulting crystal structure revealed an antiparallel hetero-tetramer confirming the target apCC-Tet*3-A2B2 complex (Fig. 4C). Like those for apCC-Tet* and apCC-Tet*3, the structure of apCC-Tet*3-A2B2 had a well-packed hydrophobic core with narrow and wide interfaces. Indeed, the heterotetramer overlaid well with the 3-heptad homotetramer (RMSDall-atom = 0.390 Å, Fig. 4D). Again, the design rules—a = Leu, d = Ile, e = Ala, and g = Gln—are readily identifiable from visual inspection of the structure (Fig. 4D).
Despite this experimental structure revealing an antiparallel orientation, there is one potential issue in moving from the ‘bar-magnet’ charge pattern of the homomeric system to the all-acidic plus all-basic design of the hetero-tetramer: the latter opens the possibility of accessing a parallel arrangement of helices in solution. To test this, we probed the arrangement of the assembled helices in solution using fluorescence-quenching experiments introduced by Raleigh.77 Guided by the X-ray crystal structure, we inserted the fluorescent 4-cyanophenylalanine (4CF) at the C-terminal e site of the B peptide to give apCC-Tet*3-B-c4CF (Table S2 and Fig. S30†); and we added a quencher, selenomethionine (MSE), at the N-terminal b position of the A peptide (apCC-Tet*3-A-nMSE, Table S2 and Fig. S31†). As a control, we placed the 4CF residue at the N-terminal c position of the B peptide (apCC-Tet*3-B-n4CF, Table S2 and Fig. S32†), which should be too distant from the MSE residue for quenching in an antiparallel assembly with apCC-Tet*3-A-nMSE. Indeed, this control combination fluoresced comparably to the apCC-Tet*3-B-n4CF peptide alone (Fig. 4E). Conversely, fluorescence was substantially quenched when apCC-Tet*3-B-c4CF was mixed with apCC-Tet*3-A-nMSE, indicating that the 4CF and the MSE groups were proximal, and confirming the assembly of antiparallel helices in solution (Fig. 4E).
We hypothesized that helix packing would drive folding of the single-chain protein with only minor influences from the loops, and that no extensive design of the latter should be required. Therefore, we searched for loop sequences of reasonable composition that matched distances between the termini of the helices in the apCC-Tet* structure, while avoiding extended structures that can make an unfavourable entropic contribution to folding.82,83 From the apCC-Tet* structure, we calculated end-to-end inter-helix distances of 17.4–18.5 Å and 12.5–15.0 Å for the wide and narrow faces, respectively. We treated these distances similarly to find loops in the PDB and from the literature to span both interfaces. The selected loops70,82,84 were arbitrarily incorporated into apCC-Tet*. AlphaFold2 predictions indicated that the resulting sequence (Table 1) should form the desired single-chain 4HB (Fig. S41†). We called this single-chain protein sc-apCC-4.
A synthetic gene for sc-apCC-4 was expressed in E. coli, and the protein product was purified in sodium phosphate buffer (Fig. S42 and S43†). Biophysical characterization by CD spectroscopy showed a highly α-helical structure that was fully resistant to thermal denaturation like the parent apCC-Tet* peptide (Fig. 5A and B). Moreover, sc-apCC-4 was hyperstable to chemical denaturation, i.e., up to 6 M Gn·HCl (Fig. S44 and S45†). AUC-SV and SE experiments indicated that the de novo protein was a monodispersed monomer in solution (Fig. 5C and S46†). Finally, an X-ray crystal structure for sc-apCC-4 was obtained at 2.0 Å resolution. The structure was solved by molecular replacement using apCC-Tet* as the starting model. It confirmed a monomeric four-helix CC bundle with an antiparallel (up-down-up-down) topology (Fig. 5D). The sc-apCC-4 structure is consistent with all of our designs in this series: it has a well-packed hydrophobic core, wide and narrow faces, and the sequence-to-structure relationships are clear from visual inspection (Fig. 5E). Moreover, and interestingly, it overlaid extremely well with the AlphaFold2 prediction, with an all-atom RMSD of 0.475 Å (Fig. S47†). This suggests that core packing, rather than the loops, drives the folding, demonstrating that the design rules for apCC-Tet* are robust and transposable to build larger and well-defined proteins with analogous biophysical and structural properties.
Fig. 5 Characterization of the single-chain de novo protein, sc-apCC-4. (A) CD spectra at 5 °C and (B) thermal response curves (ramping up, solid lines; and ramping down, dashed line) for sc-apCC-4 (purple) in comparison with apCC-Tet* peptide (grey). Conditions: 25 μM protein in 50 mM sodium phosphate, 150 mM NaCl, pH 7.4 for the single-chain analogue; and 50 μM peptide, PBS, pH 7.4 for apCC-Tet*. (C) Sedimentation-velocity data from AUC for sc-apCC-4. The fit returned a weight of 0.9× monomer mass. Conditions: 25 μM protein in 50 mM sodium phosphate, 150 mM NaCl, pH 7.4. (D) (Left) X-ray crystal structure of sc-apCC-4 (PDB ID: 8a3k) coloured chainbow from the N (blue) to the C terminus (red). (Right) sc-apCC-4 structure viewed from the termini with chainbow colouring and surface representations. (E) Orthogonal views of the overlay between the structures of sc-apCC-4 (purple) and apCC-Tet* (grey) with an RMSDall-atom of 0.447 Å.
We would like to note that this de novo protein design was achieved in one step from the successful apCC-Tet* design and, thus, without any computational or experimental iterations.
From the success of this rational approach, we contend that we now understand the contribution made by each amino acid in our designed sequences for 4-helix bundles. In turn, we anticipate that the newly designed peptides and protein will provide robust modules for further protein design to introduce function; and in chemical and synthetic biology as synthetic oligomerization domains. Such studies will be facilitated by the biophysical and structural characterizations that we provide here. Moreover, the different designs (homo- and hetero-tetrameric peptides, and a monomeric protein) present opportunities to target and fine-tune different functions and uses. As an example of this potential, the relatively large and well-defined hydrophobic cores of tetrameric coiled coils and 4-helix bundles have been exploited by others to introduce cavities, small-molecule-binding pockets, and catalytic functionalities.37,39,85–87 Moreover, because our designed peptides and protein assemble efficiently in cells, such as E. coli, we anticipate applications to intervene in and to augment natural sub-cellular processes.10,53,58,62,88,89
In short, we posit that our work adds fundamental understanding of the structural principles and sequence-to-structure relationships for coiled coils generally and 4-helix bundles specifically; and that our new designs provide platforms for future de novo design, and chemical and synthetic biology programs.
Of course, many others have designed de novo antiparallel 4-helix bundles and coiled coils over the past four decades.1,4 These have been achieved by modifying natural protein domains (e.g., the GCN4 leucine zipper, and the tetramerization domain of the Lac repressor),19,90,91 through rational approaches that focus on designing amphipathic helices,18,29,30 and by taking computational approaches.26,27,70 This has led to many different sequences for similar design targets. Therefore, to place our work in this broader context and to explore the sequence variations used for these targets, we examined other engineered and de novo designed sequences that (i) have been confirmed with high-resolution structures, and (ii) contain knobs-into-holes packing as detected by SOCKET2 (Table S6†).73 Interestingly, we found that most of the foregoing sequences have no clear residue fingerprints at the g, a, d and e sites that we have focused on. Indeed, there was no discernible consensus from these sequences. Those with the most regular hydrophobic cores and most similarity to our own designs are based on Harbury's GCN4-pLI sequence.19 These have Leu@a and Ile@d, but less regularity at the flanking e and g positions, which can be Leu, charged, or other residues (Table S6†).60 Clearly, these and the other sequences ‘work’ and are solutions to the 4-helix-bundle design problem. However, we suggest that the heterogeneity in sequences and the lack of pinpointable sequence-to-structure relationships may make them less attractive as robust and mutable modules for future redesign and design studies.
Finally, it is interesting to speculate on the broader implications and applications of the approach of transforming self-assembling peptides to single-chain proteins as we demonstrate here in one step, and others have done elsewhere.18,30,78 This can be likened to a possible evolutionary process in which primitive proteins might have assembled from the association and subsequent concatenation of smaller peptides,80 similar to the oligomerization of the apCC-Tet* peptide to form a robust tetramer and then the single-chain protein. The ease of looping the four helices together while maintaining the core folding provides some support for such a mechanism.92 Our future research aims to apply this approach to transform other well-understood multi-chain de novo coiled-coil peptides20,48,49 into single-chain proteins with clear sequence-to-structure features. We anticipate that the resulting synthetic proteins will be robust and stable, and, therefore, highly mutable to allow the incorporation of residues for binding, catalysis, and other functions.36,51,52,78,88,93,94
Footnote
† Electronic supplementary information (ESI) available: Methods and ESI data. See https://doi.org/10.1039/d2sc04479j
This journal is © The Royal Society of Chemistry 2022