Coiled coil protein origami: from modular design principles towards biotechnological applications

The design of new protein folds represents a grand challenge for synthetic, chemical and structural biology. Due to the good understanding of the principles governing its pairing specificity, coiled coil (CC) peptide secondary structure elements can be exploited for the construction of modular protein assemblies acting as a proxy for the straightforward complementarity of DNA modules. The prerequisite for the successful translation of the modular assembly strategy pioneered by DNA nanotechnology to protein design is the availability of orthogonal building modules: a collection of peptides that assemble into CCs only with their predetermined partners. Modular CC-based protein structures can self-assemble from multiple polypeptide chains whose pairing is determined by the interaction pattern of the constituent building blocks. Orthogonal CC sets can however also be used for the design of more complex coiled coil protein origami (CCPO) structures. CCPOs are based on multiple CC modules concatenated into a single polypeptide chain that folds into a polyhedral protein cage as the peptide segments assemble into CC dimers. The CCPO strategy has hitherto led to successful de novo design of protein cages in the shape of a tetrahedron, square pyramid and triangular prism. Recent advances in the design of CC modules and design principles have enabled the construction of CCPOs that self-assemble in vivo without any apparent toxicity to human cells or animals, opening the path towards therapeutic applications. The CCPO platform therefore has potential for diverse applications in biomedicine and biotechnology, from drug delivery to molecular cages.


Introduction
Proteins are able to fold into a large variety of three-dimensional structures underlying different functions, with the number of natural folds estimated to be on the order of thousands. 1,2 Protein tertiary and quaternary structure is determined by a large number of weak, cooperative long- and short-range interactions. The folding of polypeptide chains is largely dominated by the hydrophobic effect. Most natural proteins comprise a dense packing of non-polar residues in their hydrophobic core and adopt a specific arrangement of secondary structure elements, while the precise geometry of side chains is defined by electrostatic and van der Waals interactions. 3 Sequence variations enable generation of an almost countless number of proteins. However, only a tiny fraction of possible sequences code for defined protein structures. This large sequence and fold space clearly could not have been sampled by evolution. 4 Prediction of proteins' tertiary structures based on their sequence is still challenging in the absence of homologs with known tertiary structure. 5 Nonetheless, recent advances in computational protein design have enabled the creation of novel protein structures with high accuracy, even without relying on sequence homology to any template. 6,7 Biomimetic design of nanoscale molecular scaffolds and design of functional molecular machines provide motivation for exploring the space of protein folds. Proteins, due to their intrinsic biocompatibility and structural plasticity, represent an appealing material for both biotechnological and therapeutic applications. Even though proteins can be easily manufactured, the complexity of their folding landscapes hinders the prospect of designing new functional protein assemblies. In contrast, DNA nanotechnology, based on the straightforward base-pairing complementarity of polynucleotide chains, offers a much lower chemical versatility but enables the design of complex programmable structures with high predictability and reliability.

Modular nanotechnology and modular origami
The characteristic structural flexibility possessed by nucleic acids has been successfully repurposed to construct complex high-order three-dimensional structures. [8][9][10] In nature, RNA can fold into defined compact structures such as aptamers, or combine with polypeptide chains in the ribosome, which is one of the largest and most complicated molecular machines. 11 Due to the much higher chemical variability of amino acid side chains in comparison to nucleic acids, proteins have been selected by evolution as the principal structural and functional material, while polynucleotides have been designated for the conservation and transcription of genetic information due to their straightforward base-pairing complementarity and stability. These two properties, in combination with the possibility to synthesize polynucleotides of any desired length or sequence, underpinned the invention of DNA nanotechnology. Researchers in this field constructed high-order molecular shapes mainly via the design of multiple DNA chains that assemble in a highly predictable manner and form defined two- and three-dimensional structures reaching up to the micrometer scale. [12][13][14] Moreover, DNA nanotechnology was able to introduce dynamic rearrangement in complex structures to design molecular machines such as molecular walkers or information processing molecular devices. 15,16 DNA nanotechnology typically involves either the self-assembly of multiple short chains or a single long chain structured via the addition of multiple shorter chains (DNA origami) that are slowly annealed in vitro by temperature ramping or by slow dialysis. 17,18 Although it has recently been demonstrated that the design of the folding pathway of DNA nanostructures is also able to encode rapid folding of single chain knotted structures, 19 the multichain assembly strategy avoids the problem of kinetic traps due to the formation of knots. However, single chain strategies have important advantages over multichain self-assembly due to their independence from concentration, which in turn facilitates technological or in vivo production.
The combination of the versatility of polypeptides with the robustness of the DNA nanostructure design strategy could pave the way to the construction of new complex protein folds. This has been achieved by concatenation of multiple orthogonal coiled coil (CC) building modules that mimic the pairwise complementarity of nucleic acids, for the construction of polyhedral protein cages. 20,21 As in the case of DNA origami, the designed 3D structure is defined by long-range interactions between complementary modules that direct the final self-assembly, whereas the DNA duplex modules are replaced by dimeric CC building modules. In the first demonstration, a tetrahedral cage was designed using a set of 12 orthogonal CC units that upon slow refolding assumed a regular shape, endowing the tetrahedral cage fold with a peculiar internal cavity. 20 A recent publication extended the approach to polyhedra formed from 16 and 18 units, folding into a square pyramid and a triangular prism, respectively. Additionally, the successful expression in mammalian cells and in mice showed that coiled coil protein origami (CCPO) structures are stable and do not elicit adverse reactions in vivo. 21
The purpose of this review is to illustrate the field of modular protein design relying on the orthogonally interacting CC as the basic structural unit. First, the special properties and designability of the CC motif are discussed, followed by a review of reported designed orthogonal CC sets. Then we describe the successful designs of protein nanostructures using orthogonal CC sets with an emphasis on CCPO structures and potential applications.

Žiga Strmšek
Žiga Strmšek obtained his MSc in Industrial Pharmacy and BSc in Biotechnology from the University of Ljubljana, Slovenia. Since July 2015 he has been working as a PhD student at the Department of Synthetic Biology and Immunology at the National Institute of Chemistry, Slovenia. The focus of his research is presentation of protein domains on CCPO structures in order to investigate potential biotechnological applications.

Roman Jerala
Roman Jerala is head of the Department of Synthetic Biology and Immunology at the National Institute of Chemistry in Ljubljana, Slovenia, and professor at the University of Ljubljana. Within synthetic biology he is investigating designed modular bionanostructures, particularly coiled-coil based protein origami, mammalian cell synthetic biology and medical applications of synthetic biology, and within immunology he focuses on the molecular mechanism of signaling in innate immunity and on cancer immunotherapy.

Modular coiled coil units

The coiled coil motif
Coiled coils represent a highly suitable building block for modular protein structures due to the relatively well-understood rules governing their folding and specificity. Coiled coils are one of the most widespread protein structure elements in nature, estimated to be present in as much as 10% of the eukaryotic proteome, 22 where they perform both structural and functional roles, acting as protein-protein interaction domains and DNA-binding domains. 23 Coiled coils are described by the interaction between two or more alpha helices that in a canonical form assume a twisted left-handed supercoiled structure with a seven residue periodicity (7/2) and a pitch angle of 20° (Fig. 1). 24 Those structural parameters, initially proposed by Francis Crick, impose a regular, tight side chain packing interface termed knobs into holes, which is permitted only by a distortion of the number of residues per turn from 3.6 in normal helices to 3.5 in CCs. [25][26][27] The seven-amino-acid periodicity that confers structural regularity to the CC motif is typically called the heptad repeat, where each residue is commonly represented as a letter in the string abcdefg (Fig. 1). The helices that compose CCs are usually highly amphipathic and exhibit a strong affinity conferred by both hydrophobic and electrostatic interactions. In canonical dimeric CCs, hydrophobic residues occupy positions a and d, and polar residues occupy positions e and g. The former are important in establishing the tight knobs into holes packing while the latter determine the formation of salt bridges between the two helices (Fig. 1). 28,29 Such regularity, in addition to specificity of binding, made CCs a malleable and versatile tool in the hands of protein engineers and they have been used in multiple ways in the last few decades.

Fig. 1 (caption fragment) (e) Heptad wheel representation of an antiparallel coiled coil with a list of the most frequently observed amino acids in a, d, e and g positions. 29
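The heptad notation described above can be illustrated with a short sketch. It assigns the letters abcdefg to a peptide, assuming an unbroken repeat, and collects the residues at the core (a, d) and flanking (e, g) positions. The peptide `toy` and the helper names are hypothetical and chosen only for illustration; they are not taken from any published CC set.

```python
HEPTAD = "abcdefg"

def heptad_register(sequence, start="a"):
    """Pair each residue with its heptad position, assuming an unbroken repeat."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]

def residues_at(sequence, positions, start="a"):
    """Collect the residues that fall on the given heptad positions."""
    return "".join(
        res for res, pos in heptad_register(sequence, start) if pos in positions
    )

# Hypothetical two-heptad peptide whose first residue sits at position g.
toy = "EIAALKQEIAALKQ"
core = residues_at(toy, "ad", start="g")   # hydrophobic core positions
flank = residues_at(toy, "eg", start="g")  # positions able to form salt bridges
print(core, flank)
```

In this toy peptide the a and d positions carry isoleucine and leucine, while the e and g positions carry lysine and glutamate, which mirrors the canonical pattern outlined in the text.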
After the crystal structure of GCN4, 30 a parallel homodimeric CC transcription factor in yeast, was solved at high resolution, several peptides were designed and characterized starting from GCN4, giving a more accurate description of the roles of core residues and of the relation between sequence and oligomerization propensity in CCs. 31 For instance, one of the first examples of CC engineering, based on the GCN4 sequence, was the pairing system proposed for the design of the Peptide Velcro heterodimer. 32 This synthetic heterodimer formed a stable complex consisting of two helices designed to have e and g positions occupied respectively by either lysine or glutamate residues, highlighting the importance of electrostatic interactions in these positions. Similarly, the peptides called EE and KK, also designed by exploiting electrostatic interactions between residues at e and g positions, showed a high degree of specificity 33 and are still widely used as a model for CC interaction and for applications that require the heterodimerization complementarity. 34,35 Charges can also be utilized to regulate the orientation of the two helices by matching complementary charges along the peptides. When designing the synthetic CC APH, Gurnon et al. enforced an antiparallel orientation to the homodimer by placing the appropriate amino acids in e and g positions in order to ensure an interaction between the two opposite termini of the peptides. 36 Therefore, at least in principle, it is possible to look at CCs as a simplified study case in which protein-protein interaction surfaces can be engineered by the rule of thumb, resembling the simplicity of pairwise interactions that characterizes DNA. Although the correct formation of salt bridges provides a large thermodynamic contribution towards stable complexes, also van der Waals interactions and steric repulsion are involved in defining the CC specificity. 
As elegantly shown by systematically replacing residues in a and d positions with either leucine, isoleucine or valine residues in GCN4, Harbury et al. observed the formation of different oligomerization states and provided the insight regarding the influence of these buried residues on the assembly of CCs. 31 Notably, polar amino acids can be found in buried positions along the hydrophobic patches of CCs. 37,38 Taking as example the Peptide Velcro again, it has been shown how changing the position of a couple of buried asparagine residues could determine a change in the orientation from parallel to antiparallel. 39 The ability to control the oligomerization state of CCs has been displayed by the successful design of a series of CC assemblies, which span from dimers to tetramers, via modification of the core residues in a and d positions. 40 However, moving beyond dimers, helical bundles having higher oligomerization state can be built by extending the hydrophobic surface, engaging also e and g positions. A series of bundles, from natural pentamers all the way up to de novo designed hexameric 41 and heptameric assemblies, 42 were investigated, 43 and indexed in a large set of CC structures. 44 Although in classic dimeric CCs, b, c and f positions are not involved in protein-protein interactions, they play a role in determining the stability of CC dimers. Increasing the local helical propensity by formation of salt bridges via pairwise (i, i + 3) and (i, i + 4) interactions allows modulating the stability of the dimer without modifying the specificity of the interaction. 45

Orthogonal coiled coil sets
The ability to construct complex modular protein assemblies depends on the availability of required building blocks. Whereas nature offers large sets of specific CCs, 46 the design of toolsets of CC elements that bind their target with high specificity remains a challenge. Sets of CC elements that bind solely to their designated partner peptide and do not cross-interact, also called orthogonal sets, designed so far possess only a limited size. This is mostly due to the small free energy differences between the desired and off-target associations. To facilitate the design of CC nanostructures several orthogonal CC sets have been developed in the last decade, differing in size, length, and orientation of constituent peptides.
One of the first examples was the set of α-helical tectons designed by Bromley et al. 47 The set was composed of 6 three-heptad-long peptides that specifically formed 3 parallel CC heterodimers. Gradišar et al. 48 reported the design of a set of 4 parallel CC heterodimers based on combinations of patterns of charge interactions and a pattern of asparagine residues at heptad position a, evaluated by the energy scoring function introduced by Hagemann. 49 The peptides comprised 4 heptads and contained an N-terminal capping sequence intended to stabilize the α-helical sequence. Orthogonal sets have also been constructed from subgroups of synthetic peptides called SYNZIP. 50 It was discovered that these peptides, initially designed to specifically bind the leucine zipper region of bZIP transcription factors, 51 also exhibit strong heteroassociation within the set. In vitro biophysical characterization of 14 SYNZIP peptides revealed that they were capable of assembling into 22 different heterodimeric CCs with groups of up to 4 CC pairs in each orthogonal set. Here, the peptides were of varying length, spanning 5-7 heptad repeats, which was reflected in the high stability of CC dimers, with the majority of the measured K_D values in the nM range. Recently, Crooks et al. developed the largest CC set so far. 52 Using the bCIPA algorithm 53 a set of 8 parallel heterodimers was constructed comprising 4 heptad repeats. However, T_m measurements revealed that only 7 pairs behaved as designed, with T_m > 70 °C and at least a 10 °C gap before the most stable off-target interaction.
Negron et al. 54 reported the design of the only orthogonal set of antiparallel CC dimers found in the literature. Two sets of three homodimeric antiparallel CCs consisting of 6 heptads were designed. Three of the designed peptides preferentially formed antiparallel homodimers and were furthermore orthogonal to a previously designed antiparallel CC dimer, 36 while higher order structures were observed for two of the designs. The availability of antiparallel orthogonal CCs is of high significance in the design of CC based protein origami, since it was shown that certain CCPO structures can be achieved only by inclusion of both parallel and antiparallel CC pairs.
The design of the above described orthogonal sets was achieved by a mixture of rational and computational approaches. Although different computational algorithms were used, they share similar core features. Firstly, the explored sequence space is restrained in accordance with previous rules discovered to govern CC oligomerization and pairing specificity. In the previously mentioned examples concerning the design of parallel CC sets, 47,48,52 only lysine and glutamic acid were allowed at e:g positions, while the a heptad positions could be occupied either by asparagine or isoleucine and at d positions only leucine was allowed. Only a, d, e and g heptad positions were subjected to design, while b, c, and f positions were occupied by helicity promoting amino acids. Secondly, the interaction energy between peptide pairs is evaluated using sequence-based scoring functions as a weighted sum of terms for hydrophobic and electrostatic interactions. The scoring functions differ in weights used and inclusion of certain terms (e.g. helicity 53 ).
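The two shared features described above, a restricted alphabet at the a, e and g positions and a weighted-sum score, can be sketched as follows. This is a toy illustration only: the weights, the `gap` threshold, the peptide definitions and all function names are assumptions made for this example and do not reproduce bCIPA or any other published scoring function. The sketch relies on the standard geometry of parallel dimers, in which position g of one heptad faces position e of the following heptad on the partner helix.

```python
from itertools import combinations

# Hypothetical weights (arbitrary units, lower = more favourable); not fitted to data.
W_CORE_MATCH, W_CORE_MISMATCH = -1.0, 2.0
W_SALT_BRIDGE, W_REPULSION = -1.0, 1.0

def pair_score(p, q):
    """Score a parallel dimer of peptides p and q.

    Each peptide is a dict with equal-length strings 'a', 'e', 'g' giving the
    residue at that heptad position in each heptad (a: I or N; e, g: E or K).
    """
    score = 0.0
    for x, y in zip(p["a"], q["a"]):  # a-a' core contacts
        score += W_CORE_MATCH if x == y else W_CORE_MISMATCH
    for i in range(len(p["a"]) - 1):  # g(i) faces e'(i+1) in a parallel dimer
        for g, e in ((p["g"][i], q["e"][i + 1]), (q["g"][i], p["e"][i + 1])):
            score += W_SALT_BRIDGE if g != e else W_REPULSION
    return score

def is_orthogonal(peptides, pairs, gap=2.0):
    """True if each designed pair beats every off-target dimer of its members by gap."""
    designed = {frozenset(pr) for pr in pairs}
    names = list(peptides)
    candidates = list(combinations(names, 2)) + [(n, n) for n in names]
    for x, y in pairs:
        on_target = pair_score(peptides[x], peptides[y])
        for u, v in candidates:
            if frozenset((u, v)) in designed or not {u, v} & {x, y}:
                continue
            if pair_score(peptides[u], peptides[v]) < on_target + gap:
                return False
    return True

# Hypothetical four-heptad peptides: charge pattern plus position of the core Asn.
toy = {
    "P1": {"a": "INII", "e": "EEEE", "g": "EEEE"},
    "P2": {"a": "INII", "e": "KKKK", "g": "KKKK"},
    "P3": {"a": "IINI", "e": "EEEE", "g": "EEEE"},
    "P4": {"a": "IINI", "e": "KKKK", "g": "KKKK"},
}
print(is_orthogonal(toy, [("P1", "P2"), ("P3", "P4")]))
```

Moving the asparagine to a different heptad in the second pair is what separates the two pairs here; with identical asparagine positions the cross-pairs P1:P4 and P2:P3 would score as well as the designed pairs.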
In most cases, 47,48,52 the above described sequence rules were used to construct a library of possible sequences based on the limited amino acid variability at interacting positions. Scores were then assigned to all possible peptide pairs and orthogonally interacting sets were selected from the library. Since this approach comes at a high computational cost, algorithms have been developed that optimize the energy gap between the on-target and off-target interactions already at the sequence design level. 51,54 Interestingly, while the above described orthogonal CC sets are synthetic in nature, a considerable degree of specificity was also observed for human bZIP transcription factors despite their high sequence homology. 55
First successes of modular protein design exploiting orthogonally interacting CC sets highlighted the importance of such sets for the field of protein design. Strategically linking non-associating α-helical tectons via glycine-glycine linkers resulted in 6-9 nm long helical nanorods as revealed by CD, DLS and AUC measurements. 47 Using the same strategy that yielded α-helical tectons, an additional set of 3 heterodimeric CCs was designed with two pairs interacting in a parallel fashion, while helix orientation in the third pair was antiparallel. 56 The latter comprised 3 heptads, while parallel dimers were composed of 4 heptad repeats. The developed set allowed successful construction of a three-stranded chassis intended to function as a hub in synthetic molecular motors (Fig. 2a and c). The resulting assembly was verified via CD, AUC and MALDI-TOF measurements (Fig. 2a). SYNZIP peptides served as the basis for the design of a two-dimensional nanotriangle composed of three polypeptide chains. Similarly to nanorods, the triangle shape was specified by cleverly linking non-interacting peptide pairs by a 10 amino acid linker (Fig. 2b and d). 57 A combination of structure characterization techniques (DLS, SAXS, AFM) confirmed that the fusion proteins assembled into a triangular shape as intended, with a characteristic particle dimension of 10 nm (Fig. 2b).
In the above described studies, the final assemblies were achieved as a result of correct pairing between multiple polypeptide chains, which depended on concentration and equilibrium determined by the affinity of CC segments. However, discrete protein structures could be realized more accurately by connecting orthogonally interacting CC dimers into a single polypeptide chain. 58 This strategy was utilized to design the CCPO structures, i.e. polyhedron-shaped protein cages (Fig. 3), 20,21 from building blocks provided by the CC set introduced by Gradišar et al., 45,48 the APH toolkit 54 and modified naturally occurring CC dimers. 59,60 Since finding the right sequential order of peptide modules for more complex polyhedral shapes becomes quickly intractable using back-of-the-envelope approaches, this design strategy relies on the foundations established by the mathematical graph theory.

Modelling of CCPO designs
At its core, the design of CCPO structures consists of connecting orthogonal CC peptides into a single-chain that will guide the polypeptide chain to fold into a polyhedron-shaped protein cage as the peptide modules self-assemble into intramolecular CC dimers forming the edges of the polyhedron. The task of finding the right arrangements of peptide segments is equivalent to the mathematical problem of finding a strong trace, a subset of double Eulerian paths, i.e. an oriented path that traverses each edge of the graph object exactly twice and interlocks the path into a stable structure, which means that all edges are connected with others in vertices. 61,62 While the principles of designing CCPO structures have been described in detail, 21 here we provide a brief overview and underline some important considerations.
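The double-trace requirement stated above can be checked mechanically. The sketch below tests whether a closed walk over the vertices of a polyhedron uses every edge exactly twice and labels each edge as parallel (both passes in the same direction) or antiparallel (opposite directions). It does not test the additional strong-trace condition at the vertices. The function name and the example walk are assumptions made for illustration; the walk was written by hand and is not claimed to correspond to a published CCPO topology.

```python
from collections import defaultdict

def classify_double_trace(walk, edges):
    """Check that a closed vertex walk uses every edge exactly twice.

    Returns a dict mapping each edge to 'parallel' or 'antiparallel',
    or None if the walk is not a closed double trace of the edge set.
    """
    if walk[0] != walk[-1]:
        return None
    passes = defaultdict(list)
    for u, v in zip(walk, walk[1:]):
        passes[frozenset((u, v))].append((u, v))
    wanted = {frozenset(e) for e in edges}
    if set(passes) != wanted or any(len(p) != 2 for p in passes.values()):
        return None
    return {
        tuple(sorted(e)): "parallel" if p[0] == p[1] else "antiparallel"
        for e, p in passes.items()
    }

# Tetrahedron: 4 vertices and 6 edges, so a double trace has 12 steps.
tetra_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
walk = [0, 1, 2, 0, 1, 3, 0, 2, 3, 1, 2, 3, 0]  # hand-written example
kinds = classify_double_trace(walk, tetra_edges)
print(kinds)
```

Each step of the walk corresponds to one CC-forming segment, so this example would require four parallel and two antiparallel dimers.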
CCPO design can be divided into multiple steps (Fig. 4):
(i) Selection of the target polyhedral structure. From graph theory it follows that any arbitrary polyhedron-like cage based on a single polypeptide chain composed of concatenated segments that form intramolecular CC dimers (or any other dimer building module) serving as edges can be uniquely constructed. 58 In reality, there are certain limitations, most importantly the number of available orthogonal building blocks.
(ii) Construction of a double Eulerian path and selection of a circular permutation. After a target polyhedral shape has been chosen, a double Eulerian path, also called a topology, is calculated using the method of 1-face embedding developed by Fijavž et al. 62 In principle, a polyhedral protein cage can be realized via multiple different topologies, which differ in the number of required parallel and antiparallel CC pairs. It needs to be mentioned that most topologies involve both parallel and antiparallel CC modules, while only certain polyhedral topologies can be constructed from exclusively parallel (e.g. octahedron) or antiparallel (e.g. rectangular pyramid) modules. The latter also showcases the limitation of DNA for construction of single-chain polyhedral cages, since DNA allows for only antiparallel edge orientation. Since Eulerian paths are circular, an incision has to be made in one of the vertices of the polyhedron in order to make the path linear and suitable for conversion into an amino acid sequence. Consequently, the C- and N-terminus of the resulting protein cage coincide in the same vertex. For a polyhedron with N edges there are 2N possible linear paths, called circular permutations, resulting in an additional increase of possible sequences. For example, a tetrahedral protein cage can be achieved via 3 topologies, leading to 36 possible circular permutations, while a square pyramid can be achieved via 52 topologies or 832 circular permutations. The question that arises is how to select the optimal order of segments. For this purpose, total contact order (TCO) was introduced, which scores different arrangements according to the average distance between pairing segments. 21 TCO is closely related to the relative contact order (RCO), which has been shown to be correlated in natural proteins with protein folding rates and affects the folding pathway. 63 Therefore, a lower TCO is expected to lead to smoother folding and increase the likelihood of successful designs. However, the direct connection between the TCO of CCPO structures and folding rates or design success rate is not yet clear. 21
(iii) Selection and placement of the CC building blocks at appropriate positions in the sequence. Next, an amino acid sequence is generated by connecting orthogonal CC building blocks from the toolbox of orthogonal CC dimers via flexible linkers in a manner defined by the selected circular permutation. CC dimers in the CCPO CC toolbox differ in stability, charge, length and helical propensity; however, natural or other designed CCs can be used as well. Experimental testing of approx. 20 CCPO designs revealed several design rules, such as avoiding the placement of less stable CC pairs at the C- or N-terminus, as that can lead to fraying, 21 or at positions that are far apart in the polypeptide chain. Regarding the choice of linkers, current experimental results suggest that their sequence does not play a key role in determining design success or stability of the CCPO structures as long as they comprise helix-breaking, small, polar residues, enabling flexible connection between rigid modules that define the fold.
To facilitate the design of CCPO structures, a freely available computational design platform, CoCoPOD, was developed, allowing the above described design steps to be performed in a semi-automated manner. 21 In addition to facilitating amino acid sequence design, CoCoPOD also permits construction of atomistic model structures for designed CCPO cages. CoCoPOD can be accessed at github.com/NIC-SBI/CC_protein_origami, and comes with three tutorial videos intended as a quick start for users.
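The counting argument and the ranking of circular permutations in step (ii) can be sketched in a few lines. The sketch assumes that TCO can be approximated as the mean separation, counted in segment positions, between the two segments of each CC pair; the published definition may differ in detail. The segment order `order` is a hypothetical example, and the function names are not part of CoCoPOD.

```python
def n_circular_permutations(n_topologies, n_edges):
    """Each topology is a circular path of 2N segments, so it can be cut in 2N places."""
    return n_topologies * 2 * n_edges

def total_contact_order(segments):
    """Mean separation, in segment positions, between the two copies of each label."""
    first, gaps = {}, []
    for i, label in enumerate(segments):
        if label in first:
            gaps.append(i - first[label])
        else:
            first[label] = i
    return sum(gaps) / len(gaps)

def rotations(segments):
    """All linear orders obtained by cutting the circular path at a different point."""
    return [segments[k:] + segments[:k] for k in range(len(segments))]

print(n_circular_permutations(3, 6))   # tetrahedron: 3 topologies, 6 edges
print(n_circular_permutations(52, 8))  # square pyramid: 52 topologies, 8 edges

# Hypothetical circular order of 12 segments for a tetrahedron; equal letters pair up.
order = list("ABCADECFDBFE")
best = min(rotations(order), key=total_contact_order)
print("".join(best), total_contact_order(best))
```

The first two calls reproduce the counts quoted in the text. The final lines pick the cut point with the lowest approximate TCO among the 12 linear orders of a single topology.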

De novo design of CCPO cages
The CCPO folds are based on a highly modular design strategy, based on long range, designable native contacts defined at the level of dimeric CC units to form the edges and guide the assembly of the cage. This strategy therefore bypasses the complexity of the design of cooperative protein core interactions. The affinity and specificity of CC segments to their partner modules underlies the formation of the CCPO protein fold. Therefore, the abovementioned importance of developing orthogonal CC sets assumes particular relevance for the construction of high order CCPO structures. CCPO cages form an internal cavity, whose shape and volume are determined by the geometry of the chosen polyhedron and the length of the edges. In a recent publication, 21 the boundaries of CCPO design have been further extended to high order polyhedra with experimentally confirmed construction of cages possessing the shape of a square pyramid and a triangular prism in addition to alternative tetrahedral topologies. The first generation CCPO structures had to be refolded in vitro from the produced protein, as in most designs of DNA nanostructures, which limited the potential technological and therapeutic applications. One of the elements pivotal for the success of the second generation CCPOs was the design and usage of supercharged CC elements that ushered the correct in vivo self-assembly of CCPO under the physiological conditions, without the requirement of in vitro refolding steps. 20 The design platform CoCoPOD provided a suitable environment for the design of different polyhedra, showcasing the utility of the developed software.
Three representative structures, a tetrahedron, a square pyramid and a triangular prism, formed by 12, 16 and 18 CC segments respectively, were confirmed by both small angle X-ray scattering (SAXS) and single particle TEM reconstruction (Fig. 3). As an indication of the flexibility of these nanostructures, SAXS experiments revealed that the trigonal prism is present in solution in both rectangular and oblique conformations. Furthermore, Kratky plots of other CCPO structures also indicated partial flexibility. Since CC elements represent rigid modules, conformational changes are due to the flexibility of the loops, which allows angles of non-constrained faces a certain degree of freedom, resulting in cages with a limited conformational variability. In addition, the tetrahedron TET12SN was structurally characterized by application of chemical cross-linking coupled with proteolytic digestion and mass spectrometry, 21 which can be employed to investigate the fold of modular CCPOs. Cross-linking was performed with three different reagents, DSS, BS(PEG)5 and BS(PEG)9, that can bridge Cα-Cα distances up to 2.4, 3.4 and 4.8 nm respectively, covering the range of distances relevant for the tetrahedral protein cage. After cross-linking, the protein was subjected to proteolytic cleavage resulting in cross-linked peptide fragments. The latter were analyzed using mass spectrometry. In the case of the shorter cross-linker, several connections between pairing CC segments were detected, confirming that in the context of TET12SN peptides assembled into CC dimers as expected. With the longer cross-linkers, BS(PEG)5 and BS(PEG)9, long-range cross-links between non-neighboring pairs of peptides (in terms of sequence) were detected. These connections were consistent with distances observed for the corresponding peptide fragments during MD simulations of the model TET12SN cage, indicating that the polypeptide chain folded according to the design.
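The comparison between observed cross-links and model distances amounts to a simple feasibility check, sketched below. The maximum spans are the values quoted in the text; the function names and any distances passed to them are hypothetical and serve only to illustrate the logic.

```python
# Maximum Cα-Cα distances (nm) bridged by each cross-linker, as quoted in the text.
MAX_SPAN_NM = {"DSS": 2.4, "BS(PEG)5": 3.4, "BS(PEG)9": 4.8}

def compatible_linkers(distance_nm):
    """Cross-linkers whose maximum span covers the given Cα-Cα distance."""
    return [name for name, span in MAX_SPAN_NM.items() if distance_nm <= span]

def consistent(observed_linker, model_distance_nm):
    """True if an observed cross-link is feasible at the modelled distance."""
    return observed_linker in compatible_linkers(model_distance_nm)

# Hypothetical model distance of 3.0 nm between two residues.
print(compatible_linkers(3.0))
print(consistent("DSS", 3.0))
```

A cross-link detected with a reagent that cannot span the modelled distance would argue against the model, whereas feasible cross-links are merely consistent with it.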
The increase in the complexity of CCPO structures is also reflected in the increase of TCO values (4.3, 5, and 5.6 for the CCPO tetrahedron, square pyramid and trigonal prism). Although CCPO structures are defined by long-range interactions between CC modules and not a tightly packed hydrophobic core, the kinetics of folding for the CCPO tetrahedron, square pyramid and trigonal prism, obtained via stopped-flow CD and stopped-flow FRET experiments, 21 were discovered to be comparable to that of natural proteins of similar length. 63 The experimentally determined secondary structure folding rates (17 s⁻¹, 14 s⁻¹ and 7.7 s⁻¹, respectively, for the tetrahedron, square pyramid and trigonal prism) were in agreement with the overall folding rates observed via the FRET effect of fluorescently labeled N- and C-terminal ends (31 s⁻¹, 15 s⁻¹ and 10 s⁻¹, respectively, for the tetrahedron, square pyramid and trigonal prism). Interestingly, the increase in TCO values was correlated with a decrease in folding rates, as expected from theoretical considerations. 64 However, this correlation, as well as the complete characterization of the folding pathways of CCPO cages (e.g. potential kinetic traps, folding rates after annealing), still requires additional studies that would offer a better understanding and a means of controlling the CCPO folding process. A further increase of the CC module length will likely introduce knotted CCPOs, which will represent strong kinetic barriers that will need to be considered but also exploited as demonstrated before for single chain DNA nanostructures. 19 The current state of the art CCPO cages undergo a reversible unfolding process, retaining their monomeric state after temperature unfolding followed by rapid cooling, while at 4 °C they are stable for weeks. Successful testing of approx. 20 CCPO cages reflected the robustness of this strategy. Additionally, biophysical characterization and SAXS analysis of 10 tetrahedral variants and another four square pyramids confirmed the applicability of this strategy to differently composed polyhedra and helped to understand the rules governing the formation of these non-natural folds.
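The trend between TCO and folding rate can be tabulated directly from the values quoted above. The sketch below checks that both sets of rates decrease as TCO increases and converts the rates to time constants, assuming single-exponential kinetics, which is an assumption of this example and is not stated in the text. With only three cages the check is illustrative and carries no statistical weight.

```python
cages = ["tetrahedron", "square pyramid", "trigonal prism"]
tco = [4.3, 5.0, 5.6]
k_cd = [17.0, 14.0, 7.7]     # secondary structure folding rates, s^-1
k_fret = [31.0, 15.0, 10.0]  # N- to C-terminus FRET folding rates, s^-1

def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(x > y for x, y in zip(values, values[1:]))

def time_constants_ms(rates):
    """1/k in milliseconds, assuming single-exponential kinetics."""
    return [1000.0 / k for k in rates]

for name, t, cd_ms, fret_ms in zip(
    cages, tco, time_constants_ms(k_cd), time_constants_ms(k_fret)
):
    print(f"{name}: TCO {t}, CD {cd_ms:.1f} ms, FRET {fret_ms:.1f} ms")

print(strictly_decreasing(k_cd), strictly_decreasing(k_fret))
```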
The novelty of this modular design strategy is in the atypical fold assumed by these cages, whose robust designability and formation of an internal cavity can, in turn, be used for different applications. Besides self-assembly in an in vitro transcription-translation reaction and bacterial production, CCPO structures also self-assembled in mammalian cells as well as in living animals. The correct folding was confirmed by reconstitution of protein reporters, fluorescent proteins and luciferase catalytic activity fused to the termini of the CCPO structures. Biocompatibility of the designed CCPO structures with mammalian cells was demonstrated by monitoring inflammasome activation and unfolded protein response. In addition, the absence of inflammation and liver damage markers confirmed that the tested CCPO structures are not sensed by mammalian cells as foreign and adopt the correct native structure in vivo. Therefore, due to the lack of observable adverse effects in vivo, CCPO cages show considerable promise for biological applications.

Prospects of modular CCPO protein design for applications
The ability to accurately manipulate objects at the nano-scale level is advantageous for various applications, from biomedicine and materials science to chemical technology and beyond. While a wide spectrum of different nanomaterials is already available, 65-67 polypeptide-based materials possess a specific combination of features, such as programmability, ability to accommodate functional chemical moieties with nano-scale accuracy, self-assembly, biocompatibility, biodegradability and sustainable technological production, that make them a highly suitable material for biomedicine and other technological applications.
Designed proteins have already been used for production of nano-vaccines and drug delivery systems. 68-70 Protein-based technologies are increasingly used to address the problem of producing new, safer nano-vaccines. 71 Subunit vaccines, composed of discrete molecular effectors, provide a safer and viable alternative to inactivated- or attenuated-pathogen based vaccines. 72 The advantages of designed subunit vaccines reside in the controllability of molecular composition, the higher safety offered by the system and the control over size, shape and geometry. The close arrangement of epitopes in the crystalline lattice of Gp23, the major capsid protein of bacteriophage T4, provided a substrate with 7-10 nm spacing between epitopes that increased the antibody titer. 73 Although polymeric nanoparticles such as PLGA or PGA and lipid nanoparticles have been used as carriers for subunit vaccines and drug delivery, 65,74 protein-based nanoparticles feature strong controllability and high biodegradability in comparison to polymer-based particles, whose long-term effects on health are still not known. 68 Self-assembling peptide nanoparticles (SAPNs) are based on a bottom-up approach, where the intrinsic affinity between the components leads to the formation of structurally defined nanocarriers that present multiple copies of antigens. Epitopes are usually fused to short peptides that assemble into fibers or compact particles. Self-assembling protein nanofibers conjugated to different antigens produced auto-adjuvant effects and elicited activation of antigen-specific T cell differentiation, 75,76 with applications to major diseases such as HIV, malaria, SARS and avian influenza. 77-81 Parallel presentation of immuno-stimulatory compounds and antigens also represents an attractive strategy, as demonstrated by the use of an innate immunity TLR ligand in combination with an antigenic protein. 
82-86 On the other hand, protein engineering and de novo protein design bring the power to build novel and well-defined architectures with atomic accuracy. 6 Computational protein design software such as Rosetta provides a platform for the design of customized immunomodulatory proteins, which mimic specific structural epitopes 87-90 or function as peptide-based inhibitors. 91,92 Epitope-focused design and backbone grafting permit the transplantation of epitopes onto structural scaffolds. This strategy allowed grafting of HIV and RSV epitopes onto protein scaffolds, which was shown to elicit neutralizing activity in animals. 88,90 Protein design was also used to generate novel biocompatible inhibitors. 89 This approach yielded extremely tight HA binders with IC50 < 150 pM that protected mice from viral infection 91 and small interactors that effectively provided protection against viral infection in animals. 92 Matching both the correct size and the presentation of antigens is critical in vaccine development. 93 The size of modular CC assemblies can be regulated via modulation of CC length. Utilization of extended CC units or implementation of higher order structures could allow fine-tuning of the final size of CCPO structures. The other advantage offered by CCPO designs is the possibility to precisely engineer and design fusion partners, either chemically or genetically encoded, for instance the fusion of natural protein reporters and fluorescent dyes. 21 Manipulation of self-assembling protein modules is the key to achieving highly controllable protein cages for encapsulation and drug delivery. As for multimeric protein cages, subunits can be modified via different chemistries, fused to protein domains pointing either inwards or outwards and self-assembled in order to encapsulate diverse compounds. 94 Protein cages showed promising results as drug delivery systems. 
94 Naturally occurring proteins, such as ferritin, vault proteins or viral capsids, have already been successfully developed into drug delivery systems. 95-98 Notably, Hilvert and coworkers reengineered natural proteins such as lumazine synthase and ferritin for encapsulation of small proteins into natural capsids via electrostatic interactions. 99-101 However, designing protein cages from scratch can yield more versatile scaffolds for the purposes of direct applications such as drug delivery. Self-assembling natural domains have been employed to construct de novo oligomeric polyhedral cages, as in the studies initiated by Yeates, 102-104 which led to the development of computational procedures for the de novo design of large multicomponent cages. 105-108 Cages can be constructed by employing a wide range of different building blocks, from bulky protein domains to smaller, rigid units, as in the case of CC elements. In particular, CCs were used as modular units either for direct self-assembly of multimeric hollow spheres of ~100 nm 109,110 or for the design of monomeric protein cages, as in the case of CCPO structures. In comparison to other de novo cages, CCPO structures are the only example of assemblies that accommodate cavities within a single polypeptide chain.
Fig. 5 … (g) 12-subunit tetrahedron (PDB code 4ITV); (h) 120-subunit icosahedron (PDB code 5IM6). The plot at the bottom shows the volumes of the internal cavities for different proteins; light blue bars correspond to natural proteins and dark blue bars to de novo designed proteins. The internal volume of monomeric proteins is represented by bars patterned with diagonal lines. The cavities were generated by using either Computed Atlas of Surface Topography of proteins (CASTp), 123 Voss Volume Voxelator (3V) 124 or a 10 nm radius sphere (h), and volumes were then calculated using UCSF Chimera. 125
Monomeric natural proteins possess internal cavities with an average volume of 0.25 nm³; larger cavities reach around 2.5 nm³ but often assume irregular shapes and typically occur at protein-protein interaction surfaces (Fig. 5a). 111 In contrast, multimeric protein cages, either natural or de novo designed, can accommodate large internal cavities, measuring from 40 nm³ to more than 4000 nm³ (Fig. 5b-d and f-g). Due to their peculiar fold, CCPO structures exhibit a large and hydrophilic cavity of approximately 40 nm³ (in the case of a tetrahedral fold) within a single polypeptide chain (Fig. 5e). In comparison to symmetry-based protein self-assemblies, where the cavity is tightly enclosed, CCPO cages offer a much more exposed cavity. The formation of such extensive cavities in a single polypeptide assembly makes CCPO design an attractive tool for targeted drug delivery and for molecular cages.
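To put the quoted cavity volumes on a common length scale, the short sketch below converts between volume and the radius of a sphere of equal volume. It is only a back-of-the-envelope aid: treating a cavity as a sphere is an assumption of this sketch (the CCPO tetrahedron cavity is neither spherical nor closed), and the function names are hypothetical. It uses the 10 nm radius sphere mentioned for the 120-subunit icosahedron and the approximately 40 nm³ cavity of the CCPO tetrahedron.

```python
import math


def sphere_volume(radius_nm):
    """Volume (nm^3) of a sphere with the given radius (nm)."""
    return 4.0 / 3.0 * math.pi * radius_nm ** 3


def equivalent_sphere_radius(volume_nm3):
    """Radius (nm) of a sphere whose volume equals volume_nm3 (nm^3)."""
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


# 10 nm radius sphere used to approximate the icosahedron cavity, panel (h).
icosahedron_cavity = sphere_volume(10.0)

# Approximate CCPO tetrahedron cavity volume quoted in the text.
ccpo_radius = equivalent_sphere_radius(40.0)

print(f"10 nm sphere volume: {icosahedron_cavity:.0f} nm^3")
print(f"Equivalent radius of a 40 nm^3 cavity: {ccpo_radius:.2f} nm")
```

A 10 nm radius sphere encloses slightly more than 4000 nm³, in line with the upper end of the range given for multimeric cages, while a 40 nm³ cavity corresponds to a sphere of roughly 2 nm radius.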

Conclusion and outlook
Diverse strategies to obtain polypeptide sequences that fold into a designed three-dimensional structure are available and are continuously being improved. 4,6,112,113 Due to the intrinsic adaptability of CC elements, design strategies that use sets of orthogonal CCs offer a solid and designable platform for the development of functional nanostructures. Examples of CC modules used to sense pH changes and drug release from liposomes 114 as well as CC elements able to rearrange their structure upon binding of metal ions 115,116 suggest that protein designs based on these modules could serve as scaffolds for developing conformationally flexible nanostructures. In comparison to design strategies based on single-state energy minimization of large folds, precise control over structural rearrangements is a major advantage offered by small and well-studied protein modules, as exemplified by CCs based on azobenzene crosslinkers, which trigger light-induced conformational rearrangements of CC helices. 117,118 Control over the three-dimensional rearrangement of proteins can readily be achieved by grafting flexible conformational hinges within rigid structural elements. In the context of CCPO structures, CC formation involves a conformational transition from unfolded monomers to structurally rigid dimeric units, an example of folding-upon-binding behavior resembling that of some intrinsically disordered proteins. 119 The reversibility of the folding transition offers the possibility to further engineer CCPO folds into dynamic assemblies able to assume different conformations upon interaction.
Additionally, the flexible linkers intersecting CC units in CCPO cages offer an additional degree of freedom to polyhedral cages (as experimentally observed in the case of a trigonal prism), which in turn allows considerable movement of CC dimers affecting the volume of the internal cavity, leading to a large breathing capacity of the whole structure. CCPO structures represent an interesting example of modular yet monomeric protein assembly. Foremost, these folds do not rely on symmetric oligomerization and, therefore, permit the assembly of cages with addressable unique sites. Secondly, these cages possess a large cavity that could accommodate chemically linked compounds. Currently, the largest designed CCPOs comprise 700 amino acid residues resulting in one of the largest single chain protein designs. However, it is likely that construction of larger CC nanostructures will require assembly from several partially assembled subunits. Additionally, the expansion of CC orthogonal sets, in terms of both their number and their size, will facilitate the design of more complex modular folds. Besides, discrete multi-chain coiled coil protein assemblies can also be achieved based on the control of the angle between building blocks via linker length 120 or charge repulsion. 109,110 Modular CC-based designs exhibit several properties that make them appealing for therapeutic purposes. Such properties are also gaining importance in biotechnology, in particular in controlled catalysis, where the precise stoichiometric and spatial clustering of catalytic elements is important. 121,122 In this regard, CCPO nanostructures may offer high customization. Furthermore, the biocompatibility of these novel folds, already demonstrated by in vivo studies, 21 provides solid foundations for further development. 
In particular, understanding these novel folds and repurposing them for geometrical rearrangement of grafted moieties and molecular shielding represent interesting perspectives towards biotechnological applications, and while these challenges may require substantial efforts, recent advances in designing CC-based nanostructures offer all the reasons to be optimistic.

Abbreviations
BS(PEG)5: Bis-N-succinimidyl-(pentaethylene glycol)ester
BS(PEG)9: Bis-N-succinimidyl-(nonaethylene glycol)ester
FRET: Förster resonance energy transfer
PLGA: Poly(lactic-co-glycolic acid)
PGA: Poly(glycolic acid)
SAPN: Self-assembling peptide nanoparticles
HIV: Human immunodeficiency virus
SARS: Severe acute respiratory syndrome
RSV: Respiratory syncytial virus
HA: Human influenza hemagglutinin

Conflicts of interest
RJ is the author of the patent application on the design of self-assembling polypeptide polyhedra.