 Open Access Article
      
        
          
A. Ljubetič†a, I. Drobnak†a, H. Gradišar ab and R. Jerala *ab
      
a National Institute of Chemistry, Hajdrihova 19, Ljubljana, Slovenia. E-mail: roman.jerala@ki.si
b Excellent NMR – Future Innovation for Sustainable Technologies, Centre of Excellence, Ljubljana, Slovenia
    
First published on 15th March 2016
Polypeptides and polynucleotides are programmable natural polymers whose linear sequence can be easily designed and synthesized by the cellular transcription/translation machinery. Nature primarily uses proteins as molecular machines and nucleic acids as the medium for the manipulation of heritable information. A protein's tertiary structure and function are defined by multiple cooperative weak long-range interactions that have been optimized through evolution. DNA nanotechnology uses orthogonal pairwise interacting modules of complementary nucleic acids as a strategy to construct defined complex 3D structures. A similar approach has recently been applied to protein design, using orthogonal dimerizing coiled-coil segments as interacting modules. When concatenated into a single polypeptide chain, they self-assemble into the 3D structure defined by the topology of interacting modules within the chain. This approach allows the construction of geometric polypeptide scaffolds, bypassing the folding problem of compact proteins by relying on decoupled pairwise interactions. However, the folding pathway still needs to be optimized in order to allow rapid self-assembly under physiological conditions. Again, the modularity of designed topological structures can be used to define the rules that guide the folding pathway of long polymers, such as DNA, based on the stability and topology of connected building modules. This approach opens the way towards incorporation of designed foldamers in biological systems and their functionalization.
Because the problem of biopolymer design is too complex to tackle in our lifetime by a comprehensive (brute force) approach, it needs to be broken down into smaller, simplified sub-problems. An early solution has been to take a naturally occurring protein as a starting point and only tweak specific parts through point mutations or truncations in order to abolish or modify its natural structure and function.7 This represents the basis of the incremental (evolutionary) design and is reasonably straightforward, as long as only small changes are introduced; more radical changes however produce unpredictable outcomes. Taking this approach further, many larger natural proteins can be broken down into distinct structural domains that are able to fold independently from the rest of the protein. Different domains can be mixed and matched to form novel proteins either as a single polypeptide chain or as multiple chains held together by domains that specifically interact with each other.8 A more advanced strategy is to modify the existing domains to engineer specific binding interfaces for other protein domains and for small ligands, resulting in a predictable structure of the complex and potentially novel functions, including those not found in nature.9,10 This is made possible by advances in computational simulations that search a large number of possible conformations and attempt to calculate the stability of each conformation, so that the energetic minimum can be sought. Computational tools like molecular dynamics11–13 or the Rosetta structure modelling suite14,15 are by now well established and can be of great help in determining and designing structures at the atomic level. However, they can still sample only a relatively small part of the conformational space available to biomolecules, so they require some specialized knowledge in order to make the best use of their strengths while being aware of their shortcomings. 
Some of the most advanced examples in this field include designing protein–protein interfaces with a precise geometry that allows multiple proteins to assemble into symmetric structures like polyhedra9 (Fig. 1) or planar meshes.16
Fig. 1 Protein design strategies. Haemoglobin (PDB ID 2DN2) is shown as a typical representative of natural globular proteins. The designed multi-domain assembly (PDB ID 4EGG) is composed of four trimeric subunits, with their contact interfaces carefully designed to produce a symmetric tetrahedral structure. The result is a large, bulky assembly with a solid core, similar in principle to most natural proteins. The designed topological protein, composed of concatenated coiled-coil dimer-forming modules, yields a tetrahedral protein cage with a large cavity in the centre (structural model taken from ref. 24). Images were created with UCSF Chimera.61
Nucleic acids and polypeptides are the two types of linear programmable biomolecules whose sequence can be modified at will in order to guide the self-assembly of their tertiary structures. Proteins are used in nature to perform most functions as molecular machines, while nucleic acids primarily have a role in storage and translation of heritable information. Using nucleic acids instead of proteins in structure design is a way of making the design problem more tractable since simple, well understood base-pairing rules allow us to engineer very specific intra- or inter-molecular contacts. DNA also tends to adopt a predictable double helical structure whenever two complementary strands come into contact.17 As a result, designing the secondary and even tertiary structures of nucleic acids based on complementary modules is considerably easier than for proteins, but the relative simplicity also has drawbacks. The limited diversity of functional groups found in nucleic acids allows less versatile functionalization and the relatively rigid base-pairing rules allow for less structural plasticity and adaptability compared to proteins. RNA is more flexible in this regard than DNA, but as a result its structure is more difficult to predict and it is particularly susceptible to hydrolysis and degradation by ribonucleases. Additionally, nucleic acids may trigger an immune response within the cytosol of eukaryotic cells.18 Thanks to a wide array of established molecular biology tools and automated chemical synthesis, manipulation of nucleic acids is much simpler than with proteins, but it remains expensive when large quantities are desired.
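The base-pairing rules referred to here can be made concrete with a minimal Python sketch. The function names and the example sequence are illustrative assumptions and do not come from the article. The sketch only tests perfect Watson–Crick complementarity in the antiparallel orientation and ignores thermodynamic stability and partial matches.

```python
# Watson-Crick partners (A-T, C-G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the strand (written 5'->3') that pairs with `strand` in an antiparallel duplex."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def forms_duplex(strand_a: str, strand_b: str) -> bool:
    """True if the two strands (both written 5'->3') are perfectly complementary."""
    return strand_b.upper() == reverse_complement(strand_a)


# Hypothetical six-base module, chosen only for illustration.
module = "ATGGCA"
partner = reverse_complement(module)
print(module, "pairs with", partner)
print("module pairs with itself:", forms_duplex(module, module))
```

A real design tool would additionally score partial complementarity between all modules in order to judge orthogonality; that step is deliberately left out of this sketch.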
Spectacular progress in designed DNA nanostructures has been achieved in the last three decades. Several different approaches, including multi-strand assembly, hierarchical assembly, scaffold-based assembly and single strand assembly (reviewed in ref. 19), have been successfully demonstrated. It is now possible to assemble almost any selected 3D shape using designed DNA with a resolution of several nm, with particle sizes ranging from 100 nm up to several micrometers in the case of periodic assemblies. The key to this approach is modularity, where the final structures are assembled from modules based on complementary antiparallel strands, whose stability and orthogonality is well understood and can be designed at will. DNA origami,20 the technique where a single long chain, typically from a bacteriophage, is shaped by a large number of shorter oligonucleotides that act as clamps, has proven to be very successful in designing a variety of 2D and 3D nano-scale structures.21 Most DNA nanostructure methods require a separate synthesis of many different oligonucleotides, followed by careful mixing and slow annealing to make sure the correct (most stable) structure is obtained.
In contrast, we focus here on the topologically constrained folding of single chain biopolymers (Fig. 2), a process that more closely mimics the way natural biomolecules fold. The advantage of a single chain design is that each unit folds independently of others, without the need for mixing and assembling different components in the correct ratio. In principle, single chain biopolymers can therefore be produced and folded in vivo, as long as we can avoid misfolding and non-specific interactions with other cellular components. In addition to the design of the structure as the unique energetic minimum, we need to design a primary sequence that will not only give the correct final structure, but will also follow a smooth and efficient folding pathway that avoids aggregation-prone intermediates and misfolded states.22,23 Designing the folding pathway represents a major challenge, both for proteins and for DNA-based nanostructures, but overcoming this challenge will open the door to efficient in vivo production of designed molecular machines and their integration with existing biological systems. This would greatly advance many technological applications, ranging from cost-effective production of biomaterials, engineering new biosynthetic pathways, to cell-based therapeutic approaches to combat diseases.
Fig. 2 Topological design of the tetrahedral fold from a single polypeptide chain. (a) Topological solutions for double Eulerian trails assembling a single-chain tetrahedral path. Three distinct topoisomers are possible, built either from four parallel and two antiparallel, or three parallel and three antiparallel coiled-coil pairs. (b) A single chain is composed of twelve coiled-coil forming modules, linked in a defined order, that self-assemble into a cage-like tetrahedral nanostructure. The chain path is threaded through the edges of a tetrahedron traversing each edge exactly twice, so that the path interlocks the structure into a stable shape formed by the six coiled-coil dimers. In this particular topology, two coiled-coil edges are antiparallel and four are parallel.24 (c) Representative tetrahedral particles from TEM images and projections of a tetrahedron in the matching orientation are shown. Samples on grids were stained first with 1.8 nm NiNTA-nanogold beads via the His-tag, followed by uranyl positive staining. Scale bars represent 5 nm.24
Fig. 3 Comparison of the specificity underlying coiled-coil dimer (protein) and DNA duplex building modules. (a) Coiled-coils are characterized by a periodic heptad repeat with residue positions labelled as abcdefg. Specific association of chains is governed by hydrophobic interactions between amino acid residues at positions a and d, forming a hydrophobic spine running the length of the coiled-coil, and by electrostatic interactions between oppositely charged residues at positions e and g, defining either parallel or antiparallel orientation of strands.28 (b) DNA duplex specificity is determined by the Watson–Crick nucleic base complementarity (A–T, C–G). These specific pairwise interactions give rise to a stable double-helical structure in an antiparallel orientation.
In contrast to DNA duplexes which are always antiparallel, coiled-coil dimers may form in either a parallel or an antiparallel orientation, which expands the number of accessible designed topologies. An additional advantage of designed coiled-coil dimers is that the specificity of pairing is defined primarily by 4 (positions a, d, e, g) out of the 7 residues of the repeat, leaving the 3 remaining residues (positions b, c, f) available for the introduction of side chains that provide different functionalities. Many coiled-coil dimer forming peptides have been experimentally tested and some of them have been specifically selected or designed for their lack of cross-reactivity.34–38 Other sets of orthogonal protein–protein binding pairs could in principle also be used to construct topofold proteins, but coiled-coils (and nucleic acid duplexes) have the advantage of being relatively thin and long, which makes them useful for constructing cage-like structures around solvent-accessible cavities (Fig. 1 and 4).
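As a rough illustration of the heptad bookkeeping described above, the following Python sketch assigns heptad positions to a peptide and separates the residues that determine pairing (a, d, e, g) from those left free for functionalization (b, c, f). The function names, the starting register and the example sequence are hypothetical and serve only to show the 4-out-of-7 split.

```python
HEPTAD = "abcdefg"
PAIRING_POSITIONS = set("adeg")  # positions that define pairing specificity


def heptad_register(sequence: str, start: str = "a"):
    """Pair each residue with its heptad position, assuming the first residue sits at `start`."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def split_positions(sequence: str, start: str = "a"):
    """Return (pairing, free) lists of (index, residue, heptad position)."""
    pairing, free = [], []
    for i, (res, pos) in enumerate(heptad_register(sequence, start)):
        (pairing if pos in PAIRING_POSITIONS else free).append((i, res, pos))
    return pairing, free


# Hypothetical two-heptad sequence, not a peptide from the article.
peptide = "IAALKEKIAALKEK"
pairing, free = split_positions(peptide)
print("pairing residues:", pairing)
print("free residues:   ", free)
```

For any whole number of heptads, four sevenths of the residues fall into the pairing list and three sevenths into the free list, regardless of the sequence used.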
Fig. 4 Toolbox of orthogonal dimer-forming modules enables formation of designed topological polyhedral folds from a single chain. (a) The protein toolbox consists of orthogonal dimeric coiled-coils (CC), which can bind in either parallel (P) or antiparallel (AP) orientation. The DNA toolbox offers a larger number of orthogonal building blocks, though all are limited to the antiparallel orientation.39 (b) The size of the orthogonal set limits the diversity and complexity of folds that can be constructed. While antiparallel and parallel orientations of coiled-coil dimers allow in principle construction of any type of protein polyhedron, the antiparallel-only orientation of DNA building blocks restricts the selection of polyhedra, with the square pyramid as the smallest single-chain antiparallel polyhedron.19
The proof of principle of this strategy was experimentally demonstrated with the design and characterization of the modular self-assembled tetrahedron as the simplest three-dimensional geometric object (Fig. 2).24 The polypeptide chain for a monomeric tetrahedral structure was composed of 12 designed coiled-coil forming peptide modules, capable of forming six orthogonal coiled-coil dimers, four parallel and two antiparallel. The building modules selected from a toolbox of designed orthogonal coiled-coil dimers were concatenated in a defined order and linked by short, flexible peptide linkers that formed the vertices of the tetrahedron. The polypeptide was produced in recombinant form in E. coli and purified in the unfolded form. Self-assembly was achieved at low protein concentration by slow dialysis into denaturant-free buffer, resulting in a nanostructure with edges around 5 nm. The tetrahedral shape was confirmed by atomic force microscopy and transmission electron microscopy imaging (Fig. 2c) and by the secondary structure content, and the correct topology was confirmed by the reconstitution of a split fluorescent protein linked to the N- and C-termini of the tetrahedral polypeptide.24
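The idea of a chain path that traverses every edge of a polyhedron exactly twice can be checked with a short Python sketch. The vertex numbering, the function name and the example path are assumptions made for illustration; the path shown is one valid double trail with four parallel and two antiparallel edges, and is not claimed to be the module order of the published tetrahedron. The check covers only edge counts and orientations, not further geometric or folding constraints.

```python
from collections import defaultdict


def classify_double_trail(path, edges):
    """Check that `path` (a list of vertices) traverses every edge exactly twice.

    Returns a dict mapping each edge to 'parallel' (both passes in the same
    direction) or 'antiparallel' (passes in opposite directions).
    Raises ValueError if the path is not a double Eulerian trail.
    """
    expected = {frozenset(edge) for edge in edges}
    passes = defaultdict(list)
    for u, v in zip(path, path[1:]):
        edge = frozenset((u, v))
        if edge not in expected:
            raise ValueError(f"step {u}->{v} is not an edge of the polyhedron")
        passes[edge].append((u, v))

    orientation = {}
    for edge in expected:
        if len(passes[edge]) != 2:
            raise ValueError(f"edge {sorted(edge)} traversed {len(passes[edge])} times")
        first, second = passes[edge]
        orientation[tuple(sorted(edge))] = "parallel" if first == second else "antiparallel"
    return orientation


# Tetrahedron with vertices 0..3 (every pair of vertices is an edge).
TETRAHEDRON_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Illustrative chain path: 12 steps, one per coiled-coil forming module.
path = [0, 1, 2, 3, 0, 2, 3, 1, 2, 0, 1, 3, 0]
orientation = classify_double_trail(path, TETRAHEDRON_EDGES)
for edge, kind in sorted(orientation.items()):
    print(edge, kind)
```

Each step of the path corresponds to one module and each edge to one dimer, so a 12-step path over six edges matches the 12-module, six-dimer architecture described above.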
There are, however, a number of practical limitations to topology-based structure design. The main limitation for protein-based nanostructure design is the availability of orthogonal building blocks. Although a substantial number of coiled-coil dimers have been designed and characterized, orthogonality, i.e. a lack of cross-reactivity, has only been demonstrated for a few relatively small subsets compared to the possibilities of nucleic acids.34,35,37,38 The current coiled-coil toolbox suffices for designing simple polyhedral structures such as the tetrahedron or square pyramid (Fig. 4), but a larger set of orthogonal pairs will be needed to construct more complex shapes with much larger numbers of edges. Expanding the pool of orthogonal coiled-coil pairs is therefore a priority and remains an active area of research. On the other hand, designing orthogonal duplexes is much simpler for DNA nanostructures, although nucleic acids have the disadvantage of only forming antiparallel duplexes, which imposes some limits on what single chain topologies can be designed. For example, a single-chain tetrahedron cannot be constructed without the use of parallel strands, although it is possible to construct either a two-chain tetrahedron or a square pyramid using DNA. By contrast, it has been shown that any polyhedron could in principle be constructed from a single chain if we have both parallel and antiparallel interacting modules at our disposal.24 Another limitation, which has to be taken into account for the more complex structures, is that the biopolymer chain may have difficulty finding its energetic minimum because of the relatively large number of interacting modules and the possibility of forming topological knots. To avoid partially folded intermediates that are kinetically stable or even aggregation-prone, the folding pathway needs to be considered.
In view of its outstanding interest to all areas of life, protein folding has been studied for more than 50 years. It was realized that a random search of all possible conformations is not a feasible folding mechanism (i.e. the Levinthal paradox5), since such a search would require very long timescales, whereas proteins fold on sub-second timescales. A proposed solution to Levinthal's paradox was that proteins fold through distinct intermediate states in a well-defined pathway.41 The pathways were defined in terms of abstract states based on kinetic models (i.e. how many different kinetic constants are observed in macroscopic refolding experiments). From further experimental evidence, especially hydrogen–deuterium exchange mass spectrometry and mutational studies, emerged a more statistically oriented “new view”.42 The new view explains the folding of proteins in terms of a free energy folding funnel. The folding of proteins in this view is based on a downhill energetic bias, where the “ruggedness” of the energy landscape is the cause of the observed kinetic intermediates. The native state could in theory be reached through multiple stochastic pathways that are difficult to predict or to observe and characterize experimentally.
The two views are essentially different aspects (the macroscopic and the microscopic) of the folding process. The foldon hypothesis43,44 reconciles both views by proposing that proteins are multistate objects built from small (usually ∼30 amino acids long), separately cooperative foldon units. Only a few foldons need to be found by a random search, while the formation of the subsequent foldons may be guided by those that are already formed. Multiple pathways are possible if the cooperativity between certain foldons is weak.
In protein topofold structures each coiled-coil edge could be considered a separate discrete foldon. Equally in DNA topofold structures each complementary module represents a foldon. As will be shown later, the stability of foldon units and their topology enables some degree of control over the folding pathway by changing the order in which the foldons form.
Although most attention has been aimed at the folding of proteins, in recent years there has been an increased interest in the folding of RNA,45 DNA46 (in DNA origami) and synthetic foldamers47 that aim to mimic the natural biomolecules.
So far, most of the designed DNA nanostructures have been assembled by a slow annealing process in a narrowly defined range of temperature and concentrations of building elements, typically taking several hours or even days of slow cooling in order to achieve a reasonable folding yield.48 The ability to control the folding process would also enable the design of topologically knotted structures. Knotted structures have significant technological potential, since their thermal49 and mechanical50 properties are enhanced, much as in macroscopic knots. Formation of knots is not very common in natural protein tertiary structures, due to the demanding kinetics of their folding, which usually involves slipknots. The most frequent protein knots are trefoil knots (3₁), although knotted structures with a crossing number of six have recently been determined.51 A knotted protein has been designed by gene fusion52 and exhibited higher thermal stability than the unknotted analogue. Folding of both proteins was reversible, but unsurprisingly the knotted protein folded 20 times more slowly.
Folding of knotted designed biopolymers is so challenging because the chain needs to be threaded through previously formed loops in the correct predetermined sequential order, which requires a strategy to control the folding pathway.52–54 Modular topological bionanostructures represent an excellent opportunity to simplify and manipulate the folding pathway due to the uncoupled yet well-understood and tuneable pairwise interactions that define the fold.
Fig. 5 Design of the folding pathway of twisted topological polyhedra based on the “free end rule”. (a) Any two contact pairs in a linear chain (shown as blue and orange dots) can be classified either in a series (S), cross (X) or parallel (P) relation. The remaining segments can be classified either as a free terminus (T), a hairpin loop (H), an internal loop (L) or an internal segment (I).55 Subsequent contacts may form either favourable or unfavourable folding steps, where the previous connection needs to be unfolded before formation of a new contact. (b) Favourable steps include at least one segment having a “free end”. (c) Unfavourable folding steps are hindered either topologically or kinetically due to the previous arrangement of the grey contacts. (d) The optimal topological design (P1) of a single chain (DNA) pyramid with a defined order of the formation of connections. Modules are formed in alphabetical order, with “Aa” being the most stable and the first contact to form. No violations of the free end rule are present. (e) Experimentally the DNA pyramid folded rapidly and with high yield under all conditions.59 AN – thermal annealing, LN2 – quenching with liquid nitrogen, ice – quenching with ice, RT – room temperature cooling. (f) Experimental folding of the sub-optimal circular permutation (P1cp6) design. (g) A circular permutation of the optimal design where the P1 sequence is circularly shifted by six positions to the left in the chain. This permutation introduces six unfavourable steps (shown with a dashed red line). This design did not fold into a DNA pyramid even by slow annealing, as shown in panel (f). More detail is given in Kočar et al.59
Topofold structures constructed of twisted pairwise interacting polymers, such as a DNA double helix, may contain many kinetically or topologically disfavoured folding steps, particularly in cases where the contact segments exceed one turn of a helix (approx. 10 bp), since it may introduce topological knots. Depending on the initial folding steps (Fig. 5a) the remaining segments can be classified either as a free unstructured terminus (T), a hairpin loop (H), an internal loop (L) or an internal segment (I).59
Pairing of the remaining free modules is affected by the connections already formed. For example, pairing of modules between loops (e.g. L + H, H + H and L + L, Fig. 5c) is topologically hindered, as the modules are unable to wrap around each other to form a full-length double helix without unlinking existing connections (provided each of the modules is longer than one turn of the double helix). Threading of previously formed loops through another already formed loop (e.g. I + H, Fig. 5c) is also kinetically disfavoured, as demonstrated by both simulations and experimental results.59
The most favourable folding steps are therefore those in which at least one of the interacting modules is located on a T segment. Favourable folding steps for different arrangements are shown in Fig. 5b. Since at least one of the interacting modules must reside on the free end of the chain, this design principle was named the “free end” rule. Importantly, it has been proven mathematically that at least two folding pathways that consist only of favourable steps can be constructed for every single-chain polyhedron.59 Such an optimal pathway is therefore feasible even if one of the termini is fixed, which may be particularly relevant during the biosynthesis of linear biopolymers where the growing end of the chain is not free.
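The “free end” rule lends itself to a simple bookkeeping sketch in Python. The version below is a deliberate simplification and an assumption of this example: a module counts as lying on a free terminus (T) if no already paired module sits between it and at least one end of the chain, and the finer distinction between H, L and I segments is ignored. The function names, the six-module toy chain and the two formation orders are hypothetical and are not the pyramid designs from ref. 59.

```python
def on_free_terminus(position: int, paired: set) -> bool:
    """True if no paired module lies between `position` and at least one chain end."""
    blocked_left = any(p < position for p in paired)
    blocked_right = any(p > position for p in paired)
    return not (blocked_left and blocked_right)


def count_violations(chain, formation_order) -> int:
    """Count folding steps in which neither partner module sits on a free terminus.

    chain: module names in chain order; partners share a letter (e.g. 'A' and 'a').
    formation_order: pair letters in the order the contacts form (most stable first).
    """
    positions = {}
    for index, name in enumerate(chain):
        positions.setdefault(name.upper(), []).append(index)

    paired = set()
    violations = 0
    for pair in formation_order:
        i, j = positions[pair]
        if not (on_free_terminus(i, paired) or on_free_terminus(j, paired)):
            violations += 1
        paired.update((i, j))
    return violations


# Hypothetical toy chain: A pairs with a, B with b, C with c.
chain = ["A", "B", "a", "C", "b", "c"]
print("order A, B, C:", count_violations(chain, "ABC"), "violation(s)")
print("order A, C, B:", count_violations(chain, "ACB"), "violation(s)")
```

In this toy chain the order A, B, C always leaves one partner on a free terminus, whereas forming C before B traps both B modules between already paired neighbours. The same chain can therefore be favourable or unfavourable depending only on the relative stabilities that set the formation order.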
The importance and feasibility of designing a favourable folding pathway for the modular single chain structures was demonstrated through several designs of a single chain square DNA pyramid.59 The square pyramid is the smallest regular polyhedron that can be composed from a single chain using only antiparallel modules to form a double Eulerian trail that traverses each edge exactly twice in an antiparallel orientation. The square pyramid is highly knotted and was not expected to fold correctly without designing a favourable folding pathway. In order to prove this experimentally, six variants of the square DNA pyramid were designed from the same set of interacting orthogonal module building blocks so that all designs should form the same final structure with equal stability.
Different interacting DNA modules were designed with different thermal stabilities, which was used to steer the folding (annealing) pathway, as more stable pairs were expected to form first. The six designs differed only in the order of the modules in the chain, where the optimal design comprised only steps in agreement with the free end rule, while in other designs one or up to six folding steps violated the free end rule. The optimal design is shown in Fig. 5d. A circular permutation of the optimal design, where each segment has been circularly shifted six modules to the right in the linear chain is shown in Fig. 5g. The circular permutation uses exactly the same modules, but the order of pairing according to the stability results in six unfavourable steps (shown in dashed red). The optimal pyramid design indeed folded correctly by slow annealing and demonstrated efficient self-assembly even when it was rapidly quenched from 90 °C to the temperature of ice or even liquid nitrogen (Fig. 5e), which demonstrates the efficiency of the rational design for the folding pathway. On the other hand, the designs containing more than five unfavourable steps did not fold efficiently even when annealed (Fig. 5e).59
The validity of the free end rule was also corroborated by simulating the folding rates using the coarse-grained oxDNA60 model and forward flux sampling. The free end rule thus represents a guiding principle for the design of modular DNA nanostructures and enables robust designs of highly knotted DNA structures that fold quickly and with high yield.
The free end rule can also be integrated with the new view on folding.43 Each DNA module can be viewed as a separate foldon. By switching the positions of the foldons in the chain the folding pathway can be manipulated.
Since these design rules depend on the topology rather than on the molecular details, they could also be transferable to other twisted, knotted single chain structures such as coiled-coil-based topofold proteins. The protein tetrahedron recently built24 using orthogonal coiled-coil modules is not knotted, as it uses modules shorter than one superhelical turn, but for protein topofold structures constructed from longer coiled-coil modules we can expect the “free end” rule to become relevant and to serve as a tool to steer the protein folding pathway.
Topofold biopolymers are of particular interest as they are based on different design principles from conventional globular proteins, so there is a higher probability of obtaining novel structures and functions. The successful design of the folding pathway of a highly knotted DNA nanostructure demonstrates that it is feasible to design the folding pathway of complex structures. It is likely that it will also be possible to design modular topofold proteins that are able to fold in vivo. The next challenge for this line of research is to investigate the limits of the structural complexity that can be achieved by this strategy, using both theoretical and experimental approaches, and the possibilities of introducing functions, both those similar to the functions of natural proteins and those unique to topological folds.
Footnote
† Authors contributed equally.
This journal is © The Royal Society of Chemistry 2016