Heather C. Hayes a, Louis Y. P. Luk *ab and Yu-Hsuan Tsai *ac
aSchool of Chemistry, Cardiff University, Cardiff, CF10 3AT, UK
bCardiff Catalysis Institute, School of Chemistry, Cardiff University, Main Building, Park Place, Cardiff, CF10 3AT. E-mail: LukLY@cardiff.ac.uk
cInstitute of Molecular Physiology, Shenzhen Bay Laboratory, Shenzhen 518132, China. E-mail: tsai.y-h@outlook.com
First published on 3rd April 2021
The cyclisation of polypeptides can play a crucial role in exerting biological functions, maintaining stability under harsh conditions and conferring proteolytic resistance, as demonstrated both in nature and in the laboratory. To date, various approaches have been reported for polypeptide cyclisation. These approaches range from the direct linkage of N- and C- termini to the connection of amino acid side chains, which can be applied both in reaction vessels and in living systems. In this review, we categorise the cyclisation approaches into chemical methods (e.g. direct backbone cyclisation, native chemical ligation, aldehyde-based ligations, bioorthogonal reactions, disulphide formation), enzymatic methods (e.g. subtiligase variants, sortases, asparaginyl endopeptidases, transglutaminases, non-ribosomal peptide synthetases) and protein tags (e.g. inteins, engineered protein domains for isopeptide bond formation). The features of each approach and the considerations for selecting an appropriate method of cyclisation are discussed.
In the human body and in many other animals, peptides are used as hormones for signal transduction (e.g. insulin), whereas proteins are indispensable for cellular structure and function. Indeed, peptides and proteins are closely associated with nearly all human diseases, and thus they have been utilised for disease prevention or treatment.1–4 In particular, peptides have become increasingly popular as therapeutics due to their high specificity, high activity and low toxicity.5 On a different note, proteins that catalyse reactions have been extensively employed in both research and industrial applications.6 These biocatalysts are attractive alternatives to traditional chemical catalysts owing, in particular, to their substrate specificity, catalytic efficiency and biocompatibility. Towards enhancing the biophysical properties of peptides and proteins, as well as expanding their scope of application as therapeutics and catalysts, cyclisation of peptides and proteins has become a burgeoning field of research.
In nature, many polypeptides are found to be cyclised, a feature that is often intrinsically associated with their biological function. Cyclisation can be categorised into four general classes: side chain-to-side chain, head-to-tail (also known as backbone cyclisation), head-to-side chain and side chain-to-tail (Fig. 1). Side chain-to-side chain cyclisation occurs when a bond is formed between the side chain functionalities of two amino acid residues (Fig. 1a). One prominent example is intramolecular disulphide bond formation between the thiol functionalities of two cysteine residues, leading to a type of cyclic structure commonly found in peptides and proteins such as insulin and antibodies. It is estimated that about 50% of cysteine residues in polypeptides are found in the form of disulphide bonds.7 Other types of side chain cyclisation, including non-native linkages, are also possible and will be discussed throughout this review. Head-to-tail cyclisation is another commonly observed form of cyclisation. As the first residue in a chain of amino acids has an amino functionality (i.e. N-terminus), and the last residue has a carboxylate functionality (i.e. C-terminus), polypeptides are typically directional. Consequently, cyclisation can be achieved by joining the N- and C-termini through an amide bond (Fig. 1b). Head-to-tail peptide cyclisation has been observed in microorganisms and plants, such as kalata B1 from the plant Oldenlandia affinis and bacteriocin AS-48 produced by the bacterium Enterococcus faecalis.8,9 Furthermore, a recent report shows that head-to-tail cyclic peptides are prevalent in normal flora such as those in the human gut.10 Meanwhile, the formation of a lactam, lactone or thiolactone between either terminus and an appropriate side chain functional group (Fig. 1c and d) results in side chain-to-terminus cyclisation. For example, bacitracin is an antibiotic side chain-to-tail cyclic peptide produced by Bacillus subtilis, in which a bond is formed between a lysine side chain and the C-terminus.11
Fig. 1 Schematic illustration of cyclisation modes: (a) side chain-to-side chain, (b) head-to-tail, also known as backbone cyclisation, (c) tail-to-side chain, and (d) head-to-side chain.
As potential drugs, both natural and synthetic peptides are increasingly researched due to their favourable characteristics.12 However, they often suffer from low oral bioavailability and metabolic instability. These shortcomings can often be addressed by cyclisation. Firstly, peptide cyclisation can lead to improved biological activity by enabling enhanced binding towards the target molecule. As a result of decreased conformational flexibility, the more rigid macrocycle undergoes a smaller loss of entropy upon binding to the target molecule than the linear peptide.13,14 Secondly, head-to-tail cyclic peptides have increased resistance to hydrolysis by exopeptidases due to the absence of free termini. Thirdly, membrane permeability and cytoplasmic delivery of the molecule are enhanced, though the exact mechanism is not entirely clear.15 Consequently, the favourable pharmacological properties of cyclic peptides make them attractive therapeutic candidates. Indeed, nine cyclic peptide drugs were approved for market between 2006 and 2015.12 While the majority of cyclic peptides in clinical use are currently derived from natural sources, the design of synthetic cyclic peptide drug candidates is becoming increasingly common, aided by advances in computational design and high-throughput screening.16
For proteins, cyclisation can be employed to improve stability so that enzymes can function outside of their native conditions, such as at elevated temperatures, in acidic or basic environments and in the presence of organic solvents or additives.15,17,18 According to polymer theory, the overall increase in the stability of a cyclised protein originates from the destabilisation of the unfolded state.19 Upon cyclisation, the number of conformational states accessible to the unfolded polypeptide is reduced. As a result, the entropy of the unfolded state is decreased and, consequently, the Gibbs free energy is increased. It should be noted that an offsetting enthalpic cost may occur if cyclisation introduces strain to the system, in which case, the desired stabilisation may not be achieved.20 Although the stabilisation of industrially relevant enzymes usually relies on techniques such as directed evolution and computational design,21 a number of proteins have been shown to benefit from cyclisation. These include β-lactamase, dihydrofolate reductase and luciferase, all of which exhibit improved activity at elevated temperatures compared to their linear forms.22–24
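The entropic argument above can be written out as a short derivation. This is only a sketch under a two-state assumption (folded state F, unfolded state U); the symbols are introduced here for illustration and are not taken from the cited work. It further assumes that cyclisation lowers the entropy of the unfolded state by an amount $\Delta S_{\mathrm{U}} < 0$ and that any strain appears as an enthalpic penalty $\Delta H_{\mathrm{strain}} \geq 0$ on the folded state.

```latex
\begin{align}
  G_{\mathrm{U}} &= H_{\mathrm{U}} - T S_{\mathrm{U}} \\
  \Delta G_{\mathrm{unfold}} &= G_{\mathrm{U}} - G_{\mathrm{F}} \\
  \Delta\Delta G_{\mathrm{unfold}}
    &= \Delta G_{\mathrm{unfold}}^{\mathrm{cyclic}} - \Delta G_{\mathrm{unfold}}^{\mathrm{linear}}
     \approx -T\,\Delta S_{\mathrm{U}} - \Delta H_{\mathrm{strain}}
\end{align}
```

Because $\Delta S_{\mathrm{U}}$ is negative, the term $-T\,\Delta S_{\mathrm{U}}$ is positive and raises the free energy of unfolding. Under these assumptions, net stabilisation is expected only when $-T\,\Delta S_{\mathrm{U}} > \Delta H_{\mathrm{strain}}$, which is consistent with the caveat about strain noted above.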
In this review, we first introduce different approaches for peptide and protein cyclisation. Broadly speaking, these approaches can be categorised as either chemical, enzymatic or protein tag methods. While each approach has its own strengths and limitations, the choice of the most appropriate approach largely depends on factors such as the desired application and means to produce the material. Additional considerations will be discussed in the latter part of the review.
Backbone cyclisation can be achieved by intramolecular NCL. This was first demonstrated by Zhang and Tam with peptides ranging from 5 to 16 residues. They observed a high yield of cyclic product with no oligomer formation from intermolecular reaction, even at millimolar concentrations.36 More recently, a series of cyclic peptides of between 10 and 28 amino acid residues was prepared using a microfluidic NCL procedure.37 This enabled acceleration of the NCL reaction, with fast ligation observed even with less reactive C-terminal thioesters, such as those on Val, Ile or Pro. The microfluidic NCL strategy was also successfully employed for the preparation of an 18-residue cyclic peptide, RTD-1, which displayed antibacterial activity against E. coli and S. aureus.
Over the years, a variety of other extensions to the NCL methodology have been devised, expanding the scope of application.35 For example, post-ligation desulphurisation using a free-radical and metal-free reduction method can convert cysteine into alanine.38 The removal of the cysteine and replacement by a much more abundant amino acid is advantageous, as the desired polypeptide of interest can be generated with no trace of ligation or mutation. The development of other thiol-containing unnatural amino acids has further increased the number of sites possible for the ligation reaction to take place.39 Another important extension to the NCL method includes the use of selenium in place of sulphur to accelerate the rate of the ligation reaction and minimise side reactions such as thioester hydrolysis.34,35,40
A C-terminal aldehyde functionality is also employed in a recently developed cyclisation, known as CyClick (Fig. 3).48 Cyclisation proceeds through the reaction of the C-terminal aldehyde with the N-terminal amine, forming a cyclic imine intermediate. The imine group is subsequently attacked by the adjacent amide nitrogen atom to form an imidazolidinone. The final bicyclic product is thermodynamically stable, driving the reaction toward intramolecular cyclisation. This reaction could be performed at concentrations up to 100 mM without significant production of dimer or oligomer side products from intermolecular reactions. It is noteworthy that the ε-amine of lysine does not interfere with CyClick. Although efficient, highly chemoselective and stereoselective, CyClick has only been applied to synthetic peptides due to the need for a C-terminal aldehyde functionality.
Fig. 3 Peptide cyclisation by CyClick involving the N-terminal amine and the C-terminal aldehyde (5 ≤ n ≤ 9, m = n − 4).
Theoretically, any bioorthogonal reaction can be used for peptide and protein cyclisation, although site-specific introduction of the required bioorthogonal functionalities is a prerequisite. For instance, Staudinger ligation is based on the reduction of an azide into an amine by a phosphine. For traceless backbone cyclisation, a bifunctional phosphinothiol reagent was developed (Fig. 4a).50 The ligation begins with a transthioesterification reaction between the peptide thioester and phosphinothiol, followed by reaction with the other peptide fragment bearing an N-terminal azide. The resulting iminophosphorane intermediate is then transformed into an amidophosphonium salt through a cyclic tetrahedral intermediate. Hydrolysis of the amidophosphonium salt produces a native amide bond between the peptide fragments. While the reaction has been successfully applied for cyclising synthetic peptides,51 the application of the Staudinger ligation in aqueous solutions is limited due to the laborious preparation of water-soluble phosphinothiols.52 The widespread application of the Staudinger ligation has further been hindered by the requirement for substrates with glycine residues at the ligation site. Due to increased steric strain on the tetrahedral intermediate in the presence of bulkier residues, a covalent bond between the oxygen and the oxophilic phosphorus atom is favoured over thiol displacement. However, by increasing the electron density on the phosphorus atom of the phosphinothiol reagent, P–O bond formation is discouraged and an improved yield can be achieved for non-glycyl Staudinger ligation reactions.53
Fig. 4 Selected bioorthogonal reactions for peptide/protein cyclisation: (a) traceless Staudinger ligation, (b) type II KAHA ligation, (c) TAMM condensation, and (d) CuAAC.
In a second example, an amide bond can be formed by α-ketoacid-hydroxylamine (KAHA) ligation.54 This chemoselective ligation takes place between an N-terminal hydroxylamine and a C-terminal α-ketoacid. Depending on the nature of the hydroxylamine substituent, there are two main mechanisms. Type I KAHA ligation uses a free hydroxylamine, while type II KAHA involves the use of an O-substituted hydroxylamine. Though demonstrated to be a feasible method for the cyclisation of medium-length peptides, type I KAHA is rarely used, mainly due to the instability of the free hydroxylamine in aqueous media.55 For the type II KAHA ligation, a water-stable O-substituted hydroxylamine, most commonly 5-oxaproline, can be easily prepared and incorporated using solid-phase peptide synthesis. After cleavage from the resin, the α-ketoacid and 5-oxaproline cyclise directly, generating a depsipeptide intermediate which, under basic conditions, undergoes an O–N acyl shift to give the desired cyclic product, ligated by an amide bond (Fig. 4b).56 A homoserine residue is formed at the ligation site; however, by replacing the N-terminal oxaproline with an oxazetidine functionality, a serine residue is instead produced upon KAHA ligation.57 Type II KAHA ligation has been used to cyclise a variety of short peptides.58 Furthermore, this approach was shown to be compatible with larger and more challenging substrates, as demonstrated by the chemical synthesis of the cyclic antibacterial protein AS-48, which has 70 amino acid residues.59
Recently, we reported a novel bioorthogonal reaction involving 1,2-aminothiol and 2-((alkylthio)(aryl)methylene)malononitrile (TAMM) (Fig. 4c).60 The TAMM functionality is stable over a range of pH values and temperatures. Though an unnatural moiety remains in the product, this reaction is fast (k ∼10 M−1 s−1) and specific with no cross reaction with internal cysteine or other nucleophilic residues observed. Using this reaction, cyclisation of proteins on bacteriophages was demonstrated with no reduction in phage infectivity observed.
The bioorthogonal reactions mentioned thus far all require non-native functionalities, which can be readily introduced using solid-phase peptide synthesis or in some instances by recombinant approaches.49,61 For example, the technique of genetic code expansion enables site-specific incorporation of unnatural (non-canonical) amino acids that contain a bioorthogonal functionality.62,63 To do this, a blank codon (usually the amber stop codon, UAG) and an orthogonal aminoacyl-tRNA synthetase/tRNA pair are required. The orthogonal synthetase does not recognise any endogenous tRNA or canonical amino acids as its substrate, and the orthogonal tRNA is not a substrate of any endogenous synthetases. The orthogonal synthetase specifically loads the orthogonal tRNA with the designated unnatural amino acid. This aminoacylated tRNA recognises the blank codon on the mRNA and directs the site-specific incorporation of the unnatural amino acid into the target protein.62,63
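The decoding logic described above can be illustrated with a toy translation routine. This is a minimal sketch, not a model of any real expression system: the function name `translate`, the `suppressor` mapping and the placeholder label `"Uaa"` (standing for whichever unnatural amino acid the orthogonal pair delivers) are assumptions introduced here, and the mRNA is a made-up example. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ('*' = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, suppressor=None):
    """Translate an mRNA reading frame into a list of residues.

    `suppressor` maps a reassigned ("blank") codon to the label of the
    unnatural amino acid carried by the orthogonal tRNA (hypothetical label).
    Without a suppressor, every stop codon terminates translation.
    """
    suppressor = suppressor or {}
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in suppressor:
            # The orthogonal tRNA reads the blank codon.
            residues.append(suppressor[codon])
        elif CODON_TABLE[codon] == "*":
            # An unsuppressed stop codon ends the chain.
            break
        else:
            residues.append(CODON_TABLE[codon])
    return residues


mrna = "AUGGCUUAGAAAUAA"  # made-up frame: AUG GCU UAG AAA UAA
print(translate(mrna))                  # ['M', 'A']
print(translate(mrna, {"UAG": "Uaa"}))  # ['M', 'A', 'Uaa', 'K']
```

In the first call the amber codon terminates the chain; in the second it is read through and the placeholder residue is inserted at that single position, which is the site-specificity the text describes.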
Chin and co-workers have demonstrated the use of copper-catalysed azide–alkyne cycloaddition (CuAAC) (Fig. 4d) for protein cyclisation through genetically incorporated azide- and alkyne-containing amino acids.64 By definition, “click”-type reactions, such as CuAAC, are simple, rapid, high yielding, stereospecific and wide in scope.65 As such, CuAAC has been used extensively for peptide cyclisation,66 which can be carried out under mild conditions in a variety of solvents, including water.67,68 The 1,4-disubstituted triazole formed at the ligation site, although not a natural functionality of polypeptides, is known to effectively mimic the topology and electronic properties of native trans-amide bonds.69 Theoretically, CuAAC can be employed for peptide and protein cyclisation in live cells through the use of appropriate ligands that can increase the reaction rate and reduce catalyst toxicity.70,71
In place of a terminal alkyne, a strained alkyne can react with an azide in the absence of a catalyst under physiological conditions.72 However, this strain-promoted azide–alkyne cycloaddition (SPAAC) lacks the regiospecificity of CuAAC and forms a mixture of regioisomeric triazole products. Furthermore, the synthesis of strained alkynes is more laborious.73 Nevertheless, a peptide cyclised using SPAAC exhibited improved proteolytic stability and binding affinity compared to its linear peptide counterpart.74
Alternatively, through the use of a ruthenium(II) catalyst, the 1,5-disubstituted regioisomer is generated upon azide–alkyne cycloaddition (RuAAC).75,76 The 1,5-disubstituted triazole mimics a cis-amide bond and benefits peptide cyclisation in two ways: it can serve as the cyclising linkage itself and, when introduced into the peptide chain before cyclisation, it acts as a turn inducer that brings the linear termini into close proximity for ligation.77 In addition, ruthenium catalysts can also be used for ring-closing metathesis, which was successfully utilised for the stabilisation of α-helices through peptide stapling.78,79 For more detailed discussions on metal-catalysed polypeptide cyclisation, we direct interested readers to ref. 80–82.
Orthogonal cysteine protecting groups can be employed to facilitate regioselective disulphide bond formation in cysteine-rich peptides.84 Alternatively, to overcome the high number of possible disulphide bond patterns, orthogonal disulphide pairing has been developed. This concept relies on a unique sequence of cysteine residues or unnatural side chain functionalities to reduce the number of possible disulphide bond patterns. For example, the two cysteine residues in a CXC motif (X = any amino acid) do not form a disulphide bond with each other. Instead, formation of two disulphide bonds between two CXC motifs is preferred (Fig. 5b).85 Moreover, if penicillamine (Pen) is present, formation of a mixed disulphide bond is thermodynamically preferred over disulphide bond formation between two cysteine residues or two penicillamine residues (Fig. 5c).86 However, no disulphide bond is formed in Pen–X–C motifs, just like CXC motifs. By combining these two orthogonal disulphide pairing strategies, it was demonstrated that for a series of peptide sequences containing six thiol functional groups, the number of isomers obtained was as low as four, and in some cases, only one specific isomer was isolated.87 Similarly, a selenocysteine residue or an unnatural dithiol amino acid (Dtaa) can also be used to moderate the number of possible disulphide patterns (Fig. 5d).88,89
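To give a sense of how quickly the number of possible disulphide patterns grows, and how pairing rules reduce it, the following sketch enumerates all ways of pairing a set of thiols. It is an idealised illustration: the function names are invented here, and the pairing preferences described above are treated as hard exclusion rules, which is a simplification of what are thermodynamic preferences in practice.

```python
def pairings(sites):
    """Yield every way of pairing up all sites (each site used exactly once)."""
    if not sites:
        yield []
        return
    first, rest = sites[0], sites[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in pairings(remaining):
            yield [(first, partner)] + sub


def count_patterns(n_thiols, forbidden=lambda a, b: False):
    """Count fully paired disulphide patterns that contain no forbidden pair."""
    sites = list(range(n_thiols))
    return sum(
        1
        for pattern in pairings(sites)
        if not any(forbidden(a, b) for a, b in pattern)
    )


# Six unconstrained thiols: 15 possible patterns.
print(count_patterns(6))

# Two CXC motifs (sites 0,1 and 2,3), no bond allowed within a motif: 2 patterns.
print(count_patterns(4, forbidden=lambda a, b: a // 2 == b // 2))

# Three Cys (sites 0-2) and three Pen (sites 3-5), only mixed Cys-Pen bonds: 6 patterns.
print(count_patterns(6, forbidden=lambda a, b: (a < 3) == (b < 3)))
```

Even these simple exclusion rules cut the search space considerably; combining several rules in one sequence is what allows the number of isomers to be narrowed further, as reported in the work cited above.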
While orthogonal disulphide pairing is effective in minimising the number of isomers generated, its use is generally limited to synthetic materials due to the requirement of unnatural amino acids such as penicillamine. Furthermore, disulphide bonds are not stable under reductive environments (e.g. cytosol) and can be disrupted by thiol-containing molecules through a thiol-disulphide exchange reaction (Fig. 5e).90
Disulphide stapling reagents have addressed the stability issue. A stapling reagent normally contains two electrophiles, such as arsenous acid derivatives,91 dibromopyridazinediones,92 disubstituted maleimides,93,94 perfluoroaryl derivatives,95 among others (Fig. 5f).96,97 These molecules can undergo two nucleophilic additions or substitutions by two thiol groups. Thus, after a disulphide bond is reduced, the addition of a disulphide stapling reagent re-bridges the side chains of two cysteine residues as more stable thioether linkages.
Fig. 6 Enzymatic methods for backbone cyclisation (in some cases recognition sequences may vary): (a) subtiligase, (b) sortase, (c) butelase 1, and (d) OaAEP1.
With regard to peptide cyclisation, both the length and concentration of the peptide have been demonstrated to affect the preference of SrtA for backbone cyclisation versus oligomerisation.106 It was reported that substrates of a minimum length of 19 residues (including the LPXTG motif) are required for cyclisation to be favoured over intermolecular reactions (i.e. formation of dimers and trimers in linear or cyclic forms). An increase in peptide concentration (>1 mM) was, unsurprisingly, accompanied by an increase in intermolecular di- and trimerisation. SrtA has also been used to cyclise larger recombinant proteins, including various cytokines, green fluorescent protein (GFP) and ubiquitin C-terminal hydrolase L3.107,108
While SrtA and its variants are valuable tools for peptide and protein modifications, achieving high ligation yields often requires the use of excess amounts of nucleophilic peptide due to the reversible nature of the enzymatic reaction. A variety of methods have been developed to circumvent this problem, including the removal of small glycyl leaving groups using dialysis,109 the use of unnatural starting material to enable quenching of the glycyl leaving groups,110 and a flow-based system.111 Furthermore, recent advances in protein engineering have yielded SrtA variants with a >100-fold increase in activity,112,113 as well as Ca2+-independent variants for cellular applications.114
Butelase 1 shows a strong preference for the catalysis of transpeptidation over the hydrolysis of Asx-containing substrates.116 Its efficiency enables low enzyme loading (<0.01 eq.), and its broad sequence promiscuity facilitates transpeptidation in a nearly traceless manner. In butelase 1 reactions, the preferred recognition sequence, NHV, is added to the C-terminus of the substrate (Fig. 6c). Meanwhile, for the nucleophilic peptide, the N-terminal sequence should start with either GX or XL (where X is any amino acid, including D-amino acids).117 Butelase 1 has been employed for backbone cyclisation of different peptides including the cyclotide kalata B1, sunflower trypsin inhibitor (SFTI), conotoxin MrIA, insect antimicrobial peptide thanatin, antimicrobial peptide histatin-3 and θ-defensin,116,117 as well as proteins including GFP, interleukin-1 receptor antagonist and somatropin.118
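The substrate requirements described above can be expressed as a simple sequence check. This is a rough sketch only: the function name and the example sequences are hypothetical, the rule set is reduced to the motifs named in the text (C-terminal Asx-His-Val; N-terminal GX or XL), and it assumes that the His-Val dipeptide following Asx is released as the leaving group so that Asx becomes the last residue joined to the N-terminus. Real substrate tolerance is broader and depends on context.

```python
def butelase_cyclic_product(linear):
    """Return the residues of the predicted cyclic product, or None.

    `linear` is a one-letter amino acid sequence (N- to C-terminus).
    Simplified rules taken from the text:
      * C-terminus ends in N-H-V or D-H-V (Asx followed by His-Val)
      * N-terminus starts with G-X or X-L
    Assumption: the trailing His-Val is released on cyclisation.
    """
    if len(linear) < 5:
        return None
    has_c_motif = linear[-3] in "ND" and linear.endswith("HV")
    has_n_motif = linear[0] == "G" or linear[1] == "L"
    if not (has_c_motif and has_n_motif):
        return None
    # Residues retained in the ring; the last one (Asx) is joined to the first.
    return linear[:-2]


# Hypothetical sequences used purely for illustration.
print(butelase_cyclic_product("GLSKTPDANHV"))  # GLSKTPDAN
print(butelase_cyclic_product("ASKTPDANHV"))   # None (N-terminus is neither GX nor XL)
print(butelase_cyclic_product("GLSKTPDAN"))    # None (no C-terminal Asx-His-Val)
```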
Similar to butelase 1, OaAEP1 exhibits broad sequence promiscuity. In particular, the variant OaAEP1-C247A, in which the peptide-binding domain is modified, was reported to be kinetically superior to the wild-type enzyme.119 While NGL is its native recognition sequence (Fig. 6d), the Gly residue can be replaced with most other amino acids (except Pro), and the Leu residue can be replaced with other bulky amino acids (e.g. Phe, Ile, Met, Trp).104 Moreover, OaAEP1 can be easily produced in E. coli without lengthy activation or purification steps, unlike many other asparaginyl endopeptidases.104 A variety of molecules have been cyclised by OaAEP1,104,119–121 including an intrinsically disordered protein, MSP2,121 which butelase 1 failed to cyclise.118
In addition to butelase 1 and OaAEP1, new asparaginyl endopeptidases are continuously being discovered and employed for peptide and protein cyclisation, including HaAEP1,122 MCoAEP2123 and VyPAL1–3.124 These advances not only provide more ligation tools but also enhance our understanding of the molecular mechanism of the enzymes, laying the foundation for engineering variants with versatile functions. Theoretically, asparaginyl endopeptidases can be employed in living systems, such as for the modification of cell surface proteins, although this has only been demonstrated with butelase 1.125
Microbial transglutaminase has been demonstrated as a useful peptide cyclisation tool, with benefits including its tolerance to a range of temperatures and pH values, irreversible ligation and commercial availability. Nevertheless, there are few examples in the literature of microbial transglutaminase-catalysed cyclisation, owing to low sequence specificity when the polypeptide of interest contains multiple lysine and glutamine residues.128
(1) Adenylation (A) domain activates a specific amino acid by reaction with ATP to generate the corresponding aminoacyl-adenylate.
(2) Thiolation (T) domain (also known as the peptidyl carrier protein) tethers the activated substrate to the enzyme through the formation of a thioester linkage.
(3) Condensation (C) domain catalyses the formation of a peptide bond between the activated acyl group and the free amino group of an amino acid on the neighbouring module.
Fig. 7 Non-ribosomal peptide synthesis by NRPS modules composed of adenylation (A), thiolation (T) and condensation (C) domains with a terminal thioesterase (TE) domain.
In this way, the peptide chain grows in the N-to-C-terminal direction, until it is released by a thioesterase (TE) domain through either hydrolysis, oligomerisation or cyclisation, catalysed by the active site Ser-His-Asp catalytic triad.132,133 However, the large size (often 100 to 300 kDa) and complex multidomain structures of NRPS make their heterologous expression challenging, which has led to the development of alternative methods of production.134 For example, cell-free protein synthesis was employed for the in vitro reconstitution of the non-ribosomal cyclic peptide valinomycin gene cluster. As a result, valinomycin was produced in a yield of ∼30 mg L−1, comparable to that of native Streptomyces organisms.135
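The modular logic of the A, T, C and TE domains can be traced with a small simulation. This is a conceptual sketch only: the function name, the log messages and the three-residue substrate list are invented for illustration, each module is assumed to select exactly one amino acid, and only two of the release modes mentioned above (hydrolysis and cyclisation) are modelled.

```python
def run_assembly_line(substrates, release="cyclisation"):
    """Trace a simplified NRPS assembly line.

    `substrates` lists the amino acid selected by each module, in order.
    Returns (product, log), where `log` records the domain-level steps.
    """
    log = []
    chain = []  # the growing peptide, N- to C-terminus
    for n, aa in enumerate(substrates, start=1):
        # A domain: activation as the aminoacyl-adenylate.
        log.append(f"module {n}: A domain activates {aa} as {aa}-AMP")
        # T domain: tethering as a thioester.
        log.append(f"module {n}: T domain tethers {aa} as a thioester")
        # C domain: peptide bond to the upstream chain (not needed for the first residue).
        if chain:
            log.append(f"module {n}: C domain joins {'-'.join(chain)} to {aa}")
        chain.append(aa)

    linear = "-".join(chain)
    # TE domain: release of the finished chain.
    if release == "cyclisation":
        product = f"cyclo({linear})"
    elif release == "hydrolysis":
        product = f"{linear}-OH"
    else:
        raise ValueError("release must be 'cyclisation' or 'hydrolysis'")
    log.append(f"TE domain releases the chain by {release}: {product}")
    return product, log


product, log = run_assembly_line(["Val", "Orn", "Leu"])  # hypothetical modules
print("\n".join(log))
print(product)  # cyclo(Val-Orn-Leu)
```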
In addition, TE domains can also function as isolated enzymes. For example, cyclic tyrocidine A was generated from its linear precursor by the TE domain of tyrocidine synthetase.136,137 Synthetic peptide substrates are activated by the attachment of N-acetylcysteamine (SNAC) to the C-terminus. In this way, the natural tethering of the peptide chain, through the cofactor 4′-phosphopantetheine, is imitated. Furthermore, NRPS can be merged with SPPS for cyclising peptides immobilised on solid supports.138
The ability of NRPS to incorporate unnatural and D-amino acids, as well as carry out modifications such as epimerisation, methylation and reduction, results in large structural diversity of peptide products. Although reprogramming of these complex enzymes has so far yielded mixed results,139 the potential for the manipulation of NRPS remains great, especially with advances in the understanding of NRPS. For some dedicated recent reviews on this subject please refer to references.140,141
Method | Unnatural functionality (a) | Cyclisation mode | Applicable substrates | Applicable to recombinant materials | Use in living systems |
---|---|---|---|---|---|
Direct coupling | Often required in the starting material | Backbone | Peptides | X | X |
Native chemical ligation | Incorporation of thioester motif | Backbone | Peptides, proteins | ✓ | X (b) |
Ser/Thr ligation | Required | Backbone | Peptides | X | X |
CyClick | Required | Backbone | Peptides | X | X |
Traceless Staudinger ligation | Required in the starting materials | Backbone | Peptides | X | X |
KAHA (type I and II) | Required | Backbone | Peptides | X | X |
TAMM | Required | Side chain | Peptides, proteins | ✓ | ✓ |
Cu-catalysed azide–alkyne cycloaddition (CuAAC) | Required | Side chain | Peptides, proteins | ✓ | X |
Strain-promoted azide–alkyne cycloaddition (SPAAC) | Required | Side chain | Peptides, proteins | X | ✓ |
Cysteine disulphides | Not needed | Side chain | Peptides, proteins | ✓ | ✓ |
Orthogonal disulphide pairing | Required | Side chain | Peptides | X | X |
Disulphide stapling | Present in the products | Side chain | Peptides, proteins | ✓ | X |
(a) Unnatural functionality refers to the presence of any moiety that cannot be produced from the 20 canonical amino acids.
(b) See section 4.1 for the recombinant extension to native chemical ligation (i.e. expressed protein ligation), which can be used in living systems.
Method | Cyclisation mode | Reaction motifs | Extra residues on the cyclic product | Use in living systems |
---|---|---|---|---|
Subtiligase variants | Backbone | Peptide ester/thioester + XX (a) | 0 | X |
Sortases | Backbone | LPXTG + GG | 6 (LPXTGG) | ✓ |
Asparaginyl endopeptidase – butelase 1 | Backbone | N/D-HV + G/L-X | 3 (NXX) | ✓ |
Asparaginyl endopeptidase – OaAEP1 | Backbone | NXX + X-F/I/L/M/W (b) | 3 (NXX) | To be demonstrated |
Microbial transglutaminase (S. mobaraensis) | Side chain | K + Q | 0 or 2 (Lys and Gln) | ✓ |
Protein tag – intein | Backbone | Split N- and C-inteins | 0 or 1 (Cys) | ✓ |
Protein tag – SpyTag/SpyCatcher | Side chain | Formation of an isopeptide bond between SpyTag/SpyCatcher | 129 | ✓ |
Protein tag – SpyLigase | Side chain | SpyLigase catalyses isopeptide bond formation between SpyTag/KTag | 23 | To be demonstrated |
Protein tag – SnoopTag/SnoopCatcher | Side chain | Formation of an isopeptide bond between SnoopTag/SnoopCatcher | 124 | To be demonstrated |
Protein tag – SnoopLigase | Side chain | SnoopLigase catalyses isopeptide bond formation between SnoopTagJr/DogTag | 35 | To be demonstrated |
(a) Subtiligase variants have very broad substrate scope but the identity of substrate residues in positions P1–P4 and P1′–P2′ influences the ligation efficiency. Generally, hydrophobic residues are preferred.
(b) OaAEP1 can recognise a wide range of substrates, although its native substrates have the sequence NGL + GL.
Mechanistically, the splicing begins with an N–S or N–O acyl shift leading to the formation of a (thio)ester intermediate, followed by trans(thio)esterification between the N- and C-exteins resulting in a branched intermediate. Next, intein excision proceeds through asparagine (or sometimes glutamine) cyclisation before an S–N or O–N acyl shift takes place to form the peptide bond between the exteins (Fig. 8a).
Fig. 8 (a) Mechanism of intein splicing (X = O or S). (b) Cyclisation by expressed protein ligation. (c) Intein-mediated backbone cyclisation.
Inteins can be used to generate recombinant peptidyl fragments with a C-terminal thioester (Fig. 8b), which is required in native chemical ligation. This concept, known as expressed protein ligation (EPL), has greatly expanded the scope of native chemical ligation and has also been applied to cyclise large recombinantly produced proteins (e.g. β-lactamase).22
While the majority of inteins exist in a single contiguous form, such as those employed for EPL, some inteins naturally exist as two separate polypeptide chains. These split inteins undergo a trans-splicing mechanism, in which the N- and C-intein fragments first associate to form the active complex in the correct conformation before the splicing takes place (Fig. 8c).143 A prominent example is found in the DnaE gene of cyanobacterium Nostoc punctiforme PCC73102. This split intein is known as Npu DnaE. It exhibits fast splicing kinetics (k = 3.7 × 10−2 s−1)144 and good tolerance towards extein sequence variations,145 compared to another commonly used DnaE split intein from Synechocystis sp. PCC6803 (Ssp DnaE) which is more sensitive to variation in the extein sequence around the splice junction.144,146 By correctly positioning the two fragments of a split intein at the two ends of a peptide, a cyclic peptide is generated upon intein splicing. This strategy, commonly referred to as split-intein circular ligation of peptides and proteins (SICLOPPS), has been employed for backbone cyclisation of peptides and proteins in E. coli, yeast and mammalian cells.23,147,148
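To put the quoted Npu DnaE rate constant into perspective, it can be converted into a half-life. The following is a back-of-the-envelope sketch that assumes simple first-order kinetics with the rate constant given above; the function name is introduced here for illustration, and real splicing time courses may deviate from this idealised model.

```python
import math

K_SPLICE = 3.7e-2  # s^-1, Npu DnaE splicing rate constant quoted in the text


def fraction_spliced(t_seconds, k=K_SPLICE):
    """Fraction of precursor spliced after t seconds, assuming first-order kinetics."""
    return 1.0 - math.exp(-k * t_seconds)


half_life = math.log(2) / K_SPLICE
print(f"half-life: {half_life:.1f} s")                     # about 18.7 s
print(f"spliced after 60 s: {fraction_spliced(60):.2f}")   # about 0.89
```

Under this assumption, roughly half of the precursor would be spliced within about 19 seconds and close to 90% within a minute.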
Overall, inteins are useful tools for backbone cyclisation that can be achieved in a (nearly) traceless manner, whereby only a single Cys/Ser residue remains at the ligation site after intein splicing takes place. Aided by an increased understanding of inteins and their widespread occurrence in nature, new inteins have been engineered with improved properties, although limitations still remain.149 Most commonly, the introduction of an intein can lead to protein misfolding. The relatively large size of the intein can also lead to reduced yields from recombinant expression.150–154 Lastly, gene expression of the Npu DnaE split intein has been reported to be toxic to E. coli, an issue that was circumvented by the incorporation of a degradation tag to remove any spliced intein fragments or unspliced starting material.155
Similarly, another isopeptide bond-forming pair, known as SnoopTag (12 residues) and SnoopCatcher (112 residues), was developed by splitting the D4 Ig-like domain of adhesin RrgA from Streptococcus pneumoniae.159 In this case, the isopeptide bond formation occurs between Asn and Lys residues and, importantly, it is orthogonal to the analogous SpyTag/SpyCatcher reaction. SnoopTag/SnoopCatcher was also employed to cyclise firefly luciferase.24 It was observed to confer improved stability to the cyclised product compared to the linear control, although the enhancement was not as great as that observed with the SpyTag/SpyCatcher system.
The spontaneous formation of an isopeptide bond makes the SpyTag/SpyCatcher and SnoopTag/SnoopCatcher methods practical and useful alternatives for protein cyclisation, achieved by genetically fusing the two components to the N- and C-termini of the target protein. Nevertheless, this approach leaves a large “scar”, with >100 amino acids remaining in the cyclised product after ligation. This has been addressed by the development of a catalytic system composed of three parts: SpyTag, KTag and SpyLigase. To do this, the 116-residue SpyCatcher was split into a 10-residue KTag and a 98-residue SpyLigase (the missing residues were omitted during the restructuring of the SpyCatcher fragment).160 In this system, the shorter peptidyl fragments SpyTag and KTag, containing the reactive Asp and Lys residues, respectively, are incorporated into the substrate(s) of interest and subsequently crosslinked by the addition of SpyLigase (Fig. 9c). Similarly, SnoopTag/SnoopCatcher was redesigned into a 12-residue SnoopTagJr, a 23-residue DogTag and a 104-residue SnoopLigase.161 While the use of SpyLigase or SnoopLigase significantly reduces the number of amino acid residues left on the cyclic product, the addition of >20 amino acid residues is still required by these means. Nevertheless, these approaches are all theoretically applicable in living systems, as demonstrated with SpyTag/SpyCatcher in various examples.162
| Class | Cyclisation approach | Advantages | Factors to be considered |
|---|---|---|---|
| Chemical | Disulphide bond formation | • Cysteine residues easily introduced chemically or recombinantly (easily applied to smaller peptides and larger proteins) | • Not stable under reducing conditions (i.e. intracellular environment); could be addressed through the use of disulphide stapling reagents |
| | | • Disulphide formation occurs readily under mild aqueous conditions | • Correct disulphide bond formation can be difficult to control and may lead to a mixture of products (addressed by orthogonal disulphide pairing but requires unnatural functionalities) |
| | Direct cyclisation | • Activation of the terminal carboxylic acid group allows reaction to proceed under mild conditions | • Nucleophilic and carboxylate amino acid side chains require protection to prevent side reactions |
| | | | • Limited to synthetic peptides |
| | CyClick | • Efficient, chemo- and stereoselective | • Requires an unnatural functionality in the starting material (C-terminal aldehyde); introduction can be achieved synthetically or through chemical modification (e.g. sodium periodate oxidation of Ser/Thr) |
| | | • Can be performed at high concentrations without the formation of side products from intermolecular reactions | |
| | Native chemical ligation | • Reaction proceeds in aqueous conditions at neutral pH | • A C-terminal thioester is required which is introduced synthetically (addressed by expressed protein ligation) |
| | | • Chaotropic agents and reducing agents are tolerated and in some cases preferable | • Some NCL extensions involve the use of an unnatural thiol- or selenol-containing N-terminal amino acid in place of cysteine |
| | | • Regio- and chemoselective reaction | |
| | | • A number of extensions have been developed for broader application (e.g. desulphurisation) | |
| | Staudinger ligation | • Traceless cyclisation | • Requires unnatural functionalities |
| | | • Chemoselective towards the azide (protecting groups not required) | • Phosphinothiols only have limited solubility in aqueous solution |
| | | | • Glycine residues required at the ligation site |
| | α-Ketoacid-hydroxylamine ligation (Type II KAHA) | • Cyclises both longer and shorter polypeptides | • Requires unnatural functionalities in the starting material |
| | | • O-Substituted hydroxylamine is water stable | • A homoserine residue is formed at the ligation site (although oxazetidine can be used in place of oxaproline to form a serine residue) |
| | | • Chemoselective | • Reaction is relatively slow |
| | | • Forms a native peptide bond | |
| | Cu-catalysed azide–alkyne cycloaddition | • Efficient and regioselective | • Requires unnatural functionalities (azide and alkyne) |
| | | • Requires only mild conditions and can be carried out in water | • Cu(I) catalyst must be generated in situ from Cu(II) by the use of excess reducing agent and Cu-stabilising ligands |
| | | | • Cu is toxic to cells and so is not suitable for use in vivo |
| | Strain-promoted azide–alkyne cycloaddition | • Circumvents the requirement for a copper catalyst | • Lacks regiospecificity (forms a mixture of regioisomeric products) |
| | | • Rapid reaction under physiological conditions | • Cyclooctyne reagents are relatively costly |
| Enzyme | Subtiligase (and variants) | • Very broad substrate scope | • Promiscuity may result in off-target modifications |
| | | • Effectively traceless | |
| | Sortase A | • Well-studied and understood enzyme | • Ligation is reversible as the LPXTG recognition sequence remains in the product |
| | | • Mutants have been developed for improved activity | • Relatively low catalytic efficiency |
| | | • Commercially available | • Ca²⁺ dependence, limited use in vivo |
| | Asparaginyl endopeptidase | • Exhibits high catalytic efficiency and requires a low enzyme-to-substrate ratio | • Short recognition sequence may lead to off-target modifications in larger proteins |
| | | • Relatively broad substrate scope and short recognition sequence | • Recognition sequence remains in the product leading to reversibility of the reaction |
| | | • Nearly traceless cyclisation (only Asx remains) | |
| | Microbial transglutaminase (S. mobaraensis) | • Calcium independent (in contrast to mammalian transglutaminases) | • Low substrate specificity |
| | | • Resulting isopeptide bond is chemically and proteolytically stable | • Preference for glutamine residues is unclear |
| | | • Tolerant to a broad range of reaction conditions | |
| | | • Commercially available | |
| | Non-ribosomal peptide synthetase | • Able to incorporate unnatural and D-amino acids, and can carry out modifications such as epimerisation, methylation and reduction | • So far reprogramming has yielded mixed results |
| | | • TE domains can function as isolated enzymes | • Large size makes heterologous expression challenging |
| | | | • A deeper understanding of these complex systems is required before their potential can be realised |
| Protein tag | Intein (expressed protein ligation and split intein-mediated splicing) | • Overcomes size limitation of native chemical ligation | • Can result in low protein yields after recombinant expression |
| | | • Almost traceless cyclisation (only Cys/Ser remains) | • Can lead to misfolding of the protein of interest |
| | | • Does not require separate expression and purification | |
| | SpyTag/SpyCatcher (and SnoopTag/SnoopCatcher) | • Wide range of reaction conditions | • Leaves a large scar at the ligation site (addressed by the development of Spy- and SnoopLigase) |
| | | • High yielding and fast | |
| | | • Does not require separate expression and purification | |
On the other hand, for peptide substrates without a specific three-dimensional conformation, factors such as ring size (i.e. length of linear precursor) and peptide sequence are known to affect cyclisation efficiency, as well as the reagents and conditions employed for cyclisation.167 Before cyclisation can take place, the reactive termini of the linear precursor must come into close proximity with one another, so that cyclisation is favoured over intermolecular reaction. Various strategies have been developed to facilitate ring closure by pre-organisation of the linear peptide into a conformation predisposed to promote cyclisation.28 Generally, these involve the use of templates, modifications to the peptide sequence to increase flexibility, or the introduction of turn-inducers (e.g. proline).168
Chemical cyclisation approaches often require non-native functionalities (see Table 1) or take place under non-physiological conditions (e.g. in organic solvent). As chemical synthesis is often favoured for the incorporation of non-native functionalities into specific positions within the polypeptide chain, it is perhaps the more convenient preparation method when using chemical cyclisation approaches. While it is also possible to introduce non-native functionalities by recombinant means such as genetic code expansion, there are more limitations in terms of the amino acid substrates that can be introduced (see section 2.4). In addition, peptides can be synthesised in fully protected forms as required for some chemical ligation methods such as direct amide bond formation. On-resin cyclisation can also be carried out, which can be advantageous over solution-phase approaches. The latter often require high dilution to minimise the intermolecular reactions that generate dimer and oligomer side products. High dilution can also lead to long reaction times, which in turn can lead to epimerisation at the ligation site.170 Using the on-resin approach, the antibacterial peptides polymyxin B2, E2 and a derivative were cyclised. The peptides were anchored to the resin by the amine group of a lysine side chain.171 Upon removal of C- and N-terminal protecting groups, the peptides cyclised efficiently in good yields and exhibited antibacterial activities comparable to those of natural polymyxins. Alternatively, peptides can be anchored to the resin by the C-terminal carboxyl group through the use of safety-catch linkers.172,173 For example, the cyclisation of brachystemin A was carried out successfully using this approach.174
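The reason high dilution suppresses dimer and oligomer formation can be illustrated with a simple competition model. The sketch below is not taken from the work cited here; it assumes that ring closure is first order in the linear peptide and that the competing intermolecular reaction is second order, so that their ratio is governed by an effective molarity (EM = k_intra/k_inter). The EM value used is hypothetical, and further oligomerisation and statistical factors are ignored.

```python
def cyclisation_fraction(effective_molarity: float, conc: float) -> float:
    """Initial fraction of reacting peptide that ring-closes rather than dimerises.

    Assumed model (illustrative only):
        rate_intra = k_intra * [P]
        rate_inter = k_inter * [P]**2
        EM = k_intra / k_inter            (units of mol/L)
        fraction = rate_intra / (rate_intra + rate_inter) = EM / (EM + [P])
    """
    if effective_molarity <= 0 or conc <= 0:
        raise ValueError("effective molarity and concentration must be positive")
    return effective_molarity / (effective_molarity + conc)


if __name__ == "__main__":
    em = 1e-2  # mol/L, hypothetical effective molarity for a flexible peptide
    for conc in (1e-1, 1e-2, 1e-3, 1e-4):
        frac = cyclisation_fraction(em, conc)
        print(f"[P] = {conc:.0e} M -> cyclisation fraction = {frac:.2f}")
```

In this model, lowering the peptide concentration below the effective molarity shifts the balance towards cyclisation, while pre-organisation strategies act by raising the effective molarity itself.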
Preparation of enzymes is often essential when they are used as catalysts for cyclisation. However, this process can be time-consuming and labour-intensive, unless the enzyme is commercially available (e.g. sortase). Generally, enzymatic cyclisation can be applied to material produced by any means, although care should be taken with larger proteins which may contain multiple recognition sequences and thus result in off-target modifications causing degradation of the linear starting material and/or cyclised product.121
Protein tags used for cyclisation, such as inteins and isopeptide bond-forming Tag/Catcher partners, are generally composed of sequences of ∼100 amino acid residues. It is therefore convenient to recombinantly express these long protein tag sequences alongside that of the polypeptide sequence of interest, as opposed to chemical synthesis followed by ligation to generate the required sequence. Moreover, it should be noted that while peptides are usually produced in a low yield when using recombinant production methods,175 the addition of the large flanking protein tag sequences required for cyclisation can facilitate their preparation by cellular machinery. When using intein-mediated cyclisation, these long sequences are excised during the cyclisation process and so do not remain in the product.155
With regard to proteolytic stability, it may be useful to consider the mode of cyclisation, i.e. head-to-tail versus side chain cyclisation. Exopeptidases are enzymes that hydrolyse the terminal amide bonds of polypeptide chains. Thus, side chain cyclisation approaches which leave the termini free, for example disulphide bond formation, may leave the polypeptide susceptible to proteolytic degradation. For this reason, a head-to-tail cyclisation approach should result in improved resistance to proteolysis. For example, β-lactamase, cyclised using an intein-mediated ligation approach, was shown to be resistant to treatment by carboxypeptidase Y, which hydrolyses C-terminal amide bonds.22 On a related note, proteolytic treatment can be used to test if cyclisation of a polypeptide has taken place, whereby digestion would only be observed if the linear form is present.23
In phage display, bacteriophages that have been genetically modified to display unique peptides on the surface of their coat proteins are screened for desired activity (e.g. selection by binding to a target molecule).178 By sequencing the phage DNA, the identity of the target-binding peptide can be determined. Phage display is a well-established and effective technique, and cyclic peptides can be generated by intramolecular disulphide bond formation between cysteine residues located on either side of a randomised amino acid sequence (i.e. CXnC, where Xn is any number of any amino acids).179–181 However, cyclic peptides produced by this method are susceptible to reduction (i.e. linearisation) under reducing conditions. Alternatively, cyclic peptides can be generated using bioorthogonal reactions. For example, TAMM condensation was used to generate a cyclic peptide library on bacteriophages, providing potent cyclic peptide binders to Bcl-2, Mdm2 and Keap1.60
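When analysing sequencing output from such a selection, the CXnC format described above can be checked with a simple pattern match. The following is a minimal sketch, assuming one-letter codes for the 20 canonical amino acids and a fixed loop length n; the helper names and the example sequences are hypothetical and not drawn from the cited libraries.

```python
import re
from typing import Optional

CANONICAL = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 canonical amino acids


def cxnc_regex(n: int) -> "re.Pattern[str]":
    """Pattern for a Cys, n randomised residues, then a second Cys."""
    if n < 1:
        raise ValueError("loop length must be at least 1")
    return re.compile(rf"C([{CANONICAL}]{{{n}}})C")


def randomised_loop(peptide: str, n: int) -> Optional[str]:
    """Return the n-residue loop if the whole peptide fits CXnC, else None."""
    match = cxnc_regex(n).fullmatch(peptide.upper())
    return match.group(1) if match else None


if __name__ == "__main__":
    for seq in ("CAGTWYPLC", "AAGTWYPLC", "CAGTWC"):
        print(seq, "->", randomised_loop(seq, 7))
```

Because Xn may itself contain cysteine, a match only confirms the library format; it does not establish which cysteines are paired in the displayed peptide.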
Using SICLOPPS, cyclic peptide libraries can be prepared intracellularly.23 Here, a peptide library is created by randomisation of the extein sequence encoded by degenerate oligonucleotides.182 Upon excision of the split intein sequences, the termini of the extein sequence are ligated and the cyclic peptide generated via the formation of a native peptide bond. Cyclic peptide libraries have been generated using a variety of host cells, including E. coli, yeast and mammalian cells.147,183,184 This intracellular generation of cyclic peptides is advantageous, as it enables the use of cell-based screening against intracellular protein targets, as opposed to in vitro screening which does not always accurately reflect activity and function in vivo. Furthermore, it has been demonstrated that cyclic peptide libraries bearing non-canonical amino acids can be generated using this approach.185 However, like phage display peptide libraries, the maximum number of library members is limited by transformation efficiency of the host cells.182 In addition, the use of split inteins leads to certain extein sequence requirements and sometimes toxicity towards E. coli (see section 4.1).
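The limit imposed by transformation efficiency can be made concrete by comparing the theoretical sequence space with the number of transformants obtained. This is a rough sketch under stated assumptions: each randomised position can be any of the 20 canonical amino acids, and the transformant count used in the example is a hypothetical figure rather than a value from the text. Full coverage in practice would need oversampling beyond this simple bound.

```python
def peptide_diversity(n_random: int, alphabet: int = 20) -> int:
    """Number of distinct peptides with n_random fully randomised positions."""
    if n_random < 0:
        raise ValueError("number of randomised positions must be non-negative")
    return alphabet ** n_random


def max_fully_sampled_length(n_transformants: float, alphabet: int = 20) -> int:
    """Largest number of randomised positions whose full sequence space
    does not exceed the number of transformants (no oversampling)."""
    n = 0
    while alphabet ** (n + 1) <= n_transformants:
        n += 1
    return n


if __name__ == "__main__":
    transformants = 1e8  # hypothetical library size after transformation
    for n in range(4, 9):
        print(f"{n} randomised residues -> {peptide_diversity(n):.2e} sequences")
    print("max fully sampled length:", max_fully_sampled_length(transformants))
```

The sequence space grows by a factor of 20 per randomised residue, so even efficient transformation covers only short randomised regions completely.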
A third strategy for cyclic peptide library generation is mRNA display. In this approach, the linear peptide is attached to its encoding mRNA sequence through a puromycin linker at the C-terminus.186,187 As such, highly efficient side chain-to-side chain or side chain-to-N-terminus cyclisation approaches are required.188 Disulphide bond formation between two cysteine residues is the most straightforward of these strategies. However, the use of bridging reagents and the incorporation of non-native amino acids189,190 can afford a wide range of possible cyclisation patterns, especially in the presence of multiple reactive residues.191 While non-specific cyclisation can make hit deconvolution difficult, increased library diversity and therefore investigation of a broader range of cyclic peptide scaffolds is advantageous.192 Recently, using an approach based on native chemical ligation, head-to-tail cyclisation of peptides compatible with mRNA display was reported, further broadening structural variety of mRNA display peptide libraries.193
This journal is © The Royal Society of Chemistry 2021