Holly E. Morgan, W. Bruce Turnbull* and Michael E. Webb*
School of Chemistry and Astbury Centre for Structural Molecular Biology, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK. E-mail: w.b.turnbull@leeds.ac.uk; m.e.webb@leeds.ac.uk
First published on 5th May 2022
Site-specific protein modification is a widely-used biochemical tool. However, there are many challenges associated with the development of protein modification techniques, in particular, achieving site-specificity, reaction efficiency and versatility. The engineering of peptide ligases and their substrates has been used to address these challenges. This review will focus on sortase, peptidyl asparaginyl ligases (PALs) and variants of subtilisin, detailing how their inherent specificity has been utilised for site-specific protein modification. The review will explore how the engineering of these enzymes and substrates has led to increased reaction efficiency, mainly through enhanced catalytic activity and reduced reversibility. It will also describe how engineering peptide ligases to broaden their substrate scope is opening up new opportunities to expand the biochemical toolkit, particularly through the development of techniques to conjugate multiple substrates site-specifically onto a protein using orthogonal peptide ligases.
This review will explore the key examples of peptide ligases used for protein modification, focusing mainly on sortase, the leading enzyme in the field. The peptidyl asparaginyl ligases Butelase-1, OaAEP1 and VyPAL2 and peptide-ligating variants of subtilisin will also be discussed (Scheme 2). We will highlight the challenges associated with this approach to protein modification and describe how engineering of peptide ligases and their substrates has been used to address them. The three principal challenges in developing new methods are ensuring specificity, efficiency and versatility: that modification is site-specific and generates well-defined conjugates; that it is time- and reagent-efficient; and that it is versatile (Scheme 1). We will first discuss each class of enzyme from the perspective of engineering enhanced catalytic activity. The review will then focus on examples of substrate engineering that aim to reduce the reversibility of the ligation reaction and thus drive conversion of substrates to products. Studies that have broadened substrate specificity will then be presented, before the final section illustrates how these advances have created new opportunities in the field of protein modification, in particular the use of orthogonal peptide ligases to conjugate multiple substrates site-specifically onto a protein.
Scheme 1 Summary of the strategies described in this review for optimising peptide ligase reactions to enable complex protein modification, including both substrate and protein engineering.
The activity of sortase has been extensively exploited for protein and peptide modification. This strategy requires purified SaSrtA, a donor substrate containing the C-terminal LPX1TGX2 recognition motif and an acceptor molecule with a sterically unhindered N-terminal glycine residue. While the recognition sequences for sortases are typically given in the literature in the form LPXTG, and are written this way in this review for clarity, in general the required recognition motif is LPX1TGX2 (Scheme 2C), where X2 is either a C-terminal amide or another amino acid residue; proteins or peptides in which the glycine of the motif has a free carboxylic acid group are not substrates for sortases. In the authors’ experience, this additional requirement is frequently overlooked by those using sortases for the first time. For C-terminal protein modification, an LPXTG recognition motif is required at the C-terminus of the protein and the substrate to be ligated must contain an N-terminal glycine residue. The accessibility and flexibility of both the N- and C-terminal regions affect the efficiency of the reaction.23–25 One downside to C-terminal labelling is that the LPXTG sequence must be engineered into the protein. The method is also of limited use for modifying cell-surface proteins, which most commonly have intracellular C-terminal regions and extracellular N-terminal regions and thus cannot be labelled in this way.26 Alternatively, N-terminal protein labelling involves ligation of a labelling substrate bearing a C-terminal LPXTG motif to a protein with an N-terminal glycine.25 This requires minimal engineering of the protein: only a single N-terminal glycine in a sterically unhindered position.
Many commercial expression plasmids have N-terminal protease recognition sequences that, when cleaved, result in a protein that already possesses an N-terminal glycine.27,28 There is also potential for internal labelling of a protein by introducing a flexible loop into the protein.23 Guimaraes et al. demonstrated a method where a loop, containing the LPXTG recognition motif followed by a specific protease cleavage site, was introduced between two cysteine residues which formed a disulfide bond in the protein. The flexibility of the loop was increased by nicking the loop with a protease, allowing the sortase-mediated reaction to occur as it would for a C-terminal labelling reaction. If the loop is flexible and accessible, proteolysis may not be required.
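As a minimal sketch of the substrate requirements described above, the hypothetical helpers below check a one-letter sequence for a usable LPX1TGX2 donor motif and for an N-terminal glycine acceptor. The function names and the "-NH2" suffix used to mark a C-terminal amide are our own conventions rather than an established tool, and the check deliberately ignores the accessibility and flexibility of the termini, which also affect ligation efficiency.

```python
import re

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

# LPX1TGX2: X1 is any residue; X2 is another residue or a C-terminal amide
# (written here as the suffix "-NH2", a hypothetical convention).
# A motif ending in a free glycine carboxylate (...LPXTG) does not match.
DONOR_MOTIF = re.compile(rf"LP[{STANDARD_AA}]TG(?:[{STANDARD_AA}]|-NH2)")


def is_sortase_donor(seq: str) -> bool:
    """True if `seq` contains a usable LPX1TGX2 recognition motif."""
    return DONOR_MOTIF.search(seq) is not None


def is_sortase_acceptor(seq: str) -> bool:
    """True if `seq` starts with a glycine nucleophile."""
    return seq.startswith("G")


print(is_sortase_donor("MSKGEELPETGG"))  # motif followed by another Gly
print(is_sortase_donor("MSKGEELPETG"))   # free C-terminal acid on the motif Gly
print(is_sortase_donor("LPETG-NH2"))     # C-terminal amide
print(is_sortase_acceptor("GGGSK"), is_sortase_acceptor("MGGGSK"))
```

The second call returns False because nothing follows the motif glycine, which is exactly the case the text warns is often overlooked.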
Over the years, sortase-mediated ligation has proven itself to be a key protein conjugation technique. It has been used for a variety of applications including protein–protein fusion,29,30 protein cyclisation,31–33 immobilisation of proteins onto artificial surfaces34,35 and introducing novel functionality, such as fluorescent tags,36 peptides,37 lipids38 and toxins39 into proteins site-specifically. However, it does possess some limitations and a significant amount of work has been carried out to increase the catalytic efficiency, eliminate the dependence on calcium ions, increase the rate of transpeptidation and reduce the rate of hydrolysis and reaction reversal. Many of these challenges have been addressed through enzyme engineering.
Fig. 1 Exemplar yeast and phage constructs used for directed evolution of sortases. In both cases, sortases are encoded by phage or yeast cells and the activity of the encoded sortase is probed by addition of a biotinylated sortase substrate (e.g. Biotinyl-LPETGG), which enables isolation of phage or yeast encoding active sortases. (A) Aga1p–Aga2p strategy used by Chen et al. to increase sortase activity.44 (B) M13 phage strategy used by Piotukh et al. to identify sortases with altered specificity.45
Sortase | Recognition sequence | Ref. | Notes |
---|---|---|---|
Wild-type sortases | | | |
SaSrtA (Staphylococcus aureus) | LPXTG | Ton-That et al.40 | Anchors proteins to the cell wall in vivo; poor kinetics in vitro; calcium dependent |
SrtB (Bacilli, Listeria and S. aureus) | NPQTN | Mazmanian et al.58 | Found in the iron-responsive determinant locus (involved in iron acquisition, important in bacterial pathogenesis); anchors IsdC to the cell surface |
SrtC (Actinomyces, Corynebacteria, Enterococci and Streptococci) | QVPTG | McCafferty & Melvin59 | Polymerisation of pilin proteins |
SrtD (sporulating Gram-positive bacteria) | LPNTA | Marraffini & Schneewind60 | Responsible for targeting BasH and BasI in sporulating bacilli |
SpSrtA | LPXTG/LPXTA | Race et al.21 | Calcium independent |
BaSrtA | LPXTG | Weiner et al.22 | Calcium independent |
SavSrtE | LAXTG/LPXTG | Das et al.13 | Calcium independent |
CdSrtA | LPLTG | McConnell et al.61 | Generates isopeptide bonds to Lys in the WxxxVxVYP motif in pilin |
Sortases with enhanced catalytic activity | | | |
eSrtA (SrtA(5M)) P94R/D160N/D165A/K190E/K196T | LPXTG, LPEXG (X = A, C, S), LAETG | Chen et al.44 | Evolved from SaSrtA; improved kinetics |
SrtA(5M/Y187L/E189R), SrtA(5M/D124G) | LPXTG | Chen et al.46 | Evolved from SaSrtA and SrtA(5M); improved reaction for C- and N-terminal labelling, respectively |
E105K/E108A and E105K/E108Q mutants | LPXTG | Hirakawa et al.50 | Evolved from SaSrtA; calcium independent |
SrtA(7M) P94R/E105K/E108Q/D160N/D165A/K190E/K196T | LPXTG | Wuethrich et al.52 | Evolved from SaSrtA; improved kinetics, calcium independent |
Sortases with altered specificity | | | |
SrtLS (SaSrtA β6/β7 loop exchanged for SaSrtB β6/β7 loop) | NPQTN | Bentley et al.62 | Evolved from SaSrtA; only catalyses acylation, not transpeptidation |
F40-sortase T164Q/V168M/L169H/D170L/E171A/Q172E | XPKTG (X = A, D, S), APATG | Piotukh et al.45 | Evolved from SaSrtA |
F1-21 sortases V161Y/K162W/P163A/T164N/D165E/V166R/G167I/V168F/L169H/D170V/E171L | APXTG/FPXTG | Schmohl et al.63 | Evolved from SaSrtA |
eSrtA(2A-9) S102C/A104H/E105D/K138P/K152I/N160K/K162H/T164N/K173E/I182V/T196S | LAETG | Dorr et al.64 | Evolved from SrtA(5M) |
eSrtA(4S-9) N98D/S102C/A104V/A118T/F122A/K134R/F144L/I182V/E189F | LPEXG (X = A, C, S) | Dorr et al.64 | Evolved from SrtA(5M) |
SrtAβ I76L/S102C/E105D/N107E/S118I/I123L/D124L/N127H/G134R/K138L/G139D/M141I/K145T/K152R/M155I/R159C/K162R/Q172H/K73E/K177R/V182A/V189Y/T196S/R197S/K206R | LMVGG | Podracky et al.65 | Evolved from 4S-6 (LPESG-specific) |
SpSrtA M3 E189H/V206I/E215A | LPXTG | Zou et al.66 | Evolved from SpSrtA; recognises N-terminal GG, AA, SS and CC substrates |
Sortases with increased stability | | | |
SaSrtA rM4 P94S/D160N/D165A/K196T | LPXTG | Zou et al.67 | Evolved from SaSrtA; higher activity than WT at ambient temperature but lower thermal stability; resistant to DMSO |
SaSrtA CyM6 P94S/D160N/D165A/K196T/R159N/K162P, head-to-tail cyclised | LPXTG | Zou et al.68 | Evolved from SaSrtA (through rM4); improved thermostability and resistance to chemical denaturation |
Fig. 2 Location of mutations in sortase variants mapped onto the crystal structure of sortase A. (A) Apo crystal structure of WT sortase A determined by Zong et al. (1t2p).18 (B) Structure of an LPETG peptide bound to sortase A (1t2w). (C) Location of mutations observed in eSrtA(2A-9), shown in purple;64 the active-site cysteine is shown in yellow. (D) Location of mutations observed in eSrtA(4S-9), shown in red.64 (E) Location of mutations observed in SrtA(5M)44 (orange) and SrtA(7M)52 (orange and blue). (F) Location of mutations found in SrtAβ (dark green).65
Further improvements in efficiency over eSrtA have been obtained using a FRET screening approach.46 In this case, site-saturation mutagenesis at a set of rationally selected sites on WT SrtA or eSrtA was employed alongside error-prone PCR of the whole enzyme. The libraries were screened using a sortase ligation-dependent FRET pair of eGFP-LPETG and GGG-cpVenus. In particular, the 5M/Y187L/E189R variant was found to be highly effective for C-terminal antibody modifications and the 5M/D124G variant was superior for N-terminal antibody modification (Table 1).
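The principle behind a ligation-dependent FRET readout can be illustrated with the standard Förster relation, E = 1/(1 + (r/R0)^6): ligation tethers eGFP to cpVenus and places donor and acceptor within transfer distance, whereas unligated proteins in dilute solution are usually much farther apart than R0. The Förster radius and distances below are illustrative assumptions, not values reported for this screen.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


# Hypothetical values: R0 = 5 nm; a ligated fusion with the fluorophores
# ~5 nm apart versus free proteins assumed to be ~50 nm apart on average.
R0 = 5.0
ligated = fret_efficiency(5.0, R0)      # r = R0 gives E = 0.5
unligated = fret_efficiency(50.0, R0)   # r = 10 * R0, negligible transfer
print(f"ligated E = {ligated:.2f}, unligated E = {unligated:.1e}")
```

The sixth-power dependence is what makes the signal close to binary: only covalently joined donor and acceptor pairs transfer energy efficiently, so FRET reports on the ligation product.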
Another strategy for sortase evolution, SortEvolve, was reported by Zou et al.47 This approach uses a high-throughput screening platform in microtitre plate format and was validated by recovering the same set of mutations. In this case, sortase mutants mediate fusion of the laccase CueO bearing a C-terminal LPETGGGRR tag to GGG-eGFP-LCI. The degree of ligation was then assayed by LCI-mediated immobilisation of the fusion product onto polypropylene plates and quantification of the immobilised laccase using 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS). To validate this system, site-saturation mutagenesis (SSM) libraries of SaSrtA were generated at three positions (P94, D160 and D165). Each SSM library was screened independently in one 96-well microtitre plate, and the previously reported P94S, D160N and D165A mutants were identified. A further recombinant SaSrtA variant, P94T/D160L/D165Q, was characterised and showed a 22-fold improvement in catalytic efficiency compared with the wild-type protein.
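Why a single 96-well plate can be enough for one site-saturation library follows from simple sampling statistics. Assuming NNK-type degenerate codons (the source does not state the codon scheme), each position has 32 equally represented codons, and the expected fraction of codons sampled after picking T clones is 1 − (1 − 1/32)^T. This sketch ignores transformation bias and any wells a real plate would reserve for controls.

```python
import math


def fraction_codons_sampled(clones: int, codons: int = 32) -> float:
    """Expected fraction of codons picked at least once in `clones` random picks.

    Assumes every codon is equally represented (32 for an NNK library).
    """
    return 1.0 - (1.0 - 1.0 / codons) ** clones


def clones_for_coverage(target: float, codons: int = 32) -> int:
    """Smallest number of picks giving at least `target` expected coverage."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / codons))


print(f"96 clones: {fraction_codons_sampled(96):.1%} of codons expected")
print("clones needed for 95% coverage:", clones_for_coverage(0.95))
```

Under these assumptions a full plate samples roughly 95% of the codons at one position, which is why libraries at single positions can be screened plate by plate.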
More recently, Li et al.48 have investigated the behaviour of intermediate sortase variants in which only a subset of these mutations is included, and highlighted that some of these appear to be optimal for a different range of applications. They determined that each variant has advantages suited to specific applications when considering rate of reaction, extent of hydrolysis, purification constraints, temperature and additives (e.g., detergent requirements).
An alternative approach was reported by Hirakawa et al.,50 who used a structure-guided alignment of SaSrtA with the calcium-independent enzymes SpSrtA and BaSrtA to develop SaSrtA variants with Ca2+-independent catalytic activity. This alignment showed that Glu105 and Glu108 are not conserved in SpSrtA or BaSrtA. In SpSrtA, the residue corresponding to Glu105 is Lys126, which forms a salt bridge with Asp196 (Glu171 in SaSrtA) that may stabilise the closed conformation of the β6/β7 loop in place of calcium ions. In SaSrtA, Glu105, Glu108 and Glu171 coordinate Ca2+.51 Hirakawa therefore hypothesised that substitution of Glu108 with an uncharged amino acid, together with substitution of Glu105 with Lys, would moderate the negative charge concentrated in the calcium-binding site and overcome the calcium dependency of SaSrtA. Consistent with this, both double mutants, E105K/E108A and E105K/E108Q, enhanced protein ligation in the absence of calcium without drastically affecting substrate specificity (see Fig. 2). Overall, however, the calcium-independent activity of these proteins was ∼65% lower than the calcium-dependent activity of WT SaSrtA.
The Ploegh group combined the eSrtA pentamutant with the second of these calcium-independent variants (E105K/E108Q) to create the heptamutant SrtA(7M).52 This variant has a 40-fold higher kcat/KM for LPETG than the double mutant (E105K/E108A). These mutations thus yielded a catalytically efficient, calcium-independent sortase (see Fig. 2 and Table 1). Despite the obvious advantages of the pentamutant and heptamutant, these enzymes are not optimal for all applications, as they are prone to higher levels of hydrolysis if reactions are not carefully monitored.51 Different variants suit different applications, as made evident by Li et al.,48 who subsequently evaluated the SaSrtA variants 3M, 4M and 5M for a range of ligation reactions.
PALs, which are found exclusively in plants, are characterised by their ability to catalyse bond formation under near-neutral conditions. These enzymes are best exemplified by Butelase-1,76 and OaAEP1,79 whose endogenous activities are the production of cyclic peptides. PALs cleave the peptide bond following the P1 residue of the recognition motif to form a thioester intermediate, which is then attacked by an N-terminal nucleophilic acceptor (X1X2) to form a new peptide bond with the P1 residue (Scheme 2D). The specificity for the N-terminal substrate is often even looser than the C-terminal tripeptide recognition motif, allowing a wide variety of sequences in the product peptide. Hemu et al.81 proposed that the difference in activity between AEPs and PALs is due to the amino acid composition of the substrate-binding grooves flanking the S1 pocket of the enzymes, particularly the ‘gatekeeper’ residue (termed the ligase-activity determinant 1 region, LAD1) and residues found in LAD2 that are centred around the S2 and pockets. Combining structural analysis and mutagenesis studies, they determined that, for an efficient PAL, the first position in LAD1 is preferably bulky and aromatic (Trp/Tyr) and the second position (the gatekeeper) is hydrophobic (Val/Ile/Cys/Ala). Conversely, a Gly at the gatekeeper position favours proteolysis, as is observed in the AEPs. For LAD2, small hydrophobic dipeptides (e.g., GlyAla/AlaAla/AlaPro) are favoured in PALs, as they retain the leaving group, blocking access to the thioester bond until another peptide acts as a nucleophile. In the case of AEPs, a bulky residue such as Tyr at the first position may destabilise the acyl–enzyme intermediate by facilitating departure of the cleaved peptide and exposing the acyl–enzyme thioester to water. Using these insights, they were able to re-engineer a protease from Viola canadensis (VcAEP) into an effective peptide cyclase with a single point mutation of this Tyr residue to alanine in the LAD2 region.
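The Hemu et al. criteria summarised above amount to a short decision rule. The sketch below encodes them only as stated in this paragraph; the function name, the input format (one-letter codes for the first LAD1 residue, the gatekeeper and the LAD2 dipeptide) and the "uncertain" fallback are our own simplifications, and real ligase versus protease activity depends on much more than these residues.

```python
# Residue preferences for efficient PALs, as summarised in the text above.
LAD1_FIRST_PREFERRED = {"W", "Y"}            # bulky, aromatic
GATEKEEPER_PREFERRED = {"V", "I", "C", "A"}  # hydrophobic
LAD2_PREFERRED = {"GA", "AA", "AP"}          # small hydrophobic dipeptides


def classify_lad_residues(lad1_first: str, gatekeeper: str, lad2: str) -> str:
    """Return 'ligase-like', 'protease-like' or 'uncertain' (hypothetical rule)."""
    # Features the text associates with AEP-type proteolysis:
    # Gly at the gatekeeper, or a bulky Tyr at the first LAD2 position.
    if gatekeeper == "G" or lad2.startswith("Y"):
        return "protease-like"
    if (lad1_first in LAD1_FIRST_PREFERRED
            and gatekeeper in GATEKEEPER_PREFERRED
            and lad2 in LAD2_PREFERRED):
        return "ligase-like"
    return "uncertain"


# Hypothetical residue combinations, not real enzyme sequences:
print(classify_lad_residues("W", "V", "GA"))  # all preferred features
print(classify_lad_residues("W", "V", "YA"))  # bulky Tyr in LAD2
print(classify_lad_residues("W", "V", "AA"))  # the same after a Tyr -> Ala swap
```

The last two calls mirror, in spirit, the single Tyr-to-Ala change used to convert VcAEP into a cyclase.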
A distinct advantage of Butelase-1 is that it is the fastest known ligase, with a very high catalytic efficiency. A typical butelase-mediated reaction requires 100- to 1000-fold less enzyme than a reaction carried out with sortase A. It also has a shorter recognition sequence (Asx–His–Val) than sortase and broader tolerance for the first N-terminal residue in intermolecular peptide and protein ligations; however, it is limited by the identity of the second residue. The first applications of butelase-1 were chiefly limited by its availability, since it could only be obtained by extraction from plant tissue. Nguyen et al.76 first attempted to express the enzyme recombinantly in E. coli in 2014; however, it was only obtained in an insoluble form. Only very recently have James et al.87 successfully expressed recombinant butelase-1 in E. coli. The enzyme was produced as an inactive zymogen, which is the native form of AEPs and PALs, and matured by autoactivation at low pH in a protocol mimicking the natural process in plants. The recombinant protein possessed an N-terminal His6 tag followed by a GS linker and the full-length butelase-1 sequence (minus the 20-residue endoplasmic reticulum signal peptide). After purification of the N-terminally His-tagged zymogen, dialysis at pH 4.0 led to cleavage of both the C-terminal propeptide, which blocks the active site, and the N-terminal propeptide. As part of the same study, the crystal structure of the purified zymogen was solved, which may allow butelase-1 to be engineered to avoid the need for an activation step in the future. In contrast to this multi-step protocol in E. coli, butelase-1 could be successfully produced following overexpression in the yeast Pichia pastoris.88 In this case, export into the ER of the yeast cells also promoted formation of disulfide bonds among the five cysteine residues present in butelase-1, enabling folding of the active enzyme.
The availability of recombinant butelase-1 will open many more opportunities for protein engineering in the near future.
Because the earlier lack of a recombinant expression system limited supplies, most studies of butelase-1 activity have demonstrated its application following immobilisation. For example, Hemu et al.89 immobilised butelase-1 using three different attachment methods: non-covalent affinity capture using both concanavalin A-agarose beads, which recognise butelase-1 glycans, and NeutrAvidin beads, which bind the biotinylated enzyme, as well as covalent attachment via direct coupling of amines to NHS ester-functionalised agarose beads. The immobilised butelase-1 was reusable for >100 runs with undiminished activity, lowering enzyme consumption. Immobilisation also enhanced the stability and prolonged the shelf life of the enzyme compared with the soluble form by reducing aggregation and autolysis into less active forms. In particular, immobilisation increased the effective concentration of the enzyme, accelerating ligation reactions such as cyclisation and cyclo-oligomerisation under one-pot conditions or in a continuous-flow reactor.
The substrate specificity of the acyl-donating side of subtiligase is assumed to be retained from subtilisin BPN′, which has been extensively studied structurally and biochemically.97–105 However, acyl-acceptor preference screening has been carried out specifically for subtiligase.106,107 In particular, an approach called proteomic identification of ligation sites (PILS) has been applied to identify N-terminal substrate specificity.107 Using peptides derived from proteolysis of E. coli cell lysates, it is possible to profile rapidly the ligation efficiency of >25 000 different potential substrates, which can then be identified by isolating the ligated peptides and sequencing them by LC-MS/MS. This allowed rapid determination of the preferred substrate specificity (Scheme 2G). The position preferentially binds small amino acids, Met or basic residues, and the position is preferentially aromatic, large, and hydrophobic. Mutation of subtiligase was also used to map the residues in the enzyme responsible for this specificity, revealing that Tyr217 and Phe189 are the primary determinants of and specificity, respectively.
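The kind of analysis that underlies a PILS-style specificity profile can be sketched as a positional enrichment calculation: compare how often each residue occurs at a given acyl-acceptor position (conventionally P1′, P2′, ...) among ligated peptides with its frequency in the input pool. This is a simplified illustration with invented toy data, not the published PILS pipeline; the pseudocount and log2 scoring are our own choices.

```python
import math
from collections import Counter


def positional_enrichment(ligated, background, position=0, pseudocount=1.0):
    """log2(frequency among ligated / frequency in input) for each residue
    at `position` of the acceptor N-terminus (0 = P1', 1 = P2', ...)."""
    lig = Counter(p[position] for p in ligated if len(p) > position)
    bg = Counter(p[position] for p in background if len(p) > position)
    residues = sorted(set(lig) | set(bg))
    # Pseudocounts avoid log(0) for residues never seen in one of the sets.
    lig_total = sum(lig.values()) + pseudocount * len(residues)
    bg_total = sum(bg.values()) + pseudocount * len(residues)
    return {
        aa: math.log2(((lig[aa] + pseudocount) / lig_total)
                      / ((bg[aa] + pseudocount) / bg_total))
        for aa in residues
    }


# Toy data: N-terminal dipeptides of input peptides and of those that ligated.
background = ["AK", "AK", "MK", "DK", "EK", "SF", "GW"]
ligated = ["AK", "AK", "MK", "SF"]
for aa, score in positional_enrichment(ligated, background).items():
    print(aa, f"{score:+.2f}")
```

Positive scores mark residues over-represented among ligated acceptors; in a real experiment the background would be the full set of N-terminal sequences in the proteolysed lysate.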
Subtiligase has been utilised in many applications, including peptide cyclisation,106 the synthesis of thioesters108 and the synthesis/semi-synthesis of large proteins.109 For example, Wells et al.109 used the enzyme to perform the total synthesis of ribonuclease A from six peptide fragments. Owing to the chemoselectivity of subtiligase for the protein N-terminus, the enzyme can be utilised for site-specific protein modification.106 The first example of this was the modification of human growth hormone, in which the N-terminal structural and sequence requirements for efficient ligation were explored. In this case, introducing an extended N-terminal sequence to the protein resulted in higher modification yields, as is often the case for other peptide ligases. Other advantages of the enzyme are that it can be recombinantly expressed in high yields and that only a sub-stoichiometric amount of enzyme is required. The principal disadvantages of subtiligase, however, are that the enzyme only works on peptide ester substrates as acyl donors and that a large excess of acyl acceptor/donor is required to suppress the hydrolytic reaction. Near-quantitative ligation of peptide substrates could be obtained using a 10-fold excess of some acyl acceptors, suggesting that this approach had promise for peptide assembly but that further optimisation was required.94
Both subtilisin and subtiligase are calcium-dependent owing to the presence of a calcium-binding domain required for efficient folding of the proteolytic domain. Deletion of this domain from subtiligase and addition of a set of 18 stabilising mutations previously identified for subtilisin111 yielded a calcium-independent variant of subtilisin, peptiligase. This enzyme can be easily expressed in Bacillus subtilis and has high catalytic efficiency.112 Peptiligase catalyses peptide bond formation between C-terminal carboxamidomethyl ester fragments and N-terminal acyl-acceptor nucleophiles. The peptiligase-mediated reaction is very selective for peptide ligation over hydrolysis, giving conversions of 60–80% using only 1.5 equivalents of acyl acceptor. Peptiligase was also used to synthesise head-to-tail macrocyclic peptides, producing a 21-mer macrocycle in 82% yield. The enzyme was also shown to be functional in the presence of organic solvents and denaturants. Synthetic peptide libraries have subsequently been used to map the specificity of the acyl-acceptor side of peptiligase.113 Unlike subtiligase, peptiligase accepts only the small amino acids Ser, Gly and Ala at the position, as dictated by interactions with Met213 and Leu208 in the enzyme (analogous to Met222 and Tyr217 in subtiligase), while a hydrophobic residue is still required at the position. Although these enzymes are effective for peptide couplings, the overall substrate concentrations typically used (10 mM) in reports of subtiligase-mediated reactions are at least an order of magnitude higher than those used for protein modification reactions, and most reports of this peptide ligase describe peptide rather than protein applications, as discussed later.
Scheme 4 Substrate engineering strategies employed to enhance product yields with SrtA, Butelase and OaAEP1. (A) Formation of a β-hairpin prevents binding of SrtA to the reaction product.115 (B) Hydroxyacetamide products are not substrates for the reverse reaction.25 (C) Cyclisation of the diglycyl motif with loss of serine generates a diketopiperazine.117 (D) A GlyGlyHis motif is a ligand for Ni2+ in solution which sequesters the product peptide as an inactive complex.118 (E) β-Thioacetamide products are not substrates for the reverse reaction.119 (F) Enzyme selectivity is exploited: while OaAEP1 can act on a NGL sequence to form an NGV product, the NGV sequence is a poor substrate.91 (G) The product peptide with an N-terminal cysteine is sequestered by formation of a complex.90
A second, more widely adopted approach is to use substrates that generate an inactive by-product. One example is the use of ester-containing substrates to modify a protein: these release an alcohol-containing by-product that is not a substrate for the enzyme, thereby rendering the reaction irreversible (Scheme 4B). Antos et al.114 first demonstrated this with methyl ester-containing substrates (Scheme 4B; LPRT-OMe); however, stoichiometric quantities of sortase and excess substrate were required to achieve quantitative labelling, presumably because the methyl ester was a poorer substrate for sortase than the peptide product. Williamson et al.25 instead generated depsipeptide substrates that more closely mimicked the peptide substrate, in that only the amide nitrogen of the Thr–Gly linkage was replaced with an oxygen to generate an ester linkage (LPEToGG). This technique was used to label a range of proteins with essentially quantitative ligation yields using around 2–3 equivalents of the labelling reagent and 20 mol% sortase.25,116 Williamson's results showed that depsipeptide substrates allow rapid labelling of both peptides and proteins using a small excess of substrate and catalytic quantities of sortase. An alternative ester substrate generated by Liu et al.117 placed the ester linkage outside the sortase recognition motif.
In this case, LPETGG-isoacyl-Ser/Hse-containing substrates were used to modify a protein N-terminally (Scheme 4C). Upon ligation, the released by-product spontaneously cyclises to generate a diketopiperazine. One potential advantage of these substrates is that these esters are reportedly more stable than the substrates of Antos and Williamson. Despite this lower stability, depsipeptide substrates have seen numerous applications, including the profiling of N-terminal glycine-containing proteins.120 Most recently, this approach has been used by Wang et al., in combination with an HPXTG-specific sortase, to generate a wide range of engineered histone H2B variants with complexly modified N-termini.121
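The benefit of inactivating the by-product can be illustrated with a simple equilibrium model. As an assumption for illustration only (not a measured value), suppose the forward and reverse transpeptidations are equally favourable, so the equilibrium constant K is about 1, since one amide bond to glycine is exchanged for another. With the protein at unit concentration and n equivalents of labelling reagent, the equilibrium conversion x satisfies K(1 − x)(n − x) = x²; for K = 1 this reduces to x = n/(1 + n), so 3 equivalents give only 75% labelling, whereas an irreversible reaction is not capped in this way.

```python
import math


def equilibrium_conversion(equivalents: float, K: float = 1.0) -> float:
    """Fraction of protein labelled at equilibrium for
    donor + protein <=> labelled protein + by-product,
    with protein at unit concentration and `equivalents` of donor.

    Solves K * (1 - x) * (equivalents - x) = x**2 for 0 <= x <= min(1, equivalents).
    """
    a = 1.0 - K
    b = K * (1.0 + equivalents)
    c = -K * equivalents
    if abs(a) < 1e-12:  # K == 1: the quadratic term cancels
        return -c / b
    disc = math.sqrt(b * b - 4.0 * a * c)
    upper = min(1.0, equivalents)
    for root in ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)):
        if -1e-12 <= root <= upper + 1e-12:  # keep the physically meaningful root
            return root
    raise ValueError("no physical root")


for n in (1, 2, 3, 10):
    print(n, "equivalents:", round(equilibrium_conversion(n), 3))
```

The slow approach to full conversion as n grows is why strategies that remove or deactivate the by-product can reach near-quantitative labelling with only a small excess of reagent.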
In a third approach, Row et al.118 employed a technique that deactivated the by-product through nickel coordination (Scheme 4D). In this case, the labelling substrate motif was extended to LPXTGGH; the GGH tripeptide released by the reaction chelates Ni2+, thereby sequestering this peptide and preventing its participation in the reverse reaction. Building on this work,122 the group further developed and optimised this metal-associated sortase-mediated ligation (MA-SML) approach, using peptide model studies to establish the structural features required of ligation substrates and nucleophiles. With the extended C-terminal recognition motif LPXTGGHH5 and a solution additive (Ni2+), modification of full-size proteins with fluorophores, PEG and a bioorthogonal cyclooctyne moiety was achieved. An advantage of the MA-SML approach is that it can be applied to both N-terminal and C-terminal sortase labelling, unlike the ester approach, which is only appropriate for N-terminal labelling. However, large quantities of Ni2+ are required, which may not be compatible with all protein systems or with in vivo applications.
In all of the approaches described above, the general strategy is to chemically ‘remove’ the by-product species from the reaction equilibrium in order to drive the reaction to completion. An alternative approach, pioneered by Freiburger et al.123 for the preparation of segmentally labelled samples for NMR, is to physically remove the by-product from the reaction mixture. This can be achieved by carrying out coupling reactions in centrifugal concentrators, such that the released peptide (which is smaller than the molecular weight cutoff of the device) is removed from the reaction mixture by cycles of concentration and dilution. This approach can be effective where a C-terminal labelling species is large relative to the peptide by-product and where the proteins involved can tolerate repeated cycles of centrifugal concentration.
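The physical-removal strategy can be quantified with an idealised dilution model. Assuming the released peptide passes the membrane freely and mixes completely, each cycle that concentrates the sample f-fold and re-dilutes it to the starting volume leaves a fraction 1/f of the by-product, so n cycles leave (1/f)^n. The 10-fold concentration step below is a hypothetical example, not a value from Freiburger et al.

```python
def byproduct_remaining(fold_concentration: float, cycles: int) -> float:
    """Fraction of a freely filterable by-product left in the retentate.

    Each cycle concentrates the sample `fold_concentration`-fold (e.g. 10 mL
    down to 1 mL) and re-dilutes it to the starting volume. Assumes ideal
    mixing and no membrane retention, and ignores further by-product
    released as ligation continues between cycles.
    """
    if fold_concentration < 1:
        raise ValueError("fold_concentration must be at least 1")
    return (1.0 / fold_concentration) ** cycles


for cycles in range(1, 4):
    print(cycles, "cycles:", f"{byproduct_remaining(10.0, cycles):.1e}")
```

In this model, three 10-fold cycles reduce the by-product to about 0.1% of its starting amount; in practice, ligation continues between cycles and releases more by-product, which each subsequent cycle then removes.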
Cong et al. have recently described a different approach to the engineering of sortase substrates.124 They focused on the difficulty of producing proteins with N-terminal glycines for N-terminal labelling without relying on the action of methionine aminopeptidase or signal peptidase in the cell, or on engineered recognition sites for proteases such as TEV protease, to reveal the N-terminal glycine sequence. To address this challenge, Cong et al. developed a one-step ‘swapping’ approach for site-specific N-terminal sortase labelling and protein fusion of recombinantly produced proteins. Proteins were overexpressed with an N-terminal MH6-LPETG5 motif; addition of sortase then revealed the glycine motif in situ, enabling coupling to a labelling peptide that also contained the sortase motif. While this approach worked well for near-quantitative labelling of the protein, a substantial excess (5–15-fold) of the labelling peptide was required. This approach was also used to produce the C–N protein fusion VHH-GFP via sortase-mediated coupling of VHH-LPETGGH6 and MH6LPETG5-GFP; in this case, although product was formed, an excess of the VHH-LPETGGH6 protein was required to drive ligation.
Iwai and co-workers showed that the OaAEP1 C247A variant also recognises an NCL motif.125,126 This property was utilised by Tang et al.90 to develop an alternative chemo-enzymatic strategy to reduce the reversibility of OaAEP1-mediated ligation for both N- and C-terminal labelling. In their approach, the CL-terminated peptide formed as a result of ligation between a C-terminal NCL motif and an N-terminal GL is sequestered via reaction with 2-formylphenylboronic acid to form a thiazolidine (Scheme 4G).127,128 This sequestration reaction is extremely efficient, with a bimolecular rate constant of up to 10⁵ M−1 s−1. The technique was utilised for both site-specific C-terminal and N-terminal protein labelling.90 Using 2 equivalents of a labelling peptide, it was possible to achieve between 75% and 92% labelling at C-termini and 79% at the N-terminus. The high yields achieved with only 2 eq. of labelling substrate illustrate the effectiveness of the approach at a relatively low label-to-protein ratio.
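To put the quoted rate constant in perspective, consider a minimal sketch that assumes 2-formylphenylboronic acid (2-FPBA) is present in excess over the released CL peptide. In that case, thiazolidine formation is pseudo-first-order, with k_obs = k2[2-FPBA] and t1/2 = ln 2 / k_obs. The 2-FPBA concentrations below are hypothetical and not taken from Tang et al.

```python
import math


def pseudo_first_order_half_life(k2: float, reagent_conc: float) -> float:
    """Half-life (s) of a species consumed by an excess reagent.

    k2 is the bimolecular rate constant (M^-1 s^-1) and reagent_conc the
    reagent concentration (M), treated as constant because it is in excess.
    """
    k_obs = k2 * reagent_conc  # pseudo-first-order rate constant, s^-1
    return math.log(2) / k_obs


# Hypothetical 2-FPBA concentrations; k2 is the upper value quoted in the text.
for conc in (1e-3, 1e-4, 1e-5):
    t_half = pseudo_first_order_half_life(1e5, conc)
    print(f"[2-FPBA] = {conc:.0e} M -> t1/2 = {t_half:.3g} s")
```

At 10⁵ M−1 s−1, even 10 µM reagent gives a half-life below one second. Under these assumptions, the released peptide is captured far faster than it could re-enter the reverse ligation.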
As discussed earlier, SpSrtA from S. pyogenes, which is also Ca2+-independent, has been used in protein labelling reactions, accepting both LPXTG and LPXTA motifs.114 Nikghalb et al.133 have subsequently investigated the substrate specificity of a range of sortase A enzymes of staphylococcal and streptococcal origin. In general, streptococcal sortases accept a broader range of substrates than SaSrtA, including LPXTG, LPXTA and LPXTS motifs and consequently N-terminal Gly, Ala and Ser nucleophiles. In particular, Streptococcus pneumoniae sortase A, which recognises the LPXTS substrate, was used for site-specific modification of the N-terminal serine residue of a 48-residue antimicrobial peptide. Additionally, Schmohl et al.134 determined that streptococcal sortases show a strong preference for an LPXLG motif over LPXTG. These results highlight the potential of alternative sortases, but many of these have not been extensively exploited, often because of the low catalytic activity of the isolated enzymes.
Zou et al.66 have recently reported the design of SpSrtA variants with improved transpeptidase activity towards different N-terminal amino acid residues. Based on sequence alignment of sortase A from different species they identified conserved residues near the active site suitable for mutation. Three SpSrtA variants (S141G, V206I, and T209D) were generated and assayed using a protein fusion system between a C-terminal LPETG motif and an N-terminal AA-motif. SpSrtA V206I showed significantly improved activity in comparison to WT SpSrtA. Subsequently, site-saturation mutagenesis in the β6/β7 and β7/β8 loops using the optimised SortEvolve47 high-throughput assay described above led to identification of four variants (E189H, E189V, E215A and E215G) with improved activity (≥1.3-fold). The SpSrtA E189H/V206I/E215A M3 triple mutant showed 6.8-fold increased transpeptidase activity when compared to WT. This catalyst could then be used for conjugation of AA-, SS- and CC-terminated motifs to model proteins and for circularisation of eGFP constructs with N-terminal AA and SS-motifs.
The pilus-specific sortase from Corynebacterium diphtheriae (CdSrtA) has also been exploited for protein modification.61,135 The enzyme was originally considered to be a sortase A enzyme and was named accordingly. However, unlike sortase A, CdSrtA functions as a pilin polymerase and can therefore be categorised as a class C sortase. The enzyme covalently links SpaA pilin subunits together via lysine isopeptide bonds. This linkage is formed between an internal WxxxVxVYPK pilin motif in the N-terminal domain and a C-terminal LPLTG motif. Following formation of an acyl–enzyme intermediate between catalytic Cys222 and the LPLTG motif, the intermediate is attacked by the reactive Lys190 residue within the pilin motif of NSpaA, resulting in a Thr494–Lys190 isopeptide bond between the CSpaA and NSpaA domains of adjacent pilin subunits. Overexpressed WT CdSrtA is catalytically inactive in vitro because an N-terminal polypeptide lid segment masks the enzyme's active site. Introduction of the D81G and W83G lid mutations activates the enzyme, and a soluble catalytic domain with these mutations is able to site-specifically ligate the isolated NSpaA and CSpaA domains in vitro.135 Introduction of a third mutation (N85A) further increases activity, giving 35% more product after a 24 h incubation. The conjugation reaction catalysed by this triple mutant (CdSrtA 3M) enables site-specific lysine labelling through formation of an isopeptide bond, but it requires two specific motifs and is currently limited in yield. Nonetheless, it provides an interesting avenue for future engineering studies.
This chimeric protein consisted of the SrtB Lys174–Asp215 loop inserted between SrtA Asp160 and Lys177 (renumbered to Lys203 as the SrtB loop is 26 residues longer). This replacement of the β6/β7 loop in SrtLS was sufficient to change the specificity profile towards NPQTN by more than 700000-fold, confirming that the β6/β7 loop is the primary substrate recognition site. However, SrtLS was only able to catalyse hydrolysis of the motif, not the ligation reaction. This may indicate that the swapped loop prevents the nucleophilic substrate from accessing the active site, or that SrtB enzymes contain additional domains that are necessary for transpeptidation. Nevertheless, the study illustrated the potential for engineering the substrate specificity of SaSrtA.
In an alternative approach, Piotukh et al.45 demonstrated the first use of directed evolution to identify a SrtA mutant with broader substrate specificity. To achieve this, a library of 10⁸ sortase mutants was constructed and screened for sortases that recognise the FPXTG motif. This motif was selected because bioinformatic analyses indicate that it occurs in nature, and marginal ligation of this motif has also been observed with SaSrtA. To produce the library, six amino acids in the β6–β7 loop of sortase, representing solvent-exposed positions in spatial proximity to the leucine of the LPXTG motif, were randomised. The sortase library was generated with N-terminal pentaglycine motifs and C-terminally fused to the pIII protein of M13 phage (Fig. 1B). Exposure of the phage library to biotin-GFPKTGGRR-NH2 peptides therefore led to covalent modification of those phage displaying mutants that promoted the ligation reaction rather than hydrolysis; phage encoding active mutants could then be enriched via streptavidin capture. Three rounds of selection yielded a set of four mutants that, following subcloning and overexpression, were shown to tolerate a range of amino acids at the first position of the motif. Of these, the F40 mutant (Table 1) was found to prefer FPXTG to LPXTG, but ligation efficiency was low. However, the mutant had remarkably broad specificity and in fact preferred Ala at the first position of the motif. Ligation reactions of an APKTG-containing peptide with SrtA-F40 resulted in 55% labelling after 24 h. Although the mutant has reduced activity compared with SaSrtA, it could still be used to modify histone H3, a protein that has a native APATG motif located at the interface between the globular fold and the tail.
Building upon this work, Schmohl et al.63 established a second-generation sortase library, with the β6–β7 loop randomised at nine positions, based on a more recently determined NMR structure of sortase A.136 This structure had revealed a different conformation of the β6–β7 loop in the substrate-bound state, indicating that the residues initially selected for randomisation may not have been ideal for evolving SaSrtA. A redesigned SaSrtA library was therefore generated, which also varied the length of the β6–β7 loop. The library comprised approximately 2 × 10⁸ mutants and was screened using the previously established phage display system to identify mutants that accepted substrates containing APXTG or FPXTG recognition motifs. This led to the identification of the F1-21 mutant (Table 1), which accepted both sorting motifs efficiently and showed the highest activity of all sortase mutants isolated by phage display so far. Most of the isolated mutants contained β6–β7 loops that were longer than the native loop.
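A rough calculation helps put these library sizes in context. The sketch below assumes (an assumption made here, not a detail from either study) that all 20 amino acids are equally likely at every randomised position and that each clone is an independent, uniform draw from sequence space; codon bias, stop codons and the loop-length variation of the second-generation library are ignored. The function names are illustrative only.

```python
import math


def sequence_space(n_positions: int, alphabet: int = 20) -> int:
    """Distinct protein sequences when n positions are fully randomised."""
    return alphabet ** n_positions


def expected_coverage(library_size: float, n_positions: int) -> float:
    """Expected fraction of sequence space present in a library.

    Assumes each clone is an independent, uniform draw from sequence space,
    so the copy number of any given sequence is ~Poisson(library / space).
    """
    mean_copies = library_size / sequence_space(n_positions)
    return 1.0 - math.exp(-mean_copies)


# First-generation library: six randomised loop positions, ~1e8 clones
print(sequence_space(6))                     # 64000000
print(f"{expected_coverage(1e8, 6):.0%}")    # 79%
# Second-generation library: nine randomised positions, ~2e8 clones
print(sequence_space(9))                     # 512000000000
print(f"{expected_coverage(2e8, 9):.2%}")    # 0.04%
```

Under these simplifying assumptions, the first library could contain most of its six-position sequence space, whereas the nine-position library samples only a small fraction of its much larger space, so success depends on the selection enriching rare active clones.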
In another study, Dorr et al.64 evolved two orthogonal sortase variants with altered specificity based on eSrtA: eSrtA(2A-9), which carries 11 mutations and recognises LAXTG, and eSrtA(4S-9), which carries 9 mutations and recognises LPXSG (Table 1). The yeast display screen that had been used to evolve eSrtA was modified for this application by adding a negative selection against recognition of off-target substrates.44 Nine rounds of yeast display screening, with concomitant refinement of library design and screening strategy, yielded eSrtA variants reprogrammed to recognise new substrates, with specificity changes of up to 51000-fold relative to eSrtA and minimal loss of catalytic activity. Both eSrtA(2A-9) and eSrtA(4S-9) strongly prefer their target LAXTG and LPXSG substrates, respectively, over the LPXTG substrate, with up to 24-fold specificity for their target substrates. Mutational dissection of the two variants revealed the importance of residue 104 for enzyme activity and for specificity at position 2 of the sortase motif. In combination, residues 104, 118 and 182 determine activity and specificity at position 4 of the motif. Furthermore, eSrtA(4S-9) was shown to modify the human protein fetuin A (recognition sequence LPPAG) in unmodified human plasma with high efficiency and specificity, which was not achievable with WT SrtA or eSrtA. Both variants could be used to mediate rapid synthesis of doubly modified fluorophore–protein–PEG conjugates and to functionalise GGG-linked surfaces simultaneously and orthogonally with target peptides.
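The orthogonality logic of these variants can be illustrated with a simple motif matcher. This is a deliberately simplified, hypothetical model: each enzyme is reduced to the single pentapeptide motif it prefers (X = any residue), whereas the real enzymes show graded preferences (up to 24-fold) rather than absolute on/off recognition. The dictionary `PREFERRED_MOTIF` and the helper names are illustrative only.

```python
import re

# Hypothetical simplification: one preferred motif per enzyme, X = any residue.
PREFERRED_MOTIF = {
    "eSrtA": "LPXTG",
    "eSrtA(2A-9)": "LAXTG",
    "eSrtA(4S-9)": "LPXSG",
}


def motif_regex(motif: str) -> re.Pattern:
    """Convert a motif such as 'LPXTG' into a regex where X matches any residue."""
    return re.compile(motif.replace("X", "[A-Z]"))


def predicted_enzymes(sequence: str) -> list[str]:
    """Enzymes whose preferred motif occurs in a one-letter protein sequence."""
    return [name for name, motif in PREFERRED_MOTIF.items()
            if motif_regex(motif).search(sequence.upper())]


def is_orthogonal_pair(site_a: str, site_b: str) -> bool:
    """True if each site is predicted to be processed by exactly one, different enzyme."""
    a, b = predicted_enzymes(site_a), predicted_enzymes(site_b)
    return len(a) == 1 and len(b) == 1 and a != b


print(predicted_enzymes("LAETGG"))
print(predicted_enzymes("LPESGG"))
print(is_orthogonal_pair("LAETGG", "LPESGG"))
```

In this toy model LAETGG is assigned only to eSrtA(2A-9) and LPESGG only to eSrtA(4S-9), so the pair counts as orthogonal, whereas two LPXTG sites would not. Because real specificities are relative, such a check is at best a first filter before experimental testing.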
Recently, the substrate specificity of SaSrtA has been reprogrammed to modify the Alzheimer's disease (AD)-associated Aβ protein, which contains an LMVGG sequence at residues 34–38.65 The yeast display and FACS strategy used to evolve the eSrtA enzymes was also applied here.64 Evolution started from 4S.6, one of the sortase variants previously evolved to recognise LPESG,64 on the rationale that a mutant already possessing altered substrate recognition at the fourth position would be a more promising starting point. SrtAβ was generated after 16 rounds of evolution, in which the library pool was diversified in each round by error-prone PCR, site-saturation mutagenesis and DNA shuffling. Over the course of the rounds, the concentration of biotinylated LPVGG used as the positive selection substrate was decreased, as was that of the off-target, non-biotinylated LPESG, to increase the stringency of the screen. The resulting SrtAβ enzyme had 25 amino acid changes (Table 1) relative to the parent sequence 4S.6. These ranged from substitutions at positions known to mediate sortase specificity to substitutions at residues that are highly conserved in naturally occurring sortases, and these diverse changes provide insights into the mechanisms of sortase function. Compared with the starting enzyme 4S.6, SrtAβ had 53-fold reduced activity on LPESG, 11-fold reduced activity on LPPAG and 28-fold increased activity on LMVGG. Overall, the directed evolution process led to a 1500-fold change in the preference of SrtAβ for LMVGG over LPESG compared with SrtA(4S.6). The evolved enzyme, SrtAβ, was used to generate conjugates of Aβ monomers with peptides such as GGGK(biotin) and GGGRR, validating the evolution of epitope-specific enzymes as a strategy for site-specific labelling of endogenous peptides. SrtAβ could also conjugate peptides to endogenous Aβ in human cerebrospinal fluid (CSF), making it a promising tool for the study of amyloid proteins.
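The overall 1500-fold figure is consistent with the two individual activity changes if the change in preference is taken as the ratio of the fold changes on the two substrates (an interpretation assumed here), where k denotes activity on the indicated substrate:

```latex
\frac{\left(k_{\mathrm{LMVGG}}/k_{\mathrm{LPESG}}\right)_{\mathrm{SrtA\beta}}}
     {\left(k_{\mathrm{LMVGG}}/k_{\mathrm{LPESG}}\right)_{\mathrm{4S.6}}}
  = \frac{28}{1/53} = 28 \times 53 = 1484 \approx 1.5 \times 10^{3}
```

The reduced activity on the off-target substrate therefore contributes as much to the specificity change as the gain on the target sequence.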
Piper et al.137 have reported the effect of mutations in the β7–β8 loop region on the activity of SpSrtA. As discussed above, this enzyme is able to act on an LPX1TX2 sequence where X2 is Ala, Ser or Gly, but its activity is otherwise relatively low. Wojcik et al. had previously shown that grafting the β7–β8 loop from SaSrtA into SpSrtA generated an LPXTG-specific enzyme.138 Piper et al. instead investigated SpSrtA chimeras in which the β7–β8 loop from SrtA enzymes of a variety of other Gram-positive bacteria was grafted into the SpSrtA backbone.137 Many of these chimeras, such as SpSrtAfaecalis (in which three amino acids were substituted), were able to catalyse reaction of LPX1TX2 peptide substrates faster than SpSrtA. Most interestingly, some of these enzymes were also able to act on a wider range of amino acid nucleophiles, including SpSrtAfaecalis, which was shown to act on an LPXTV sortase recognition sequence.
Finally, and most recently, Wang et al. used site-directed mutagenesis to combine mutations found in F40-sortase (which has relaxed specificity for the first position of the recognition motif), eSrtA and the Ca-independent Srt7M and Srt7Y to generate a range of candidate sortases to act on an HPDTG motif found in histones.121 Screening against a candidate fluorescent substrate peptide containing this motif was sufficient to identify a mutant with suitable activity for the subsequent generation of site-specifically modified histones.
In terms of the acyl-acceptor side, mapping of the S1′ and S2′ pockets led to the production of subtiligase mutants with altered substrate specificity for P1′ and P2′ residues.107 The mapping was achieved by alanine scanning and quantifying the resultant changes in ligation specificity using the PILS method. Based on the results, ‘hot spot’ positions 189 (S2′ pocket) and 217 (S1′ pocket) were then targeted for saturation mutagenesis, and the specificity profiles of the mutants were analysed using PILS. This showed that Y217K/R mutants improved reactivity towards sequences with an acidic P1′ residue, whereas Y217D/E mutants more efficiently modified sequences with a His, Lys, Ser or Arg P1′ residue. The F189Q/K/R mutants improved modification of peptides with an acidic P2′ residue. However, several F189 variants were expressed at much lower levels than WT subtiligase. Oxidation of Met222 is known to affect enzyme activity,139 and this oxidation was also observed in the enzyme variants. Mutation of position 222 to alanine or glycine can improve subtilisin activity and enhance the aminolysis-to-hydrolysis ratio of subtiligase.113,140 The F189 and Y217 mutations, along with M222A, were therefore also introduced into the subtiligase heptamutant stabiligase.107 The resultant variants were expressed at levels comparable to WT subtiligase and maintained the specificity profiles of the mutants. The M222A mutation eliminated methionine oxidation and improved the ligation-to-hydrolysis ratio.
To demonstrate the application of these mutants,107 recombinant antibodies with N-terminal Ser-Asp on the light chain and N-terminal Glu-Ile on the heavy chain were produced. Based on the PILS specificity maps, these N-termini were predicted to be poor substrates for wild-type subtiligase, and this was confirmed experimentally. The Y217K mutant quantitatively labelled the heavy chain; however, no measurable labelling of the light chain was observed using the F189R/M222A mutant. This was attributed to inaccessibility of the N-terminus, and after addition of a four-amino-acid linker, 62% ligation was achieved. To enable wider application of the generated mutants, a web-based tool, α-Amine Ligation Profiling Informing N-terminal Modification Enzyme Selection (ALPINE), was established to aid the selection of the optimal subtiligase variant for modification of a particular N-terminal sequence.107 No single mutant that recognises all N-terminal sequences has yet been discovered, so selection of an appropriate mutant is important.
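The qualitative trends described for the Y217 and F189 mutants can be turned into a simple rule of thumb for shortlisting variants. The function below is a hypothetical sketch, not the ALPINE tool or its scoring; it assumes, consistent with the antibody example, that Y217 governs the first N-terminal residue (P1′) and F189 the second (P2′), and it encodes only the trends summarised for the saturation mutants.

```python
ACIDIC = set("DE")
# P1' residues reported to be modified more efficiently by Y217D/E mutants
Y217_ACIDIC_MUTANT_TARGETS = set("HKSR")


def candidate_variants(n_terminus: str) -> list[str]:
    """Shortlist subtiligase mutation families for an N-terminal sequence.

    n_terminus: one-letter sequence starting at the N-terminus, so the first
    residue is P1' and the second is P2'. Returns an empty list when none of
    the encoded trends applies.
    """
    if len(n_terminus) < 2:
        raise ValueError("need at least the P1' and P2' residues")
    p1, p2 = n_terminus[0].upper(), n_terminus[1].upper()
    suggestions = []
    if p1 in ACIDIC:
        suggestions.append("Y217K/R (acidic P1')")
    if p1 in Y217_ACIDIC_MUTANT_TARGETS:
        suggestions.append("Y217D/E (His/Lys/Ser/Arg P1')")
    if p2 in ACIDIC:
        suggestions.append("F189Q/K/R (acidic P2')")
    return suggestions


print(candidate_variants("EI"))  # heavy-chain N-terminus Glu-Ile
print(candidate_variants("SD"))  # light-chain N-terminus Ser-Asp
```

For Glu-Ile the sketch returns only the Y217K/R family, matching the mutant that labelled the heavy chain; for Ser-Asp it returns both a Y217D/E and an F189 option, whereas the study used F189R/M222A. Accessibility of the N-terminus, which limited light-chain labelling, is outside the scope of any sequence-only rule.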
Omniligase-1, one of the broad-specificity variants, is commercially available and has been used for chemo-enzymatic peptide synthesis (CEPS),142 protein semi-synthesis and head-to-tail macrocyclisation of various linear peptides with a free N-terminus.143 Omniligase-1 provides an efficient inter- and intramolecular peptide ligation method for almost any peptide sequence and is scalable and robust enough for industrial application. For example, the enzyme was used in the large-scale synthesis of the 39-residue peptide drug exenatide.144
Scheme 5 Examples of application. (A) Strategy for the dual labelling of both termini of the same protein using SpSrtA and SaSrtA. Adapted from Antos et al.114 (B) Strategy for the triple labelling of distinct capsid proteins in an M13 bacteriophage particle. Adapted from Hess et al.141
Hess et al. have also demonstrated orthogonal labelling with SpSrtA and SaSrtA by functionalising distinct capsid proteins in the same M13 bacteriophage particle. First, the N-terminus of pVIII was labelled with TAMRA-LPETGAA via SpSrtA, followed by N-terminal labelling of pIII with an antibody-LPETG via SaSrtA.145 The group then demonstrated triple capsid protein labelling (Scheme 5B),141 which was achieved by engineering a loop containing a Factor Xa cleavage site and an LPXTG motif into pIII. The first label, containing the LPETGAA motif, was attached to the N-terminus via SpSrtA. Cleavage with Factor Xa revealed the LPXTG motif in the loop. SaSrtA could then be used simultaneously to label pIII at this C-terminal site with a triglycine-containing substrate and to label a pentaglycine installed at the N-terminus of pIX with an LPETGG-containing substrate.
The sortases SaSrtA and CdSrtA 3M have also been used for sequential site-specific dual labelling.104 These two enzymes are orthogonal because they recognise distinct nucleophiles: an N-terminal glycine for SaSrtA and a lysine in a pilin motif for CdSrtA. A fusion protein consisting of a SUMO protein with an N-terminal pentaglycine peptide and a C-terminal pilin motif (G5-SUMOPM) was produced. The protein was first incubated with CdSrtA 3M and FITC-LPLTGpep to yield G5-SUMOPM-FITC through conjugation of the threonine of the peptide to the lysine of the pilin motif. After removal of excess FITC-LPLTG peptide using a desalting column, the protein was incubated with AlexaFluor546-LPATG and SaSrtA. The threonine of this peptide was conjugated to the N-terminal glycine of the protein, producing the double-labelled product. The advantage of this approach is that the distinct nucleophile and sorting signal specificities of each sortase limit cross-reactivity. CdSrtA 3M is unable to hydrolyse the LPATG sequence or use it as a transpeptidation substrate; it is specific for LPLTG. Conversely, the isopeptide bond created by CdSrtA 3M is not significantly hydrolysed by SaSrtA or CdSrtA after 24 hours.
Despite these advances using other natural sortases, there is currently only one example of orthogonal sortase labelling using SaSrtA enzymes with altered specificity. Such an approach would be advantageous because SaSrtA has been studied extensively, both to understand its structure and mechanism and to engineer the enzyme and its substrates for efficient labelling strategies. Le Gall et al.146 used a CRISPR/Cas9-based strategy to convert a hybridoma secreting mIgG1 antibodies (anti-CD20 WT) into a stable daughter cell line producing Fab′ fragments carrying two distinct sortase motifs: an eSrt2A-9 (LAETGG) motif on the heavy chain and an eSrt4S-9 (LPESGG) motif on the light chain (anti-CD20 DTFab′). The DTFab′ molecules could be isolated and modified at the sortase motif sites. Upon incubation with either sortase mutant, eSrt2A-9 or eSrt4S-9, in the presence of a GGG-C-K(FAM) peptide, exclusive fluorescent labelling was detected at the heavy chain (HC) or light chain (LC) labelling site, respectively. No cross-reactivity was seen for either reaction, indicating that the close proximity of the sortase motif sites did not affect the specificity of either enzyme and allowing distinct payloads to be attached at the C-termini of the HC and LC. The researchers then demonstrated sequential dual site-specific modification by first incubating DTFab′ with eSrt2A-9 and GGG-C-K(FITC), achieving near-quantitative labelling of the HC; the labelled product was isolated in 60% yield. The DTFab′FITC product was then incubated with eSrt4S-9 and GGG-K(N3), achieving near-quantitative labelling of the LC, and the DTFab′FITC/N3 product was isolated in 50% yield. Further modification of the LC was achieved by reacting the azide group of the peptide with PEG5k-DBCO in a strain-promoted alkyne–azide cycloaddition (SPAAC). MALDI-TOF mass spectrometry and SDS-PAGE were used to confirm the identity and purity of the final product.
The target-binding capacity of the dual-labelled Fab′ fragment was not compromised, so strategies such as this one could be valuable in the development of next-generation antibody–drug conjugates.
Although promising, the main drawback of this technique is the large excess of labelling reagent used (50 equivalents). If the reagent is precious, such as a cytotoxic payload, this labelling strategy is not appropriate. However, as in this example, a workaround is to incorporate a functional group into the labelling reagent that enables a more efficient conjugation reaction (strain-promoted cycloaddition) for further modification. The dual labelling in this strategy must also be carried out sequentially; the ultimate goal would be to perform these modifications in a one-pot reaction.
In an alternative approach, Bierlmeier et al. achieved orthogonal multi-fragment assembly with a single enzyme, SaSrtA, via ligation site switching (Scheme 6A).147 The group found that the leucine at the P4 position of the LPXTG motif could be replaced with L-Cys(StBu) and still be recognised by SaSrtA. Once this residue is reduced to cysteine (and further desulfurised to alanine), the motif is no longer recognised by the enzyme, switching it from an ON state to an OFF state. This approach was used in a proof-of-concept four-fragment ligation with a nucleophilic GGGWW peptide and Nvoc-GG-C(StBu)PKTGGRR. The GGGWW peptide was ligated to the C-terminus of the motif-containing peptide to give the ligation product Nvoc-GG-C(StBu)PKTGGGWW. Reduction and desulfurisation of the C(StBu) residue converted it to alanine and switched OFF the motif, preventing further C-terminal labelling of the peptide. The N-terminus could then be Nvoc-deprotected and reacted with a further sortase motif-containing peptide. The ligation site switching approach was also used to develop artificial nucleosome mimics to probe bivalent chromatin factors, and antigen oligomers to probe antigen-presenting cell function.
Scheme 6 Recent examples of expansion of the substrates for peptide ligases to enable segment assembly and the generation of complex assemblies such as triubiquitins. (A) The use of tert-butylthiol cysteine disulfides as leucine isosteres enables the generation of sortase substrates which can then be deactivated by reduction and desulfurisation.147 (B) Incorporation of azidoacetyl glycyl lysine into proteins enables subsequent reduction using 2-(diphenylphosphino)benzoic acid (2DPBA) and labelling using sortases.148 (C) Extension of this approach to applications with multiple orthogonal sortases enables the synthesis of specific triubiquitin and diubiquitylated SUMO constructs using both internal and N-terminal labelling.149
In addition to directly anchoring proteins to the cell surface, many sortases link proteins such as pilins together on the bacterial cell surface by covalently attaching a sorting motif to a lysine residue within another protein. Despite this, attempts to exploit this reactivity have been limited: most of these enzymes show little activity beyond their native substrates, and yields for engineered motifs are typically low. Lang and co-workers148 have recently described an exciting approach that enables such conjugation by using genetic code expansion to incorporate an ε-azidoacetyl-glycyl-lysine residue into the protein. Following Staudinger reduction to reveal a diglycyl motif, they were able to generate a range of diubiquitin analogues using both Srt5M and eSrtA(2A-9) in vitro, in addition to site-specifically SUMOylated and ubiquitylated proteins (Scheme 6B). Most excitingly, they were able to carry out both the reduction with 2-(diphenylphosphino)benzoic acid and the sortase modification step in both E. coli and mammalian cells using a Ca-independent variant of eSrtA(2A-9). They subsequently extended this work, demonstrating that eSrtA(2A-9) and Srt5M are orthogonal and using them to generate a range of complex triubiquitin and mixed ubiquitin/SUMO scaffolds (Scheme 6C).149
Scheme 7 Examples of the combined application of SaSrtA and butelase-1 for orthogonal double labelling of proteins and formation of protein fusions. (A) Dual labelling of ubiquitin via a three-step tandem ligation with native chemical ligation.150 (B) One-pot conjugation of two nanobodies via their C-termini to produce C-to-C protein conjugates, using either a PEG linker or an oligonucleotide linker.151 (C) One-pot conjugation at the C-termini of the light chain and heavy chain of an antibody.151
Due to their recognition of different motifs, sortase A and butelase-1 can be used for multiple labelling of a protein in a one-pot reaction, reducing reaction times and increasing product yield.85 This was demonstrated by Harmand et al.151 who conjugated two VHHs (nanobodies) via their C-termini to produce C-to-C protein conjugates (Scheme 7B). One nanobody, VHH7, contained the LPETGG motif and the other, VHH–Enh, contained the NHV motif. These proteins were conjugated through a two-headed PEG-based linker via sortase-mediated and butelase-mediated ligation, respectively. Another conjugate was produced in a similar fashion with a double-stranded oligonucleotide as a linker, leading to a protein–DNA–protein product. In the same paper, one-pot orthogonal dual labelling was used to produce an antibody-probe conjugate (Scheme 7C). Orthogonal butelase-1 and sortase A were utilised to modify a full-size antibody IgG1 at the C-terminus of the light chain and heavy chain, respectively, via their recognition motifs engineered into the molecules.
Scheme 8 Examples of the use of PALs. (A) Orthogonal labelling combining butelase-1 and VyPAL2 to prepare a cycloprotein–drug conjugate.152 (B) pH-controlled orthogonal ligation with VyPAL2 to produce a fluorescein- and drug-labelled affibody.153 (C) Substrate-controlled orthogonal labelling of an anti-UBC6e nanobody via OaAEP1.91 (D) Use of C-terminal 2-aminoethylamides to enable C–C tail-to-tail protein dimerisation using OaAEP1: general structure of peptide substrates for homodimerisation and strategy to enable heterodimerisation via C-terminal protein thioesters.154
The same group, led by Zhang,153 subsequently used VyPAL2 alone for sequential orthogonal ligation by controlling the pH of the reactions. All previous applications of PALs described above involve the use of asparagine-containing substrates only, probably because of their greater ligation efficiency at neutral pH compared with their aspartic acid-containing counterparts.72–75 VyPAL2 had already been shown to bind Leu preferentially at the P2′ position.81 Zhang et al. showed that VyPAL2 worked effectively on peptides containing the sequence DSL, with a pH optimum of 4.5 (presumably reflecting a balance involving the need for protonation of the aspartic acid in the S1 pocket). In contrast, VyPAL2 catalyses ligation reactions most efficiently at neutral pH with NGL-containing peptides.81 They therefore used the reaction pH to switch the selectivity of VyPAL2 between two separate substrates. An sfGFP protein with an N-terminal GV and a C-terminal NSL was labelled at its N-terminus at pH 4.5 using a C-terminal DSK DSL-containing targeting peptide. After purification, C-terminal ligation with an N-terminal GV-containing DOX peptide was carried out at neutral pH; the DGV motif formed in the first reaction was unaffected. A one-pot tandem ligation was also achieved by adjusting the pH after addition of the second substrate, although in this case a side reaction between the two labelling substrates was observed. To carry out labelling in the opposite order, an affibody (ZEGFR) was prepared with a C-terminal DSL sequence and a thiazolidine-capped CI motif at the N-terminus to prevent cyclisation (Scheme 8B). The first reaction, with GV-Cit-PABA-DOX, was carried out at pH 4.5, followed by deprotection of the N-terminus with Ag+ to allow ligation of asparagine-containing fluorescein-GRANGI at pH 7.4.
Using OaAEP1 for this ligation led to formation of a significant quantity of a cyclisation product as the DG bond formed in the first ligation is not completely orthogonal to OaAEP1 at pH 7.4. However, OaAEP1 has higher catalytic activity towards aspartic acid-containing substrates than VyPAL2 and butelase-1. Thus, use of OaAEP1 for the first ligation step and VyPAL2/butelase-1, which show optimum activity towards asparagine-containing substrates at neutral pH, for the second step produced a dual-fluorescein-labelled affibody with higher efficiency and specificity.
More recently, the same group exploited the relatively short recognition motif of OaAEP1 to enable the synthesis of C–C tail-to-tail dimeric proteins.154 To generate homodimeric proteins, they prepared synthetic peptide substrates with an N-terminal GLH motif and a C-terminal leucyl-ethylenediamine motif that mimics the N-terminal GL motif (Scheme 8D). These could then be used with OaAEP1 to generate homodimeric proteins. A variety of candidate amino acids other than L-leucine were explored; in general, L-enantiomers were preferred over D-enantiomers, and leucine over valine. For heterodimeric proteins, an intein-based strategy was used to generate one target protein with a C-terminal leucyl thioester. Aminolysis with ethylenediamine generated a C-terminal amine that is sufficiently similar to the normal N-terminal Gly-Leu motif for OaAEP1 to recognise it and catalyse transpeptidation to a protein bearing a C-terminal NGLH motif.
The sortases and the peptidyl asparaginyl ligases, such as butelase-1 and OaAEP1, show the greatest promise for future applications in protein engineering. While ligases derived from proteases, such as subtiligase, have potential, their requirement for ester and thioester substrates and their low specificity make their application in protein engineering more challenging. In the case of sortase, successive rounds of protein engineering mean that a wide range of enzymes with increased reaction rates and mutual orthogonality is now available. These enzymes are often readily accessible with high bacterial overexpression yields. In contrast, PALs are not yet as readily available, but they are faster and have shorter recognition motifs, which may be better tolerated in the final protein products.
In this regard, the first challenge identified in the introduction, specificity, has been addressed, with a number of sortases and peptidyl asparaginyl ligases with distinct recognition motifs now available. Doubtless this range will expand in the coming years through directed evolution and the discovery and characterisation of other naturally occurring peptidyl asparaginyl ligases and sortases. This area is of particular interest because numerous other housekeeping sortases exist in Gram-positive bacteria. If we can understand how they recognise their protein substrates, it will be possible to develop a new class of reagents that label, for example, internal lysine residues in defined sequence motifs.
The second challenge, versatility, has been demonstrated most clearly for the sortases. Ca-independent variants have enabled in-cell labelling, and the use of diverse variants has begun to enable the synthesis of complex protein scaffolds, including in cells. Despite these successes, challenges remain: the peptidyl asparaginyl ligases have so far been applied almost exclusively to in vitro systems, and approaches for their recombinant expression are only now being optimised. Exploiting these and other related PALs in combination with sortases promises to be a rich area of development for complex protein and peptide assembly.
The final challenge, efficiency, is critical for the wider application of these approaches. Since transpeptidation is an equilibrium process, driving reactions to completion almost inevitably requires an excess of reagents. This is particularly noticeable in most of the described examples of protein fusion, where an excess of one protein component is required. Even when optimised peptide substrates are used for labelling, an excess is usually required. Numerous approaches to perturb these equilibria have been reported, such as the use of depsipeptide substrates, substrates that form complexes with metals or other small molecules, and mechanical separation to remove low-molecular-weight by-products. However, all of these approaches carry challenges, from the requirement to add divalent metals to the protein to the need to generate complex synthetic substrates. Ultimately, in all these cases a moderate excess of one reagent is still required, and no approach achieving essentially quantitative ligation with 1:1 stoichiometry has been reported. The major obstacle to such a ‘perfect’ reaction, in which two substrate molecules are ligated to generate the desired product, remains the hydrolysis side-reaction. Most peptide ligases, such as sortase, also catalyse hydrolysis of their substrates, although aminolysis is approximately 10⁵–10⁶-fold favoured over hydrolysis at optimal pH. Despite this selectivity, an excess or high concentrations of labelling reagents are needed to compensate for their hydrolysis or, more critically, for hydrolysis of the protein substrates, which renders them incompetent for subsequent labelling reactions.
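A minimal sketch of why an excess is needed: treat the ligation as a single reversible exchange, P–motif + Nu ⇌ P–Nu + LG, with equilibrium constant K, starting with no product or leaving group (LG) present and ignoring hydrolysis. K = 1 is an illustrative assumption (the amide bond formed resembles the one broken), not a measured value, and the function name is hypothetical.

```python
def equilibrium_conversion(equivalents: float, K: float = 1.0,
                           tol: float = 1e-12) -> float:
    """Fraction of protein ligated at equilibrium.

    Concentrations are scaled to the protein: initial [P-motif] = 1,
    [Nu] = equivalents, and no product or leaving group at the start.
    At equilibrium K = x**2 / ((1 - x) * (equivalents - x)); the single root
    in (0, min(1, equivalents)) is found by bisection. Hydrolysis is ignored.
    """
    if equivalents <= 0 or K <= 0:
        raise ValueError("equivalents and K must be positive")

    def residual(x: float) -> float:
        # Positive below the equilibrium conversion, negative above it
        return K * (1.0 - x) * (equivalents - x) - x * x

    lo, hi = 0.0, min(1.0, equivalents)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (1, 10, 50):
    print(n, f"{equilibrium_conversion(n):.3f}")
# 1 0.500
# 10 0.909
# 50 0.980
```

With K = 1 the conversion simplifies to n/(1 + n), so a 1:1 reaction stalls at 50% and even 50 equivalents give only about 98%. This is why strategies that remove or sequester the by-product, such as those listed above, are attractive: they shift the equilibrium rather than relying on excess reagent alone.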
Whether ligases can be evolved to avoid this problem remains to be seen, but this and the development of other strategies to maximise product formation will be critical if the use of transpeptidases is to be extended from single labelling reactions to the routine, efficient synthesis of large multicomponent assemblies, even on an industrial scale.
This journal is © The Royal Society of Chemistry 2022