Richard J. Payne*a and Chi-Huey Wongbc
aSchool of Chemistry, The University of Sydney, Sydney, Australia. E-mail: payne@chem.usyd.edu.au; Fax: +61 2 9351 3329; Tel: +61 2 9351 5877
bPresident, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei, Taiwan. E-mail: chwong@gate.sinica.edu.tw; Fax: +886 2 2785 3852; Tel: +886 2 2789 9400
cDepartment of Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA. E-mail: wong@scripps.edu
First published on 30th October 2009
A number of recent advances in the chemical synthesis of glycopeptides and glycoproteins are described, with particular focus on the development of peptide ligation strategies and their implementation in the convergent assembly of complex glycopeptides. Recent applications in the synthesis of full length homogeneous glycoproteins are also highlighted.
Richard J. Payne was born in Christchurch, New Zealand, in 1980. He obtained his PhD from the University of Cambridge under the supervision of Professor Chris Abell and in 2006 moved to the Scripps Research Institute where he worked as a Postdoctoral Research Fellow under the guidance of Professor Chi-Huey Wong. Here, he was involved in the development of new ligation strategies for the synthesis of glycopeptides. In 2008, he was appointed as a Lecturer within the School of Chemistry at the University of Sydney. His current research interests include tuberculosis drug discovery, carbohydrate chemistry and glycopeptide synthesis.
Dr Chi-Huey Wong is President of Academia Sinica and Professor of Chemistry at the Scripps Research Institute. He is a member of Academia Sinica, Taipei, the American Academy of Arts and Sciences, and the US National Academy of Sciences. His research interests are in the areas of bioorganic and synthetic chemistry, including the development of new synthetic chemistry based on enzymatic and chemoenzymatic reactions, synthesis of complex carbohydrates, glycoproteins and small-molecule probes for the study of carbohydrate-mediated biological recognition, development of carbohydrate microarrays for high-throughput analysis of carbohydrate–protein interactions, and drug discovery.
Broadly speaking, almost all native protein glycosylations can be classified into two types: O-glycosides, whereby a glycan is α- or β-linked to the hydroxyl of serine, threonine or tyrosine, or N-glycosides, in which N-acetylglucosamine is β-linked to the amide side chain of an asparagine present within an Asn-Xaa-Ser/Thr consensus sequence (Fig. 1). The oligosaccharides attached to the protein backbone are usually complex, branched structures. This is a consequence of the fact that attachment of each monosaccharide unit to the pre-existing glycan can occur at one of four hydroxyl groups with either α- or β-stereochemistry. As such, a simple library of tetrasaccharides generated combinatorially from the nine common sugar building blocks would result in over 15 million structures.7 Putting this figure in context, tetrameric peptides and nucleic acids generated in the same way (from 20 proteinogenic amino acids and four DNA bases) would give rise to 160000 and 256 different structures, respectively. It has been postulated that glycosylation could serve to expand the information derived from the concise human genome and facilitate biological processes exclusive to more complex organisms.8–10
Fig. 1 Common O- and N-linkages of glycans to the protein backbone.
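The tetramer counts quoted above for peptides and nucleic acids follow directly from the number of building blocks, and a short calculation makes the contrast with carbohydrates concrete. The sketch below reproduces the two sequence-only figures and then, under a deliberately simplified assumption that is not taken from the cited work (linear chains only, with each glycosidic bond free to choose one of four hydroxyls and either anomeric configuration), estimates the number of linear tetrasaccharides. Because branched isomers are ignored, this estimate falls well short of the figure of over 15 million cited in the text.

```python
# Sequence-only tetramers: no linkage position or stereochemistry to choose.
peptide_tetramers = 20 ** 4   # 20 proteinogenic amino acids
dna_tetramers = 4 ** 4        # 4 DNA bases

# Simplified, assumed model for LINEAR tetrasaccharides only:
# 9 possible monomers at each of 4 positions, and each of the 3 glycosidic
# bonds picks 1 of 4 hydroxyls and an alpha or beta configuration.
# Branched structures are not counted, so this is an underestimate.
MONOMERS, HYDROXYLS, ANOMERS = 9, 4, 2
linear_tetrasaccharides = MONOMERS ** 4 * (HYDROXYLS * ANOMERS) ** 3

print(peptide_tetramers)        # 160000
print(dna_tetramers)            # 256
print(linear_tetrasaccharides)  # 3359232
```

Even this restricted model gives more than three million structures, some twenty times the peptide count, before any branching is considered.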
Despite the importance of glycoproteins in a myriad of biological processes, progress towards studying their structure and function has been agonisingly slow, due in major part to difficulties in obtaining them in homogeneous form. If one aims to understand the role of glycosylation at a molecular level, it is imperative to have access to homogeneous glycopeptides and glycoproteins; however, this is by no means a trivial exercise. The difficulty arises from the untemplated nature of glycosylation which, unlike protein synthesis, is not under the control of a coding template but, rather, is dictated by the relative activities of a number of glycosyltransferase enzymes. For this reason the resulting glycoproteins are generally produced as heterogeneous mixtures of glycoforms which are often inseparable by currently available chromatographic techniques. Recombinant expression therefore cannot, in most cases, be used for the production of homogeneous glycoproteins. It is currently accepted that chemical and chemoenzymatic intervention can be used to solve the availability problem.7,10–15 In this article, we will discuss the development of a number of peptide ligation strategies and their application in the synthesis of complex, homogeneous glycopeptides. It is intended that this article complement a number of excellent reviews previously published7,10–19 and, additionally, highlight some recent breakthroughs in the area of glycoprotein total synthesis.
By far the most efficient method for the ligation-based assembly of peptides, proteins and post-translationally modified peptides and proteins is native chemical ligation (NCL). The concept of NCL dates back to the 1950s with the pioneering work of Wieland et al.;20 however, it was not until 1994 that the method gained widespread attention for the convergent ligation of peptides, when Kent and co-workers reported its application in the synthesis of interleukin-8, a cytokine responsible for the proliferation of B cells during an immune response.21 The reaction relies on the chemoselective condensation between a peptide bearing an N-terminal cysteine residue and a peptide containing a C-terminal thioester moiety to afford an amide bond (Scheme 1). The ligation involves a rapid, reversible transthioesterification step to generate a thioester intermediate, which then undergoes an irreversible S → N acyl shift to generate a native peptide bond. The scope of this method has been extensively examined, with most amino acids on the C-terminus of the peptide thioester component shown to undergo facile ligations.22 Since its inception, NCL has been successfully utilised in the synthesis of over two hundred full length proteins and has been the subject of several excellent reviews.17–19,23–27
Scheme 1 Proposed mechanism of native chemical ligation (NCL).21
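Because NCL requires a cysteine at the N-terminus of the C-terminal fragment, the possible disconnections of a target are fixed by where cysteine occurs in its sequence. The following is a minimal sketch of that bookkeeping, assuming one-letter amino acid codes; the function name `ncl_junctions` and the 12-residue demonstration sequence are hypothetical and are not drawn from any of the syntheses discussed here.

```python
def ncl_junctions(sequence: str) -> list[tuple[int, str, str]]:
    """List candidate NCL disconnections for a one-letter-code sequence.

    Returns (cys_position, n_fragment, c_fragment) for every internal Cys.
    Positions are 1-based. The C-terminal fragment starts with Cys, the
    residue NCL needs at the N-terminus of that fragment; the N-terminal
    fragment is the one that would carry the C-terminal thioester.
    """
    junctions = []
    for i, residue in enumerate(sequence):
        if residue == "C" and i > 0:  # a Cys at position 1 splits nothing
            junctions.append((i + 1, sequence[:i], sequence[i:]))
    return junctions


# Hypothetical sequence, for illustration only.
demo = "GSAVKCTLEACG"
for position, thioester_part, cys_part in ncl_junctions(demo):
    print(position, thioester_part, cys_part)
# 6 GSAVK CTLEACG
# 11 GSAVKCTLEA CG
```

A sequence with no internal cysteine returns an empty list, which is the situation that motivates introducing a cysteine by mutation or using the cysteine-free ligation methods covered later in the article.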
NCL has also served as a useful tool for the synthesis of biologically relevant glycopeptides and glycoproteins, with the first significant example being the total synthesis of diptericin ε, an antibacterial glycopeptide containing 82 amino acids and two O-linked glycosylation sites.28 Since the primary sequence is devoid of cysteine residues, it was necessary to introduce a Gly25-Cys mutation to allow for a synthetic disconnection by NCL. It was also necessary to mutate two further residues, namely Asp29-Glu and Asp45-Glu, to enable its construction. The synthesis was initiated by SPPS of a 24-residue N-terminal glycopeptide thioester 1 and a 58-residue glycopeptide 2 containing an N-terminal cysteine residue, both fragments bearing α-N-acetylgalactosamine (α-GalNAc, also known as the TN antigen) at the two glycosylation sites (Thr11 and Thr54, Scheme 2). Owing to problems relating to thioester hydrolysis under the conditions afforded by Fmoc-strategy SPPS, peptide thioesters are traditionally produced via the Boc-strategy. However, in the context of glycopeptide thioester construction, such iterative acid deprotection conditions are usually incompatible with the labile glycosidic linkages. To circumvent this problem, Ellman’s modification of Kenner’s sulfonamide “safety-catch” linker was employed for the synthesis of the N-terminal glycopeptide thioester fragment.29,30 Upon assembly of the glycopeptide by Fmoc-strategy SPPS, the sulfonamide moiety of 3 was selectively alkylated by treatment with iodoacetonitrile in the presence of N,N-diisopropylethylamine (DIPEA) to afford resin bound 4. Thiolysis with benzyl mercaptan successfully released the fully protected glycopeptide thioester which was then treated with an acidic cocktail to remove both the side chain protecting groups and the N-terminal Boc-carbamate to afford 1. 
Glycopeptide thioester 1 and glycopeptide 2 were subjected to the standard NCL conditions [6 M guanidine hydrochloride (Gn·HCl), 0.1 M phosphate buffer, thiophenol (PhSH), pH 7.5] to afford, after glycan deacetylation by hydrazinolysis, synthetic diptericin ε (5) which retained antimicrobial activity despite the amino acid substitutions.
Scheme 2 Total synthesis of the antibacterial glycopeptide diptericin ε via NCL.28
In another early example, Bertozzi and co-workers demonstrated the NCL-based synthesis of a glycoprotein with multiple O-glycosylation sites of the so-called “mucin type”.31 The target, lymphotactin (Lptn), is a 93-residue chemokine which serves as a potent chemoattractant for T cells and natural killer cells.32,33 The C-terminus of Lptn contains a mucin domain with up to eight serine and threonine O-glycosylation sites. The synthetic strategy involved the use of NCL to join two similarly sized fragments: peptide thioester 6 corresponding to Lptn1–48 and glycopeptide 7 (Lptn49–93) bearing an N-terminal cysteine and eight α-GalNAc moieties (Scheme 3). As peptide thioester 6 does not possess any backbone glycosylation, it was synthesised via Boc-strategy SPPS. In contrast, glycopeptide 7 was synthesised via Fmoc-strategy SPPS incorporating the preformed glycosylamino acids into the growing chain. The two fragments were ligated under standard NCL conditions to afford the 93-amino acid glycoprotein in 38% yield after HPLC purification. The low yield in this case can be attributed to the use of a thioester containing a C-terminal valine, known to react slowly under NCL conditions owing to the sterically demanding nature of the side chain. In order to generate the native structure, the final step involved folding, achieved by incubating at pH 8 in the presence of cysteine and cystine, to afford the native glycoprotein 8 in 49% yield. Synthetic Lptn 8 was subsequently assessed for its ability to bind its cognate chemokine receptor (XCR1) expressed on human embryonic kidney cells, specifically by measuring its activation of a signal transduction cascade that increases intracellular calcium concentrations. Surprisingly, 8 exhibited similar activity to its unglycosylated counterpart, suggesting that the simple glycans do not play a role in recognition and binding to the chemokine receptor. 
The authors suggest that the native (presumably more complex) O-linked glycans on Lptn would impart a different three-dimensional structure compared to the bridgehead monosaccharides in their synthetic version, thereby modulating function in vivo.
Scheme 3 Total synthesis of lymphotactin (Lptn) by NCL.31
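The two yields reported for the lymphotactin synthesis can be combined to give a sense of the overall material throughput from the ligation onwards. The sketch below multiplies the step yields, on the assumption (not stated explicitly in the text) that the 49% folding yield is calculated from the isolated ligation product; the helper name `overall_yield` is illustrative only.

```python
from math import prod


def overall_yield(step_yields_percent):
    """Multiply fractional step yields and return the overall yield in percent."""
    return 100 * prod(y / 100 for y in step_yields_percent)


# Ligation (38%) followed by folding (49%), as quoted for Lptn above.
lptn_overall = overall_yield([38, 49])
print(f"{lptn_overall:.1f}%")  # 18.6%
```

On this basis fewer than one in five molecules of the limiting fragment reaches the folded glycoprotein, which illustrates why a slow junction such as a C-terminal valine thioester is costly in a convergent route.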
The examples discussed so far have demonstrated the utility of NCL for the generation of glycopeptides displaying simple O-linked monosaccharides on the peptide backbone. However, most glycoproteins require more complex glycans for optimal biological activity (vide supra). Over recent years a suite of glycosyltransferase enzymes has become available that can be used for the chemoenzymatic elaboration of glycopeptides and glycoproteins.34–42 While this represents a powerful method for the synthesis of complex glycopeptides, there are several possible drawbacks. These include the potential for incomplete glycosylation, giving rise to glycoform mixtures which may or may not be separable by chromatography. Additionally, there are a number of glycosidic linkages which cannot be accessed because the requisite enzyme is not readily available. To alleviate these problems, many groups have decided to prepare the desired oligosaccharides synthetically. These can then be incorporated into glycopeptides in one of two ways. The first is the introduction of preformed glycosylamino acids which can be used as “cassettes” directly in the SPPS of glycopeptides and glycopeptide thioesters. Alternatively, a more convergent approach, first reported by Cohen-Anisfeld and Lansbury and dubbed the “Lansbury aspartylation,” allows simple access to N-linked glycopeptides.43,44 The method relies on the direct coupling of a glycosylamine onto a polypeptide chain containing an aspartic acid residue. The reaction conditions have been optimised to minimise aspartimide side product formation and, due to the nature of the coupling conditions, extensive protection of the peptide backbone and oligosaccharide is not necessary (with the exception of other carboxylate side chains). 
The glycopeptides and glycopeptide thioester fragments synthesised by one of these two strategies can be implemented in NCL reactions in the normal way to generate glycopeptides and glycoproteins bearing complex glycans as defined glycoforms.
The first example of an N-linked glycopeptide bearing a complex-type glycan to be constructed by means of NCL was reported by Unverzagt and co-workers in their synthesis of a glycosylated fragment of RNase B (Scheme 4).45 Fmoc-strategy SPPS was utilised for the synthesis of glycodecapeptide thioester 9 which was assembled on PEGA resin 10 incorporating two linkers before the assembled glycopeptide. The first of these linkers was the Ellman-sulfonamide safety-catch linker discussed previously.29,30 The group also utilised a Rink amide linker which allowed for peptides to be released during peptide assembly in order to assess the efficiency of the synthesis by HPLC and LC-MS. The Fmoc-protected glycosylasparagine presenting an unprotected biantennary heptasaccharide (11) was coupled into the growing peptide chain as a “cassette” in high yield by pre-activation with 1-H-benzotriazol-1-yl-oxytripyrrolidinophosphonium hexafluorophosphate (PyBOP) and DIPEA. The authors used just 0.8 equivalents of the precious glycosylamino acid with respect to the resin bound peptide and achieved a 95% yield as determined by measuring the piperidine–fulvene adduct produced upon deprotection with piperidine–DMF. Acetylation of the oligosaccharide hydroxyls was next conducted to stabilise the glycosidic linkages to acidic cleavage conditions and to prevent unwanted acylation reactions. This was achieved using acetic anhydride, acetic acid and pyridine and allowed for selective acetylation without activating the sensitive sulfonamide linker. Following acetylation, the sulfonamide was activated by treatment with the powerful methylation reagent trimethylsilyldiazomethane followed by thiolysis with mercaptopropionic acid ethyl ester (100 equiv.) and sodium thiophenolate (3 equiv.) to release the fully protected glycopeptide thioester from the solid support. 
Finally, acidic cleavage of the side chain protecting groups furnished the desired glycopeptide thioester 9 in 46% yield based on the coupling of glycosylamino acid 11. Generation of the C-terminal fragment 12 bearing an N-terminal cysteine residue (RNase 41–68) was achieved via standard Fmoc-strategy SPPS and this was reacted with thioester 9 under standard NCL conditions (on an analytical scale). After reacting for 8 h, followed by in situ deacetylation by treatment with aqueous hydrazine in the presence of DTT, glycopeptide 13, corresponding to RNase 30–68, was observed by HPLC and mass spectrometry.
More recently, Danishefsky and co-workers have reported the synthesis of a protected version of the full length glycoprotein β-human follicle-stimulating hormone (β-hFSH) possessing two N-linked chitobiose disaccharide units.46 In this report the authors also describe the synthesis of a glycopeptide fragment of β-hFSH bearing an N-linked dodecasaccharide and a C-terminal thioester. It is anticipated that this fragment will be utilised for the NCL-based synthesis of a more complex glycoform of this important glycoprotein in future studies.
Although hundreds of proteins19,23 and a number of glycopeptides have been synthesised with the assistance of NCL-based strategies, the total synthesis of a homogeneous glycoprotein bearing a complex glycan and a native amino acid sequence remained elusive until 2008 when the groups of Dawson and Kajihara reported the preparation of a single glycoform of the 76-amino acid chemokine monocyte chemotactic protein-3 (MCP-3).47 The group chose to assemble the glycoprotein via the ligation-based assembly of three fragments. These included peptide 14 (residues 36–76) bearing an N-terminal cysteine and peptide thioester 15 (MCP-3 11–35) containing an N-terminal thiazolidine as a masked cysteine which were both prepared by Boc-strategy SPPS (Scheme 5). The final fragment was glycopeptide thioester 16 (MCP-3 1–10) containing a complex-type sialylated N-glycan. The latter fragment was prepared using both Boc- and Fmoc-strategy SPPS, the former being particularly impressive given the potential lability of glycosidic linkages under iterative treatment with trifluoroacetic acid in the Boc-deprotection steps (vide supra). The three fragments were subsequently assembled via two consecutive NCL reactions. Specifically 14 and 15 were ligated to afford MCP-3 11–76 (17) followed by ligation with 16 to furnish the full length glycoprotein in relatively high yield, further demonstrating the efficiency of the NCL reaction in the assembly of complex targets. The glycoprotein was subsequently folded to afford native MCP-3 (18) as a pure glycoform. The correct alignment of disulfide bonds was confirmed by chymotrypsin degradation studies.
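The logic of a sequential, C-to-N three-fragment assembly of this kind can be captured in a few lines. The sketch below is an illustrative model only: the `Fragment` class and the `ligate` and `unmask` helpers are hypothetical names, the residue ranges are those quoted above for MCP-3, and the unmasking of the thiazolidine between the two ligations is inferred from its description as a masked cysteine rather than spelled out in the text. The model shows why the middle fragment, which carries both a thioester and a cysteine equivalent, is kept masked until the first ligation is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: int   # first residue number
    end: int     # last residue number
    n_term: str  # "Cys", "Thz" (masked Cys) or "other"
    c_term: str  # "thioester" or "acid"


def unmask(frag: Fragment) -> Fragment:
    """Convert an N-terminal thiazolidine into a free cysteine."""
    if frag.n_term != "Thz":
        raise ValueError("no thiazolidine to unmask")
    return Fragment(frag.start, frag.end, "Cys", frag.c_term)


def ligate(acyl_donor: Fragment, cys_peptide: Fragment) -> Fragment:
    """Join a C-terminal thioester to a fragment with a free N-terminal Cys."""
    if acyl_donor.c_term != "thioester":
        raise ValueError("acyl donor needs a C-terminal thioester")
    if cys_peptide.n_term != "Cys":
        raise ValueError("acceptor needs a free N-terminal cysteine")
    if acyl_donor.end + 1 != cys_peptide.start:
        raise ValueError("fragments are not contiguous")
    return Fragment(acyl_donor.start, cys_peptide.end,
                    acyl_donor.n_term, cys_peptide.c_term)


# Residue ranges taken from the text; compound numbers follow Scheme 5.
frag_14 = Fragment(36, 76, "Cys", "acid")
frag_15 = Fragment(11, 35, "Thz", "thioester")
frag_16 = Fragment(1, 10, "other", "thioester")

frag_17 = ligate(frag_15, frag_14)              # MCP-3 11-76, still masked
full_length = ligate(frag_16, unmask(frag_17))  # MCP-3 1-76
print(full_length)
# Fragment(start=1, end=76, n_term='other', c_term='acid')
```

Attempting `ligate(frag_16, frag_17)` without the unmasking step raises an error in this model, mirroring the fact that the masked N-terminus cannot take part in ligation until the thiazolidine has been opened.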
Since the seminal report by Kent and co-workers,21 a number of modifications have been made to the traditional NCL reaction. For example, Boons and co-workers have carried out glycopeptide ligations within liposomes to aid in the facile construction of lipophilic glycopeptides.48,49 These liposome-mediated NCL reactions were found to increase reaction rates and provide higher yields than those conducted in aqueous buffered media.
In another modification to the traditional NCL reaction, a number of groups have chosen to modify the C-terminal thioester to incorporate a masked thioester or an alternative acyl donor. For example, Danishefsky and co-workers reported a method for the synthesis and use of masked thioesters in NCL, thus overcoming some of the problems associated with lability of peptide and glycopeptide thioesters. The strategy was to synthesise a C-terminal peptide phenolic ester bearing an o-disulfide moiety which could rearrange to produce a thioester under reductive conditions (Scheme 6).50 The phenolic ester can be readily prepared by introducing a suitably protected 2-mercaptophenol unit to a peptide or glycopeptide (generated by SPPS). Upon reduction of the disulfide with a suitable reagent, e.g. sodium 2-mercaptoethanesulfonate (MES-Na) or tris(2-carboxyethyl)phosphine (TCEP), a free thiol is revealed which can participate in a reversible O → S acyl shift to generate a C-terminal thioester (Scheme 6). Importantly, the thioester can be generated in situ in a ligation reaction by adding a suitable reducing agent and, in the presence of a peptide or glycopeptide bearing an N-terminal cysteine, will undergo an NCL reaction. This strategy was applied in the synthesis of glycopeptide 19 bearing two N-linked glycans including the core N-linked pentasaccharide and an unnatural version containing a bridging glucose in place of mannose (Scheme 7). Based on the early work of Bodanszky,51 Danishefsky and co-workers have also reported the use of an activated p-nitrophenyl ester as a thioester surrogate in NCL reactions.52 Recently, Blanco-Canosa and Dawson reported the preparation of C-terminal peptide isoureas which can serve as precursors to C-terminal thioesters.53 Alternatively, these can undergo thioesterification in situ and can participate in NCL reactions. 
Peptide isoureas could be synthesised in high yield using standard Fmoc-strategy solid-phase chemistry.53 Similarly, the cysteinyl prolyl ester has recently received attention as a masked thioester equivalent in peptide ligation reactions.54 Due to ease of preparation, these new C-terminal acyl donors and thioester precursors should find wide application in future glycopeptide and glycoprotein syntheses. It is also anticipated that further modifications to the NCL reaction will continue to be reported with a view to improving its utility in the synthesis of a host of glycopeptides and glycoproteins.
Scheme 7 NCL-based synthesis of a complex model glycopeptide using an o-disulfide phenolic ester as a masked thioester.50
Production of an N-terminal coupling partner for use in NCL requires that the fragment be expressed bearing a C-terminal thioester. This can be achieved by taking advantage of naturally occurring self-splicing elements called inteins, analogous to introns present in nucleic acids.55–57 Inteins have the ability to catalyse their excision from a protein through a series of acyl-transfer reactions in which a cysteine thioester is a key intermediate. By introducing an affinity tag into the intein, this intermediate can be isolated, and upon exposure to an appropriate ligation partner will undergo NCL to afford the desired product. In the context of glycoprotein synthesis, this allows the semi-synthesis of proteins containing glycosylation at their C-terminus and has recently been exploited by a number of groups.62–65 The power of this technology was aptly demonstrated in the synthesis of a homogeneous glycoprotein variant of maltose-binding protein (MBP) by the Wong group (Scheme 9).62 The synthesis initially involved expression of the 392-amino acid MBP in Escherichia coli as a fusion protein to the N-terminus of an intein derived from Saccharomyces cerevisiae possessing a chitin-binding domain at the C-terminus (23). The thioester 24 formed between the N-terminal cysteine residue of the intein and the C-terminus of the MBP was purified on chitin beads before undergoing transthioesterification (thioester exchange) to generate a soluble C-terminal MBP thioester. This was reacted in situ with a glycodipeptide bearing an N-terminal cysteine, e.g. H-Cys-Asn(β-GlcNAc)-OH to afford thioester 25 which underwent an S → N acyl shift to furnish homogeneous MBP 26 bearing C-terminal glycosylation. Enzymatic elaboration of the bridgehead glycan was also demonstrated, thereby providing scope for the synthesis of more complex glycoforms.
An additional advantage of expressed protein ligation (EPL) is that the N-terminal and C-terminal engineering approaches used to afford peptide and protein fragments are orthogonal to one another. This allows for the introduction of glycopeptide fragments at both the C- and N-termini of a suitably masked expressed protein fragment via NCL. Macmillan and Bertozzi were the first to exploit a double ligation approach in the EPL-based synthesis of three distinct glycoforms of GlyCAM-1, a glycoprotein ligand involved in leukocyte homing (Scheme 10).63 GlyCAM-1 consists of two heavily glycosylated mucin domains at the N- and C-termini which are bridged by a central unglycosylated domain. To enable the synthesis of this complex glycoprotein by EPL, three residues, namely Lys-41, Gln-78 and Gln-102, were mutated to cysteine. The central domain (GlyCAM-1 41–77) was produced recombinantly using the IMPACT system, which relies on a pH-dependent intein cleavage (Scheme 10).66 The fusion construct 27 was designed containing a factor Xa cleavage site which served as a protecting group, masking the N-terminal cysteine (Cys-41) for a subsequent ligation. The most impressive glycoprotein synthesised in this study was 28, containing heavily glycosylated N- and C-termini. Glycopeptide 29 (GlyCAM-1 78–132) was first ligated to fusion construct 27 via NCL to afford 30.64 Cys-41 was then liberated by treatment with factor Xa followed by ligation with glycopeptide thioester 31 to afford the full length 132-amino acid GlyCAM-1 (28). Upon completion of the synthesis, the internal cysteine residues were capped with iodoacetamide so as to mimic the glutamine residues found in the native glycoprotein.67 It should be noted that Imperiali and co-workers have also used a fusion protein produced via the IMPACT technology for the semi-synthesis of a glycosylated version of the bacterial immunity protein Im7.65
Perhaps the most impressive example of an EPL-based semi-synthesis of a full length homogeneous glycoprotein was the recently disseminated total synthesis of the 124-amino acid enzyme RNase C by the Unverzagt group.68,69 This work significantly built on previous studies by the same group in which a 38-amino acid glycopeptide fragment of RNase B (13) was prepared using a combination of SPPS and NCL assembly strategies (see Scheme 4).45 In this recent study, the group chose to utilise the commercially available IMPACT system to produce a large peptide fragment (RNase 40–124) which could subsequently be used in the NCL-based assembly of a full length protein as a single glycoform. Unfortunately, in preliminary studies, expression of the fusion protein in E. coli led to the formation of inclusion bodies and, as such, the intein did not self-cleave to afford the desired fragment bearing an N-terminal cysteine residue.68 To remedy this problem the authors developed a novel approach to facilitate solubility of the thiol rich fragment. Specifically, carboxyethylmethanethiosulfonate (CEMTS, 32)70 was used to chemoselectively derivatise the seven cysteine side chains as mixed disulfides under refolding conditions (Scheme 11). The disulfide-protected fragment 33 was ligated with an intein derived peptide thioester, Met-RNase1–39 (34), which, under reductive conditions (TCEP), gave the full length protein 35 in 36% yield after isolation by gel filtration. It should be noted that the ligation reaction was conducted under strictly inert conditions in a nitrogen tent (<10 ppm O2) to prevent reoxidation (which would inevitably lead to insoluble protein). The semi-synthetic protein was subsequently refolded using a glutathione redox couple under rapid dilution to give stable RNase that exhibited enzymatic (hydrolase) activity.
With the methodology developed and applied successfully for unglycosylated RNase, the stage was set for application to the total synthesis of a single glycoform of RNase C.69 Initial studies aimed at preparing the RNase1–39 glycopeptide thioester using the solid-phase double-linker strategy previously reported by the group,45 thus allowing for a two fragment–one ligation synthesis of the glycoprotein. However, the 39-amino acid fragment could not be produced in this manner, thereby necessitating a further disconnection at Cys26. This in turn led to three fragments that could be assembled by two sequential NCL reactions (Scheme 12). The N-terminal peptide thioester 36 (RNase 1–25) was assembled on double-linker resin which, after activation and thiolysis under the previously developed conditions, gave the desired fragment in moderate yield (20%). In this case the authors incorporated two Ser-Ser pseudoproline dipeptide units in order to increase the yield of the solid-phase synthesis. The central fragment, a glycopeptide thioester corresponding to RNase 26–39 (37), was synthesised in a similar manner. The complex-type nonasaccharide was obtained from egg yolk and modified into the preformed Fmoc-Asn building block for incorporation into SPPS. After coupling, the free glycan hydroxyls were acetylated to prevent cross reactivity. Efficient coupling of further amino acids was maximised by the use of a Lys-Ser pseudoproline dipeptide and norleucine residues were incorporated in place of Met 30 and 31 to prevent oxidative sulfoxide formation during the synthesis. Additionally, a thiazolidine was incorporated at the N-terminus of the fragment to prevent homocoupling during the ligation reaction. Activation and thiolysis of the fully assembled glycopeptide provided thioester 37 in 18% yield. With the desired synthetic fragments in hand, the group turned to the semi-synthesis of RNase C. 
This involved reaction of disulfide-protected RNase 40–124 (33) with thioester 37 (RNase 26–39), containing the complex nonasaccharide, under reductive NCL conditions (TCEP) to afford the 99-amino acid RNase fragment (RNase 26–124) bearing an N-terminal thiazolidine. This was subsequently unmasked with methoxyamine at pH 3–4 to generate glycopeptide 38 bearing an N-terminal cysteine residue. Ligation with peptide thioester 36 using 4-mercaptophenylacetic acid (MPAA) as the activating thiol under strictly anaerobic conditions afforded full length RNase C. This was refolded directly over four days and the desired glycoprotein 39 was isolated in high yield by gel filtration (71%). The synthetic enzyme displayed a similar circular dichroism spectrum to that of the native folded protein and was shown to be hydrolytically active, exhibiting 50% of the activity of RNase. Although the synthetic variant contains two methionine–norleucine mutations, this ligation-based assembly of a full length homogeneous glycoform possessing a complex glycan represents a landmark study in the field of glycoprotein synthesis. It is also possible that such a strategy will allow for more efficient routes to a number of other homogeneous glycoproteins by synthetic means in coming years.
Initial studies directed at overcoming the requirement of an N-terminal cysteine in NCL led to the development of thiol-based auxiliaries. These included 1-phenyl-2-mercaptoethyl auxiliaries 40 and 41 (ref. 71) and the 4,5,6-trimethoxy-2-mercaptobenzyl (Tmb) auxiliary 42.72 These can be attached to the N-terminus of a synthetic (glyco)peptide and are readily removed following ligation by treatment with trifluoroacetic acid (TFA) (Scheme 13). It is believed that the thiol of the auxiliary facilitates chemical ligation in an analogous manner to the side chain of a cysteine residue, as depicted for the Tmb auxiliary in Scheme 13. The utility of these auxiliaries with respect to glycopeptide synthesis was first demonstrated by Macmillan and Anderson, who used both types of auxiliary to synthesise fragments of the O-linked glycoprotein GlyCAM-1.73
Scheme 13 Structures and proposed mechanism of the 1-phenyl-2-mercaptoethyl auxiliaries 40 and 41 and 4,5,6-trimethoxy-2-mercaptobenzyl (Tmb) auxiliary 42.71,72
Several elaborate examples of such auxiliary-based ligations have since been reported by Danishefsky and co-workers in the construction of complex glycopeptides bearing multiple glycans.74,75 In one pertinent example the authors describe the thiol auxiliary-based ligation between glycopeptide 43 bearing a variation of the Tmb auxiliary and a masked glycopeptide thioester 44, both presenting an N-linked chitobiose glycan (Scheme 14). The masked thioester 44 was equipped with a C-terminal phenolic ester (described previously by the Danishefsky group, vide supra), which was converted to the desired thioester via an O → S acyl migration under the reducing ligation conditions. The resulting thioester underwent transthioesterification with the auxiliary-containing glycopeptide 43 (after in situ reduction of the disulfide bond) followed by an S → N acyl shift to afford the desired glycopeptide 45 bearing two glycans. This example is particularly impressive given the reported difficulty of auxiliary-mediated ligations for hindered amino acids at the ligation junction.73 Following ligation, the N-terminal thiazolidine was unmasked to provide an N-terminal cysteine residue which was ligated with another complex glycopeptide thioester via a standard NCL reaction.
Scheme 14 Synthesis of N-linked glycopeptide 45 using a variant of the Tmb auxiliary.74
Scheme 15 (A) Synthesis of O- and N-linked glycopeptides by sugar-assisted ligation (SAL), X = –O– or –NH(CO)–; (B) second generation SAL.76,79
Scheme 16 Proposed mechanism of sugar-assisted ligation (SAL).76
Scheme 17 Synthesis of model glycopeptide 49 by NCL followed by metal-free desulfurisation of cysteine to alanine.87
The same authors recently reported the application of the NCL metal-free desulfurisation protocol in the synthesis of a homogeneous glycopeptide fragment of erythropoietin (EPO),92 a glycoprotein hormone produced naturally in kidney cells and used clinically for the treatment of anaemia associated with a number of diseases.93–95 The strategy involved the synthesis of a 7-amino acid glycopeptide thioester 51 via selective protecting group manipulation and introduction of the complex-type sialylated glycan via the Lansbury aspartylation (Scheme 18).43,44 This was ligated, under the standard NCL conditions, with peptide 52 corresponding to EPO1–21, containing an Acm-protected cysteine and armed with a C-terminal o-disulfide phenolic ester. Additionally, the fragment possessed an allyl-protected glutamic acid to prevent unwanted side reactions. Unfortunately, the ligation only proceeded in low yield and generated the cyclic thiolactone 53 after allyl ester deprotection by palladium-catalysed chemistry. The thiolactone was subsequently opened by treatment with thiopropionic acid to afford glycopeptide thioester 54 presenting a free thiol side chain which could now be desulfurised. Reaction of 54 in a reducing buffer in the presence of thiopropionic acid as a radical propagator and VA-044 gave EPO1–28 (55) in 67% yield. At this point glycopeptide 55 still contained an Acm-protected cysteine residue and a C-terminal thioester, which is amenable to further ligations, envisioned for the total synthesis of a homogeneous glycoform of EPO (see Schemes 24 and 25). Given the efficiency and chemoselectivity of these ligation–desulfurisation strategies for the generation of glycopeptides, it is anticipated that this method, and variations thereof, will continue to find wide use in the synthesis of complex glycopeptides and glycoproteins.
Scheme 18 Synthesis of a complex N-linked glycopeptide fragment of erythropoietin (EPO1–28) by NCL followed by metal-free desulfurisation.92
Scheme 19 NCL-desulfurisation at (A) phenylalanine96 and (B) and (C) valine.86,98
In order to increase the reactivity of the β-mercaptovaline moiety in the above ligation, Danishefsky and co-workers reported the preparation of a valine building block derivatised with a more reactive primary thiol at the γ-position.98 The γ-thiol valine was generated as a mixture of diastereoisomers; however, since this stereocentre is abolished upon desulfurisation to valine, it did not pose a problem in isolating peptides retaining stereochemical integrity. The authors reported significant rate enhancements with peptides bearing an N-terminal γ-thiolated valine residue when compared directly against the penicillamine (β-thiol) derived peptides. It is important to note that the γ-thiol variant would proceed through a six-membered ring transition state in the S → N acyl transfer compared with a five-membered ring in the peptides bearing an N-terminal penicillamine. This suggests that the reactivity/accessibility of the thiol is an important factor for the rate of these reactions and that the initial thioesterification step is rate determining. C-terminal peptide thioesters, o-thiophenolic esters, p-nitrophenyl esters and p-cyanophenyl esters were successfully employed as acyl donors in this reaction to furnish ligation products in high yields (Scheme 19C). In the context of glycopeptides, the method was implemented in the ligation of a peptide bearing an N-terminal γ-mercaptovaline (56) to a C-terminal peptide o-thiophenolic ester bearing an N-linked disaccharide (57) to generate glycopeptide 58 in 90% yield after only 30 minutes reaction time at ambient temperature (Scheme 20). Desulfurisation using the metal-free method (TCEP and VA-044)87 proceeded smoothly to afford glycopeptide 59 possessing a native internal valine residue in 89% yield.
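The ring sizes quoted above for the S → N acyl transfer can be reproduced by simple atom counting. The following is a minimal sketch only: the helper name and the dictionary are hypothetical, and the count assumes that the cyclic transition state contains the thiol sulfur, the carbons linking it to the amine (up to and including the α-carbon), the amine nitrogen and the acyl carbon.

```python
# Hypothetical bookkeeping helper, not taken from the cited work.
# Number of carbons from the thiol-bearing carbon to the alpha-carbon, inclusive.
LINKER_CARBONS = {
    "beta": 2,   # C-beta, C-alpha (cysteine, penicillamine)
    "gamma": 3,  # C-gamma, C-beta, C-alpha (gamma-thiol valine)
}


def s_to_n_ring_size(thiol_position):
    """Count ring atoms in the S -> N acyl shift: S + linker carbons + N + acyl C."""
    return 1 + LINKER_CARBONS[thiol_position] + 1 + 1


for position in ("beta", "gamma"):
    print(position, s_to_n_ring_size(position))
```

Under these assumptions the count gives 5 for a β-thiol and 6 for a γ-thiol, matching the ring sizes stated in the text.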
Scheme 20 Synthesis of a model N-linked glycopeptide by NCL followed by desulfurisation at valine.98
Scheme 21 (A) Proposed mechanism for the conversion of cysteine to serine. (B) Synthesis of a model glycopeptide 62 bearing a complex N-linked glycan via NCL followed by conversion of the ligation site cysteine to serine.99
Scheme 22 Synthesis of 23 kDa MUC2 tandem repeat glycoprotein via the silver-promoted “thioester method.”109
Scheme 23 Synthesis of a model N-linked glycopeptide via sequential phenolic ester directed amide coupling reactions (PEDAC TCEP followed by PEDAC AgCl).114
The method described above was recently employed in impressive fashion for the synthesis of two complex homogeneous glycopeptide domains of EPO (Cys29–Gly77, ref. 115, and Gln78–Arg166, ref. 116). These were constructed bearing suitable protection and functionality with a view to future implementation in the ligation-based total synthesis of a fully glycosylated homogeneous version of the 166-amino acid glycoprotein. The synthesis of the first of these fragments (Cys29–Gly77; ref. 115) was envisaged using a three fragment, two ligation assembly in the N → C direction as described for the model glycopeptide 67 (see Scheme 23).114 The key difference was the requirement for a C-terminal thioester to remain on Gly77 to enable a subsequent ligation with the Gln78–Arg166 fragment in future synthetic endeavours. The group therefore decided to utilise three different acyl donors with tuned reactivities to achieve this unprecedented and formidable task. A model peptide, corresponding to Cys29–Pro42 bearing a C-terminal p-cyanophenolic ester,98 was first ligated to a peptide containing a C-terminal phenolic ester with an o-disulfide moiety (corresponding to Asp43–Gly57). These were reacted under the PEDAC conditions, with the omission of AgCl and TCEP to prevent reduction of the disulfide. Unfortunately, although these two fragments could be ligated, this occurred with complete hydrolysis of the masked thioester at Gly57. This hydrolysis was attributed to the steric accessibility of the glycine ester and, as such, the authors introduced an additional ortho-substituent to block both π-faces of the Gly57 carbonyl group. Under the same conditions, glycopeptide 71 containing a C-terminal p-cyanophenolic ester and a complex N-linked dodecasaccharide was ligated to peptide 72, bearing the modified C-terminal phenolic ester (Scheme 24). This provided the desired glycopeptide 73 in 47% yield, and proceeded without any detectable loss of the C-terminal phenolic ester.
This was then subjected to a subsequent ligation with peptide 74, corresponding to Gln58–Gly77, which contained a C-terminal glycine alkyl thioester. On this occasion, the PEDAC TCEP conditions were used to facilitate disulfide reduction and formation of the active acyl donor in situ. This provided glycopeptide 75 containing suitably protected amino acid side chains [Glu(OAllyl), Cys(Acm), Lys(ivDde)], an N-terminal thiazolidine and a C-terminal alkyl thioester in 51% yield.
Scheme 24 Synthesis of EPO (29–77) N-linked glycopeptide fragment by two sequential PEDAC ligations.115
The C-terminal fragment of EPO (Gln78–Arg166) was constructed by three consecutive PEDAC TCEP ligation reactions.116 Significant solubility problems were initially encountered en route to this rather hydrophobic fragment. To overcome these difficulties, secondary amino acid surrogates,117 specifically pseudoproline and Dmb-protected dipeptides, were incorporated to disrupt any secondary structures that may form during peptide assembly. The synthesis of the 88-amino acid fragment was envisaged via the assembly of two short glycopeptide fragments EPO (78–87) 76 and EPO (123–129) 77, and two long peptide fragments EPO (88–122) 78 and EPO (130–166) 79 (Scheme 25). A number of protecting group strategies were investigated, the most appropriate being protection of the lysine and cysteine side chains with Alloc and Acm groups, respectively. Additionally, the N-termini of 76–78 were Fmoc protected and Asp123 was protected with a fluorenylmethyl ester (Fm) until after the first ligation reaction. In contrast to the EPO (29–77) fragment (Scheme 24), EPO (78–166) was synthesised in the traditional C → N direction. The first ligation was between glycopeptide 77, bearing a C-terminal o-disulfide phenolic ester and an unprotected glycophorin-type O-glycan (EPO123–129), and peptide 79 corresponding to EPO (130–166). This was conducted under the PEDAC TCEP conditions and, upon completion, the N-terminal Fmoc and side chain Fm ester on Asp123 were removed by treatment with piperidine in DMSO to afford the desired glycopeptide in 59% yield over the two steps. This was subsequently ligated to 34-amino acid peptide 78 which also contained a C-terminal proline o-disulfide phenolic ester and an N-terminal Fmoc group. After completion of the ligation reaction, Fmoc deprotection furnished the desired glycopeptide corresponding to EPO (88–166) in 52% yield over the two steps.
The final PEDAC TCEP ligation with glycopeptide 76 containing a complex N-linked dodecasaccharide and a C-terminal proline o-disulfide phenolic ester, followed by Fmoc deprotection, furnished EPO (78–166) (80) in 50% yield. Although the Alloc and Acm side chain protecting groups could be removed at this stage (via Pd(0) and Hg(II) or Ag(I),91 respectively), these were intentionally left intact to allow for the ligation-based total synthesis of a full length homogeneous glycoform of EPO in future work. Mass spectral analysis also revealed the presence of lactones between the carboxylate of sialic acid and the 4-position of neighbouring galactose units (not shown in Scheme 25); however, these should be amenable to hydrolysis upon completion of the synthesis. The convergent assembly of the enormously complex glycopeptide fragments necessary for the total synthesis of EPO clearly demonstrates the power of the PEDAC strategy, which will undoubtedly feature in future glycoprotein syntheses where NCL cannot be employed.
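To put the stepwise yields quoted for the two EPO domains in perspective, the overall yield of each linear ligation sequence can be estimated as the product of its step yields. This is a rough sketch only: the function and variable names are illustrative, and it assumes that each reported yield refers to the intermediate carried forward from the previous step, which the text does not state explicitly.

```python
from math import prod


def cumulative_yield(step_yields):
    """Multiply fractional step yields to estimate the overall yield of a linear sequence."""
    if not all(0.0 <= y <= 1.0 for y in step_yields):
        raise ValueError("each yield must be a fraction between 0 and 1")
    return prod(step_yields)


# Step yields as quoted in the text (assumed to apply to consecutive steps).
epo_29_77 = [0.47, 0.51]         # two sequential PEDAC ligations (Scheme 24)
epo_78_166 = [0.59, 0.52, 0.50]  # three ligation/deprotection sequences (Scheme 25)

overall_29_77 = cumulative_yield(epo_29_77)
overall_78_166 = cumulative_yield(epo_78_166)

print(f"EPO (29-77): {overall_29_77:.1%}")
print(f"EPO (78-166): {overall_78_166:.1%}")
```

On these assumptions the estimate is approximately 24% for EPO (29–77) and approximately 15% for EPO (78–166); these figures are derived here for illustration and are not values reported by the original authors.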
Scheme 25 Synthesis of EPO (78–166) glycopeptide fragment containing both N- and O-linked glycans by three sequential PEDAC ligations.116
Scheme 26 Synthesis of a 60 residue MUC1 glycopeptide via two sequential direct aminolysis ligation reactions.118
With the advent of this suite of ligation methods it should now be possible to disconnect the primary sequence of a glycoprotein at almost every ligation junction. For this reason it is anticipated that these methods, either alone or in combination, should aid in the rapid and efficient synthesis of an increased number of homogeneous glycopeptides and glycoproteins. Undoubtedly, success in this area would provide unparalleled opportunities for the study of these biomolecules at the molecular level, shedding light on the structural and functional implications of protein glycosylation.
This journal is © The Royal Society of Chemistry 2010