Matthew R. Pratt a and Carolyn R. Bertozzi *abcd
aDepartment of Chemistry, University of California, Berkeley, California 94720
bDepartment of Molecular and Cell Biology, University of California, Berkeley, California 94720. E-mail: bertozzi@cchem.berkeley.edu; Fax: (510) 643-2628; Tel: (510) 643-1682
cHoward Hughes Medical Institute, University of California, Berkeley, California
dCenter for Advanced Materials, Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California
First published on 13th December 2004
Investigations into the roles of protein glycosylation have revealed functions such as modulating protein structure and localization, cell–cell recognition, and signaling in multicellular systems. However, detailed studies of these events are hampered by the heterogeneous nature of biosynthetic glycoproteins that typically exist in numerous glycoforms. Research into protein glycosylation, therefore, has benefited from homogeneous, structurally-defined glycoproteins obtained by chemical synthesis. This tutorial review focuses on recent applications of homogeneous synthetic glycopeptides and glycoproteins for studies of structure and function. In addition, the future of synthetic glycopeptides and glycoproteins as therapeutics is discussed.
Matthew Pratt | Matthew Pratt was born in Mesa, Arizona in 1976. He received his BS in biochemistry and mathematics from the University of Arizona in 1999. While an undergraduate, he worked for Prof. Robin Polt on glycosylation methodology and the synthesis of glycosylated analogs of enkephalin. He then entered the PhD program at the University of California Berkeley under the direction of Prof. Carolyn Bertozzi. His current research focuses on the chemical synthesis of glycoproteins and glycopeptide mimetics, developing assays for glycosyltransferase activity and inhibitor screens, and designing chemical tools to understand O-linked glycosylation of proteins. |
Carolyn Bertozzi | Prof. Carolyn Bertozzi received her PhD in Chemistry from UC Berkeley in 1993, working with Professor Mark Bednarski on the synthesis and biological activity of C-glycosides. She pursued postdoctoral research at UCSF with Professor Steven Rosen, studying the activity of endothelial oligosaccharides in promoting leukocyte adhesion at sites of inflammation. Prof. Bertozzi returned to Berkeley as a member of the faculty in 1996, where she is now Professor of Chemistry and Molecular Biology and a member of the Howard Hughes Medical Institute. Her research focuses on understanding and controlling changes in cell surface glycosylation associated with cancer, inflammation and bacterial infection. Prof. Bertozzi is a member of several Scientific Advisory Boards of biotechnology and pharmaceutical companies, and a co-founder of Thios Pharmaceuticals. She also serves on the editorial advisory boards of numerous journals, including J. Med. Chem., J. Org. Chem., Acc. Chem. Res. and J. Am. Chem. Soc. Prof. Bertozzi's awards include the Irving Sigal Young Investigator Award of the Protein Society, the ACS Award in Pure Chemistry, the Merck Academic Development Program Award, the Glaxo Wellcome Scholars' Award, the Presidential Early Career Award in Science and Engineering, the MacArthur Foundation Fellowship, the Camille Dreyfus Teacher-Scholar Award, the Arthur C. Cope Scholar Award, the Horace S. Isbell Award in Carbohydrate Chemistry, the Alfred P. Sloan Research Fellowship, the Donald Sterling Noyce Prize for Excellence in Undergraduate Teaching and the UC Berkeley Distinguished Teaching Award. She is also an elected member of the American Academy of Arts and Sciences. |
Glycosylation of proteins clearly allows biological systems to augment the information contained in a genome. For example, the single gene product tissue plasminogen activator (tPA) can exist in more than 100 discrete glycoforms with a distribution of activities.2 The potential utility of complex posttranslational modifications such as glycosylation lies in their ability to expand protein structure and function exponentially. Glycosylation represents a level of complexity that is not under direct genetic control and may be necessary for many complex processes in higher organisms. As an example, O-linked fucose found on epidermal growth factor (EGF)-like repeats in the developmental switch Notch has been shown to be necessary for correct signaling.3 Furthermore, the N-acetylglucosaminyltransferase Fringe, which modifies these O-linked fucose residues, was shown to modulate developmental patterning of wing cells in Drosophila melanogaster.4 This discovery underscores the importance of even simple monosaccharide glycosylation patterns in multicellular biology.
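The combinatorial growth in glycoforms described above can be illustrated with a short calculation. The following is only a sketch under a simplified model that is not taken from the studies cited: each glycosylation site is assumed to vary independently and to be either unoccupied or occupied by one of a fixed number of glycan structures. The function name `count_glycoforms` and the site and glycan counts are hypothetical and are not measured values for tPA or any other protein.

```python
def count_glycoforms(glycans_per_site):
    """Count glycoforms when every site varies independently.

    glycans_per_site: list of ints giving the (hypothetical) number of
    distinct glycan structures that can occupy each site. Each site is
    also allowed to be unoccupied, hence the "+ 1" below.
    """
    total = 1
    for n_glycans in glycans_per_site:
        total *= n_glycans + 1  # n_glycans occupied states + 1 empty state
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 1, 2 and 3 sites, 4 possible glycans each.
    for sites in ([4], [4, 4], [4, 4, 4]):
        print(len(sites), "site(s):", count_glycoforms(sites), "glycoforms")
```

With these assumed numbers, three sites carrying four possible glycans each already yield 125 distinct glycoforms, which shows how quickly a single gene product can diversify once several sites are variably glycosylated.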
Despite the obvious importance of oligosaccharides in a variety of biological processes, progress towards understanding their specific functions has been limited by their complexity and heterogeneity.5 Oligosaccharides are products of a template-independent biosynthetic pathway and, therefore, direct genetic methods for the expression of homogeneous glycosylation patterns do not exist. At best, the sites of glycosylation can be added or removed to probe the effects on function. Some genetic and biochemical modifications to the termini of glycans can produce limited structural definition,6,7 but, as with natively expressed glycoproteins, the structures obtained remain heterogeneous and difficult to characterize.
The only way to access glycopeptides and glycoproteins of defined structure is through chemical and enzymatic synthesis. The synthesis of glycopeptides and glycoproteins from readily available components is therefore an important goal. Unlike peptide and nucleic acid chemistry, carbohydrate chemistry is complicated by structural branching and varied stereochemistry found in large oligosaccharides. Furthermore, the formation of glycosidic bonds is a delicate matter requiring strictly anhydrous conditions that are incompatible with unprotected peptides and proteins. Despite these challenges, glycopeptides and glycoproteins with pendant glycosylation ranging from monosaccharides to large glycans have been prepared through chemical synthesis, chemoselective ligation and chemoenzymatic transformations. These tools and approaches have been reviewed extensively elsewhere.8,9 This review focuses on recent applications of synthetic glycopeptides and glycoproteins to studies of their structure and function, and as prospects for developing therapeutic agents.
Fig. 1 Six major classes of O-linked glycans. The Tn antigen of mucin-type O-linked glycans is outlined in the box. |
The α-O-GalNAc-Ser/Thr structure, commonly referred to as the Tn antigen, forms a biosynthetic foundation for eight core structures resulting from glycosylation at the C-3 and/or C-6 hydroxyl groups of GalNAc (Fig. 2). Of these, cores 1 and 2 are the most common core structures in mucin-type glycoproteins. Cores 3 and 4 are less abundant and are confined to mucins. The core structures can be elaborated with other monosaccharides such as sialic acid, fucose, and/or repeating units of Galβ1,4GlcNAc. Together with other modifications, such as sulfation, these elaborations give rise to highly complex structures often containing important recognition elements involved in cell–cell recognition.1
Fig. 2 Eight known O-linked mucin-type core structures. |
Synthesis of mucin-type glycopeptides is commonly accomplished by incorporation of a suitably protected O-glycosyl amino acid into a polypeptide by solid-phase peptide synthesis (SPPS). Of the two standard methods, 9-fluorenylmethoxycarbonyl (Fmoc)-based chemistry is more often employed than tert-butyloxycarbonyl (Boc)-based chemistry for the SPPS of glycopeptides. The sequential removal of base-labile Fmoc protecting groups for peptide elongation is compatible with the presence of acid-sensitive glycosidic bonds and avoids repeated exposure to trifluoroacetic acid and final deprotection with hydrogen fluoride, common to Boc-based methods. The hydroxyl groups of the carbohydrates are typically protected as acetyl or benzoyl esters that can be removed by treatment with sodium methoxide or hydrazine after the peptide has been cleaved from the solid support. Obtaining the appropriate O-glycosyl amino acid is the major challenge to glycopeptide synthesis.
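The order of operations in the Fmoc strategy described above can be summarized as a step list. The sketch below is a deliberately simplified illustration and not a laboratory protocol: the function name `fmoc_spps_plan`, the step wording and the three-residue example are hypothetical, and side-chain protecting groups, coupling reagents and washing steps are omitted.

```python
def fmoc_spps_plan(residues, glycosylated_positions=()):
    """Return the ordered steps of a simplified Fmoc SPPS of a glycopeptide.

    residues: residue names listed from N-terminus to C-terminus.
    glycosylated_positions: 1-based positions introduced as O-glycosyl
        amino acid building blocks whose sugar hydroxyls are ester-protected.
    """
    glyco = set(glycosylated_positions)
    steps = []
    # The chain is assembled from the C-terminus toward the N-terminus.
    for position in range(len(residues), 0, -1):
        block = "Fmoc-" + residues[position - 1]
        if position in glyco:
            block += "(ester-protected glycan)"
        if position == len(residues):
            steps.append("load " + block + " onto resin")
        else:
            steps.append("couple " + block)
        # Base-labile Fmoc removal exposes the amine for the next coupling.
        steps.append("remove Fmoc with base")
    steps.append("cleave peptide from solid support")
    if glyco:
        # Sugar esters are removed only after cleavage from the support.
        steps.append("remove sugar esters (sodium methoxide or hydrazine)")
    return steps


if __name__ == "__main__":
    # Hypothetical tripeptide Gly-Thr-Ala with a glycan on Thr (position 2).
    for step in fmoc_spps_plan(["Gly", "Thr", "Ala"], glycosylated_positions=[2]):
        print(step)
```

The point of the sketch is the ordering: the glycosyl amino acid is handled like any other Fmoc building block during chain assembly, and the ester groups on the sugar are removed in a separate final step after cleavage.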
Although the assembly of peptides containing simple α-GalNAc residues is now relatively straightforward, the routine construction of glycopeptides bearing more elaborate O-linked glycans remains a tremendous challenge. The chief obstacle in the synthesis of complex O-glycosyl amino acids is obtaining high α-selectivity in the formation of the O-Ser/Thr mucin-type linkage. Even with simple monosaccharide donors, establishing α-selective conditions for the glycosylation reaction can be trying, and this variability is exaggerated when dealing with large oligosaccharide donors to prepare complex structures found on native mucins. The α-O-GalNAc-Ser/Thr building blocks required for SPPS of mucin-type glycopeptides are generally prepared by glycosylation on the appropriate serine or threonine derivative. The use of a 2-azidogalactose halo- or thioglycoside donor ensures high α-selectivity in the glycosylation reaction (Fig. 3). Conversion of the 2-azido group to an N-acetamido group provides the Fmoc-protected GalNAc-Ser/Thr amino acids. Since these building blocks are now commercially available, simple O-linked glycopeptides are readily accessible to those with experience in SPPS. The size limitation on synthetic glycopeptides is imposed by the technical constraints of SPPS.
Fig. 3 Synthesis of α-O-GalNAc-serine and threonine building blocks for SPPS of O-linked glycopeptides. |
Fig. 4 Glycopeptides prepared by solid-phase peptide synthesis (SPPS) bearing the tumor-associated antigens Tn, TF, and 2,6-sialyl TF (STF) on 1. |
Fig. 5 Chemoenzymatic syntheses of a sulfate- and sialyl Lewis x (sLex)-modified PSGL-1 glycopeptide performed by Cummings (a) and Wong (b) and their coworkers. Synthetic glycopeptides (2 or 5) were sequentially treated with the appropriate sugar donors and glycosyltransferases; galactosyl transferase (GalT), N-acetylglucosaminyltransferase (GlcNAcT), sialyltransferase (SiaT), fucosyltransferase (FucT), 3′-phosphoadenosine-5′-phosphosulfate (PAPS) and tyrosylprotein sulfotransferase-1 (TPST-1). |
Fig. 6 Essential determinants of PSGL-1 for binding P-selectin. P-selectin is an endothelial adhesion molecule comprising a C-type lectin domain, an epidermal growth factor (EGF) domain, and a series of consensus repeats similar to complement regulatory proteins. P-selectin binding to the N-terminus of PSGL-1 requires sulfotyrosine residues and an sLex glycan. The sialic acid and fucose residues of sLex interact strongly with the lectin domain. |
Wong and coworkers accomplished a similar chemoenzymatic synthesis of PSGL-1 fragment 6 (Fig. 5b).29 Enzymatic glycosylations were used to transform a monosulfated glycopeptide (5) carrying an α-O-linked disaccharide rather than a simple monosaccharide. Although this route required the synthesis of a more complicated disaccharide glycosyl amino acid (4), it is advantageous in that it requires neither the core 1 β1,3-galactosyltransferase (GalT) nor the core 2 β1,6-N-acetylglucosaminyltransferase (GlcNAcT), neither of which is commercially available, to create the sLex hexasaccharide moiety. The synthetic glycopeptides generated by the above method are interesting prospects for anti-inflammatory therapy, as they bind and inhibit P-selectin with potencies superior to simple sLex glycans.
In addition to functioning as adhesion ligands, protein-bound glycans can greatly impact the immunogenic properties of many antigens.30 Many viral envelope proteins are glycosylated and use this characteristic to avoid immune detection.31,32 Tumor cells often display dramatic changes in glycosylation patterns, and these tumor-associated structures may be recognized by the immune system as tumor-specific antigens.33 In order for these antigens to elicit an adaptive immune response, they must be processed by antigen presenting cells, such as dendritic cells (DCs), and displayed to T cells in the context of a complex with major histocompatibility complex (MHC) molecules. The T cells that recognize the glycopeptide antigen are stimulated to proliferate and can then assist in the immune reaction against the virus-infected cell or tumor. T cells that recognize a particular antigen can be cultured in vitro (termed a T cell hybridoma) and used to study the specificity of their receptors for related antigens. Despite the clear ability of glycans to modulate protein structure and function, processing of glycoproteins by antigen presenting cells for presentation to T cells has not been well studied. Recently, synthetic and enzymatically elaborated glycopeptides based on the tumor antigen MUC1, a cell surface glycoprotein, were used to probe the processing and presentation of glycopeptides by DCs.34 It was shown that DCs endocytose MUC1 glycopeptides of various lengths, transport them to compartments for processing into smaller glycopeptides, and present them on MHC II molecules without removal of the glycans. This suggests that a repertoire of carbohydrate-specific T cells can also be elicited against glycoprotein antigens.
To probe the fine specificity of T cells against glycopeptides, Meldal and Werdelin tested peptides 7 and 8 and glycopeptides 10–16 for cross-reactivity with a T cell clone specific for glycopeptide 9 (Fig. 7).35 The T cells proved to be extremely specific in their recognition of the α-GalNAc residue, displaying only mild cross-reactivity with glycopeptide 10, where the α-GalNAc was bound to Ser rather than Thr. Remarkably, glycopeptide 13, with only the subtle stereochemical change from α-GalNAc to α-GlcNAc, showed no cross-reactivity. Thus, it can be concluded that glycopeptides can elicit T cell activation in a carbohydrate-dependent manner that is extremely specific for the structure of the carbohydrate moiety. Because many pathogenic microbes utilize glycosylation to evade the immune system, the knowledge that T cells can recognize glycopeptides in a glycosylation-specific manner is important for a complete understanding of the immune response.
Fig. 7 Synthetic glycopeptides used to probe antigen processing and presentation. Peptides 7 and 8 and glycopeptides 9–16 were prepared by SPPS. T-cell hybridomas were raised against glycopeptide 9 bearing an α-GalNAc residue. All other (glyco)peptides were used to probe the specificity of the above hybridomas. |
Because of the aberrant glycosylation patterns characteristic of some types of cancer, there has been increasing interest in generating antitumor vaccines based upon these abnormal glycan structures. For example, Danishefsky and coworkers synthesized several glycopeptides represented by 1 (Fig. 4) and 17 (Fig. 8).36 These glycopeptides contained clustered carbohydrates corresponding to the Tn, TF, and 2,6-STF antigens described earlier, as well as the Lewis y (Ley) antigen (Fig. 8). Glycopeptides represented by 1 and 17 generate robust antibody responses that cross-react with the same antigen expressed on tumor cells.37 Although it is difficult to determine with certainty the factors that contribute to immunogenicity, it appears that a mucin-like oligomeric display of the carbohydrate antigens is critical. Only antibodies raised against the Ley-elaborated peptide 17 reacted with both clustered Ley-mucin glycoproteins and monomeric Ley-ceramide. These clustered glycopeptides and others are undergoing evaluation as vaccines for several types of cancer.36
Fig. 8 Glycopeptides prepared by SPPS bearing the tumor-associated antigen Lewis y (Ley). |
The methods described above for the construction of O-linked glycopeptides permit access to structurally diverse but relatively short glycopeptide fragments (∼20 amino acids). Naturally occurring mucin-type oligosaccharides are typically present on proteins that far exceed this size. To surpass the size limits inherent in linear SPPS, the coupling of peptide fragments by native chemical ligation (NCL) technology has found widespread use.38 The ligation of two unprotected peptide segments, one bearing a C-terminal thioester and the other an N-terminal cysteine residue, affords the product peptide with a native amide bond at the ligation site. NCL is efficient and highly chemoselective, and the reaction conditions are entirely compatible with glycans and native proteins. For these reasons, the extension of NCL to glycoprotein synthesis presents an ideal solution for accessing large glycoprotein structures.
Lymphotactin (Lptn) is a 93-residue chemokine that serves as a potent chemoattractant for both T cells and natural killer cells.39 With a small mucin-like domain located at its C-terminus, Lptn is unusual, for relatively few chemokines are extensively O-glycosylated. Lptn is readily dissected by the NCL strategy into two synthetic peptides: a 47-residue peptide α-thioester (18) and a 46-residue glycopeptide (19) with eight α-GalNAc residues (Fig. 9a). The thioester (18) was synthesized using traditional Boc-based SPPS methods,40 and Fmoc-based synthesis with α-O-GalNAc-Ser/Thr afforded the lymphotactin mucin domain. Ligation of the two fragments cleanly gave the glycosylated chemokine, which was biologically active in a standard calcium mobilization assay.39 Synthesis of this glycoprotein by NCL has provided milligram quantities of homogeneous Lptn for structural and functional studies. Since Lptn has immunostimulatory properties, this chemokine may find therapeutic use in the future.
Fig. 9 Native chemical ligation (NCL) of peptide thioesters to N-terminal cysteinyl peptides provides full-length glycoproteins lymphotactin (a) and diptericin (b). |
A chemically defined version of diptericin, an 82-residue antimicrobial glycoprotein from insects, has also been prepared by NCL (Fig. 9b). Containing a proline-rich sequence similar to the antimicrobial peptide drosocin and an attacin-like domain, this modular antimicrobial peptide carries potential O-linked glycosylation sites at Thr11 and Thr54.41 Diptericin entirely lacks cysteine, thus Gly25 was strategically changed to the cysteine residue required for the NCL reaction. Positioned between the drosocin- and attacin-like domains, this disconnection also made possible investigation of the isolated domains for biological activity. For generation of the acid- and base-sensitive N-terminal glycopeptide-α-thioester 20, conventional Boc- and Fmoc-based methods to prepare the thioester could not be used. To overcome this obstacle, Fmoc-based SPPS was performed on a sulfonamide “safety-catch” resin developed by Ellman and coworkers that allowed the release of peptide thioesters under mild conditions by nucleophilic addition of thiols.42 Glycopeptide thioesters for NCL can be routinely prepared by this method. Removal of side-chain protecting groups yielded glycopeptide-α-thioester 20, which was ligated to the glycopeptide fragment 21 generated by Fmoc-based SPPS. NCL efficiently produced the full-length glycoprotein 22, which inhibited bacterial growth with an IC50 of 2.70 ± 0.30 μM, similar to the potency of synthetic native diptericin previously prepared in our laboratory.43 Glycopeptides such as diptericin appear to have broad spectrum activity against numerous bacteria, and are thus attractive lead compounds for new antibiotics that might avoid resistance.
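The disconnection logic used for lymphotactin and diptericin can be sketched in a few lines. The code below is a minimal illustration, assuming one-letter amino acid codes: it lists every way a sequence can be split so that the C-terminal fragment begins with cysteine, and it introduces a cysteine by substitution when the sequence has none, analogous to the Gly25 to Cys change described above. The function names and the 12-residue toy sequence are hypothetical; the sequence is not that of diptericin or lymphotactin.

```python
def ncl_disconnections(sequence):
    """Return (thioester_fragment, cys_fragment) pairs for each internal Cys.

    The first fragment would be prepared as a C-terminal thioester; the
    second begins with the N-terminal cysteine required for ligation.
    """
    pairs = []
    for index, residue in enumerate(sequence):
        if residue == "C" and index > 0:  # a Cys at position 1 gives no split
            pairs.append((sequence[:index], sequence[index:]))
    return pairs


def introduce_cysteine(sequence, position):
    """Replace the residue at a 1-based position with cysteine."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    index = position - 1
    return sequence[:index] + "C" + sequence[index + 1:]


if __name__ == "__main__":
    toy = "GSTPAGKLMTSE"  # hypothetical sequence lacking cysteine
    print(ncl_disconnections(toy))        # no ligation site available
    mutant = introduce_cysteine(toy, 6)   # hypothetical Gly6 -> Cys change
    print(ncl_disconnections(mutant))     # one disconnection at the new Cys
```

In this toy case the unmodified sequence offers no ligation site, whereas the single substitution creates one disconnection into the fragments `GSTPA` and `CKLMTSE`. In practice, the choice of position also depends on factors the sketch ignores, such as domain boundaries and the effect of the substitution on activity.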
Fig. 10 The three classes of N-linked glycans: (A) complex, (B) high mannose, (C) hybrid. |
The functions of N-linked glycosylation are wide-ranging and not understood in the case of every protein. However, they can be divided into two types: intra- and extracellular. Intracellularly, the broad function of N-linked glycans is protein folding and trafficking.46 This is exemplified by the protein folding quality control mechanism of the cell that involves a chaperone system found in the ER of nearly all eukaryotes, the calnexin–calreticulin cycle. Calnexin and calreticulin are related ER lectins that interact with N-linked structures bearing a single glucose residue that is exposed by the trimming action of glucosidases I and II.47 Calnexin or calreticulin binding slows the trafficking process and allows for proper protein folding and disulfide bond isomerization.48 While the glycoprotein is retained in the ER, glucosidase II removes the last glucose residue of N-linked glycans and causes release of the protein from the lectin-like chaperones. In a poorly understood process, a glucosyltransferase simultaneously acts as a folding “sensor”, apparently replacing a glucose residue only on improperly folded proteins, causing their binding to calnexin and calreticulin and return to the folding cycle of the ER.49
Extracellularly, N-linked glycans can function as structural elements and as ligands for receptors. Structurally, the large and flexible N-linked glycans increase protein stability by restricting the conformational flexibility of the underlying protein without sacrificing the net entropy of the system. A thermodynamic study performed by Robertson and co-workers with an ovomucoid protein domain demonstrated that two glycans found on Asn10 and Asn52 increased the melting temperature of the 68-residue polypeptide by 4.8 °C.50 An N-linked glycan can also affect the local structure and stability of a protein, as revealed by an NMR structure of a soluble form of human CD2 solved by Wagner and coworkers.51 CD2 is a cell surface glycoprotein present on T lymphocytes and natural killer cells, and the attachment of an N-linked glycan at Asn65 is necessary for its binding to CD58. As measured from nuclear Overhauser effects (NOEs), the protein-proximal GlcNAc-GlcNAc disaccharide was in close contact with a cluster of charged and polar residues located on one face of a β-sheet. The proper orientation of this sheet within CD2 was required for folding of the CD58 binding site. In this case, as well as others,52 the glycan acts in concert with the polypeptide to orchestrate the overall structure and function of the protein.
N-Linked glycans can also function extracellularly as ligands for carbohydrate receptors, as is highlighted by the glycoprotein growth factors and hormones. For example, erythropoietin (EPO) is synthesized by the kidneys and circulated in the blood to stimulate red cell proliferation and differentiation in bone marrow. The carbohydrates on EPO consist of one O-linked and three N-linked glycans to make up 40% of the protein's total weight. Fukuda and co-workers have shown that variation in the carbohydrate content of the N-linked glycans alters the serum half-life of EPO and thus alters its activity in vivo.53 Asialo-erythropoietin has no measurable activity in vivo due to rapid clearance from the bloodstream. EPO glycoforms containing N-acetyllactosamine (LacNAc) repeats were similarly cleared from serum circulation in a rapid fashion. In both cases, exposed galactose residues were recognized by the hepatic asialoglycoprotein receptor, and the hormone was internalized by endocytosis and degraded in the lysosome. Thus, EPO must be glycosylated in a very specific fashion to retain activity in vivo.
Fig. 11 Peptide fragments from the extracellular domain of the nicotinic acetylcholine receptor (nAChR) (23, 24 and 25) prepared by SPPS. Several glycoforms were constructed by the sequential action of Endo-M on 24 to produce 26 followed by glycosidase digestion to afford 27 and 28. |
In addition to the chemoenzymatic methodology described above, the convergent coupling of synthetic glycosylamines to the aspartyl side chains of unprotected peptides has been utilized to explore the effects of N-linked glycosylation on peptide conformation. Danishefsky and coworkers have applied this approach to short peptides bearing very large N-linked oligosaccharides.55 Pentasaccharide 29 was synthesized using the “glycal assembly method”,56 converted to glycosylamine 30, and then coupled to a pentapeptide to produce glycopeptide 31 in good yield (Fig. 12). These synthetic achievements have enabled landmark studies of the stereochemical communication between the carbohydrate and peptide domains. NMR studies undertaken by Live and coworkers compared two glycopeptides, differing only in the absolute stereochemistry of the amino acids (L-peptide vs. D-peptide).55 Both peptides adopted a type I β-turn, but there were measurable differences between the stereochemically “matched” peptide and the “mismatched” one. Thus, communication between the carbohydrate and polypeptide domains of a glycoprotein is not solely based upon the bulk of the carbohydrate. Rather, specific interactions between the polypeptide and the carbohydrate are governed by their precise structures.
Fig. 12 Synthesis of a pentasaccharide-modified glycopeptide (31): a) (NH4)HCO3, H2O (95%); b) amide formation: HOBt and HATU in DMF (40% yield). |
Fig. 13 Synthesis of a calcitonin derivative with a complex-type N-linked glycan by transglycosylation. Endo-M transfers the biantennary glycan from the donor glycosyl amino acid (STF-GP) derived from transferrin, liberating GlcNAc-Asn. |
Fig. 14 Structures of peptide 32 and the corresponding glycopeptide 33. |
This journal is © The Royal Society of Chemistry 2005 |