Yixuan Xie a, Siyu Chen a, Qiongyu Li a, Ying Sheng b, Michael Russelle Alvarez d, Joeriggo Reyes e, Gege Xu a, Kemal Solakyildirim af and Carlito B. Lebrilla *ac
a Department of Chemistry, University of California, Davis, California, USA
b Department of Chemistry, Biochemistry, Molecular, Cellular and Developmental Biology Graduate Group, University of California, Davis, California, USA
c Department of Biochemistry, University of California, Davis, California, USA
d Institute of Chemistry, University of the Philippines, Los Baños, Laguna, Philippines
e Marine Science Institute, University of the Philippines, Diliman, Quezon City, Philippines
f Department of Chemistry, Erzincan Binali Yildirim University, Erzincan, Turkey. E-mail: cblebrilla@ucdavis.edu
First published on 18th May 2021
A cross-linking method is developed to elucidate glycan-mediated interactions between membrane proteins through sialic acids. The method provides information on previously unknown extensive glycomic interactions on cell membranes. The vast majority of membrane proteins are glycosylated with complicated glycan structures attached to the polypeptide backbone. Glycan–protein interactions are fundamental elements in many cellular events. Although significant advances have been made to identify protein–protein interactions in living cells, only modest advances have been made on glycan–protein interactions. Mechanistic elucidation of glycan–protein interactions has thus far remained elusive. Therefore, we developed a cross-linking mass spectrometry (XL-MS) workflow to directly identify glycan–protein interactions on the cell membrane using liquid chromatography-mass spectrometry (LC-MS). This method involved incorporating azido groups on cell surface glycans through biosynthetic pathways, followed by treatment of cell cultures with a synthesized reagent, N-hydroxysuccinimide (NHS)–cyclooctyne, which allowed the cross-linking of the sialic acid azides on glycans with primary amines on polypeptide backbones. The coupled peptide–glycan–peptide pairs after cross-linking were identified using the latest techniques in glycoproteomic and glycomic analyses and bioinformatics software. With this approach, information on the site of glycosylation, the glycoform, the source protein, and the target protein of the cross-linked pair was obtained. Glycoprotein–protein interactions involving unique glycoforms on the PNT2 cell surface were identified using the optimized and validated method. We built the GPX network of the PNT2 cell line and further investigated the biological roles of different glycan structures within protein complexes. Furthermore, we were able to build glycoprotein–protein complex models for previously unexplored interactions. 
The method will advance our future understanding of the roles of glycans in protein complexes on the cell surface.
The inherent problems in the analysis are mostly due to the low abundance of glycoproteins and the large diversity of glycoforms, which further complicates the analysis. With the development of proteomic and glycoproteomic techniques, methods that characterize protein complexes and networks on cell membranes involving glycan–protein interactions have been explored. Paulson, Kohler, and their respective colleagues have developed diazo-bearing sialic acid reporters to capture cis- and trans-interactions of CD22.9–11 They used the metabolic labeling method to incorporate diazo groups into sialic acids. Proteins that associate with sialic acids can then be captured by producing reactive carbenes through ultraviolet irradiation. However, only isolated binding pairs can be characterized because the method relied on purification by gel electrophoresis. To explore proteome-wide interactions, we have developed the protein oxidation of sialic acid environments (POSE) method to identify potential sialic acid-associating proteins in situ.12 Frei et al. employed a trifunctional chemoproteomic cross-linker and created a prominent ligand-based receptor-capture (LRC) technology to identify glycan-binding proteins.13 The targets of vaccinia viruses on the human cell surface have been successfully illustrated by this method. The same group further improved the method, termed HATRIC-LRC, which successfully identified the receptors of epidermal growth factor receptor (EGFR) antibody and Holo-transferrin (TRFE) from cells.14 However, glycan information within the interactions could not be obtained, and the glycan-mediated protein networks on the cell surface were absent. Moreover, the method involves the use of periodate, which can potentially disrupt the interaction of native glycans and proteins.
Herein, we developed a cross-linking method to characterize glycan–peptide cross-linked (GPX) products coupled in cross-linking reactions, which provided direct information on the sialic acid-mediated protein networks on cell membranes (Fig. 1). In our method, N-azidoacetylmannosamine (ManNAz) was metabolically incorporated into cell surface sialic acid-containing glycoproteins through the de novo biosynthetic pathway. This bioorthogonal reporter was further conjugated with the cyclooctyne functional group by the addition of a synthesized N-hydroxysuccinimide–cyclooctyne (NHS–cyclooctyne) reagent. An NHS ester-activated carboxylic acid reagent was further added to react with primary amines on the lysine side chains of nearby proteins. After the glycan–protein cross-linking step, the cells were lysed and digested with trypsin through a previously reported workflow.15 The glycan–peptide cross-linked (GPX) pairs were purified using a reversed-phase column and a strong cation exchange (SCX) cartridge and analyzed by reversed-phase liquid chromatography coupled with a high-resolution Orbitrap mass spectrometer. Enriched GPX pairs were fragmented using higher-energy collisional dissociation (HCD) for identification through a modified proteomics workflow. Although the NHS–cyclooctyne cross-linker is not MS-cleavable, the glycans in the GPX pairs can be fragmented during tandem MS, which makes the pairs behave as MS-cleavable cross-linked peptide pairs with different glycan compositions. The complexity derived from glycan heterogeneity hindered direct identification using the search software. To address this problem, we obtained site-specific glycopeptide information from the specific cell line through glycoproteomic analysis using LC-MS. 
In this way, glycans containing SiaNAz were mapped with protein site information allowing us to narrow down the glycopeptides, thereby largely avoiding misidentification from searching a much larger combinatorial space of peptides. By integrating the glycomic and glycoproteomic results with available cross-linking software (MeroX), we could unambiguously identify the GPX products.16 The characterization of the GPX products coupled in cross-linking reactions provided direct information regarding the sialic acid-mediated protein networks on cell membranes. We further illuminated the glycan–protein networks present on the PNT2 cell membrane which mediate the function of cells.
We then optimized the workflow with experiments using a selected target protein. In these experiments, bovine serum albumin (BSA) was modified with the cross-linker to yield proteins with pendant cyclooctyne (Fig. 2a). The modified protein was then introduced to the cell culture to allow the cyclooctyne to react with SiaNAz-containing glycans on the cell membrane. Tryptic digestion was then performed on the cell lysates to yield the GPX products containing a glycopeptide from a cell membrane glycoprotein and a BSA peptide. The resulting pairs were analyzed by LC-MS, and the generated data were used to evaluate the cross-linking software tools that are currently available.
The best-suited informatic tools for identifying peptide–peptide cross-links (PPX) with LC-MS included pLink, XlinkX, and MeroX.16,18,19 Based on our evaluation, MeroX was found to be the most effective for identifying GPX products. Using MeroX, the false-discovery rate (FDR) was decreased to less than 5% by filtering out peptide scores less than 80 (Fig. 2b). To confirm the reproducibility of the method, BSA-cross-linked samples were analyzed in triplicate, with more than 60% overlap of the identified GPX pairs (not shown).
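The relationship between a score cutoff and the estimated FDR can be illustrated with a minimal target-decoy sketch. It assumes each candidate GPX match carries a score and a decoy flag; the `Match` record, the function name, and the example values are hypothetical and are not taken from MeroX, whose internal FDR calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One candidate cross-link match (hypothetical record layout)."""
    score: float
    is_decoy: bool


def fdr_at_cutoff(matches, cutoff):
    """Estimate FDR as decoy hits / target hits among matches scoring >= cutoff."""
    kept = [m for m in matches if m.score >= cutoff]
    decoys = sum(1 for m in kept if m.is_decoy)
    targets = len(kept) - decoys
    if targets == 0:
        # No target hits survive: FDR is undefined unless nothing survives at all.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


# Illustrative values only, not data from the study.
example = [
    Match(120, False), Match(95, False), Match(88, False),
    Match(85, True), Match(70, True), Match(60, False),
]
print(fdr_at_cutoff(example, 80))
print(fdr_at_cutoff(example, 90))
```

Raising the cutoff removes low-scoring decoy hits faster than target hits, which is why a stricter score filter lowers the estimated FDR at the cost of fewer identifications.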
We also optimized the collision conditions in the higher-energy C-trap dissociation (HCD) for fragmenting GPX pairs. The HCD method was then sufficient for fragmenting GPX pairs and yielded high quality information for confident identification. For example, the MS/MS spectrum of a GPX product shown in Fig. 2c readily identified the glycopeptide as originating from the membrane glycoprotein ITGAV (integrin alpha V) cross-linked to a peptide of BSA. Other representative spectra are shown in Fig. S2.† The glycopeptide–peptide cross-link did not behave like the more common peptide–peptide cross-link. The former, due to its significantly higher mass, required more energy to fragment. Controlling the HCD energy without strong attenuation of the signal proved difficult. Therefore, the tandem mass spectra appeared over-fragmented compared to the peptide–peptide cross-link spectrum in some cases, but were nonetheless sufficient to identify the sequences. It was noted that the presence of the cross-linking reaction on lysine or arginine resulted in a missed cleavage site. We also examined other factors including the incorporation of ManNAz, the efficacy of the cross-linking reaction on cell lines, and the enrichment of GPX products to validate the method further. Details of these experiments are provided in the ESI.†
The target proteins were presumed to be some type of sialic acid-binding proteins. Indeed, functional analysis of the target proteins yielded binding proteins, with both anionic and cationic binding functions. The proteins with anion binding functions were consistent with glycan-mediated interactions and the anionic nature of sialic acids, while the proteins with cation binding functions were mostly associated with an alkali or alkaline earth metal and act like typical C-type lectins. Comparison of the target proteins to sialic acid-binding proteins identified earlier through oxidative proximity labeling experiments yielded a nearly 50% overlap in the total proteins identified (Fig. S4†).12 Additionally, a greater than 70% overlap was observed when comparing the protein complexes identified here and in the interaction that was previously identified, which confirmed the consistency of the results and suggested sialic acids may mediate these interactions. Meanwhile, the other 30% of the proteins not found as sialic acid binding proteins in published databases should be assigned as potential glycan-binding proteins (Fig. S5†).
Interaction maps were constructed using Cytoscape and visualized by grouping the source glycoproteins (Fig. 3a) and their protein targets (Fig. 3b). From the interaction maps, we noted the emergence of hub proteins that interacted through glycans with several other proteins simultaneously. The hubs were either glycoproteins that radiated their glycans toward many other proteins (outward hubs), or proteins that interacted with many glycoproteins through the other proteins' glycosylation (inward hubs). The presence of these hubs suggested that key proteins control or are controlled by other proteins through the sialic acid-mediated interaction network.
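The distinction between outward and inward hubs can be sketched as a degree count on a directed edge list, where each edge points from a source glycoprotein to the target protein captured by its glycan. This is only an illustrative sketch: the function name and the placeholder protein labels are hypothetical, and the maps themselves were built in Cytoscape rather than with this code.

```python
from collections import defaultdict


def hub_degrees(edges):
    """Count distinct partners per protein in a directed source -> target edge list."""
    out_partners = defaultdict(set)  # glycoprotein -> proteins reached by its glycans
    in_partners = defaultdict(set)   # protein -> glycoproteins whose glycans reach it
    for source, target in edges:
        out_partners[source].add(target)
        in_partners[target].add(source)
    out_deg = {p: len(s) for p, s in out_partners.items()}
    in_deg = {p: len(s) for p, s in in_partners.items()}
    return out_deg, in_deg


# Placeholder labels for illustration only; this is not the measured network.
edges = [
    ("GP1", "T1"), ("GP1", "T2"), ("GP1", "T3"),
    ("GP2", "T1"), ("GP3", "T1"), ("GP1", "T1"),  # repeated edge counted once
]
out_deg, in_deg = hub_degrees(edges)
print(max(out_deg, key=out_deg.get))  # candidate outward hub
print(max(in_deg, key=in_deg.get))    # candidate inward hub
```

Counting distinct partners rather than raw edges keeps a protein pair that is cross-linked through several glycoforms from inflating the hub ranking.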
Outward hubs were glycoproteins associated with many target proteins through their sialylated glycans. Sialic acids likely play a role in the function of these glycoproteins. Integrins, especially ITB1 (integrin beta-1) and ITA3 (integrin alpha-3), were noted to be large outward hubs. Integrins are known to participate in many biological functions including binding, signaling, and cell adhesion.20 Other sialylated glycoproteins including AMPN (aminopeptidase N), CD166, L1CAM (neural cell adhesion molecule L1), LAMP1, and LAMP2 (lysosome-associated membrane glycoprotein 1 and 2) were also noted to be prominent outward hubs. The biological functions of source proteins and target proteins were further analyzed using the STRING software. As shown in Fig. S6,† nearly 50% of the source proteins were involved in cell adhesion, while only 12% of the target proteins were found with similar functions, suggesting the unique role of sialic acid in cell adhesion and migration. This result suggested that hub glycoproteins likely play major roles in cellular attachment. Indeed, the sialylated CD166 protein, a dominant outward hub, is known to be involved in inhibiting monocyte cell migration.21 Additionally, the two LAMP proteins were reported to mediate cell–cell adhesion through sialic acids.22,23
The non-glycosylated protein AHNAK (also known as desmoyokin) was found to be an inward hub and a common target of many glycoproteins. This very large protein has been shown to have a multitude of important roles, including as a structural and signaling protein on cell membranes.24 It was previously reported that the downregulation of AHNAK was accompanied by high tumor potential.25 Proteins including FLNB (filamin-B) and TLN1 (talin-1) were also found to be inward hubs. FLNB and TLN1 were reported to interact with integrins, and the inhibition of sialylation was reported to decrease their association.26
Interestingly, glycosylated proteins were also found as inward hubs. For example, EGFR was, like the integrins, a dominant outward-connected protein; however, it was also the glycan target of several integrins, suggesting that the interactions between these proteins are extensively mediated by glycans. We compared all the source proteins and target proteins with the lipid raft database,27 and over 80% of the source proteins and over 60% of the target proteins were found in lipid rafts (not shown). This result is consistent with the previous observation from confocal microscopy and supports the evidence that glycoproteins are often associated with the lipid raft microdomains on the cell membrane.12 Overall, our results from the network integration analysis demonstrated the emergence of hub proteins that appear to play a central role in the sialic acid-mediated interactome.
We also investigated the effect of the degree of sialylation and fucosylation on protein association using the same base glycan structures with different amounts of sialylation and fucosylation. For example, Hex(5)HexNAc(4)Sia(1) is a biantennary monosialylated structure, while Hex(5)HexNAc(4)Sia(2) is the same base glycan composition with an additional sialic acid. For the two glycans, only a small fraction (4 out of 22) of their target proteins overlap, specifically ITA2 (integrin alpha-2), FLNB (filamin-B), PALLD (palladin), and K22E (keratin, type II cytoskeletal 2 epidermal). However, the same glycans when observed over replicates generally yielded over 60% similarities.
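The overlap comparison between two glycoforms can be sketched as a simple set operation. This sketch assumes the overlap is reported against the union of the two target lists, which the text does not state explicitly; the function name and the placeholder protein labels are hypothetical.

```python
def overlap_fraction(targets_a, targets_b):
    """Return (shared count, union count, shared/union) for two target-protein lists."""
    a, b = set(targets_a), set(targets_b)
    shared = a & b
    union = a | b
    # Assumption: the denominator is the union of both target sets.
    fraction = len(shared) / len(union) if union else 0.0
    return len(shared), len(union), fraction


# Placeholder labels for illustration only, not the measured target lists.
mono_sialylated_targets = {"P1", "P2", "P3", "P4"}
di_sialylated_targets = {"P3", "P4", "P5"}
shared, total, frac = overlap_fraction(mono_sialylated_targets, di_sialylated_targets)
print(shared, total, frac)
```

Applying the same function to replicate runs of a single glycoform gives the reproducibility baseline against which a between-glycoform difference can be judged.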
We further investigated the potential biological significance of increasing (or decreasing) the extent of sialylation. For example, 65% (11 out of 17) of the proteins that interacted with the di-sialylated glycan (Hex(5)HexNAc(4)Sia(2)) had cell surface receptor signaling functions, while only 30% (10 out of 29) of the proteins targeted by the mono-sialylated glycan (Hex(5)HexNAc(4)Sia(1)) had similar functions. This result suggested that a change of glycoprotein targets might be required for some cellular processes. Indeed, the activity of some cell signaling proteins was previously found to depend on the degree of cell surface sialylation.29,30 Similarly, we examined the role of multiple fucosylation on the selection of target proteins by examining glycan compositions with the same base structures and varying amounts of fucosylation. The first fucose is generally on the core (with exceptions), while the second and third are in the antennae. The effect of multiple fucosylation was more difficult to establish; however, we noted that glycans with only core fucose and glycans with antennary fucose had different preferences for selecting targets. The proteins CTNA2 and CTNB1 were previously found to interact with α(1–2)-linked antennary fucose specifically.31 Indeed, CTNA2 (catenin alpha-2) and CTNB1 (catenin beta-1) were only captured by glycans with antennary fucose (Hex(5)HexNAc(4)Fuc(2)Sia(1)), but not by glycans with only core fucose (Hex(5)HexNAc(4)Fuc(1)Sia(1)).
A further use of this method was to probe the role of glycans in protein complexes on the cell membrane. Based on the GPX data, we identified protein complexes that are known to interact from previous literature and can now be characterized by their glycan–protein interactions. We used the GPX results to map known protein–protein interactions on the cell membrane while inserting glycans into their respective interactions. For example, the Hex(5)HexNAc(4)Fuc(1)Sia(1) and Hex(5)HexNAc(4)Fuc(1)Sia(2) glycans on ITGB1 Asn97 were found to cross-link with the RAP1B protein at Lys104. ITGA5 (integrin alpha-5) has been previously shown to interact with ITGB1 (integrin beta-1) using X-ray crystallography.32 From the STRING database, RAP1B was predicted to interact with ITGB1, although the nature of this interaction is not well defined. We employed DisVis to corroborate the cross-linking results using distance restraints between the glycosylation site (Asn97) and the cross-linking site (RAP1B-Lys104) calculated with specific glycan lengths (10–30 Å).33 Putative ITGAV-ITGB1-RAP1B complexes were then generated using the HADDOCK software (Fig. S8†).34 Out of 66 structures, the complex with the highest HADDOCK score was subsequently glycosylated using the CHARMM-GUI glycan modeler, generating two complexes glycosylated with Hex(5)HexNAc(4)Fuc(1)Sia(1) and Hex(5)HexNAc(4)Fuc(1)Sia(2), respectively (Fig. 4a). To show how the ITGB1-Asn97 glycans (Hex(5)HexNAc(4)Fuc(1)Sia(1) and Hex(5)HexNAc(4)Fuc(1)Sia(2)) may interact with RAP1B, we mapped the residue interactions in Chimera after applying contact parameters (VDW overlap of −0.4 Å).35 The interaction model of RAP1B with the integrin complex suggested that the RAP1B molecule was rich in clusters of charged residues (Lys and Arg) that attracted sialic acids on ITGB1, which can increase their binding affinities. 
The global structural similarity was quantified by using root-mean-square deviation (RMSD), and we observed that the ITGAV-ITGB1-RAP1B complex does not show significant conformational changes between the two glycoforms Hex(5)HexNAc(4)Fuc(1)Sia(1) and Hex(5)HexNAc(4)Fuc(1)Sia(2) (Cα RMSD = 0.004 Å, 1195 atoms). This observation was consistent with the previous molecular dynamics studies where the minimal differences in protein conformation and dynamics between glycosylated and deglycosylated proteins were noted.34 Meanwhile, we observed differences in residue contacts between Hex(5)HexNAc(4)Fuc(1)Sia(1) and Hex(5)HexNAc(4)Fuc(1)Sia(2), with the latter having fewer interactions (Fig. 4b) potentially contributing to the difference in biological activity. Overall, these GPX results offered unprecedented views of the interactions between proteins that are mediated by specific glycan types on cell membranes.
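The RMSD comparison between the two glycoform models can be sketched as follows. This is a minimal illustration that assumes the two sets of Cα coordinates are already superimposed and listed in the same atom order; it performs no structural alignment, and the function name and example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Illustrative coordinates only (in Å), not taken from the modeled complexes.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```

A value near zero, as reported for the two glycoforms, indicates that the protein backbones are essentially identical and that any functional difference must come from the glycan contacts rather than from a change in protein conformation.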
Finally, we identified a potential functional relevance of cell surface sialic acid-mediated protein networks. While it is not yet possible to interrupt a single glycoprotein–protein interaction, we may examine the general systemic effect by disrupting the sialic acid network and performing a standard cellular assay such as cell migration. We treated PNT2 cells with 3-fluorinated sialic acid, a sialyltransferase inhibitor, to reduce sialylation on the cell surface. As a result, PNT2 cell migration was significantly increased after the treatment (Fig. S9†), which demonstrated that the loss of sialic acids enhanced the growth of cells to make them more tumorigenic. Indeed, we previously found that cancerous prostate cells (LNCaP cells) have limited sialic acid expression compared to the normal prostate cells (PNT2 cells).36
A further limitation was the lack of available software to identify GPX products in the LC-MS/MS data. The fragmentation patterns of GPX products are more complicated and require knowledge of glycan fragmentation. A protein–protein cross-linking software tool (MeroX) was adapted by treating the whole glycan plus linker as a single component in a peptide–peptide cross link. Each glycan composition was therefore a different cross-linker and was searched individually. For example, a bi-antennary monosialylated glycan (Hex(5)HexNAc(4)Sia(1)) with composition and associated mass was deemed a different cross-linker from the bi-antennary glycan with a fucose (Hex(5)HexNAc(4)Fuc(1)Sia(1)). We generally examined the most abundant SiaNAzylated glycoforms expressed on the PNT2 cell membrane. This corresponded to nine glycoforms that were individually annotated using the MeroX software.
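The idea of treating each glycan composition plus linker as its own cross-linker can be sketched by computing one mass per composition. The residue masses below are standard monoisotopic values; the SiaNAz value is calculated here from the residue formula C11H16N4O8, and `linker_delta` is a hypothetical placeholder because the mass increment of the NHS–cyclooctyne linker is not given in the text. The function names are likewise illustrative and do not correspond to MeroX settings.

```python
# Monoisotopic residue masses (Da) of monosaccharides after loss of water.
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "Sia": 291.095417,     # N-acetylneuraminic acid residue
    "SiaNAz": 332.096813,  # azide-bearing analogue, computed from C11H16N4O8
}


def glycan_mass(composition):
    """Sum residue masses for a composition such as {'Hex': 5, 'HexNAc': 4, 'Sia': 1}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def crosslinker_mass(composition, linker_delta):
    """Mass searched as a single 'cross-linker': glycan residues plus a linker increment.

    linker_delta is an assumed input; the source does not state its value.
    """
    return glycan_mass(composition) + linker_delta


biantennary = {"Hex": 5, "HexNAc": 4, "Sia": 1}
fucosylated = {"Hex": 5, "HexNAc": 4, "Fuc": 1, "Sia": 1}
print(round(glycan_mass(biantennary), 4))
print(round(glycan_mass(fucosylated) - glycan_mass(biantennary), 4))
```

Because the two compositions differ by one fucose residue, they give distinct masses and would be entered as two separate cross-linkers, which is why each glycoform requires its own search.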
The proteins identified by GPXs were all mediated by sialic acids. Thus, the majority of the target proteins were likely sialic-acid binding proteins. Indeed, a comparison of the target proteins in this method and sialic acid binding proteins identified on the cell membrane of the same cell line by proximity oxidative labeling showed at least a 50% overlap between the two methods.12 However, although sialic acids were the basis of the cross-linking reaction, the interactions could have been driven by other glycan features present as part of the total structure. Thus, a terminal galactose or terminal fucose could have been the mediator of the glycan–peptide interaction while the sialic acid was merely an observer that produced the cross-link. The GPX between proteins, mediated by the glycans, could therefore be potentially structurally broader involving other types of (non-sialylated) glycan–peptide interactions. On the other hand, interactions among glycans with no sialic acid were not represented here. However, cross-linking of activated residues such as those based on fucose and N-acetylglucosamines could be performed in the future using the methods developed for sialic acid.
GPX has revealed the potential roles of specific glycoforms and provided additional clues regarding the large heterogeneity often associated with glycan structures. For example, fucose is a key monosaccharide residue involved in many biological processes, most notably recognition. However, the role of multiple fucosylation ranging from zero to greater than four on an N-glycan is a mystery. We can at least understand the difference between no fucose and one. The first fucose is generally found as a (1,6)-link on the chitobiose core. The complete loss of core fucosylation is generally lethal to humans.41 We found that adding a core fucose increased trans protein interactions. Thus, core fucosylation is potentially necessary to increase trans protein interaction, while no fucose can limit these interactions to favor cis-protein interactions. Further analysis of other proteins and other cell lines may similarly reveal the role of multiple fucosylation and multiple sialylation.
Sialic acids themselves are similarly important monosaccharides and are often found in cell membranes. The loss of sialic acid is not lethal, as some cell lines are devoid of sialic acid in their N-glycans; however, altering the overall amount of sialic acids or sialylated glycans has been correlated with diseases such as cancer.42 Indeed, a decrease in sialic acid glycosylation has been associated with cancer progression. In the cell line studied here, suppressing sialylation of glycans increased the tumorigenic potential of the cell. Increasing the number of sialic acids likely increases the strength of the glycan–peptide interaction. The most striking feature of the GPX results is the highly interactive networks of membrane proteins mediated by sialic acids and the rise of specific protein hubs. The outward hubs are glycoproteins that simultaneously interact with a large number of proteins through their glycans. Thus, integrin beta-1 (ITB1) and its partner integrin alpha-3 (ITA3) are major hub proteins providing glycans that interact with the largest number of proteins (49 and 42 distinct proteins, respectively). As these proteins play important roles as receptors and in signaling, their high interactivity is consistent with these roles. However, proteins such as AMPN, which are known primarily to aid in the digestion of peptides from proteins as part of the gastric and pancreatic processes, appear to be similarly highly interactive. AMPN may be an important protein with other functions that are yet to be revealed. Alternatively, the GPX may simply identify the peptides that were cross-linked in the process of being digested. Nonetheless, these interactive maps may eventually reveal the roles of the hubs in cells and the central roles they play in cellular functions.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1sc00814e
This journal is © The Royal Society of Chemistry 2021