Michael Russelle S. Alvareza,
Xavier A. Holmesa,
Armin Oloumia,
Sheryl Joyce Grijaldo-Alvarez
a,
Ryan Schindlera,
Qingwen Zhoua,
Anirudh Yadlapatia,
Atit Silsirivanitc and
Carlito B. Lebrilla
*ab
aDepartment of Chemistry, University of California, Davis, Davis, California, USA. E-mail: cblebrilla@ucdavis.edu
bDepartment of Chemistry, Biochemistry, Molecular, Cellular and Developmental Biology Graduate Group, University of California, Davis, Davis, California, USA
cDepartment of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
First published on 4th April 2025
The processes involved in protein N-glycosylation represent new therapeutic targets for diseases but their stepwise and overlapping biosynthetic processes make it challenging to identify the specific glycogenes involved. In this work, we aimed to elucidate the interactions between glycogene expression and N-glycan abundance by constructing supervised machine-learning models for each N-glycan composition. Regression models were trained to predict N-glycan abundance (response variable) from glycogene expression (predictors) using paired LC-MS/MS N-glycomic and 3′-TagSeq transcriptomic datasets from cells derived from multiple tissue origins and treatment conditions. The datasets include cells from several tissue origins – B cell, brain, colon, lung, muscle, prostate – encompassing nearly 400 N-glycan compounds and over 160 glycogenes filtered from an 18000-gene transcriptome. Accurate models (validation R2 > 0.8) predicted N-glycan abundance across cell types, including GLC01 (lung cancer), CCD19-Lu (lung fibroblast), and Tib-190 (B cell). Model importance scores ranked glycogene contributions to N-glycan predictions, revealing significant glycogene associations with specific N-glycan types. The predictions were consistent across input cell quantities, unlike LC-MS/MS glycomics which showed inconsistent results. This suggests that the models can reliably predict N-glycosylation even in samples with low cell amounts and by extension, single-cell samples. These findings can provide insights into cellular N-glycosylation machinery, offering potential therapeutic strategies for diseases linked to aberrant glycosylation, such as cancer, and neurodegenerative and autoimmune disorders.
Efforts have been made in the past to correlate glycogene expression with the abundance of these glycogene products.16–18 Novel methods such as SUGAR-seq,19 scGlycan-seq,20 and scGR-seq21 enabled simultaneous quantification of glycogene transcripts and glycans through the use of lectin-based glycan profiling. In addition to these novel methods, computational tools such as GlycoMaple,22 SHAP,23 and Glcopacity,24 have aided in analyzing RNAseq and lectin-based glycan profiling data from these methods to comprehensively study glycosylation. Lectin-based glycomic profiling methods are preferred when complexed with other methods such as RNAseq, due to the convenient and rapid analyses that require no additional complicated or large instruments as well as the wide availability of fluorescently-labeled lectins able to characterize the glycan structural motifs.25 However, these approaches have been limited by the recognition capacity of lectin-based glycan profiling, which fails to capture the complete glycan structures or composition, and inability to differentiate between the classes of glycoconjugates – N-glycan, O-glycan, or glycosphingolipid. For example, the SNA lectin can recognize α-2,6-linked sialic acid to galactose residues, but it is unable to distinguish whether the structural motif came from an N-glycan, O-glycan, or glycosphingolipid.26 More critically, lectins do not allow quantitation between different structures. Thus, it has so far been difficult to correlate the transcriptomic expression of glycogenes with individual glycan structures or compositions, or with the main classes of glycans.
On the other hand, LC-MS-based glycomic methods provide more comprehensive structural analyses of cellular glycosylation, allowing the detection and quantification of individual N-glycan compositions or structures, especially when performed on a stationary phase that yields isomeric separation such as PGC.25 As such, LC-MS/MS-based methods surpass lectin-based limitations,25 and complementing results of LC-MS/MS-based methods with RNAseq data may provide a better elucidation of glycosylation pathways.
In order to address the efforts of correlating glycogene expression with quantitative glycomic abundance, we developed a method to integrate RNAseq transcriptomic and LC-MS/MS N-glycomic data, correlating glycogene expression with protein N-glycosylation abundances across cells from diverse tissue origins and conditions. Non-linear regression models were constructed to predict N-glycan abundances from glycogene expression profiles and identify key genes associated with specific N-glycans. The approach was validated by accurately predicting N-glycosylation in cell lines (GLC01, CCD19-Lu, Tib-190) regardless of cell sample amount. This method provides a platform to identify glycogenes implicated in cancer and N-glycan biosynthesis, enabling the development of targeted therapeutics for these pathways.
To quantify the expression of glycogenes, we performed 3′-TagSeq RNAseq analysis for quantifying gene expression using tag abundance and then normalized using TMM normalization.27 To reduce noise and highlight differences in glycogene expression, we filtered the ∼18000 transcriptome data to include the genes relevant to N-glycan expression (nearly 170 glycogenes)11 that encompasses sugar transporters, nucleotide sugar synthesis, dolichol pathway proteins, mannosyltransferases, lysosomal targeting and degradation, mannosidases, GlcNactransferases, galactosyltransferases, fucosyltransferases, and sialyltransferases (Fig. 3A). With the glycogenes alone, we observed stark differences in expression amongst the cells we assayed (Fig. 3B and D). For example, the mannosidase MAN1A1 was observed to be most abundant in B cells followed by prostate cells. Similarly, MGAT3, which adds an N-acetylglucosamine to the chitobiose core to synthesize bisected N-glycans, is most abundant in B cells compared to the rest of the tissue cell types. We also quantified several fucosyltransferases and sialyltransferases; for example, FUT11 (which adds a terminal α1,3-fucose) and FUT8 (which adds a core α1,6-fucose) are expressed significantly differently between tissue types, with FUT11 being consistently higher than FUT8 expression within the same tissue type. ST6Gal1, which adds a terminal α2,6-sialic acid, are expressed differently between tissue types (Fig. 3C).
![]() | ||
Fig. 4 Model construction and 5-fold cross-validation of supervised ML models (A) for predicting N-glycan (e.g. H8N2, H5N4, H5N4F1, H5N4S1, H5N4F1S1) abundances from glycogene expression data (B). |
Based on the results, we created models with good performance (validation R2 > 0.7) especially for undecorated (H5N4, H7N6), fucosylated (H5N4F1, H5N4F2, H7N6F1), sialylated (H5N4S1, H6N5S1) and sialofucosylated N-glycans (H5N4F1S1, H5N4F1S2, H7N6F1S1). On average, we observed good model performance in predicting the most abundant N-glycans per category (Table 1): undecorated (average validation R2 = 0.82), fucosylated (average validation R2 = 0.74), sialylated (average validation R2 = 0.83), and sialofucosylated (average validation R2 = 0.85) N-glycans. We found that N-glycans with models having poor performance (R2 < 0.3) tended to have very low abundance (<0.05%). Hence, after filtering out the N-glycans with very low abundance (<0.05%), we obtained several models with good performance (R2 > 0.7): high-mannose (n = 1), undecorated (n = 9), fucosylated (n = 22), sialylated (n = 6), sialofucosylated (n = 22) (ESI Fig. 1–5†).
Furthermore, we observed that each N-glycan composition necessitated different regression models. For example, the best model for the fucosylated bi-antennary compound H5N4F1 was the squared exponential GPR (R2 = 0.85, RMSE = 0.71845) whereas the best model for the sialylated bi-antennary compound H5N4S1 was the Medium Neural Network (R2 = 0.9, RMSE = 0.74494) (Table 1). A likely explanation for the differences in model characteristics between N-glycan structures is the differences in glycogene interactions upon biosynthesis; for example, fucosylated N-glycan structures would not necessitate the involvement of sialyltransferases in its biosynthesis, while sialylated structures would not involve fucosyltransferases in its biosynthesis. Likewise, we observed differences in model characteristics between undecorated multiple-branched N-glycans: the bi-antennary H5N4 utilized a Bilayered neural network (R2 = 0.86, RMSE = 0.48265), the tri-antennary H6N5 used a Squared exponential GPR (R2 = 0.79, RMSE = 0.36207), and the tetra-antennary structure H7N6 performed best under a Trilayered neural network (R2 = 0.82, RMSE = 0.40917). Due to the numerous possible GlcNActransferases that could catalyze the branching reactions of N-glycans (e.g. MGAT1, MGAT2, MGAT4A/B/C, MGAT5/5B), it is likely that each additional N-glycan branch (e.g. tri-, tetra-antennary) necessitates the involvement of more GlcNAc-transferase than the bi-antennary structure; hence, more complicated glycogene interactions are likely to occur with highly-branched structures compared to bi-antennary ones.
There were N-glycan compositions with relatively poor model performance: H6N5F1 (validation R2 = 0.58), H5N4S2 (validation R2 = 0.3), and H7N6S1 (validation R2 = 0.22); we attribute these low predictability values to potential structural variations arising from differences in linkages, especially that of fucose (α1,3 vs. α1,6), sialic acid (α2,3 vs. α2,6), and galactose (α1,3 vs. α1,4). We also observed fair to poor model performances with our high-mannose predictive models, for example: H9N2 (trilayered neural network, R2 = 0.56, RMSE = 30.817), H8N2 (linear SVM, R2 = 0.48, RMSE = 1.5606), and H7N2 (cubic SVM, R2 = 0.41, RMSE = 1.4336). Although the low model performance warrants further exploration, a probable explanation is the structural separation of N-glycans with similar compositions but different glycosidic linkages, thereby decreasing model resolution.30 As such, refining the N-glycomic structural characterization method with linkage information to obtain precise structures will aid in refining the current predictive models in a future study. Another potential reason highlighted above is low abundance of some glycan structures; wherein we observed poor performance of models (R2 < 0.3) for structures with less than 0.05% relative abundance. We summarized the model parameters for each N-glycan structure into MATLAB workspaces, including the training and validation datasets into Github (https://github.com/MichRussAlv/glycoPATH).
Beyond predicting the bulk N-glycome of cells, we aimed to determine whether decreasing the number of cellular starting material will affect the RNAseq data and thus, the N-glycomic predictions (Fig. 6). The rationale for this experimental design is to circumvent the inherent dependency of the N-glycomic LC-MS/MS method with the starting amount of cell sample. For example, the bulk N-glycome obtained from LC-MS/MS glycomics decreases by a factor of ten with each ten-fold dilution of the starting cell suspension consisting of five million cells (5 M) (Fig. 6B). Furthermore, we start to observe skewed results from LC-MS/MS glycomics upon decreasing the amount of starting material. In contrast, transcripts can be prepared from relatively lower amounts of starting material and still obtain consistent results; hence, low-input transcriptomics.27,28 Using transcriptomics and our predictive models (RNAseq + models), we observed that the predicted N-glycomes were more consistent in abundances when calculated from RNAseq (Fig. 6C). Moreso, the predicted N-glycomes from were similar to the LC-MS/MS profile obtained from bulk cells. In addition, the predicted profiles of 5 M, 500 K, and 50 K cells were found to be more consistent compared to the corresponding profiles obtained from LC-MS/MS glycomics. For example, with LC-MS/MS glycomics we observed sialylated abundances of 13.59% (from 5 M cells), 11.46% (from 500 K cells), and 5.60% (from 50 K cells), showing a decrease in observable sialylated compounds with lower amount of starting cell material. In contrast with RNAseq + ML-models, we predicted sialylated abundances of 13.59% (from 5 M cells), 12.91% (from 500 K cells), and 13.89% (from 50 K cells). Thus, we can obtain accurate representations of N-glycome of bulk cell populations even when extracted from low-input samples.
For fucosylated glycans, the most abundant compositions were found to be biantennary structures H5N4F1 and H5N4F2, and monofucosylated structures H6N5F1 and H7N6F1. The fucosyltransferases with high importance scores for these catalyze the addition of antennary fucose (FUT7, FUT11) as well as core-fucose (FUT8), with the exception of the bi-fucosylated H5N4F2 with lower importance score compared to the other structures (Fig. 5C). We believe these results indicate that for N-glycan compositions with several fucose residues, the combination of terminal and core-fucosyltransferases can be used to calculate its abundance. In determining correlations with sialylated N-glycans, ST3Gal6 have the highest importance scores, followed by ST3Gal3 and ST6Gal1 (Fig. 7D). In particular, we observed that ST3Gal6, which catalyzes the addition of terminal sialic acid in an α2,3-linkage, have the highest importance score with bi-, tri-, and tetra-antennary structures with mono- or bi-sialylation. ST3Gal3, which catalyzes the same reaction, have higher importance score to bi-sialylated structures H5N4S2 and H6N5S2. Finally, ST6Gal1, which adds α2,6-linked sialic acid, has highest importance score with monosialylated biantennary compound H5N4S1.
Finally, we sought to identify the fucosyltransferases and sialyltransferases with the highest importance scores for sialofucosylated structures (Fig. 7E). We observed similar trends with the fucosyltransferases and fucosylated N-glycans, wherein the enzymes catalyzing the addition of terminal (FUT7, FUT11) and core fucose (FUT8) had high importance scores. With sialyltransferases, ST3Gal3 lost importance except for the monosialylated monofucosylated triantennary structure H6N5F1S1, whereas ST3Gal6 remained to have higher importance scores across the N-glycan compositions surveyed. ST6Gal1 and ST6Gal2 both have high importance scores, having both catalyze the addition of α2,6-linked sialic acid. Altogether, these results indicate that multiple enzymes can catalyze the same reaction to produce the same N-glycan composition as well as enzymes having specific substrate preferences. With this method, we are able to discern the specific glycogenes that play important roles in the synthesis of specific N-glycan compositions.
An interesting application for the glycogene feature importance calculation is that we can rank glycogenes that catalyze the reaction pathway from precursor high-mannose structures to sialofucosylated structures, after overlaying the enzymatic rules catalyzed by every glycogene enzyme in the pathway. For example, in order to synthesize the sialofucosylated structure H5N4F1S1 from the high-mannose structure H9N2 it has to go through mannose trimming by mannosidases (α1,2-linked: MAN1A1, MAN1A2, MAN1B1, MAN1C1; α1,3-/α1,6-linked (MAN2A1, MAN2A2, MAN2B1, MAN2B2), addition of GlcNAc (β1,2-linked: MGAT1, MGAT2; β1,4-linked: MGAT4A, MGAT4B, MGAT4C; β1,6-linked: MGAT5, MGAT5B; bisecting: MGAT3), addition of fucose (α1,2-linked: FUT1, FUT2; α1,3-linked: FUT10, FUT11, FUT4, FUT4, FUT5, FUT6, FUT7, FUT9; α1,6-linked: FUT8), and addition of sialic acid (α2,3-linked: ST3Gal1, ST3Gal2, ST3Gal3, ST3Gal4; α2,6-linked: ST6Gal1, ST6Gal2).11 Ranking the importance of these glycogenes in the H5N4F1S1 model shows the following most important genes per reaction: α1,2-mannosidase: MAN1A1 (score = 8.2625); α1,3/α1,6-mannosidase: MAN2A1 (score = 11.416), MAN2B1 (score = 7.187); β1,2-GlcNActransferase: MGAT1 (score = 5.0537); β1,4-GlcNActransferase: MGAT5B (score = 4.6258), MGAT4B (score = 4.4976); galactosyltransferase: B4Galt2 (score = 9.3097), B4Galt1 (score = 7.5278); α1,3-fucosyltransferase: FUT7 (score = 9.1033), FUT11 (score = 8.4866); and α2,6-sialyltransferase: ST6Gal1 (score = 5.9686), ST6Gal2 (score = 6.4552) (Fig. 8).
To further validate the pathway analysis method, we determined whether the model can detect perturbations in the N-glycan biosynthesis pathways caused by glycosylation inhibitors. We recently characterized the effect of glycosylation inhibitors such as kifunensine (type-I mannosidase inhibitor) and 2-deoxy-2-fluorofucose (fucosylation inhibitor) on the N-glycan profiles of cells.32,33 Kifunensine inhibits the activity of type-I mannosidases (e.g. MAN1A1),34 thereby preventing the maturation of N-glycans from high-mannose types (e.g. H9N2) into undecorated, fucosylated, sialylated, and sialofucosylated types. Similarly, 2-deoxy-2-fucose is a pan-inhibitor of fucosylation, through inhibition of GDP-fucose synthesis and fucosyltransferase activities.35 Thus, 2-deoxy-2-fucose is an effective inhibitor of both fucosylated (e.g. H5N4F1, H5N5F1) and sialofucosylated(e.g. H5N4F1S1, H5N5F1S1) N-glycan structures.
To test this notion, we treated B cells with these inhibitors and subsequently extracted the transcriptomic profiles for 3′-TagSeq analysis. We then predicted the resulting glycomic data with the predictive models to determine the specific glyco-enzymes affected by the inhibitors (Fig. 9A and B). With kifunensine, we expect type-I mannosidases (MAN1A1) to be inhibited resulting in higher abundance of H9N2 in the kifunensine-treated cells (Fig. 9C). Interestingly we also found slightly higher abundance (albeit less drastic) of H8N2 in the kifunensine-treated cells; such a result may suggest that the kifunensine treatment to be very effective in inhibiting the first step in the pathway – the trimming of H9N2 into H8N2. Furthermore, we observe a slight decrease in abundance of the smaller high-mannose N-glycan H5N2 in the kifunensine-treated cells; such a result indicates that kifunensine is less active in inhibiting α1,3/6-linked mannosidases (type-II mannosidases). Unsurprisingly, we predicted the abundance of mature N-glycans (e.g. H5N5, H5N5F1, H5N5S1, H5N5F1S1) to be lower in kifunensine-treated cells due to inhibition of high-mannose trimming. The predicted N-glycan abundances followed the trend of experimentally-derived kifuneninse-treated cells (Fig. 9C). With kifunensine, there is an increase in the abundance of H9N2 and H8N2 N-glycans and a decrease in abundance of mature N-glycans (e.g. H5N5, H5N5F1, H5N5S1, H5N5F1S1).
With 2-deoxy-2-fluorofucose-inhibited cells, we expected fucosyltransferases (FUT7, FUT11) to be inhibited (Fig. 9A and B). Indeed, we observed a drastic decrease in abundance of both fucosylated (H5N5F1) and sialofucosylated (H5N5F1S1) N-glycans in the treated-cells. We also observed a corresponding drastic increase in abundance of sialylated H5N5S1 N-glycan in the treated cells. Based on the pathway (Fig. 9D), wherein the undecorated substrate H5N5 could be decorated by either fucose (H5N5F1), sialic acid (H5N5S1), or both (H5N5F1S1). This result corresponds to 2-deoxy-2-fluorofucose being a fucosylation inhibitor; in that the H5N5 precursor is shunted onto the sialic acid decoration step to form H5N5S1, instead of the fucose decoration step to form H5N5F1. Thus, we observe drastically higher abundance of H5N5S1 compared to both H5N5F1 and H5N5F1S1. Similarly, we compared the predicted N-glycan abundances with experimentally-derived 2-deoxy-2-fluorofucose-treated cells and found a similar trend (Fig. 9D). In particular, the abundances of fucosylated N-glycans H5N5F1 and H5N5F1S1 decreased, while the abundance of sialylated H5N5S1 increased.
An interesting outcome of integrating both pathway analysis and N-glycan abundance prediction is the ability to identify which glycogene is highly correlated with the biosynthesis pathway. For example, there are several fucosyltransferases expressed by B cells (e.g. FUT7, FUT11, FUT8) but the glycogene that had the highest importance score was FUT7; thus, specific targeting of FUT7 could lead to more precise knock-down of fucosylated N-glycans in these cells as opposed to using pan-inhibitors.
Previous methods such as SUGAR-seq,19 scGlycan-seq,20 and scGR-seq21 used lectin-based glycan profiling coupled with RNAseq methods to simultaneously quantify glycogene and N-glycan expression. Software, such as GlycoMaple,22 SHAP,23 and Glcopacity,24 have been developed to recapitulate the data obtained from these methods. However, lectin-based glycan profiling methods are known to have several limitations. Due to lectin binding being specific to glycan epitopes and not individual structures, it is unable to distinguish between specific N-glycan compounds.25 For example, the fucose-binding lectin such as AAL47 can bind to fucose in α1,2-, α1,3-, α1,4-, and α1,6-linkages, regardless of being present in N- or O-glycans.25 Thus, lectin-based profiling may lack specificity for defined N-glycan structures. LC-MS/MS methods can provide the necessary resolution required to quantify specific N-glycan structures, and integrating its analysis with RNAseq transcriptomics can further provide insights into the relationship between the glycome and transcriptome.
Protein N-glycosylation involves the action of the whole machinery of glycosidases, glycosyltransferases, transport proteins, and chaperones that work in conjunction with each other to enact post-translational modification on glycoproteins.11,12 These enzymes and proteins are coded into the transcriptome by over 160 glycogenes: 14 mannosidases, 18 galactosyltransferases, 35 N-acetylglucosaminyltransferases, 13 fucosyltransferases, and 21 sialyltransferases.11 Members of the mannosidase family catalyze the removal of mannose residues either in an α1,2-, α1,3-, or α1,6-linkage, essentially processing high-mannose type N-glycans into other types.43,48 The action of both N-acetylglucosaminyltransferases and galactosyltransferases signal the transition of N-glycans from high-mannose types into either hybrid- or complex-type N-glycans. Once GlcNAc has been added to the antenna of the N-glycan, galactosyltransferases can subsequently act on it to catalyze the transfer of UDP-Gal to the antenna GlcNAc either in a β1,3- or β1,4-linkage.49 Upon biosynthesis of either hybrid- or complex-type N-glycans, further decoration with fucose and/or sialic acid residues is acted upon by fucosyltransferases and sialyltransferases, respectively. Fucosyltransferases add fucose residues from GDP-Fuc to either the antenna GlcNAc in an α1,3-linkage, or to the core-GlcNAc in an α1,6-linkage; the latter reaction is known to be catalyzed only by FUT8.50 On the other hand, sialyltransferases can catalyze the addition of sialic acid from CMP-NeuAc to antenna GlcNAc residues, either in α2,3- or α2,6-linkage.51
In the multi-step biosynthetic process such as the production of N-glycans, multiple glycogenes interact and work together to synthesize the eventual N-glycan structure conjugated to the glycoprotein. Hence, this process necessitates the use of robust predictive models and algorithms, such as machine-learning algorithms, to holistically incorporate these multi-gene interactions.52 Based on the modeling results, each N-glycan necessitated a different machine-learning algorithm to construct. Among the best performing models were gaussian processes most frequently with a rational quadratic kernel. This model type was specifically developed to suit modelling tasks based on smaller datasets where inputs have widely varying degrees of correlation with predicted outputs.53,54 In the context of biochemical networks, Gaussian Process Regression has been shown to be an effective method for modelling systems in which external pathways have a significant influence on the subsystem in question.53 With this modelling approach the influence of specific glycosylation genes as a subset of the total transcriptome were pinpointed as factors in N-glycan abundance expression. We identified that N-glycan linkage diversity may contribute to model performance. For example, an N-glycan with composition H5N5 can have at least three possible structures: bisected bi-antennary, and two tri-antennary structures. Before the addition of a third GlcNAc residue, given the known rules of N-glycan biosynthesis, the linkages of the nine residues in a bi-antennary glycan H5N4 can inferred based on composition. This involves addition of two GlcNAc residues to the chitobiose core common to all N-glycan which is then extended by β1,2 linkage forming MGAT1 and MGAT2. Following the putative biosynthetic pathways presented in Fig. 8, an addition of a fifth GlcNAc implicates either MGAT3, MGAT4, or MGAT5 to form a β1,4-linked bisected structure, a β1,4-linked tri-antennary structure, or a β1,6-linked tri-antennary structure, respectively. Thus, the biosynthetic pathways to create H5N5 structures are naturally linked with each other due to having common reaction precursors being acted on by MGAT3, MGAT4, or MGAT5.
Using our predictive models, we successfully predicted the N-glycome for both bulk cell samples and low-input cell samples. Notably, we found that LC-MS/MS N-glycomics is highly sensitive to the amount of starting cell material, which led to skewed abundances of high-mannose, undecorated, fucosylated, sialylated, and sialofucosylated N-glycan structures. In contrast, transcriptomics combined with machine-learning models produced more consistent results regardless of the starting cell population size, enabling us to predict protein glycosylation even with a smaller cell sample (i.e. low-input N-glycomics) or from single cells (single-cell N-glycomics). By correlating N-glycomics with glycogene expression data, we can see patterns in N-glycan biosynthesis of lung cells that coincided with reports of aberrant glycosylation in cancer. High-mannose N-glycan structures were observed to have significant negative correlations with MAN1A1 (removes α1,2-linked mannose) and MAN2A1 (which removes both α1,6- and α1,3-linked mannose) among other mannosidases. MAN1A1 in particular, is known to be correlated with impaired survival in breast54 and bone43 cancers. Fucosylated N-glycans significantly correlated with fucosyltransferases FUT7 and FUT11, which adds fucose to antenna GlcNAc in an α1,3-linkage, as well as FUT8, which adds fucose to the core GlcNAc in an α1,6-linkage; these are interestingly associated with invasion and metastatic potentials of tumors through hypoxic conditions.55–57 Finally, sialylated N-glycans correlated with ST6Gal2 and ST3Gal3, which add sialic acid residues to antenna galactose in an α2,6-linkage and α2,3-linkage, respectively. Overexpression of α2-3 sialyltransferase III (ST3Gal-III) in pancreatic cancer has been implicated in pancreatic tumor progression. Overexpression of α2-6 sialyltransferase I (ST6GalNAc-I) was related to poor patient survival in colorectal carcinoma patients.58 In lung cancer patients, aberrant glycosylation has been found to be correlated to aberrant expression of glycosylation enzymes as well. Gene-expression analysis of lung tissue sections from smokers and never-smokers found significantly upregulated MAN1A2, MAN2A1, MGAT2, MGAT4B, B4GALT2, FUT2, FUT3, FUT6, and FUT8 while several enzymes, MAN1A1, MAN1C1, MAN2A2, MGAT1, MGAT3, and FUT1 were significantly down-regulated.30,46 In addition to these enzymes, FUT7 has also been observed to play a role in lung cancer. The expression of FUT7 and/or FUT4 has been positively associated with significantly shorter survival in lung cancer patients compared to the patients that did not express these genes.59 FUT7 expression was consistent with sLex expression level; basing on the biosynthetic pathway of sLex, this was to be expected. L-selectin ligands, which are synthesized by FUT7, was found to also play a role in the metastasis mechanism of leukocyte L-selectin action, with attenuated metastasis observed in FUT7−/− mice. FUT7 was also found to have a role in the metastasis of human colorectal carcinoma cells (LOVO), with increased metastatic potential, sLex and glycoprotein CD24 expressions after transfection with FUT7, implying the role of FUT7 in glycosylation of CD24 and its enhancement of metastatic potential.60 Likewise, overexpression of CD15S epitopes were observed after transfection of SEBTA-001 (biopsy-derived brain metastatic NSCLC) and NCI–H1299 (metastatic NSCLC from cervical lymph node) with FUT7.61 This led to enhanced cell adhesion of these cells to an endothelial cell monolayer of hCMEC/D3 (human cerebral microvascular endothelial cell line), with knockdown of FUT7 expression leading to decreased cell adhesion. In addition to metastatic pathways, FUT7 also plays a role in other oncogenic pathways. FUT7 overexpression in A549 (NSCLC) cells led to increased sLex expression, which correlated to the activation of the EGFR/AKT/mTOR pathway, triggering cell proliferation.62 In human hepatocarcinoma cell lines, FUT7 overexpression led to increased the sLex expression of the InR (insulin receptor)-α subunit, enhancing autophosphorylation of InR-β and further phosphorylation of insulin receptor substrate-1 (IRS-1), protein kinase B (PKB/Akt), MAPK, MEK, PDK-1, PKN, c-Raf-1 and β-catenin.63 Overexpression of FUT7 in heptocarcinoma cell line also downregulated the protein expression and activity of the cyclin-dependent kinase inhibitor p27Kip1 protein, an inhibitor of CDK2.64 By decreasing p27Kip1, the increased CDK2 activity stimulated the phosphorylation (and deactivation) of the retinoblastoma protein and stimulated G1/S transition and cell proliferation. The reduced p27Kip1, enhanced CDK2 and Rb phosphorylation, and cell proliferation, were correlated with the amount of sLex, the biosynthetic product of FUT7.
N-Glycomics was performed following previously defined methods.25,66 The dried and cleaned-up N-glycans were reconstituted in Milli-Q water and then transferred into LC-MS/MS vials for injection into the instrument. The samples were injected into an Agilent 1200 series liquid chromatography system with an Agilent PGC-II chip (40-nL enrichment column, 5 μm; 75 μm × 43 mm separation column; Agilent Technologies, cat. no. G4240-64010). Analytical separation was performed using a gradient of mobile phase A (3% ACN, 0.1% v/v formic acid) and B (90% ACN, 1% v/v formic acid): 0–2 min: 0–0%, 2–20 min: 0–16%, 20–40 min: 16–72%, 40–42 min: 72–100%, 42–52 min: 100–100%, 52–54 min: 100–0%, 54–65 min: 0–0%.
Mass spectra were acquired using an Accurate mass QToF (Agilent Technologies, model no. 6520) over the 600–2000 m/z range in positive-ion mode. The Vcap was kept at 1850 V throughout the run. MS scans were acquired at 0.8 spectra per s and MS/MS scans were acquired at 1.0 spectra per s. Fragments were obtained in CID, with collision energies calculated using Vcollision = 1.8 × (m/z)/100–2.4. Mass spectra were acquired using data-dependent acquisition, selecting the top 5 precursors per scan for fragmentation.
Acquired LC-MS/MS data were processed using MassHunter Qualitative Analysis Software B.07.00 (Agilent Technologies). N-Glycan structures were identified using the MassHunter's Find by Molecular Feature algorithm using previously defined parameters, with matching to our in-house database of previously-identified N-glycans from lung cancer cells, which were subsequently manually validated using MS/MS spectra.35,65,67 Relative quantification of N-glycans was achieved by measuring the area under the curve (XIC) of each N-glycan structure. The XICs were further processed by normalizing to the TIC and subsequent classification and summation based on: high-mannose (containing >3 mannose residues), undecorated (containing <4 mannose residues, >3 N-acetylglucosamine residues, and no fucose nor sialic acid residues), fucosylated (containing <4 mannose residues, >3 N-acetylglucosamine residues, at least 1 fucose and no sialic acid residues), sialylated (containing <4 mannose residues, >3 N-acetylglucosamine residues, no fucose, and at least 1 sialic acid residue), or sialofucosylated (containing <4 mannose residues, >3 N-acetylglucosamine residues, at least 1 each of fucose and sialic acid residues).
Numerous bioinformatic tools have been developed to aid glycoinformatics, such as in predicting glycan structures from transcriptomic data,22,68 glycosites from genomic data,69,70 protein-glycan interactions,71,72 and lectin-based glycan signatures from RNAseq.23 As of writing, there is still no methodology to predict protein N-glycosylation abundance from glycogene expression data. Hence, this is the motivation for our method glycoPATH, which can correlate information from both RNAseq transcriptomics and LC-MS/MS glycomics to predict glycosylation. From the correlations, we were able to construct prediction models to predict the N-glycome of cells derived from multiple tissue origins (CCD19-Lu, GLC01, Tib-190), from cells with low amount of starting material, and cells with perturbed glycosylation profiles due to inhibitor treatment. Although we were able to obtain accurate results of our predictions using the glycosylation enzymes, additional refinement of the models could be used to improve the accuracy and precision of predictions. Specifically, incorporation of glycosylation enzyme protein abundance, activity, and localization, may be beneficial.73–75 Additionally, the models can be further refined by incorporating linkage information (e.g. α2,3- vs. α2,6-linked sialic acid) of the N-glycan structures into the N-glycan dataset. Finally, the models were trained on global glycosylation profiles obtained using nLC-QToF methods; as such, does not contain glycosite- and glycoprotein-specific information. Future work involving quantifying glycoprotein-specific glycan abundances using nLC-Orbitrap methods will benefit from the ML-based modeling presented here. While compositional abundances are the natural starting point for this form of correlation study, there is strong potential for the incorporation of linkage information in future work. In depth structural analysis of glycans falls into two broad categories of teasing apart structural features by tandem MS as well as distinguishing linkage information with retention time. Ongoing projects are seeking to delve into the next layer of glycan structural information, involving knock-outs of specific glycosyltransferase genes to identify specific peaks corresponding to specific glycan linkages. Altogether, the method presented here can provide comprehensive information on the glycogenes involved in protein glycosylation.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5sc00467e |
This journal is © The Royal Society of Chemistry 2025 |