The impact of O-glycan chemistry on the stability of intrinsically disordered proteins

Protein glycosylation is a diverse post-translational modification that serves myriad biological functions.

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) of proteins are prevalent in both eukaryotes and prokaryotes. [1][2][3] Although often poorly conserved in sequence, the amino acid content of IDPs and IDRs is actively regulated, and IDPs and IDRs serve functions such as connecting ordered domains, regulating translation, molecular recognition and signaling, and assisting in protein folding. [2][3][4] Because of their inherent flexibility and lack of structure, IDPs and IDRs are susceptible to proteolytic cleavage in the competitive, extracellular milieu, and O-glycosylation (the attachment of a sugar moiety to the β-hydroxyl group of serine or threonine) is an important mechanism to protect against proteolysis in these regions. [5] In fungi and yeasts in particular, most of the secreted IDPs and proteins exhibiting IDRs are O-mannosylated, [6][7][8][9] but the evolutionary preference for this specific glycosylation pattern is not well understood. The present study uses glycopeptide synthesis and molecular dynamics (MD) simulations to show that O-mannosylation is the preferred glycan motif on fungal IDP sequences and to reveal the biophysical reasons underpinning this observation, in turn suggesting an evolutionary selection for α-mannose as the preferred glycan for IDP/IDR stabilization in some eukaryotic systems.
O-Mannosylation is strongly preferred for proteolysis protection of a model fungal IDP. To investigate how glycan identity affects IDP proteolytic stability, we employed the naturally O-mannosylated linker from the Trichoderma reesei glycoside hydrolase family 7 cellobiohydrolase, TrCel7A, as a model. [10] This enzyme is one of the most important industrial cellulases and its linker is a well-studied O-mannosylated IDP. [11][12][13][14] The α-anomeric configuration was chosen since it is the only type reported so far in reducing terminal mannose residues of O-mannosylated proteins from fungi and yeasts. [8] We used solid-state glycopeptide synthesis [12,15,16] to produce four variants (Fig. 1), including the non-glycosylated linker, and measured their half-lives under thermolysin degradation by MALDI-TOF MS (Fig. S1-S4 †). [15][16][17][18][19] As shown in Table 1, all glycosylated variants improve proteolytic stability over the non-glycosylated linker, L NG , but the O-mannosylated linker (L man ) exhibits a striking 112-fold improvement over L NG , 16-fold proteolysis protection over the O-galactosylated linker (L gal ), and 3-fold over the O-glucosylated linker (L glc ). These results, obtained using a model IDP, align with our previous observation that O-mannosylation improves proteolytic stability compared to other glycans in an ordered protein domain from the same enzyme. [15,16]
Glycan stereochemistry impacts protein flexibility and accessibility. To explain the results presented in Table 1, we subsequently conducted temperature replica exchange molecular dynamics (T-REMD) with explicit solvent using various linker models, including the four experimental systems.
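For readers unfamiliar with T-REMD, the exchange step between two temperature replicas is commonly decided with a Metropolis criterion, P = min(1, exp[(β_i − β_j)(E_i − E_j)]). The sketch below illustrates that standard criterion only; it is not taken from the simulation setup of this study. The function name, the energies, and the 310 K neighbour temperature used in the usage note are hypothetical (the text specifies only the 300 K replica).

```python
import math

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for swapping two temperature replicas.

    energy_i and energy_j are the potential energies (kJ/mol) of the
    configurations currently held at temp_i and temp_j (K), respectively.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)
```

For example, `exchange_probability(-1010.0, -1000.0, 300.0, 310.0)` gives a probability below one, because the colder replica already holds the lower-energy configuration; reversing the two energies gives exactly 1.0.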
Analyses are reported on the T-REMD population from the lowest temperature replica (300 K). Two hypotheses for the increased proteolytic stability imparted by glycans are that (i) glycans increase protein rigidity [20,21] and that (ii) glycans impart steric hindrance to restrict protease access. [22] Both hypotheses were tested computationally by examining differences in protein flexibility and accessibility. Notably, the predicted cleavage sites for various proteases coincide with the glycosylation sites (Fig. S5 †), suggesting that steric hindrance may be responsible for proteolysis resistance. However, the calculated solvent accessible surface area is similar for all glycosylated models considered (Fig. S6 †), while there is a considerable difference in proteolysis susceptibility among L man , L gal , and L glc , with L gal exhibiting only slightly higher resistance to proteolysis than L NG . These results suggest that steric hindrance alone cannot fully explain proteolytic resistance, since the glycan moieties occupy roughly the same volume.
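The relative stabilities of the glycosylated variants can be put on a common scale using the fold improvements of L man quoted from Table 1. The short sketch below only does that arithmetic; the quoted fold values are rounded, so the derived ratios are approximate, and the dictionary and function names are illustrative.

```python
# Fold improvements of L_man over each variant, as quoted in the text
# (rounded values; the underlying half-lives are in Table 1).
FOLD_MAN_OVER = {"L_NG": 112.0, "L_gal": 16.0, "L_glc": 3.0}


def implied_fold_over_ng(variant):
    """Stability of `variant` relative to L_NG implied by the quoted ratios."""
    return FOLD_MAN_OVER["L_NG"] / FOLD_MAN_OVER[variant]


for name in ("L_gal", "L_glc"):
    print(f"{name}: ~{implied_fold_over_ng(name):.1f}-fold over L_NG")
# L_gal: ~7.0-fold over L_NG
# L_glc: ~37.3-fold over L_NG
```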
We subsequently examined how glycan chemistry affects protein flexibility, glycan orientation, specific interactions, and backbone torsional preferences in an attempt to explain the high proteolysis resistance imparted by O-mannosylation. Information about protein flexibility and extension was obtained from the free energy profiles, or potential of mean force (PMF), as a function of the end-to-end distance for all linkers (Fig. 2). Unlike L gal and L glc , for which the PMFs are somewhat flat-bottomed and resemble that of the non-glycosylated linker L NG , the PMF for L man is slightly narrower and shows a well-defined local minimum at larger distances (~3.0-3.5 nm). This indicates that L man is, on average, stiffer and adopts more extended conformations than its counterparts. Further analyses reinforce the hypothesis that α-mannosylation is able to restrict protein flexibility. That is, the relative stiffening of L man was corroborated by its greater persistence length (Table S1 †). Also, similar structures from T-REMD were clustered considering the Cα atoms with a root-mean-square deviation cutoff of 1.5 Å (Fig. S7, Table S3 †). [23] The most populated clusters were found for L man . Moreover, values of root-mean-square deviation relative to average structures computed for 10 ns trajectory blocks also indicate lower mobility of the L man backbone (Table S3 †). Small differences in protein backbone flexibility and concomitant large differences in resistance to proteolysis were also recently found for a structured protein with a single attached glycan, α-mannose or α-glucose. [24] Chaffey et al. suggested that a chain of specific interactions between O-mannosyl and side chains of close residues may be propagating stiffening along the protein backbone. The similar behavior observed with IDPs suggests that the effects of α-mannose on protein stiffening may not be exclusive to a specific protein fold.
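A PMF along the end-to-end distance can be estimated from sampled distances by Boltzmann inversion of their histogram, F(r) = −k_B T ln P(r). The sketch below is a minimal illustration under stated assumptions: the input is a plain list of distances (in nm) from the 300 K replica, the bin width is arbitrary, no Jacobian correction is applied, and the function name is hypothetical. It is not the analysis code used for Fig. 2.

```python
import math
from collections import Counter

K_B = 0.0083144626  # molar gas constant in kJ/(mol K)


def pmf_from_distances(distances_nm, bin_width_nm=0.1, temperature_k=300.0):
    """Boltzmann-invert a histogram of end-to-end distances.

    Returns a list of (bin_center_nm, free_energy_kj_per_mol) tuples,
    shifted so that the most populated bin has zero free energy.
    Empty bins are simply absent from the result.
    """
    counts = Counter(math.floor(d / bin_width_nm) for d in distances_nm)
    total = sum(counts.values())
    kt = K_B * temperature_k
    raw = {b: -kt * math.log(n / total) for b, n in counts.items()}
    lowest = min(raw.values())
    return [((b + 0.5) * bin_width_nm, raw[b] - lowest) for b in sorted(raw)]
```

A narrow profile with a single deep minimum at large distances, as described for L man , would show up here as a few populated bins with low free energy clustered around 3.0-3.5 nm.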
Fig. 1 The four linker models examined experimentally and computationally (left). Chair representations of α-mannose, α-galactose, and α-glucose are also depicted (right).
From these observations, we further hypothesized that the observed differences in linker extension are caused by local interactions with the C2-hydroxyl group (2-OH) adjacent to the glycan-peptide bond, which is equatorial in α-glucose and α-galactose and axial in α-mannose. Fig. 3A shows the average number of hydrogen bonds (HBs) between the protein and each of the carbohydrate hydroxyl groups computed from the T-REMD simulations. The HBs between the 2-OH group and the peptide contribute significantly to the higher total number of HBs in L gal and L glc . Compared to L man , this indicates that the equatorial configuration of 2-OH, the closest hydroxyl to the peptide chain, favors glycan-protein HBs. Next, we show that the orientation of the glycans relative to the peptide chain depends on the glycan chemistry and affects the conformational freedom of the glycosylated IDP. Fig. 3B shows the normalized distribution of the angle θ between the normal to the plane of the sugar ring and the vector formed by Cα and Cβ of the threonine residues to which the glycan is attached. Values near 180° and 90° correspond, respectively, to conformations in which the planes of the rings are nearly parallel and perpendicular to the direction of the peptide chain (Fig. 3C). The shoulder at ~90° observed for L gal and L glc indicates that the glycans are more frequently oriented perpendicularly to the peptide chain than in L man , and, therefore, exhibit smaller contact surface with the protein (Table S2 †). This effect is associated with the pronounced glycan-protein HBs involving the equatorial 2-OH in L gal and L glc . The normalized angle distributions computed for the subset of molecular frames in which these specific interactions are absent (Fig. 3B, dashed lines) lack the characteristic shoulder in the 80-100° range, demonstrating that the C2 stereochemistry impacts the glycan conformation.
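The angle θ described above can be computed per frame from a handful of atomic coordinates. The sketch below is one possible implementation under stated assumptions: the ring plane is defined by three ring atoms chosen by the user (the text does not say how the plane was fitted), the sign of the normal follows the order in which those atoms are given, and all function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def ring_orientation_angle(ring_a, ring_b, ring_c, c_alpha, c_beta):
    """Angle (degrees) between the sugar-ring normal and the Calpha->Cbeta vector.

    The normal is the cross product (ring_b - ring_a) x (ring_c - ring_a),
    so swapping two ring atoms maps theta to 180 - theta.
    """
    normal = _cross(_sub(ring_b, ring_a), _sub(ring_c, ring_a))
    bond = _sub(c_beta, c_alpha)
    cos_theta = _dot(normal, bond) / (_norm(normal) * _norm(bond))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

Collecting this angle over all frames and glycosylated threonines, then histogramming and normalizing, would give distributions of the kind shown in Fig. 3B.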
Taken together, the results presented thus far demonstrate that the 2-OH position affects glycan conformation and that protein dynamics differ depending on glycan chemistry. What remains to be explained is why α-mannosylation leads to more extended conformations and reduces protein flexibility. To this end, we examined how glycans affect the protein backbone conformational sampling at the residue level. Fig. 4A shows the Ramachandran plots for the L NG threonines, in which the protein backbone frequently visits all three major conformational regions. The R3 region corresponds to α-helix-like conformations, whereas R1 and R2 correspond to more extended conformations, such as those found in β-sheets and polyproline II structures. Although no persistent secondary structures were detected during the simulations, these results reflect the structural features of the linkers. We verified that attached glycans alter torsional sampling of the nearest amino acids, as seen elsewhere. [25,26] For L gal and L glc , the same three regions are populated as in L NG , except that the peak in the R3 region occurs only every other residue because of the excluded volume of neighboring glycans (Fig. 4C and D, S8 †). In contrast, the R2 region is predominantly favored in L man for all glycosylated residues, suggesting that the relative rigidity of the α-mannosylated linker results in part from a reduced local dihedral flexibility of the glycosylated residues imparted by α-mannosylation (Fig. 4B). We suggest that perpendicularly oriented glycan rings in L gal and L glc allow for improved accommodation of neighboring glycan rings, favoring more compact conformations. Conversely, the preferred orientation of α-mannose glycans hinders the mobility of the surrounding atoms in the peptide chain, thus revealing a direct relationship between glycan chemistry, orientation, and protein conformational freedom.
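Ramachandran plots such as those in Fig. 4 are built from the backbone torsions φ (atoms C of the previous residue, N, Cα, C) and ψ (atoms N, Cα, C, N of the next residue) of each residue in each frame. As a minimal sketch, the function below computes a signed dihedral angle from four Cartesian positions; the function names are hypothetical, and the boundaries of the R1, R2, and R3 regions are not given in the text, so no region assignment is attempted.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```

Calling this with the four φ atoms and the four ψ atoms of a glycosylated threonine in every frame yields the (φ, ψ) pairs that populate one Ramachandran panel.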
Variants decorated with O-mannobiosyl (L 2man ) or O-galactobiosyl (L 2gal ) were also simulated, as well as the linker with a putative natural decoration based on a previous experimental characterization (L man-h ) (Fig. S9 and S10 †). [10] Our analyses suggest that the length of the glycan only slightly changes the dynamics of the protein when the chemistry of the 2-OH groups in the immediately attached glycosyl unit is preserved, reinforcing its importance (Fig. S11 †).
Glycosylation pattern and protein primary sequence are correlated. Although less well studied, many secreted bacterial proteins are also O-glycosylated. [27] For example, the multienzyme cellulosome from Clostridium thermocellum exhibits O-glycans on its linkers. [28] Similarly, the thermostable enzyme CelA from Caldicellulosiruptor bescii has linkers of up to 70 amino acids rich in O-glycans. [29] However, unlike the typical O-mannosylated linkers from eukaryotic proteins, these linkers exhibit mostly O-galactosylation and are enriched in proline relative to eukaryotic IDRs. [30] Aiming to understand why O-mannosylation is not prevalent in bacterial IDPs and IDRs relative to their eukaryotic counterparts, we also studied a "PT linker", which comprises a proline-threonine repeat sequence and represents a fragment of glycosylated linkers found in bacterial cellulases. [20,28,29,31,32] PT linker models were uniformly decorated with α-mannose (L PT-man ), α-galactose (L PT-gal ), and α-glucose (L PT-glc ) (Fig. 5A).
It is well known that high proline content is generally found in disordered proteins [33] and favors extended conformations of IDRs. [34] Accordingly, the end-to-end distance PMF shows that the non-glycosylated PT linker favors extended conformations similarly to the glycosylated TrCel7A linker L man (Fig. 5B). Elongation and further stiffening of the linkers are observed upon glycosylation and are consistent with NMR spectroscopy data, [34] which demonstrated that glycosylation of PT linkers dampens the dynamics. Interestingly, in the PT linkers, varying the glycan chemistry does not affect the protein dynamics as strongly as in the eukaryotic linker cases. To understand this difference, we examined the correlation between protein dynamics and carbohydrate structuring proposed from the findings with the eukaryotic linker models. In the PT linkers, the presence of the equatorial 2-OH groups in galactosylated and glucosylated linkers neither increases the number of protein-glycan HBs compared to L PT-man nor favors perpendicular ring orientations, unlike in L gal and L glc (Fig. S12 †). Moreover, the Ramachandran plots of threonines are remarkably similar for the three glycosylated PT linkers (Fig. 6), and show the same preference for extended conformations as L man does (R2 region). Together, these results predict that the C2 hydroxyl stereochemistry is unlikely to impact proline-rich IDPs. That may result from the loss of one of the HB sites in the protein backbone, since the backbone nitrogen atom is part of the pyrrolidine ring of proline residues.
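The protein-glycan HB counts compared above depend on a geometric definition of a hydrogen bond that the text does not spell out. The sketch below assumes one commonly used criterion, a donor-acceptor distance of at most 0.35 nm combined with a hydrogen-donor-acceptor angle of at most 30°; both cutoffs and all function names are assumptions made for illustration, not the settings of this study.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance_nm=0.35, max_angle_deg=30.0):
    """Geometric hydrogen-bond test (cutoffs are assumed, not from the paper)."""
    da = _sub(acceptor, donor)
    dh = _sub(hydrogen, donor)
    distance = math.sqrt(_dot(da, da))
    if distance > max_distance_nm:
        return False
    cos_angle = _dot(da, dh) / (distance * math.sqrt(_dot(dh, dh)))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= max_angle_deg


def mean_hbonds_per_frame(frames):
    """Average HB count, where each frame is a list of
    (donor, hydrogen, acceptor) coordinate triplets in nm."""
    counts = [sum(is_hydrogen_bond(*triplet) for triplet in frame)
              for frame in frames]
    return sum(counts) / len(counts)
```

Restricting the triplets to those involving a given hydroxyl group (for example 2-OH) would give a per-hydroxyl average of the kind plotted in Fig. 3A.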
T-REMD simulations of glycosylated tripeptides GTG were also performed to evaluate the effects of 2-OH configuration on glycan orientation and interactions without the influence of neighboring glycans and amino acids. A single glycan, α-mannose, α-galactose or α-glucose, was O-linked to the central threonine in the models T man , T gal and T glc , respectively (Fig. S9 †). The parallel glycan-peptide backbone orientation is favored in the small model system with α-O-mannosylation, T man , relative to other glycans (Fig. S13 †). In the tripeptides T gal and T glc , the equatorial configuration of 2-OH in α-Gal and α-Glc favors HB interactions with the peptide, as in the L gal and L glc linkers. However, an excess of perpendicularly oriented glycans relative to L man is not observed for these tripeptides, indicating that the local HB interactions between 2-OH and the peptide are not the only factor affecting glycan conformation. Instead, these results indicate that the glycans in L gal and L glc are primarily perpendicularly oriented because of the excluded volumes of neighboring glycans and amino acid side chains, and that the 2-OH-peptide HBs stabilize this glycan conformation. Thus, our results with the small tripeptides suggest that the primary sequence and the distribution of glycosylated residues along the peptide chain are important factors for carbohydrate orientation in these systems.
In summary, experimental comparisons of glycosylated and non-glycosylated IDPs show that O-mannosylation enhances protection against proteolysis by two orders of magnitude relative to the non-glycosylated parent IDP, followed by O-galactosylation (10-fold improved stability). Our results suggest that the resistance to proteolysis is an important driving force for the natural selection of α-mannose as the main O-linked glycan motif decorating IDRs and IDPs in secreted eukaryotic proteins. Furthermore, these results demonstrate that the stereochemistry of C2 in the carbohydrate rings plays a key role in glycan orientation, which is correlated with protein flexibility and extension. Accordingly, the axial position of 2-OH in an α-mannose glycan is related to the observed higher rigidity and extension of the studied IDR. While associating protein elongation with resistance to proteolysis is perhaps counterintuitive, protein stiffening can explain the remarkably higher stability of the O-mannosylated linker. That is, although we have not investigated the interactions between a protease and IDPs, we conjecture, in the light of the present findings, that increasing the peptide rigidity impairs binding to the catalytic site of a protease. This hypothesis is reinforced by the observation of a similar trend of glycan chemistry impacting resistance to proteolysis of a structured protein and its thermal stability, which is often linked to protein stiffening. [16] Moreover, the effect of glycosylation on the average elongation of the studied IDR, as a protein linker, may be important to provide the optimum distance between the connected domains for protein function. Therefore, O-linked α-mannose exhibits the unique ability to both extend the IDR and protect it against proteolysis.
These results also suggest that the high content of proline residues, especially found in linkers from bacterial cellulases, avoids the need for α-mannose for increased protection against proteolysis. This hypothesis will be tested in future experimental studies. We further suggest that the glycosylation pattern in eukaryotic IDRs co-evolved with the primary sequence. That is, the lower content of proline residues in IDPs and IDRs from fungi compared to bacteria is compensated by O-linked α-mannosylation to guarantee optimal linker length, flexibility, and protection against proteolysis. Given the compelling alignment of experimental and computational results, we anticipate that our findings will be useful in the burgeoning field of glycoprotein engineering.

Conflicts of interest
There are no conicts to declare.