Tokio
Watanabe
a,
Hirokazu
Yagi
a,
Saeko
Yanaka
ab,
Takumi
Yamaguchi
ac and
Koichi
Kato
*ab
aFaculty and Graduate School of Pharmaceutical Sciences, Nagoya City University, 3-1 Tanabe-Dori, Mizuho-Ku, Nagoya, Aichi 467-8603, Japan. E-mail: kkato@excells.orion.ac.jp
bExploratory Research Center on Life and Living Systems (ExCELLS) and Institute for Molecular Science (IMS), National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji, Okazaki, 444-8787, Japan
cSchool of Materials Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi 923-1292, Japan
First published on 26th March 2021
Oligosaccharides play versatile roles in various biological systems but are difficult to characterize from a structural viewpoint due to their remarkable degrees of freedom in internal motion. Therefore, molecular dynamics simulations have been widely used to delineate the dynamic conformations of oligosaccharides. However, hardly any methods have thus far been available for the comprehensive characterization of simulation-derived conformational ensembles of oligosaccharides. In this research, we attempted to develop a non-linear multivariate analysis by employing a kernel method using two homologous high-mannose-type oligosaccharides composed of ten and eleven residues as model molecules. These oligosaccharides’ conformers derived from simulations were mapped into reproductive kernel Hilbert space with a positive definite function in which all required non-redundant variables for describing the oligosaccharide conformations can be treated in a non-biased manner. By applying Gaussian mixture model clustering, the oligosaccharide conformers were successfully classified by different funnels in the free-energy landscape, enabling a systematic comparison of conformational ensembles of the homologous oligosaccharides. The results shed light on the contributions of intraresidue conformational factors such as the hydroxyl group orientation and/or ring puckering state to their global conformational dynamics. Our methodology will open opportunities to explore oligosaccharides' conformational spaces, and more generally, molecules with high degrees of motional freedom.
Oligosaccharide conformations in solution have been experimentally characterized primarily by nuclear magnetic resonance (NMR) spectroscopy.6–8 However, since oligosaccharides generally possess considerable degrees of freedom in internal motion, their three-dimensional structures fluctuate dynamically and are difficult to delineate simply by experimental approaches. Therefore, dynamic conformations of oligosaccharides have been depicted by employing computational approaches typified by molecular dynamics (MD) simulations.9–12 In this context, we have developed a method to explore conformational spaces occupied by oligosaccharides in solution using replica-exchange MD simulation validated with paramagnetism-assisted NMR data.13–15 Our approach has provided the basis for comparison of conformational spaces among structurally homologous but functionally distinct oligosaccharides.
Traditionally, the comparison of oligosaccharide conformational ensembles has been performed based on Ramachandran-type plots for individual glycosidic linkages.15,16 This approach is useful for comparing oligosaccharides' local conformational features, which, however, are difficult to integrate into overall conformational differences. In contrast, the global conformations of oligosaccharides have been represented simply with end-to-end distances or collision cross sections, although these can provide no conformational details.13,17 Hence, methodological development is needed to highlight the features of dynamic conformational ensembles of oligosaccharides with the integration of local structural characters, including ring puckering and hydroxyl orientation as well as glycosidic linkage conformation. In this study, we attempted to address this issue by developing a non-linear multivariate analysis using a kernel method in conjunction with graph theoretical techniques.
Our test molecules are tri-antennary oligosaccharides composed of eight or nine mannose (Man) and two N-acetyl glucosamine (GlcNAc) residues, which operate as quality control tags of glycoproteins in the early secretory pathway (Fig. 1). These high-mannose-type deca- and undeca-saccharides have 91 and 100 degrees of freedom in internal motion, respectively (vide infra). The kernel method can deal with such multivariate systems because the cost of calculation is independent of the number of variables.18 Here we develop a kernel methodology in which all non-redundant variables needed to describe the oligosaccharide conformations can be treated equally (Fig. 2). First, using an appropriate positive definite kernel function that takes a complete set of the non-redundant variables, the conformational ensembles of oligosaccharides are mapped into reproductive kernel Hilbert space (RKHS). Second, a clustering method based on the Gaussian mixture model (GMM) is applied to the images in RKHS, which detects the numbers of clusters based on the Bayesian information criterion (BIC).19,20 Through the clustering process, the images are classified by different funnels (corresponding to individual clusters) in the free-energy landscape to which the preimage conformers belong. Finally, each cluster is characterized by a statistical approach to reveal the unique conformational features of the target oligosaccharides.
In this paper, we denote the conformational ensemble derived from the MD simulation trajectories as X = (x1,x2,…xn). X consists of N conformers constituted from snapshots xi (i = 1, 2,…,N), and each xi represents a single f-dimensional instance. Note that f is the degree of freedom of the conformation of the molecule.
In the description of the oligosaccharide, one possible set of all non-redundant variables is composed of Cremer–Pople variables (Q,φ,θ), glycosidic linkage dihedrals (ϕ,ψ,ω) and dihedrals, which define the orientation of the hydroxyl group and the acetyl group (designated as γ in this paper).
Thus, vector τi corresponding to xi is expressed by the following equation:
Here, θ, Q and φ denote 1 × p vectors whose elements are Cremer–Pople torsion parameters,25 total puckering amplitude, and generalized spherical coordinate variables, respectively. ϕ and ψ are 1 × q vectors whose elements are values of dihedrals, which define the glycosidic linkage conformations as O5–C1–O–CX′ and C1–O–CX′–CX−1′, respectively. ω are 1 × r vectors whose elements are values of dihedrals defined as O–C6′–C5′–C4′. γ are the 1 × s vectors whose elements are dihedrals defined as Cx−1–Cx–Ox–HxO (γx(x=2,3,4,6)), C4–C5–C6–O6 (γ5), C1–C2–N2–C2N (γN) or C2–N2–C2N–O2N (γo).
p, q, and r are the numbers of 6-membered rings, glycosidic linkages, and α1-6 glycosidic linkages in the oligosaccharide, respectively. s is the number of γ dihedrals. The actual values of p, q, r, and s are 11, 10, 2, and 45 for M9 and 10, 9, 2, and 41 for M8B, respectively. Therefore, the degree of freedom is shown as f = 3p+ 2q + r + s, i.e., 100 and 91 for M9 and M8B, respectively.
The parameters σθ, σφ, σϕ, σψ, σω, and σγ are vectors of parameters for “scaling” each variable and are chosen so that the contribution of all variables to “similarity” shall be equal. Their values were estimated to be the median value of |vl − vm| for all l and m (l,m = 1,2,…,N) where vl denotes a single component of τl.
The kernel function k used in this paper is expressed by:
k(x,x′) = 〈![]() ![]() |
![]() |
G = (k(xi,xj))i,j=1,2,…,N, |
A principal component analysis was performed for , which initially has N-dimensional space. We reduced its dimension to the least number of dimension d* whose cumulative variance ratio is over 0.90 to obtain an RKHS, H*. We denote ξi as an element of H* corresponding to an element of preimage xi throughout this paper. To characterize the conformational ensembles of each oligosaccharide, a mapping from the ensemble X of M9 or M8B into H* was performed. For comparison of conformational ensembles between M9 and M8B, the ensemble data of their common structures, i.e., the whole M8B structure and the partial M9 structure without the ManD2 residue (designated as M9*), were projected into the same feature space.
fc = −kBT![]() ![]() |
Furthermore, empirical distribution in RKHS also reflects the empirical distribution in conformational space because a consistent estimate of the density function value at the fixed point x0 in X can be expressed as28
Thus, we may say that local maxima in the empirical distribution in RKHS represent its corresponding local minima in the conformational free-energy landscape.
The GMM clustering was performed for the above mentioned RKHS, to explore the conformational free-energy landscape. In this model, data points are assumed to be generated from a mixture of C component Gaussian distribution, in which p(ξi) represents the density function of c-th component, where θc is the parameter which is estimated for cluster c (c = 1, 2,…,C). The probability density is
The models are estimated by EM algorithm initialized by hierarchical, ellipsoidal, varying volume, shape, and orientation model-based agglomerative clustering.29 The optimal model is then selected according to the BIC shown as
BIC = b![]() ![]() ![]() ![]() |
Then MMD between two distributions P and Q is shown as
MMD(P,Q) = ||μP − μQ||H, |
By the GMM clustering mentioned above, the images in H* are classified by the funnels in the free-energy landscape to which the corresponding conformers belong. Hence, each clusters’ estimate of the mean point in RKHS corresponds to the embedding of the empirical distribution on conformers in each free-energy funnel.
Now we examine whether a cluster of M9* has its “equivalent” among those of M8B (or vice versa) by calculating the Euclidean distance between their mean points. The estimated mean points of the clusters of M9* and those of M8B are used in the calculation. When one mean point of a cluster of M9* is always greater than a threshold away from the mean points of M8B clusters, the cluster was judged as an M9* characteristic cluster containing M9* characteristic conformers, and vice versa. We set the threshold to the MMD value between the ensembles of M9* and M8B, to 0.100.
The null hypothesis of the Kruskal–Wallis test is that the dependent variable's distribution is the same in the different populations that are to be compared with one another.31 Let k (k ≥ 2) denote the number of populations to be compared. The Kruskal–Wallis test starts by substituting the rank in the overall data set for each measurement value. The sum of the ranks Ri is calculated for each group i (i = 1,2,…,k) of size ni, then the test statistic H is calculated as follows:
We performed a mapping from the ensemble X of M9 or M8B into RKHS with the kernel function k, which considers all information of the global conformation of oligosaccharides equally (Fig. S2, ESI†). The values of parameters in the kernel function were estimated as listed in Table S1 (ESI†). To reduce the computational cost, we performed principal component analysis on
. Consequently, the obtained RKHS H* (Fig. S3, ESI†) for M9 and M8B with the least number of dimension (d*) of 4 and 2, respectively. To characterize the conformational ensemble X in the feature space H* in connection to the conformational free-energy landscape, clustering with GMM was performed on the images ξi(∈H*) mapped from the ensemble of M9 or M8B. The scatterplots in H* are shown in Fig. 3A and B. Twenty-one and five clusters were identified for M9 and M8B, respectively. The population of each cluster is shown in Fig. 3C and D. The relationship between the number of clusters and negative BIC is shown in Fig. S4 (ESI†). Because the mixing proportions of the components are proportional to each cluster's population, they represent the depth of the funnels in the free-energy landscape. These results suggested that M9 and M8B have 21 and 5 meta-stable conformational clusters, respectively.
By the Kruskal–Wallis test, we revealed the variables characterizing each cluster. The characteristic variables differ from one cluster to another and their locations in the 3D structure of the oligosaccharides are not always close to each other (Fig. 4). The results implied that not only glycosidic linkage dynamics but also intraresidue conformational factors, including the orientation of each hydroxyl group and the ring puckering state, make non-negligible contributions to the dynamics of the global conformations of oligosaccharides.
![]() | ||
Fig. 4 The result of the Kruskal–Wallis test to reveal the variables characterizing each cluster of (A) M9 or (B) M8B. The cells (p < 0.01) are colored in navy blue. |
For a more intuitive visualization of the difference among conformational spaces occupied by each cluster, we made scatter plot matrices using end-to-end distances between the reducing-terminal GlcNAc1 residue and each outer mannose residue of M9 and M8B (Fig. 5). In our analysis, 10 clusters of M9 (No. 1, 2, 3, 4, 5, 8, 12, 13, 19, and 20), which occupy approximately 70% in total, commonly exhibited extended structures with the GlcNAc1–ManD2 distance at ∼18 Å and the GlcNAc1–ManD3 distance at ∼23 Å (Fig. S5, ESI†). This is consistent with the previous studies showing that M9 mainly forms extended structures as stable conformers.33–35 However, it should be noted that these clusters differ in some orientation of hydroxyl groups, e.g., γ5 in Man4, or ring puckering states, e.g., φ in GlcNAc1 (Fig. S6, ESI†). On the other hand, the unique clusters commonly exhibiting the GlcNAc1–ManA distance at ∼8 Å (e.g., No. 6, 7, and 17), which are therefore characterized as the fold-back conformations, could be unambiguously extracted from the M9 ensembles. Again, these clusters are different in the hydroxyl group orientations and/or ring puckering.
This implies that our methodology could successfully classify the conformers with the previously known features, taking account of implicit information, i.e. alteration in the hydroxyl group orientation and/or ring puckering state. They may be coupled with rearrangements of the hydrogen bond networks of residues associated with global conformational dynamics.
Clustering with GMM was performed on ξi ∈ H* to gain 19 and 13 clusters for M9* and M8B, respectively (Fig. 6). The population of each cluster is shown in Fig. 6C and D. The scatterplot in H* colored by the clusters is shown in Fig. 6A and B. The relationship between the number of clusters and negative BIC is shown in Fig. S8 (ESI†). Among these clusters, we identified the conformationally distinct clusters between M9* and M8B by the MMD analysis (Fig. S9, ESI†). We found that the eleven (No. 4, 6, 7, 8, 12, 14, 15, 16, 17, 18 and 19) and six (No. 3, 4, 9, 10, 11, and 13) clusters are distinctive for M9* and M8B, respectively (Fig. S10, ESI†).
These distinctive clusters shed light on significant differences in the distribution of end-to-end distance (Fig. S11, ESI†), demonstrating our methodology's utility, which enables unambiguous classification of M9*- and M8B-characteristic conformers. The M9* ensembles exhibited the second maximum in the GlcNAc1–ManA distance distribution at ∼8 Å, while the M8B ensemble has no prominent secondary peak but exhibits a trailing for shorter distances.
The most major cluster among the M9*-characteristic clusters was cluster No. 4 (Fig. 7A), in which ManD2 is in close spatial proximity to GlcNAc1 with characteristic distributions such as ψ between ManA and Man4′ and γ6 in GlcNAc2 (Fig. 7C). The conformers in this cluster are frequently stabilized through the hydrogen bonding network involving ManD2 (O2) and GlcNAc2 (H6O–O6) (Fig. 7E). This hydrogen bond is seen in 50% of conformers in the cluster while only 8% in the whole ensemble of M9*. On the other hand, the M8B ensemble is primarily characterized by cluster No. 3 (Fig. 7B). This cluster exhibits characteristic distributions of hydroxyl group orientations as exemplified by γ3 and γ5 in GlcNAc2 and glycosidic linkage dihedrals exemplified by ψ between Man3 and GlcNAc2 (Fig. 7D), suggesting that rearrangements of the hydrogen bonding network are coupled with the formation of this M8B-characteristic cluster. Indeed, the hydrogen bond between ManA (O2) and GlcNAc2 (H6O–O6) can be seen more than 10 times more frequently among the conformers in this cluster than in the whole ensemble (5.2% in M8B cluster No. 3 and 0.5% in the whole ensemble). Since M9 possesses the ManD2 residue, which is linked to the C2–OH of ManA, it precludes the formation of its hydrogen bonding (Fig. 7F), rendering this cluster-specific for M8B.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0cp06448c |
This journal is © the Owner Societies 2021 |