Computational study on the intramolecular self-organization of the macrorings of some ‘ giant ’ cyclodextrins ( CD n , n = 40 , 70 , 85 , 100 ) †

The conformations of some ‘giant’ cyclodextrins (CDn, n = 40, 70, 85, 100) were examined by molecular dynamic simulations using the Glycam06 force field. CD14 and CD26, the largest cyclodextrins, for which crystallographic data are available, were also studied as reference structures. Principal component analysis was used for the analyses of the simulation trajectories. In cases where band-flips were not present in the starting geometry (e.g. CD40), flips appeared later during the conformational search. The results for CDn (n = 14, 26, 40) confirmed an interesting observation for the distribution of band-flips along the perimeters of the macrorings, namely, band-flips separate portions of lengths of about six or twelve glucoses. This allows the formation of energetically favorable small loops of six–seven glucoses or the creation of short two-turns single helices that further enhance the stability of the structures. It was found that flip dihedrals define distributions of fragment lengths 12–6, 12–12, and 12–12–6 residues in the larger CDs (CD70, CD85, CD100). Contributions from 77% (CD40) to 88% (CD26) are from the first three highest-eigenvalue principal components, i.e., a limited number of modes determine the overall deformations of the macrorings. The flexibility of the macrorings increases, going from CD40 to the CDs, with higher degrees of polymerization. CD14 and CD26 present interesting cases – CD26 manifests domination of one deformation mode (ca. 72%), whereas CD14 demonstrates significantly higher flexibility. These results confirm our earlier conclusion, namely, LR-CDs may have more than one cavity. Thus they have the potential to accommodate more than one substrate molecule, as well as larger species by an ‘induced fit’ mechanism.


Introduction
The large-ring cyclodextrins (LR-CDs; CDs with a degree of polymerization (DP) higher than eight), [1][2][3][4][5] have attracted attention in recent years, and advances were marked in the study of their physicochemical properties, [6][7][8][9][10][11][12][13][14][15] in spite of existing difficulties in their synthesis, isolation and purification. The existence of LR-CDs has been proven by the isolation of these products from the action of 4-α-glucanotransferases on amylose and amylopectin, 2 almost a century after the first evidence for the existence of cyclodextrins. 16 The small cyclo-dextrins provide a heterogeneous microenvironment to molecules, accommodated in their cavities that acquire significantly modified properties compared to the free forms. Therefore, numerous well-known properties useful for the practical applications of small cyclodextrins were determined. The expectations were that some of the LR-CDs may have new, specific properties.
It took about twenty years after the first report of the existence of LR-CDs (DP = 9-14) 17 for the preparation method for LR-CDs mixtures to be worked out and δ-CD (DP = 9) 18 to be isolated and its crystal structure characterized. 19 This opened renewed interest in LR-CDs; within a very short time the crystal determinations of CD10 (ε-CD), 14,[20][21][22] CD14 (ι-CD), 20,21 and CD26 (ν-CD) [23][24][25] were reported. Important steps related to the development of an effective purification method for LR-CDs 4 were the chemical synthesis of a LR-CD -CD9, 26 as well as the synthesis of the first chemically modified LR-CD (CD9). 27 Experimental information about the conformations of LR-CDs is rare. Crystal structure determinations have been till date successful only for four of them. 28 The thermal and structural characterization of the small cyclodextrins have been † Electronic supplementary information (ESI) available: Animated eigenvectors, rms curves, moments of inertia, radius of gyration, variation of flip angles, fluctuations of atomic coordinates, averaged for each residue. See DOI: 10.1039/ c4ob02218a a Institute of Organic Chemistry with Centre of Phytochemistry, Bulgarian Academy of Sciences, ul. Acad. G. Bonchev, bloc 9, 1113 Sofia, Bulgaria. E-mail: ivanov@bas.bg b carried out 29 and references to CDs with more than 60 30 and several hundred 31 glucose units have been made. An advantage of the LR-CDs in comparison with the small CDs, particularly with α-CD and β-CD, is that they are without any significant toxicity and that nutritionally they can be regarded as starch. 2 Commercially available CD-mixtures containing LR-CDs with a degree of polymerization from 9 to 21 were examined as additives for food (retrogradation retardant in breads, for freeze resistant jellies and for production of non-sticky rice) and drink products (high energy additive to soft drinks) for improving their texture, mouth feeling, flavor, taste, and palatability. 6,11 Others were studied for the stabilization and solubilization of drugs. 32 LR-CDs mixtures with a degree of polymerization from 22 to 45 and greater than 50 exhibited an efficient artificial chaperone effect for protein refolding (a protein refolding kit containing a mixture of LR-CDs as one of the active components is in the market 3the first practical application of LR-CDs in biotechnology 33 ). Such LR-CDs mixtures were able to strip detergent molecules of unfolded protein-detergent complexes, thus allowing the protein molecules to refold to their proper, folded, active state. 2 LR-CDs have been suggested for applications in the paper industry as an improved paper coating material and as starch substitutes in adhesives and biodegradable plastics. 11 Because the large CDs are able to present a variety of cavity sizes, compared to the small CDs, they may be useful for special applications. δ-CD, for example, has demonstrated to form a stable complex with C 70 buckminsterfullerene that allows its solubilization in water. 34 It has been proven that η-CD (12 glucoses) is effective in the partial separation of carbon nanotubes. 35 Increased interest is evident from the previous studies on developing techniques for isolation of LR-CDs 36,37 and for examining the properties of these macromolecules as new polysaccharidebased biomaterials. 38 The conformations of some LR-CDs were systematically examined using molecular dynamics simulations as a conformational search protocol. [39][40][41][42][43][44][45][46][47][48][49][50][51] We started our studies by performing 5.0 ns simulations on several LR-CDs with degrees of polymerization in the range from 40 to 100. 39 The results were indicative for a variety of geometries with more than one small cavity. Now we can execute an order of magnitude longer simulations, attempting to arrive at more grounded conclusions about the intramolecular self-organization of the macrorings of such 'giant' cyclodextrins. CD14 and CD26 were also studied as reference structures using crystallographic coordinates for the input.

Computational details
The conformational search based on molecular dynamics simulations The MD simulation trajectories were obtained with the AMBER program (versions 10 52 and 11 53 ) using the parameterization for carbohydrates, Glycam06. 54 The same starting geometries as in our earlier study were adopted for CDn (n = 40, 70, 85, 100) ( Fig. S1 †), 39 whereas crystal coordinates were used for CD14 20 and CD26. 23 The molecular dynamics simulations were run in a simulated aqueous solution (a box with TIP4P 55 water molecules) using the particle mesh Ewald (PME) method [56][57][58][59] for the treatment of the long-range electrostatics. A 9.0 Å distance cutoff was used for direct space nonbonded calculations and a 1.0 × 10 −5 Ewald convergence tolerance for the inclusion of long-range electrostatic contributions. The 'solvateBox' command of LEaP was used to create a cubic solvent box around the CD with buffer distances of 15.0 Å between the walls of the box and the closest atoms of the solute. The dimensions of the periodic TIP4P cubic boxes and the number of water molecules were as follows: CD14 (56.8 Å; 5440), CD26 (60.2 Å; 6364), CD40 (75.9 Å; 13 094), CD70 (93.9 Å; 24 982), CD85 (98.5 Å; 28 846), CD100 (107.1 Å; 37 459). The SHAKE option (tolerance 5.0 × 10 −5 Å) was used for constraining bonds involving hydrogen atoms. The same protocol from our previous studies was used for the preparation of the systems for the simulation, as well as for the principal component analyses: 50,51 (i) 5000 steps steepest descent and 200 steps conjugate gradient minimization of the LR-CD in the gas phase, followed by 5000 steps steepest descent and 400 steps conjugate gradient minimization of the entire (LR-CD plus water molecules) completely unrestrained system; (ii) 50 000 steps steepest descent minimization with holding the solute fixed with positional restraints; (iii) 25.0 ps unrestrained MD were run at 100 K on the water alone with holding the LR-CD constrained (this is the stage of the equilibration process where the bulk of the water relaxation takes place); (iv) gradual release of the restraints on the LR-CD in a series of minimizations and MD steps: 1000 steps minimization and 3.0 ps MD with 25.0 kcal mol −1 Å position restraints, followed by five rounds of 600 steps minimization, reducing the positional restraints by 5.0 kcal mol −1 Å each run; (v) The process was completed with a short MD simulation (NVT) after heating the system gradually to 300 K. An additional 500.0 ps simulation (NPT) was executed before starting the productive runs. The productive runs were performed with the recommended maximum time-step of 2.0 fs when SHAKE is used, at 300 K, using Berendsen weak coupling algorithm for temperature regulation 60 and an average pressure of 1.0 bar with isotropic position scaling, using the Berendsen weak coupling algorithm for pressure regulations. 60 Samplings were taken every 2.0 ps. The duration of the MD conformational searches was 100.0 ns, except for CD14, in which case it was 50.0 ns. When examining the modes of conformational interconversions and the dynamics of the intramolecular self-organization, monitoring the appearance and the disappearance of band-flips is of particular importance (variation with the simulation time of the flip angle, O3(n)⋯C4(n)⋯C1(n + 1)⋯O2(n + 1)the dihedral flip between secondary hydroxyls of adjacent glucoses).

Principal component analysis
Principal component analysis (PCA), also called quasiharmonic analysis or essential dynamics method, 61-63 is a very useful analysis tool for MD trajectories. [64][65][66][67][68][69][70][71][72][73] This statistical method allows trajectories obtained from MD simulations to be analyzed by reducing the degrees of freedom of the system to lower dimensions. 67 In large complex systems this approach provides a useful method to extract the most important information pertaining to the essential physics of a biomolecular process such as protein folding or molecular recognition. As a result we can monitor the concerted motions of the atoms of the molecule in a few dimensions, making it easier to visualize and investigate these motions. 67,68 PCA is a linear transformation applied to the fluctuations in the Cartesian coordinates, represented as a matrixpositional covariance matrix, C, derived from the Cartesian snapshots collected along the MD simulation trajectory. C = (3N) −1 TT T , where the 3N rows of matrix T are the coordinates of the N atoms and the columns of T hold the data for the successive time points (snapshots) in the trajectory. T is processed in order to remove the global translation and rotation of the structure, and then the average structure is subtracted. A dihedral angle representation of the method has been also devised. 74,75 Diagonalization of C, to solve Λ = V T CV for an N atom system, provides a set of 3N orthogonal eigenvectors, v n , as columns of matrix V, as well as the corresponding eigenvalues, λ n , the diagonal elements of Λ. 65,73 The eigenvectors provide a vectorial representation of each mode of structural deformation, and the eigenvalue for a mode indicates the relative contribution that this mode has made to motion within the trajectory. 65 Ordering the eigenvalues of the transformation decreasingly, it can be determined how many principal components (usually only a few) capture the 'essential dynamics' of the system (defined as a given variance threshold: typically 90%-95% 73 ). An analysis based on the dominant PCA modes naturally poses the question about the convergence of the first few modes with the length of the simulation. 72,76,77 The duration of the simulation had to be long enough in order to extract by PCA longer time-scale motions. By this method, molecular dynamics trajectories can be explained satisfactorily in terms of a small number of variables (essential degrees of freedom). In addition, the residues whose contributions are important in the formation and stability of different structural elements can be identified by PCA. 78 The covariance matrix, whose diagonalization gives PCA, is important in describing the pathway of the dynamic process and the relation between the movements of different regions of the molecule during the process. 67 Several steps were executed in order to carry out the PCA analysis: (i) processing of the MD simulation trajectory file by removing the coordinates of the water molecules; (ii) preparation of the modified trajectory file for compression of the macroring atomic coordinates by superimposing all trajectory snapshots in a manner to obtain an average structure for the entire trajectory (in this way we effectively removed the overall translational and rotational motions of the cyclodextrin molecule 72 ); (iii) compression of the trajectory file; (iv) preparation of the covariance matrix, in which the atomic coordinates are the variables; (v) diagonalization of the covariance matrix for calculating the eigenvectors and the corresponding eigenvalues. The sum of all eigenvalues presents the total variance of the trajectory, whereas each eigenvalue contributes that part of the total variance, which is explained by the corresponding eigenvector. The predetermined desirable part of the variance of the starting trajectory can be preserved at the compression stage. In our analysis we used the default value, 90%, suggested for the compression quality in the PCAsuite software. 79 It means that 90% of the variance of the original trajectory is contained in the compressed trajectory.  (Fig. S12 †)).
All computed structural parameters have values in the range of the experimental structural determinations available (Table S1, ESI †). The computed average values for dihedrals ϕ and ψ fall slightly below the minimum values determined for CD26 from crystallographic determinations, whereas the computed values for angle O4(n)⋯O4(n + 1)⋯O4(n + 2) are slightly larger than the maximum values for CD26 determined experimentally. The other computed structural parameters closely resemble the results obtained with the parm99 AMBER parameterization. 39 The variations with time of two descriptors of the molecular size were also estimated: the moment of inertia (Fig. S9, ESI †) and the radius of gyration (also provides an absolute measure of compactness) (Fig. S10, ESI †). Both set of curves present similarities for each cyclodextrin. CD40 displays the most pro-  51 namely, band-flips separate portions of lengths about six or twelve glucoses. This allows the formation of energetically favorable small loops of six-seven glucoses or the creation of a short two-turns single helix that further enhances the stability of the structure. 50 The appearance of a new band-flip of dihedral angle 19 of CD40 at  The PCA analysis was carried out for each time interval with duration of 10.0 ns (Table 1 and S2, ESI †). The first six eigenvectors (having the highest eigenvalues and the lowest index numbers) describe more than 90% of the total variance in all cases. Contributions from 77% (CD40) to 88% (CD26) are from the first three highest-eigenvalue principal components, i.e., a limited number of modes determine the overall deformation of the macroring. The average percentage contributions are in the ranges from 45% to 72%, from 11% to 24%, and from 6% to 12%, for the first, the second and the third eigenvectors, respectively. If we assume the ratio of percentage contribution for the first animated eigenvector to the second animated eigenvector as an indication of the difference in flexibility of the macrocycles (lower ratio indicates higher flexibility), then the flexibility of the macrorings indeed increases, going from CD40 to the CDs, with higher degrees of polymerization. CD14 and CD26 present interesting cases -CD26 manifests domination of one deformation mode (ca. 72%, ev1), whereas CD14 demonstrates significantly higher flexibility. This is in accord with our conclusions made earlier for CD14. 44,48,50 Two loops situated in mutually perpendicular planes were clearly seen in the average structure of CD14 when the Glycam04 parameterization was used. 45 The eigenvector explaining the largest part for the variance of the trajectory of CD14 corresponded to a deformation, for which a conformation containing big and small loops, situated in perpendicular planes, assumes an intermediate symmetrical figureeight-looking geometry. 45 The present modeling studies characterize the same cyclodextrin as tending to keep the starting deformed boat-like form to a greater extent (Fig. 2), although figure-eight-like deformations are still visible from the examination of the computed average geometries and the deformation modes ( Fig. 2 and 4, and Fig. S2 and S5, ESI †). The glucoses with the maximum fluctuations of atomic coordinates, averaged for each residue, are situated diametrically along the perimeter of the macroring (Fig. S12, ESI †).
The first three animated eigenvectors of CD26, illustrating the most important deformation modes for conformational interconversions of the macroring, are given in Fig. 4 and S6, ESI. † The differences between the results obtained with the two Glycam parameterizations (Glycam04 44,46 and Glycam06 (this work)) manifest themselves from the very first stages of the conformational search for CD26 (Fig. S1, ESI, † e.g., the geometry after the first 500.0 ps). Two single helical turns at the opposite sides of the macroring are oriented inwards to the cavity of the macroring in this case (Fig. S3, ESI, † for the first 10.0 ns and the average for the entire trajectory) resulting in more compact optimized geometries. Accordingly, smaller values for the radius of gyration were estimated with Glycam06 (compare Fig. S10, ESI, † in this work with Fig. 3S † in ref. 46). Three glucose residues have the maximum fluctuations of atomic coordinates (i.e., maximum deformations). They define sequences of residues 3→11→22 and 4→15→22 for the first and for the second animated eigenvectors, respectively (Fig. S12, ESI †). The different orientation of the loops is the striking dissimilarity between the representative confor-mations of CD26, obtained with Glycam-04 44,46 and Glycam-06. This observation originates from the differences in the estimates for the relative energies of the flip conformations by the   a The ratio of the eigenvalues of the first three most significant eigenvectors with the explained variance in percent (weight). b 0.0 ns to 50.0 ns for CD14. two parameterizations, 80 and this also has impact on the energetics of the syn/flip conformational transitions. These results confirm our earlier conclusion, 39 namely, LR-CDs may have more than one cavity (Fig. 3). Thus, they have the potential to accommodate more than one substrate molecule, as well as larger species, such as nanotubes, by an 'induced fit' mechanism. 35 The starting geometries of CDn (n = 40, 70, 85, 100) contain spiral portions with different number of turns, big elongated loops and extended portions that close cavities of different sizes. CD40 displays the typical doubly wound extended single helical fragments with a loop of eight to nine glucoses at one of the ends. The first deformation mode (ev1) transforms the extended geometry into a bent structure (Fig. 5 (CD40_ev1-5) and Fig. S7, ESI †). The macrorings of the other LR-CDs present extended helical strands or helices of two or three turns forming several cavities. Three main fragments characterize the overall appearance of CD70: two spirals of two and three turns and an elongated portion of winded helical single chains. The principal deformations comprise stretching/compression of the elongated portion, at the expense of the size of the two short helixes, with concomitant variations of the mutual orientation of the two spiral portions, namely, the axes of the two spirals change from parallel to each other to perpendicular to each other (Fig. S7, ESI †). Two to three elongated portions with loops at the ends are formed along the perimeters of the largest CD85 and CD100 macrorings that may allow the accommodation of larger species in their cavities by an 'induced fit' mechanism ( Fig. S4 and S7, ESI †). The larger perimeter of the CD85 macroring allows deformations that produce a wider cavity of irregular shape. Two elongated parts of the CD100 macrocycle 'breath' between states with these portions parallel or perpendicular to each other (Fig. S7, ESI †).
Fluctuations of the atomic coordinate of the LR-CDs (in Å), averaged for each residue, are represented in Fig. S12, ESI. † These graphics contain valuable information about the fragments of the macrorings with the most significant contribution to the conformational deformations. The variations in the shape of these graphics correlate with the appearance and the disappearance of flip dihedral angles between neighbor glucose units (Fig. S11, ESI †). Two glucose residues, diametrically positioned along the perimeter of the CD40 macroring, possess the maximum fluctuations of atomic coordinates for the first animated eigenvector. Four glucoses undergo significant atomic fluctuations within the second and the third modes, and they are evenly distributed along the macroring (sequences 6→15→25→35 and 6→14→24→33 for the second and for the third eigenvectors, respectively (Fig. S12, ESI †)). Six residues of CD70 have the largest fluctuations of atomic coordinates for the first three animated eigenvectors, namely, 12→22→35→47→55→67 (ev1), 16→25→36→47→55→68 (ev2) and 12→25→33→44→53→67 (ev3). Larger numbers of residues may have significant fluctuations of atomic coordinates for the two largest cyclodextrins and they are evenly distributed along the perimeters of the macrorings, e.g., for the first animated eigenvectors, CD85 (3→13→25→35→45→52→63→71) and CD100 (12→28→40→46→56→73→82→90→100). In addition, as it can be deduced from Fig. 3, CD85 changes from a random coil at the beginning of the simulation to a nearly helical arrangement at the mid and finally loses part of the helicity at the end of the simulation. Something similar happens to CD100 that starts off to become a random coil and gradually changes to a more helical conformation.
It may be interesting to compare our computational results for the largest computed CDs (85 and 100 glucose units) with the already known behavior of amylose in aqueous solution, due to the similarity of the systems. Many models have been used to explain the conformational behavior of amylose and they can easily be classified into three different types: (i) a random coil, (ii) an interrupted helix with 10-15 helical turns per region, and (iii) a loosely wound helical chain. 81 However, hydrodynamic studies on amylose solutions, even though not totally consistent with one another, have shown that amylose behaves as a random coil. Interestingly, amylose is metastable in neutral aqueous solutions, indicating aggregation of amylose into large particles or changes in conformation to a more crystalline and compact structure. This observation was also found in DMSO as a solvent. 82 More recently, measurements of static and dynamic light scattering detected that amylose was structured as a random coil in freshly prepared alkaline and briny aqueous solutions. 83 However, after long storage (about 30 days) an increase in the scattered light polarization was observed, a fact which was attributed to a coil-tohelix transition followed by helix amylose crystallization. Spectroscopic studies ( 13 C-NMR) 84 on the amylose conformational behavior in binary DMSO-water mixtures indicate that amylose is in a helical conformation below 33% water but from this point on, the helix is progressively lost and when water is over 66% the transition to a random coil is complete. That work also demonstrates that when the water content is over 60%, amylose can form a complex with iodine or butanol, also showing either the capability of modifying the existing "cavities" or the existence of more than one different cavity.
Although our simulations are extremely short when compared with real experiments, in our opinion they already show this capability of amylose (or of very long glucose chains) to adopt different conformations varying from random coils to more organized helical structures ( Fig. 3 and 5). In addition, our previous finding (the existence of more than one cavity) is reinforced by the experiment described in the previous paragraph and it is probably very convenient to reformulate it as "LR-CDs are flexible enough as to adopt different conformations, each one presenting a different cavity size".

Conclusions
The conformations of some 'giant' cyclodextrins (CDn, n = 40, 70, 85, 100) were examined by molecular dynamic simulations using the Glycam06 force field. CD14 and CD26 were also studied as reference structures. Principal component analysis was used for the analyses of the simulation trajectories.
In cases where band-flips were not present in the starting geometry (e.g., CD40), flips appeared later during the conformational search. The results for CDn (n = 14, 26, 40) confirmed an interesting observation for the distribution of band-flips along the perimeters of the macrorings, namely, band-flips separate portions of lengths about six or twelve glucoses. This allows the formation of energetically favorable small loops of six-seven glucoses, also detected in molecular dynamics simulations of acyclic oligomers, 85,86 or the creation of short twoturns single helices that further enhance the stability of the structures. It was found that flip dihedrals define distributions of fragment lengths 12-6, 12-12, and 12-12-6 residues in the larger CDs (CD70, CD85, CD100). The different orientation of the loops is the striking dissimilarity between the representative conformations of CD26, compared with previous results with the Glycam-04 force field. This observation originates from the differences in the estimates for the relative energies of the flip conformations by the two parameterizations that may have impact on the energetics of the syn/flip conformational transitions.
Contributions to the total variance from 77% (CD40) to 88% (CD26) are from the first three highest-eigenvalue principal components, i.e., a limited number of modes determine the overall deformations of the macrorings. The flexibility of the macrorings increases going from CD40 to the CDs with higher degrees of polymerization. CD14 and CD26 present interesting casesone deformation mode dominates for CD26 (ca. 72%), whereas CD14 demonstrates significantly higher flexibility. These results confirm our earlier conclusion, namely, LR-CDs may have more than one cavity. Thus, they have the potential to accommodate more than one substrate molecule, as well as larger species by an 'induced fit' mechanism.
Our expectations are that we will witness advances in the near future in the experimental examination of some of these structures that may open the way for their further practical applications. We consider a further natural step to be the examination of inclusion complexes of LR-CDs. For the first time inclusion complexes of some lower-size LR-CDs were examined computationally, namely, complexes of CDn (n = 13, 14, 26) with adamantane, and they revealed interesting differences in the behavior of the three macrorings when a substrate molecule is included in the cavity. 51