Emily A. Verhaega,
Hiromichi Hoshina
*b,
Jun Kikuchi
c,
Luca Catalano
*de and
Michael T. Ruggiero
*ef
aDepartment of Chemistry, University of Vermont, 82 University Place, Burlington, Vermont 05405, USA
bRIKEN, Center for Advanced Photonics, 519-1399 Aramaki-Aoba, Aoba-ku, Sendai, Miyagi 9800845, Japan. E-mail: hoshina@riken.jp
cRIKEN, Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
dDynamic Molecular Materials Laboratory, Department of Life Sciences, University of Modena and Reggio Emilia, Modena 41125, Italy. E-mail: luca.catalano@unimore.it
eDepartment of Chemistry, University of Rochester, Rochester, NY 14627, USA. E-mail: michael.ruggiero@uvm.edu
fDepartment of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
First published on 25th June 2025
Cellulose is a common polymer found in natural sources, with the potential to be used in a wide variety of green and technologically relevant applications. Despite years of effort, the precise three-dimensional structures of two crystalline polymorphs of cellulose, Iα and Iβ, are currently still unknown due to the presence of disorder in the intermolecular hydrogen bond networks, hampering the in-depth understanding of the structure–property relationship of this crystalline material. Disorder in the hydrogen bond networks of cellulose I polymorphs was investigated using terahertz spectroscopy, powder X-ray diffraction, and solid-state density functional theory in order to reveal previously undiscovered atomic-level details about the crystal structures. We show that the nature of the (dis)order in the hydrogen-bonded layers has a direct effect on the terahertz vibrational spectra, providing contrast that allows differentiating between various structures spectroscopically. Comparison between theoretical and experimental data indicates that these two static networks coexist spatially throughout cellulose I polymorphs.
Cellulose is known to exist as many different bulk structures, including amorphous and crystalline forms. A large number of crystalline polymorphs have been reported, including celluloses I, II, III, and IV.1,8–10 These all contain polymeric chains with the cellobiose repeat unit, but differ in the way the chains pack in three dimensions. The rich structural diversity originates from the various combinations of intermolecular hydrogen bonds that are possible, which affect the bulk structures of the materials.11 Cellulose I, known as native cellulose, is the most common polymorph, and can be isolated from natural sources, including bacteria, algae, and plants.12
Determining the atomic-level crystal structure of native cellulose has been the subject of ongoing debate due to the ambiguous data collected from various cellulose I samples.1 While it was originally believed that all native celluloses crystallize in the same structure, recent studies identified two discrete crystal phases of cellulose I, commonly referred to as cellulose Iα and cellulose Iβ. While cellulose Iα is primarily found in bacteria and cellulose Iβ is primarily found in plants, they are often both present in the same sample.2,13,14 The crystal structures of cellulose Iα and cellulose Iβ were determined by Nishiyama et al. in 2003 and 2002, respectively.15,16 The difference between cellulose Iα and cellulose Iβ stems from a difference in the offset between layers. This slight difference is what led to the initial confusion when the cellulose I crystal structure was originally investigated.
The structure studies revealed that both polymorphs of cellulose I have disordered hydrogen bond networks that lead to two possible supramolecular motifs for each polymorph, referred to as A and B.15,16 Thus, despite years of work, the exact structures of cellulose I polymorphs remain unknown. While diffraction methods are the gold standard for structural characterization, alternative methods for determining crystal structures exist. One example is low-frequency vibrational spectroscopy, which excites large-amplitude vibrations of entire molecules at terahertz frequencies, effectively probing weak intermolecular forces in crystals.17,18 Due to the dependence of these weak forces on bulk structures,19 low-frequency vibrational spectroscopy has been used in numerous occasions to identify structures,20–25 specifically including the structure of biopolymers.26,27 Here, both the Iα and Iβ polymorphs of native cellulose were studied using terahertz spectroscopy and powder X-ray diffraction (PXRD). Modeling possible crystal structures and comparing the simulated data to experimental data provide insight into the accuracy of the proposed structures. This reveals previously unrevealed details about the atomic level organization of cellulose I.
Frequency calculations were performed on each optimized structure using the same level of theory, with the convergence set to a more strict threshold of ΔE ≤ 10−11 hartree. Vibrational modes (eigenvectors) and their frequencies (eigenvalues) were predicted numerically using the harmonic approximation,38,39 and IR intensities were determined through the Berry phase approach.40,41
The calculated frequency and intensity of each low-frequency (0–7.5 THz) IR active mode were used to generate a theoretical terahertz spectrum for each structure. Each vibrational mode was convolved using a Gaussian function, with the full width at half maximum (FWHM) of the bands set to 2.43 cm−1 for Iα and 2.64 cm−1 for Iβ, which were determined empirically based on the experimental peak widths. For the cellulose Iα frequency calculations, the experimental use of carbon-13 isotopes was taken into account by utilizing the larger mass of carbon when calculating the vibrational frequencies. We emphasize that the simulated terahertz spectra presented herein have not been subjected to any empirical frequency scaling factors.
Finally, PXRD patterns were predicted for each optimized geometry of cellulose I. This was done using the Mercury software.42 A wavelength of 1.54 Å (Cu Kα) was used to generate the theoretical patterns, and each peak was convolved using a symmetric pseudo–Voight peak shape, with the FWHM set to 0.7° for all Iα and Iβ structures.
For both polymorphs, the parallel polymer chains form hydrogen-bonded layers, with each disaccharide building unit, called cellobiose, participating in seven intralayer hydrogen bonds.15 This makes the attraction between co-planar chains notably strong, while the lack of interlayer hydrogen bonds in the stacking direction results in sheets that are weakly bound to adjacent layers, primarily through van der Waals forces. Fig. 1 presents the structures of two single layers extracted from cellulose Iβ. The organization of how these layers stack is what differentiates between the two crystal structures, with both shown in Fig. 1. Layers within cellulose Iα lie in the (0 1 1) Miller plane. Adjacent layers are offset by about a/4 along the direction of the chain. Layers within cellulose Iβ lie in the (0 0 1) and (0 0 2) planes. Alternating layers are offset by about b/4 along the direction of the chain.
In cellulose Iβ, the organization of the layers gives rise to an origin sheet and a center sheet. They lie along the origin ((0 0 1) plane) and center ((0 0 2) plane) of the unit cell, respectively, as shown in Fig. 1. While there is not a significant difference between the two sheets, there are slight differences that arise from the packing of the layers. The layers can pack together more tightly due to the strained conformation of the center chains.16 Additionally, cellulose Iβ contains more interlayer C–H⋯O bonds than cellulose Iα. In the experimentally determined Iα crystal,15 the interlayer distance is 3.95 Å, while in Iβ,16 the interlayer distance is 3.89 Å, indicating that the layers in Iβ are more strongly bound. This helps to account for the increased stability of the Iβ form compared to that of the Iα form.15
The X-ray and neutron diffraction determined structures indicated that some of the hydroxyl groups were disordered in both polymorphs.15,16 Because hydrogen bonds occur between each hydroxyl group and a nearby oxygen, the disorder in hydroxyl group orientations leads to disorder in the hydrogen bond network. Nishiyama et al. concluded that there must be two possibilities for the orientation of each hydroxyl group that exhibits disorder, corresponding to two possible networks of hydrogen bonds for each polymorph.15,16 The hydrogen bonds branch between adjacent polysaccharide chains, which means that the orientation of each hydroxyl group is dependent on the others in the same layer. It is important to note that the disorder is not dynamic, but arises from the presence of two individual conformations. The two networks are at such deep energetic minima that an interconversion between the two would require a large activation barrier to be overcome.44 Additionally, because the hydrogen bond networks extend through the entire layer, a conversion from one network to the other would require all hydroxyl groups to switch orientations simultaneously, which is extremely unlikely at room temperature.44 This leads to the conclusion that there are two hydrogen-bonded layers of cellulose I: an A layer and a B layer. The two sets of hydroxyl group orientations that are present in both polymorphs are shown in Fig. 1.
One question that remains unanswered is whether the two networks coexist throughout cellulose I crystals or if they exist exclusively within distinct domains. In order to investigate the precise atomic structure and hydrogen-bond ordering in cellulose I polymorphs, DFT simulations were initiated. Three variations of cellulose Iα were used – a structure with all hydroxyl groups in the A conformation (IαA), a structure with all hydroxyl groups in the B conformation (IαB), and a structure with alternating layers of A and B (IαAB). Because of the difference between the origin and center sheets in cellulose Iβ, there are four variations of the structure – one with all A layers (IβA), one with all B layers (IβB), one with A origin sheets and B center sheets (IβAB), and one with B origin sheets and A center sheets (IβBA). All layers in cellulose Iα are symmetrically equivalent, so IαBA is identical to IαAB, and therefore IαBA does not need to be considered. Initially, geometry optimizations for all seven crystals were performed. All unit cell side lengths and angles, as well as the errors between the PBE/VTZP optimized and experimentally determined parameters, are reported in Table 1. In order to directly compare the cellulose IαAB structure to the others, the lengths of the optimized a and b unit cell vectors were halved, as these dimensions were doubled prior to the optimization to allow for altering A and B layers. It is clear that the IαA, IβA, and IβAB structures show the best agreement with the experimentally determined unit cell parameters, with the errors in the unit cell dimensions averaging to less than one percent. This indicates that the B network results in an optimized unit cell that disagrees with the experimentally determined unit cell.
Dimension | Experiment | A | B | AB | BA | |
---|---|---|---|---|---|---|
Iα | a (Å) | 10.400 | 10.403 | 10.435 | 10.388 | |
b (Å) | 6.717 | 6.605 | 6.496 | 6.940 | ||
c (Å) | 5.962 | 5.911 | 6.099 | 5.791 | ||
α (°) | 80.37 | 80.99 | 85.13 | 81.28 | ||
β (°) | 118.08 | 117.45 | 121.66 | 115.74 | ||
γ (°) | 114.80 | 114.63 | 105.21 | 117.70 | ||
Average error | 0.665% | 3.869% | 1.994% | |||
Relative energy (kJ mol−1) | +2.078 | +42.749 | +21.154 | |||
Iβ | a (Å) | 8.201 | 8.149 | 8.562 | 8.117 | 8.155 |
b (Å) | 10.380 | 10.404 | 10.297 | 10.379 | 10.430 | |
c (Å) | 7.784 | 7.728 | 7.811 | 7.912 | 7.952 | |
β (°) | 96.55 | 96.34 | 100.24 | 96.14 | 89.13 | |
Average error | 0.447% | 2.341% | 0.776% | 2.720% | ||
Relative energy (kJ mol−1) | 0 | +50.907 | +28.053 | +60.134 |
In addition to comparing the unit cell parameters of the seven optimized structures, the bulk energy of each crystal is reported in Table 1. For each structure, the energy of one unit cell was divided by the number of cellobiose units contained in the unit cell. Because the energy of IβA was the lowest, its value was set to zero and the rest of the energies are reported relative to IβA. It can be seen that the A and AB layers of both Iα and Iβ are significantly lower in energy than the B and BA layers. For cellulose Iα, the structure including IαAB is more stable than consecutive layers of A and B at distinct locations. This is also true for cellulose Iβ, as long as the layers that take the B form are the center sheets of the crystal. The fact that the A hydrogen bond network is more energetically favorable than the B hydrogen bond network is consistent with previous results determined using other computational methods.44
To further evaluate the proposed structures of each polymorph, theoretical PXRD patterns of the optimized structures were generated and compared to experimental data. Fig. 2 presents the theoretical and experimental patterns of both polymorphs. By looking at the peaks around 2θ = 15°, it can be observed that the IαB, IβB, and IβBA patterns differ the most from the experimentally determined patterns, with those same structures being the least stable energetically of all seven studied structures. To quantify the accuracy of each predicted pattern, the three major reflections for each of the data sets are listed in Table 2 as well as the average error between the reflections from each proposed structure and the experiment. The structures that produce PXRD reflections with errors that average to be about one percent or less are IαA, IαAB, IβA, and IβAB. Overall, it is clear that the orientation of the hydroxyl groups has a drastic effect on the intermolecular forces – and by extension, the bulk structures.
Reflection | Experiment | A | B | AB | BA | |
---|---|---|---|---|---|---|
Iα | (0 1 0) | 14.5 | 14.7 | 14.2 | 14.4 | |
(0 0 1) | 16.9 | 16.9 | 17.1 | 17.0 | ||
(0 1 1) | 22.7 | 23.0 | 23.0 | 23.1 | ||
Average error | 0.900% | 1.525% | 1.014% | |||
Iβ | (1 0 −1) | 14.9 | 15.0 | 14.1 | 14.8 | 15.5 |
(1 0 1) | 16.6 | 16.7 | 16.9 | 16.5 | 15.7 | |
(0 0 2) | 22.9 | 23.1 | 23.1 | 22.6 | 22.3 | |
Average error | 0.716% | 2.683% | 0.861% | 4.023% |
Due to the relatively limited number of reflections inherent to these cellulose polymorphs, full-pattern refinement methods, such as Rietveld refinements, were not employed here, as they would not significantly enhance structural assignments beyond what is already confidently established by our combined experimental and theoretical analyses. And while XRD is effective for studying static structures, low-frequency vibrational spectroscopy is a powerful tool for investigating the interatomic forces in solids. Due to the dependence of terahertz motions on weak, long-range forces, it can be used to effectively probe weak intermolecular forces, and by extension to indirectly probe the structure, as each unique structure will have a unique spectral fingerprint.17,19,45
Fig. 3 shows the simulated spectra of each of the Iα structures overlaid with the experimental spectrum. It is clear that neither the IαA nor IαB spectra alone are able to account for all of the experimentally observed transitions. There is a peak in the experimental spectrum at about 4.7 THz that is not accurately predicted by either IαA or IαB, but is present in the IαAB spectrum. While it appears that the simulated mode at 4.9 THz in the IαB spectrum could correspond to the peak at 4.7 THz, the lattice motion associated with the eigenvector matches the motion of the mode at 5.0 THz in the IαA spectrum, and given that the simulated lattice is expanded with respect to the experimental parameters, it is likely that this mode is estimated to be red-shifted. By visualizing the motion associated with the 4.7 THz peak in the IαAB spectrum, it is clear that this mode is due to asymmetry in the layers of the crystal. The A and B layers are vibrating out of phase with each other, which is only possible because of the fact that they are nonequivalent. In the IαA and IαB crystals, all layers are symmetrically equivalent, so there is no potential for asymmetric motion, and thus this mode at 4.7 THz is only activated when the two different layers are adjacent to one another.
![]() | ||
Fig. 3 Simulated terahertz spectra for IαA, IαB, and IαAB. Each is overlaid with the experimental spectrum. |
Fig. 4 shows the simulated spectra of each cellulose Iβ structure, again with the experimental data. The peak in the experimental spectrum around 2.1 THz is not accurately predicted for the IβA or IβBA structures. While it is present for both the IβB and IβAB structures, the motion associated with this frequency is different for the two structures. Although there are different modes predicted in each structure, the concept of additional motion due to asymmetry in the layers does not apply here like it does in the Iα polymorph. Because of the alternating origin and center sheets in Iβ, adjacent layers are already asymmetric, so the concept of alternating A and B layers does not add asymmetry to the structure.
![]() | ||
Fig. 4 Simulated terahertz spectra for IβA, IβB, IβAB, and IβBA. Each is overlaid with the experimental spectrum. |
Similar to the vibrational mode at 2.1 THz, the region of 4.5–5 THz shows similarities between simulated IβB, IβAB, and IβBA spectra, but the associated motion is different due to the difference between structures. Finally, the range 5.75–6 THz only contains predicted peaks in the IβAB structure, which means that this motion is a direct result of the interaction between A origin sheets and B center sheets.
In order to learn more about the atomic level details of cellulose I polymorphs, observations from the data presented above are used to determine the organization of the hydrogen bond networks.
Nishiyama et al. determined that cellulose Iα contains 55% A layers and 45% B layers from neutron diffraction experiments.15 Evaluation of the three proposed structures via analysis of PXRD patterns showed that IαA and IαAB best match experiment, with errors of about one percent. The error of IαB is almost double that of IαAB. When it comes to the vibrational spectra, the mode at 4.7 THz is only predicted by IαAB, while IαB shows significant disagreement with the experiment.
In order to maintain the occupancy of the hydrogen atoms at the A positions at the determined 55%, there are multiple possibilities for the concentration of each proposed structure. Cellulose Iα could be composed of 55% IαA and 45% IαB, 10% IαA and 90% IαAB, or a number of other combinations that contain all three structures. Looking at the strong disagreement between the theoretical IαB data and experimental data – as well as the large energetic instability (see Table 1) – it is likely that this structure is actually not present on its own, which leaves the only possible combination to arrive at the reported concentration of layers determined previously to be 10% IαA and 90% IαAB.
These concentrations are used to calculate the weighted average of the simulated terahertz spectra and PXRD patterns, which are shown in Fig. 5. The averaged PXRD pattern shows good agreement with the experimental data. Almost all of the peaks in the experimental and theoretical terahertz spectrum are in agreement, with some minor disagreements in the 6–6.5 THz region. This indicates that the composition of 10% IαA and 90% IαAB is an accurate representation of cellulose Iα crystals, which implies that the A and B hydrogen bond networks coexist throughout the crystal rather than existing exclusively in distinct domains.
In cellulose Iβ, Nishiyama et al. determined that 70–80% of the layers have hydroxyl groups that are in the A orientation.16 Evaluation of the four cellulose Iβ geometries and PXRD patterns indicates that the IβA and IβAB structures are in agreement with experimentally determined values, with the error of both structures for both methods being below one percent. While the terahertz analysis is more qualitative, it can be seen that the spectrum is best predicted with the IβAB structure.
Because of the four possible structures, the possibilities for the composition are endless. In order to assist with determining the composition, the four simulated PXRD patterns were fit to the experimental pattern such that the root mean squared deviation (RMSD) between the experimental and weighted averages of simulated patterns was minimized. The significant difference between the predicted reflections in the four proposed Iβ structures made the fitting possible. This was not possible for the Iα structures because of the more similar angles of the reflections, which led to ambiguous results. The PXRD fitting resulted in concentrations of 42% IβA, 58% IβAB, and 0% IβB and IβBA. Because this leads to a composition of 71% A layers, which aligns with the previously reported occupancies,15 it was used to generate the weighted averages of both the terahertz spectra and PXRD patterns.
The averaged terahertz spectrum and PXRD pattern are shown in Fig. 6. Both are in agreement with experimental data. For the vibrational spectra, the weighted average of the simulated modes is in agreement with the experiment. The intensities of some of the modes, particularly those at 2.1, 4.6, and 7.0 THz, do not agree, but this minimal disagreement is not enough to cause major concern. These subtle intensity differences likely originate from minor structural variations or slight residual disorder uniquely affecting cellulose Iβ, but they do not significantly alter the overall conclusions. This indicates that the composition of 42% IβA and 58% IβAB accurately represents cellulose Iβ crystals, which means that the origin sheets take the A layer, while center sheets can take either the A hydrogen-bonded network or the B one. Similar to cellulose Iα, the two hydrogen bond networks coexist throughout Iβ crystals instead of existing solely at distinct locations.
With the combination of experimental and theoretical PXRD patterns and terahertz spectra, it has been shown that in both polymorphs of cellulose I, the A and B networks coexist throughout the crystal. The mismatch between layers results in the activation of additional vibrational modes, which appear in the simulated vibrational spectra. It can be concluded that the asymmetry that arises from nonequivalent hydrogen bonding in adjacent layers gives rise to these additional modes. When adjacent layers have the same hydrogen bonding network, these vibrational modes are symmetrical and therefore there is no change in the dipole moment, making them IR inactive. The asymmetry in the layers translates to asymmetry in the vibrational modes, making them IR active. These modes are present in the experimental spectra, so it can be concluded that in cellulose I crystals, both networks coexist throughout the sample. It is important to note that due to the relatively large particle size and high crystallinity of the cellulose samples studied here, the contributions from surfaces, grain boundaries, or other short-range defects to the observed spectroscopic signals are negligible, ensuring that the spectra predominantly reflect the bulk crystalline properties.
One explanation for the difference in abundance of each hydrogen bond network between polymorphs could stem from the Iα to Iβ phase transition. It has been shown that upon heating, cellulose Iα transitions to cellulose Iβ by translation of the layers along the direction of the chain axis.14,46 During this transition, the center sheets of cellulose Iβ adopt a slightly strained conformation, which allows the layers of the Iβ crystal to pack together more tightly, resulting in an overall stabilization of the crystal.16 This transition is shown in Fig. 7. It is possible that in order to further stabilize the crystal, the origin sheets of cellulose Iβ transition to the A hydrogen-bonded network. Since the center sheets are already more strained, the supramolecular motif of the hydrogen bonding network does not have as much of an effect on the bulk energy of the structure. This proposed conversion mechanism within Iβ from IβBA to IβA is also shown in Fig. 7. Our observations of the coexistence and specific ratios of hydrogen bond networks in cellulose suggest that previously reported energetic barriers for hydrogen bond interconversion might be influenced by local structural disorder. This represents an interesting avenue for future research.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5tc01544h |
This journal is © The Royal Society of Chemistry 2025 |