A crystal structure prediction enigma solved: the gallic acid monohydrate system - surprises at 10 K

The seemingly unpredictable structure of gallic acid monohydrate form IV has been investigated using accurate X-ray diffraction measurements at temperatures of 10 and 123 K. The measurements demonstrate that the structure is commensurately modulated at 10 K and disordered at higher temperatures. Aided by charge-density modeling and periodic DFT calculations, we show that the disorder gives a substantial stabilization of the structure.

Crystal structure prediction (CSP) of small organic molecular systems has become very successful over the last decade [1-4], mainly due to advances in the theoretical methods [2], such as the use of tailor-made force fields and periodic DFT calculations [5]. This success has important implications for understanding and controlling the organic solid state, and thereby the design and production of many materials, such as drugs, foods and explosives. However, one system has been shown to be unpredictable in the crystal structure prediction blind tests, namely the polymorphs of gallic acid monohydrate (GAM, Fig. 1). A polymorph of GAM [6,7] was the target of structure prediction in the fifth blind test [1] (CSP target XXI, CSD reference codes KONTIQ03 [7] and KONTIQ05 [6]), yet it was the only system that could not be predicted within the three most likely structures by any of the fourteen participating research groups [1]. Subsequent thorough experimental screening for polymorphs revealed a fifth polymorph and a wealth of solvates [8], yet it did not resolve the enigma: the system consists of rigid molecules of light elements, and therefore normally falls within the range of systems that should be easy to predict by contemporary methods.
However, this is a two-component system, which is still considered challenging.
Is GAM unpredictable because the crystal energy is not determined with sufficient accuracy, as claimed in a recent work [9] on benzene polymorphs? Are hydrates and other co-crystals too complicated for contemporary crystal structure prediction methods [10]? Are the structures not the thermodynamically most stable forms, but simply the kinetically most favored [11]? To answer such questions we have carefully considered the experimental evidence for the reported polymorphic structures.
One major problem in crystal structure prediction in general, and for GAM in particular, is that the results are compared with room temperature single crystal X-ray diffraction data [1].§ Certainly, the ability to predict crystal forms under ambient conditions is important, yet contemporary methods for crystal structure prediction rely on the assessment of the most thermodynamically stable structure in terms of the structure with the lowest potential energy (a few exceptions exist, see e.g. the fourth [3] and sixth [12] blind tests).
This means that little, if any, care is taken to include thermal effects, such as vibrational enthalpy and entropy, which are necessary to obtain the true thermodynamic stability, given by the Gibbs free energy, G = H - TS.
Moreover, as we demonstrate in the present work, room temperature structures may mask important information regarding alternative molecular conformations in the crystal.
Therefore, because the results of CSP neglect thermal motion and disorder, they should be compared with accurate structures from low-temperature measurements [13]. Subsequently, high-temperature structures should be used in comparison with CSP methods that take entropy into account.
To shed light on the enigmatic GAM system, we have investigated the CSP blind-test target GAM XXI using accurate single-crystal X-ray diffraction measurements at temperatures of 10 and 123 K.
Surprisingly, and in contrast to our measurements at room temperature and 123 K, which confirmed the unit cell dimensions found in previous works [6-8], a supercell with three formula units in the asymmetric unit (Z′ = 3) was observed at 10 K. Crystallographic and experimental details are presented in Table S1 in the ESI.† To understand how this structure is related to the high-temperature phase, layers of the reciprocal lattice were reconstructed based on the full data collections. The h1l planes of the 10 K and 123 K data are compared in Fig. 2a and b. Furthermore, additional peaks larger than two standard uncertainties were harvested in the 10 K diffraction images and visualized in a coordinate system corresponding to the Z′ = 1 reciprocal cell. A plot showing these harvested spots is given in the ESI.† Both plots clearly indicate that a larger unit cell is present; however, the additional Bragg peaks are weak, indicating that the 10 K structure is a modulation of the high-temperature structure.
It is possible to refine a Z′ = 1 basic structure against the 10 K data, at the expense of omitting the satellite reflections corresponding to the larger cell. The final agreement with the experimental data is quite similar for the Z′ = 3 and Z′ = 1 models; however, there are larger peaks in the residual density of the Z′ = 1 structure, indicating a substantial amount of disorder (see Fig. 3).
The changes in the diffraction signal as a function of temperature indicate a phase transformation. Should the structures observed at 123 K and 10 K be considered different phases? It is evident that the diffraction pattern observed at 10 K indicates a structural correlation, a modulation, that is not present at 123 K. The peak profile of the satellite reflections is similar to the profile of the main reflections. This indicates that the peaks correspond to long-range ordering, and that the structure is a new phase, although closely related to the high-temperature phase.
However, upon close inspection of the residual density after an aspherical atom model refinement against the high-resolution 123 K data, the residual density shows signs of disorder. We tested whether a model including anharmonic motion could describe these features, but this was not possible. Rather, these features correspond to the observed differences in positions and torsional angles between the three symmetry-independent molecules (Fig. 3b, 5 and 6) observed at 10 K. This suggests that the molecules at 123 K are distributed stochastically between the three positions observed at 10 K. As the temperature is reduced, the differences in intermolecular interactions among the molecules become significant compared to the thermal energy, and we observe a packing correlation of molecules in the different positions and conformations.
Apart from this displacive modulation, the low-temperature experiments also reveal new information regarding the positions of hydrogen atoms in the structure.
The correct positions of the hydrogen atoms in GAM were difficult to predict in the CSP blind test, because several alternative hydrogen bond patterns could be proposed for the same overall structure: a ring of hydrogen bonds between two gallic acid molecules and two water molecules can be reversed (see Fig. 4). Our single crystal X-ray diffraction measurements at 10 K and 123 K exhibit maxima of residual density between the oxygen atoms of the p-hydroxyl group and the water molecules, at a distance different from what can be expected for lone-pair densities of the oxygen atoms. This indicates the presence of static disorder corresponding to two different orientations of the hydrogen atoms. This alternative orientation appears to be present in roughly 20 percent of the crystal, as indicated by refinement of the occupation of the alternative H positions (see Fig. 4).
Fig. 3 The residual density of the Z′ = 1 (a) and Z′ = 3 (b) models.
What are the implications of these structural features for the prediction of the GAM system?
To investigate whether the Z′ = 3 structure would change the ranking of structures, we computed the energy difference between the structures with Z′ = 1 and Z′ = 3. Periodic dispersion-corrected DFT calculations of the total electronic energy and the cohesive energy [15] (the electronic energy difference between the crystal and the isolated molecule in the crystal conformation) reveal that the Z′ = 3 structure is more stable in terms of both total electronic and cohesive energies. As expected, the differences are very small: the total energy difference is 0.6 kJ mol⁻¹ per molecule, whereas for the cohesive energy it is 0.3 kJ mol⁻¹ per molecule. The stabilization is thus a consequence of both packing and conformational relaxation. Such energy differences can change the ranking of structures after CSP [17-20]. To shed further light on the stability of the Z′ = 3 structure, we performed a full geometry optimization of coordinates and cell parameters, starting from the Z′ = 3 structure as well as from the Z′ = 1 structure in the supercell corresponding to Z′ = 3. In both cases we obtained a structure closely resembling the Z′ = 3 structure. This result confirms the Z′ = 3 structure, and indicates that this structure could be predicted if present CSP methods were adapted to look for all local energy minima.
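One way to read the two energy differences quoted above is as a split into a packing term and a conformational term. This is only a sketch: it assumes that the cohesive energy per molecule is the crystal energy per molecule minus the energy of the isolated molecule in its crystal conformation (as in the definition given in the text), and that both differences favor the Z′ = 3 structure. The symbols below are illustrative and do not appear in the original.

```latex
% Differences are taken between the Z' = 1 and Z' = 3 structures, per molecule.
% E_total = E_coh + E_mol, hence:
\Delta E_{\mathrm{total}} = \Delta E_{\mathrm{coh}} + \Delta E_{\mathrm{mol}}
\quad\Longrightarrow\quad
\Delta E_{\mathrm{mol}} \approx (0.6 - 0.3)~\mathrm{kJ\,mol^{-1}} = 0.3~\mathrm{kJ\,mol^{-1}}
```

Under these assumptions, roughly half of the stabilization would stem from packing (the cohesive term) and half from relaxation of the molecular conformation.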
However, at high temperatures the structure is not modulated, and it is therefore of relevance in the present context to consider the entropy due to the disorder of the hydrogen bond pattern and to the modulation observed at 10 K, which appears as disorder at 123 K. An estimate of these contributions can be obtained by calculating the entropy of mixing.
Considering the disorder related to the modulation, the 10 K structure suggests three local minima on the potential energy surface, which at higher temperatures can be populated by the gallic acid molecules. We make the assumption of three minima based on the fact that we have three independent molecules in the 10 K structure, and that at higher temperatures, disorder is present at the corresponding positions in the Z′ = 1 structure. Further computational work to test this hypothesis is needed.
Assuming an equal and random distribution among these sites gives an entropic contribution of 9.1 J mol⁻¹ K⁻¹, a huge difference in entropy, which corresponds to a stabilization of 2.7 kJ mol⁻¹ at room temperature. This must be considered an upper bound of the entropy of mixing: although the residual density indicates disorder, it hardly corresponds to an equal distribution among the three sites. To obtain an estimate of the distribution, we performed a constrained refinement of three partly overlapping structures stemming from the Z′ = 3 geometry. For each structure an overall occupancy was refined, and contributions of 20, 42 and 38 percent, respectively, were observed. Based on these occupancies, the entropy of mixing is 8.8 J mol⁻¹ K⁻¹, or a stabilizing factor of 2.6 kJ mol⁻¹ at room temperature.
Fig. 4 The residual density (10 K data) in the plane of the intermolecular hydrogen bond network involving water. The contour map was drawn using contouring levels of 0.07 e Å⁻³. The peaks between hydroxyl oxygen and water have maxima of 0.42 e Å⁻³, at a distance of 0.732 Å from oxygen.
Fig. 5 The residual density (isosurface level 0.2 e Å⁻³), showing the differences in electron density based on the observed data and a refined model using aspherical atomic scattering factors [21,22]. Residual density corresponding to the alternative molecular conformations is clearly visible near the carboxyl oxygen atom and on the hydroxyl groups at meta positions.
Similar considerations can be applied to the entropy of the disorder associated with the hydrogen bond patterns. A random 20:80 percent distribution gives an entropic stabilization of 4 J mol⁻¹ K⁻¹, corresponding to 1.2 kJ mol⁻¹ at room temperature.
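The entropy-of-mixing estimates above can be checked with the ideal-mixing expression S = -R Σ p ln p. The following is a minimal sketch, assuming ideal (uncorrelated) mixing over the sites and a room temperature of 298.15 K, which the text does not state explicitly; the function names are illustrative only.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
T_ROOM = 298.15   # assumed room temperature in K (not stated in the text)


def mixing_entropy(occupancies):
    """Ideal entropy of mixing, S = -R * sum(p * ln p), in J mol^-1 K^-1."""
    total = sum(occupancies)
    probs = [p / total for p in occupancies]  # normalize to fractions
    return -R * sum(p * math.log(p) for p in probs if p > 0)


def stabilization_kj(entropy, temperature=T_ROOM):
    """Entropic stabilization T*S in kJ mol^-1."""
    return temperature * entropy / 1000.0


cases = {
    "three equal sites (upper bound)": [1, 1, 1],
    "refined occupancies 20/42/38 %": [20, 42, 38],
    "hydrogen-bond disorder 20/80 %": [20, 80],
}
for label, occ in cases.items():
    s = mixing_entropy(occ)
    print(f"{label}: S = {s:.1f} J mol^-1 K^-1, "
          f"T*S = {stabilization_kj(s):.1f} kJ mol^-1")
```

With these assumptions the three cases agree with the quoted values after rounding. Treating the two kinds of disorder as independent, their contributions would simply add.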
In conclusion, by using accurate X-ray diffraction measurements at very low temperatures we have demonstrated that the CSP target structure of GAM is in fact a very complex system. At 10 K, the structure is commensurately modulated with respect to the room temperature one.
Even using this model there is still disorder, which is not described. At higher temperatures, the modulation is no longer present, but the variable molecular positions and molecular geometry are retained to some extent, giving rise to disorder, as evidenced by the residual density. Of further interest is the clear demonstration that some H atoms are disordered over two sites, giving rise to two different hydrogen bond patterns.
The disorder present at 123 K gives rise to a considerable entropic stabilization of the structure, corresponding to more than 3 kJ mol⁻¹ at room temperature. This large stabilization, which is not considered by current crystal structure prediction methods, is sufficient to alter the energy landscape significantly, and may account for the difficulties associated with predicting polymorph IV as a stable structure.
We hope that these results will open the eyes of the crystallography community to the, in our opinion, absolute necessity of using very accurate structures, collected at the lowest possible temperature, for the understanding of complex phenomena in molecular crystalline solids and for tasks such as the development of CSP methods. Such accurate structures provide the best possible platform for the assessment of crystal energies. Only then may the subsequent incorporation of thermal effects, such as enthalpies and entropies of vibration, provide sufficient information to predict the structures and properties at ambient temperature. It is not worthwhile predicting ghost structures!
We acknowledge the Villum Foundation and the Carlsberg Foundation for financial support. Calculations were carried out using resources provided by the Wrocław Centre for Networking and Supercomputing (Grant 115).
Notes and references
§ The structure used for the CSP blind test is reported (ref. 1) to be collected at 100 K. We believe this is a typo, since the related submission in the CSD (ref. 6) reports room temperature measurements, and the volume is similar to the other room temperature structure found in the literature (ref. 7).

Fig. 2 (a and b) Reciprocal layers h1l of GAM form IV at 10 K and at 123 K. The naming of layers corresponds to the Z′ = 1 unit cell. The size and direction of the reciprocal unit cell vectors are shown as an inset.

Fig. 6 (a) Packing of the three molecules in the Z′ = 3 structure for 10 K data (colored red, green and blue), compared with the packing in the Z′ = 1 structure (colored black). (b) Overlay of structures [14] corresponding to the three independent molecules in the Z′ = 3 model. The packing similarity is further discussed in the ESI.†