Hugh P. G.
Thompson
a and
Graeme M.
Day
*b
aDepartment of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
bSchool of Chemistry, University of Southampton, Highfield, Southampton, SO17 1BJ, UK. E-mail: G.M.Day@soton.ac.uk
First published on 13th May 2014
The ability to anticipate the shape adopted by flexible molecules in the solid state is crucial for engineering and predicting crystal packing and, hence, properties. In this study, the conformations adopted by flexible molecules in their crystal structures are assessed in terms of their relationship to the calculated global conformational landscape. The study quantifies the limits on molecular strain that can be induced by intermolecular interactions in single-component crystal structures of molecules with no intramolecular hydrogen bonding, demonstrating that some molecules are distorted by up to 20 kJ mol−1 by crystal packing forces. Furthermore, we find that crystallisation often selects high energy conformers, but only when the high energy conformer is more extended than the lower energy options, allowing for greater intermolecular stabilisation. Based on these observations, we propose that the crystallisability of conformers is assessed in terms of their energies and surface areas. We formulate this as a parameterised pseudo-energy related to molecular surface area, which leads to a dramatic improvement in our ability to predict the conformations adopted by molecules in their crystal structures.
Here, we address the issue of predicting which conformation will be adopted in a flexible molecule's crystal structure. We approach this problem from the point of view of computationally-guided crystal engineering and the development of methods for crystal structure prediction (CSP). CSP, as applied to organic molecules, has made substantive progress over the past decade.1–3 The most commonly applied CSP method is based upon a global search of the lattice energy surface.4 The main assumption is that the most likely observable crystal structures correspond to the lowest energy minima. This approach has been successfully applied to the prediction of both single-5–8 and multi-component6,7,9 molecular crystal structures, sometimes guiding experimental efforts towards the discovery of new solid forms.10,11
In terms of crystal engineering and crystal structure modelling, a rigid molecule should be defined as a molecule that has a single conformation and where that conformation is unaffected by crystal packing forces. For a rigid molecule, CSP requires an exploration of possible structures as a function of unit cell dimensions, molecular positions and orientations. Once a set of crystal structures is generated, their ranking by relative energies requires the calculation of intermolecular interactions only. Putative crystal structures are often separated by small energy differences, and the goal of CSP has encouraged the development of accurate models for intermolecular interactions.12
The challenges involved with extending the capabilities of CSP methods to dealing with flexible molecules have hindered applications to pharmaceutical molecules, and are apparent in the results of the CSP blind tests,2,3 where success rates for flexible molecules are low. The two principle difficulties associated with molecular flexibility are that:
1. The flexibility of intramolecular degrees of freedom means that the molecular geometry may be distorted under the influence of intermolecular interactions in a crystal structure, to reach the optimum balance between inter- and intramolecular energies. Therefore, the impact of the crystalline environment on the molecular geometry must be modelled.
2. Conformational degrees of freedom increase the dimensionality of the energy landscape that must be explored when generating crystal structures.
The distortion of molecules away from their gas phase structures has recently been discussed in the context of conformational polymorphism.13 Off-the-shelf force field methods for modelling intramolecular interactions do not provide a sufficiently accurate description of the associated energetic cost.14 Therefore, the high demands of CSP for flexible molecules have required either solid-state electronic structure based approaches15,16 or hybrid energy models, where intramolecular degrees of freedom are treated quantum mechanically and atom–atom models are used for intermolecular interactions.17–21
The problem of sampling conformational phase space and predicting the relevant molecular geometries is less well developed. Conformational degrees of freedom can be sampled concurrently with packing parameters, but for highly flexible molecules it can be advantageous to split the high dimensional global optimisation problem into two smaller problems: a search for relevant conformers followed by a crystal packing search for each. Apart from reducing the dimensionality of the search space, this approach leaves room for additional selection criteria when assessing which molecular conformations might lead to the most stable crystal structures.22,23
A purpose of this work is to study the conformational energy landscapes of a large set of flexible, pharmaceutical-like molecules, to investigate the relationship between the ensemble of possible conformers of the isolated molecules and the molecular geometries seen in their observed crystal structures. These comparisons should provide insight into conformational preferences in crystal packing and inform the future development of structure prediction methods for flexible molecules. Therefore, we have carried out a series of calculations to address the following questions:
• Which conformation will a molecule adopt, and where on the energy landscape of the isolated molecule is this conformation found?
• Apart from energy, can other molecular properties be identified that predict which conformation will be observed in crystal structures?
A starting assumption in this work is that the geometry adopted by a molecule in its crystal structure will be close to that of one of the conformers of the isolated molecule. However, in comparing the energy landscape of the isolated molecule to its crystalline geometry, we also investigate to what extent molecular geometries are distorted away from the geometry of the isolated conformer by intermolecular forces. The results will help quantify the importance of considering distortions away from idealised molecular geometries when attempting to predict the crystalline arrangement of organic molecules, in a general crystal engineering context as well as for the specific goal of ab initio CSP.
For consistency, we apply the same level of theory in an atomic basis set method for crystal and molecular energy calculations: we use dispersion-corrected density functional theory (DFT-D), which has been shown to provide an accurate description of the structures and energies of molecular organic crystals.15,24,25 All DFT-D calculations were performed using the CRYSTAL09 (ref. 26 and 27) software, with the B3LYP hybrid functional,28,29 6-31G* basis set, and the rescaled empirical dispersion correction suggested by Civalleri et al.30 All atomic positions were relaxed during lattice energy minimisation, with unit cell dimensions constrained at experimentally determined values. All single molecule calculations were also performed in CRYSTAL09, using the same functional, basis set, DFT integration grid and optimisation thresholds as the periodic lattice energy minimisations. Full details of DFT-D calculations are provided in the ESI.†
All symmetry-independent molecules were extracted from the optimised crystal structures and single-point molecular energy calculations were performed to give the energies of the molecules in their crystalline geometry (see point A in Fig. 1). These molecular geometries were also taken as starting points for unconstrained molecular geometry optimisations, yielding the local energy minimum on the conformational energy surface associated with the crystalline molecular geometry (point B, Fig. 1). We refer to the energy difference between the single-point and optimised molecular geometries as the intramolecular strain energy, ΔEstrain.
All unique conformers resulting from an initial search using the OPLS2005 (ref. 34) force field were re-optimised using DFT-D, applying the same computational parameters described above for the crystalline molecular geometries. The result should be a complete set of low energy conformers for each molecule (points 1–4 in Fig. 1), assuming that the conformer search was complete and that all local minima on the conformational energy surface described by DFT-D have a corresponding local minimum on the force field described energy surface. The predicted conformations were re-clustered to remove duplicates after DFT-D re-optimisation and, in most cases, additional structures were removed at this stage, suggesting that the DFT-D conformational energy surface is smoother than the OPLS surface.
Geometric comparisons were performed to locate the optimised conformations taken from the observed crystal structures in the sets of predicted conformers (e.g. point B = point 2 in Fig. 1). We refer to the energy difference between the conformer corresponding to the crystalline geometry and the lowest energy computer-generated conformer as the relative conformational energy, ΔEconf (Fig. 1).
As further characterisation of the conformational energy landscape, the Connolly surface area of each DFT-D-optimised conformer was calculated using Material Studio35 with a probe radius of 1.8 Å.
Molecular flexibility was measured by the number of rotatable (exocyclic) single bonds in the molecule, excluding terminal methyl groups. Three molecules were chosen with each of 4, 5, 6, 7 and 8 such rotatable bonds, yielding a test set of 15 molecules. While their presence was not used as a criterion for the selection of molecules, the set also includes molecules with flexible ring systems, whose flexibility is sampled during the low-mode conformational search.
All chosen molecules have experimentally determined crystal structures and the molecules were selected to include diversity in known polymorphic behaviour. For each level of flexibility, one molecule was chosen with known conformational polymorphism, one with known packing polymorphism (alternative crystal packings of the same molecular conformation) and one molecule with no reported polymorphism. To maximise the data obtained from each molecule, molecules with the most reported polymorphs at each level of flexibility were selected. For molecules with the same number of known polymorphs, and for the non-polymorphic molecules, the molecule with the highest quality reported crystal structure (as measured by the R factor) was chosen. To our knowledge, no solvate structures have been reported for the molecules in our set, so the study includes only single-component crystal structures. We refer to molecules by their 6-letter Cambridge Structural Database (CSD)37 REFCODE family names which are provided, along with chemical diagrams, in Table 1.
The distribution of ΔEstrain is summarised in Fig. 2 for the 36 distinct molecular geometries in the 29 lattice energy minimised crystal structures of the 15 molecules (see Table S1 in ESI†), including molecular geometries from different polymorphs and independent molecules within the same crystal structure (for structures with multiple molecules in the asymmetric unit).
There is a correlation, albeit weak, between ΔEstrain and the geometrical distortion of the crystalline molecular geometry away from the geometry of the isolated molecule (see Fig. S3 and S4 in ESI†). However, there are cases where a large molecular strain energy results from very small changes to stiff intramolecular degrees of freedom or where large geometry changes come at a relatively low energetic cost. The latter typically correspond to rotation about soft dihedrals (for example, see Fig. 3a).
The majority (75%) of molecules are strained by less than 10 kJ mol−1 and it is unsurprising in these cases that the increase in intramolecular energy can be compensated by improved intermolecular interactions in the crystal. However, we find several cases of surprisingly high strain energies. The highest values calculated for the molecules studied here are ΔEstrain = 18.3 and 21.6 kJ mol−1 in the two polymorphs FIBKUW01 and FIBKUW02 and ΔEstrain = 19.0 mol−1 in the monoclinic polymorph of N,N′-di(phenylethyl)terephthalamide (CELHIL01, see Fig. 3b). One important observation is that high intramolecular strain energies are not restricted to large molecules: two of the smallest molecules in our set show large strain in their crystal structures: SIKRIN (ΔEstrain = 14.6 kJ mol−1) and HIBGUV (ΔEstrain = 14.4 kJ mol−1, in its monoclinic beta polymorph). The results that we find here are in broad agreement with those recently reported for a large set of conformational polymorphs.13
At the low energy end of the ΔEstrain distribution, we note that none of the molecules are completely unaffected by crystal packing. The smallest calculated value is ΔEstrain = 2.4 kJ mol−1 for the triclinic polymorph of 1,4-dibenzoylbutane (CSD REFCODE GALCAX01), corresponding to an all-atom RMS difference in atomic positions of 0.03 Å between gas and solid phase molecular geometries (Fig. 3c). From a structure prediction perspective, these results demonstrate the limitations of any rigid-molecule treatment of even moderately flexible molecules, where the molecule in the crystal structure is assumed to be undistorted from its gas phase geometry. Energy contributions of 2–3 kJ mol−1 to the relative stability of crystal structures can be crucial in the context of CSP, where energy separations between computer-generated crystal structures are often 1 kJ mol−1 or less.4
We define the conformer associated with the crystalline molecular geometry as the structure reached after DFT-D optimisation of the molecule extracted from the crystal structure. From our initial set of 36 independent molecular geometries, we find only 21 distinct conformers because some molecular geometries (e.g. of the packing polymorphs) converge to the same conformer on the gas phase molecular energy surface.
For all molecules apart from one, we find a corresponding conformer in the sets generated by the conformation search. For the remaining molecule, SEVJAF, the original LMCS search failed to return the conformation obtained from the known crystal structure. However, the observed conformer was obtained after initiating a second, longer, search from the geometrically closest conformer in the initial set. For the purposes of this study, we assume that the initial set of SEVJAF conformers is representative of the conformational surface, despite being incompletely explored, and we proceed with the original set of SEVJAF conformers supplemented by the conformer corresponding to the observed crystalline molecular geometry.
The results are summarised in Table 2 using the quantity ΔEconf (Fig. 1), the energy difference between the conformer corresponding to the crystalline molecular geometry and the global minimum conformer found by the conformational search. Surprisingly, only 6 of the 15 molecules adopt the global minimum conformer in one of their known crystal structures. Two of these 6 molecules display conformational polymorphism, adopting a non-global minimum conformer in one polymorph. Therefore, a non-global energy minimum conformer is observed in the crystal structures of 11 of the 15 molecules.
Molecule | N26 | ΔEconf | Rank | ΔEconf,biased | Rank |
---|---|---|---|---|---|
a Rankings and ΔE may be larger than reported for SEVJAF, due to an incomplete sampling of conformations (see text). | |||||
HIBGUV1 | 10 | 4.83 | 7 | 1.55 | 7 |
HIBGUV2 | 10 | 4.17 | 5 | 0.74 | 4 |
MABZNA | 2 | 0.00 | 1 | 0.00 | 1 |
SIKRIN | 1 | 0.00 | 1 | 0.00 | 1 |
FAHNOR1 | 13 | 4.40 | 5 | 0.00 | 1 |
FAHNOR2 | 13 | 0.00 | 1 | 0.99 | 2 |
ODNPDS | 4 | 0.00 | 1 | 0.00 | 1 |
COCAIN | 7 | 1.23 | 2 | 0.43 | 2 |
VEMTOW1 | 12 | 0.35 | 3 | 0.55 | 4 |
VEMTOW2 | 12 | 0.00 | 1 | 0.00 | 1 |
FIBKUW | 35 | 5.88 | 10 | 0.00 | 1 |
NEQNIG | 105 | 0.00 | 1 | 0.00 | 1 |
HAJYUN1 | 124 | 15.01 | 67 | 4.29 | 29 |
HAJYUN2 | 124 | 13.85 | 52 | 2.15 | 9 |
GALCAX | 73 | 18.55 | 27 | 0.00 | 1 |
SEVJAF | 623 | 21.25 | ≥283a | 3.70 | ≥49a |
DANQEP1 | 115 | 3.54 | 34 | 2.12 | 12 |
DANQEP2 | 115 | 2.18 | 18 | 1.82 | 11 |
DANQEP3 | 115 | 1.46 | 12 | 0.58 | 7 |
CELHIL | 142 | 22.11 | 108 | 4.81 | 37 |
DADNUR | 30 | 25.54 | 28 | 4.87 | 5 |
It is clear from these results that a range of molecular conformers must be considered when assessing which molecular conformation will be adopted in the solid state. In the majority of cases, the crystalline geometry is energetically close to the most stable conformer of the isolated molecule: of the 21 conformers in this study, the crystalline conformation is within 5.9 kJ mol−1 of the global minimum in 15 cases. The remaining 6 are more interesting, ranging from ΔEconf = 13.9 to 25.6 kJ mol−1; the highest energy observed conformer in this set of molecules is ΔEconf = 25.6 kJ mol−1 in crystal structure DADNUR. These values suggest that, for a flexible organic molecule, conformations up to about ΔEconf = 26 kJ mol−1 must be considered as possibilities for what will be observed in the solid state.
These results are discouraging for structure prediction and crystal engineering: such an energy window can encompass large numbers of conformers (Table 2), whose shapes and arrangement of functional groups can vary greatly. Using the molecular energy as a sole selection criterion, any conformer within this range should be considered a possibility for what will be observed in a molecule's crystal structure. This raises the question of whether non-energetic descriptors of molecular conformers can be found that highlight those conformers that are most likely to form low energy crystal structures, or filter out those that will not lead to stable crystal structures.
Molecular symmetry may be beneficial for close packing and it has been shown that molecules that adopt a centrosymmetric geometry in their crystal structure almost exclusively crystallise with molecules on crystallographic special positions.38,39 However, it is not clear what role symmetry plays in determining the conformer that a molecule adopts. Here, we observe that, of the six molecules with potential centrosymmetry (FAHNOR, ODNPDS, FIBKUW, GALCAX, CELHIL SIKRIN), five adopt centrosymmetric or nearly centrosymmetric conformations in their crystal structures. For SIKRIN and one polymorph of FAHNOR, the observed centrosymmetric conformer is the global minimum conformer. In all other cases, the global minimum energy conformer lacks centrosymmetry. FIBKUW adopts its lowest energy centrosymmetric conformer, while CELHIL and GALCAX adopt centrosymmetric conformers that are high in energy, in favour of alternative, lower energy centrosymmetric and non-symmetric conformers. The sample is too small to judge the overall importance of centrosymmetry in conformer selection, and further studies focussed on symmetry could be interesting.
Indeed, in analysing the sets of observed and computer-generated conformers, we find that the conformer that is found in a molecule's crystal structure tends to have a relatively extended geometry (e.g.Fig. 4b). Similar observations have been made regarding a preference for extended conformations of ligands bound to proteins.40 The potential for a molecule to form close intermolecular interactions is limited, at a basic level, by its accessible surface area – the surface area that can be brought into contact with an adjacent molecule. Therefore, to quantitatively compare conformers, we calculated the Connolly surface area41 of all predicted conformers of each molecule, providing the contact surface mapped out by a spherical probe rolling across the van der Waals surface of the molecule. We used a probe radius of 1.8 Å, the approximate radius of a methyl group, to provide a measure of the molecular surface area that is accessible for close intermolecular contact with other copies of itself in a crystal structure.
The results for three of the most flexible molecules are summarised in Fig. 5, as plots of relative conformational energy against Connolly surface area. It is clear from these results that conformers with large surface areas are favoured in the crystal structures of flexible molecules. The molecules summarised in Fig. 5 adopt conformers with low (DANQEP, Fig. 5a), middling (HAJYUN, Fig. 5b) and large (DADNUR, Fig. 5c) ΔEconf, while in each case the crystalline conformers are amongst those with the largest possible accessible surface areas.
This observation is not limited to these three molecules; for all 15 molecules, a high energy conformer is adopted in the crystal structure only if it has a larger surface area than that of the global minimum energy conformer (surface area vs. energy plots for all remaining molecules are available in the ESI, Fig. S5–S9†). On the basis of these observations, it appears that compact high-energy conformations are unlikely to occur in crystal structures.
The relationship between sublimation enthalpy and molecular surface area is remarkably linear (Fig. 6) with very little scatter about a linear regression with a gradient of mEvsSA = 0.75 kJ mol−1 Å−2. This suggests that the increase in intermolecular contributions to lattice energy with increasing molecular surface area should be predictable, at least in the absence of strong, specific intermolecular interactions.
Fig. 6 Variation in measured ΔHsublimation with molecular AConnolly for a set of small rigid hydrocarbon crystal structures. |
Fig. 7 summarises the energies and surface areas for all conformers of the 15 molecules in our test set; each conformer is represented by its relative energy and relative Connolly surface area, calculated with respect to the energy and surface area of the lowest energy conformer of the same molecule. The gradient, mEvsSA derived from the sublimation enthalpies of rigid hydrocarbons is superimposed on the energy/surface area distribution. We note that all conformers found in the observed crystal structures (highlighted in red, Fig. 7) fall below and to the right of the mEvsSA line, where the expected intermolecular stabilisation due to greater surface area outweighs the higher intramolecular energy than the global minimum conformer.
Fig. 7 Plot of ΔEconf against relative Connolly surface area, ΔAConnolly for all predicted conformers of the 15 molecules. Conformers that are observed in the known crystal structures are highlighted in red. The diagonal line represents the gradient of energy vs. surface area derived from Fig. 6. |
The calculated molecular surface area seems to be predictive of what conformers will be found in crystal structures and we propose that conformers falling in the shaded area (Fig. 7) – compact, high energy conformers whose higher intramolecular energy is not compensated by their expected intermolecular stabilisation – are unlikely to be observed.
We propose that this observation can be formulated as a biasing term added to the calculated conformational energy:
ΔEconf,biased = ΔEconf + ΔEpseudo,inter | (1) |
ΔEconf,biased = ΔEDFT-D + mEvsSAΔAConnolly | (2) |
Most importantly, the location within the overall ensemble of the conformers that are adopted in the crystal structures (red bars in Fig. 8) is changed dramatically. The application of the Epseudo,inter correction leads to a significant enrichment of the observed conformers in the low energy region of the distribution. This is evident from a comparison of the cumulative distribution functions in Fig. 8: all crystalline conformers are found within the first 7.1% of the overall distribution of Econf,biased, while they range over a much greater proportion (39.5%) of the overall distribution of pure Econf.
The improvement is also reflected in the ranking of the observed conformers for each individual molecule (Table 2). The ranking of all of the high energy conformers is dramatically improved after applying the pseudo-intermolecular energy correction. Furthermore, the energy difference, ΔEconf,biased from the global minimum is decreased in almost every case.
Practically, this demonstrates that the combined consideration of calculated intramolecular energy and molecular surface area can narrow the ensemble of conformers that must be considered as likely candidates for what will be found in crystal structures. Specifically, the application of the surface area pseudo-energy correction that we propose here means that the conformers that are seen in observed crystal structures are found much nearer the top of ranked lists of possible conformers than when ranking is based on the calculated energy of the isolated molecule.
In terms of applications to CSP, this extends the size of molecule that can be practically studied, by focussing computing effort on the smaller set of conformers that are able to achieve the best balance of inter- and intramolecular energies in the solid state. In the more general context of crystal engineering, the combined criteria based on calculated energies and molecular surface area can help anticipate the molecular shape and spatial arrangement of functional groups on a molecule, which determines the mutual arrangement of molecules in the solid state, and therefore many materials properties of interest.
We propose that the relative conformational stabilisation due to intermolecular interactions in a crystal can be estimated from an empirical linear relationship with accessible molecular surface area. This relationship has been parameterised from the variation of measured sublimation enthalpies with molecular surface area for a series of rigid hydrocarbons. By adding this pseudo-energy to the calculated intramolecular energies of conformers, we achieve a dramatic enrichment in the ranking of observed conformers in the low energy range of the global conformational landscape.
A further observation is that, for any molecule with moderate flexibility, there is some molecular distortion away from the gas phase geometry due to intermolecular interactions in a crystal structure: ΔEstrain is never zero and can be as large as about 20 kJ mol−1 for some molecules. This is important for anticipating overall molecular shape when considering how a molecule might pack in its crystal structures. Molecular geometries can be significantly distorted by the close-packed environment in a crystal structure.
The findings are clearly relevant to CSP, where conformational diversity can quickly become the bottleneck in fully exploring structural space for large, flexible molecules. Any narrowing down of the conformational space that must be considered improves the efficiency of global structure searching, reduces the resources that are required to perform predictions for a given molecule and effectively increases the range of these methods. The observations made here are also relevant in more traditional crystal engineering approaches, where the geometrical attributes of a molecule, particularly the arrangement and relative orientation of functional groups, must be anticipated in order to use their interactions to direct the arrangement of molecules in the solid state.
Footnote |
† Electronic supplementary information (ESI) available: Details of crystal and molecular structure optimisations, testing of DFT settings, full details of conformational searches, calculated strain energies for all molecules, correlation between strain energy and molecular geometry changes, energy vs. surface area plots for all molecules and sublimation enthalpies of rigid hydrocarbons. See DOI: 10.1039/c4sc01132e |
This journal is © The Royal Society of Chemistry 2014 |