Crystal structure prediction of fl exible pharmaceutical-like molecules : density functional tight-binding as an intermediate optimisation method and for free energy estimation †

Successful methodologies for theoretical crystal structure prediction (CSP) on flexible pharmaceutical-like organic molecules explore the lattice energy surface to find a set of plausible crystal structures. The initial search stages of CSP studies use relatively simple lattice energy approximations as hundreds of thousands of minima have to be considered. These generated crystal structures often have poor molecular geometries, as well as inaccurate lattice energy rankings, and performing reasonably accurate but computationally affordable optimisations of the crystal structures generated in a search would be highly desirable. Here, we seek to explore whether semi-empirical quantummechanical methods can perform this task. We employed the dispersion-corrected tight-binding Hamiltonian (DFTB3-D3) to relax all the interand intra-molecular degrees of freedom of several thousands of generated crystal structures of five pharmaceutical-like molecules, saving a large amount of computational effort compared to earlier studies. The computational cost scales better with molecular size and flexibility than other CSP methods, suggesting that it could be extended to even larger and more flexible molecules. On average, this optimisation improved the average reproduction of the eight experimental crystal structures (RMSD15) and experimental conformers (RMSD1) by 4% and 23%, respectively. The intermolecular interactions were then further optimised using distributed multipoles, derived from the molecular wavefunctions, to accurately describe the electrostatic components of the intermolecular energies. In all cases, the experimental crystal structures are close to the top of the


Introduction
In computational materials science, approaches that are based on the fundamental laws of quantum mechanics (QM) are now integral to almost any materials design initiative in academia or industry. [1][2][3] The prevalence of polymorphism in organic molecules 4 and its importance in determining the physical properties of organic solids 1,5-8 has led to an interest in using simulation methods to predict the range of possible crystal structures and their properties, possibly to design molecular materials with desired characteristics, like high porosity. 9 CSP studies aim to predict all the possible putative polymorphs (PPMs) of a molecule, starting from the chemical diagram only. 10 This is particularly important for pharmaceutical development when performed as a complement to solid form screening, which attempts to establish the range of solid forms that could either be developed into the pharmaceutical product or must not appear during manufacturing and storage. 8,11 Several successful CSP studies on large but relatively rigid systems, such as organic porous cages, 12,13 have been reported. For inorganic solids, ab initio random structure searches have been used successfully. [14][15][16] Pharmaceuticals are more challenging since they tend to have a considerable conformational exibility, being designed for their biological activity, and so most molecules in drug development can crystallise in a variety of conformations. 8 For pharmaceuticals with few intramolecular degrees of freedom, several successful CSP studies have been conducted. [17][18][19][20][21][22][23][24] However, the CSP search space grows exponentially with conformational exibility and hence more efficient methods are needed for application to modern pharmaceuticals. The challenge of dealing with molecular exibility is compounded by small changes in some bond or torsion angles having a signicantly greater effect on the overall molecule shape in large pharmaceuticals than in smaller molecules. 25 Hence, the number of conformational degrees of freedom that may change from isolated molecule (ab initio) values because of the crystal packing forces can be larger than expected 26 from combining the exibility of fragment model molecules. Whilst optimising all atomic positions simultaneously with the crystal lattice positions avoids the problem of selecting the crystal packing-dependent conformational degrees of freedom, the need to balance the inter-and intramolecular forces sufficiently accurately makes this exceptionally demanding. This has been demonstrated by the challenges posed by the largest molecules in the recent Blind Tests of CSP. 1,27,28 These molecules, though large enough to generate interest in CSP methods from industry, are still small and of limited exibility compared with pharmaceuticals in development. Hence, if CSP is to full its promise for pharmaceutical development, innovations that allow CSP studies on larger molecules within a restricted time frame 11 are needed.
All CSP methods begin with a search step in which the lattice energy (E latt ) surface is explored to locate all the possible local minima, each corresponding to a separate possible crystal structure. 1,29 Since the surface that must be explored is extremely wide, especially for exible molecules, a large number of candidate crystal structures are produced, at least of the order of hundreds of thousands. 22,29 Hence, cheaper but less accurate energy models must be utilised in the search. They can take the form of tailor-made force elds, 30 exp-6 atomistic intermolecular potentials combined with quantum-mechanical calculations of the conformational energy, 31,32 or transferrable empirical force elds. 33 The lattice energy evaluations used in the crystal structure generation ("search") stage are rarely accurate enough to give a reliable energy ranking, since the lattice energy differences between observed polymorphs are usually less than 5 kJ mol À1 . Hence, thousands of generated crystal structures must be optimised and re-ranked to obtain a meaningful list of PPMs. Since this nal step requires the use of accurate and expensive models, such as periodic dispersion-corrected density functional theory (DFT-D) 34-40 on all crystal structures (the J crys method), or the calculation of a high-quality wave-function for each molecular conformer (the J mol method), 8,41,42 the computational cost can easily become unfeasible. A possible solution is the use of an intermediate optimisation method, which can bridge the gap between the simple models used to generate the crystal structures and the more accurate ones used to generate an accurate nal energy ranking. For exible molecules, several intermediate methods are used in different CSP workows, such as optimising some torsion angles with a transferable force-eld to improve the molecular conformation geometries, 43,44 single-point energy calculations with a more accurate description of the intra-and intermolecular interactions 18,20 or partial lattice energy optimisations. 1,22 Recently, semi-empirical QM methods have been applied to large chemical and biochemical systems; 45,46 these methods do not have the transferability problems of standard force elds but are signicantly less demanding than other QM methods. 45,47,48 In this work, the interest is focused on dispersion-corrected density functional tight binding methods (DFTB3-D3), which are up to three orders of magnitude cheaper than DFT-D, and have been shown to provide reasonably accurate lattice energies for a benchmark of small molecular crystals. 49 We present the rst use of semi-empirical QM methods within a full CSP work-ow. We aim to verify whether DFTB3-D3 can be used as an intermediate reranking step in CSP. There are at least four ways in which it could provide useful insights: (1) Producing more accurate molecular and crystalline geometries and packings than the original crystal structures generated in the search, which could be used as starting points for better lattice energy evaluations.
(2) Obtaining a more accurate energy ranking than the one obtained aer the crystal structure search, in order to limit the number of structures that need to be taken to the nal optimisation stage.
(3) Producing fewer minima that are artefacts of the constraints imposed by the space group and conformational variables, by optimising all crystal structures with P1 symmetry.
(4) Computing the phonon modes of plausible crystal structures to estimate thermodynamic free energies in the harmonic approximation.
We test the suitability of DFTB3-D3 for these purposes on structures generated in a set of searches 25 performed with the J mol method. E latt , the energy needed to completely separate all the molecules in the static crystal structure to isolated molecules in their most stable conformation, 10 can be described as: DE intra is the energy penalty for modifying the conformation of the molecule from its isolated gas-phase minimum, calculated with a good quality molecular electronic structure method, and U inter is the intermolecular energy, which sums all the interactions between the molecules in the crystal. 10 The search method variant used in this work treated the most exible torsion angles as search variables, and the other conformational degrees of freedom correspond to the isolated molecule ab initio values, an approach that is comparable to using multiple rigid conformations. 43 Five exible molecules for which full CSP studies had already been performed ( Fig. 1) are used to test the application of DFTB3-D3 methods: molecule XX was the rst "pharmaceutical-like" molecule used in a Blind Test, 50 XXIII and XXVI are from the sixth blind test, 1 GSK269984B was used in an early pharmaceutical study, 23 while the CSP study of mebendazole was performed for an on-going collaboration. 51 These CSP methods were successful in predicting all the observed polymorphs with one molecule in the asymmetric unit cell (i.e. Z 0 ¼ 1), and so provide a stringent test for the suitability of an intermediate DFTB3-D3 step to extend CSP methods to even larger, more exible molecules. Fig. 1 Chemical diagrams of the six molecules used in this work. The torsion angles in red were treated as explicitly flexible in the crystal structure searches, those in black were constrained to a set of values defined by the CSD Conformer Generator, and those in green were constrained to a set of values after an ab initio gas-phase scan.

Crystal structure generation
The method for generating the crystal structures used statistical information retrieved from the Cambridge Structural Database (CSD) to guide the determination and the analysis of the conformational search space. 25 For each molecule in Fig. 1, the CSD distributions of each rotatable bond were retrieved from the CSD knowledge base conformational libraries; 52 the effect of each of these rotamers on the overall shape of each molecule was subsequently determined via ultra-fast shape recognition. 53 This information was used to divide the torsion angles into two groups: those that were treated as explicitly exible in the searches, and those that were constrained to a set of values, determined by the CSD Conformer Generator. 54 Molecular electronic structure calculations at the PBE0 6-31G(d,p) level of theory using Gaussian09 55 were used to limit the set of potential conformations to be considered and to calculate grids of DE intra values for the explicitly exible torsion angles. These calculations were also used to determine the xed atomic charges used by CrystalPredictor 1.8 32 to estimate U inter in combination with the empirically tted FIT repulsion-dispersion intermolecular potential. 56 The number of crystal structures generated varied from $350 000 for the C tautomer of mebendazole to $2 200 000 for molecule XXVI. The outputs from this search method and from the original CSP studies are compared in a previous publication. 25 Thus the molecular geometries prior to renement were mostly determined by PBE0 6-31G(d,p) isolated molecule optimisations, with the torsion angles shown in red in Fig. 1 at values specically produced in the search.

Density functional tight-binding method
As an established semi-empirical method, we used a third order density functional tight-binding Hamiltonian with self-consistent charge redistribution as implemented in db+. [57][58][59] We used the 3OB Slater-Koster les, 60,61 and an additional damping of all hydrogen-containing pair-potentials, and we corrected for missing London dispersion interactions with the atom-pairwise D3 scheme in the rational damping variant. 62,63 The Brillouin zone was then sampled with a Gcentred mesh with a density of 0.025 A À1 . The original DFTB3-D3 parametrisation focused on energetic properties only. 49 However, it has been shown that the geometries of large molecular complexes and molecular crystals are not reproduced sufficiently accurately. 64,65 A re-parametrisation of the DFTB3-MBD model has been shown to improve the geometrical properties substantially, 66 so we followed a similar strategy and optimised the D3 damping parameter to minimise the errors in centre-of-mass distances on a set of small molecular dimers (S66). 67,68 This results in a DFTB3-D3 parameter set (S 6 ¼ 1.0, S 8 ¼ 0.0, a 1 ¼ 0.841, a 2 ¼ 3.834 Bohr) with 8 kJ mol À1 mean absolute error on the S66 binding energies and 1.3% error in the centre-of-mass distances, compared to 3.8 kJ mol À1 and 3.2%, respectively with the original parametrisation. This re-parametrisation also improves the mass density errors of a set of small molecular crystals (X23) 69,70 from 11.6% to 4.2%. Additional atom pair-wise corrections to DFTB3-D3, such as geometrical hydrogen-bonding corrections 71,72 and halogen-bonding corrections, 73 were not utilised.
Structure relaxations have been conducted with a quasi-Newton optimisation in internal redundant coordinates as implemented in the CRYSTAL14 programme. 74 As DFTB3-D3 is used as an intermediate optimisation method, it is important to test the relaxation thresholds. This was tested on the POLY59 benchmark set 75 and an optimum trade-off between the accuracy in relative energy ranking and computational cost was obtained with thresholds of 0.12 a.u. and 0.003 a.u. for the root-mean-square (RMS) displacement and RMS gradient.
Phonon calculations were used as a nal step to estimate the free energies for the experimental crystal structures and a few diverse low energy structures (given in ESI Table 10 †). Phonon calculations using symmetric nite displacements required geometry optimisations with one order of magnitude tighter thresholds, corresponding to the default settings in CRYSTAL14. The Brillouin zone sampling for the phonon calculations was initially formed by the common practice of constructing supercells with minimum cell lengths of 10 A. 70 As we are interested in relative polymorph energies, we additionally ensured that a similar number of atoms are covered in each supercell to enhance possible error compensations. The Helmholtz free energy, A, can be dened as: where F vib is the vibrational contribution to the free energy, which includes the thermal contribution to the free energy and the zero point energies calculated by summation. 41,65

DFTB3-D3 intermediate optimisation of all the CrystalPredictor generated structures
All the unique generated crystal structures within 40 kJ mol À1 of the global minimum in CrystalPredictor energy were optimised with DFTB3-D3. In order to perform the DFTB3-D3 optimisations, the symmetry of each crystal structure had to be reduced to the P1 space group. 9215 crystal structures were optimised for molecule XXVI, 16 744 for GSK269984B, 28 249 for molecule XXIII, 26 650 for molecule XX, 4165 for the A-tautomer of mebendazole and 4284 for its Ctautomer. The symmetries of all the crystal structures that were successfully optimised were reintroduced with Platon, 76 and then they were clustered to remove duplicates using the Crystal Packing Similarity tool within the CSD Python API; 77 details on the clustering procedure can be found in ESI Section 1.1. † The energy ranking calculated via DFTB3-D3 proved to be very different, as shown in Fig. 2 for XXVI, with the calculated E latt values having a larger energy spread than aer the search stage. However, the poor energies of the structures corresponding to the experimental forms showed that, contrary to the original hope, DFTB3-D3 had not produced a more accurate energy ranking. Hence, all the unique optimised crystal structures within a large window of 50 kJ mol À1 of the global minimum in DFTB3-D3 energy were taken forward. Clustering to remove duplicates and applying the energy cut-off each reduced the number of structures by a molecule-dependent proportion, which is broken down in ESI Table 1. † This resulted in 3346 crystal structures for molecule XXVI, 5328 for GSK269984B, 13 490 for molecule XXIII, 19 146 for molecule XX, 3078 for the A tautomer of mebendazole and 3352 for its C tautomer, requiring optimisation with a more accurate method.

Re-ranking using improved molecular wave-functions
The lattice energies with DFTB3-D3 were clearly inadequate, hence a further lattice energy evaluation with a more accurate description of the wave-function was required. We assumed that the DFTB3-D3 optimisation had improved the molecular geometries, and this was exploited by reverting to the J mol approach used to generate the crystal structures, but using distributed multipoles instead of atomic charges, and a more accurate evaluation of DE intra . For all the DFTB3-D3 optimised crystal structures, the molecular wave-function of each distinct molecular conformation was calculated at the PBE0 6-31G(d,p) level of theory using Gaussian09, and distributed multipoles up to hexadecapoles were derived from the charge density using GDMA 2.2. 78 Finally the intermolecular interactions were optimised with DMACRYS. 79 U inter was modelled using an electrostatic component calculated from distributed multipoles combined with a repulsiondispersion component calculated with the empirically-tted exp-6 FIT potential. The crystal structures optimised with this specic method, denoted J PBE0+FIT mol , were ranked in terms of E latt , with DE intra estimated as the difference between the energy of each crystalline conformer and the PBE0 6-31G(d,p) energy of the DFTB3-D3 optimised gas-phase minimum calculated with Gaussian09. This was used to avoid the absolute values of E latt being affected by a DE intra component calculated relative to a value optimised with a different wave-function. Although the energies of the DFTB3-D3 optimised gas-phase minima are 12-25 kJ mol À1 higher than their PBE0 6-31G(d,p) optimised counterparts, this systematic error does not change the energy differences between polymorphs, which are the main assessment criteria of CSP studies. For each molecule, all the crystal structures within 50 kJ mol À1 were clustered to remove duplicates.
Rigid-molecule harmonic phonon calculations were carried out on the same structures as for the DFTB3-D3 phonons with DMACRYS using supercells generated following the methodology of Nyman and Day. 80 This computationally cheap approach effectively assumes that the molecular modes do not affect DF vib , in contrast to the DFTB3-D3 calculations that allow the molecular and lattice modes to couple. The rigid-molecule J PBE0+FIT mol model assumes that the molecular modes make an identical contribution to F vib for all structures. This is a necessary assumption since the pure molecular modes could only be calculated at a local isolated-molecule minimum, and many conformations are far from such minima (although they would not necessarily be dened as conformational polymorphs). 81

Assessment of the results
Crystal structures were compared with the experimental crystal structures and sets of the most signicant computer-generated PPMs from the previous CSP studies identied in a previous work 25 using the Crystal Packing Similarity tool in the CSD Python API, trying to match 15-molecule clusters with 20% distance and 20 angle tolerances. In a few cases, the tolerances had to be slightly increased to nd matches with some of the computer-generated structures.
The generated crystal structures that upon optimisation ended up matching the experimental crystal structures and the signicant computer-generated PPMs 25 were optimised with CrystalOptimizer 82 at the PBE0 6-31G(d,p) level of theory, as a function of a set of torsion and bond angles (shown in ESI Fig. 6 †) that were selected according to rules dened by Nyman and Day 26 based on chemical intuition derived from small model molecules. These calculations were used as a stringent test to verify how the geometries and energies obtained using DFTB3-D3 and J PBE0+FIT mol compare with those obtained through optimising the most exible conformational degrees of freedom using a more accurate and computationally expensive J mol model.

Intra-and intermolecular geometries
The quality of reproduction of each experimental Z 0 ¼ 1 crystal structure and the molecular conformation within it, before and aer optimisations with DTFB3-D3 and J PBE0+FIT mol is shown in Table 1, and contrasted with the reproductions achieved with the more computationally demanding CrystalOptimizer.
DFTB3-D3 optimisations are effective at improving all the crystalline molecular conformations from the search generated structure, with a 23% improvement in the average RMSD 1 . The improvement in the overall crystal packing is more variable, with some becoming slightly worse and some slightly better with a mere 4% improvement in the average RMSD 15 . Keeping the molecule rigid at the DFTB3-D3 geometry but using the distributed multipoles and intramolecular energy from a superior electronic structure method improves the intermolecular packing, leading to an overall improvement of the average RMSD 15 of 22% aer the J PBE0+FIT mol optimisations. Both the molecular and crystalline geometries are of a slightly worse quality than those obtained aer the CrystalOptimizer optimisations, which improve the search structure by an average RMSD 1 of 31% and an average RMSD 15 of 38%. However, DFTB3-D3 renements could be carried out at a much lower computational cost for these types of molecules, and yet successfully reproduced the structures.

Lattice energy rankings
The energy rankings of the crystal structures matching the experimental ones and their stabilities relative to the most stable one at that stage are shown in Table 2, and compared to those obtained in the original studies. DFTB3-D3 energies are inadequate for CSP and in most cases the experimental crystal structures are ranked worse than aer the search. On the other hand, the nal optimisation of just U inter (i.e. keeping the DFTB3-D3-optimised molecular geometries rigid) with the J PBE0+FIT mol model leads to a large improvement in the relative energies, with the crystal structures matching their experimental counterparts having rankings similar to those in the original CSP studies, which had been performed at a much larger computational cost. In three cases (molecule XX, GSK269984B, and mebendazole form A) the experimental crystal structure is still the global minimum. Experimental form B of molecule XXIII, form C of mebendazole, and XXVI are all within a few kJ mol À1 of the global minimum, while forms A and D of molecule XXIII are higher in energy, although still within a sensible energy window considering the issues with predicting the stability of the polymorphs of XXIII with any method as shown in Fig. 5. 1 A question that arises is whether the energy landscapes, such as the two examples in Fig. 3, differ signicantly in the range of putative polymorphs produced. Any comparison with the original studies is not straightforward because of the developments in the algorithms, variations in settings and the potential energy surfaces being used. A selection of the lowest energy PPMs and a few conformationally diverse structures from the original CSP studies had been selected for testing the workow for generating the crystal structures 25 and we nd that the vast majority of these structures remain (see ESI Tables 3-7 †) on the crystal energy landscapes (Fig. 3 and ESI Fig. 7-9 †). Out of the 180 targeted PPMs for the ve molecules, 21 were not found in the set of J PBE0+FIT mol optimised crystal structures. However, in eight of those cases the PPMs were already missing aer the search, and for a further nine the recent search structure was a very poor match. 25 Only four PPMs that had been considered as 'certainly found' in the search 25 were lost in this study, and either these were towards the high energy end of the sample, or similar types of packing are present in the crystal energy landscape. Some PPMs were found, but only above 20 kJ mol À1 from the global minimum. This can reect the huge sensitivity of E latt to quite subtle changes in conformation, particularly when hydrogen bonds can be inter-or intramolecular as in GSK269984B, and a crystal structure containing a molecular conformation optimised at the DFTB3-D3 level may converge to a less favourable local minimum when the intermolecular interactions are optimised with the J PBE0+FIT mol method. The huge dependence of E latt on subtle changes is further highlighted by the differences between the energies obtained in previous CSP studies and those obtained via re-optimisation of the key crystal structures with CrystalOptimizer, as shown in ESI Tables 3-7. † These optimisations also seem to lead to different geometries compared to the J PBE0+FIT mol optimised crystal structures, highlighting how different routes can lead to different outcomes.

Phonon calculations
The calculation of phonons by DFTB3-D3 methods proved difficult to automate for reasons that would apply equally to signicantly more expensive periodic electronic structure methods. For certain specic crystal structures, it was necessary to perform optimisations to an even tighter convergence on the energies and gradients in order to remove small imaginary frequencies (<100 i cm À1 ).
The recipe of only needing supercells to have a minimum length of 10 A proved insufficient to converge the Brillouin sampling so that the difference in F vib between structures could be converged below 1 kJ mol À1 , mainly because of the differences in entropy. Thus the contrasting calculations in F vib (Fig. 4) contain signicant numerical uncertainties, which do not cancel between different structures of the same molecule. Overall, the DFTB3-D3 phonons do not cause signicant re-ranking; the most notable exception is for molecule XXVI, where the experimental crystal structure becomes lower in free energy than the global minimum. It is clear that the inclusion of the coupling of the molecular modes with the lattice modes in the DFTB3-D3 calculations leads to a wider range of free energy differences than with rigid-body phonons, implying that there is a considerable effect from the variations of the modes in different conformations and their couplings. The rigid-body phonons, which used 4 or 5 linear supercells to converge the Brillouin zone sampling, may be subject to greater errors in ignoring the coupling of the modes for this type of molecule.

Computational cost
The savings in computational cost are quite large with this methodology, when compared with the original CSP studies, as detailed in ESI Tables 11-13. † The smallest saving (30%) was for the two tautomers of mebendazole, which is the least complex molecule in terms of size and conformational exibility, and the greatest was for XXVI (90%). Some of this reduction in computational cost would be due to the use of different computer clusters, but the saving from replacing CrystalOptimizer and sometimes intermediate renement methods with DFTB3-D3 structure renement and J PBE0+FIT mol seems to increase as a function of molecular size and exibility. The rigid-molecule phonons were much cheaper to calculate than the DFTB3-D3 phonons.

Discussion
This study pioneers the assessment of the DFTB3-D3 method for CSP studies of pharmaceuticals. In principle, the free energy calculated using a dispersion corrected PBE0 functional for each crystal structure (i.e. J crys using the same wavefunction as in the J PBE0+FIT mol method) should be more accurate than using the PBE functional that is commonly used for the nal lattice energy evaluation in CSP studies, 1 with both being more accurate than DFTB3-D3 calculations. However, the enormous computational cost makes this impractical for use in accurate CSP studies of large exible molecules with current computer resources. Full DFTB3-D3 optimisations are signicantly better than xed cell optimisations of selected torsion angles with a transferable force-eld, a previously used intermediate step in CSP for exible molecules, 43,44,50 where the energies were discarded. In practice, theoretical accuracy, which is very dependent on the molecule and range of crystal structures, has to be compromised by the huge number of structures that need to be considered, and the need to make such studies affordable for larger pharmaceutical studies. Thus the time saving produced by an intermediate DFTB3-D3 step is an important consideration for extending CSP to be used more widely in pharmaceutical development. DFTB3-D3 scales as N ln(N), in contrast to N 3 for periodic DFT-D methods, 58 where N is the number of atoms in the unit cell. The cost of the J mol approach is more variable, as by using databases of previous calculations and methods of interpolation 82 the Fig. 4 The relative vibrational contributions to the Helmholtz free energy DF vib , which can be added to E latt to calculate A. For each molecule, DF vib is calculated relative to the structure which is the E latt global minimum within the J PBE0+FIT mol method, for which DF vib ¼ 0. The experimental structures are in green. The rigid-body modes are pure lattice modes, calculated from the J mol method. Full details about the crystal structures and their energies at the different stages can be found in ESI Table 10 cost of this approach decreases as more structures are optimised. However, the J mol approach scales badly with the number of conformational degrees of freedom that are allowed to vary in response to the packing forces of specic crystals. The other conformational variables cannot be xed, but have to be simultaneously ab initio optimised as intramolecular energy barriers can be drastically lowered by small changes in other parts of the molecule. 83 Pharmaceutical molecules, where packing forces acting on large aromatic rings at different ends of the molecule strain the conformation of linking groups, and inter-and intramolecular dispersion need to be balanced, are very demanding of the quality of the modelling of the potential energy surface. The numerical results shown in Table 2 suggest that DFTB3-D3 is of similar accuracy to PBE0 partial optimisation using CrystalOptimizer for reproducing the experimental conformers, as judged by RMSD 1 values. Holding the DFTB3-D3 conformations as rigid during the nal optimisation with the J PBE0+FIT mol model gives a slightly worse reproduction of the crystal structures in terms of RMSD 15 , but at a reduced cost. Comparisons with the experimental conformations (see ESI Table 2 †) show that in most cases the DFTB3-D3 reproduction is good, probably within the amplitude of the vibrational motions of the terminal groups. The only exception is XXIII form D, where the optimisations have found a conformer close to the search generated structure, but visually different from the experimental one. The differences between crystal structures, as judged by RMSD 15 , powder pattern similarity, or any other measure, raise the issue of clustering: how can one eliminate duplicates while being certain not to eliminate potential polymorphs? Many polymorphs are very similar in their packing. 84 Determining which structures correspond to distinct free energy minima at crystallisation temperatures requires realistic and expensive modelling of molecular motions within the crystals and would only ever be applied to the most promising PPMs resulting from a CSP study. The question of whether structures could remain distinct during crystallisation and growth is closely linked to the possibility of disorder within crystal structures, and indeed CSP can help rationalise disordered pharmaceuticals. 8,11 For example, molecules that can pack in hydrogen-bonded layers that can stack in different ways with similar energies could form polytypic packings or have stacking faults. 20 Closely related structures can manifest as static or dynamic disorder depending on the energy barriers. 17 Hence, decisions about the extent of the search and the clustering criteria can have a major effect on the number of structures that need to be considered and their interpretation. It appears that using an electronic structure level optimisation (DFTB3-D3) has not signicantly reduced the number of minima (only by 1-4%, see ESI Table 1 †) for the molecules considered in this study, hinting that few structures are artefacts of the J mol approach.
The disappointment is that the relative energies generated by DFTB3-D3 are too poor (see ESI Section 2.3.1 †) to allow a condent, drastic reduction in the number of structures to be investigated with more accurate and expensive methods. This is probably due to the limitations of this methodology, such as the approximate Hamiltonian used in the molecular overlap regime where atoms are in van der Waals contact, and approximating the long-range electrostatic potential by atomic monopoles, although they are iterated to self-consistency to allow some polarisation of the molecules by the eld of the other molecules. 59 There is also an underestimation of the Pauli repulsion that can lead to an overestimate of the density of molecular crystals; 45 the unbalanced description of Pauli repulsion in the parametrization of DFTB seems to originate from the underlying basis connement. 60 Some observed deciencies in hydrogen bond and halogen bond geometries and energies have led to the proposal of additional corrections, 71,73 but these do not seem to improve the results for large molecular assemblies. 46 The long range electrostatic interactions are described by atomic charges in both DFTB3-D3 and in the crystal generation stage, but from atomic multipoles in the nal energy evaluation, and the quality of the electrostatic model is very critical in crystal structure prediction, particularly for polar hydrogen bonded molecules; 85 this explains why the nal energy ranking appears to be much more accurate than the initial and intermediate ones. Several other differences between the various methodologies employed exist in the ways in which intra-and intermolecular interactions are treated and how they are combined to optimise the crystal structures and calculate E latt . All these differences play a part in determining the quality of the geometries and of the energy ranking, and it appears that despite the clear advantages of periodic electronic methods a high-quality crystal energy landscape cannot be calculated with a simplied wave-function. DFTB3-D3 and the J mol approach are less applicable in the presence of strong induction forces, suggesting they should not be used for salts or zwitterions, though there have been successful studies which make allowance for these deciencies. 86,87 The affordable density functionals such as PBE also have known problems with these types of systems. [88][89][90] Furthermore, it is very challenging to have computational methods that are equally reliable for intramolecular dispersion effects 91 and intermolecular dispersion extending through the range of separations sampled in crystal structures, but this balance underlies the observed preference for exible molecules to pack in extended crystal structures. 92 The large dispersion contributions make the energies very density dependent, as well as being sensitive to conformational distortion and the stronger more directional interactions such as hydrogen bonding.
The phonon calculations within the J crys approach, which are only affordable using DFTB3-D3, are conceptually different from the rigid-body J mol approach, in including effects for molecules having many molecular modes that are of comparable frequency to the lattice modes. 41 The use of DFTB3-D3 accounts for differences in the molecular modes within different crystalline geometries. This leads to an even larger spread in F vib values than with the rigid-body model, which already shows a range of DF vib that is comparable with polymorph energy differences, 4 and was estimated to reorder the calculated stability of about 9% of over 500 observed pairs of polymorphs. 80 This raises a question as to whether thermal effects are likely to be more signicant in determining the relative thermodynamic stability of pharmaceutical polymorphs than for more rigid molecules, particularly when there is a larger range of densities amongst the PPMs. DFTB3-D3 calculates all phonon modes within the specic crystal in a reasonable timeframe, and these can be added to E latt calculated with J PBE0+FIT mol or a better periodic DFT-D method. 70,93,94 The disadvantages of a poorer potential energy surface need to be balanced with the problems of converging energies with the size of the supercell to equivalent accuracy for different polymorphs demonstrated in this study. 95 Even if it were possible to calculate completely reliable harmonic modes, which the current results in Fig. 5 show has yet to be achieved, thermal expansion is likely to be important for both absolute free energies 65 and the differences in thermal expansion between polymorphs, which will affect the relative free energies. 41,96 These estimates assume that the (quasi)harmonic model is equally adequate for all polymorphs, which is questionable for larger molecules where methyl groups may be rotating and phenyl rings may have different large amplitude motions, depending on the packing. Hence, the potential energy surface should be explored more explicitly, for example by molecular dynamics simulations, 97,98 as none of the methodologies currently available can fully describe all the possible thermal effects that can affect crystal structures and their relative free energy ranking. Many of the challenges in CSP studies are illustrated by the contrast between the crystal energy landscapes of XX and XXIII in Fig. 3, which from the chemical diagrams look like similar extended pharmaceutical molecules. The crystal energy landscape of molecule XX represents the simplest outcome of a CSP study, with a clear prediction of one crystal structure being more stable than any of its competitors. This structural renement has led to a larger energy gap than previously found with two different J mol approaches, 50 using CrystalOptimizer (FCC) and a force-eld (RCM) for modifying the molecular conformation in response to the packing forces. Both had the experimentally-known crystal structure as the global minimum in E latt but with a margin of less than 1 kJ mol À1 from the second most favourable PPM. Comparing the quality of this landscape with those of the previous studies is impossible as only three structures among the lowest ten found with the RCM method were among the top ten when optimised with the FCC method. ESI Table 5 † shows that seven of the ten lowestenergy FCC crystal structures are among the ten lowest energy J PBE0+FIT mol crystal structures, while for RCM this is only true for the crystal structure that matches the experimental form (see ESI Table 8 †). The sensitivity of energies to the E latt model is further shown by a generally very effective J crys methodology developed Fig. 5 Relative energies of the polymorphs of XXIII calculated in this study compared with those reported by the participants of the 6 th Blind Test. 1 Note that values linked by dashed lines denote the changes from adding free energy estimates, with the group identifier in parentheses, and R denoting groups which only re-ranked the crystal structures. by Neumann et al. 21,24,[99][100][101] having the experimental crystal structure ranked 7 th , 27 although that search included some Z 0 ¼ 2 crystal structures. Some of these Z 0 ¼ 2 structures also appear to be highly competitive when optimised with DFTB3-D3 and J PBE0+FIT mol , as shown in ESI Table 9. † Nevertheless in our study the structure matching the only known experimental form is calculated to be the most stable by approximately 3 kJ mol À1 , and there are relatively few alternatives to be considered as possible metastable polymorphs.
In contrast, the crystal energy landscape of XXIII has successfully found the three Z 0 ¼ 1 polymorphs, but amongst a plethora of competitive Z 0 ¼ 1 structures. Two further Z 0 ¼ 2 polymorphs are known from the industrial polymorph screening. Only form B is among the lowest energy crystal structures, while forms A and D are much higher in energy amongst many other structures of similar energy and density. Comparing these results with the sensitivity of the known polymorphs to different energy models, Fig. 5 suggests that the energy differences would have been reduced if the nal energy optimisation had been performed using a good periodic DFT-D method, like TPSS-D3, 102 PBE-MBD 98 or B86bPBE-XDM. 94,103 However, this contrasts with experimental stabilities from slurrying data, where form A is the most stable at 257 K and form D at 293 K. 1 Overall, the sensitivity of the energies of the known forms of XXIII highlights the challenges of modelling the thermodynamics of the organic solid state. The contrast between the crystal energy landscapes of XX and XXIII emphasises the need for more CSP studies to be performed in collaboration with experimentalists in industry and academia.

Conclusions
Periodic DFTB3-D3 is suitable for optimising the crystal structures of exible molecules, but does not provide a sufficiently accurate energy ranking. Experimental conformers and crystal structures are reproduced by DFTB3-D3 with average RMSD 1 and RMSD 15 values of 0.16 A and 0.52 A, respectively. DFTB3-D3 calculations suggest that the phonon modes may cause a more signicant reranking of the energies of polymorphs of exible pharmaceuticals than for smaller, more rigid molecules. The demands of performing CSP for exible pharmaceuticals, and hence the most cost-efficient strategy, are very dependent on the molecule and the number of competitive crystal structures.

Conflicts of interest
There are no conicts to declare.