Is zeroth order crystal structure prediction (CSP_0) coming to maturity? What should we aim for in an ideal crystal structure prediction code?

Crystal structure prediction based on searching for the global minimum in the lattice energy (CSP_0) is growing in use for guiding the discovery of new materials, for example, new functional materials, new phases of interest to planetary scientists and new polymorphs relevant to pharmaceutical development. This Faraday Discussion can assess the progress of CSP_0 over the range of types of materials to which CSP is currently applied and could be applied, which depends on our ability to model the variety of interatomic forces in crystals. The basic hypothesis, that the outcome of crystallisation is determined by thermodynamics, needs examining by considering methods of modelling relative thermodynamic stability not only as a function of pressure and temperature, but also of size, solvent and the presence of heterogeneous templates or impurities (CSP_thd). Given that many important materials persist, and indeed may be formed, when they are not the most thermodynamically stable structure, we need to define what would be required of an ideal CSP code (CSP_aim).


Introduction
Intellectual curiosity as to whether we can predict crystal structures predates the computer era, with Kitaigorodsky's mechanical structure seeker fitting the "projections" (bumps) formed by the atoms of one molecule into the hollows of another so that the molecules dovetailed into a close packed structure, or various radius ratio rules for simple inorganic crystals, which are also based on the principle of close packing. Maddox's famous quote[1] in 1981, "One of the continuing scandals in the physical sciences is that it remains in general impossible to predict the structure of even the simplest crystalline solids from a knowledge of their chemical composition", can be seen as reflecting the common assumption at the time, that a given molecule or ionic composition always crystallised in the same crystal structure. The term Crystal Structure Prediction (CSP) originates from the days when researchers were seeking to predict the crystal structure, which was that of the first crystal that could be grown to a size and quality suitable for a crystallographic determination. The crystallography was usually done to prove that the correct molecule had been synthesised. As McCrone said in 1965, "In spite of the fact that different polymorphs of the same compound are, in general, as different in structure and properties as the crystals of two different compounds, most chemists are almost completely unaware of the nature of polymorphism and the potential usefulness of knowledge of this phenomenon in research".[2]
Computers make it possible to test whether we really understand what determines how a molecule will crystallise, by programming a theory and applying it to test the "predictions" against experiment. There is a big distinction between a theory or program that seeks to predict the crystal structure and one which seeks to predict all polymorphs. This distinction needs to be clear in this discussion. Most CSP programs are based on the theory that the crystal structure is the most thermodynamically stable structure, and, at least initially, assume that the relative thermodynamic stability can be approximated by the lattice energy, the energy of the static lattice relative to infinitely separated molecules in their lowest energy conformation (or relative to all electrons and nuclei at infinite separation).
I would like to dene CSP_0, the zeroth order model, as the attempt to predict the most likely (thermodynamically stable) crystal structure as the most stable in lattice energy.That some of the competitive local minima in the lattice energy correspond to polymorphs is fortunate for the practical interest in CSP.
We can then dene CSP_thd as the full implementation of the assumption that realistically estimates the relative thermodynamic stability of the crystal structures, and so will predict the most thermodynamically stable crystal structure under given conditions of temperature and pressure and any other thermodynamic variables that are relevant.The pharmaceutical industry would really welcome such a code, but currently the calculation of the phase diagram of crystalline methanol represents a major step forward in this regard. 3The situation is more advanced for other materials, with a revision of the phase diagram of CaCO 3 over half the pressure range within the earth's mantle, 4 leading to the discovery of a new phase with implications for carbon storage in the deep mantle. 5ifferences in the approximations required for estimating the appropriate relative thermodynamic stability for different material types are currently important.
If we calculate the free energy minima by calculating the true thermodynamic stabilities of the CSP_0 structures, we expect that some lattice energy minima will have merged into the same free energy minimum. This structure could have a higher symmetry on average than any lattice energy minimum, if it is a dynamically disordered structure. The structures that are within the likely energy range of possible polymorphism constitute the crystal energy landscape, the set of thermodynamically plausible structures. Thus CSP_thd methods will undoubtedly improve the prediction of polymorphism at practically relevant temperatures, but how do we define which local minima on the crystal energy landscape are possible, practically relevant polymorphs?
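The two definitions can be illustrated with a minimal sketch: the CSP_0 prediction is the lowest lattice energy structure, and the crystal energy landscape is the set of structures within a chosen energy window of it. The structure labels, the lattice energies and the 7 kJ/mol window are hypothetical values invented for this example, not results from any study.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    label: str             # hypothetical structure identifier
    lattice_energy: float  # kJ/mol; more negative means more stable


def crystal_energy_landscape(candidates, window_kj_mol):
    """Return (label, relative energy) pairs within the window, most stable first."""
    ranked = sorted(candidates, key=lambda c: c.lattice_energy)
    e_min = ranked[0].lattice_energy  # global minimum, i.e. the CSP_0 prediction
    return [
        (c.label, c.lattice_energy - e_min)
        for c in ranked
        if c.lattice_energy - e_min <= window_kj_mol
    ]


# Illustrative data only (assumed values).
structures = [
    Candidate("A", -101.3),
    Candidate("B", -104.8),
    Candidate("C", -95.0),
    Candidate("D", -103.9),
]
landscape = crystal_energy_landscape(structures, window_kj_mol=7.0)
csp0_prediction = landscape[0][0]
```

The choice of window is the open question raised in the text: the code can only apply a cutoff, it cannot say which of the retained minima are practically relevant polymorphs.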
One practical justication for developing CSP methods is in the design of new materials, to guide synthetic work.The main benet of such studies is to avoid the synthesis of molecules or materials that will not readily crystallise in a structure which has the desired properties.This requires the calculation of the property of interest from the computer-generated structures, with sufficient realism to convince the experimental group that the synthetic effort is worth making.If more than the thermodynamically stable structure is of interest, which will generally be the case if metastable polymorphs (or variants in chiral composition) are likely to complicate product manufacture, then the property needs to be calculated for all the structures on the crystal energy landscape, giving crystal-structure-property maps.CSP_0 may be adequate for excluding an organic molecule from the synthetic program, but CSP_thd is desirable to test whether the desired novel organic material is the most thermodynamically stable at ambient.Other materials can survive over a much greater range of temperatures and pressures, and for these CSP_thd can be essential, with pressure being particularly widely used for planetary science.
The second practical justication for developing CSP is polymorphism.The importance of polymorphism for the quality control of industrial materials, which is particularly acute for the pharmaceutical industry, has inspired the development and commercialisation of CSP.However, intellectually, it messes up the ability to test the theory for predicting crystal structures, as it is unclear what crystal structures have to be predicted.Conclusions from the Cambridge Crystallographic Data Centre's blind tests of organic crystal structure prediction 6 have always been qualied by the possibility of unreported polymorphs.Over the lifetime of CSP, the question has changed from which systems are polymorphic to which are not.Systems, such as aspirin, that used to be quoted as examples of monomorphic systems now have polymorphs in the Cambridge Structural Database (CSD). 7t is now een years since a group of UK scientists convinced Research Councils UK that a computational method of crystal structure prediction would be a "Basic Technology" for many areas of science.It is gratifying that one of our industrial contributions raises the question as to whether CSP is changing from basic science to applied technology (DOI: 10.1039/c8fd00033f).I would like to start this discussion meeting by asking a few questions about the state of CSP_0 and CSP_thd and considering what we would expect of a genuine crystal structure prediction code (CSP_aim) in terms of polymorph prediction.Here we must recognise that over-predicting polymorphs is not helpful, though it is more useful than having no computational guidance about the completeness of the experimental polymorph screen.Should we be aiming for a code that only produces the polymorphs that can be experimentally found, along with sufficiently reliable predicted properties to ensure that they could be found or safely dismissed as irrelevant?
The other dimension to this discussion is the extension of CSP methods to the widest range of materials. The ability to predict the crystal structures of benzene will not guarantee that all pharmaceuticals can be predicted by the same methods, any more than a CSP method that works for NaCl will work for zeolites. We are now developing CSP for many new forms of materials such as MOFs (metal organic frameworks), organometallics, and anything that the synthetic materials chemist can develop, including interface structures. This discussion should exchange ideas between the CSP communities working on different types of materials. The differences arise from the nature of the building blocks being used, and different interatomic interactions being balanced. A lot of the challenges in organic CSP for flexible molecules come from trying to balance the different interactions within the molecules, such as flexible torsions, with the variety of possible intermolecular interactions. When the molecules are linked by a great diversity of atoms, such as metals, the distinctions between inter- and intramolecular, ionic and covalent, oxidation and spin states may become less clear-cut and so cause inaccuracy for a given type of lattice energy modelling. As the coverage of the periodic table increases, the possibility of chemical substitutions in the lattice increases, as different ions can be exchanged with less effect on structures and relative energies than changing organic functional groups.
When CSP is applied to geological materials there is a problem of defining the composition. The different classes of materials have different plausible starting assumptions in CSP, such as (approximately) rigid building blocks, but the complexity and difficulty comes as we combine different types of interactions of different strength and directionality. Hence "variety of interatomic interactions" is a major aspect of the material complexity axis on a two dimensional diagrammatic scheme on which to assess our progress as shown in Fig. 1. Dirac's famous quote, "The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known", is essentially saying that we have the physical laws for a universal CSP code. Developments in computer power are allowing us to make huge strides in applying CSP_0; however, we are all limited by resources in testing the theoretically more correct CSP_thd. How long will it be necessary to make material and application-dependent short cuts? Against the huge ambition of a universal CSP_aim code, we can plot our progress on Fig. 1. At each point, we have to balance reliability for all materials in that class against McCrone's famous quote "every compound has different polymorphic forms, and that, in general, the number of forms known for a given compound is proportional to the time and money spent in research on that compound".[2]
Fig. 1 Schematic of the two directions of progress required for a universal crystal structure prediction code, which predicts only all the experimentally observable polymorphs of a system, and sufficient properties to enable them to be found.

CSP_0, finding the structure that has the lowest lattice energy

Accurate determination of the relative lattice energies
Most theoretical chemists will see an analogy between Fig. 1 and the diagram for systematically improving approximate wavefunctions as a function of basis set size (one axis) and electron correlation (the other axis) to reach the true solution to the Schrödinger equation for a molecule or solid. Density functional methods have lost the advantages of the variation theorem, but there is still a hierarchy of methods[8-10] and the methods widely used in CSP are on the lower rungs. The hierarchy of what method of calculating the lattice energy is good enough for CSP is very system dependent. To what extent is the quantum mechanical method good enough for the type of material that the final CSP_0 energies can be relied upon as being accurate enough? For inorganic systems, are the possible variants in oxidation states, spin states and even relativistic effects being adequately represented? For some materials, theory shows that there is no hope of a worthwhile CSP study without evaluating the lattice energy by using an electronic structure theory accuracy calculation on all competitive structures (Ψ_crys).
Electronic structure theory methods suffer from the intimate link between the dispersion energy and electron correlation. Hence a high level of theory is required to incorporate dispersion realistically ab initio. Modelling the dispersion (DOI: 10.1039/c8fd00066b) is very important for organics, particularly if you are contrasting crystal structures which have only dispersion between layers with those that have stronger forces, such as hydrogen bonding in all three directions. Structures such as molecular ionic cocrystals (DOI: 10.1039/c8fd00036k) have many competing forces. This can lead to complex systems, e.g. a pharmaceutical salt which exhibits many solid forms, including a series of alcohol solvates, where the alcohol interdigitates between the ionic and molecular layers.[11] At one level this is a simple series of salt solvates, but there are subtle differences that can give rise to disorder in the alcohol hydrocarbon tail layer (causing problems in crystal growth and characterisation) and disorder in Cl⁻ ions only detectable with the most accurate crystallography. In addition, this study, which started from a CSP on the pharmaceutical salt, was further complicated by the late appearance of a highly metastable polymorph.[11] This illustrates the issue of the level of atomic detail and energetic accuracy at which we want to predict crystal structures, in systems where there can be considerable stabilisation by solvents in a disordered form (DOI: 10.1039/c8fd00031j), or multiple similar ions can occupy the same position. The level of atomic detail required is intimately linked with the quality of the lattice energy model needed to accurately balance all the different types of intermolecular interactions present.
We now know that calculating the lattice energy by electronic modelling (Ψ_crys) is very expensive, if it will be applied to a sufficient number of structures. We have different approaches to this challenge. One innovative approach is to use data-driven learning of the potential energy surface. There are applications of this approach to the CSP of medium and large-sized boron clusters (DOI: 10.1039/c8fd00055g) and to phosphorus (DOI: 10.1039/c8fd00034d) starting our conference.
The more traditional approach is to have a hierarchical scheme, increasing in accuracy as it is possible to focus on the more plausible structures. This needs care but CSP_0 is a valuable test for the developers of different lattice energy codes, from the variety of lattice energy models that can be used, to the vital details that improve efficiency and accuracy, such as use of symmetry, optimisation methods, lattice summations and convergence criteria. We should be careful. Tests made on the X23 set of molecular crystal lattice energies, such as the recent one using 4 codes and 18 functionals,[12] are valuable, but this is a data set of very small, virtually rigid molecules. For pharmaceuticals or other molecules with significant flexibility and considerable variation in the intramolecular dispersion and packing density, the reliability of different methods could be different (DOI: 10.1039/c8fd00010g). Balancing the effects of flexibility/conformational change with the other forces is a problem for a wide range of materials, from pharmaceuticals to novel inorganic materials such as ultra-flexible boron oxide frameworks (DOI: 10.1039/c8fd00052b), but the forces differ according to the types of atoms and structures involved.
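A hierarchical scheme of this kind can be sketched in a few lines. The version below is only an illustration under stated assumptions: each stage is a pair of an energy function and an energy window, and the two lookup tables stand in for a cheap model and a more accurate one. The function name, the energies and the 5 kJ/mol windows are hypothetical.

```python
def hierarchical_rank(structures, stages):
    """Re-rank a shrinking candidate set with successively more accurate models.

    stages is a list of (energy_function, window) pairs, cheapest model first.
    After each stage, only structures within `window` of that stage's minimum
    are passed on to the next, more expensive, stage.
    """
    survivors = list(structures)
    energies = {}
    for energy_fn, window in stages:
        energies = {s: energy_fn(s) for s in survivors}
        e_min = min(energies.values())
        survivors = [s for s in survivors if energies[s] - e_min <= window]
    return sorted(survivors, key=lambda s: energies[s])


# Hypothetical energies in kJ/mol; real values would come from actual calculations.
cheap_model = {"A": -100.0, "B": -98.0, "C": -90.0, "D": -99.0}
accurate_model = {"A": -101.0, "B": -103.0, "D": -97.0}

final_ranking = hierarchical_rank(
    ["A", "B", "C", "D"],
    [(cheap_model.__getitem__, 5.0), (accurate_model.__getitem__, 5.0)],
)
```

In this toy data the structure ranked third by the cheap model becomes the global minimum at the higher level, which is why the window at each stage has to be wider than the likely error of the model used at that stage.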
The demands of the search mean that many methods will use atomistic modelling somewhere in the hierarchy, with an appropriate force-field. The accuracies of force-fields for different applications vary according to the crystals, but they are widely used for ionic and molecular systems. The use of molecules or fixed ionic groups makes a lot of sense for restricting the search to the molecule or structural type of interest. For molecular systems, it seems that a molecule-specific electrostatic model is usually essential. For flexible molecules, this requires wavefunction calculations on the molecule covering the possible range of conformations in the crystal (Ψ_mol), which also provides the conformational energy penalty. Atomistic modelling of molecular crystals can then be performed with the addition of a set of empirically fitted repulsion-dispersion intermolecular atom-atom potentials. We are now revisiting the early work of Williams in empirically parameterising such atom-atom potentials for organics, with Day's group recently showing that such parameterizations combined with distributed multipoles can rival popular DFT-D methods in accuracy.[13] A new scheme is being proposed for empirical repulsion-dispersion potentials for use with a specific level of Ψ_mol (DOI: 10.1039/c8fd00064f). Empirically fitted potentials have the advantage of absorbing some of the approximations into the parametrisation, but that can also be a disadvantage in preventing extrapolation to other regions, e.g. predicting a high pressure phase of pyridine required a non-empirical anisotropic atom-atom intermolecular potential.[14] We also risk double counting thermal effects when these potentials are used in Molecular Dynamics (MD) approximations to CSP_thd. On the other hand, empirically fitted potentials, or other CSP schemes that are based on experimental crystal structures, could be more effective than CSP_0 from implicitly absorbing non-thermodynamic effects, i.e. partially moving towards CSP_thd.
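As an illustration of the atom-atom approach, the sketch below sums an isotropic exp-6 (Buckingham-type) repulsion-dispersion term, U(r) = A exp(-B r) - C/r^6, over all intermolecular atom pairs of two rigid fragments. The exp-6 form is assumed here as a common choice for empirically fitted potentials of this kind; the parameter values and coordinates are placeholders invented for the example, and the molecule-specific electrostatic term is deliberately left out.

```python
import math


def exp6(r, a, b, c):
    """Isotropic exp-6 repulsion-dispersion energy for one atom pair."""
    return a * math.exp(-b * r) - c / r**6


def pair_energy(mol1, mol2, params):
    """Sum exp-6 terms over all atom pairs between two rigid molecules.

    Each molecule is a list of (element, (x, y, z)) tuples with coordinates in
    angstrom; params maps an unordered element pair to its (A, B, C) values.
    """
    total = 0.0
    for el1, pos1 in mol1:
        for el2, pos2 in mol2:
            r = math.dist(pos1, pos2)
            a, b, c = params[frozenset((el1, el2))]
            total += exp6(r, a, b, c)
    return total


# Placeholder parameters (kJ/mol, 1/angstrom, kJ/mol angstrom^6); not a fitted set.
PARAMS = {frozenset(("C", "C")): (300000.0, 3.6, 2400.0)}

# Two hypothetical two-atom fragments, 4 angstrom apart.
mol_a = [("C", (0.0, 0.0, 0.0)), ("C", (1.4, 0.0, 0.0))]
mol_b = [("C", (0.0, 4.0, 0.0)), ("C", (1.4, 4.0, 0.0))]
e_ab = pair_energy(mol_a, mol_b, PARAMS)
```

A lattice energy would add the electrostatic model and a converged lattice summation over all neighbouring molecules; the pairwise sum here only shows the structure of the repulsion-dispersion term.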
Any system of CSP_0 is reliant on relative lattice energies, and so relies on cancellation of errors between different crystal structures of the same system. This accounts for the huge successes that have been achieved by CSP_0, when the absolute lattice energies may be very inaccurate and the Ψ_crys or Ψ_mol calculations are so far from even state-of-the-art approximate solutions to the real wavefunction. Hence the need to be aware of the theoretical basis of the different approximations used as we increase the variety of types of interatomic interactions in our crystals.

Search coverage in CSP_0 structure generation
Choices are made in dening the search space to be covered.We can use sophisticated methods to try to ensure that the search is complete within a dened range, but a lack of computer and human time oen prevents this limit of condence being reached.Rening our methods to make them more efficient can potentially extend the scope of CSP studies, but to what extent are these renements universal or material dependent, for example using genetic algorithms for molecular crystals (ref.15, DOI: 10.1039/c8fd00067k) or zeolites (DOI: 10.1039/c8fd00035b)?Recognised patterns in the appropriate class of materials may be incorporated into the structure generation methodology very explicitly for the class of materials, from oxidation states (DOI: 10.1039/c8fd00032h), to properties at given site-symmetries in zeolites (DOI: 10.1039/c8fd00040a), optimised blueprints for MOFs (DOI: 10.1039/c8fd00051d), or probe structure for the inorganic compositional space (DOI: 10.1039/c8fd00045j).Ensuring that this provides an efficient encapsulation of the possible chemistry without restricting the ability to propose exciting new materials, particularly those that bridge traditional classes, is the real challenge in improving the complexity of structures being created rst in the computer by CSP (Fig. 1).We need to ensure that assumptions made for computational convenience do not become outdated.For example, when CSP_0 was rst instigated for organic molecules, the assumption of Z 0 ¼ 1 (one molecule in the asymmetric unit cell) was more reasonable than it is now, as more Z 0 > 1 structures are determined from better crystallographic methods and Z 0 > 1 is common for metastable polymorphs. 
16Disorder plays a different role in different types of materials, but it is notable that one prediction made at the CCDC's 50 th party 17 was that in another 50 years, all organic crystal structures would be disordered.Does using data on known crystal structures (such as the Cambridge Structural Database) 18,19 risk bias from sociological inuences on research and reporting or historic limitations in analytical capabilities?
We need to be careful in dening our search and the expectations of users of the CSP codes.There is a history of claims being made that CSP has shown that all the polymorphs of a molecule are already known, which is what industry would love the codes to be able to do, but the earliest commercial CSP code had problems when claiming this for paracetamol, when many experimental polymorphism experts were aware of the literature evidence of another form.Subsequent CSP work has identied models for form III of paracetamol, 20 and there are now claims of more polymorphs formed under pressure 21 requiring a structure determination.More recently, CSP led to the experimental nding of the room temperature stable form of creatine 22 whereas an earlier CSP had helped determine the rst structure and claimed that there would be no polymorphs. 23urrently, the most effective search strategy and the type of lattice energy evaluation that is good enough is very dependent on the type of material being studied.The shortcuts allowed by choices of composition and stoichiometry also differdifferent elements can substitute into the same inorganic crystal structure, but even the best functional group swap favoured by crystal engineers, methyl for chloro in organic molecules, leads to a change of crystal structure unless there are no close or strong (e.g.hydrogen bonding) interactions involved at the substitution site. 24The importance of polymorphism appears to differ between materials, though this may reect the range of crystallisation/synthesis methods that can be applied, or the motivation for experimental screening for polymorphs.However, the ease of computation means that CSP_0 is applied to the widest range of materials.Whilst generally focussing on the global minimum in the lattice energy, we need to look with care at the energy gap to alternative structure(s) to determine whether it is within the likely accuracy of the method, i.e. 
a reasonable approximation to CSP_thd.For organic crystals, this is sometimes the case, but this monomorphism does require a uniquely good packing dening the structure in all three dimensions, which will rarely be true of all members of a class of materials that adopts low symmetry crystal structures.

CSP_thd; defining the crystal energy landscape
CSP_thd is calculating the thermodynamics accurately, so that you are genuinely predicting the most thermodynamically stable crystal structure at a given set of thermodynamic conditions. If thermodynamics were the only determinant of crystal structures, then the global minimum in CSP_thd at ambient would be the only structure observed at ambient conditions. Since this is often not the case, with strictly metastable polymorphs being sufficiently long-lived to appear to be stable, we interpret the crystal energy landscape, the set of all thermodynamically plausible structures, and expect that this would show all the polymorphs that would be stable under those thermodynamic conditions.
How this relates to the assumption that CSP_0 should be adequate, as a starting point, can be seen by considering the thermodynamics of chiral separation by crystallisation, i.e. the ability to predict when you would get a mixture of enantiopure crystals from a racemic solution. Using the sublimation thermodynamic cycle, where you separate your molecules by subliming the crystal and then solvate them to form the solution, we can predict the solubility ratio of the racemic crystal (RS) and the enantiopure crystal (S): Since the solutions are composed of the same molecule (apart from 50% of the RS solution being the mirror image molecule) the energy of solvation should be the same, assuming the solutions are ideal or non-ideal to the same extent, so the solubility difference is determined by the crystal thermodynamics. Assuming the thermal entropy terms are the same, the free energy of sublimation differences reduce to the enthalpy of sublimation differences, and that, ignoring heat capacity and zero-point energy differences, leads to the lattice energy determining the solubility ratio. Thus CSP_0 covering both chiral and enantiopure space groups provides the most stable structure for each to determine whether chiral separation is possible. If this ratio is turned into the eutectic excess, then the thermodynamic prediction of the crystal that is formed is shown to be very sensitive to the approximated energy difference, as shown in Fig. 2.
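The chain of approximations in this argument can be written out explicitly. The following is a sketch under the assumptions stated in the text, not the solubility-ratio or eutectic expression behind Fig. 2. The symbols are introduced here for illustration: E_latt is the lattice energy (negative for a bound crystal), all quantities are per mole of molecules, and the 2RT term is the usual rigid-molecule, ideal-gas estimate.

```latex
\begin{align}
\Delta G_{\mathrm{sub}} &= \Delta H_{\mathrm{sub}} - T\,\Delta S_{\mathrm{sub}}, \\
\Delta G_{\mathrm{sub}}(\mathrm{RS}) - \Delta G_{\mathrm{sub}}(\mathrm{S})
  &\approx \Delta H_{\mathrm{sub}}(\mathrm{RS}) - \Delta H_{\mathrm{sub}}(\mathrm{S})
  && \text{(equal thermal entropy terms)}, \\
\Delta H_{\mathrm{sub}} &\approx -E_{\mathrm{latt}} - 2RT
  && \text{(no heat capacity or zero-point differences)}, \\
\Delta G_{\mathrm{sub}}(\mathrm{RS}) - \Delta G_{\mathrm{sub}}(\mathrm{S})
  &\approx -\left[ E_{\mathrm{latt}}(\mathrm{RS}) - E_{\mathrm{latt}}(\mathrm{S}) \right].
\end{align}
```

The 2RT terms cancel in the difference, so under these assumptions only the lattice energy difference survives, which is what CSP_0 provides.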
The steepness of the curve in Fig. 2 shows that if the lattice energy difference is large, you can confidently predict which crystal will form by CSP_0, but if it is small, the outcome is horribly dependent on the energy difference and hence the cancellation of errors in all the other energy terms. Our recent comparison of measured heat capacities at low and ambient temperatures for three pairs of enantiopure and racemic crystals of diverse organic molecules, and other measured thermodynamic quantities,[25] shows that the approximations listed above are not good enough for any of these molecules. The contributions differ in their sensitivity to the molecular and structural differences, particularly the extent to which the molecular vibrational frequencies are unchanged on crystallisation. Differences in the hydrogen bonding motif, and consequent frequency shifts in the IR spectra between the two crystals, can lead to a temperature dependence of the heat capacity difference around ambient. The performance of our attempts to predict thermal corrections using the harmonic approximation,[25] using rigid molecule Ψ_mol or Ψ_crys methods similar to those used for larger systems (DOI: 10.1039/c8fd00010g), shows the challenge of predicting free energies or relative solubility at a useful accuracy.
Fig. 2 The proportion of enantiopure and racemic crystal structures as a function of their energy difference ΔG = ΔG(RS) − ΔG(S), plotted for three temperatures. The asymmetry comes from the definition of one mole of the racemic crystal corresponding to 1/2 a mole of each enantiomer, as used often in experimental work[25] but not in CSP where the reference state is a mole independent of chirality.[26] The spread of energies that changes the crystallisation outcome is not dependent on the reference state.
These arguments also apply to polymorphs and can be extended. What are the free energy difference implications of thermal expansion properties (DOI: 10.1039/c8fd00048d), which can differ so much in their anisotropy between polymorphs or systems? Correcting experimental quantities back to give an "experimental" lattice energy, as done in benchmarking set X23, will be limited by the accuracy of the calculated[27] or experimental[28] thermal corrections (which are nevertheless a big recent improvement on the 2RT correction that initially justified the use of CSP_0 for chiral resolution). The cancellation of the small thermodynamic terms between different structures of the same compound will be very dependent on the structural differences. Perhaps a more significant implication of Fig. 2 is a limit on the energy differences between polymorphs obtained by solution crystallisation, which is comparable to the estimated range of polymorphic lattice energy differences of observed polymorphs,[29] and the cutoffs often used for progressing CSP_0 to more demanding calculations. This can mean that polymorphs obtained by other methods, for example by desolvating solvates[11] or the solid state synthesis of the molecule, could be much more metastable than polymorphs obtained by solution crystallisation. The concept of a crystal energy landscape requires a cut-off of thermodynamic plausibility. If this is dependent on the methods of crystallisation that can be applied, then it could be a context-determined parameter.
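To give a feel for the energy scale involved, the sketch below converts a free energy difference between a metastable and a stable polymorph into the ratio of their solubilities, x_meta/x_stable = exp(ΔG/RT). It assumes ideal (or equally non-ideal) solutions, as in the argument above; the function name and the energy differences printed are illustrative choices, not measured values.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def solubility_ratio(delta_g_kj_mol, temperature_k=298.15):
    """Ideal-solution solubility ratio of a metastable to a stable polymorph.

    delta_g_kj_mol is G(metastable) - G(stable) in kJ/mol, so positive values
    give a ratio above one (the metastable form is the more soluble).
    """
    return math.exp(delta_g_kj_mol * 1000.0 / (R * temperature_k))


if __name__ == "__main__":
    for dg in (0.5, 2.0, 5.0):  # illustrative energy differences in kJ/mol
        print(f"dG = {dg:3.1f} kJ/mol -> solubility ratio {solubility_ratio(dg):.2f}")
```

Near room temperature a difference of 2 kJ/mol already corresponds to a solubility ratio of a little over two, so errors of this size in the relative energies matter for any prediction of relative solubility.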
There are other thermodynamic contributions that should be included, given that real crystals are not perfect and infinite, such as disorder. Configurational disorder is an energy term which has been estimated to account for why the low temperature phase of caffeine is statically disordered.[30] Going from CSP generating potential disorder components to realistic modelling of the thermodynamics of disorder, let alone estimating disorder that is not thermodynamic but frozen in during crystallisation, is important for pharmaceuticals (DOI: 10.1039/c8fd00072g), and yet order-disorder phase transitions are challenging to experimentally characterise or model (DOI: 10.1039/c8fd00042e). Some forms of disorder, such as polytypism, imply that coverage of the search space will never be complete.
The size of a crystallite and the presence of solvent have recently been shown to be important in thermodynamic stability by the demonstration of a reversible crossover in relative stability of the polymorphs of a cocrystal and an aromatic disulphide compound in liquid-assisted grinding experiments.[31] These were rationalised by considering solvent-dependent surface energies relative to the bulk lattice energy. The effect that size can have on the relative stability of nuclei of different forms comes out clearly from consideration of the nano-cluster modelling of inorganic oxides (DOI: 10.1039/c8fd00060c).
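A size-dependent crossover of this kind can be illustrated with a deliberately crude model, which is an assumption of this sketch rather than the treatment used in the cited work: a cubic crystallite of edge L with a single effective surface energy γ, so that the molar free energy is g(L) = g_bulk + 6γV_m/L, where V_m is the molar volume. All numerical values are hypothetical.

```python
def molar_free_energy(g_bulk, gamma, molar_volume, edge):
    """Molar free energy (J/mol) of a cubic crystallite of the given edge (m).

    g_bulk is in J/mol, gamma in J/m^2 and molar_volume in m^3/mol. A cube of
    edge L has area 6 L^2 and contains L^3 / V_m moles, giving 6 gamma V_m / L.
    """
    return g_bulk + 6.0 * gamma * molar_volume / edge


def crossover_edge(g_bulk_a, gamma_a, vm_a, g_bulk_b, gamma_b, vm_b):
    """Edge length at which forms A and B are equally stable, or None if none exists."""
    d_bulk = g_bulk_b - g_bulk_a
    d_surf = 6.0 * (gamma_a * vm_a - gamma_b * vm_b)
    if d_bulk == 0 or d_surf / d_bulk <= 0:
        return None
    return d_surf / d_bulk


# Hypothetical forms: A is more stable in the bulk, B has the lower surface energy.
l_star = crossover_edge(0.0, 0.10, 1.5e-4, 1000.0, 0.08, 1.5e-4)
```

With these invented numbers the crossover falls at a crystallite edge of the order of tens of nanometres; because the surface energy depends on the solvent, the crossover size in such a model would shift with the solvent as well.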
The development of accurate calculations of the relative thermodynamic stability for CSP_thd will be a great step forward for the theoretical modelling of important properties, such as the solubility, morphology and stability range for pharmaceuticals. Doing this accurately for all properties related to thermodynamics for all known phases, let alone the entire crystal energy landscape, provides a significant aim for theory and computer modelling. However, a major benefit of the move from CSP_0 to CSP_thd is to eliminate structures that are artefacts of the use of a static lattice in CSP_0, i.e. are not minima in the free energy crystal landscape. The number of lattice energy minima that are eliminated by some form of molecular dynamics, from just a short shake-up[32] to a metadynamics treatment,[33-35] is very dependent on the molecule and structures concerned. A realistic CSP_thd should produce the more symmetrical structures that arise from dynamic disorder averaging over a variety of lattice energy minima (such polymorphs do not correspond to a lattice energy minimum and so are not produced by CSP_0). Experimental identification of such plastic crystals may be aided by simulations, allowing insight into the energetic and mechanistic forces that drive order-disorder phase transitions in plastic crystals (DOI: 10.1039/c8fd00042e). There may be experimentally observed intermediate phases that we may feel are beyond the scope of CSP, for example, CSP gave the low temperature ordered phase III of cyclopentane, and MD the hexagonal plastic phase I, but the size of the MD cell probably contributed to the inability of the dynamical simulations to get good agreement with the experimental powder pattern of the intermediate plastic phase II.[36]
The development of sophisticated MD based methods to explore the free energy surface and study phase transitions, such as meta-shooting (DOI: 10.1039/c8fd00053k), will improve our understanding of the effects of the potential energy surface of CSP_thd.

CSP_aim
What do we consider that a genuine crystal structure prediction code (CSP_aim) should do?
Most people would want a CSP code to predict all practically important crystal structures that could be found for a defined system, and those involved in material synthesis or solid form screening would also want a recipe for how to find them. Full CSP_thd would be adequate for systems where you expect to form the most stable structure at a given set of thermodynamic conditions, so experimentally you should just move to the relevant thermodynamic conditions.
CSP is undoubtedly leading to an increase in the number of polymorphs whose structures are now known. The use of CSP has helped to solve the structures of extremely fibrous polymorphs of coumarin grown from the melt [80], new needle forms of resorcinol [37] (another example of a new form being found despite published predictions that there were no more forms), and another form of aspirin [38] from the melt. The system ROY, with its various colours, is one where it has long been known that there are further uncharacterised forms [39] that exist at ambient, but CSP produces too many candidates [40]. The use of CSP to increase the number of polymorphs that have their structures determined, and the challenges involved, will come up in discussion of the 8th solved structure (DOI: 10.1039/c8fd00039e). Can CSP also, by proposing new experiments or encouraging more detailed analysis, increase the number of polymorphs that are detected (DOI: 10.1039/c8fd00069g), as well as aiding their characterisation?
Disappearing polymorphs, where there is an inability to maintain control of the crystallisation of an apparently stable form after a more thermodynamically stable form appears, have been a major justification for developing CSP because of their practical importance to the pharmaceutical industry [41]. CSP_0 must generate any structure that is significantly more stable than the known forms, as well as the known polymorphs, with CSP_thd helping confirm the relative thermodynamic stability at ambient. However, the scientific rationalisation of disappearing polymorphs would conclude that you ought to be able to reproduce any polymorph that has ever been observed, using identical crystallisation conditions, even if this requires a fresh laboratory and student [41] to avoid seeding. Having identical crystallisation conditions may be practically unachievable, as a 50 year old sample of the disappearing form II of progesterone contained a cocktail of impurities not found in the modern material [42]. How does the issue of elusive or disappearing polymorphs, and the effects of impurities on the polymorph formed, impact the computational prediction of metastable polymorphs? This is illustrated by the new polymorph (γ) of succinic acid that was found in an attempt to purify a synthesiser-made peptide from the resultant soup of impurities, by cocrystallisation with succinic acid (Fig. 3) [43]. We were challenged to predict the structure. It was readily generated as a metastable polymorph by CSP. Periodic DFT-D lattice energy calculations had γ slightly more stable than the other known polymorphs for most dispersion corrections, though using the PBE + TS model, harmonic phonon estimates of the free energy differences made the β form more stable at room temperature. This closeness in energy is consistent with the γ form being found concomitantly with β in the failed cocrystallisation experiment, and illustrates the improvement of CSP_thd on CSP_0. This case is relevant to defining CSP_aim because the γ form has not been crystallised again, despite a large series of experiments testing the different impurities [43].
The serendipitously found γ crystal is a conformational polymorph of succinic acid, with the conformation that NMR and MD simulations in water show as being dominant in solution [43]. MD on the γ form does not give a solid state transformation, but reproduces the structure with a small change in the monoclinic angle, which, if forced by metadynamics, leads to extensive defect formation. That is the best explanation we can find for the problem of reproducing the new polymorph: the γ form can be rather susceptible to defect formation that may lead to a ready transformation to the β form. Perhaps γ succinic acid will be found again, and it is in the CSD as a challenge to anyone who can design a method of finding it. The high-temperature α phase of succinic acid was once considered elusive at normal temperatures, but has been found as a contaminant after grinding [44,45], in liquid- (but not air-) segmented flow crystallisation [46] and by spray-drying from water [47]. This suggests that the α polymorph is observed when it forms first and there has not been sufficient time for the solvent mediated transformation to β.
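The way a harmonic phonon estimate can reverse a lattice energy ranking, as described above for succinic acid, can be sketched in a few lines. This is a minimal illustration only: it uses the standard harmonic oscillator free energy summed over a handful of modes, whereas a real calculation samples phonons across the Brillouin zone. The form names, lattice energies and wavenumbers are hypothetical and are not data for succinic acid.

```python
import math

# Physical constants (SI, with the speed of light in cm/s to match cm^-1 input).
H = 6.62607015e-34    # Planck constant, J s
C_CM = 2.99792458e10  # speed of light, cm/s
KB = 1.380649e-23     # Boltzmann constant, J/K
NA = 6.02214076e23    # Avogadro constant, 1/mol


def harmonic_free_energy(wavenumbers_cm, temperature):
    """Harmonic vibrational free energy (kJ/mol) for positive modes in cm^-1.

    Each mode contributes its zero-point energy plus a thermal term:
    h*nu/2 + kB*T*ln(1 - exp(-h*nu/(kB*T))).
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    total = 0.0
    for nu in wavenumbers_cm:
        quantum = H * C_CM * nu
        x = quantum / (KB * temperature)
        total += 0.5 * quantum + KB * temperature * math.log1p(-math.exp(-x))
    return total * NA / 1000.0


def rank_by_free_energy(forms, temperature):
    """Return form names ordered from most to least stable at this temperature."""
    totals = {
        name: data["lattice"] + harmonic_free_energy(data["modes"], temperature)
        for name, data in forms.items()
    }
    return sorted(totals, key=totals.get)


# Hypothetical forms: lattice energy in kJ/mol, a few lattice modes in cm^-1.
FORMS = {
    "stiff_form": {"lattice": -130.0, "modes": [60.0, 90.0, 120.0]},
    "soft_form": {"lattice": -129.0, "modes": [40.0, 60.0, 80.0]},
}

if __name__ == "__main__":
    for temperature in (1.0, 300.0):
        print(temperature, rank_by_free_energy(FORMS, temperature))
```

With these illustrative numbers, the form with the lower lattice energy remains the more stable at very low temperature, but the form with softer modes gains more vibrational free energy on heating and is ranked first at 300 K. The sketch ignores thermal expansion and anharmonicity, both of which can matter when the energy differences are this small.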
Gradually we are beginning to see the effects of kinetics and the role of impurities in catalysing or inhibiting polymorphic transformations. The serendipitous observation of γ succinic acid raises the question of what is wanted from CSP_aim: a recipe for producing the γ form and predicting its stability as a function of crystal size? If it had not been observed, would its apparent feasibility in the computational modelling have appeared to be an overprediction of polymorphs? An early CSP study had noted that the planar conformation of succinic acid was less stable than that in the γ form, but had used the dominance of the planar conformation in the many crystal structures of succinic acid, and a very limited search in the alternative conformation, to dismiss the possibility of any conformational polymorphs [48].
Thus, if we are to achieve CSP_aim, we need to work with a knowledge of the variables in crystallisation/synthesis conditions that can lead to the formation of different structures. This will be very much a function of the types of materials and the range of interatomic interactions involved, i.e. going up the vertical axis of Fig. 1, and will essentially encapsulate the whole field of crystal engineering into our CSP theories and eventually codes. The possible synthetic methods in every field are evolving rapidly, and CSP plays the crucial role of predicting structures that should be achievable, and there may be structure dependent methods that can help find them. For example, CSP has inspired an isomorphous heterogeneous solution seeding experiment to produce a CSP predicted cocrystal which could not be made in four expert labs without the seeds initiating the first cocrystallisation [49]. Metastable pharmaceutical polymorphs have been successfully targeted by sublimation onto an isomorphous template crystal [50]. Will we ever have the control to be able to make a crystal by placing each atom in the position corresponding to the CSP-generated desired crystal structure? This is more possible for inorganic materials than pharmaceuticals. The pharmaceutical industrial scientists recognise that there is no such thing as a standard solid form screen; not only can the basic workflow vary between companies [11,29], but also between molecules from the same drug discovery program, e.g. if one is able to obtain an amorphous starting material for one and not for the other [51], then there may be a difference in the extent to which the starting point of the crystallisation has lost the memory of the input material structure. This sensitivity of crystal structure outcome to exact material synthesis conditions probably extends to all types of materials, which is why CSP is most effective when there is very close collaboration between the people doing the calculations and those in the laboratory. In the aim for a code that can predict all the crystal structures that can be found, but not those that could never be made, we will have to consider relative nucleation and growth rates. Nucleation has been the focus of another recent Faraday Discussion [52].
There is an increasing realisation of the variety of mechanisms for crystallisation by particle attachment, and how this can vary in synthetic, biogenic and geological environments [53]. The aspect of whether crystallisation occurs by monomer attachment, let alone according to the models of classical nucleation theory, is up for debate and observation. Atomic force microscopy (AFM) showed dense hillocks forming at ledges of the dominant surface of olanzapine in water [54]. This two-step nucleation process, with a dense disordered solution forming, can be rationalised as the suspected growth unit (a dimer) has a variety of attachment sites on the ledge that are more stable than docking in the crystallographic site on the surface. The growth unit will not often dock directly into the most stable crystallographic ledge site when it can get caught in so many ledge, or solute-solute, and solute-water complexes. The AFM also showed that the dense blobs on the surface could grow into orientated crystals of the thermodynamically stable olanzapine dihydrate D, whereas dissolution of olanzapine in water usually resulted in the metastable dihydrate B. We can computationally rationalise the better registry (in the correct orientation) of a small nanocluster of the stable dihydrate D on the specific olanzapine anhydrate surface than the metastable dihydrate B, representing the support the surface gives to nucleating one form over another [54].
As we consider interfaces and growth rates, the problem of distinguishing thermodynamics from kinetics gets worse. Morphology predictions based on minimising the surface energy give unrealistically spherical structures (consistent with classical nucleation theory), which led to the attachment energy model, which is strictly only appropriate for vapour grown crystals if all surfaces are below their roughening temperature. The attachment energy model is the cheap model for morphology-structure-energy landscapes, with relative growth volume rates being a possible means of eliminating unlikely polymorphs [55]. Predicting the morphology in different solvents uses models based on kinetic rates of attachment, dissolution, loss of solvent shell etc. derived from MD [56]. If we need to be reliant on MD to predict nucleation, growth and transformation rates, we are starting to enter the realm of multi-scale approximations to solving the appropriate nuclear and electronic time-dependent Schrödinger equation for the real system. We have lost the simplification of distinguishing between thermodynamics and kinetics.
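As a rough illustration of why the attachment energy model mentioned above is cheap, the sketch below applies its central assumption: the growth rate normal to a face is taken to be proportional to the magnitude of that face's attachment energy. The face labels and energies are hypothetical, and a real morphology prediction would also need the cell geometry to construct the polyhedron and decide which faces actually appear.

```python
def relative_face_distances(attachment_energies):
    """Relative centre-to-face distances, scaled so the slowest face is 1.0.

    Assumes the attachment energy model: growth rate normal to a face is
    proportional to |E_att|, so faster-growing faces lie further from the
    crystal centre and become less prominent in the habit.
    """
    rates = {face: abs(e_att) for face, e_att in attachment_energies.items()}
    slowest = min(rates.values())
    if slowest == 0.0:
        raise ValueError("attachment energies must be non-zero")
    return {face: rate / slowest for face, rate in rates.items()}


def aspect_ratio(attachment_energies):
    """Ratio of fastest to slowest growth direction, a crude shape indicator."""
    distances = relative_face_distances(attachment_energies)
    return max(distances.values()) / min(distances.values())


# Hypothetical attachment energies in kJ/mol for three faces of one structure.
EXAMPLE_FACES = {"(100)": -40.0, "(010)": -60.0, "(001)": -120.0}

if __name__ == "__main__":
    print(relative_face_distances(EXAMPLE_FACES))
    print(aspect_ratio(EXAMPLE_FACES))
```

A large aspect ratio in such a model points towards needle-like or plate-like growth. Solvent effects and surface roughening are entirely absent, which is why the text restricts the model's strict validity to vapour grown crystals.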
The question of the extent to which CSP_aim will need to consider kinetic effects arising from differences in the experimental growth conditions is very problem dependent. The organic electronic community are obviously interested in substrate-induced polymorphs [57]. Are these cases where full thermodynamic treatment including the interfacial effects would predict these structures? Concomitant polymorphs raise further questions [37,58,59] as to whether we could ever devise a computational method that would allow the confident prediction of the conditions needed to crystallise phase pure samples. Extensive crystallisation work could not obtain phase pure samples of the metastable forms II and III of olanzapine. After the structure of form II was solved from a serendipitous single crystal and CSP suggested a structure for form III, the degree of similarity (differing only in the stacking of layers) made it rather improbable that the crystallisation could ever be controlled to crystallise the forms separately [60].
The question as to how changing crystallisation or synthesis conditions can affect which structures are formed, and how readily they transform to the most thermodynamically stable structure at the specified temperature and pressure, brings us back to the original assumption behind CSP. Does the thermodynamically most stable form have to be obtainable? It is the central tenet of CSP methods, and yet when so many molecules are difficult to crystallise at all, the first structure formed is quite unlikely to be the most stable by Ostwald's rule [61]. (Indeed, if CSP could guarantee that a pharmaceutical would never crystallise, so that the amorphous form could be confidently developed, that would be a valuable application.) It is quite conceivable that the most stable form could be so kinetically hindered that it would not form. The monohydrate of 4-aminoquinaldine [62] is an important example, as CSP predicted that a more thermodynamically stable structure should exist. It was later found by careful experimentation using either hydrothermal conditions or an impurity, but the metastable polymorph is kinetically favoured in both nucleation and growth.
There are examples where analysis of the CSP_0 most stable structure, in contrast with the observed forms, can show that its formation seems rather unlikely on grounds of either crystal engineering or statistical likelihood [63]. It is noteworthy that despite extensive polymorph screening, a significant number of drugs are being marketed on the false assumption (as shown by CSP_0) that they are in the most stable thermodynamic form (DOI: 10.1039/c8fd00069g). Predicting polymorphs that could never be found also has major industrial implications (DOI: 10.1039/c8fd00033f). If CSP_0 is now an applied technology for the pharmaceutical industry (DOI: 10.1039/c8fd00033f), we must have a clear concept of the limitations of the technology, which involves determining how it differs from CSP_aim for this class of material.

Practicality: CSP_aim and energy-structure-function maps
The eld of CSP as an aid to the discovery of new materials is becoming a reality across many types of materials.For example, in the eld of the 18-valence-electron ABX family, consideration of the unreported family members led to the rst synthesis of 15 compounds, including HfIrAs, a topological semi-metal of interest in quantum electronics, ZrNiPb, a small-gap semiconductor with a large Seebeck coefficient suitable for thermoelectric applications, and ZrIrSb, a rare example of a transparent p-type conductor with a high conductivity of holes. 64This approach has also led to the design and discovery of the transparent conductor TaIrGe. 65he use of computation for the accelerated discovery of new crystal structures (rather than just changing elements within known structures) in the complex inorganic Y-Sr-Ca-Ga-O phase eld has led to new structures with ordered variants Sr 2 Ca 3 Ga 6 O 14 and SrCa 2 Ga 2 O 6 . 66CSP on varying stoichiometries has also impacted energy materials research, with Li 7 Ge 3 being predicted 67 and then found in research on germanium anodes in lithium batteries. 68he progress in the CSP-led design and realisation of the lowest density porous organic cage crystal, with its gas storage and selectivity properties 69 is a particular landmark, as a case when an organic molecule was synthesised following CSP prediction.However, it is noteworthy that the synthesis was targeting a highly metastable structure on the energy-structure-function map.The most thermodynamically stable form (according to the CSP_0) was not reported.This aspect of solvents stabilising porous structures (DOI: 10.1039/c8fd00031j) further illustrates how crystallisation conditions affect the observed structure.Zeolites, another important porous material (DOI: 10.1039/c8fd00035b, DOI: 10.1039/ c8fd00040a), can also be more specically templated, 70 though the organic molecules that direct the structure are burnt off aerwards.
Organic functional materials were already expanding in scope at the start of CSP, when the emphasis was on the most stable possible structure, with density being the key property for energetic materials, or non-centrosymmetric packing for non-linear optically active materials. Nowadays, energy-structure-function maps can be made for advanced properties, both in terms of technological applications and the challenge of calculating the property, such as charge carrier mobility in organic semiconductors [71]. The differences between an organic crystal or film and the traditional silicon semiconductors illustrate how spanning different types of materials can combine other properties. If a material is being produced industrially, there are a myriad of properties that are relevant to its crystallisation, storage, use, and the quality control procedures that need to be in place. Our CSP studies are just the start of the multi-scale modelling that is involved in the digital design agenda, requiring information on the behaviour of the material on a range of length and time scales, as the pharmaceutical materials science tetrahedron [72] relates the structure, properties, performance and processing of a drug. Hence the role of CSP in polymorph screening, including the design of experiments to find new forms, but the number of published examples contrasting the number of polymorphs found by an industrial screening process with those found by CSP is limited [73]. This iterative work, where CSP may inspire experiments that find more forms, versus the problem of over-prediction, really illustrates the need to define what is required from CSP. Pressure is becoming increasingly important in the discovery of CSP "predicted" forms for pharmaceuticals [14,74,75], whereas CSP at pressure has long been established as a route to finding new phases of interest to planetary scientists, including changes in bonding, such as ionic ammonia [76,77].
Hence, CSP is playing a huge role in our ability to make new crystalline solids for many purposes. The combination of CSP and calculating (non-thermodynamic) properties, whether for targeting new functional materials or for structural characterisation, such as various diffraction [78] or solid state NMR experiments [79], is really contributing to science. The question though is how much this relies on human scientific understanding interpreting the results of the CSP_0 (or CSP_thd) for experimental collaborators, or whether we can have a black-box computer code.

Conclusion
CSP_0 searches for the lowest lattice energy structures are becoming an established technique for a wide range of materials research, with an increasing diversity of types of interatomic forces within the materials. Hence the area of coverage of different materials in this discussion of the CSP_0 column in Fig. 1 is very wide. However, the most stable structure in CSP_0 is not always observed, and certainly is often not the only structure that can be found. This sometimes reflects the limitations of the thermodynamic modelling accuracy (i.e. really needing CSP_thd) but also the basic assumption that thermodynamics determines crystallisation. If the energy gaps are sufficiently large, then CSP_0 is good for determining whether a material is sufficiently likely to crystallise with a desired property that it is worth some experimental work.
However, when CSP_0 produces many structures that are close in energy, then eliminating those that are artefacts of approximating the free energy is more challenging. This is not only a challenge in obtaining the free energy of the perfect infinite crystal at the practically relevant temperatures and pressures. There is also the challenge of correctly modelling the thermodynamics related to size, solvent, and the presence of specific surfaces or other molecules in solution. How realistic are thoughts of a CSP_thd code, spanning a wide diversity of interatomic interactions?
A CSP_aim code would reliably output the crystal structures that could be formed, and no more. It seems likely that calculating the true relative thermodynamic stability will sometimes incorporate information relevant to the experiments to find them, but much more understanding is needed of the mechanisms for nucleation catalysts and growth inhibitors.
CSP is maturing in the sense that an increasing number of experimental groups, including in the pharmaceutical industry, are taking it seriously enough to invest in CSP. It has been a long haul to get to this state. Over-prediction could lose this impetus. We need to be realistic about how much the codes can deliver, and the extent to which it is the experience of the scientist in interpreting the
This journal is © The Royal Society of Chemistry 2018. Faraday Discuss., 2018, 211, 9-30. Open Access Article. Published on 27 July 2018. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.

Fig. 3 The three polymorphs of succinic acid, showing the prediction of the novel γ form by CSP_0 using a Ψ_mol approach, and the relative lattice energies by a variety of Ψ_crys methods. Despite its relative thermodynamic stability, the γ polymorph has only been crystallised once, in a failed cocrystallisation experiment [43].