Frontiers of molecular crystal structure prediction for pharmaceuticals and functional organic materials

The reliability of organic molecular crystal structure prediction has improved tremendously in recent years. Crystal structure predictions for small, mostly rigid molecules are quickly becoming routine. Structure predictions for larger, highly flexible molecules are more challenging, but their crystal structures can also now be predicted with increasing rates of success. These advances are ushering in a new era where crystal structure prediction drives the experimental discovery of new solid forms. After briefly discussing the computational methods that enable successful crystal structure prediction, this perspective presents case studies from the literature that demonstrate how state-of-the-art crystal structure prediction can transform how scientists approach problems involving the organic solid state. Applications to pharmaceuticals, porous organic materials, photomechanical crystals, organic semi-conductors, and nuclear magnetic resonance crystallography are included. Finally, efforts to improve our understanding of which predicted crystal structures can actually be produced experimentally and other outstanding challenges are discussed.


Introduction
Molecular organic crystals occur across many areas of chemistry.The majority of small-molecule pharmaceuticals are administered in crystalline form. 1 Molecular crystals are key components of fertilizers, 2,3 pesticides, 4,5 and pigments. 6They can function as eld effect transistors, light-emitting diodes, and photovoltaic cells. 7Porous organic crystals can perform gas storage and separations. 8][12] The properties and functions of these crystals, including color, stability, solubility, carrier mobility, etc., oen depend strongly on the crystal packing.Notably, about half of all organic molecules are thought to exhibit polymorphism, 13 or the ability to adopt multiple distinct crystal packing motifs, and this creates both challenges and opportunities when working with organic materials.While the crystallization of the "wrong" polymorph can hinder the bioavailability of a pharmaceutical and force its recall, for example, the possibility to tailor crystal packing to achieve desired physical properties is alluring.Unfortunately, experimental polymorph control can be difficult, and even seemingly minor changes in the crystallization conditions or to the molecular structure can alter the crystal structure signicantly.The choice of solvent system, heat, pressure, or time can similarly transform a system from one polymorph to another.
For these reasons, developing new organic materials oen requires an understanding of the landscape of crystal structures that can occur for the species of interest.Given the difficulties in ensuring that all important crystal forms have been discovered experimentally, researchers have long sought the complementary ability to predict crystal polymorphs theoretically.Seventy years ago, science ction author Robert Heinlein dreamed of a future when "mathematical chemists will design new materials, predict their properties, and tell engineers how to make them-without entering the laboratory." 14Progress toward this goal remained slow for decades, and in 1998 Maddox famously referred to the difficulty in predicting crystal structures as "one of the continuing scandals in the physical sciences." 15Since then, however, crystal structure prediction (CSP) has transformed from scandal to reality, and Heinlein's vision is nally now being realized for organic crystals.
7][18][19][20][21] The results of the most recent 7th Blind Test will be published in the near future.The scope of successful predictions has progressed from small, rigid molecules to larger pharmaceutical-sized molecules with conformational exibility and/or disorder, and from single-component crystals to multi-component hydrates, solvates, co-crystals, and salts.Even the denition of what constitutes a "successful" crystal structure prediction has evolved to become more stringent over time.In the rst Blind Test, for example, simply nding the experimental crystal structure during the search procedure was considered a partial success, even if the energy model ranked it poorly.Today, a successful CSP is expected to predict both the structures and the relative stabilities accurately, and sometimes also how those stabilities vary with temperature and pressure.
3][24][25][26][27][28][29][30][31] CSP has expanded from a purely academic endeavor to one with multiple private companies developing soware, creating new algorithms, and providing contract CSP services.Some larger pharmaceutical companies have their own internal CSP teams as well.Beyond pharmaceuticals, CSP is being used to understand or discover new functional organic materials.][34] This perspective article seeks to highlight what organic CSP can accomplish today, how it can transform the discovery and understanding for a broad range of problems in organic materials, and where major outstanding challenges remain.Section 2 discusses reasons why CSP is such a difficult problem, while Section 3 provides a high-level overview of the methods currently used to overcome those challenges.Section 4 presents a variety of recent case studies that highlight the diverse frontiers of CSP, including examples from pharmaceutical formulation, its incorporation into nuclear magnetic resonance (NMR) crystallography, the discovery of new, highly porous organic crystals, the study of photochemical transformations in the solid state, and efforts towards the rational design of new materials.Finally, Section 5 discusses several directions in the eld that will likely prove important in the next few years.[36][37][38][39][40][41][42][43][44] 2 The crystal structure prediction challenge The difficulty of crystal structure prediction stems from several factors (Fig. 1): rst, the search space of potential structures is massive, including 230 possible space groups, one or more molecules in the asymmetric unit, and, for many species, a competition between intramolecular conformational and intermolecular packing forces.While some of these complexities can reasonably be managed by, for example, constraining the search to the most common space groups and/or to crystals with just one molecule in the asymmetric unit, plenty of experimental crystals lie outside these constraints.Moreover, the conformational degrees of freedom in many modern active pharmaceutical ingredients and other highly exible molecules are harder to circumvent, and they can dramatically increase the search space and the resulting computational costs of the structure prediction.
Second, the energy differences separating crystal polymorphs are small.Nearly all experimentally-known crystal polymorphs lie within 10 kJ mol −1 of one another, 13,45 and the energy differences are oen just ∼1-2 kJ mol −1 .Those small energy differences manifest from competitions among the hydrogen bonding, electrostatics, induction/polarization, and van der Waals dispersion interactions within and between the molecules.Achieving kJ mol −1 resolution in modeling these diverse interactions can be difficult for both force elds and electronic structure methods, especially for conformational polymorphs 46 whose crystal structures result from the interplay between changes in intramolecular conformation and the intermolecular crystal packing.
Third, while CSP oen focuses on predicting 0 K crystal lattice energies, 40 real-world crystal structures are determined by free energies at nite temperatures and pressures.In smaller molecules with limited exibility, the differences between relative lattice energies and relative room-temperature free energies are usually small (<2 kJ mol −1 ). 45,47However, the magnitude of the relative entropic/free energy contributions can Fig. 1 Organic molecular crystal structure is difficult due to the large search space of potential structures (blue dots) on the 0 K crystal energy landscape which are separated by small lattice energy differences.Moreover, the relative free energies between polymorphs vary as a function of temperature and pressure, and not all thermodynamically feasible crystal structures can be readily crystallized experimentally.
increase signicantly in large, exible drug-like molecules 30,48 and disordered crystals. 49,50Moreover, factors such as thermal expansion and dynamics can alter the nite-temperature crystal structures themselves.The magnitude of these effects is frequently modest, but not always.
Finally, the vast majority of CSP research has focused on the thermodynamic stability of the crystal, but polymorph crystallization is highly inuenced by kinetics.CSP routinely predicts far more thermodynamically viable candidate structures than are ever observed experimentally.There are multiple reasons for this over-prediction of structures, 51 but crystallization kinetics are one major reason that more candidate polymorphs are not found experimentally.While there have been important advances in modeling organic crystal polymorph nucleation and growth in recent years, [52][53][54][55] the statistical mechanical sampling challenges and the need for accurate but computationally inexpensive potentials represent on-going hurdles to reliable prediction of crystallization kinetics.Moreover, CSP routinely focuses on innite crystals, ignoring the surface energy contributions that depend on the size and shape of the nite crystallite.7][58][59][60][61] More detailed discussion of these issues is beyond the scope of this article.
3 Current methods of crystal structure prediction

Overview of hierarchical crystal structure prediction
The most common organic CSP approaches employ hierarchical stages of structure renement and ranking (Fig. 2).For example, the rst stage in the hierarchy might employ an inexpensive force eld potential to screen ∼10 5 -10 7 (pseudo-)randomly generated crystal structures, depending on the complexity of the species and the search space.The second-stage renes the ∼10 3 lowest-energy structures with an intermediate-quality model.In the third stage, the few hundred most stable structures might then be rened further and ranked with dispersion-corrected density functional theory (referred to here as "DFT-D" for brevity, though many different dispersion-inclusive DFT models are used in practice).Optionally, one might perform nal freeenergy corrections for a handful of the most stable crystal structures to predict their stabilities at nite temperatures and pressures.][36][37][38][39][40][41][42][43] A number of features factor into a successful crystal structure prediction.Ensuring a suitably-thorough search of crystal packing space is crucial.A routine search might focus only on crystals with a single molecule in the asymmetric unit (Z ′ = 1) and from the ∼15-20 most common space groups that account for over 90% of observed organic crystals. 62More exhaustive searches might consider all 230 space groups and/or crystals with Z ′ > 1. Signicant additional complexity is introduced to the CSP for exible molecules, due to the need to consider various equilibrium and non-equilibrium intramolecular conformations, or for multi-component crystals (co-crystals, solvates, hydrates, etc.), due to the much larger search space.Addressing these various complications can substantially increase the overall computational cost.Within the chosen search space, random [63][64][65] or low-discrepancy pseudo-random search approaches [66][67][68] are common in molecular CSP, though other global search algorithms such as simulated annealing, 69 particle-swarm optimization, [70][71][72] basin hopping 73 or evolutionary optimization 42,[74][75][76][77] are also used.
The low-cost computational models used in the early stages of a hierarchical CSP enable broad searching, and the subsequent ltering out of poor candidates allows the more expensive methods to be applied only to the more promising candidates.Care must be taken to ensure the models used in the early stages are accurate enough to identify and select the relevant structures for later-stage renement.For example, conventional off-the-shelf force elds are oen not reliable enough for CSP.Section 3.2 will discuss how more customized potentials are oen used instead.
Much of the current success in CSP stems from the widespread adoption of density functional theory for late-stage renement and ranking.In the 4th Blind Test of CSP, a DFT-D-driven approach was the rst to correctly predict the crystal structures of all the target molecules. 78The development of accurate, non-empirical, and computationally efficient van der Waals dispersion corrections for DFT, 79 such as D3 and D4, [80][81][82] many-body dispersion (MBD), [83][84][85] and the exchange-hole dipole Fig. 2 A typical hierarchical crystal structure prediction approach might (1) generate and rank large numbers of candidate structures with an inexpensive force field, (2) refine many of the most promising structures with some method of intermediate accuracy and computational cost, (3) perform dispersion-corrected DFT refinement on a few hundred structures, and (4) perhaps end with free energy calculations on a small number of structures.
moment (XDM) model, 86 has been particularly important.][95][96][97][98] While many CSP studies nalize their predictions with DFT-D structures and lattice energies, others proceed further to consider nite-temperature free energies.Surveys of small molecule crystals have found that vibrational free energy contributions change polymorph stability orderings for ∼10-20% of molecules at room temperature, 45,47 though the differences between lattice energies and free energies can increase for larger, more complex systems due to conformational exibility or disorder.For simpler molecules, harmonic, quasi-harmonic, and/or other simplied anharmonic treatments capture the vibrational free energy contributions reasonably well.On the other hand, molecular dynamics-based approaches are potentially superior for describing more complex crystals, assuming a suitably accurate potential energy model.Such techniques will be discussed further in Section 3.2.Overall, the combination of accurate DFT-D models and (sometimes) vibrational free energy contributions frequently leads to successful crystal structure predictions, as demonstrated for many Blind Test targets 91,92,94,95,97 and for examples that will be discussed in Section 4.
In the end, performing a CSP produces a crystal energy landscape (Fig. 1), which is the set of predicted crystal structures and their relative lattice energies or free energies.Crystal energy landscapes at 0 K are oen plotted as lattice energy versus crystal density, both because van der Waals forces generally favor more dense crystal packing motifs and because a scatter plot facilitates visualization of the large number of predicted structures.In some cases, one may simply wish to identify the most stable crystal structure(s).However, consideration of the full crystal energy landscape can provide valuable insights into the crystallization behaviors of a species 22,23 or help elucidate crystal structure-property relationships for materials design.Before discussing such applications in Section 4, we discuss several areas where methodological developments are actively underway.
3.2 Areas of active methodological developments 3.2.1 Improved models for early-and intermediate-stage structure renement and ranking.In a hierarchical crystal structure prediction such as Fig. 2, the late-state DFT-D structure renement and ranking typically consumes a large fraction of the total computational cost.Because the lower-cost intermediate stage models are generally less reliable than DFT-D, it is common to carry a relatively large number of structures forward to the DFT-D renement to reduce the risk that an important structure is discarded early on (as happened in some cases during the sixth Blind Test 21 ).Unfortunately, performing DFT-D renement on many structures is computationally expensive.Therefore, the total computational cost of the CSP can potentially be reduced by improving the quality of the early/ intermediate ltering model(s) so that fewer structures need to be carried forward to the DFT-D stage.
A number of strategies are currently being used to achieve this.One very successful approach involves parameterizing tailor-made force elds for each system based on DFT-D calculations. 99The force elds oen employ fairly standard functional forms, with terms describing the intramolecular geometry, short-range intermolecular repulsion, long-range London dispersion, point-charge or multipolar electrostatics, and sometimes induction/polarization, but system-specic parameter tuning achieves higher accuracy than could typically be obtained with off-the-shelf force elds.
Multiple force elds can be tted to different subsets of data to predict and score structures independently, thereby potentially increasing the extent of the crystal packing space searched and providing insight into the uncertainties in the models. 72Moreover, as the CSP proceeds, the force elds can be reparameterized iteratively based on the results of DFT-D structure renement as well as monomer/dimer quantum mechanical benchmarks (Fig. 3). 72Iterating the force eld parameterization toward self-consistency with DFT-D helps ensure the search is performed with near-DFT-D quality.This iterative process also produces a more robust force eld that can be used to evaluate nite-temperature free energy corrections. 48Machine learning potentials represent a natural extension of this idea. 44,100,1013][104][105][106][107][108][109] These can be further combined with D-ML, in which an ML model is trained to correct a simpler model up toward the quality of a more expensive one.Species-specic D-ML models have been used in CSP to correct semi-empirical density functional tight binding (DFTB) toward the accuracy of hybrid functional DFT-D, 110,111 or to correct GGA functionals up to hybrid functional DFT-D or correlated wave function methods. 112,113inally, ab initio force elds tted to symmetry-adapted perturbation theory (SAPT) 114,115 have also improved considerably.SAPT calculations naturally decompose the different types of intermolecular interactions (electrostatics, exchangerepulsion, etc.), which can be used to help ensure physicallysensible parameter ts in the potentials.Successful SAPT potentials could already be found in the literature 15 years ago, [116][117][118][119][120] but the algorithms and protocols have now matured to enable highly-automated tting for organic molecules with modest conformational exibility. 121,122A recent study 123 of een organic molecules found that this approach placed the experimental structure within the top 10-20 structures (and oen in top 5).Subsequent DFT-D renement of the top 20 structures generated by these potentials for each species ranked the experimental structure as the most stable one in every case.Thus, these potentials are very accurate on their own and can provide an excellent short-list of candidate structures for subsequent renement with fully quantum mechanical approaches.The biggest outstanding question is how efficiently these tting algorithms can be generalized to highly exible molecules 3.2.2Addressing DFT delocalization error in crystal structure prediction.Many CSP successes rely on dispersioncorrected DFT functionals.Commonly-used GGA and hybrid density functionals balance accuracy and computational cost and usually enable reliable renement and ranking of hundreds of crystal structure candidates.However, approximate density functionals generally suffer from delocalization error (a.k.a.many-body self-interaction error), 124,125 which manifests as a spurious tendency to prefer overly delocalized electron densities.Delocalization error leads to systematic errors such as the underestimation of band gaps, underestimation of chemical reaction barriers, erroneous spin state energy differences, over-estimation of hydrogen bond strengths, and problematic conformational energies.
The impacts of delocalization error in CSP were rst highlighted by Johnson and co-workers in the context of reanalyzing the conformational energies in candidate structures for Blind Test molecule X, 92 the lattice energies of halogen bonded crystals, 127 and, most dramatically, by showing how it could spontaneously convert neutral acid-base co-crystals to their charged salt forms. 1283][134][135][136][137] All of these systems have crystal structures which differ in the extent of p conjugation, either due to changes in the intramolecular conformation (conformational polymorphism) or chemical reactions that convert sp 2 -hybridized atoms to sp 3hybridized ones (e.g.cycloaddition reactions).Fig. 4 shows how DFT delocalization error over-stabilizes the more planar conformations of the ROY molecule, 126,138,139 the impacts of which will be discussed further in Section 4.3.
Delocalization error is particularly pronounced in GGA functionals such as PBE.Hybrid functionals such as PBE0 help mitigate the impacts of delocalization error, 96,97 though the necessary amount of exact exchange needed can vary. 97,126ecause the impacts of delocalization error on conformational energies are intramolecular in nature, 126,140 an alternative strategy can be to perform a simple conformational energy correction, that computes the DFT crystal energy E DFT crystal and replaces the DFT-D intramolecular energy E DFT intra with one computed using a more advanced model that is free of delocalization error, E High intra , such as correlated wave function methods 126 advanced density functionals, or even density-corrected DFT. 141.2.3Improved treatment of nite-temperature free energies.Switching the focus from 0 K lattice energies (E) to nitetemperature Gibbs (G) or Helmholtz (F) free-energies, can be important for making real-world predictions about the most stable polymorphs, polymorph phase transitions, the Fig. 4 Delocalization error in GGA and hybrid functionals such as PBE and PBE0 leads to over-stabilization of more planar conformations of the ROY molecular relative to those with a dihedral angle closer to 90°, as compared to high-level coupled cluster benchmarks. 126This impacts the predicted crystal energy landscape, as will be discussed in Section 4.3.

Chemical Science Perspective
formation of hydrates as a function of humidity, etc.The simplest approximation for these effects involves computing the static harmonic Helmholtz vibrational free energy contributions F vib via lattice dynamics.The quasi-harmonic approximation 142,143 renes the treatment further by approximating how the phonons and F vib contributions change as a function of unit cell volume, which is especially important for the low-frequency modes. 144The quasiharmonic approximation enables predicting the temperaturedependent thermal expansion of the crystal lattice up to moderate temperatures, leading to improved-quality structures, [145][146][147] thermochemical properties, 148-151 spectroscopy, [152][153][154] and even polymorph phase diagrams. 152,155,156attice dynamics calculations are considerably more expensive than computing the energy, particularly due to the need to capture phonon dispersion.For this reason, they are typically computed with relatively inexpensive DFT-D functionals.A multi-level approach that combines a higher-level treatment of the phonons in the crystallographic unit cell with a lower-cost treatment in the supercell can reduce the costs further. 157,158lthough the quasi-harmonic approximation improves the description of lower-frequency modes, it does not address anharmonicities in the higher-frequency modes that are insensitive to the lattice parameters. 144One simple approach for those phonon modes employs a 1-D anharmonic model to improve the description of each individual mode. 94,95Vibrational self-consistent eld calculations can capture anharmonicity more fully, 159,160 albeit at signicantly higher computational cost.
Alternatively, molecular dynamics (MD) techniques can improve upon these static lattice dynamics approaches.MD simulations naturally capture anharmonicities. 144Moreover, the nite-temperature dynamics will sometimes sample multiple minima on the potential energy surface, capturing contributions which would be missed entirely by (quasi-)harmonic models. 161,162MD approaches are also inherently better-suited to describing dynamically disordered crystals.
One successful MD approach employs a pseudo-supercritical path approach to relate the free energies of the crystal polymorphs to that of an Einstein crystal reference state. 49,163,164For example, a CSP study of the polymorphs of drug candidate oxabispidine found that the form A was several kJ mol −1 less stable than form B, contrary to experimental observations.However, applying this free energy correction approach on top of the 0 K lattice energy predictions demonstrated enantiotropic relationship, with form A becoming the thermodynamically preferred form near ambient temperature (Fig. 5). 30Beyond classical molecular dynamics, path integral studies have shown that nuclear quantum effects can also be important for determining the relative polymorph stabilities in aspirin 165 and ices. 166he biggest challenge with MD approaches is the need for extensive sampling, which means that ab initio MD simulations are extraordinarily expensive computationally-e.g.∼2 million central processing unit (CPU) hours for paracetamol. 165Therefore, inexpensive energy potentials must be used in practice.As noted before, standard force elds will frequently lack the requisite accuracy needed for CSP applications.However, goodquality tailored force elds and machine learning potentials being tted as part of the search process (as described above) can also be used for the free energy simulations. 48,101Reweighting strategies that map from a low-cost free energy simulation to a higher-level one with only moderate sampling at the high level are also possible. 163,164D free energy approaches have benets beyond simply predicting polymorph stabilities.Molecular crystal free energy landscapes tend to be smoother than lattice energy ones, with multiple lattice energy minima separated by small barriers coalescing into a single free energy well at nite temperatures.This feature enables reducing the number of predicted structures on a crystal energy landscape or even searching for crystal structures directly on the free energy landscape (see Section 4.9).

Selected applications at the frontiers of crystal structure prediction
Having discussed some of the model features that lead to successful crystal structure prediction, we now focus on case studies that demonstrate the range and capabilities of presentday CSP.These examples were chosen to highlight the diverse ways in which CSP can complement experiment across a broad range of organic materials, rather than aiming for a comprehensive review of the literature.

Pharmaceutical solid-form screening: rotigotine and galunisertib
Choosing a suitable solid form for manufacturing is an important step in pharmaceutical formulation.Researchers

Perspective
Chemical Science desire crystals with suitable solubility proles, mechanical properties, and stability.They want to avoid the surprise, latestage appearance of new polymorphs with undesirable properties, such as those which necessitated the recall and reformulation of ritonavir 167,168 and rotigotine. 169,170The risks are signicant: it has been estimated that the most stable crystal form has not yet been discovered experimentally for some ∼15-45% of small-molecule pharmaceuticals. 28By providing a detailed understanding of the crystal energy landscape, CSP can complement experimental solid-form screening and help manage the risks of late-appearing polymorphs in pharmaceutical development. 22,23,171onsider two examples: rotigotine and galunisertib.Transdermal rotigotine patches are used to treat Parkinson's disease and restless leg syndrome.In 2008, the unexpected appearance of snowake-like and highly insoluble crystals of a new crystal polymorph (form II) on the patches led to a major recall and restrictions on the drug in Europe, and its complete withdrawal from the U.S. market. 170It took four years to reformulate the patches and return them to the U.S. market. 169lthough CSP techniques were less mature in 2008, recent work by Mortazavi et al. 172 demonstrates how modern-day CSP techniques could have anticipated form II rotigotine.Starting from only the 2-D molecular structure of rotigotine, they employed a mixture of tailor-made force elds (tted against DFT-D calculations), dispersion-corrected DFT, and harmonic vibrational enthalpy/free energy contributions to predict the most stable crystal structures of rotigotine, including both forms I and II (Fig. 6a).Their models indicate that form II is 7.6 kJ mol −1 more stable than form I, in exceptional agreement with the 7.5 kJ mol −1 measured experimentally (such excellent agreement probably reects some fortuitous error cancellation).Today, a CSP prediction of a new polymorph that was so much more stable than the known form would warrant signicant concern and would motivate further experimental screening efforts.Moreover, the higher packing density predicted for form II also would suggest that high-pressure experiments might facilitate its crystallization.In fact, similar CSP insights motivated the high-pressure crystallization experiments that discovered new polymorphs of dalcetrapib 27 and iproniazid, 171 as will be discussed in Section 4.9.
Overall, rotigotine has a sparse crystal energy landscape, with only two predicted crystal structures other than forms I and II in the low-energy (10 kJ mol −1 ) region.One of those has stability intermediate between forms I and II.This putative "form III" has never been observed experimentally, and perhaps further investigations are warranted.
Whereas the CSP of rotigotine was performed long aer its behavior was understood experimentally, CSP was directly integrated into the solid-form screening process for galunisertib. 26This drug candidate for metastatic malignant cancer 173 has a complicated solid-form landscape: ten neat polymorphs and over 50 crystalline solvates have been discovered to-date.Its propensity for solvate formation complicated the experimental search for neat polymorphs, and a CSP was performed to identify any potentially important missing forms.The CSP revealed hundreds of potential crystal structures in the 10 kJ mol −1 energy window (Fig. 6b).Such densely populated crystal energy landscapes are unfortunately more typical for pharmaceuticals than the sparser rotigotine one.
The initial CSP for galunisertib predicted seven of the ten of the polymorphs eventually found experimentally, but it missed the remaining three due to search constraints that had been imposed to expedite the CSP.As crystal forms lying outside the initial CSP search space were discovered experimentally, a second, broader CSP was performed using techniques very similar to those for rotigotine.This second landscape successfully predicted all experimentally-discovered polymorphs.
The galunisertib CSPs helped solve the crystal structures of forms VII and VIII.Experimental difficulties obtaining pure crystals of these forms complicated the powder X-ray diffraction patterns, but the structures were eventually solved using  26 Red points indicate experimentally-observed polymorphs.For rotigotine, a pair of static structures was identified for each of forms I and II which correspond to the two possible conformations of the disordered thiophene ring.The structures labeled "form III" for rotigotine and "GM" for galunisertib have not yet been found experimentally.

Chemical Science Perspective
comparisons against simulated powder diffraction patterns computed on candidate CSP structures.On the other hand, both CSPs predict that the most stable "global minimum" crystal structure has not yet been found experimentally, despite extensive efforts.This unrealized global minimum structure highlights two potential issues in CSP which will be discussed later: when are the accuracy limitations of widely-used DFT-D models problematic (Section 4.3)?When are predicted crystal structures actually crystallizable (Section 4.9)?

Addressing the further complexities in pharmaceutical crystal structure prediction
With mounting numbers of successful polymorph predictions for neat pharmaceuticals, CSP techniques are increasingly being applied to more complicated aspects of pharmaceutical formulation, 23 including disorder and multi-component hydrates, solvates, and co-crystals.Cases such as the experimental cancer drug gandotinib, 25 with its multiple hydrates, disorder, and difficulties crystallizing various forms exemplify the real-world complexities of pharmaceutical solid-form landscapes.Consider rst disorder, which is present in ∼20-25% of crystal structures. 50Static disorder results from molecules adopting a statistical distribution of different congurations or orientations in the lattice, while dynamic disorder is associated with the nite-temperature motions of molecules in the crystal.The distinction between the two types of disorder is not always sharp, however, and it can even vary with temperature. 174Both types of disorder can stabilize a crystal structure entropically.][177] To obtain more quantitative results, disorder needs to incorporated into the models.Dynamic disorder can potentially be described via molecular dynamics simulations, 49,161,175 for example, while a symmetry-adapted ensemble model which includes weighted energy contributions from all the congurationally unique structures is oen used to treat crystals with static disorder. 178Such descriptions are considerably more computationally demanding than conventional static structure models, unfortunately.A symmetry-adapted ensemble for a system with N disordered sites having two possible states each requires evaluating the energy for 2 N possible congurations, though symmetry reduces the number of unique congurations in practice.
Accounting for the effects of disorder can be important.A CSP study on the antihistamine medication loratadine, 50 for example, found multiple crystal structures corresponding to different components of the disorder.The initial landscape suggested that form I was relatively high in energy compared to other predicted forms.Form I became the most stable form only aer it was modeled with a symmetry-adapted ensemble.Similarly, the initial predicted crystal energy landscape of gandotinib suggested that the most stable crystal polymorph had not yet been found experimentally.However, accounting for the disorder in form I made it isoenergetic to the predicted global minimum structure. 25lti-component crystals are also extremely common in pharmaceuticals.Incorporation of water or other solvent molecules into a molecular crystal structure occurs frequently.In other cases, the active pharmaceutical ingredient (API) is deliberately crystallized as a salt (e.g. with hydrochloride) or with inactive co-formers to improve their solid form properties. CSP of multi-component systems can be considerably more difficult than single-component systems.][185] Despite these challenges, clear progress is being made.A number of successful hydrate predictions have been performed, 187,188 including ones that predicted the correct stoichiometries. 185The use of free energy calculations to compare the stabilities of different co-crystal stoichiometries has also been demonstrated. 31Data-driven algorithms can identify plausible locations for water molecules within an anhydrous crystal structure, enabling a high-throughput screen of potential hydrates from an existing CSP landscape. 189Separately, Dybeck et al. impressively demonstrated that with the help of one experimentally determined co-existence point, the phase boundary between anhydrate and hydrate forms could be predicted as a function of temperature and relative humidity to within 10% relative humidity of experiment (Fig. 7). 186An example of cocrystal stoichiometry prediction was also included in the recent 7th Blind Test, with results to be published soon.
For an example of a successful CSP applied to a multicomponent salt crystal, consider the recent studies of the sleeprelated drug candidate B5. 190 Whereas the neutral form of B5 has just one important crystal form, 190 understanding the solidform landscape of its hydrochloride salt B5HCl proved much more difficult. 177Extraordinary experimental effort was required to uncover two neat polymorphs of B5HCl, a dihydrate, and 11 alcohol solvates of B5HCl.The concurrent CSP study made Fig. 7 Predicted phase-boundary between hydrate and anhydrate forms of three drugs as a function of temperature and relative humidity.They show nearly quantitative agreement between theory (red lines) and experimentally-derived coexistence points. 186everal contributions to the eventual understanding: It highlighted the stability of form I, which helped explain its insolubility in various solvents and the difficulties in producing other crystal forms experimentally.It showed that the experimental forms discovered included examples of all major packing motifs found on the computational landscape, suggesting that the experimental screen was suitably complete.Moreover, the large number of closely related crystal structures on the computed landscape also pointed to the likelihood of disorder, especially for one particular conformation of the B5H + molecule.This helped rationalize the experimentally-observed disorder and difficulty in growing crystals that were suitable for diffraction.
Finally, a typical solid-form screen might consider multiple different possible co-formers, potentially multiplying the number of CSPs that may need to be performed.Sugden and coworkers recently demonstrated one clever approach for simplifying this task. 191Their standard CSP approach employs pre-tted local approximate potentials to describe important intramolecular conformational exibility in their molecules, [192][193][194] and generating those models from quantum mechanical calculations requires non-trivial effort.However, by creating a library of these conformational energy models for commonly-used co-former species in advance, they can quickly run a CSP for a given API with a whole suite of potential coformers.Aer tting the local approximate models for the new API, they can run a CSP to screen each API + co-former combination in just ∼2-3 days on a moderately sized cluster.Testing on three different drug molecules found that these relatively fast CSPs proved sufficiently accurate to rule out coformer candidates that were unlikely to form co-crystals experimentally, even if additional effort might be to rene the predictions for the most promising co-formers.

ROY and the impacts of DFT delocalization error
While CSP is increasingly successful, factors such as DFT delocalization error can still lead to incorrect predictions.The ROY molecule, so named for its vibrant red, orange, and yellow crystals, 195 is a classic example of polymorphism and holds the current world record with 12 fully-characterized polymorphs, [196][197][198][199][200][201][202][203][204] plus a thirteenth incompletely characterized form. 139,2051][202][203][204] Despite the importance of this system to the eld of polymorphism, predicted crystal energy landscapes of ROY were highly inconsistent with experimental polymorph stabilities rankings 139,200,206 until very recently.The conformational exibility of the ROY molecule (Fig. 8) is the primary factor behind ROY's colors [207][208][209][210] and its propensity for polymorphism, but it also caused problems for CSP.][140] These systematic biases found for GGA and hybrid density functionals 140 can be larger than the total experimental energy range spanned by the polymorphs. 195rtunately, correcting the ROY intramolecular conformational energy contribution to the lattice energy using correlated wave function methods 126,211 or density-corrected DFT 141 dramatically improves the crystal energy rankings relative to experiment. 129As shown in Fig. 8, the resulting landscape reveals that the nine of the twelve lowest-energy candidate structures on the ROY landscape have already been crystallized experimentally.The four higher-energy forms (including the proposed-but-unconrmed structure of the RPL polymorph 139 ) are known to be metastable and/or were difficult to crystallize, suggesting that their less stable lattice energies are plausible.Interestingly, the calculations also suggest that the rank #15 structure on the landscape becomes the most stable form at high-pressure.While previous experimental high-pressure studies have not discovered any new polymorphs, 210,212 this prediction suggests further efforts to produce high-pressure forms may be worthwhile.
Overall, the ROY system highlights how, despite many successful DFT-driven structure predictions, there are cases where the most frequently-used DFT-D functionals are inadequate.Similar problematic DFT delocalization error issues occur with conventional DFT functionals for the anti-cancer drugs galunisertib (Section 4.1) and axitinib, 126 the photomechanical materials discussed in Section 4.5, and a number of other examples mentioned in Section 3. Fortunately, these errors can be overcome through intramolecular energy corrections of the sort used for ROY or the selection of a suitable density functional. 97

Discovery of new porous organic materials
Porous materials are useful for gas storage and separations, but rationally designing porous organic crystals is difficult.Beyond the usual difficulty in intuiting the relationship between Fig. 8 After addressing the DFT delocalization error issues, the predicted crystal energy landscape of ROY shows that the lowest-energy polymorphs have already been discovered experimentally (red).Interestingly, the hypothetical rank #15 structure in blue is predicted to become the most stable structure near 10 GPa.

Chemical Science
Perspective molecular structure and crystal packing, porous organic crystals are exceptional because they counter the general thermodynamic preference toward dense crystal packing motifs.However, Pulido et al. 213 demonstrated how CSP and energystructure-function maps could be used to drive experimental discovery of new porous organic crystals.They began by performing CSP on a series of candidate molecular building blocks.
As expected, the lowest-energy structures were densely packed and non-porous.However, they identied several interesting "spikes" higher on the crystal energy landscape which corresponded to unusually stable porous structures (Fig. 9).While these porous structures lay tens of kJ mol −1 above the global minimum-much too energetically unstable to crystallize on their own-the authors recognized that these putative porous structures would be dramatically stabilized by guest solvent molecules adsorbed within the pores.Next, they computationally characterized the "function" of every structure on their predicted crystal energy landscapestheir methane storage capacity and their potential for hydrocarbon separations.From this combined understanding of a molecule's propensity to form porous structures and the resulting functional properties, they identied the molecule T2 (Fig. 9) as a promising candidate for new experimental crystallization screening studies.They discovered three new porous polymorphs of T2 in addition to a previously known one.The new g form of this molecule set a record for ultra-high-surfacearea organic materials (3425 m 2 g −1 ).The experiments conrmed several predicted properties of these crystals, including surface area, gas storage capacities, and some gas separation properties.In select cases, however, adsorption of larger molecules during the gas separation testing induced experimental phase transitions that would have been difficult to anticipate computationally.Nevertheless, this study highlights how structure prediction and energy-structure-function maps can drive experimental discovery. 214Subsequent applications of these or similar concepts to porous materials 8,215-217 and organic semi-conductors [218][219][220][221] have further conrmed the role of CSP in the design and discovery of new organic materials.
While CSP oen strives to accurately predict the most stable crystal structure using high-quality energy models, such detail is not always required to establish useful design principles for organic materials.Following on work that showed how the combined efforts of computation and experiments could lead to the rapid discovery of new porous organic cage materials for gas separation and storage applications, 222,223 Wolpert and Jelfs demonstrated that a simple coarse-grained model could give meaningful insights into how porous organic cage molecules pack in the solid state. 224Rather than perform traditional, expensive CSP on each new large organic cage molecule, their simplied model represents the cages as octahedra with "patches" on each face that distinguish between whether they contain either an arene group or an open pore.Aer expressing the intermolecular interactions between patches via a simple model Hamiltonian, they explored the types of packing motifs that were preferred across a range of interaction parameter strengths.They developed a general understanding for how the chemical features of the cage molecules translate into the resulting crystal packing.Moreover, they can determine the key patch parameter values using just gas-phase dimer DFT-D calculations, and then predict the preferred crystal packing for a given species.The simplicity of this approach makes it highly amenable to high-throughput screening.Coarse-grained approaches like this can signicantly narrow the molecular search space for new functional materials before performing more detailed CSP studies or experiments.

Establishing design principles for organic photomechanical engines
1][12] Particular interest lies in harnessing these structural transformations to do mechanical work, such as for light-driven actuators [225][226][227][228][229][230] or locomotion. 231,232Designing such materials requires understanding how molecular structure translates to crystal structure, how the crystal deforms due to the solid-state chemical reaction, and what anisotropic work is performed by that deformation.Characterizing these transformations experimentally is frequently challenging, since the solid-state chemical reactions may be incomplete, may be carried out in nanocrystals instead of bulk single crystals, and oen produce short-lived and highly metastable polymorphs.
These systems therefore represent an excellent opportunity for rst-principles structure prediction techniques.Until recently, the use of CSP in understanding solid-state reactions has been rare. 233We have now shown how CSP can be used as part of a strategy to predict or even design large photomechanical responses. 135The approach starts by predicting the crystal structures of the reactants and products (Fig. 10a).However, the bigger challenge lies in determining which of the many predicted product crystal structures is relevant.Because the solid-state reaction generates the product molecules within the crystal packing of the reactant, the product polymorph is typically thermodynamically metastable and oen lies outside the typical ∼10 kJ mol −1 energy window associated with viable polymorph crystallization.Since CSP normally focuses on identifying the most stable crystal form(s), we instead apply topochemical principles to predict the solid-state transformation (Fig. 10b) and to identify the correct structure on the crystal energy landscape (Fig. 10a).
Once the structural transformations are known, the amount of work performed can be predicted.In the spirit of gas heat engines, we dened an idealized photomechanical engine cycle 135 that enables computing the maximum work potentially performed by a given solid-state reaction (Fig. 10c).The engine model assumes the reaction occurs quickly and completely, thereby generating the product within the unit cell parameters of the reactant.Relaxation the crystal relieves the internal stress, deforms the crystal, and performs work.
Studies of solid-state [4 + 4] anthracene photodimerization 135,136 and diarylethene ring opening and closing 137 have revealed a number of important insights and design principles for organic photomechanical materials.First, the unique combination of high elastic modulus and large strains means that photomechanical organic crystals exhibit exceptional theoretical work densities up to at least 200 MJ m −3 .If these could be realized experimentally, they would be several orders of magnitude larger than the work densities of elastomers or inorganic piezoelectrics.Second, the crystal packing proves crucial: for one diarylethene derivative studied, the maximum anisotropic work density differs 40-fold between two crystal forms.Based on the modest number of cases studied thus far, the range of work capacities across different polymorphs is broader than the range found for different photochromic species/reactions.This suggests that researchers should increase their emphasis on crystal engineering in selecting their target photochrome, especially since packing has a much larger impact on the mechanical response than do minor photochrome modications (e.g.halogenation). 136arallel alignments of the molecules in the crystal generally produce the especially large anisotropic deformations and work densities.Finally, the research has identied how the "memory" of the reactant crystal packing throughout the photochemical engine transformations biases them to produce greater work in the forward stroke direction than in the reverse one, enabling net work to be accomplished.
The need to understand solid-state chemical reactions extends beyond photomechanical systems.For example, solidstate photochemical degradation is a signicant issue for pharmaceuticals, and crystal packing impacts photostability. 2346][237] Solid-state oxidations, reductions, isomerizations, bonds formations/cleavages, and many other reactions are also possible. 238Inducing solid-state reactions via mechanical grinding or milling (mechanochemistry) creates further synthetic possibilities. 239,240CSP can help engineer crystal packing motifs that either facilitate or inhibit solid-state reactions.

Rational design of organic semi-conductors
The ultimate promise of structure prediction lies in the complete rational design of new materials, starting from the molecular building blocks.Because the relationships between molecular structure and materials properties frequently cannot be intuited a priori, this rational design will likely require highthroughput crystal structure prediction to map these relationships across many candidate species.
True CSP-driven rational design of organic materials is still in its infancy, but Cheng et al. provided an intriguing peek at the future possibilities with their evolutionary exploration of

Chemical Science
Perspective organic semiconductor materials. 220Building on earlier work that had manually examined small numbers of species, 218,219 they searched a chemical space of ∼68 000 aza-substituted pentacene isomers with (nearly) arbitrary connections between the ve aromatic rings.Predicting crystal structures for every one of these molecules would be utterly impractical.Instead, they rst screened candidate species using simple molecular-based tness functions optimized via an evolutionary algorithm.Next, crystal structure prediction was performed on the most promising molecular structures.In any medium-or highthroughput scenario, the models used to predict crystal structures will likely sacrice some accuracy to achieve faster throughput.As a result, the predicted global minimum crystal structure for each species is less likely to correspond to the actual experimental crystal structures.To compensate, they evaluated the performance of each species by computing both the electron mobility of the most stable structure and a Boltzmann-weighted mean/standard deviation electron mobility across all low-energy candidate crystal structures.This latter metric identies species whose crystal structures have consistently good mobilities, rather than those that happen to have a high mobility for one particular polymorph which may or may not occur experimentally.
In the end, the strong dependence of mobility on crystal packing meant that many candidates exhibited overlapping landscape-averaged mobilities and deviations (Fig. 11).On the other hand, the best candidate identied here proved to have only a single, low-energy structure which also had high mobility, thereby providing condence that the predicted global minimum structure would likely match experiment.Overall, the evolutionary algorithm search identied candidates with good landscape-average and global minimum mobilities.The species identied by the evolutionary algorithm compared fairly well against several species that had previously been identied by human experts 241 who used a mixture of computation and chemical intuition.Moving forward, incorporating solid-state properties directly into the molecular design search, rather than at the end, will likely to prove important for design of materials whose properties are very sensitive to crystal packing.
Separately, a CSP study correctly predicted the experimental crystal structure for chiral [6]helicene and rationalized the preference for the enantiopure crystal over the racemate. 221er this study demonstrated good but crystal packingdependent semi-conducting properties, a follow-up CSP study then examined the impact of derivatizing [6]helicene molecules.Dimer screening was used to investigate over 1300 substituted helicenes, aer which CSP was performed on the most promising candidates.In the end, they identied a set of derivatives which had predicted electron mobilities three times that of a previously characterized helicene. 242

NMR crystallography
The combination of crystal structure prediction with spectroscopic experiments can be particularly helpful in solving challenging crystal structures.One can frequently identify the experimental crystal structure by predicting candidate crystal structures, simulating the relevant spectroscopic observables for each one, and comparing the results with experiment.While this general idea has been applied to various spectroscopic characterization experiments, including powder X-ray diffraction, [243][244][245][246][247] Raman spectroscopy 153,248,249 and transmission electron microscopy, 250 nuclear magnetic resonance (NMR) crystallography represents the most widely-used combination. 39,251,2524][255][256][257][258][259][260][261][262] One appealing feature of these approaches is that the spectroscopic observables provide a second metric for assessing candidate structures that is "orthogonal" to the lattice energy typically used in CSP.NMR crystallography can frequently solve the experimental crystal structure even when that structure is not the most stable one predicted by the CSP.Unsurprisingly, models which predict chemical shis more accurately [263][264][265][266] enhance the discrimination between correct and incorrect structural assignments in NMR crystallography. 267Further synergies between experiment and computation are also possible in NMR crystallography.For example, structural constraints inferred from the NMR experiments can accelerate the structure determination by reducing the size of the CSP search space. 258achine learning models for predicting chemical shis [268][269][270][271][272][273][274][275][276][277][278][279][280][281][282][283][284] promise to accelerate NMR crystallography even further.The traditional NMR crystallography protocol involves rst predicting a set of candidate crystal structures, computing the NMR spectra for each one, and comparing them against experiment, Fig. 11 Predicted electron mobilities for various organic semi-conducting material candidates discovered through an evolutionary search. 220Red dots indicate the mobility for the most stable predicted structure of each species, while blue dots/error bars indicate the landscape-averaged mean mobility and standard deviation for the ensemble of low-energy structures.Three molecules with the highest landscape-averaged mobilities are shown.

Perspective
Chemical Science all of which can require substantial computational effort.Recently, however, Balodis et al. demonstrated how NMR crystallography can directly rene the crystal structure against the experimental solid-state NMR spectrum. 261The authors used Monte Carlo techniques to optimize an objective function which combined a weighted mixture of the structure's lattice energy and the error between its computed chemical shis and experiment.These quantities can be evaluated inexpensively by using density functional tight binding (DFTB) for the energies and the machine-learning ShiML model for the chemical shis. 272,273Minimizing this objective function successfully produced the correct structures for several difficult crystals, despite the "moderate" accuracies of DFTB and ShiML compared to DFT.The ability to rene crystal structures directly from NMR will be especially benecial for large, highly exible molecules, for which traditional crystal structure prediction can be very expensive.For example, Balodis et al. solved the crystal structure of ampicillin (Fig. 12), 261 despite the molecule adopting a high-energy intramolecular conformation in the solid-state that could easily be missed in a typical CSP search protocol.Separately, the rapid chemical shi prediction enabled by ShiML also facilitated the structural characterization of amorphous drug candidate AZD5718. 260Advances like these are likely to signicantly increase the effectiveness of NMR crystallography over the next few years.

Predicting solubility and mechanical properties
As crystal structure prediction becomes more routine, the focus will increasingly shi to computing the chemical and physical properties of the predicted crystals.In many cases, these properties will depend sensitively on the 3-D structure, making it important to account for the impacts of nite temperature on crystal structure and stability. 40Examples of computing properties related to gas separation and storage, photomechanical response, semi-conducting, and spectroscopy have already been discussed.Here, two more relevant properties are briey discussed.
First, the solubility of organic species matters in many chemical applications, but it is especially signicant in the pharmaceutical industry, where a large fraction of drugs in development suffer from low solubility.Because experimental measurements of solubility are time-consuming and resource-intensive, there has been considerable interest in predicting solubilities computationally.7][288][289] Although such models can be very effective, they typically omit explicit solid-state contributions and therefore cannot account for how changes in crystal packing will impact solubility, for example.
There are on-going efforts to develop accurate physics-based models which could overcome this limitation.While some approaches simulate the solid-liquid coexistence directly, 290 the more common approach employs a thermodynamic cycle that expresses the Gibbs free energy of dissolution from the free energies of sublimation and solvation, The dissolution free energy can then be related to the solubility S 0 via the temperature T, molar volume of the crystal V m , and the ideal gas constant R. The sublimation free energy can be computed from periodic DFT calculations on the crystal lattice (and either approximating 291,292 or explicitly computing the phonon contributions 293 ) or MD simulations. 294The solvation free energy can be computed inexpensively via an implicit solvent model 292 or more elaborately with explicit MD simulations.While developments in this space continue apace, increasing reliance on higher-quality quantum mechanical models in computing the free energy contributions is bringing the errors to accuracies that are already approaching the best informatics models.
Predicting relative solubility difference between two crystal forms is arguably easier, since that requires only the free energy difference between the two solid forms, avoiding the need to compute solvation free energies.For the drug rotigotine, for example, Mortazavi et al. predicted an 8.3-fold difference in solubility between the two polymorphs with DFT-D, in excellent agreement with the 8.1-fold difference measured experimentally. 172econd, knowledge of the mechanical properties of a material provides valuable insights into its durability and potential applications. 295Predicting the elastic constants enables one to screen materials in silico or to link features of the crystal packing to its mechanical response properties.For example, elastic constant predictions can be used to rationalize differences in how pharmaceuticals behave under tableting conditions.They helped explain the better tableting properties of paracetamol form II 296 and several co-crystals 297 compared to form I, of oxalic acid dihydrate over anhydrous oxalic acid, 298 and of co-crystals of celecoxib. 299They showed that form II aspirin is mechanically stable, 300 contrary to an earlier suggestion based on nano-indentation experiments that did not fully characterize the anisotropy of the crystal.Elastic calculations also helped conrm and rationalize the surprisingly large Young's moduli of amino acid crystals 301 and a nucleic acidbased supramolecular assembly, 302 and they gave insights into the negative linear compressibility in several organic acids. 303ee a recent review for additional details. 295g. 12 Overlay of the ampicillin crystal structures determined from Xray diffraction experiments 285 (blue) and the direct NMR crystallography solution (red) which employed a combination of DFTB energies and ShiftML chemical shifts. 261arly force-eld predictions demonstrated the ability to compute elastic constants within ∼40-50% of experiment, 304 with the neglect of thermal expansion of the crystal being a signicant source of error.The accuracy with which these properties is predicted can be improved using quantum mechanical treatments and by accounting for thermal expansion (since mechanical properties are sensitive to molar volume).For example, the combination of accurate electronic structure methods and the quasi-harmonic approximation 143,305 enables quantitative prediction of the mechanical properties of simple compounds such as carbon dioxide 147 or deuteroammonia. 93Such techniques are also quite effective in organic compounds such as urea, 145 organic semi-conductors, 306 and energetic materials. 307While DFT-D has become the most commonly-used approach, good-quality elastic constants can be obtained at lower computational cost.Spackman et al. 308 curated a large data set of experimental elastic constants, demonstrated good consistency between experiment and the semi-empirical s-HF-3c model, and they even identied several suspicious experimental measurements based on large discrepancies between the experimental and computed results.
4.9 Can the predicted structures actually be crystallized?
The computational prediction of a new polymorph can sometimes drive its subsequent experimental discovery, as exempli-ed in the case of the cholesteryl ester transfer protein inhibitor drug Dalcetrapib. 27CSP on this drug correctly predicted the experimentally known form A and B polymorphs, which are closely related via a reversible temperature-dependent orderdisorder phase transition.Interestingly, however, it also predicted another experimentally-unknown packing motif that lay very close in energy to form B. Motivated by lattice energy calculations that predicted this new structure to become more stable at pressures above ∼0.2GPa, high-pressure crystallization experiments led to the discovery of a new form C. In the end, this new polymorph proved unstable at ambient conditions and is therefore unlikely to impact the pharmaceutical formulation.In this manner, CSP played an important role in managing the solid-form risks for this drug.Similarly, analysis of the CSP landscape and calculations of the free energies as a function of temperature and pressure led to the experimental discovery of a high-pressure polymorph of iproniazid. 171n a third example of CSP driving discovery, a study of the crystal energy landscapes of structurally-similar tolfenamic acid, mefanamic acid, and ufenamic acid identied many thermodynamically plausible isostructural crystal forms.Based on this analysis, the authors successfully templated two new thermodynamically metastable polymorphs of tolfenamic acid using crystals and solid solutions of the other species. 309Templating experiments informed by knowledge of predicted polymorphs similarly led to the discovery of new polymorphs of carbamazepine 310 and cyheptamide, 311 along with a new cocrystal of caffeine and benzoic acid. 312hese cases all represent success stories for CSP, but there are counter-examples.The CSP-predicted global minimum energy crystal structure of galunisertib has never been found experimentally, despite years of effort. 26This is probably due to a combination of poor crystallization kinetics stemming from its very unfavorable intramolecular conformation 26 and the fact that it is not actually the most stable structure (rather, it was articially stabilized in the CSP by DFT delocalization error). 126ore generally, identifying which putative CSP structures are likely to be crystallizable proves a major challenge.CSP usually predicts far more candidate crystal structures than are ever realized experimentally. 51For some of these structures, that simply means that the proper crystallization experiment has not yet been performed.More oen, however, it reects the limitations of CSP approaches, such as the focus on thermodynamic stability instead if kinetic crystallizability, the fact that many distinct lattice energy minima coalesce into a single free-energy basin at nite temperatures, and errors in the predicted energies (e.g.due to delocalization error biases).See ref. 51 for further discussion.Additional challenges in assessing synthesizablity and the challenges associated with theory-driven discovery of materials are discussed extensively in two recent reviews by Jelfs and co-workers. 32,33o improve our understanding of which predicted polymorphs will be experimentally relevant, on-going research efforts focus on reducing the number of crystal structures on the crystal energy landscape, on strategies for identifying crystals that are likely to be crystallizable, and on predicting the thermodynamic conditions under which different polymorphs are most likely to form.In 2005, Raiteri et al. demonstrated that metadynamics simulations dramatically simplied a benzene crystal energy landscape containing tens of lattice energy minima down to just a handful of structures on the free energy surface, most of which have been observed experimentally. 313he discarded structures are either labile, converting to different forms at nite temperature, or they correspond to different static structure representations from the same dynamic ensemble.Metadynamics has similarly been applied to reduce the crystal energy landscape of pigment red 179 (ref.314)  and, with less success, to 5-uorouracil. 315Metadynamics and other enhanced sampling methods have proved effective for searching crystal energy landscapes directly, [313][314][315][316][317][318][319][320][321][322] avoiding the need for post-hoc landscape reduction.
Recent efforts to systematize landscape reduction via a combination of MD, structure clustering, and metadynamics reduced the numbers of structures on the urea, succinic acid, and ibuprofen landscapes by ∼65-90%. 323,324The ibuprofen study is particularly impressive (Fig. 13), as its landscape containing 555 crystal structures is representative of the complexity of many real-world systems.Achieving this reduction has required signicant efforts toward developing automated approaches for ngerprinting, simulating, and clustering the large numbers of structures involved.
The combination of CSP and metadynamics also helped rationalize the discrepant crystallization behaviors of two "sul-owers."Experimentally, the original sulower molecule crystallizes readily, while the structurally-related persulferated coronene forms only an amorphous solid. 325Using molecular dynamics and metadynamics, Sugden et al. demonstrated that while sulower has a number of stable low-energy crystal forms (including the experimental crystal), all 20 of the lowest structures of persulferated coronene became disordered in the dynamics simulations, consistent with its amorphous behavior experimentally. 326 promising new, even simpler approach for landscape reduction based on the threshold method was demonstrated by Butler and Day. 327They coarsely estimate the energy barriers and structural relationships between predicted polymorphs via Monte Carlo moves that translate or rotate molecules and/or deform the unit cell parameter, accepting only moves that stay below a given energy threshold.In this manner, the approach identies structures that can interconvert within a chosen energy threshold. 328A 5 kJ mol −1 threshold reduced the number of structures on the crystal landscape by ∼65-99% for several small organics. 327andscape reduction does not guarantee that the remaining predicted crystal structures will be crystallizable.3][54][55] Instead, heuristic models are oen used to identify crystal forms that are likely to crystallize.As noted in discussing galunisertib, conformational strain of the molecules in either the gas phase or appropriate solvents is sometimes considered as a factor in the crystallizabiliy of predicted polymorphs. 26,46,330,331Montis et al. estabilished a relationship between low surface roughness (rugosity) and crystallizability that can be used to infer the relative likelihood of crystallizing various polymorphs on the crystal energy landscape. 332In systems such as ROY, some of the less-stable polymorphs observed experimentally are among the smoothest, suggesting that kinetics favors their formation.In contrast, several other pharmaceutical polymorphs that have been difficult to produce have both higher energies and high rugocities, suggesting that both thermodynamics and kinetics hinder crystallization.The data-driven generalized convex hull approach of Anelli et al. is another promising strategy for exploring which crystal structures might be experimentally synthesizable. 333inally, while predicting crystallization kinetics remains difficult, there has been progress in predicting the thermodynamic conditions under which a given polymorph will be preferred-i.e.polymorph phase diagrams.5][336][337][338] However, the situation is more difficult closer to ambient conditions, since the predicted phase transition temperatures can be extremely sensitive to small errors in the computed free energies. 339Despite these challenges, quite accurate temperature-dependent phase diagrams have been predicted for systems such as ice, 338 carbon dioxide, 336,337 methanol, 155 and resorcinol. 156Polymorph phase transition temperatures have also been predicted for more complicated drug species. 30,48However, one should bear in mind how important fortuitous error cancellation is in predicting phase boundaries.For example, a 1 kJ mol −1 error in the relative free energies between a and b methanol alters the predicted ambient-pressure phase-transition temperature by more than 200 K! 155 In the end, even if current CSP techniques cannot perfectly determine which crystal forms will be realized experimentally, they remain useful for assessing the polymorphic "risk" for a given species. 28,171This is particularly valuable for the pharmaceutical industry, as exemplied in the studies on galunisertib, 26 gandotinib, 25 hydrates, 340 and salts 177 discussed earlier.

Future outlook
Given the rapid developments in CSP over the last 5-10 years, it is interesting to speculate where new advances will occur over the next several years.First, it is likely that there will be increasing emphasis on using nite-temperature free energies instead of 0 K lattice energies for the nal rankings.This is already routinely being done in the pharmaceutical industry, where compute budgets are typically larger than in academia, and it is sometimes done in academic studies as well.Rapid improvements in machine learning potentials will likely also

Chemical Science Perspective
increase the accuracy with which those free energy simulations can be performed by enabling dynamics-based approaches to be used on potentials that approach quantum mechanical accuracy. 44,101Improved understanding of how the strengths and weaknesses of widely-used DFT-D methods (e.g.delocalization and other systematic errors) impact crystal energetics will make it easier to identify when crystal energy rankings are likely to be problematic.Second, interpretation of the crystal energy landscape will continue to gain in importance.Rapid developments in crystal energy landscape reduction are likely continue apace.Methods such as meta-dynamics or the threshold algorithm hopefully become much more widespread and routine.Once again, accurate, inexpensive potential energy models and structure clustering strategies based on ML should further improve the performance of these techniques. 44Improved uncertainty quantication for the computed structure energetics will also help users better assess the risks of predicted polymorphs on the landscape.Third, it is not unusual for a high-accuracy crystal structure prediction to cost one million CPU hours per species at present.Entering the era of CSP-driven rational design will place a greater emphasis on performing "reasonably reliable" CSPs that have orders of magnitude lower computational cost, such that candidate materials can be screened en masse.Such approaches could mean learning to extract useful information from imperfect crystal energy landscapes (as in the organic semi-conductor design study discussed in Section 4.6), developing new intermediate ranking and renement models (a.k.a.surrogate models 43 ) that more effectively lter structures to reduce the number of nal structures for which DFT-D calculations are needed, or even adopting entirely new data-driven topological approaches for generating short-lists of candidate structures quickly, without extensive hierarchical ltering algorithms. 44,341,342ourth, beyond merely predicting structures, rational design efforts will increase the emphasis on computing functional properties of the putative crystals.Examples for gas storage and separations, organic semi-conducting properties, and photomechanical responses were already mentioned above.Feng and co-workers recently computed the photoluminescence properties of ROY and co-crystals of 9-acetylanthracene to understand the interplay of intra-and intermolecular interactions. 208mproved ability to predict pharmaceutical solubilties (Section 4.8) or to assess the photostability of candidate formulations (Section 4.5) would be very useful as well.
Fih, as the applicability of CSP expands, there is also a clear need for the development of more user-friendly soware tools to democratize access to CSP.Current CSP is still almost exclusively performed by specialists.In academic research environments, CSP oen relies on a disjointed collection of soware packages and home-built scripts for passing structures between them and processing the results.CSP tools developed in industry are more user-friendly, though those companies oen cater more to larger budgets and computing capabilities of pharmaceutical companies than to smaller-scale academic research groups.Moreover, new method developments from different groups are not always widely/publicly available in the short-term.Of course, radical reductions in computational cost will also be needed to enable truly widespread use of CSP by non-expert practitioners.

Conclusions
In conclusion, crystal structure prediction has advanced dramatically to the point where experimental crystal structures can be predicted successfully much more oen than not.Applications of CSP have moved on from small-molecule benchmarks to real-world pharmaceutical formulations and functional organic materials.New frontiers are opening in areas such as the ability to use CSP to rationally design new materials with targetted properties or to model solid state chemical transformations.Identifying which predicted crystal structures can be made experimentally has been challenging, though good progress is being made there as well.Heinlein's dream of theory-driven materials design is quickly becoming reality, even if it is a couple decades late.

Chemical Science
Perspective Gregory J: O: BeranGregory Beran earned his PhD at the University of California Berkeley in 2005, working with Prof. Martin Head-Gordon, and performed postdoctoral research with Prof. William H. Green at the Massachusetts Institute of Technology 2005-2007.He started his independent career at the University of California Riverside in 2007, where he is currently a full professor.His research focuses on the development and application of electronic structure methods for non-covalent interactions and organic crystal structure prediction.

Fig. 3
Fig. 3 Some CSP procedures involve iterative cycles of force field fitting, structure prediction, and DFT-D structure refinement until the force field and DFT-D crystal energy landscapes are suitably consistent.Adapted with permission from ref. 48.Copyright 2020 American Chemical Society.

Fig. 5
Fig. 5 Free energy calculations on experimentally-known forms A and B of oxabispidine and several other predicted polymorphs find that form A only becomes the most stable form near room temperature.Adapted with permission from ref. 30.Copyright 2021 American Chemical Society.

Fig. 6
Fig.6Predicted crystal energy landscapes for (a) rotigotine 172 and (b) galunisertib.26Red points indicate experimentally-observed polymorphs.For rotigotine, a pair of static structures was identified for each of forms I and II which correspond to the two possible conformations of the disordered thiophene ring.The structures labeled "form III" for rotigotine and "GM" for galunisertib have not yet been found experimentally.
Fig.8After addressing the DFT delocalization error issues, the predicted crystal energy landscape of ROY shows that the lowest-energy polymorphs have already been discovered experimentally (red).Interestingly, the hypothetical rank #15 structure in blue is predicted to become the most stable structure near 10 GPa. Figure adapted from ref. 129.

Fig. 13
Fig. 13 Elimination of (a) lattice energy minima that are labile or effectively equivalent at (b) finite temperatures using molecular dynamics, clustering, and enhanced sampling techniques reduces the number of predicted ibuprofen crystal structures by 65%. Figure adapted from ref. 323.