Marcus A.
Neumann
* and
Jacco
van de Streek
Avant-garde Materials Simulation, Alte Str. 2, 79249 Merzhausen, Germany. E-mail: marcus.neumann@avmatsim.eu
First published on 17th May 2018
Based on a thorough and critical analysis of the commercial crystal structure prediction studies of 41 pharmaceutical compounds, we conclude that for between 15 and 45% of all small-molecule drugs currently on the market the most stable experimentally observed polymorph is not the thermodynamically most stable crystal structure and that the appearance of the latter is kinetically hindered.
Crystal structure prediction (CSP) attempts to give an overview of all relevant polymorphs based solely on their thermodynamic stability, i.e. irrespective of possible kinetic considerations. It therefore comes as no surprise that the pharmaceutical industry is adopting a combination of experimental and computational polymorph screens to prevent disappearing polymorph accidents in the future.4
This, then, suggests an obvious approach to our question “How many ritonavir cases are there still out there?”: use CSP to process a large number of small molecules with at least one known crystal structure and count in how many cases, according to the CSP results, the thermodynamically most stable crystal form has been missed.
It is not that straightforward, however. First, the answer to our question requires statistics over a significant number of compounds, not merely a case study. Second, the compounds must be relevant for the problem at hand: pharmaceutical compounds tend to be large and flexible, whereas most published CSP studies report results for small compounds of limited flexibility. Third, a thorough experimental polymorph screen must have been conducted for each of the compounds to ensure that missed polymorphs are due to consistently hindered kinetics and not merely a one-off. Fourth, the error bars on the lattice energies of the CSP studies must be known (or, as a minimum, a reasonable estimate must be available) to distinguish between those cases where the prediction of a thermodynamically more stable form is due to a missed experimental form and those cases where the prediction of a thermodynamically more stable form is merely caused by an incorrect assessment of the relative stabilities of the predicted crystal structures by the energy potential used. Fifth, the structure generation step must be complete enough to render statistics regarding the possibility of additional polymorphs meaningful.
To assess the feasibility of current CSP methodologies to answer the question in the title, it pays to take a closer look at the results of the periodically published crystal structure prediction blind tests. CSP blind tests have been organised in 1999, 2001, 2004, 2007, 2010 and 2015. The results can be found in the most recent publication5 and references therein. For each blind test, crystallographers are invited to submit unpublished experimental crystal structures, the molecular 2D diagrams for which are then sent out to the participants together with a deadline for the submission of their predictions. The target crystal structures are guaranteed to be free from disorder and certain limits on the chemical elements, space-group symmetry and number of molecules in the asymmetric unit are also specified. After the submission deadline, the experimental crystal structures are made public and the computational results are published in a joint publication. In earlier blind tests, the compounds were small, rigid, neutral molecules, but the compounds became increasingly more complex as time progressed (and methods improved). For the purpose of our discussion, we will focus on two aspects: the results for all small, rigid, neutral molecules and the results of the most recent (2015) blind test.5
In gathering our statistics for large, flexible molecules, we are hampered by three error sources that may be hard to disentangle: the error bars on the lattice energies, the completeness of the computational structure generation step and the completeness of the experimental polymorph screen (which we presume to be hindered by kinetics). When investigating the relationship between ease of crystallisation and molecular descriptors, it was found that the smaller and the more rigid a molecule, the easier it crystallises.6 For small, rigid, neutral molecules we can therefore expect both the computational and the experimental polymorph screens to be essentially complete, at least significantly more complete than those for large, flexible molecules. So as a first approximation we can make the assumption that the number of missed most stable polymorphs for small, rigid, neutral molecules (both computationally and experimentally) is equal to zero, and consequently any discrepancies between the experimental and the predicted results can be attributed solely to shortcomings in the energy potential. The results for all small, rigid, neutral CSP blind test molecules should therefore tell us something about the error bar on our energy potential. Focussing exclusively on the results obtained with our own method, the results are reported in four papers7–10 and in total provide data for 17 small molecules. All 17 experimental target structures were successfully located in the structure generation steps. Thirteen times an experimental structure is identified as the global minimum, three times an experimental structure is ranked second and once an experimental structure is fourth in the list of predicted structures. The energy differences between the predicted polymorphs are very small, of the order of 0.1 kcal mol−1, suggesting error bars for our energies of the order of 0.1 kcal mol−1 for small, rigid, neutral molecules. However, although these results provide a lower limit for the estimated standard deviation of our energy function, small, rigid, neutral molecules are not what we are interested in. In order to answer our question in the title, we must establish how current methods perform for compounds that reflect the complexities of pharmaceutically active molecules.
The most recent CSP blind test featured four target compounds of a complexity approaching that found in pharmaceutical compounds (Fig. 1). For targets XXIV, XXV and XXVI (the molecules have been numbered consecutively since the first blind test in 1999), one polymorph is known, each with one molecule in the asymmetric unit. For target XXIII, five polymorphs are known, of which two contain two molecules in the asymmetric unit. Twenty-five research groups participated in the blind test, providing a good overview of the state-of-the-art in molecular crystal structure prediction. First, let’s take a look at the results of the other 24 participants, keeping in mind the requirements that will lead us to our goal of using the state-of-the-art in molecular crystal structure prediction to answer the question in the title. Of these 24 participants, four did not have access to a structure generation tool of their own and relied on the list of generated structures of one of the other participants. Of the remaining 20 participating research groups (still excluding our own contribution), only seven attempted all four compounds, and only two attempted the difficult Z′ = 2 polymorphs. In other words, of the 24 research groups involved in the development of methods for crystal structure prediction, only two submitted candidate structures for target molecule XXIII with two molecules in the asymmetric unit. Adding together all polymorphs, eight distinct experimental structures should be found for these four compounds. The best results were obtained by the research group of Prof. Price, who were the only group to successfully predict four of the eight crystal structures, including one of the two Z′ = 2 polymorphs. The latest CSP blind test therefore paints a very sobering picture: large-scale prediction of the crystal structures of large, flexible molecules with one and two molecules in the asymmetric unit is hard and is currently beyond the capability of most CSP methods, either due to limitations in the available hardware or due to limitations in the available software. In addition, the blind test compounds, with the exception of molecule XXIII, have not been experimentally screened for polymorphism.
Fig. 1 The four large and flexible chemical compounds that were provided as targets in the last crystal structure prediction blind test. |
The results with our in-house developed software for crystal structure prediction and our in-house hardware were more promising: seven of the eight experimentally observed crystal structures were generated in our searches, missing out only one of the more challenging Z′ = 2 targets. Furthermore, the compounds we are sent to investigate have always been thoroughly experimentally screened. These results should stand us in good stead to use our crystal structure prediction technology to address the question in the title: “How many ritonavir cases are there still out there?”.
The last CSP blind test revealed that one group had succeeded in the development of a more accurate method for the energy ranking of molecular crystals.18 The new method, here called PBE(0) + MBD + Fvib, combines energy calculations with the PBE functional with a method for the calculation of Many-Body Dispersion (MBD), a PBE0 correction to include exact exchange and a calculation of the vibrational free energy in the harmonic approximation. By comparison with PBE(0) + MBD + Fvib we will assess the energy error of our own energy potential.
The crystal structure generation algorithm is described in some detail elsewhere.20 In brief, a parallel-tempering algorithm is used for a given number of space groups with a fixed number of molecules in the asymmetric unit. Crystal structures are generated in an energy window equal to four times the estimated standard deviation of the tailor-made force field with respect to the DFT-D energies. During the search, the completeness is estimated by counting how often each unique structure has been generated, and the search is continued until it is estimated that 99% of the structures in the search space have been generated.
Fig. 2 Two molecules (bicalutamide and crizotinib) illustrating the average size and flexibility of the 41 compounds used in this study. |
Fig. 4 Histogram of the number of non-trivial flexible torsion angles in the 41 compounds of our test set. Each flexible ring is counted as two flexible torsion angles. |
Fig. 5 Example energy landscape. Each dot represents a predicted crystal structure. Two predicted structures that have been matched with experimental structures are indicated by red circles. |
The distribution of the number of experimentally observed forms over the test set is shown in Fig. 6.
In order to construct a computational equivalent of the empirical histogram shown in Fig. 8 for a given set of parameters, we use a Monte Carlo procedure implemented in a computer program. The procedure generates 41 random energy landscapes (Fig. 5; the densities do not play a role and are ignored), which are combined into a histogram measuring the energy differences between the most stable predicted structure and the most stable experimental structure, analogous to Fig. 8 (see below in Results and discussion). For each set of parameter values, the random generation of 41 landscapes is repeated 100 times, generating 100 histograms. The 100 histograms are then averaged and the standard deviation for each is calculated. For each set of parameters, the average histogram is compared to the actual histogram resulting from our 41 crystal structure prediction studies (Fig. 8). To avoid confusion, we call the histogram in Fig. 8 the empirical histogram, as opposed to a single, randomly calculated histogram or the average calculated histogram for a set of parameters (we realise that the adjective “empirical” is somewhat ironic considering that the construction of the empirical histogram involved about six years of calculations on a high-performance computing cluster).
We now discuss the parameters of the model. The energy error, σ, is meant to be the root-mean-square deviation between the energies of lattice energy minima calculated at the DFT-D level and the true lattice free energies in the real world. The hindered fraction, h, is the fraction of lattice energy minima for which the appearance is strongly kinetically hindered. By definition, h is also the fraction of the thermodynamically most stable crystal structures that do not readily crystallise, i.e. the main quantity that we are trying to bracket in this paper. Finally, a parameter b is required that is the factor in the exponent of an exponential function describing the rapid increase of the effective number of lattice energy minima with increasing lattice energy.
The notion of the effective number of lattice energy minima, Neff, requires some further discussion. Crystal energy landscapes typically contain many structures that are too similar to be observed independently. We refer to such groups of similar structures as “families” and the effective number of lattice energy minima counts the number of families rather than the number of individual family members. Currently the decision whether two predicted crystal structures belong to the same family is taken based on manual examination of the structures. The differences between family members can be subtle, such as a different orientation of a terminal ethyl group, a small unit cell change or a small molecular displacement. Or there may be structures with two molecules per asymmetric unit that almost match a structure with only one molecule in the asymmetric unit. In such cases thermal motion will either average over the different configurations or result in rapid conversion to the most stable one. But family members may also only share a rod or sheet motif held together by strong interactions. In such cases, each family member can act as a nucleation substrate for all others and the crystal growth process will typically favour the thermodynamically most stable family member. Therefore, at a given pressure and temperature, only one family member will typically be observed experimentally. The stability ranking of family members may change as a function of pressure and temperature and more than one family member may be experimentally observable. The estimation of the average number of members per family will be discussed in the Results section. Our statistical analysis is based on the number of families rather than on the number of individual lattice energy minima for three reasons:
(1) The energy error (DFT-D versus reality) is typically smaller for members belonging to the same family than for members belonging to different families. In each family, structures are similar and error compensation is strong. The energy error σ can be expected to be dominated by the inter-family error.
(2) Many family members are too similar to be observed independently.
(3) Even if family members can be observed independently because they have different p–T stability domains, they must be expected to be either all kinetically hindered or not kinetically hindered, because of their structural similarity and the possibility of cross-nucleation.
The expectation value of the effective number of lattice energy minima per energy interval is assumed to be A·exp(b·E)·dE, were E is the energy per chemical unit. Without loss of generality, the prefactor A is adjusted to yield an expectation value of 10 in the energy interval from 0 to 1 kcal mol−1 (note that 0.0 kcal mol−1 does not correspond to the energy of the most stable structure). The energy range from −5.0 to 5.0 kcal mol−1 is subdivided into 100000 intervals. To each interval, a number of structures is randomly attributed following a Poisson distribution that yields the correct mean value according to the exponential function mentioned above. Structures are randomly marked to have kinetically hindered dynamics according to the hindered fraction h. The number of experimentally observed polymorphs, Np, is randomly chosen according to the experimental distribution for the 41 test cases shown in Fig. 6. The first Np non-hindered low energy structures in the 100000 intervals are marked as experimental. Finally, the energy of each structure is randomly shifted following a normal distribution with a standard deviation that corresponds to the energy error σ.
To measure the deviation between the empirical histogram and an average histogram calculated from 100 randomly generated histograms, we use a Figure Of Merit (FOM) that is the sum over the absolute differences between the number of structures in each bin:
(1) |
Fig. 7 Fluctuation of the 100 randomly calculated histograms for σ = 0.5 kcal mol−1, b = 1.609 mol kcal−1 and h = 0.3 (cf. Fig. 9b). |
(2) |
The results are presented in Table 1. Compound XXV was not included in the comparison because PBE(0) + MBD + Fvib energies were available for only 4 structures, some of which are co-crystals and others are salts. Our DFT-D method is known to suffer from an increased energy error when it comes to the comparison of co-crystal and salt structures, but this is irrelevant for the present analysis because none of the 41 test cases presents this kind of ambiguity.
Compound | N | σ rel | σ tot |
---|---|---|---|
XXIII | 46 | 0.44 | 0.51 |
XXIV | 8 | 0.57 | 0.62 |
XXVI | 8 | 0.26 | 0.36 |
The average total error for our method over compounds XXIII, XXIV and XXVI is 0.5 kcal mol−1. With 26 and 40 non-hydrogen atoms, respectively, compounds XXIII and XXVI match the size range of the 41 test compounds. The flexibility of both compounds is slightly below average. With only 11 non-hydrogen atoms, compound XXIV is much smaller than any compound in the test set, but being the hydrate of a chloride salt, lattice energy calculations for compound XXIV are much more challenging than for most other structures in the test set. Overall, it can be assumed that the average energy error obtained for compounds XXIII, XXIV and XXVI is fairly representative for our test set. It is important to note that the error σtot in Table 1 compares the energies of minima on DFT-D lattice energy surfaces to the true lattice free energies. Our average total error of 0.5 kcal mol−1 hence includes not only the inherent inaccuracy of DFT-D, but also the neglect of zero-point vibrations and entropic effects.
For an intuitive assessment of the number of missing thermodynamically stable forms, one may argue that, considering the low number of test cases, a more stable form has certainly been missed when the energy gap exceeds two standard deviations of the energy error. According to Fig. 8, this is the case for 6 out of 41 cases. For energy gaps of between one and two standard deviations, one may roughly assign half of the cases to the energy error, and half of the cases to a missing form problem. In the case of Fig. 8 that would add another 3.5 missing forms, resulting in a total of 9.5 out of 41 (23%). We will see in the next section that this value is in good agreement with a more careful statistical analysis.
In order to support our statement that energy gaps above two standard deviations of the energy error are likely to indicate missing, more stable forms, the energy differences between the first and the late discovered polymorphs of ritonavir and rotigotine are indicated in Fig. 8 by green arrows (no crystal structure prediction studies have been carried out). If an energy landscape had been calculated for these two compounds after the discovery of the first appearing polymorph but before the discovery of the late appearing form, the energy gap would have been as indicated in Fig. 8, or greater.
As an example, Fig. 9 shows the average calculated histograms for the two parameter sets presented in Table 2. Both parameter sets are in equally good agreement with the empirical histogram. However, in one case the energy error is 0.75 kcal mol−1 and the hindered fraction is 0.0, whereas in the other case the energy error is 0.5 kcal mol−1 and the hindered fraction is 0.3. We note that because disappearing polymorph cases such as rotigotine, ranitidine and ritonavir are known to exist, our a priori knowledge tells us that it is not possible for h to be 0.0. We have shown both cases as a warning that the estimation of the number of missing thermodynamically most stable forms derived in this paper strongly depends on the estimated energy accuracy.
Fig. 9 The two histograms corresponding to the parameter sets from Table 2. |
Set | σ [kcal mol−1] | b [mol kcal−1] | h | FOM |
---|---|---|---|---|
a | 0.75 | 1.609 | 0.0 | 5.5 |
b | 0.5 | 1.609 | 0.3 | 6.2 |
Fig. 7 shows the randomly calculated histograms for parameter set b. Some of these histograms actually have structures in the bin at 3.75 kcal mol−1, and what looks like an outlier in Fig. 8 and 9 is actually in agreement with the statistical model.
When the energy error σ and the exponential factor b go up, the hindered fraction has to go down in order to maintain the same level of agreement with the empirical histogram. Above we have already derived an estimate for the energy error. In order to estimate the hindered fraction h, we now also need to bracket the exponential factor b. The exponential factor b determines the effective number of lattice minima that are found on average in a 1 kcal mol−1 window above the most stable structure (see Table 3). We compare these values to the full number of lattice energy minima found for the 41 test compounds. Fig. 10 shows the distribution of the full number of lattice energy minima in a 1 kcal mol−1 window. The average value is 16.7 and goes down to about 14 if the outlier is not considered. The average full number of lattice energy minima, objectively determined from the computed crystal energy landscapes of the 41 test compounds, needs to be converted to an estimate of the effective number of lattice energy minima. This conversion is inevitably subjective at present, because there is no generally accepted measure of how similar structures need to be in order to belong to the same family. According to our experience, which is probably shared by other practitioners of CSP, families have on average about two members in the bottom 1 kcal mol−1 window, but certainly not less than 1.5 or more than 3. Using these last two values, we expect the effective number of lattice energy minima to be in the range of 5 to 10, which corresponds to exponential factors in the range from 1.61 to 2.30.
b [mol kcal−1] | N eff |
---|---|
0.693 | 2.3 |
0.125 | 3.5 |
1.61 | 5.0 |
2.01 | 7.5 |
2.30 | 10 |
2.70 | 15 |
3.00 | 20 |
Table 4 presents figures of merit for σ = 0.5 kcal mol−1 and various values of the exponential factor b and the hindered fraction h. Parameter sets for which the figure of merit is lower than are marked by a grey background in the table. Considering only the range from 1.61 to 2.30 for the exponential factor, we conclude that the hindered fraction h must be in the range from 0.15–0.45.
In addition, it is desirable to reassess the energy landscapes of the 41 test compounds at the PBE(0) + MBD + Fvib level of theory, and to benchmark both DFT-D and PBE(0) + MBD + Fvib against experimental free energy differences between crystal polymorphs for compounds that are similar to the test set in terms of size and flexibility.
This journal is © The Royal Society of Chemistry 2018 |