P. Cysewski
Chair and Department of Physical Chemistry, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-950 Bydgoszcz, Poland. E-mail: piotr.cysewski@cm.umk.pl
First published on 13th October 2015
A computationally inexpensive screening tool was formulated that allows for rational pre-selection of components as the most promising candidates for the synthesis of two-component cocrystals. The proposed procedure relies on a scoring function quantifying the dissimilarity of 2D histograms, which describe the distributions of heat of formation for the two counterparts of a database consisting of 1599 cocrystals and 492 pairs immiscible in the solid state. It has been observed that a higher probability of cocrystal formation is to be expected if the components differ in their heat of formation (Hf) values and at least one of the components is hydrophilic. This observation was also validated on an additional 1590 cocrystals not included in the training set. In contrast, a high probability of simple eutectic formation is expected for components with highly similar and positive values of Hf. Based on the formulated scoring function, the regions of the highest observed probability of cocrystallization can be identified. Although the proposed phenomenological approach cannot provide an absolute prediction of cocrystal formation, it does offer simple and practical guidance for rationalizing the selection of potential components.
According to one of the most useful definitions,29,30 a cocrystal is a homogeneous crystalline solid containing stoichiometric amounts of discrete neutral molecular species that are solids under ambient conditions. Cocrystals are also expected to be more advantageous than salts, especially when ionizable functional groups are absent. Additionally, the number of potentially useful neutral components included in the International Food Additive Database or the Codex General Standard for Food Additives31 far exceeds the number of available counterions.
In this context, it is understandable that a great diversity of experimental techniques has been developed for cocrystal synthesis, including direct mechanochemical approaches,32–34 sonication,35 solvent evaporation36,37 (including the microfluidic method38), many variations of slurry methods39 and melting techniques.40 Droplet evaporation and sample orientation41,42 can also offer some advantages for cocrystal screening.43 However, the selection of potential cocrystal formers is not obvious, especially for pharmaceutically acceptable heteromolecular complexes.
Thus, it is essential to develop theoretical strategies for identifying the most promising candidates for cocrystallization before the actual synthesis. This need was underlined by the formulation of a practically sensible screening paradigm,4 which calls for a computationally inexpensive pre-screening tool that allows for rational selection of chemical species as the most promising candidates for cocrystal synthesis. Many theoretical approaches have been formulated for this purpose. Those ignoring the details of the crystal lattice seem especially valuable in this context, since they allow for broad and relatively inexpensive scans of cocrystallization landscapes. For example, the electrostatic potential surface of the molecule was used for identification of the most likely contacts between components.44 More recently, the mixing enthalpy of supercooled coformers with a given stoichiometry was used for screening of API cocrystallization.45 The Hansen solubility parameters46–48 were also applied, as they require only knowledge of the chemical structure of the interacting components. The concept of supramolecular phenomena49,50 has also proven to be a valuable guide for the practical application of crystal engineering principles,51 revealing major driving forces toward molecular compound formation.52 For example, a synthon design approach demonstrated the preference for heterosynthons over homosynthons.53,54 Prediction of hydrogen-bonding propensities based on Cambridge Structural Database (CSD)55 statistics was proven to be very effective.56,57 Alternatively, a semiquantitative model for predicting the chance of cocrystallization was formulated58 in terms of a statistical analysis of the distributions of molecular descriptors. The detailed analysis of known cocrystals led to the identification of shape and polarity similarities between components involved in structures deposited in the CSD.
Such a methodology, relying on finding statistically significant patterns adopted by a particular set of structures, forms the background of this paper. However, instead of focusing exclusively on cocrystals, attention was also extended to simple binary eutectic mixtures as negative cases of intermolecular complexation. Such a complementary approach can provide more details on the similarities between the formers of either system. The application of the structure-to-property approach often adopted in chemoinformatics requires the definition of a scoring function. This element, which previous studies lacked, is addressed in this paper, and the factors discriminating between the two system types are quantified in order to guide the selection of the most promising candidates for screening of cocrystals or simple eutectics.
The positive counterpart of the database (further denoted as (+)) was built from cocrystals with solved structures found in the Cambridge Structural Database (CSD)55 and from other sources providing binary phase diagrams. Organometallic compounds were excluded along with solvates, hydrates, clathrates, ions and polymers. The positive part of the database comprises 1636 pairs, and the final list of considered compounds reached 1001 constituents. They show a great diversity of physicochemical properties, ranging from non-polar species such as aromatic hydrocarbons to highly polar carboxylic acids, amides or alcohols.
All molecules were optimized with the PM7 semiempirical method59 combined with the implicit COSMO solvation model60 for aqueous solution. These computations were performed with the aid of the MOPAC2012 program.61 Based on the obtained geometries, single-point computations were carried out to obtain various molecular descriptors. The full list of molecular descriptors derived from MOPAC2012,61 NBO62,63 (G0964) or AIMALL65 outputs is provided in the supplementary materials (Table S2, ESI†).
The obtained distributions were analyzed in terms of normalized 2D histograms, for which contour maps were plotted using SigmaPlot (ver. 12.3). To make the description independent of the numbering of components, the number of pairs was doubled by adding each pair with the order of its constituents reversed. Since the distributions usually failed to fulfil the requirement of normality, Spearman's nonparametric correlation coefficient was used as a measure of correlation. It is based on the ranking of values rather than on the mean and standard deviation used by the more common Pearson correlation coefficient.
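The histogram construction and the rank correlation described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the procedure actually used in the paper: the function names, the min-max normalization and the default of 20 bins are assumptions (min-max scaling is at least consistent with εHOMO,N = 0.75 quoted for −9.1 eV within the range −12.43 to −7.98 eV).

```python
from collections import Counter

def normalize(values):
    """Min-max scale descriptor values to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def symmetrized_histogram(pairs, bins=20):
    """Normalized 2D histogram of (x, y) pairs with x, y in [0, 1].

    Every pair is counted twice, as (x, y) and as (y, x), so the result
    does not depend on which component is listed first.
    """
    doubled = pairs + [(y, x) for x, y in pairs]
    counts = Counter()
    for x, y in doubled:
        i = min(int(x * bins), bins - 1)  # a value of 1.0 goes to the last bin
        j = min(int(y * bins), bins - 1)
        counts[(i, j)] += 1
    total = len(doubled)
    return [[counts[(i, j)] / total for j in range(bins)] for i in range(bins)]

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result

def spearman(xs, ys):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Working on ranks is what makes the coefficient insensitive to the non-normal shape of the descriptor distributions. The sketch does not compute P values.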
To quantify the difference between the patterns provided by the 2D histograms characterizing the two counterparts of the database, the root mean square deviation (rmsd) was computed for each molecular descriptor distribution and converted into a similarity parameter according to the formula s = 100%/(1 + rmsd). Based on the similarity values, the molecular indices were classified according to their potential for discriminating two-component cocrystals from simple binary eutectic mixtures.
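As a minimal sketch of this measure, the snippet below computes the rmsd over all cells of two equally sized 2D histograms and converts it to the similarity s. The function names are illustrative, and the units of the histogram cells are an assumption left to the caller, since the text does not state them.

```python
import math

def rmsd(hist_a, hist_b):
    """Root mean square deviation over all cells of two equally sized 2D histograms."""
    diffs = [a - b
             for row_a, row_b in zip(hist_a, hist_b)
             for a, b in zip(row_a, row_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def similarity(rmsd_value):
    """Similarity in percent, s = 100% / (1 + rmsd)."""
    return 100.0 / (1.0 + rmsd_value)

# rmsd values reported in the text for two descriptor combinations
print(round(similarity(0.183), 1))  # 84.5
print(round(similarity(0.181), 1))  # 84.7
```

Identical histograms give rmsd = 0 and s = 100%, so lower s values indicate descriptors that separate the two counterparts of the database better.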
Fig. 1 The distributions of normalized values of heat of formation (Hf) characterizing compounds involved in cocrystallization (left) and not forming a molecular complex in the solid state (right).
However, the Spearman rank correlation coefficient for pairs involved in simple eutectic mixtures is as high as σ(−) = 0.533 (P < 0.05), while for pairs forming cocrystals it equals σ(+) = −0.258 (P < 0.05). It is interesting to note the opposite sign of the correlation coefficient in the latter case. For binary eutectic mixtures, the moderate positive trend suggests that similarity in terms of heat of formation is one of the most important factors contributing to immiscibility in the solid state. As presented in Fig. 1, the most frequently occurring pairs involved in binary eutectics span quite a small range of Hf. In contrast, the negative sign of the Spearman rank correlation coefficient for cocrystals points to dissimilarity of the cocrystal formers as a driving factor for cocrystallization. This statistical observation is further illustrated by the patterns of the most frequently appearing pairs of cocrystals. On the 2D histogram one can find a maximum in which one coordinate lies almost in the same place as for binary eutectics, characterizing the hydrophobic nature of one of the co-formers, while the value for the other component is off-diagonal and corresponds to a more hydrophilic character. There are also many cocrystals for which both reagents have negative values of Hf. However, this is typical of systems in which the heat of formation values of the two constituents differ. Thus, it seems that one direct indicator of the possibility of cocrystallization is a sufficiently high affinity of at least one of the compounds toward polar solution.
The second molecular descriptor able to discriminate cocrystals from simple eutectics is the energy of the highest occupied molecular orbital, εHOMO. Fig. 2 presents contour maps portraying its distributions for both sets of analyzed binary mixtures. All molecules considered here are characterized by εHOMO between −12.43 and −7.98 eV. Again, quite distinct patterns are observed for the two counterparts of the database. Although the correlation between the HOMO energies of compounds involved in simple eutectics is quite low, σ(−) = 0.289 (P < 0.05), there is a strong peak on the 2D histogram suggesting a preferred range of εHOMO values close to −9.1 eV (εHOMO,N = 0.75). Immiscibility in the solid phase is rarely associated with strong differences in εHOMO between the two components. For interacting monomers involved in cocrystals the correlation between compounds is even weaker, σ(+) = −0.153 (P < 0.05), and the 2D pattern is quite complex. This indicates a great diversity of molecules exhibiting high affinity for intermolecular interactions. Since εHOMO quantifies the electron-donating ability, an opposite trend is to be expected in the case of intermolecular compounds. In general, however, εHOMO seems to have a lower potential for distinguishing cocrystals from simple eutectics than Hf. There are also other molecular descriptors which exhibit some ability to predict the chance of cocrystallization or simple eutectic formation. However, they are either correlated with each other or give similarities of the corresponding 2D histograms higher than 85%. Illustrative plots of the distributions of these alternative indices are provided in the supporting material as Fig. S2 and S3 (ESI†). Furthermore, the use of two different descriptors on the abscissa and the ordinate was also considered.
The use of εHOMO and M/H results in rmsd = 0.183 and similarity s = 84.5%. Alternatively, the selection of Hf and εHOMO provides rmsd = 0.181 and similarity s = 84.7%. These plots are provided in the supporting material as Fig. S4 and S5 (ESI†). However, none of these combinations of molecular descriptors explains the differences between the two counterparts of the database better than Hf alone. Hence, there is no gain in extending the number of parameters on the 2D histograms, and the heat of formation in aqueous solution, as a single-parameter description, can serve as quite an effective measure of the affinities of the components.
Fig. 2 The distributions of HOMO energies (εHOMO) characterizing compounds involved in cocrystallization (left) and not forming a molecular complex in the solid state (right).
The distributions of the heat of formation for the two analyzed counterparts of the database can be used to formulate a rule for pre-selection of candidates for cocrystallization. Regions of dissimilarity between the 2D histograms prepared for positive and negative cases of cocrystallization were examined simply by calculating the difference between the distributions presented in Fig. 1. This leads to the matrix presented in Fig. 3, which defines the scoring function. Each square, representing a region of Hf with 5% × 5% resolution, contains a Δp number which defines the relative probability of occurrence of pairs within the given range of Hf values. The graded intensity of the red color indicates an increasing chance of cocrystallization, while the more intense the green color, the more probable is the formation of a simple eutectic by a given pair of components. The scoring function provided in Fig. 3 can be used for selecting compounds with the highest chance of either molecular complex formation or immiscibility in the solid state. The applicability of the proposed rule is demonstrated in Fig. 4, where the percentage of pairs selected from the whole database is plotted against Δp. Changing this criterion imposes restrictions on the properties of the components (Hf in this case) and can help in selecting compounds with a high chance of cocrystallization or of simple binary eutectic formation. The black line represents positive cases of cocrystallization, while grey denotes the negative counterpart of the database. For cocrystals, one expects to find regions in which the chance of success is maximized and, concurrently, the probability of failure is minimized. The percentage of selected structures is expressed with respect to the whole database (the total number of pairs in the training set is equal to 492 + 1636 = 2128).
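The construction of such a scoring matrix can be illustrated with a short Python sketch. It assumes that both histograms are normalized to unit sum, that Δp is taken as the (+) histogram minus the (−) histogram expressed in percent (so that positive values favour cocrystals, in line with the colour coding described above), and that the Hf values have been scaled to [0, 1]. The function names and the toy 2 × 2 histograms are hypothetical; a 5% × 5% resolution would correspond to a 20 × 20 matrix.

```python
def delta_p(hist_pos, hist_neg):
    """Cell-wise difference, in percent, between the (+) and (-) histograms.

    Positive cells favour cocrystals; negative cells favour simple eutectics.
    """
    return [[100.0 * (p - n) for p, n in zip(row_p, row_n)]
            for row_p, row_n in zip(hist_pos, hist_neg)]

def score_pair(dp, hf_a, hf_b):
    """Look up the score of a pair from its normalized Hf values in [0, 1]."""
    bins = len(dp)
    i = min(int(hf_a * bins), bins - 1)  # a value of 1.0 goes to the last bin
    j = min(int(hf_b * bins), bins - 1)
    return dp[i][j]

def passes(dp, hf_a, hf_b, threshold=0.15):
    """True if the pair lies in a region with a score of at least `threshold` percent."""
    return score_pair(dp, hf_a, hf_b) >= threshold

# Toy 2 x 2 histograms: cocrystals concentrate off the diagonal,
# eutectics on the diagonal (hypothetical numbers for illustration only).
toy_pos = [[0.1, 0.4], [0.4, 0.1]]
toy_neg = [[0.5, 0.0], [0.0, 0.5]]
toy_dp = delta_p(toy_pos, toy_neg)

print(passes(toy_dp, 0.2, 0.8))  # dissimilar components -> True
print(passes(toy_dp, 0.2, 0.3))  # similar components -> False
```

Raising the threshold shrinks the accepted region of the Hf plane, which is the trade-off between excluded eutectics and lost cocrystals that the selection curves describe.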
Obviously, if no restrictions are imposed on the pairs, the chance of randomly picking a cocrystal is equal to 1636/2128 = 76.9% and the corresponding chance of failure is 492/2128 = 23.1%. After application of the scoring function, one expects to find regions containing as many cocrystals and as few simple binary eutectics as practically possible. As documented in Fig. 4, the exclusion criterion, Δp, significantly affects the number of considered compounds, and too restrictive a threshold can exclude too many potential pairs. In general this is not very limiting, since one can consider many other potential cocrystals.31 For cocrystal screening, one can adopt Δp ≥ 0.15%, in which case the number of positive cases fulfilling this requirement drops to 1018. Correspondingly, the number of simple eutectic systems found in the same range of Hf values is reduced to only 83. Thus, the chance of failure, expressed with respect to the whole database, is reduced from 23.1% to 83/2128 = 3.9%. This is taken as the uncertainty of the proposed pre-selection rule for screening of cocrystals. Thus, selecting pairs of co-formers whose heats of formation fall in the range fulfilling the requirement Δp ≥ 0.15% leads to quite an acceptable success ratio with a reasonably low chance of failure. Of course, one can increase the Δp criterion, but the gain in reducing the number of simple eutectics will not be worth the exclusion of too many potentially cocrystallizing pairs.
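The percentages quoted above follow directly from the pair counts, as the short check below shows. The variable names are illustrative. The last two ratios (the fraction of cocrystals retained and the failure rate within the selected subset) are not quoted in the text and are included only as additional context derived from the same counts.

```python
n_pos, n_neg = 1636, 492   # cocrystals and simple eutectics in the training set
total = n_pos + n_neg      # 2128 pairs

baseline_success = 100.0 * n_pos / total
baseline_failure = 100.0 * n_neg / total

sel_pos, sel_neg = 1018, 83   # pairs remaining for delta-p >= 0.15%
failure_after = 100.0 * sel_neg / total

# Derived here, not quoted in the text:
retained_pos = 100.0 * sel_pos / n_pos
failure_within_selection = 100.0 * sel_neg / (sel_pos + sel_neg)

print(f"{baseline_success:.1f} {baseline_failure:.1f}")      # 76.9 23.1
print(f"{failure_after:.1f}")                                 # 3.9
print(f"{retained_pos:.1f} {failure_within_selection:.1f}")   # 62.2 7.5
```

Note that the 3.9% figure is normalized by the whole database of 2128 pairs, in keeping with the way the selected percentages are expressed; relative to the 1101 selected pairs alone, the share of eutectics is about 7.5%.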
Fig. 4 The percentage of structures selected from the database as a function of the relative probability of cocrystallization (Δp) after application of the scoring function defined in Fig. 3. The numbers of structures provided correspond to Δp = 0.15.
It is interesting to see how the proposed selection rule performs for compounds not included in the training set. For this purpose, cocrystals deposited in the CSD that were not used in the definition of the scoring function were selected. Four categories were used, namely all available aromatic carboxylic acids (ArCOOH), aliphatic carboxylic acids (AlCOOH), aromatic amides (ArCONH2) and 400 other systems randomly selected from all 7688 known binary cocrystals deposited in the CSD. The resulting distributions are presented in Fig. 5. First of all, the contour maps exhibit essentially the same patterns as observed for the compounds constituting the training set. The percentage of structures predicted by the rule utilizing Hf is on average 62%. Although the ArCONH2 set performs worst of all those considered, it is also the smallest pool. Bearing in mind the simplicity of the proposed pre-selection rule, this success rate is surprisingly good. It is also worth underlining that, from a practical point of view, the elimination of many positive cases from the analysis is not very limiting, since a sufficient number of possible pairs31 still remains for which the screening can be performed with high efficiency.
Fig. 5 The applications of selection rules for picking of components in regions of the highest probability of cocrystallization based on scoring function defined in Fig. 3.
Based on the formulated scoring function, the regions of the highest observed probability of cocrystallization can be identified. The procedure proposed here has several advantages. First of all, it allows for direct and practical applications in the presented form, which will be documented in a forthcoming paper. The heat of formation, computed at an inexpensive quantum chemistry level, can be easily estimated for a variety of even sizable chemical systems without sophisticated computational resources. Besides, the procedure is flexible, and new molecular descriptors with possibly even better discriminating potential can be used in the same manner as the heat of formation, which is worth further exploration. Furthermore, extending the database will eventually lead to better tuning of the scoring function, which seems to be a natural continuation of this study on structure-to-property relationships revealed through diversities and similarities of components. Such a strategy is commonly applied in chemoinformatics and might be helpful as an initial step before actual experiments. Since the collected database comprises 1636 cocrystals and 492 binary eutectic systems, it seems to be quite representative. Interestingly, extending the number of cocrystals up to 3226 cases not included in the training set does not change the observed patterns of the Hf distributions. Although the proposed model-based approach cannot provide an absolute prediction of cocrystals or simple eutectic mixtures, it is of practical importance since it offers guidance for rationalizing the selection of potential cocrystal formers.
Footnote
† Electronic supplementary information (ESI) available: List of content of the negative counterpart of the database, definition of molecular descriptors and additional contour maps. See DOI: 10.1039/c5nj02013a
This journal is © The Royal Society of Chemistry and the Centre National de la Recherche Scientifique 2016