Christina A.
Gatsiou
,
Claire S.
Adjiman
and
Constantinos C.
Pantelides
*
Molecular Systems Engineering Group, Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK. E-mail: c.pantelides@imperial.ac.uk
First published on 12th July 2018
In lattice energy models that combine ab initio and empirical components, it is important to ensure consistency between these components so that meaningful quantitative results are obtained. A method for deriving parameters of atom–atom repulsion dispersion potentials for crystals, tailored to different ab initio models, is presented. It is based on minimization of the sum of squared deviations between experimental and calculated structures and energies. The solution algorithm is designed to avoid convergence to local minima in the parameter space by combining a deterministic low-discrepancy sequence for the generation of multiple initial parameter guesses with an efficient local minimization algorithm. The proposed approach is applied to derive transferable exp-6 potential parameters suitable for use in conjunction with a distributed multipole electrostatics model derived from isolated molecule charge densities calculated at the M06/6-31G(d,p) level of theory. Data for hydrocarbons, azahydrocarbons, oxohydrocarbons, organosulphur compounds and chlorohydrocarbons are used for the estimation. A good fit is achieved for the new set of parameters with a mean absolute error in sublimation enthalpies of 4.1 kJ mol−1 and an average rmsd15 of 0.31 Å. The parameters are found to perform well on a separate cross-validation set of 39 compounds.
Notwithstanding this significant progress, the results of the blind test also demonstrated some outstanding challenges for CSP methods specifically relating to energy ranking. In some cases, experimentally-observed structures for a given organic compound were deemed to have too high a lattice energy to occur in nature, and were consequently discarded from further consideration. Even methods that are generally successful in identifying the experimentally-observed crystal structures often fail to predict the experimentally-observed relative stability order.2 While this may partly be due to the neglect of entropic contributions, the problem primarily arises from the insufficient accuracy of the lattice energy model in evaluating the energy differences between predicted structures, which can be as low as 1 kJ mol−1.3
A variety of models have been developed and used in CSP for the calculation of the lattice energy, ranging from empirical force fields to electronic structure methods. Dispersion-corrected Density Functional Theory (DFT) models4–6 have emerged as accurate methods for calculating energies,1 with benchmark studies suggesting an uncertainty of 3 kJ mol−1 for absolute lattice energies.7,8 However, the computational cost of these models precludes their use for the minimization of large numbers (tens or hundreds of thousands) of structures, a critical step in current CSP methodologies. On the other hand generic force fields such as COMPASS,9 CVFF10 and others are computationally efficient but of limited accuracy.11,12 In view of these factors, the development of force fields that bridge the gap between these extremes has been an important step for all successful CSP methods. In this context, the models used in the sixth blind test included tailor-made potentials fitted to periodic ab initio data,13 potentials derived from symmetry-adapted perturbation theory based on DFT (SAPT(DFT)) calculations,14 and hybrid models combining isolated-molecule electronic structure calculations with empirical repulsion–dispersion potentials.15–17 Further details of the different lattice energy models that have been used in CSP through the years and their performance can be found in the blind test papers18–22 and review papers.11,23
In the CSP approach developed in our group,24 different energy models are used for the global search step in which thousands of possible structures are generated (CrystalPredictor16,17,25,26) and in the refinement step in which the ranking of structures is finalized (CrystalOptimizer27,28). Recent developments in the model for the global search16,17 have led to significant improvements in the accuracy of the force fields for large flexible molecules (such as molecule XXVI in the sixth blind test); in general, all experimentally-relevant structures are typically identified within 20 kJ mol−1 of the global minimum and carried through to the refinement stage.16,17 The latter requires a higher degree of accuracy in the lattice energy. The intramolecular energy is calculated using local approximate models derived from the isolated-molecule ab initio energy27 computed at a chosen level of theory. The intermolecular energy is derived by combining electrostatic interactions represented by distributed multipoles15,29,30 derived from the isolated molecule charge density calculated at the same level of theory, and repulsion–dispersion interactions modelled by an empirical exp-6 potential with parameters taken from the literature such as the FIT potential,31–37 W01 (ref. 38) or other parameterizations.39
The values of the repulsion–dispersion potential parameters used in most current methodologies were originally estimated from experimental crystal data analysed using an atomic charge representation of electrostatics, with charges fitted to the HF/6-31G(d,p) charge density; intramolecular energy contributions were neglected.10,18,39–41 As a result, such parameters are not necessarily consistent with more sophisticated electrostatic models such as those based on multipoles. In addition, only parameters relating to the most common atoms (C, N, O, H, Cl, F) were derived from extensive datasets. For atoms such as sulphur or bromine, parameters from a variety of literature sources are currently being used, a practice that is likely to result in inaccuracies. Furthermore, even within the set of atoms considered in the FIT and W01 parameterizations, some cross interactions (e.g., O–N) were not fitted to experimental data; instead, they were approximated using geometric combining rules that are known to be of limited validity and may lead to inaccurate predictions.42
Overall, as has been stated elsewhere,13 force fields should be carefully parameterized so that the different terms complement each other. In practical terms, repulsion–dispersion potentials need to be reparameterized for the specific electrostatics and intramolecular energy models with which they are intended to be used. There has been a recent effort42 in this direction through the partial reparameterization of the FIT set using the B3LYP/6-31G** and B3LYP/6-311G** levels of theory and multipole expansions for the electrostatics. Focusing on O and N parameters, the work led to encouraging results, with experimental unit cells being reproduced to within 3% and hydrogen bond geometries and sublimation enthalpies between 7.4 and 9.0%.
In this paper, we present a systematic approach for the reparameterization of the repulsion–dispersion term using experimental structural and sublimation enthalpy data. The proposed algorithm can easily be used to derive empirical parameters consistent with different electrostatic models or levels of theory. Here, its applicability is demonstrated by deriving a set of exp-6 repulsion–dispersion potential parameters for C, H, N, O, S and Cl consistent with electrostatic interactions described by distributed multipoles derived from the M06/6-31G(d,p) level of theory. For the first time, this leads to parameters for sulphur that are consistent with the remaining parameter set and this is shown to lead to a significant improvement in performance.
• The unit cell lengths and angles, collectively denoted by X.
• The positions of the centres of mass and the orientation of chemical entities within the unit cell, collectively denoted by β.
• The molecular conformational degrees of freedom (CDFs), i.e. bond lengths, angles and torsions, collectively denoted by θ.
It is also useful to introduce two dependent quantities, i.e. variables that are functions of the independent variables:
• Q(θ) denotes the set of parameters in the distributed multipoles43 describing the molecule’s electrostatic field for a given molecular conformation θ.
• Y(θ,β) denotes the Cartesian coordinates of all atoms in the unit cell, which depend on the molecular positions and orientations β and the molecular conformations θ.
Ulatt(X,β,θ) = ΔUintra(θ) + Ue(Y,X,Q) + Urd(Y,X) | (1) |
The intramolecular contribution ΔUintra(θ) in the above expression is typically computed via a QM evaluation of the conformational energy at given θ.
The intermolecular electrostatic model is based on a distributed multipole representation of electrostatics, placing an expansion up to the hexadecapole moment at each atomic position (as in DMACRYS15 and hence CrystalOptimizer27). The GDMA program30 is used to derive the parameters Q(θ) from the isolated molecule wavefunction;29 the latter is also determined by the QM calculation used for the computation of ΔUintra(θ) at given θ. The intermolecular electrostatic energy, Ue, is calculated as the sum of interactions between multipoles on atomic sites of different molecules. Such a representation of electrostatics has been shown to be successful in predicting highly-directional (anisotropic) lone-pair interactions, π–π stacking in aromatic rings and hydrogen bond geometries in organic crystals.44–46
The repulsion–dispersion component of the intermolecular energy, Urd, is calculated as the sum of all repulsion–dispersion interactions within a predefined cutoff. Each interaction, Urdij, between two atoms of types i and j occurring in two different molecules and separated by a distance r is described through an isotropic atom–atom Buckingham potential:
(2) |
(3) |
Finally, in calculating Ue and Urd, summations of intermolecular atom–atom interactions between the central unit cell and other cells in the crystal must be computed. This is done via a combination of direct and Ewald summations within a predefined cutoff which is set to 15 Å by default.15
Ulatt(X,β;θgas) = Ue(Y,X,Q(θgas)) + Urd(Y,X,p) | (4) |
To assess the validity of the lattice energy model, and in particular of the parameter values p, we seek a local energy minimum close to the given experimental structure, (Xexp,βexp). The lattice energy computed for a given p, denoted by , is given by:
(5) |
The quantities and (X*,β*) form the basis for comparison between predicted and experimental structures, as discussed in the next section.
In order to minimize the potential effects of experimental errors, our study applied the following criteria in selecting appropriate experimental crystallographic data from the CSD:
1. Relatively rigid molecules should be chosen.
2. Solvate and hydrate crystals are avoided.
3. Structures at high pressures (above ambient pressure) and above room temperature are rejected.
4. Single-crystal diffraction data are preferable over powder diffraction data.
5. Structures resolved via neutron scattering are preferred when available.
6. Structures resolved with XRD with an X-ray discrepancy factor (R) exceeding 5% are normally rejected, unless their sublimation enthalpies are also available or other criteria are met.
Several distance metrics are available for quantifying the deviation between two given crystal structures.51 In our work, the deviation between experimental and predicted structures for a crystal i is described by a metric Gi(p) that consists of two contributions. The first contribution measures the relative deviations between the experimental and predicted unit cell parameter vectors, Xexpi and respectively:
(6) |
(7) |
The contribution of atomic positions to the deviation measure Gi(p) between experimental and predicted structures for crystal i is then given by:
(8) |
The calculation of the relative position vectors is described in detail in the ESI.† In order to avoid introducing an inherent offset in the deviation metric Gi(p), the fractional atomic coordinates in the experimental crystal are computed based on the predicted (rather than the experimental) molecular conformation. This is appropriate provided that the differences between θgas and θexp are small, a condition that needs to be verified when constructing the experimental dataset.
Overall, the function Gi(p) used as a measure of similarity between experimental and predicted structures is defined as:
(9) |
It is worth noting that the above expression aims to balance the contributions of various terms via the application of appropriate scaling factors. Thus, relative (rather than absolute) deviations are used for the unit cell parameters. No such scaling is needed for the position vectors as they are already normalized. Moreover, since the two summations may comprise very different numbers of terms, the first one is divided by the number of unit cell parameters not constrained by symmetry, NXi, and the second one by the number of fractional coordinates describing the asymmetric unit, 3(Nasymi−1).
The quantity Gi(p) is directly incorporated in the objective function of the parameter estimation problem. In analysing the quality of the results of the parameter estimation, the root mean squared deviation of the 15 molecule coordination sphere, rmsd15,52 is used as an additional measure of similarity between predicted and experimental crystals.
A calculated enthalpy of sublimation at temperature T and pressure P, , can be obtained from the predicted lattice energy by the following commonly-used approximation:56–58
(10) |
In previous studies by Arnautova et al.59 and Gavezzotti,60 computed lattice energies were directly compared to experimental sublimation enthalpies without any correction, an approach justified by the authors on the basis of uncertainty in the sublimation enthalpy data. We prefer to apply the correction shown in eqn (10) in order to reduce any systematic error in the comparison between predicted and experimental values. Thus, we employ the following deviation function as a measure of the difference between the experimental and predicted sublimation enthalpies for a given crystal i:
(11) |
(12) |
In general, we seek to establish the values of p that lie within given lower and upper bounds, pl and pu respectively, and lead to a globally minimal value of R(p). To compute R(p) at a given set of parameter values p, we need to perform the lattice energy minimization (5) for each crystal structure i ∈ IG starting from the corresponding experimental structure. Once the optimal solution is obtained, we can compute the geometry deviation Gi(p) and, if i ∈ IE, the sublimation enthalpy deviation Ei(p). The quantities Gi(p) and Ei(p) can then be used to compute R(p) viaeqn (12). Overall, this is a relatively expensive computation as it involves a large number of crystal structure minimizations, but these can be carried out in parallel.
A more serious challenge for the reliable solution of problem (12) is that the objective function R(p) may have multiple local minima, in which case the solution obtained via the application of standard local minimization algorithms may depend on the starting points (i.e. the initial guesses) for the parameters p. In order to increase the likelihood of identifying the globally optimal parameter vector, we adopt a step-wise approach involving the generation of many starting points in the space of parameters p, followed by a gradient-based local minimization from selected starting points:
1. Compute the reference quantity R(pFIT) where pFIT are the values of the parameters in the existing FIT parameterization.
2. Generate N points pk, k = 1, …, N, in the parameter space by means of a Sobol’ sequence61,62 within the defined bounds [pl,pu]. The advantages of low-discrepancy (such as Sobol’) sequences over other methods (e.g. Monte Carlo sampling or uniform grids) for efficiently searching multivariable domains are well established.63
3. For each point pk, evaluate the corresponding R(pk) as described above; discard all points for which R(pk) exceeds the reference value R(pFIT) by more than a specified factor α (e.g. 5) as these are unlikely to yield parameter estimates that are better than the FIT one.
4. Starting from pk as the initial guess, perform the local minimization (eqn (12)) to obtain a (locally) optimal parameter estimate p*,k with a corresponding objective function value R(p*,k). As described in the ESI,† a sequential quadratic programming (SQP) algorithm with a Broyden–Fletcher–Goldfarb–Shanno (BFGS)64 update of the Hessian matrix is used for this purpose.
5. Choose the solution p*,k with the lowest corresponding value R(p*,k) as the best parameter estimate p*.
A flowchart of the overall parameter estimation methodology is shown in Fig. 1.
The quality of geometry reproduction is assessed based on the usual rmsd15,i measure between an experimental and predicted crystal i calculated using the COMPACK52 code. We also consider the rmsd15 averaged over a set of crystal structures.
For assessing the quality of sublimation enthalpy prediction for each individual crystal i, we use the Absolute Deviation (ADi) defined as:
(13) |
(14) |
(15) |
We generally aim to derive parameter estimates for which the average values AAD and rmsd15 do not exceed 5 kJ mol−1 and 1 Å respectively.22
An approach commonly used for reducing the parameter space is to use combining rules for the parameters describing interactions between atoms of different types. The use of such combining rules for the dispersion coefficient does have some physical justification at the level of the Unsöld approximation to the rij−6 dispersion. However, other rules are known to be limited in their accuracy,65 and their use may lead to lower quality of predictions in crystal structure calculations.32,42,66 Therefore, in this study we prefer not to use combining rules, relying instead on the estimation of the cross-interaction parameters directly from experimental data.
On the other hand, and in common with the approach adopted by Pyzer-Knapp et al.,42 we do hold the exponential Bij parameters fixed at the FIT values: the shape of the potential is very sensitive to even small changes in these parameters, which renders their numerical handling as part of the parameter estimation procedure problematic.
Finally, in order to make the problem computationally manageable and ensure compatibility and transferability of the different sets of parameters, the parameter estimation is carried out in a sequential manner as shown schematically in Fig. 2. Similarly to the work of Sun,67 optimal parameters for carbon and hydrogen (Hc) are determined first based on experimental data for hydrocarbons. These are then fixed for the subsequent steps, and interaction parameters for other atom pairs are estimated from data on appropriate compounds.
We then perform an isolated-molecule gas-phase QM calculation for each molecule using Gaussian09 (ref. 68) with the M06 (ref. 69) functional and a 6-31G(d,p) basis set. The resulting wavefunctions are used to generate multipole moments via the distributed multipole analysis method43 as implemented in the GDMA 2.2 program.43
We now fix the three Bij parameters to their FIT values, and aim to determine optimal estimates for the 6 Aij and Cij parameters; the latter constitute our parameter vector p. For the purposes of this estimation, we will allow p to vary within ±40% of the FIT values. As explained in Section 3.3, in order to increase the probability of obtaining globally optimal parameter estimates, we use a Sobol’ sequence to generate 1000 starting points pk, k = 1, …, 1000.
Before proceeding with actually performing local minimizations of function R(p) (cf.eqn (12)) starting from some of the above points, it is instructive to gain some insight as to the appropriate choices of the weighting factors wGi and wEi. For simplicity, we use the same weight values (wG or wE) for all 19 structures i. We consider 15 distinct combinations of these weights as shown in Table 1, and evaluate the objective function R(pk) for each such combination. The 19000 local lattice energy minimizations required for evaluating R(pk), k = 1, …, 1000 are performed with DMACRYS at 0 Pa pressure.
w G | w E |
---|---|
100 | 1 |
50 | 1 |
20 | 1 |
15 | 1 |
10 | 1 |
5 | 1 |
2 | 1 |
1 | 1 |
1 | 2 |
1 | 5 |
1 | 10 |
1 | 15 |
1 | 20 |
1 | 50 |
1 | 100 |
For each weight combination, having evaluated the objective function R(pk) at the 1000 Sobol’ points pk, we select the Sobol’ point with the lowest value of the objective function. This serves as an approximation to the optimal parameter estimate p* that might be obtained using that particular combination of weights. The solutions arising from different weight combinations are compared in Fig. 3 based on the geometry and sublimation deviation functions, G(p*) and E(p*), averaged over the 19 crystal structures under consideration.
The maximum and minimum values of Gi(p*) across all 19 crystal structures i are also depicted on the horizontal axis of Fig. 3 while the maximum and minimum Ei(p*) are shown against the vertical axis. We note that several weight ratios lead to the same solution; in fact, only 4 distinct solutions are obtained, as depicted by the solid symbols in Fig. 3. Based on these results, the weights wG = 1 and wE = 1 appear to offer both the best trade-off between geometry and energy errors, and the smallest variation of these errors across the different structures being considered. Accordingly, these values are chosen to be used throughout our parameter estimation methodology in this paper.
The values of R(pk) for the weights wG = wE = 1 at the 1000 points pk are plotted in Fig. 4. The corresponding reference value R(pFIT) for the FIT parameter values pFIT is also shown. We note that the FIT parameter point, pFIT, lies among the lowest points in this figure, which suggests that the FIT parameter values are already good estimates, a result that was expected given the similarity between the training set used in this work for hydrocarbons and that used by Williams et al.31
The FIT reference value R(pFIT) allows us to prune out any Sobol’ points that are unlikely to result in an improved fit of the experimental data over the already available FIT parameters (cf. step 5 of the algorithm in Fig. 1). Here we apply a cutoff ratio α of approximately 5, which excludes all but 250 of the original Sobol’ points from further consideration. We then solve the optimization problem expressed by eqn (12) starting from each of the remaining 250 points. The initial objective function values R(pk) and the final objective function values after local minimization, R(p*,k), are shown in Fig. 5. It can be seen that the minimization does result in a significant reduction of the objective function R(p). The results suggest a “flat” objective function with many broad and shallow minima; however, it is also possible that some of the points are not true local minima, but simply arise due to premature termination of the challenging local minimization.
The parameter values p* resulting in the lowest objective function value are given in Table 2 where they are also compared with the original FIT parameters. A comparison of the corresponding objective function values is presented in Table 3. The objective function value for p* is 63% lower than the corresponding value for the FIT parameters. The new parameters also achieve 23% and 72% reductions in the geometry and energy deviation functions respectively. Overall, the new parameters yield a significantly better reproduction of both the geometries and energies of the crystals in the training set even in the case of such a relatively simple and well-studied parameter set. Further details of the quality of fit for each individual structure are presented in Table S4 of the ESI.†
Gas-phase molecular conformations are computed at the M06/6-31G(d,p) level of theory. The rigidity of the molecules is ascertained through rmsd1 comparisons between computed and experimental conformations with COMPACK, giving an average rmsd1 of only 0.05 Å. In a previous study by Arnautova et al.,59 the C–H bond lengths of the experimental conformations were adjusted to the average experimental value from neutron diffraction data to overcome issues of unreliable hydrogen positions determined by X-ray diffraction. We found the C–H bond lengths in the gas-phase conformations to be approximately 1.087 Å, which is very close to the average experimental value of 1.089 Å derived from neutron data given in the CSD. Therefore, no modification was applied to the C–H bond lengths.
Experimentally determined enthalpies of sublimation were identified for 53 of the 106 molecules, which is significantly more than the number used in the parameterization of FIT. The sublimation enthalpies used in this work are given in Table S1 of the ESI.†
Atom type | FITa | This work | |||||
---|---|---|---|---|---|---|---|
i | j | A ij /eV | B ij /Å | C ij /eV Å6 | A ij /eV | B ij /Å | C ij /eV Å6 |
a Parameters from (ref. 37). | |||||||
C | C | 3832.147 | 0.278 | 25.287 | 2299.288 | 0.278 | 18.020 |
C | HC | 689.537 | 0.272 | 5.979 | 449.3172 | 0.272 | 6.265 |
HC | HC | 124.072 | 0.267 | 1.414 | 173.559 | 0.267 | 1.957 |
C | N | 3179.515 | 0.271 | 19.007 | 3355.322 | 0.271 | 24.171 |
N | N | 2638.029 | 0.265 | 14.286 | 3687.092 | 0.265 | 17.577 |
HC | N | 572.105 | 0.266 | 4.494 | 516.587 | 0.266 | 2.696 |
C | O | 3022.85 | 0.265 | 17.16 | 1352.853 | 0.265 | 10.164 |
HC | O | 543.916 | 0.260 | 4.057 | 304.084 | 0.260 | 2.481 |
O | O | 2384.466 | 0.253 | 11.645 | 963.846 | 0.253 | 18.632 |
C | S | 3990.989 | 0.290 | 38.957 | 2819.374 | 0.290 | 46.499 |
N | S | 3311.305 | 0.282 | 29.281 | 3760.816 | 0.282 | 31.148 |
HC | S | 718.118 | 0.284 | 9.211 | 802.131 | 0.284 | 9.891 |
S | S | 4156.415 | 0.303 | 60.016 | 3401.359 | 0.303 | 95.872 |
C | HN | 446.952 | 0.242 | 2.374 | 478.038 | 0.242 | 3.561 |
HC | HN | 80.422 | 0.238 | 0.561 | 40.211 | 0.238 | 0.842 |
N | HN | 370.834 | 0.237 | 1.784 | 297.752 | 0.237 | 0.892 |
HN | HN | 52.129 | 0.215 | 0.223 | 78.194 | 0.215 | 0.111 |
HO | C | 446.952 | 0.242 | 2.374 | 519.271 | 0.242 | 2.488 |
HO | HC | 80.422 | 0.238 | 0.561 | 128.676 | 0.238 | 0.224 |
HO | O | 352.562 | 0.232 | 1.611 | 231.169 | 0.232 | 0.644 |
HO | HO | 52.129 | 0.215 | 0.223 | 61.708 | 0.215 | 0.089 |
C | Cl | 6060.173 | 0.281 | 45.040 | 2627.877 | 0.290 | 39.868 |
HC | Cl | 1090.436 | 0.276 | 10.650 | 1371.320 | 0.276 | 17.708 |
Cl | Cl | 9583.584 | 0.285 | 80.224 | 7940.587 | 0.285 | 67.687 |
Atom type | S-Ab | S-No | |||||
---|---|---|---|---|---|---|---|
i | j | A ij /eV | B ij /Å | C ij /eV Å6 | A ij /eV | B ij /Å | C ij /eV Å6 |
C | S | 2623.650 | 0.308 | 55.295 | 3064.776 | 0.308 | 46.911 |
N | S | 2176.830 | 0.299 | 41.562 | 2542.830 | 0.299 | 35.260 |
HC | S | 472.092 | 0.301 | 13.074 | 551.467 | 0.301 | 11.092 |
S | S | 1796.306 | 0.345 | 120.918 | 2451.128 | 0.345 | 87.029 |
The optimized parameters differ significantly from the original set. However, because of the high degree of correlation between the Aij and Cij parameters, large changes in their individual values do not necessarily result in a large change in the potential. Fig. 6 shows the differences in equilibrium distances and well depths of the potentials. For most pairwise interactions, these differences are relatively small. A striking exception is the O⋯O interaction for which we observe a significant change in the potential well depth and location of the minimum (cf.Fig. 7a), with the new parameters indicating stronger and shorter-range interactions. On the other hand, the differences between our potentials for the interactions of oxygen with other atoms and the corresponding FIT potentials are not very pronounced. This is an indication of the inaccuracies that may be introduced by relying on combining rules for estimating cross-interaction parameters.
Fig. 6 Comparison of equilibrium distances and well depths for 24 interactions as calculated with FIT (light grey bars) and our new parameter set (dark grey bars). The data along with the results for the S-Ab and S-No parameter sets are given in Table S3 of the ESI.† |
Fig. 7 Repulsion–dispersion potential curves for (a) O⋯O, (b) S⋯S and (c) C⋯S interactions. The optimized potential curves (continuous curves) are compared with the FIT potential curves (dashed curves) and with S-No70 () and S-Ab71 (). |
Shorter equilibrium distances and deeper wells are also observed for all S interactions apart from the HC⋯S one (cf.Fig. 6 and 7b and c). We note that significant differences also exist among the FIT, S-Ab and S-No parameter sets as shown in Fig. S3 of the ESI.†
Both oxygen interactions and sulphur interactions were fitted to a number of sublimation enthalpy data. We note that no such data were used in the derivation of the FIT set. The resulting improvement in sublimation enthalpy prediction demonstrates the impact of using energetic data to fit the potentials.
Training set | Absolute sublimation enthalpy error (kJ mol−1) | |||||||
---|---|---|---|---|---|---|---|---|
FIT | This work | S-Ab | S-No | |||||
AAD | maxAD | AAD | maxAD | AAD | maxAD | AAD | maxAD | |
Hydrocarbons | 2.66 | 4.83 | 1.47 | 4.08 | ||||
Azahydrocarbons | 3.43 | 6.80 | 1.28 | 3.56 | ||||
Oxohydrocarbons | 9.46 | 18.22 | 6.88 | 16.66 | ||||
Organosulphur compounds | 20.97 | 38.71 | 5.29 | 10.44 | 7.94 | 16.74 | 25.73 | 46.85 |
Amines, imidazoles | 11.00 | 19.40 | 9.17 | 13.67 | ||||
Carboxylic acids, alcohols | 13.98 | 24.93 | 5.94 | 10.50 | ||||
Chlorohydrocarbons | 7.30 | 8.90 | 0.15 | 0.46 | ||||
Overall | 8.78 | 38.71 | 4.11 | 16.66 |
Absolute rmsd15 error (Å) | ||||||||
---|---|---|---|---|---|---|---|---|
FIT | This work | S-Ab | S-No | |||||
Average | Max | Average | Max | Average | Max | Average | Max | |
Hydrocarbons | 0.349 | 0.922 | 0.285 | 0.985 | ||||
Azahydrocarbons | 0.223 | 0.747 | 0.231 | 0.631 | ||||
Oxohydrocarbons | 0.342 | 0.728 | 0.306 | 0.808 | ||||
Organosulphur compounds | 0.321 | 1.047 | 0.326 | 1.047 | 0.298 | 0.996 | 0.430 | 1.195 |
Amines, imidazoles | 0.289 | 0.581 | 0.291 | 0.640 | ||||
Carboxylic acids, alcohols | 0.336 | 0.868 | 0.434 | 0.875 | ||||
Chlorohydrocarbons | 0.246 | 0.529 | 0.263 | 0.735 | ||||
Overall | 0.305 | 1.047 | 0.309 | 1.047 |
In addition to the training set, we employ a cross-validation set consisting of 39 crystal structures for representative molecules for all the classes of compounds considered here (see Table S2 in the ESI†). For six of these structures, the sublimation enthalpy is also available. The criteria for the choice of the cross-validation set were similar to those for the training set (cf. Section 3.1) although, due to the limited availability of such data for rigid molecules, the requirements regarding experimental error were less stringent in some cases. The sublimation enthalpy results reported in Table 8 indicate similar improvements to those for the training set (cf.Table 7). However, the sublimation enthalpy AAD of 7.87 kJ mol−1 is larger than the corresponding value for the training set. This may be due to the relative sparsity of energetic data, which leads to reduced statistical significance for the parameters. We also note that some of the experimental sublimation enthalpy values in the cross-validation set are less reliable than the enthalpies reported for the compounds in the training set.
Training set | Absolute sublimation enthalpy error (kJ mol−1) | |||||||
---|---|---|---|---|---|---|---|---|
FIT | This work | S-Ab | S-No | |||||
AAD | maxAD | AAD | maxAD | AAD | maxAD | AAD | maxAD | |
Hydrocarbons | 8.65 | 13.49 | 8.31 | 16.82 | ||||
Azahydrocarbons | 9.51 | 9.51 | 7.49 | 7.49 | ||||
Amines, imidazoles | 11.45 | 11.45 | 4.91 | 4.91 | ||||
Carboxylic acids, alcohols | 23.26 | 23.26 | 9.87 | 9.87 | ||||
Overall | 11.69 | 23.26 | 7.87 | 16.82 |
Absolute rmsd15 error (Å) | ||||||||
---|---|---|---|---|---|---|---|---|
FIT | This work | S-Ab | S-No | |||||
Average | Max | Average | Max | Average | Max | Average | Max | |
Hydrocarbons | 0.201 | 0.286 | 0.169 | 0.285 | ||||
Azahydrocarbons | 0.402 | 0.747 | 0.317 | 0.433 | ||||
Oxohydrocarbons | 0.147 | 0.188 | 0.197 | 0.289 | ||||
Organosulphur compounds | 0.239 | 0.340 | 0.275 | 0.499 | 0.221 | 0.390 | 0.276 | 0.450 |
Amines, imidazoles | 0.327 | 0.574 | 0.275 | 0.377 | ||||
Carboxylic acids, alcohols | 0.526 | 1.828 | 0.579 | 1.537 | ||||
Chlorohydrocarbons | 0.246 | 0.316 | 0.235 | 0.311 | ||||
Overall | 0.294 | 1.828 | 0.282 | 1.537 |
Fig. 8a shows parity plots between experimental and computed sublimation enthalpies for both the training and cross-validation sets. A significant improvement over the FIT set is evident across both sets. Fig. 8b focuses on the organosulphur compounds. It is clear that the accuracy of sublimation enthalpy predictions varies substantially across the previously derived parameter sets (FIT, S-No, and S-Ab). The improved predictive accuracy of the new parameters is important as the study of molecular crystals containing sulphur has so far been hampered by the limited availability of reliable transferable parameters.
As indicated by the entries in the last rows of Tables 7 and 8, our new parameter set does not have a clear overall advantage over FIT in terms of rmsd15. This is partly explained by the fact that we used a different metric for geometry in our objective function. In our methodology, improved geometry reproduction could be achieved by setting wG ≫ wE), thereby placing greater importance on the geometry terms than the energy terms. However, the results presented here demonstrate that using equal weights leads to a significant increase in the accuracy of the computed sublimation enthalpies while maintaining good accuracy in geometry reproduction. In particular, as shown in Fig. 9, the new parameters lead to fewer points showing large errors in the sublimation enthalpy (>5 kJ mol−1) or/and in the rmsd15 (>0.5 Å).
The estimation of the repulsion–dispersion parameters in the lattice energy model has been formulated mathematically as a weighted least-squares minimization of the deviation between the computed and experimentally measured crystal geometries and sublimation enthalpies. Important aspects of our formulation are the weighting and scaling schemes used for balancing the contributions of different elements of the experimental dataset. The proposed solution algorithm aims to mitigate the large computational cost associated with this problem and to reduce the risk of the minimization being trapped in local minima.
The feasibility and effectiveness of the proposed approach were demonstrated by the estimation of a set of atom–atom interaction parameters for the isotropic Buckingham potential, consistent with distributed multipole electrostatics derived from charge densities computed via isolated-molecule quantum mechanical calculations at the M06/6-31G(d,p) level of theory.|| Interaction parameters between unlike atom types were also included directly in the estimation instead of being approximated via combining rules. The study reported here focused on organic molecular crystals with N, O, S and Cl atoms, and made use of a training dataset containing 106 crystal geometries and the sublimation enthalpies for 53 of these structures. A separate set of 39 structures (including 6 sublimation enthalpy values) was used for cross-validation of the derived parameter estimates. Overall, compared to the commonly used FIT parameter set, the new parameter values were found to result in significantly improved sublimation enthalpy predictions while maintaining a comparable quality of geometry reproduction.
In the work reported here, the Bij parameters of the Buckingham potential were kept fixed at the corresponding FIT values. This was mainly done in order to avoid the numerical problems caused by the unphysical predictions of this potential at low interatomic separation distances. Including these parameters in the estimation set could result in improved predictive accuracy. However, with the availability of systematic ways of estimating reliable and consistent parameter values for any form of repulsion–dispersion potential, it may be worth considering alternative potential forms that do not exhibit such unphysical behavior in the first place.
Footnotes |
† Electronic supplementary information (ESI) available: Further information on calculation methods, training and cross-validation data sets, detailed description of the new potentials and results of crystal packing calculations with all parameter sets. See DOI: 10.1039/c8fd00064f |
‡ Fractional coordinates can be computed directly from the Cartesian coordinates Y and the unit cell parameters X, i.e. Y′ = Y′(X,Y). |
§ The first heavy atom listed in the SHELX file. |
¶ And, in the case of flexible molecules, the model used for intramolecular energy contributions. |
|| Parameter values consistent with QM calculations at the HF/6-31G(d,p) and MP2/6-31G(d,p) levels of theory are reported elsewhere.72 |
This journal is © The Royal Society of Chemistry 2018 |