Sadra
Kashefolgheta
a,
Shuzhe
Wang
a,
William E.
Acree
Jr.
b and
Philippe H.
Hünenberger
*a
aLaboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland. E-mail: phil@igc.phys.chem.ethz.ch; Tel: +41 44 632 55 03
bDepartment of Chemistry, University of North Texas, 1155 Union Circle Drive #305070, Denton, Texas 76203, USA
First published on 30th April 2021
Experimental solvation free energies are nowadays commonly included as target properties in the validation of condensed-phase force fields, sometimes even in their calibration. In a previous article [Kashefolgheta et al., J. Chem. Theory. Comput., 2020, 16, 7556–7580], we showed how the involved comparison between experimental and simulation results could be made more systematic by considering a full matrix of cross-solvation free energies . For a set of N molecules that are all in the liquid state under ambient conditions, such a matrix encompasses N × N entries for
considering each of the N molecules either as solute (A) or as solvent (B). In the quoted study, a cross-solvation matrix of 25 × 25 experimental
value was introduced, considering 25 small molecules representative for alkanes, chloroalkanes, ethers, ketones, esters, alcohols, amines, and amides. This experimental data was used to compare the relative accuracies of four popular condensed-phase force fields, namely GROMOS-2016H66, OPLS-AA, AMBER-GAFF, and CHARMM-CGenFF. In the present work, the comparison is extended to five additional force fields, namely GROMOS-54A7, GROMOS-ATB, OPLS-LBCC, AMBER-GAFF2, and OpenFF. Considering these nine force fields, the correlation coefficients between experimental values and simulation results range from 0.76 to 0.88, the root-mean-square errors (RMSEs) from 2.9 to 4.8 kJ mol−1, and average errors (AVEEs) from −1.5 to +1.0 kJ mol−1. In terms of RMSEs, GROMOS-2016H66 and OPLS-AA present the best accuracy (2.9 kJ mol−1), followed by OPLS-LBCC, AMBER-GAFF2, AMBER-GAFF, and OpenFF (3.3 to 3.6 kJ mol−1), and then by GROMOS-54A7, CHARM-CGenFF, and GROMOS-ATB (4.0 to 4.8 kJ mol−1). These differences are statistically significant but not very pronounced, and are distributed rather heterogeneously over the set of compounds within the different force fields.
In a recent article,43 we reported on an attempt to make this approach more systematic, by introducing a full matrix of standard Gibbs cross-solvation free energies . For a set of N molecules that are all in the liquid state under ambient conditions, such a matrix encompasses N × N entries for
considering each of the N molecules either as solute (A) or as solvent (B). The point-to-point or Ben–Naim standard-state convention44–46 was adopted, which implies that the same reference molar volumes are employed for the ideal-gas and the ideal-solution states. In this convention,
corresponds to the reversible work for transferring one molecule of A from a fixed point in vacuum (infinitesimal-pressure limit) to a fixed point in the bulk of the solvent B (infinite-dilution limit), expressed on a per-mole basis. The transfer is performed at a constant temperature T− = 298.15 K (both phases) and at a constant pressure Po = 1 bar (solution phase). Note that these cross-solvation free energies include the self-solvation free energies
as special cases, along the diagonal of the matrix.
A set of N = 25 organic molecules were considered, shown in Fig. 1 along with the acronyms used to refer to them in the article (see also Table 1 in ref. 43 for key properties of these compounds). These molecules involve one to seven carbon atoms and are representative for alkanes, chloroalkanes, ethers, ketones, esters, alcohols, amines, and amides. The set is divided into three categories based on the molecule polarity (as estimated by its molecular dipole moment), namely low-polarity (LP), medium-polarity (MP), or high-polarity (HP). Based on seven experimental data sources,47–53 and after careful data curation (see Appendix A in ref. 43), a complete matrix of 625 entries was constructed, which is shown in Fig. 2a along with standard deviations σ over the available experimental estimates in Fig. 2b (see Table S1 in ref. 43 for the numerical values; the corresponding data files, labelled version 1.1, are also freely downloadable from the net under ref. 54).
![]() | ||
Fig. 1 Molecular structures, acronyms, and identifiers of the 25 organic molecules considered in this work. The three- to five-letter acronyms are used to refer to each molecule in the text. For water, the acronym H2O is sometimes replaced by the specification of a water model, namely the simple point charge water model116 (SPC) or the three-point transferable intermolecular potential TIP3P model117,118 (TP3). The identifiers are the International Union of Pure and Applied Chemistry (IUPAC) name and the Chemical Abstract Service (CAS) registry number. Some key experimental properties (molecular dipole moment μ; melting temperature Tm, and boiling temperature Tb at 1 bar; liquid density ρliq, vaporization enthalpy ΔHvap, and static relative dielectric permittivity ε at 298.15 K and 1 bar) can be found in Table 1 of ref. 43. The molecules are listed in order of decreasing polarity, as estimated by μ, and assigned to three categories of polarities, labelled high (HP), medium (MP), or low (LP) polarity. |
![]() | ||
Fig. 2 Recommended experimental values ![]() |
In the previous article,43 this matrix of experimental values was used to compare the relative accuracies of four popular condensed-phase force fields, namely GROMOS-2016H66 (ref. 32), CHARMM-CGenFF (ref. 55 and 56), OPLS-AA (ref. 57–64), and AMBER-GAFF (ref. 65 and 66). In broad terms, and in spite of very different functional-form choices and parametrization strategies, the four force fields were found to perform similarly well. Relative to the experimental values, the root-mean-square errors (RMSEs) ranged between 2.9 and 4.0 kJ mol−1 (lowest value of 2.9 for GROMOS-2016H66 and OPLS-AA), and the average errors (AVEEs) ranged between −0.8 and +1.0 kJ mol−1 (lowest magnitude of 0.2 for CHARMM-CGenFF and AMBER-GAFF). These differences are statistically significant but not very pronounced, especially considering the influence of outliers, some of which possibly caused by inaccurate experimental data.
In the present study, we extend the comparison to five additional parameter sets, namely GROMOS-54A7 (ref. 27), GROMOS-ATB (ref. 36 and 67), OPLS-LBCC (ref. 57, 58 and 68; the OPLS-AA force field57–64 with 1.14 × CM1A-LBCC charges68), AMBER-GAFF2 (ref. 66 and 69; GAFF2 as distributed within the Antechamber package70 in AmberTools16;71 see also ref. 37 and 72–75 for recent validation work), and OpenFF (ref. 72, 76 and 77; also accessible under ref. 78). The three GROMOS sets rely on a united-atom representation of the aliphatic groups, whereas all the other sets correspond to all-atom force fields.
Fam. | Set | AA | New | EL Meth. | Charge Deriv. | 3rd Nei. EL | LJ Meth. | Comb. Rules | 3rd Nei. LJ | H2O | NS | N FULL | N COMP | N CONF |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GROMOS | 2016H66 | — | — | RF 1.4 nm | Fit to exp. | Normal | Cutoff 1.4 nm | Geometric Mean | Spec. Set | SPC | — | 25 × 25 | 22 × 22 − 2 | 340 |
No Corr. | ||||||||||||||
GROMOS | 54A7 | — | × | RF 1.4 nm | Fit to exp. | Normal | Cutoff 1.4 nm | Geometric Mean | Spec. Set | SPC | C2CL2 | 22 × 22 | 20 × 20 − 1 | 300 |
No Corr. | DAMD, PYRI | |||||||||||||
GROMOS | ATB | — | × | RF 1.4 nm | Fit to QM | Normal | Cutoff 1.4 nm | Geometric Mean | Spec. Set | SPC | — | 25 × 25 | 22 × 22 − 2 | 340 |
No Corr. | ||||||||||||||
CHARMM | CGenFF | × | — | PME | Fit to QM | Normal | Cutoff 1.2 nm + | Lorentz-Berthelot | Spec. Set | TP3 | CHCL3, | 23 × 23 | 22 × 22 − 2 | 340 |
Tail Corr. + Force Swi. 0.2 nm | CCL4 | |||||||||||||
OPLS | AA | × | — | PME | Fit to exp. | ×0.5 | Cutoff 1.1 nm + | Geometric Mean | ×0.5 | TP3 | — | 25 × 25 | 22 × 22 − 2 | 340 |
Tail Corr. + Pot. Swi. 0.05 nm | ||||||||||||||
OPLS | LBCC | × | × | PME | Fit to QM | ×0.5 | Cutoff 1.0 nm + Tail Corr. + Pot. Swi. 0.05 nm | Geometric Mean | ×0.5 | TP3 | CCL4 | 24 × 24 | 22 × 22 − 2 | 340 |
AMBER | GAFF | × | — | PME | Fit to QM | ×0.833 | Cutoff 1.1 nm + Tail Corr. + Pot. Swi. 0.1 nm | Lorentz–Berthelot | ×0.5 | TP3 | — | 25 × 25 | 22 × 22 − 2 | 340 |
AMBER | GAFF2 | × | × | PME | Fit to QM | ×0.833 | Cutoff 1.1 nm + Tail Corr. + Pot. Swi. 0.1 nm | Lorentz–Berthelot | ×0.5 | TP3 | — | 25 × 25 | 22 × 22 − 2 | 340 |
OpenFF | S99F1.0 | × | × | PME | Fit to QM | ×0.833 | Cutoff 1.1 nm + Tail Corr. + Pot. Swi. 0.1 nm | Lorentz–Berthelot | ×0.5 | TP3 | — | 25 × 25 | 22 × 22 − 2 | 340 |
Except for GROMOS, all the force fields under comparison rely on a lattice-sum representation of the electrostatic interactions based on the particle-mesh Ewald79,80 (PME) algorithm. In contrast, GROMOS uses a charge-group cutoff of 1.4 nm along with a reaction-field81–84 (RF) approximation for the omitted electrostatic interactions beyond this distance.
The atomic partial charges in GROMOS-2016H66, GROMOS-54A7 and OPLS-AA are derived by optimization against experimental pure-liquid properties (and, possibly, solvation free energies) considering small organic molecules. In GROMOS-ATB, the charges are fitted to reproduce the quantum-mechanical (QM) electrostatic potential (ESP) using the Merz–Kollman (MK) scheme,85,86 based on structures optimized at the B3LYP level of theory87–90 with a 6-31G* basis set91 and a polarizable continuum model92 (PCM) for implicit solvation. The charges of chemically equivalent atoms are subsequently equalized.36 In CHARMM-CGenFF, initial values for the charges are estimated either by analogy with similar groups in the CHARMM force field, or based on calculations at the second-order Møller–Plesser (MP2) level of theory93 with a 6-31G(d) basis set91 and the MK scheme.85,86 They are then further optimized to reproduce the QM gas-phase dipole moment and the interaction energy with a water molecule in different positions and orientations.55,56 In OPLS-LBCC, the charges correspond to 1.14 × CM1A-LBCC charges, i.e. CM1A charges94 amplified by an empirical factor95,96 of 1.14 and further adjusted by localized bond-charge corrections68 (BCCs). The CM1A charges are themselves derived following an empirical scheme94 based on a Mulliken population analysis97 of the electron density calculated at the AM1 semi-empirical level.98 In AMBER-GAFF and AMBER-GAFF2, the charges are calculated using the restricted ESP (RESP) fitting protocol,99 based on structures optimized at the Hartree–Fock (HF) level of theory100,101 with a 6-31G(d) basis set91 in vacuum. Finally, in OpenFF, AM1-BCC charges102,103 are used. These are derived from a Mulliken population analysis97 performed at the AM1 level98 by application of BCCs, with the goal of approximating HF/6-31G* RESP charges.
All the force fields under comparison are based on the same description of the van der Waals interactions relying on the Lennard-Jones function104
![]() | (1) |
The CHARMM, AMBER and OpenFF force fields rely on a strict Lorentz–Berthelot combination rule111,112 for the Lennard-Jones interaction parameters. In contrast, the GROMOS and OPLS force fields rely on a geometric-mean combination rule.113,114 For GROMOS, the application of this rule admits two exceptions. First, the repulsive coefficients C12,ii and C12,jj used to define C12,ij can be taken from three alternative sets based on the types of the involved atoms, depending whether they are non-hydrogen-bonding (type I), uncharged hydrogen-bonding (type II), or oppositely charged (type III, for the negative species only). Only types I and II are relevant for the neutral molecules considered here. The second exception is that pair-specific parameter are used for the GROMOS-2016H66 and GROMOS-54A7 CHCL3 model.115
For all the force fields considered, first and second covalent neighbors are excluded from any non-bonded interaction. In GROMOS and CHARMM, the electrostatic interaction between third covalent neighbors is unaltered, and the Lennard-Jones interaction is defined based on a special set of third-neighbor parameters. For OPLS, both the electrostatic and the Lennard-Jones interactions are scaled by a factor of 0.5 for third neighbors. For AMBER and OpenFF, the electrostatic and Lennard-Jones interactions involving third neighbors are scaled by factors of 0.833 and 0.5, respectively.
Note that the Lennard-Jones interaction parameters of OpenFF relevant for the molecules considered here are in large part imported from AMBER-GAFF, so that the main difference between the two force fields resides in the covalent interaction parameters and the charge-derivation procedure. Similarly, since AMBER-GAFF and AMBER-GAFF2 have the same atomic partial charges, the only difference between the two force fields resides in the covalent and Lennard-Jones interactions parameters that were reoptimized for AMBER-GAFF2.
For each of the nine force fields, the compatible water model as well as a small subset of organic molecules possibly omitted from the simulations are also indicated in Table 1. The GROMOS force field is compatible with the simple point charge (SPC) water model.116 All the other force fields rely on the three-point transferable intermolecular potential (TIP3P) model,117,118 further labelled TP3. This model is also adopted as a usual choice for OPLS-AA, although the force field was originally parametrized using the four-site TIP4P water model117,118 instead61,119,120 (unlike OPLS-LBCC, which directly121 relied on TIP3P). Note also that the CHARMM simulations actually rely on a slightly modified mTIP3P water model,122 involving non-zero Lennard-Jones interaction parameters on the hydrogen atoms. For simplicity, the same acronym TP3 is used to refer to this model as well. For each force field, the compatible water model is the only one considered in the present simulations (see ref. 43 for results using CHARMM-CGenFF, OPLS-AA, and AMBER-GAFF along with the SPC water model).
In GROMOS-2016H66 and GROMOS-54A7, the models for CHCL3 and CCL4 are taken from ref. 115 and 123, respectively. The model for C2CL2 in GROMOS-2016H66 is taken from ref. 124 (see ref. 43 for results with alternative CHCL3 and CCl4 models from ref. 124). The OPLS model for CHCL3 is taken from ref. 125. The simulations involving DAMD, PYRI and C2CL2 are not performed with GROMOS-54A7, as this set does not encompass building blocks for these molecules. Similarly, the simulations involving CHCL3 and CCL4 with CHARMM-CGenFF are omitted, as the corresponding parameters could not be obtained from the CHARMM-GUI server.126–129 The same applies to CCL4 in OPLS-LBCC, where the parameters could not be derived using the LigParGen server.121 For each of the nine force fields, the resulting number NFULL of calculated entries in the cross-solvation matrix are also reported in Table 1.
All simulations involved a single solute molecule in a cubic computational box containing 512 solvent molecules, except for the solvent water (1000 molecules). They were performed under periodic boundary conditions in the isothermal–isobaric (NPT) ensemble at Po = 1 bar and T− = 298.15 K. The temperature was maintained close to T− by application of stochastic dynamics (SD) with a friction coefficient set to 10 ps−1, except for the solvent water with GROMOS (91 ps−1). The pressure was maintained close to Po by application of a weak-coupling barostat137 with a coupling time of 1 ps and an isothermal compressibility of 4.5 × 10−5 bar−1, except for GROMOS (0.5 ps and 4.575 × 10−4 kJ mol−1 nm3). The Langevin equation of motion was integrated using the leap-frog algorithm138 (SD variant139) with a timestep of 2 fs. The initial solute–solvent configurations were generated using the Packmol software,140 and equilibrated at Po and T− during 0.2 ns, resulting in box edges ranging between 3.1 and 5.0 nm.
To calculate the solvation free energy, the solute–solvent Lennard-Jones and electrostatic interactions were gradually turned off in the Hamiltonian according to a coupling parameter λ, changing from zero (fully coupled) to one (fully decoupled). Note that the use of SD alleviates possible issues related to the lack of kinetic-energy exchange between solute and solvent close to the decoupled state. All calculations relied on simulations at fixed successive λ-values, each involving a sampling time of 3 ns after at least 0.1 ns equilibration.
For the GROMOS force field, these free-energy calculations were performed using the GROMOS software.141–144 They relied on thermodynamic integration145 with Simpson quadrature146–148 (Kepler's wine barrel method149) considering 21 equispaced λ-points. The solute–solvent electrostatic and Lennard-Jones interactions were decoupled simultaneously using a soft-core scheme150 with the parameters αLJ = 0.5 and αC = 0.5 nm2. The electrostatic interactions were calculated using a twin-range cutoff approach151 with short- and long-range cutoff distances set to 0.8 and 1.4 nm, respectively, and a frequency of 5 timesteps for the update of the short-range pairlist and intermediate-range interactions. A RF correction81–84 was applied to account for the mean effect of the omitted electrostatic interactions beyond the long-range cutoff distance, using the permittivities listed in Table 1 of ref. 43, which correspond to experimental values except for water (permittivity of the SPC model). No correction was used for the corresponding long-range Lennard-Jones interactions. The SHAKE algorithm152 was applied to constrain all bond-lengths with a relative geometric tolerance of 10−4. For water (in all three GROMOS variants), as well as CHCL3 and CCL4 (in GROMOS-2016H66 and GROMOS-54A7), distance constraints were applied as well to keep the bond-angles rigid. The bond-angles in all the other molecules considered were treated as flexible. Note that GROMOS-2016H66 and GROMOS-54A7 rely on pair-specific Lennard-Jones interaction parameters for the GROMOS CHCL3 model.115 The center of mass translation of the computational box was removed every 2 ps. In a separate set of calculations, the electrostatic component ΔGELE of the solvation free energy was calculated by turning off only the electrostatic solute–solvent interactions, using a linear coupling scheme and 21 equispaced λ-points. The Lennard-Jones component ΔGVDW was then deduced by subtracting ΔGELE from the total solvation free energy.
For CHARMM, OPLS, AMBER, and OpenFF, the free-energy calculations were performed using the GROMACS software133,134 (version 5.0.2). They relied on the Bennett acceptance ratio153 (BAR) as estimator considering a series of successive λ-points. The electrostatic and Lennard-Jones interactions were decoupled in two steps. In a first step, the electrostatic component ΔGELE was calculated by turning off the solute–solvent electrostatic interactions, using a linear coupling scheme and 21 equispaced λ-points. In a second step, the Lennard-Jones component ΔGVDW was calculated by switching off the solute–solvent Lennard-Jones interactions of the uncharged solute, using a soft-core coupling scheme150 with αLJ = 0.5 and 25 λ-points at 0.00, 0.06, 0.12, 0.18, 0.24, 0.30, 0.36, 0.42, 0.46, 0.50, 0.52, 0.54, 0.56, 0.58, 0.60, 0.64, 0.68, 0.72, 0.76, 0.80, 0.84, 0.88, 0.92, 0.96 and 1.00. The electrostatic interactions were calculated using the PME scheme,79,80 with an interpolation order of 6 and a grid spacing of 0.12 nm. A long-range Lennard-Jones correction was included in the calculation of the energy and virial.105 The LINCS algorithm154 was applied to constrain all bond-lengths with an order of 12, except for water (SETTLE algorithm155 to enforce full rigidity; for the self-solvation of water, the LINCS algorithm was applied for the solute water, as GROMACS does not permit the application of SETTLE to more than one type of molecules). Except for water, the bond-angles in all the molecules considered were treated as flexible (i.e. not constrained). The center of mass translation was removed every 0.2 ps.
The cross-solvation free energies calculated using the above protocols automatically match the point-to-point or Ben-Naim standard-state convention adopted for . An investigation of the sensitivity of the results to the simulation time, box size, and number of λ-points is provided in Section S4 of ref. 43, and suggests that the uncertainties affecting the calculated
values are on the order of 1–2 kJ mol−1. The mean of the purely statistical error (estimated by block averaging) over all solute–solvent pairs evaluates to 1.0–1.1 kJ mol−1 for the GROMOS calculations and to 0.2–0.3 kJ mol−1 for the GROMACS calculations. This difference reflects the use of a two-step calculation procedure in GROMACS, involving about twice as much sampling time.
For comparing the force fields, the COMP set is considered in the first place, as it contains exactly the same number of points for eight of the force fields, and only slightly fewer for GROMOS-54A7. The propagation of the errors affecting the NCOMP individual results (estimated to 1–2 kJ mol−1 based on Section S4 of ref. 43) onto corresponding errors affecting the RMSE, AVUE and AVEE values involves a scaling by NCOMP−1/2. Differences between force fields that are larger than about 0.1 kJ mol−1 are thus significant.
![]() | ||
Fig. 3 Correlation between experimental and calculated cross-solvation free energies for the nine force fields considered. The individual results (points), the linear-regression line (solid red line), the identity line (solid black line), and the deviation lines (identity ±2kBT−, dashed black lines) are displayed for each of the nine force fields (Table 1). The ![]() |
Fam. | Set | FULL R | FULL RMSE | FULL AVUE | FULL AVEE | COMP R | COMP RMSE | COMP AVUE | COMP AVEE | N COMP | N L-DevCOMP | N M-DevCOMP | N H-DevCOMP | CONF R | CONF RMSE | CONF AVUE | CONF AVEE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[kJ mol−1] | [kJ mol−1] | [%] | [%] | [%] | [kJ mol−1] | ||||||||||||
GROMOS | 2016H66 | 0.84 | 3.2 | 2.4 | −0.2 | 0.88 | 2.9 | 2.2 | −0.8 | 482 | 66.4 | 24.9 | 8.7 | 0.92 | 2.4 | 1.9 | −0.8 |
GROMOS | 54A7 | 0.85 | 3.9 | 2.9 | −1.0 | 0.87 | 4.0 | 3.0 | −1.5 | 399 | 54.6 | 24.6 | 20.8 | 0.90 | 3.8 | 2.8 | −1.8 |
GROMOS | ATB | 0.75 | 4.6 | 3.3 | −0.9 | 0.76 | 4.8 | 3.4 | −1.0 | 482 | 49.4 | 30.3 | 20.3 | 0.83 | 3.7 | 2.8 | −0.6 |
CHARMM | CGenFF | 0.81 | 4.1 | 2.5 | +0.4 | 0.82 | 4.0 | 2.4 | +0.2 | 482 | 70.7 | 17.6 | 11.6 | 0.92 | 2.6 | 1.5 | +0.5 |
OPLS | AA | 0.87 | 3.0 | 2.1 | +1.0 | 0.88 | 2.9 | 2.1 | +1.0 | 482 | 66.4 | 24.9 | 8.7 | 0.92 | 2.6 | 1.9 | +1.2 |
OPLS | LBCC | 0.81 | 3.7 | 2.7 | −0.3 | 0.85 | 3.3 | 2.4 | +0.2 | 482 | 65.3 | 24.1 | 10.6 | 0.91 | 2.5 | 2.0 | +0.5 |
AMBER | GAFF | 0.85 | 3.5 | 2.5 | −0.0 | 0.86 | 3.6 | 2.7 | −0.2 | 482 | 58.3 | 27.4 | 14.3 | 0.91 | 3.0 | 2.2 | −0.1 |
AMBER | GAFF2 | 0.85 | 3.5 | 2.5 | +0.0 | 0.86 | 3.4 | 2.6 | −0.3 | 482 | 61.6 | 26.4 | 12.0 | 0.90 | 2.8 | 2.1 | −0.2 |
OpenFF | S99F1.0 | 0.82 | 3.6 | 2.5 | −0.1 | 0.82 | 3.6 | 2.5 | −0.3 | 482 | 62.2 | 25.3 | 12.5 | 0.88 | 2.8 | 2.1 | +0.3 |
Single-value | — | — | 5.3 | 3.7 | 0.0 | — | 5.4 | 3.9 | 0.0 | 482 | 43.6 | 33.0 | 23.4 | — | 5.4 | 3.8 | 0.0 |
Two-solvent | — | 0.73 | 3.2 | 2.5 | 0.0 | 0.76 | 3.1 | 2.5 | −0.2 | 438 | 57.5 | 34.0 | 8.5 | 0.76 | 3.0 | 2.5 | −0.3 |
The four force fields already considered in ref. 43 were found to perform comparably well in reproducing the experimental data. The GROMOS-2016H66 and the OPLS-AA force fields have smaller RMSEs (2.9 kJ mol−1) compared to AMBER-GAFF and CHARMM-CGenFF (3.6 and 4.0 kJ mol−1, respectively). However, the AVEEs are slightly smaller in magnitude for AMBER-GAFF and CHARMM-CGenFF (−0.2 and +0.2 kJ mol−1, respectively) compared to GROMOS-2016H66 and OPLS-AA (−0.8 and +1.0 kJ mol−1, respectively). Although GROMOS-2016H66 and OPLS-AA have very similar distributions of errors, AMBER-GAFF has a higher proportion of M-Dev and H-Dev (fewer L-Dev), while CHARMM-CGenFF has a higher proportion of L-Dev and H-Dev (fewer M-Dev). Extending this global comparison to the five additional force fields, the differences remain limited.
Relative to GROMOS-2016H66, both GROMOS-54A7 and GROMOS-ATB present lower correlation coefficients R (0.87 and 0.76, respectively, compared to 0.88) and higher RMSEs (4.0 and 4.8 kJ mol−1, respectively, compared to 2.9 kJ mol−1). The AVEEs of the three sets are negative and of similar magnitudes (−1.5 to −0.8 kJ mol−1), i.e. the GROMOS force fields tend to slightly overestimate the magnitudes of the solvation free energies. Relative to GROMOS-2016H66, GROMOS-54A7 has a significantly higher proportion of H-Dev (fewer L-Dev, similar M-Dev), and GROMOS-ATB a significantly higher proportion of M-Dev and H-Dev (fewer L-Dev).
Relative to OPLS-AA, OPLS-LBCC presents a slightly lower correlation (0.85 compared to 0.88) and a somewhat higher RMSE (3.3 compared to 2.9 kJ mol−1). The distribution of errors is very similar to that of OPLS-AA, with a slightly higher proportion of H-Dev (fewer L-Dev and M-Dev). Whereas the AVEE of OPLS-AA was positive (+1.0 kJ mol−1), suggesting a tendency to slightly underestimate the magnitudes of the solvation free energies, the AVEE for OPLS-LBCC is very close to zero (+0.2 kJ mol−1).
Relative to AMBER-GAFF, AMBER-GAFF2 performs very similarly or slightly better. The correlation coefficient is the same (0.86), the RMSE is marginally lower (3.4 compared to 3.6 kJ mol−1), and the AVEE is also very close to zero (−0.3 compared to −0.2 kJ mol−1). The error distribution is similar as well, with a slightly lower proportion of M-Dev and H-Dev (more L-Dev). Finally, OpenFF presents a lower correlation compared to the AMBER force fields (0.82), together with comparable RMSE and AVEE (3.6 and −0.3 kJ mol−1, respectively), and a similar distribution of errors. This is not entirely surprising, considering the similarity between the Lennard-Jones interaction parameters of OpenFF and AMBER-GAFF for the molecules considered here.
The single-value model (all cross-solvation free energies assumed equal to −18.4 kJ mol−1) presents a RMSE of 5.4 kJ mol−1, larger than the corresponding value for any of the nine force fields (range 2.9–4.8 kJ mol−1). This is reassuring, as it shows that physics-based modeling outperforms this extremely primitive prediction scheme, albeit more or less pronouncedly for the different force fields. The two-solvent model (cross-solvation free energies calculated based on an optimized linear combination of the values for A and B in H2O and CHE) presents a lower correlation (0.76) compared to the nine force fields considered, but the corresponding RMSE (3.1 kJ mol−1) is actually lower compared to most of the force fields (except GROMOS-2016H66 and OPLS-AA). This observation supports the idea that solvation free energies calculated in two solvent only, one polar and one non-polar, may encompass roughly the same information as a full cross-solvation matrix. The consideration of such a unique solvent pair during force-field refinement, as nowadays commonly done,32 may thus represent a reasonable approximation in the absence of a full cross-solvation matrix. See Section S3 (ESI†) for more details on the calculations involving these two simple models.
Considering the reduced set CONF of highest-confidence experimental data points instead of the COMP set, the correlation with experiment is increased (from the range 0.76–0.88 to the range 0.83–0.92) and the RMSEs are reduced (from the range 2.9–4.8 to the range 2.4–3.8 kJ mol−1). The improvement is most pronounced for CHARMM-CGenFF (4.0 to 2.6 kJ mol−1) and GROMOS-ATB (4.8 to 3.7 kJ mol−1), and least pronounced for GROMOS-54A7 (4.0 to 3.8 kJ mol−1) and OPLS-AA (2.9 to 2.6 kJ mol−1). The trend is systematic for the nine force fields, which suggests that the excluded entries (single value found in the experimental data sources considered) may encompass a higher proportion of less reliable experimental values.
![]() | ||
Fig. 4 Deviations between experimental and calculated cross-solvation free energies for the nine force fields, considering the set and individual compounds as solute or as solvent. The quantities displayed are (a) the root-mean-square error (RMSE) and (b) the average error (AVEE). Each molecule is considered as solute (blue bars) or as solvent (green bars). The force fields considered are listed in Table 1. The molecules considered and their acronyms are shown in Fig. 1. They are listed in order of decreasing polarity, as estimated by the molecular dipole moment μ. The numerical values can be found in Section S2 (Table S4, ESI†), along with a corresponding comparison considering the FULL set (Table S5 and Fig. S5, ESI†). |
The RMSEs of the different compounds considered as solutes or considered as solvents are typically of comparable magnitudes. However, in CHARMM-CGenFF and OpenFF, the RMSEs of the compounds considered as solvents are larger for most molecules. In CHARMM-CGenFF, the AVEEs of both types are also noticeably smaller in magnitude compared to the eight other force fields, except for a number of outliers (most prominently DAMD, DEAN, and TEAN as solutes). The RMSEs of both types are tendentially larger in GROMOS-ATB and, to a lesser extent, GROMOS-54A7 compared to GROMOS-2016H66. The two sets also present a more pronounced (though non-systematic) sign bias towards negative AVEEs, i.e. indicative of solvation free energies that tend to be overestimated in magnitude. In the opposite, OPLS-AA presents a (non-systematic) sign bias towards positive AVEEs, i.e. indicative of solvation free energies that tend to be underestimated in magnitude. This trend is no longer visible for OPLS-LBCC. Note, however, that the absence of sign bias in the global AVEE for OPLS-LBCC, AMBER-GAFF, AMBER-GAFF2, and OpenFF, is in part fortuitous, as it results from the partial cancellation of larger positive and negative AVEEs over the present selection of compounds.
Dimethylacetamide (DAMD) as solute consistently presents large RMSEs (4.7 to 13.6 kJ mol−1) along with negative AVEEs (−13.2 to −4.2 kJ mol−1). The solvation free energy of this molecule is thus tendentially overestimated in magnitude for all solvents and force fields. The same applies to a lesser extent to ethyl acetate (EAE), which also presents relatively large RMSEs (2.0 to 8.7 kJ mol−1) and tendentially negative AVEEs (−8.3 to +0.3 kJ mol−1). Note, however, that the RMSEs for these two compounds as solvents are not anomalously high, and present no significant sign bias. For DAMD as a solute, these discrepancies might result in part from less accurate experimental data (for this compound as solute one has either a single experimental value or multiple values with a large spread, see Fig. 2b).
Water (H2O) is associated with comparable and relatively large RMSEs as solute (3.8 to 5.5 kJ mol−1) and as solvent (2.7 to 6.4 kJ mol−1). The corresponding AVEEs are nearly systematically positive (+0.4 to +3.0 kJ mol−1 as solute, −0.3 to +3.7 kJ mol−1 as solvent), i.e. the solvation free energies involving H2O as solute or as solvent are tendentially underestimated in magnitude by all force fields. For H2O as solvent, the best agreement with experiment is obtained for the GROMOS-2016H66 force field, with values of 2.7 and +0.8 kJ mol−1 for the RMSE and AVEE, respectively. This is in line with the fact that hydration free energies have been included as target during the force-field calibration.32 In contrast, GROMOS-54A7, GROMOS-ATB, and CHARMM-CGenFF present the largest discrepancies, with RMSEs of 5.6, 6.0, and 6.4 kJ mol−1, respectively. The corresponding AVEEs are positive (+1.2 to +2.2 kJ mol−1), i.e. the magnitudes of the hydration free energies tend to be underestimated by these three force fields. For AMBER-GAFF2 and OpenFF, the RMSEs are similar to those for AMBER-GAFF (5.4 and 5.2 kJ mol−1, respectively, compared to 5.5 kJ mol−1). The corresponding AVEEs are also positive (+2.1 to +2.9 kJ mol−1), i.e. the magnitudes of the hydration free energies tend also here to be underestimated. Finally, for OPLS-LBCC, the RMSE is smaller than for OPLS-AA (3.2 compared to 4.6 kJ mol−1), and the AVEE is now very close to zero (−0.3 compared to +3.7 kJ mol−1). Thus, although the magnitudes of the hydration free energies tend to be underestimated by OPLS-AA, this bias is removed in OPLS-LBCC. This is in line with the fact that the 1.14 × CM1A-LBCC charge-derivation scheme involves an empirical upscaling of the charges by a factor of 1.14 and the use of localized bond-charge corrections which are optimized precisely to improve agreement with the experimental hydration free energies.68,95,96
The amines (ANLN, DEAN, and TEAN), which can be challenging compounds in terms of force-field design,61,120,156,157 also tend to have relatively large errors compared to experiment, both as solutes and as solvents. However, the extent of disagreement varies from force field to force field, with AVEEs of different signs. For these three compounds, the RMSEs of both types are generally somewhat larger in GROMOS-54A7 and GROMOS-ATB (3.3 to 6.6 kJ mol−1; see, however, ref. 157 for a recent GROMOS-compatible reparametrization) compared to GROMOS-2016H66 (2.7 to 5.6 kJ mol−1). They are also large in CHARMM-CGenFF (2.7 to 8.4 kJ mol−1), especially for DEAN. The smallest RMSEs for the three compounds are obtained for OPLS-LBCC (2.2 to 4.4 kJ mol−1) and OpenFF (2.3 to 4.6 kJ mol−1). Here, OPLS-LBCC performs slightly better than OPLS-AA (2.3 to 5.5 kJ mol−1), and OpenFF slightly better than AMBER-GAFF and AMBER-GAFF2 (3.7 to 5.9 kJ mol−1). For DEAN as solvent and, to a lesser extent, TEAN as solvent, these discrepancies might result in part from less accurate experimental data (for the different solutes in these two solvents, one has mostly or exclusively a single experimental value, see Fig. 2b).
For the alcohols (PTL, MTL, ETL, BTL, 2M2P, 1PL, and 2PL), both as solutes and as solvents, four sets are affected by comparatively large errors. The RMSEs are larger in GROMOS-ATB (2.3 to 7.0 kJ mol−1) and, to a lesser extent, in GROMOS-54A7, CHARMM-CGenFF and OpenFF (1.6 to 5.2 kJ mol−1) compared to the five other parameter sets (1.4 to 4.4 kJ mol−1). The signs and magnitudes of the corresponding AVEEs differ significantly and non-systematically depending on the compound, its consideration as solute or as solvent, and the chosen force field.
For the chlorinated compounds (CHCL3, CCL4, and C2CL2), which were omitted from the COMP set, the FULL set must be considered (Fig. S5, ESI†), keeping in mind that some of these molecules are unavailable in specific force fields (Table 1). For this class of compounds, considered both as solutes and as solvents, OPLS-LBCC stands out as presenting comparatively large RMSEs (3.5 to 6.3 kJ mol−1) along with tendentially negative AVEEs (−5.5 to −0.8 kJ mol−1). The eight other force fields present smaller RMSEs (1.8 to 4.1 kJ mol−1) and less sign bias in the AVEEs (−2.6 to +2.9 kJ mol−1).
The GROMOS-2016H66 parameters set32 is the only one that has included solvation free energies in an apolar solvent, cyclohexane (CHE), as a calibration target. Interestingly, however, the eight other force fields reproduce similarly well the solvation free energies in CHE, with RMSEs ranging between 1.9 and 3.2 kJ mol−1, compared to 2.3 kJ mol−1 for GROMOS-2016H66. In the nine force fields, the solvation free energies in the two other aliphatic non-polar solvents hexane and heptane (HXE and HPE) show similar deviations compared to CHE, with RMSEs between 1.8 and 3.5 kJ mol−1. The solvation free energies in the non-polar aromatic solvents benzene and toluene (BZN and TOL) also present relatively small RMSEs in the nine force fields, ranging between 2.0 and 4.3 kJ mol−1. Note that although the errors affecting the solvation free energies in the five non-polar solvents (green bars for TOL-CHE in Fig. 4) show little variations across solvents and force fields, the variability is significantly more pronounced when these non-polar molecules are considered as solutes (corresponding blue bars).
![]() | ||
Fig. 5 Deviations between experimental and calculated cross-solvation free energies for the nine force fields considered. The ![]() |
Considering Fig. 5, the compound DAMD as a solute clearly represents a challenging case for most force fields (except GROMOS-54A7, which includes no parametrization for this compound). In GROMOS-ATB and, to a lesser extent, CHARMM-CGenFF, the magnitude of the solvation free energy of this compound in all solvents is dramatically overestimated, with errors that range between −6.6 and −18.9 kJ mol−1, and are particularly large in the MP solvents. For OPLS-LBCC, AMBER-GAFF, AMBER-GAFF2, and OpenFF, important deviations are observed as well in the MP solvents, with errors that can reach −18.0 kJ mol−1. In GROMOS-ATB, the negative error is mainly due to an overly negative ΔGVDW. In the five other cases, it is related to an overly negative ΔGELE, resulting from high negative partial charges on the nitrogen and oxygen atoms, more pronouncedly so for CHARMM-CGenFF, OPLS-LBCC, and OpenFF. Interestingly, all the above force fields rely on atomic partial charges derived based on QM calculations. In contrast, GROMOS-2016H66 and OPLS-AA, in which the charges are empirically fitted to reproduce experimental data, lead to smaller (yet still sizeable) errors of at most −8.0 and −8.7 kJ mol−1, respectively. Note that the above discrepancies might also result in part from less accurate experimental data for DAMD as a solute (see Fig. 2b). The solute EAE also tends to present large and negative errors in GROMOS-54A7, GROMOS-ATB, AMBER-GAFF, and OpenFF with deviations that can reach −12.3, −10.2, −8.3, and −9.3 kJ mol−1, respectively. However, this solute does not appear to be particularly problematic in the five other force fields.
Considering water as a solvent, GROMOS-2016H66 and OPLS-LBCC present the best agreement with experiment, with deviations of at most 7.0 and 5.5 kJ mol−1 in magnitude, respectively (excluding the solute DAMD for OPLS-LBCC), in line with the consideration of hydration free energies as targets during the parametrization of these two force fields.32,68,95,96 The seven other force fields also reproduce the hydration free energies reasonably well, with a number of exceptions: (i) OPLS-AA nearly systematically presents positive deviations, a feature that disappears when using the higher charges of OPLS-LBCC; (ii) GROMOS-54A7, GROMOS-ATB, CHARMM-CGenFF, OPLS-AA, AMBER-GAFF, and AMBER-GAFF2 present markedly too positive values for the aliphatic amines (DEAN and TEAN), mainly due to low atomic partial charges in these molecules; (iii) GROMOS-2016H66, GROMOS-54A7, OPLS-AA, AMBER-GAFF, AMBER-GAFF2, and OpenFF present large positive deviations for CHCL3 (all) and DEE (all except GROMOS-2016H66); (iv) GROMOS-54A7 and OPLS-AA present large deviations (of opposite signs) for ANLN; (v) The values for the alcohols are slightly less negative in AMBER-GAFF2 (deviations from −0.5 to +5.4 kJ mol−1) and markedly more positive in OpenFF (from +5.9 to +8.4 kJ mol−1), compared to AMBER-GAFF (−1.2 to +3.5 kJ mol−1).
The solvation free energies of alcohols in alcohols are most accurately reproduced by CHARMM-CGenFF (deviation from −1.6 to +1.5 kJ mol−1) and, to a lesser extent, GROMOS-2016H66 (−4.3 to +4.3 kJ mol−1) and AMBER-GAFF2 (−3.7 to +4.1 kJ mol−1). The same applies for GROMOS-ATB as well (from −4.2 and +4.8 kJ mol−1) if one excepts the secondary (2PL) and tertiary (2M2P) alcohols as solutes (deviations of −9.9 and −14.7 kJ mol−1, respectively). In contrast, the values are tendentially affected by negative errors in GROMOS-54A7 and AMBER-GAFF (down to −7.5 kJ mol−1), and by positive errors in OPLS-AA, OPLS-LBCC, and OpenFF (up to +5.7 kJ mol−1). The difference between these two groups of force fields for the alcohols can be explained by considering the Lennard-Jones interaction parameters. Both the oxygen atom and the alpha carbon atom have smaller σ and larger ε values in AMBER-GAFF compared to OPLS-AA and OPLS-LBCC. Similarly, the oxygen atom has a smaller repulsive C12 and a larger attractive C6 in GROMOS-54A7 compared to GROMOS-2016H66, along with a slightly more negative charge. These changes enhance the attractive solute–solvent interactions directly, but also indirectly via an enhancement of the hydrogen-bonding interactions due to shorter donor–acceptor distances. A similar change of Lennard-Jones interaction parameters (oxygen atom more repulsive with a larger σ and a smaller ε, aliphatic hydrogen atom more attractive with a smaller σ and a slightly larger ε) also explains why AMBER-GAFF2 performs better here than AMBER-GAFF. In OpenFF, the partial charges on the oxygen atom, the hydroxyl hydrogen and the alpha carbon atom are too small compared to AMBER-GAFF. The resulting less attractive electrostatic interactions between the alcohols explains the positive deviations in the OpenFF force field.
The amines (DEAN, TEAN, and ANLN) are associated with large deviations from experiment in specific parameter sets. This is in particular the case for the solvation free energies in ANLN considering the solutes PYRI and TEAN (and, to a lesser extent, DEAN), which are associated with large positive errors in all force fields (up to 17.2 kJ mol−1), and considering the MP solutes in GROMOS-2016H66, GROMOS-54A7, and AMBER-GAFF, which are associated with predominantly negative errors (down to −14.2 kJ mol−1). The solvation free energies in DEAN and TEAN are also characterized by particularly large deviations in GROMOS-54A7, GROMOS-ATB, CHARMM-CGenFF, AMBER-GAFF, and AMBER-GAFF2 (up to 14.1 kJ mol−1 in magnitude, tendentially more positive for TEAN compared to DEAN). These issues do not affect the four other force fields as significantly (errors up to 9.2 kJ mol−1 in magnitude). Note that the above discrepancies might also result in part from less accurate experimental data for DEAN as solvent and, to a lesser extent, TEAN as solvent (see Fig. 2b). Finally, the solvation free energies involving DEAN and TEAN as solutes are associated with large positive errors in CHARMM-CGenFF (up to 16.1 kJ mol−1, excepting the solvent H2O), and negative errors in GROMOS-ATB (down to −8.1 kJ mol−1), especially for TEAN and in the LP solvents. These issues do not affect the seven other force fields as significantly (errors up to 6.8 kJ mol−1 in magnitude, excepting the solvents H2O and ANLN).
Except in combination with some of the exceptional solutes and solvents mentioned above, the solvation free energies involving apolar molecules (TOL, BZN, HPE, HXE, CCL4 and CHE), both as solute and as solvent, present less pronounced discrepancies in all nine force fields. The corresponding errors do not exceed 7.2 kJ mol−1 in magnitude, excluding the solutes DAMD, H2O, EAE, CHCL3, and TEAN, as well as the solvents DEE and PYRI.
It is interesting to note that the force-field adjustments made from OPLS-AA to OPLS-LBCC (different charge set) or from AMBER-GAFF to AMBER-GAFF2 (adjusted covalent terms and Lennard-Jones parameters) induce conflicting effects on different solute–solvent pairs. For example, the change from OPLS-AA to OPLS-LBCC noticeably improves the accuracy of the hydration free energies (increase in magnitude due to higher solute charges), but also results in a tendency to overestimate the solvation free energies of polar solutes in other polar solvents. This affects in particular the solutes DAMD, PYRI, C2CL2, EAE, and CHCL3 in the non-aqueous polar solvents and MP solvents and, to a lesser extent, the solutes PPN and BTN in the MP solvents. Similarly, the accuracy of the solvation free energies of alcohols in alcohols (and in a few other solvents as well) is improved by the change from AMBER-GAFF to AMBER-GAFF2 (more repulsive oxygen atom, more attractive aliphatic hydrogen atom), but the change also induces positive shifts in the solvation free energies of MTL and ETL in CHE, CCL4, HXE, and PYRI (deviations from +5.1 to +7.2 kJ mol−1vs. +2.1 to +4.6 kJ mol−1 for AMBER-GAFF), and negative shifts in the solvation free energies of the aliphatic apolar solutes HPE, HXE, and CHE in all solvents (from −6.2 to 0.0 kJ mol−1vs. −2.8 to +3.2 kJ mol−1 for AMBER-GAFF).
![]() | ||
Fig. 6 Root-mean-square error relative to experiment for the transfer free energies of all solutes C between any pair of solvents A and B considering the COMP set. For each solvent pair, the RMSE of ![]() |
The solvent pairs involving ANLN and DEE present high RMSEs for all nine force fields (up to 8.9 kJ mol−1), possibly hinting at larger errors affecting the corresponding experimental values. Besides these two solvents, GROMOS-54A7, GROMOS-ATB, CHARMM-CGenFF, AMBER-GAFF, AMBER-GAFF2, and OpenFF are affected by large RMSEs for the pairs involving TEAN (up to 7.7 kJ mol−1, excluding pairs with ANLN and DEE). For GROMOS-ATB and CHARMM-CGenFF, the same applies to the pairs involving DEAN (up to 7.0 kJ mol−1). In GROMOS-2016H66 and OPLS-LBCC (and, to lesser extent, OPLS-AA), the RMSEs for the pairs involving H2O present smaller errors (generally below 5.0 kJ mol−1) compared to the six other force fields (generally above 5.0 kJ mol−1).
Two examples of error compensation within are the following. First, AMBER-GAFF has slightly larger errors compared to AMBER-GAFF2 in terms
for the solvation of the various solutes in the alcohols (Fig. 5). However, in AMBER-GAFF, the errors evidence more similar deviation patterns along the solute series when comparing the different alcohol solvents. As a result, AMBER-GAFF and AMBER-GAFF2 still perform similarly well in reproducing the transfer free energies between alcohols. Second, although GROMOS-ATB presents larger errors than all the other force fields in terms of
(Table 2 and Fig. 5), it still reproduces the
with an accuracy comparable to that of five other force fields (the average RMSE of the
matrix for this force field is 3.6 kJ mol−1). These observations emphasize the importance of considering both solvation free energies
and transfer free energies
when evaluating the relative accuracies of force fields.
![]() | ||
Fig. 7 The consensus root-mean-square error matrix, defined as the root-mean-square deviation from experiment considering the nine force fields simultanously (a), the consensus minimum-error matrix, defined as the smallest deviation (in magnitude) from experiment achieved by any one of the nine force fields considered (b), and the consensus average-error matrix, defined as the deviation from experiment of the average over the nine force fields (c). The force fields considered are listed in Table 1. The molecules considered and their acronyms are shown in Fig. 1. They are listed along the rows and columns in order of decreasing polarity, as estimated by the molecular dipole moment μ. The deviations between experimental and calculated cross-solvation free energies for each of the nine force fields are displayed in Fig. 5. The corresponding numerical values can be found in Section S8 of ref. 43 along with Section S1 (Table S3, ESI†) of the present article. |
The consensus minimum-error matrix, defined by the smallest deviation (in magnitude) from experiment achieved by any one of the nine force fields considered, is shown in Fig. 7b. This matrix has an RMSE of 1.4 kJ mol−1 over all its entries (expectedly smaller than the value of 1.6 kJ mol−1 obtained previously43 when considering only four force fields). Thus, by choosing the best possible force field on a case to case basis, one can very accurately reproduce the vast majority of the experimental solvation free energies. However, there are several instances where the calculated cross-solvation free energy of a given solute–solvent pair substantially deviates from experiment regardless of the force field employed. The fact that each of the nine force fields fails to reproduce the experimental value might be an indication of an error in the experimental data, although it may also result from an inaccuracy of the molecular model underlying all these force fields (e.g. classical-mechanics approximation or absence of explicit electronic polarization). The solvation free energies of H2O in 2M2P and of PYRI in ANLN are two examples with possibly large experimental errors, both of which were excluded from the COMP set (see Appendix A of ref. 43 for a possible alternative experimental value in the case of H2O:2M2P). There are 11 other cases with errors of 2kBT− or larger in this matrix (expectedly fewer than the 18 cases observed previously43 when considering only four force fields), namely DAMD:PYRI, DAMD:MTL, DAMD:ETL, DAMD:BTL, DAMD:1PL, DAMD:DEAN, PYRI:DEE, H2O:HXE, H2O:CCL4, TEAN:ANLN, and BZN:DEE. In the majority of these cases, the experimental data has either a single value reported in the literature, or significant experimental discrepancies among the different values reported. The validity of this experimental data can therefore indeed be questioned.
Finally, the consensus average-error matrix, defined by the deviation from experiment of the average solvation free energy over the nine force fields, is displayed in Fig. 7c. The matrix shows the extent of error cancellation upon averaging the results over the force fields. This matrix has an RMSE of 3.0 kJ mol−1 over all its entries, expectedly larger than the corresponding value of 1.4 kJ mol−1 for the minimum-error matrix. However, it is also larger than the lowest single force-field RMSEs of 2.9 kJ mol−1 obtained for GROMOS-2016H66 and OPLS-AA (Table 2). Thus, it remains better to choose a force field expected to present the highest possible accuracy for the compound class of interest than to average the results over a set of arbitrary force fields.
In our previous article,43 a cross-solvation matrix of 25 × 25 experimental value was introduced by collecting and curating data from seven literature sources,47–53 including compounds from one to seven carbon atoms representative for alkanes, chloroalkanes, ethers, ketones, esters, alcohols, amines, and amides. This experimental data was used to compare the relative accuracies of four popular condensed-phase force fields, namely GROMOS-2016H66, OPLS-AA, AMBER-GAFF, and CHARMM-CGenFF. In the present work, this comparison is extended to five additional force fields, namely GROMOS-54A7, GROMOS-ATB, OPLS-LBCC, AMBER-GAFF2, and OpenFF.
Considering these nine force fields, the correlation coefficients between experimental values and simulated results range from 0.76 to 0.88, the RMSEs from 2.9 to 4.8 kJ mol−1, and the AVEEs from −1.5 to +1.0 kJ mol−1. These differences are statistically significant, but not very pronounced, especially considering the very different functional-form choices and parametrization strategies of the nine force fields as well as the influence of outliers, some of which possibly caused by inaccurate experimental data. Considering the RMSEs, GROMOS-2016H66 and OPLS-AA present the best accuracy (2.9 kJ mol−1), followed by OPLS-LBCC, AMBER-GAFF2, AMBER-GAFF, and OpenFF (3.3 to 3.6 kJ mol−1), and then by GROMOS-54A7, CHARM-CGenFF, and GROMOS-ATB (4.0 to 4.8 kJ mol−1). Considering the AVEEs, the three GROMOS force fields tend to slightly overestimate the magnitudes of the solvation free energies (−1.5 to −0.8 kJ mol−1). In the opposite, the OPLS-AA force field shows a slight tendency to underestimate the magnitudes of the solvation free energies (+1.0 kJ mol−1). The five other force fields show no significant sign bias (AVEEs between −0.3 and +0.2).
As shown by further analysis of the RMSEs and AVEEs in terms of solutes, solvents, and solute–solvent pairs, these global errors are distributed somewhat heterogeneously over the different compounds within the different force fields. In particular, the above trends in terms of global AVEEs may result at least in part from the fortuitous cancellation of larger positive and negative errors over the present selection of compounds. It is also observed that GROMOS-2016H66 and OPLS-LBCC present smaller errors in the hydration free energies, in line with the fact that these parameter sets included hydration free energies as targets in their calibration.
The errors affecting the solvation free energies in the five non-polar solvents show remarkably little variations across solvents and force fields. However, this observation does not imply that the nine force fields are essentially equivalent with respect to their description of the Lennard-Jones interactions, and that variations of the solvation free energy in more polar solvents only arise from differences in the atomic charges. First, the variability is significantly more pronounced when the non-polar molecules are considered as solutes. Second, for polar molecules, the repulsive Lennard-Jones interaction parameters are largely correlated with the charges, because it is their balance that determines the geometry and strength of favorable polar interactions (e.g. hydrogen bonds). Furthermore, the Lennard-Jones parameters of the non-polar and polar molecules are connected by the application of combination rules, as well as correlated with other simulation choices (e.g. cutoff distances, treatment of the long-range non-bonded interactions). A consequence of these observations is that the simultaneous optimization of Lennard-Jones parameters and atomic charges in force-field refinement is to be preferred over a sequential one.32,124,160
The present study also underlines three points related to the extensive comparison between experiment and simulation results in terms of solvation free energies. First, the importance of considering both solvation free energies and transfer free energies
when evaluating the relative accuracies of force fields, so as to avoid biases related to error enhancement or compensation. Second, the fact that choosing a force field expected to present the highest possible accuracy for the compound class of interest remains a better strategy compared to averaging the results obtained using an arbitrary set of force fields. And finally, that such systematic comparisons between simulation and experiment and, in particular, the minimum-error matrix over all force fields considered, may be extremely useful for detecting suspicious (i.e. possibly erroneous) experimental values.
Cross-solvation matrices are expected to represent a powerful tool not only for force-field validation/comparison, but also for force-field refinement/calibration. This is because such a matrix includes a large number of observables which probe all the pair-type interactions relevant for the force field in a balanced fashion. Owing to these two features, this target data may be particularly well suited for assessing the relative merits of different van der Waals combination rules and charge-derivation schemes, and the possibility of bypassing them when necessary using pair-specific parameters or empirical scaling factors.
In the near future, this work related to cross-solvation matrices will be expanded along three main directions: (i) further expansion and curation of the experimental data set; (ii) use of cross-solvation matrices in force-field calibration (e.g. in the context of the systematic CombiFF force-field refinement approach124,160); (iii) extension to liquid–liquid mixing properties at finite concentration.
Footnote |
† Electronic supplementary information (ESI) available: (i) the detailed simulation results for all compound pairs obtained using the five force fields newly considered here (see Section S8 of ref. 43 for corresponding results with the four force fields previously considered); (ii) the detailed simulation results for all individual compounds obtained using the nine force fields (for the four force fields previously considered, the material from Section S9 of ref. 43 is repeated); (iii) the detailed method description and calculation results for the single-value and two-solvent models. Corresponding files, labelled version 1.1 (same experimental data as version 1.0, simulation results for the nine force fields, now including molecular-topology and initial-configuration files as well), are also freely downloadable from the net under ref. 54. See DOI: 10.1039/d1cp00215e |
This journal is © the Owner Societies 2021 |