Developing force fields when experimental data is sparse : AMBER / GAFF-compatible parameters for inorganic and alkyl oxoanions †

We present a set of Lennard-Jones parameters for classical, all-atom models of acetate and various alkylated and non-alkylated forms of sulfate, sulfonate and phosphate ions, optimized to reproduce their interactions with water and with the physiologically relevant sodium, ammonium and methylammonium cations. The parameters are internally consistent and are fully compatible with the Generalized Amber Force Field (GAFF), the AMBER force field for proteins, the accompanying TIP3P water model and the sodium model of Joung and Cheatham. The parameters were developed primarily relying on experimental information – hydration free energies and solution activity derivatives at 0.5 m concentration – with ab initio, gas phase calculations being used for the cases where experimental information is missing. The ab initio parameterization scheme presented here is distinct from other approaches because it explicitly connects gas phase binding energies to intermolecular interactions in solution. We demonstrate that the original GAFF/AMBER parameters often overestimate anion–cation interactions, leading to an excessive number of contact ion pairs in solutions of carboxylate ions, and to aggregation in solutions of divalent ions. GAFF/AMBER parameters lead to excessive numbers of salt bridges in proteins and of contact ion pairs between sodium and acidic protein groups, issues that are resolved by using the optimized parameters presented here.


Introduction
Monoatomic and small polyatomic ions are key players in many biological processes, such as DNA folding, 1 blood coagulation and anti-coagulation, 2,3 protein stability 4 and protein crystallization. 5 The molecular mechanisms by which ions act are currently incompletely understood, in part because inferring molecular scale details from the signals detected in experiment cannot be unambiguously done. In contrast, molecular scale details are directly accessible from simulations and atomistic models, making the development of such models of critical importance.
Because proteins and other biomolecules are typically large (50-3000 kDa), only simulations using classical, non-polarizable force fields give access to the long length and time scales relevant for many processes taking place in bio-systems. To be useful, force fields must strike the correct balance between the various intermolecular interactions: biomolecule-water, biomolecule-biomolecule, water-water, water-ion, ion-counterion and ion-biomolecule interactions; in this last case, the main contribution would come from the interaction between the ions in solution and the charged sites of the biomolecules. Classical, fixed-charge force fields for biomolecules in water have been under evolution for decades and are now quite successful at predicting binding affinities 6 and the folded structure of small proteins, 7-9 the self-assembly of phospholipid bilayers, 10 the importance of electrostatics in DNA, and DNA-protein interactions. 11,12 These force fields were parameterized against experimental observables that reflect the balance between ionwater, biomolecule-water and water-water interactions (e.g., free energies of hydration). 13,14 More recently, force fields for aqueous solutions of alkali, alkali-earth and halide ions were developed that also show reasonable anion-cation interactions. They are thus able to reproduce experimental osmotic pressures or solution activities at salt concentrations below 1 M, [15][16][17][18][19] and in a few cases up to much higher concentrations. 20 Simulations based on these force fields have already offered unprecedented insight into the molecular scale origin of ion-specific effects in biological systems. 21 In contrast to the advanced state of development of classical molecular models for alkali and halide ions for explicit water simulations, there are currently only a few models of small, polyatomic anions that have been parameterized to reflect the correct balance between their interactions with water, with cationic amino acids or with commonly used counterions. [22][23][24] Such models are of interest because polyatomic anions such as sulfates, phosphates or acetates are present in the physiological context in inorganic and/or alkylated form, and the molecular scale details regarding their interaction with biomolecules are not yet understood. [25][26][27] Simulations of biomolecular systems performed so far with these ions have simply used general parameters based on commonly used biomolecular force fields such as CHARMM or AMBER: 13 the partial charges are obtained from low level quantum mechanical calculations and RESP fitting, 28 and the Lennard-Jones parameters for each atom type are the same as those used in the chosen biomolecular force field. Parameters obtained this way are typically used for simulations without further tests -either comparison with experimental data or with ab initio calculations -under the assumption that the ion parameters must be reliable because they were obtained with the established protocol used to develop the biomolecular force field. However, these protocols have only been proven reasonable for neutral biomolecules 13,14 and they are known to fail for charged species, e.g., they yield hydration free energies substantially more negative than those experimentally measured. 13 In this work, we aim to overcome the above limitations of existing classical, fixed-charge models of polyatomic ions for simulations of biomolecular systems in explicit water. We focus specifically on the polyatomic anions methyl-sulfonate (CH 3  ). The parameterization approach we present ensures that the ion parameters reflect the correct balance of the ion interactions with water, with the commonly used counterion Na + and with analogues (methylammonium, CH 3 NH 3 + , and ammonium, NH 4 + ) of the amino acid lysine, while ensuring that the parameters for all the ions remain internally consistent. The parameters developed here are compatible with widely used force fields for molecular simulations: the AMBER 13 force field for proteins, the accompanying TIP3P 29 water model, and the sodium model of Joung and Cheatham which is often used as counterion. 15 Intramolecular parameters for the anions already exist; 13,30-32 the main issue is to parameterize their intermolecular interactions. In the context of the AMBER 13 force field for proteins, intermolecular interactions depend on the Lennard-Jones parameters and on the partial charges of the atoms composing the interacting species. We obtain the partial charges of the ions following the standard procedure used in AMBER 13 -low level ab initio calculations followed by RESP fitting -so only the Lennard-Jones parameters of the ions are optimized. ‡

Difficulties optimizing Lennard-Jones parameters for polyatomic anions
To parameterize intermolecular potentials, it would be desirable to reproduce high quality thermodynamic data that strongly depends on intermolecular interactions: e.g., free energies of hydration, which reflect ion-water interactions, or solution activities, which reflect anion-cation interactions. This approach was followed by several groups to develop better models of alkali halides. 15,16,19 However, experimental data exists only for a few of the cases of interest here. For example, hydration free energies exist for CH 3 COO À , but not for CH 3 33 Osmotic pressure or solution activity data for solutions containing the cationic amino acids arginine and lysine, or suitable small-molecule analogues, and polyatomic anions are also not available. For solutions involving polyatomic anions and Na + , data exists only for CH 3 SO 3 À and CH 3 COO À .
Given that there is no complete set of experimental information that we can use to obtain an internally consistent set of parameters for all the anions mentioned above, an alternative would be to develop classical models solely based on ab initio calculations. One possibility would be to develop parameters to match observables (e.g., the potential of mean force between two ions) calculated from ab initio molecular dynamics (MD) in explicit solvent. The main shortcoming of this approach is that, currently, ab initio MD can only access timescales of tens of picoseconds, so the calculated free energy profiles have high statistical uncertainty. 34,35 A second possible shortcoming could be that a low level of theory must necessarily be used in these calculations, which could introduce systematic errors. Recent results, however, indicate that DFT with appropriate dispersion corrections yields reasonable results: structural properties of iodate in water calculated with MD-DFT simulations agree well with results from X-ray absorption fine structure spectroscopy (XAFS). 36 Another possibility would be to develop parameters based solely on single-point ab initio calculations, for example water-ion dimers, dry ion pairs or even small hydrated systems (e.g., an ion-pair or a single ion surrounded by a first hydration layer); dry ion pairs or water-ion dimers are here referred to as systems in the gas-phase. Multiple studies, however, 14,37 show that these approaches fail to produce parameters that are adequate for simulations of aqueous systems: the gas-phase intermolecular interaction energies calculated using ab initio methods differ significantly from those calculated using classical models optimized to reproduce properties of aqueous systems. How, then, can we obtain an internally consistent set of parameters for the ions in question, given the existing limitations in experimental data and in ab initio methods? ‡ In principle, the opposite approach -keeping GAFF Lennard-Jones parameters and tuning the charge distribution of the ions to reproduce the desired target properties -would also be possible. Doing so, however, would require establishing a new and well-defined procedure to obtain partial charges for polyatomic ions, which would be more time-consuming than the present approach and does not present obvious advantages.

Proposed approach
We propose an approach to develop non-bonded parameters for ionic species in aqueous systems, that combines experimental data and ab initio simulations. This combination is done in a manner that yields an internally consistent set of parameters for the anion-cation and ion-water interactions.
Here we briefly described the main points behind the approach; in the Methods section we present a detailed algorithm that may be followed by interested users. Let us consider species A, B and B 0 . B and B 0 are similar, i.e., they have identical charge and similar composition (e.g., CH 3 SO 4 À and CH 3 COO À ), yet different enough that they may not necessarily share identical parameters (e.g., the oxygen Lennard-Jones parameters in CH 3 SO 4 À and CH 3 COO À are not necessarily identical). AB and AB 0 denote dimers of these species interacting non-covalently. Lennard-Jones parameters for the intermolecular interactions in AB in water have been developed based on experimental data (Fig. 1, step 1), but analogous experimental data for the AB 0 system does not exist. Using high level ab initio calculations, we calculate the intermolecular energies in the AB and AB 0 systems in the gas phase, as a function of the intermolecular distance ( Fig. 1, step 2). We also perform the analogous potential energy scan for the AB system using the classical, optimized parameters. As illustrated in cartoon form in Fig. 1 (step 2), the interaction energy for system AB from ab initio calculations differs from that calculated using the classical model. This difference reflects the fact that the classical parameters were optimized for the intermolecular interactions of AB in water, and that the polarization state of molecules or ions in a dimer in the gas phase is very different from their polarization state in water. 14,38 Therefore, the AB parameters offer a poor description of the intermolecular interactions of this dimer in the gas phase. We then assume that the polarization state of similar species should be perturbed to a similar extent when they are immersed in water. If this assumption is true, then the difference in the minimum interaction energy between the classical and ab initio methods for AB, E cl,AB (R min ) À E QM,AB (R min ), should be very similar to the same difference for the AB 0 system, E cl,AB 0 (R min ) À E QM,AB 0 (R min ). In this article we show that this assumption indeed holds. This finding implies that classical parameters for the intermolecular interactions of the AB 0 system can be determined by requiring that as illustrated in Fig. 1 (step 3). In this expression, is the difference in the energy minimum of the indicated dimer calculated using ab initio and that calculated using the optimized classical parameters, as illustrated in Fig. 1 (step 2). This parameterization approaches uses the energy difference, DE AB,QM-cl (R min ) at the energy minimum only, because this parameter is expected to dominate system properties. A similar approach has proven reasonable to develop parameters for neutral compounds like alcohols 14,37 but until now has not been applied to charged species. We use this approach to develop a set of classical parameters for polyatomic anions which is internally consistent and which is fully compatible with the AMBER 13 force field for proteins, the TIP3P 29 water model and the sodium ion model of Joung and Cheatham. 15 The experimental properties we use for parameterization are the solution activity derivatives and the hydration free energies. Solution activities exist only for NaCH 3 SO 3 , NH 4 CH 3 SO 3 , NaCH 3 COO, Na 2 SO 4 and (NH 4 ) 2 SO 4 ; reliable hydration free energies exist only for HSO 4 À , CH 3 COO À and SO 4 2À . We optimize the parameters for the ions based on this experimental information, and then use ab initio calculations and the approach described above to obtain parameters for the remaining cases.

Molecular dynamics simulations
2.1.1 Force field and parameterization details. The TIP3P water model 29 is used in all simulations. The Lennard-Jones (LJ) parameters of the TIP3P oxygen are e OO = 0.6364 kJ mol À1 and s OO = 0.315061 nm, with the partial charge of q O = À0.8340e. 29 The partial charge for each of the two hydrogens is q H = +0.4170e; the LJ parameters assigned to the two hydrogens in TIP3P water equal zero. 29 Fig. 1 The parameterization approach in this study.
For sodium (Na + ) and chloride (Cl À ) we use Joung and Cheatham parameters, 15 which are shown to reproduce reasonably well solvation free energies and ion-water binding energies when used in combination with TIP3P water. 29 We introduce a general notation for the anions, A nÀ , here investigated: ( The e and s parameters determine the Lennard-Jones interaction energy of two species i and j as Unless otherwise stated, the Lorentz-Berthelot combination rules are used to obtain e ij and s ij from the self-interaction parameters, denoted by the subscripts ii and jj, as follows: The Lennard-Jones parameters of the GAFF 13,30-32 force field are used in most cases; the only exceptions we propose occur in interactions involving the anion oxygens. For several of the anions, the self-interaction parameters of the oxygens are optimized to better reproduce their interactions with water; these optimized self-interaction parameters, given in the Results section, should be used with eqn (4) to obtain all intermolecular interaction parameters except in the few cases indicated here, for which better estimates are found. We test whether Lorentz-Berthelot combination rules hold for interactions between the anions and Na + , NH 4 + or CH 3 NH 3 + , as detailed below and in the Results section; for most of these cases we find that they do not, so we propose optimized parameters for interactions between the anion-oxygens and Na + (O Á Á Á Na þ ), and between the anionoxygens and nitrogen and H N atoms in NH 4 + or CH 3 NH 3 , that should be used instead of eqn (4). The optimization of the anion-oxygen Lennard-Jones parameters proceeded in multiple stages. These are now described in the form of a road map that may be followed when applying our approach to parameterize intermolecular interactions for any species in water: (1) Select a starting parameter set for the species to be parameterized. Intramolecular parameters are indispensable because the approach we propose applies only to non-bonded interactions. GAFF provided good initial parameters for the oxoanions.
(2) Select which interactions should be optimized; select which experimental target properties exist for parameterization. In our case, the difference in the hydration free energies between the anions and a reference ion (Cl À ) is selected to optimize ion-water interactions: We choose to optimize parameters to reproduce the experimental value of DDG solv instead of DG solv because of the lower uncertainty 39 associated with experimental values of DDG solv relative to DG solv , and because we aim to create a force field compatible with the Na + and Cl À ions developed by Joung and Cheatham. 15 Details of the free energy calculations are given in Section 2.1.2 and in the ESI † (SI 2.1). The solution activity derivative at 0.5 m is selected to optimize anion-cation cation interactions. Details of these simulations are given in Sections 2.1.2 and in the ESI † (SI 2.2).
Ion-water interactions should be optimized before anioncation interactions; the remainder of this road map reflects this order.
(3) Evaluate the quality of the initial parameter set with respect to the primary experimental target property, and select which ion parameters will be optimized. We calculated the hydration free energies of the anions for which that information was available. We found that GAFF parameters failed in some cases, leading to DDG solv that differed from the target more than 2 kcal mol À1 , i.e., more than the uncertainty associated with the experimental target values. 33,39 For the ions for which GAFF parameters proved insufficient, we optimize ion-water interactions.
Because we only have one target experimental property -the hydration free energy -for parameterization of ion-water interactions, we opt to optimize only the self-interaction Lennard-Jones parameters associated with the anion oxygens. The oxygens form the outer part of the anions, and for that reason dominate the Lennard-Jones interactions of the anions with any nearby species. If multiple target properties exist, then parameters for multiple atoms may be optimized.
(4) Optimize self-interaction parameters based on hydration free energies, for the species for which these energies exist. The self-interaction e OO or s OO parameters of the oxygens in CH 3 COO À and SO 4 2À are optimized to reproduce the experimentally determined difference in solvation free energy between those anions and Cl À , DDG solv . Given that only one target property was selected for optimization, an infinite number of combinations of e OO and s OO values could yield equally good agreement with the target, possibly at the expense of the performance of the model regarding other observables. Because the original GAFF parameters are physically-based -they are derived to reproduce the density and heats of vaporization of organic liquids 13,30-32 -we retain them for either s OO or e OO , and optimize the remaining parameter only. We optimize s OO for the cases where the original GAFF 13,30-32 parameters led to deviations greater than 10 kcal mol À1 from the target DDG solv , but we optimize e OO for the cases where the deviation was lower than 10 kcal mol À1 . This threshold energy was chosen because a change of DDG solv 4 10 kcal mol À1 is approximately equal to half the DDG solv between Na + and K + , two ions which differ substantially in radius. § 33 To find the optimal parameters, we calculate the hydration free energies for a range of the scaling factor, f, applied to e OO or to s OO parameters; 4-6 different values of f ranging from 0.7 to 1.3 are used. We fit a linear function to the results. The optimal parameters are found by solving the linear function for the experimental SolvFE difference, DDG Exp solv . (5) Optimize the self-interaction parameters of the oxygen atoms of the remaining anions (for which reliable hydration free energies are not available) following the ab initio approach described in the introduction.
(a) For each species still to be parameterized (B 0 ), identify a similar species that was already parameterized (B). Similar species must have the same charge and must have similar general composition, primarily evaluated in terms of the characteristics (e.g., identity, hybridization, polarization) of the atoms being parameterized. Here we considered oxoanions of the same charge as similar species.
(b) For the water-B and water-B 0 dimers, select an initial relative orientation that is expected to yield similar and strong intermolecular interactions. For example, relative orientations where one of the water hydroxyl groups is pointing to the O2 oxygens in both B and B 0 are similar; relative orientations where the water hydroxyl groups point to an O2 oxygen in one of the species, but to an OH or OS oxygen in the other species are not similar. Strong intermolecular interactions refer to orientations that are expected to have particularly favourable interaction energies.
For the chosen initial structures, perform gas phase ab initio potential energy scans as described in Section 2.2.2. If water-B and water-B 0 differ in hyperconjugation effects (this situation did not happen in our systems), repeat the ab initio calculations with different relative orientations (identical for the two dimers) until one is found that yields similar hyperconjugation effects for both dimers. The importance of hyperconjugation effects in our parameterization approach is described in the context of anion-cation gas phase scans in Section 3.2.2.1.
(c) For the water-B dimer, perform a gas phase potential energy scan with the classical model and the optimized parameters, and using the same structures used in the ab initio scan. Calculate the difference, DE water-B,QM-cl (R min ), between the position of the energy minimum in the ab initio and the classical curves. This calculation is done by first fitting each set of data to a function of the form A exp(B Â R) + C(R À6 ) + (D/R), where A, B, C and D are fitting parameters, corresponding to the Buckingham potential form with an added electrostatic (1/R) term, and then identifying the minimum of the fitted functions.
(d) For the water-B 0 dimer, perform multiple gas phase potential energy scan with the classical model and multiple values of e OO or s OO parameters, until a parameter value is found for which DE water-B,QM-cl (R min ) = DE water-B 0 ,QM-cl (R min ) (eqn (1)) to within 0.5 kcal mol À1 . We tested values of e OO or s OO ranging from 0.6 to 1.3 times the original values, in intervals of 0.01 for e OO and 0.001 for s OO . For similar reasons as those indicated in point 4, we optimize e OO when the original DE water-B 0 ,QM-cl (R min ) differs less than 1 kcal mol À1 from the target, and optimize s OO otherwise.
(6) Evaluate whether the Lorentz-Berthelot combination rules (eqn (4)) together with the current parameter set -already including the optimized self-interaction parameters -allows the reproduction of experimental properties that strongly depend on anion-cation interactions; select which ion parameters will be optimized. We tested whether the current parameter set yields reasonable solution activity derivative at 0.5 m, for the salts for which experimental solution activities are available. The combination rules proved insufficient for all test cases, as described in the Results section, leading to deviations larger than 0.03 -the statistical uncertainty associated with our calculations -from the target values.
As for the case of the self-interaction parameters, we only have one target property against which to optimize anion-cation interactions -the solution activity derivative. For this reason, we opt to tune the Lennard-Jones e or s parameters corresponding to the interaction between the anion-oxygens and Na + , and between the anion-oxygens and N and H N in NH 4 + . Both parameters for N and for H N are optimized because in this case the buried atom, N, is expected to contribute significantly to the intermolecular interactions; hydrogen atoms always have weak or zero Lennard-Jones potentials.
(7) Optimize anion-cation interaction parameters based on experimental properties, for the salts for which they exist. The Lennard-Jones e or s parameters corresponding to the interaction between the anion-oxygens and Na + (O Á Á Á Na þ ), and between the anion-oxygens and N and H N in NH 4 parameterized to reproduce the activity derivative of 0.5 m solutions of salts for which that information was available. For similar reasons as those indicated in point 4, we optimize either e or s, and retain the parameters obtained via the combination rules (eqn (4)) for the remaining parameter. We optimize s when the initial solution activity derivative deviates from the target more than 0.1. and optimize e otherwise. The determination of the optimal parameters followed an analogous procedure as that described in point 4.
(8) Optimize the intermolecular interaction parameters for the remaining salts (for which experimental target properties are not available) following the ab initio approach described in the introduction. We optimized the Lennard-Jones parameters (a) For each anion-cation pair to be parameterized (AB 0 ), identify a similar pair for which intermolecular interactions are already parameterized (AB). The pairs must share a common ion (A), and B and B 0 must be similar species; see point 5a for the conditions defining similarity between B and B 0 .
(b) For the AB and AB 0 dimers, select an initial relative orientation that is expected to yield similar and strong intermediate interactions. Here we consider only orientations where the O2 oxygens face the cations. Perform gas phase, ab initio potential energy scans as described in point 5b, evaluate hyperconjugation effects, and find new orientations if necessary. A detailed discussion of the impact of hyperconjugation effects on anion-cation interactions is given in Section 3.2.2.1.
(c) For the AB dimer, perform a gas phase potential energy scan with the classical model and the optimized parameters, and with the same configurations used in the ab initio scan. Calculate DE AB,QM-cl (R min ) (see eqn (1)).
(d) For the AB 0 dimer, perform multiple gas phase potential energy scans with the classical model and multiple values of e or s for the O Á Á Á Na þ , O Á Á Á N and O Á Á Á H N interactions until a parameter value is found for which DE AB,QM-cl (R min ) = DE AB 0 ,QM-cl (R min ) to within 0.5 kcal mol À1 . Further details analogous to those in point 5d.
In the Results section we discuss further aspects concerning the application of this approach to our systems. The final parameters, expressed as a scaling factor relative to the original GAFF 13,30-32 parameters to facilitate comparisons, are shown in the Tables presented in the Results section, as well as in the ESI † in Gromacs-readable files.
2.1.2 General simulation details. We perform two types of molecular dynamics (MD) simulations using the GROMACS simulation package 40,41 and the TIP3P water model: 29 free energy perturbation simulations to calculate hydration free energies, and molecular dynamics simulations to calculate solution activity derivatives. All simulation boxes are cubic with periodic boundary conditions applied in the XYZ directions. We set a non-bonded potential cutoff at 1.2 nm distance. Beyond this cutoff distance, the electrostatic interactions are calculated with the particle mesh Ewald (PME) method. 42 Lennard-Jones interactions are smoothly shifted to zero between 1.0 and 1.2 nm. Long range dispersion corrections were applied to the energy and pressure. A leap-frog stochastic dynamics (SD) integrator 43 is used to integrate the equations of motion in all simulations with the temperature fixed at 298 K. For NPT simulations we maintained the pressure at 1 bar using the Berendsen barostat 44 with a coupling time of 1.0 ps. All bonds with H-atoms were restrained using the LINCS algorithm. 45 We used a 2 fs time step.
The solution activity derivatives are calculated according to the Kirkwood-Buff theory, 46 with expressions for the integrals formulated specifically for closed systems. 47 A summary of the most relevant aspects of this theory and how it was applied here is given in the ESI. † The simulation box for the calculation of the activity derivative was prepared by putting 72 anions and 72 cations (144 when using divalent anions) in a cubic box with dimensions 4.0 Â 4.0 Â 4.0 nm 3 and solvating the system by adding E8000 TIP3P 29 water molecules using editconf and solvate tools from Gromacs 5.0. 40,41 The salt concentration simulated is thus 0.5 m. The system is equilibrated for 10 ns in the NPT ensemble. We select three distinct configurations from the equilibration run, with a box volume similar to the average box volume obtained during the equilibration. Each of these configurations is then used as the initial state for a 50 ns simulation in the NVT ensemble. The total simulation time for each production run is thus 150 ns; this large simulation time is necessary to provide enough sampling to calculate activity derivatives. The statistical uncertainty associated with the calculated activity derivatives is 0.03, as shown in Table S4 of the ESI. † The starting configuration for the solvation free energy calculations is prepared by placing a single ion in a simulation box of dimensions B4.0 Â 4.0 Â 4.0 nm 3 and solvating it in TIP3P water. 29 The free energy is obtained along the path coordinates of l LJ and l C , which are the scaling factor to the Lennard-Jones and the electrostatic potentials, respectively. These scaling factors range from 0 to 1 (1 for the decoupled state and 0 for the fully coupled state). To ensure adequate ensemble overlap, we use 21 steps for each of l LJ and l C ; the steps are evenly spaced for l C ; for the l LJ spacing we use 21 unevenly spaced points, where l LJ A {0.000.060.120.180.240.300.360.420.460.500.520.540.560.580.600.64-0.68 0.72 0.76 0.80 1.00}. Soft core potentials are used during the LJ coupling. For each l we perform an equilibration of 100 ps followed by a production run of 1 ns in the NPT ensemble. The total solvation free energies are calculated from the partial results along the path coordinates using Bennett's acceptance ratio (BAR), 48 as implemented in Gromacs 5.0. 40,41 The estimated uncertainty associated with the calculated solvation free energies is 0.1 kcal mol À1 , as shown in Table S3 of the ESI. † This value is estimated by performing different simulations for the same systems, using 0.5 ns simulations for each 21 lambda values. For divalent anions 41 lambda spacing was used as well for comparision, and the result did not change from the one obtained from 21 lambdas. The solvation free energies calculated directly from simulation must be corrected before they can be compared with experiment. 39 In the ESI † (SI 2.1) we describe in detail which correction terms are included in the values here reported.

Ab initio calculations
2.2.1 Charge fitting. The classical charges for the new species are obtained consistently with the AMBER 13 force field for proteins. The molecules are first optimized using MP2/6-31G* level of theory to find the global minimum geometries. Thereafter we use HF/6-31G* level to reoptimize the structures and to calculate the Merz-Kollman 49 atomic charges fitted to the electrostatic potential. All calculations are performed using the Gaussian 03 software package. 50 The Gaussian output is later plugged into the Antechamber module of the AMBER 13 software suit to generate the classical atomic point charges using RESP 28 fitting.
2.2.2 Potential energy scan. To optimize the Lennard-Jones parameters for the interaction of the anions with water and with cations for the cases where experimental data is unavailable, we require rigid-body potential energy scans of anion-water or anion-cation systems in the gas phase, as described in the Introduction and illustrated schematically in Fig. 1. The binding potential energy curves as a function of the intermolecular distance, given in terms of the distance between the two closest nuclei of the two species, are obtained for each dimer using the MP2 51 level of theory with augmented-cc-pvtz 52 basis set in the Gaussian 03 package. 50 The structure of each isolated species is first optimized with the same level of theory. After optimization, we perform the gas phase 1-D potential energy while holding the structure of the molecules fixed during the scan and using counterpoise correction; details regarding this correction are given in the ESI † (SI 2.3). Typical configurations for the anioncation and anion-water potential energy scans performed are shown in Fig. 2. We do not relax the structures during the potential scans because doing so would result in structures that change as a function of the intermolecular distance, and that also differ between the ab initio and the classical calculations for the same intermolecular distance. Given that our aim is to compare the minimum intermolecular energy between ab initio and classical calculations, this comparison should be done for identical structures.

Results and discussion
The optimized parameters are presented in the ESI, † in.top/.itp format for gromacs users; the original values are shown as comments, to facilitate comparisons. Amber-format parameters are given in the same files in the form of comments.
3.1 Determining self-interaction parameters for the oxygens of the anions 3.1.1 Parameters developed based on solvation free energies. The computational solvation free energies (SolvFEs) are calculated as described in Section 2.1.2. We attempt to reproduce the experimental target data (DDG Exp solv ; see eqn (5)) by scanning a range of e O2-O2 or s O2-O2 . For HSO 4 À , scanning of the e OH-OH or s OH-OH parameters would in principle also be necessary but in practice was not done because the original GAFF 13,30-32 parameters proved adequate. The procedure followed at this step (point 4 in the road map in the Methods section) is illustrated in Fig. 3  . Because the uncertainty associated with experimentally determined solvation free energies of single ions 39 is at least of the order of this deviation, we do not further parameterize HSO 4 À against the SolvFE and we use the GAFF 13,30-32 Lennard-Jones parameters for this anion.
For acetate (CH 3 COO À ), the original e O2-O2 needs to be scaled by a factor of 0.77 to retrieve the experimental SolvFE for this ion. The optimized parameter thus reproduces the difference in hydration free energies between Cl À and CH 3 COO À better than the GAFF 13,30-32 parameters. We note, however, that the deviation from experiment using the GAFF 13,30-32 parameters is only +2.3 kcal mol À1 , so using GAFF parameters to investigate the interactions of this anion with water is acceptable. In contrast, GAFF 13,30-32 parameters yield a far too low hydration free energy for SO 4 2À : the deviation from experiment is À27.9 kcal mol À1 . A scaling factor of 1.17 to the original s O2-O2 parameter is required to reproduce the target DDG Exp solv of the sulfate. GAFF 13,30-32 parameters dramatically fail to reproduce the SolvFE of multivalent ions, emphasizing the importance of parameter optimization with respect to SolvFE for those anions.
3.1.2 Parameters developed using ab initio calculations. To optimize the oxygen parameters for CH 3  -water and CH 3 COO Àwater, are shown in Fig. S3 of the ESI; † the energy and interparticle separation at the energy minimum are shown in  We then proceed to optimize the self-interaction parameters as described in point 5 of the road map given in the Methods section. The optimized self-interaction parameters for the oxygens in all the anions are shown in Table 2. The same table also shows DR QM-cl (R min ), the difference in the position of the potential energy minimum calculated using ab initio and using classical parameters. For the divalent anions, the optimized parameters result in larger anion-water separation at the energy minimum of the dimer than the original GAFF 13,30-32 parameters. This increase has been observed in other parameterizations of multivalent oxoanions based on experimental data, 54 and illustrates the inadequacy of applying GAFF 13,30-32 parameters, derived for neutral molecules, to heavily charged species.
The original GAFF parameters for CH 3  also scaled by the same scaling factor applied to s O2 of that species. Applying these scaling factors to OH or OS is recommended, despite the fact that they should lead to relatively small changes in the overall interaction between these species and water; e.g., scaling or not the OH oxygen in HPO 4 2À leads to changes of B2 kcal mol À1 in SolvFE.
The calculated uncorrected (DG cav + DG chg ) SolvFE of HPO 4 2À are À311.8 kcal mol À1 and À285.8 kcal mol À1 , for GAFF 13,30-32 and our parameters, respectively. This large change in solvation free energy illustrates the critical importance of optimizing ionwater interactions for divalent anions. ¶  (5). Table 2 Differences between the classical and quantum potential energy scans for all anion-water dimers, and values of the optimized anion-water parameters for the anions based on ion-water interactions, we evaluate whether those parameters can be used with the combination rules shown in eqn (4) to obtain reasonable interactions between the anions and two cations of interest: the commonly used counterion Na + , and NH 4 + , which is often considered to be analogue to the terminal side chain group of the amino acid lysine. We assess the quality of those parameters by calculating, as described in Section 2, the activity derivatives of 0.5 m solutions of salts for which experimental solution activities exist. These results are given in Table 3, under column a Calc,b cc and are referred to in the text as arising from using unoptimized anion-cation parameters. For comparison, we also calculate activity derivatives using exclusively the original set of GAFF 13,30-32 parameters for the anions (denoted as a Calc,c cc in Table 3). The unoptimized anion-cation parameters fail, often dramatically, to adequately describe anion-cation interactions in all cases; the original set of GAFF 13,30-32 parameters also fails. These failures point to the need of specifically optimizing the anion-cation interactions. Modifications to the anion-cation parameters (see Table 3) are indispensable to attain agreement with experiment for solutions of Na 2 SO 4 or (NH 4 ) 2 SO 4 : for both salts, using either unoptimized anion-cation parameters or using the original GAFF 13,30-32 parameters led to dramatic aggregation, as illustrated in Fig. 4a for Na 2 SO 4 . Such large aggregation is clearly unphysical, because our simulations are done at 0.5 m, far below the solubility limit of Na 2 SO 4 (2.0 m at 25 1C), 57 and of (NH 4 ) 2 SO 4 (5.8 m at 25 1C). 58,59 As shown in Fig. 6, the optimized anion-cation parameters lead to very few contact ion-pairs formed per ion, with both ions forming primarily solvent shared ion-pairs, i.e., configurations where the ions share their first hydration layer.

Parameters developed based on solution activity derivatives. After finding optimal self-interaction parameters
For NaCH 3 COO, using unoptimized anion-cation parameters leads to a solution activity derivative, a Calc cc = 0.35, which is far lower than the target value of a Exp cc = 1.00; the original GAFF 13,30-32 parameters perform even worse, yielding a Calc cc = 0.22. To achieve good agreement with experiment, it is necessary to modify s ONa . The optimized parameters lead to only 1/5 of the contact ion pairs obtained using the original GAFF 13,30-32 parameters, as shown in Fig. 5a. Our results thus indicate that the interactions between Na + and acidic amino acids are markedly overestimated when using the sodium model of Joung and Cheatham 15 in the AMBER 13 force field with TIP3P water.
Our optimized potentials for Na + Á Á ÁCH 3 COO À agree much better with recent ab initio molecular dynamics results based on DFT methods with dispersion corrections 35 than the original GAFF potentials for TIP3P water: our potential of mean force calculations (ESI, † Fig. S5) for Na + Á Á ÁCH 3 COO À show that the difference in free energy between the minimum near 0.33 nm and that near 0.5 nm using our optimized parameters (E0.7 kcal mol À1 ) is comparable to the same free energy difference (E1 kcal mol À1 ) obtained using ab initio MD. 35 In contrast, for the original GAFF parameters in TIP3P water this difference is 1.5 kcal mol À1 . We note, however, that our classical   simulation results still show marked differences from those obtained using ab initio MD: our parameters yield a free energy minimum at 0.27 nm which is not seen in ab initio MD. In the absence of reliable experimental information reporting on the (non)existence of the ion pair at 0.27 nm separation, it remains unclear which of these descriptions is better. For NaCH 3 SO 3 and NH 4 CH 3 SO 3 , the activity derivative calculated without optimizing anion-cation interactions differs less than 5% from experiment. Despite this seemingly small difference, we provide optimized anion-cation parameters for these salts (see Table 3) because they lead to quantitative differences in the number of ion-pairs: Fig. 6 show that the optimized anion-cation parameters lead to 25% fewer contact ion-pairs -defined as ions in direct contact with each other -than the non-optimized ones.
3.2.2 Parameters developed using ab initio calculations. For several of the salts of interest here, the solution activities are not available in the literature or have high associated uncertainty. High uncertainty is often found for salts of amphiprotic species like phosphate, because solutions of those salts simultaneously contain H 2 PO 4 À , HPO 4 2À or PO 4 3À , making it difficult to deconvolute the contribution of each species to the solution activity. 56 For these salts, we optimize anion-cation interactions again following the approach described in the introduction, described in detail in point 8 of the road map given in the Methods section.

3.2.2.1
Determining optimal anion-cation orientations to be used in gas-phase potential energy scans. In ab initio calculations, contact ion-pairs in the gas phase benefit from extra delocalization stability due to the overlap of cation and anion orbitals; this extra stabilization is sometimes referred to as intermolecular hyperconjugation energy. [60][61][62][63] However, quantum effects such as electron delocalization and charge transfer between two ions are not expected to be dominant in aqueous solution; in this case the charge transfer preferentially occurs between the ions and the water molecules in the first hydration shell. [64][65][66] The fact that quantum effects stabilize ion-ion interactions in the gas phase but not in solution represents a problem when using our approach to develop classical parameters for ions in aqueous solution. Given that the magnitude of the hyperconjugation energy varies amongst contact ion-pairs -depending on the electronic structure and the relative orientation of the two ions -our ab initio approach might introduce artifacts if the two ionpairs being compared benefit from different hyper-conjugation stabilities. To resolve this issue, we must ensure that the anions for which we want to make the ab initio comparison have similar interactions with the same cation. We assess the similarity of interactions by performing natural bond orbital (NBO) 60 analysis. This analysis proved to well-capture the electron density transfers, 61-63 on the geometries shown in Fig. 2a-c and at distances corresponding to the minimum energy on the 1-D potential curves for each ion-pair. To perform this analysis we use the geometries from MP2 calculations and perform B3LYP 67-69 /aug-cc-pVTZ calculations to obtain the electron densities. Using the B3LYP method is necessary to obtain the well-defined one-electron densities required by NBO calculations. 61 The results of the NBO analysis are shown in Table 4. In ionpairs with Na + , the electron density transfers from the lone pair (LP) of the oxygen in the anion to the unfilled valence-shell (LP*) in Na + . The hyperconjugation energies vary in a very small range, between 6.5-10.0 kcal mol À1 for monovalent and between 11.0-11.5 kcal mol À1 for divalent anions, suggesting that the donor-acceptor interactions are similar for all ion-pairs involving Na + . Therefore, the configuration shown in Fig. 2b can be used for the ab initio calculations needed to derive optimized anioncation parameters. In contrast to the ion-pairs involving Na + , those involving NH 4 + in configuration I (see Fig. 2c) exhibit two different electron density transfers, one from the LP of the oxygen to the valence antibonding (BD*) orbital of (H-N) and the other from BD* of (S-O) to BD* of (H-N). The total hyperconjugation energies lie between 51.0-94.0 kcal mol À1 for monovalent and 130.0-283.5 kcal mol À1 for divalent ions. This wider range of hyperconjugation energies arises from different donor-acceptor interactions in these ion-pairs, making impossible the type of comparison necessary for our parameterization approach. To resolve this problem we searched for different relative orientations of the ions involved in NH 4 + ion pairs, to find an orientation in which hyperconjugation energies are both minimized and are similar for the various ion pairs. For orientation II (see Fig. 2d), the hyper-conjugation energies are in the narrow range of 4.5-6.0 kcal mol À1 for monovalent anions and 8.0-8.5 kcal mol À1 for divalent anions. We use this configuration to find the parameters for the ion-pairs containing ammonium and methyl ammonium.

Parameter optimization.
For parameter optimization, we require potential energy scans for reference species for which classical parameters already exist: we use NaCH 3 SO 3 as the reference for NaCH 3 SO 4 , NaH 2 PO 4 and Na(CH 3 ) 2 PO 4 ; Na 2 SO 4 as the reference for Na 2 CH 3 PO 4 ; NH 4 CH 3 SO 3 as the reference for NH 4 + Table S2 of the ESI † we show the energy and interparticle separation at the energy minimum for all cases.
The optimized anion-cation interaction parameters are shown in Tables 5 and 6. The anion-cation distances for monovalent anions are slightly larger using the optimized parameters than using GAFF. 13,[30][31][32] The same trend is observed for the divalent anions, although for those systems the increase in anion-cation distance is substantially larger than for the monovalent anions. Such an increase is not surprising: as mentioned above and as observed also by others, 54 it reflects the fact that Lennard-Jones parameters optimized for neutral species are not appropriate for charged species.
Our results show that optimum parameters for the NH 4 +anion interactions differ markedly from those for CH 3 NH 3 +anion interactions. This result indicates that transferring parameters determined for NH 4 + to other amines, which yields reasonable results for the interactions of these cations with water, 71 should be avoided for anion-cation interactions. The need for different NH 4 + anion and CH 3 NH 3 + -anion classical parameters that was detected with our approach illustrates the power of that approach to optimize intermolecular interactions: these two cations can be considered qualitatively similar (as defined in points 5a and 8a in the road map given in the Methods) for the purpose of our ab initio approach but, simultaneously, this approach is sensitive enough to detect how the interactions of the nitrogen and H N with other species quantitatively differ between NH 4 + and CH 3  Similarly to the systems for which parameters are optimized based on experimental data, we find that the optimized parameters reduce the net anion-cation attraction relative to GAFF 13,30-32 parameters (Fig. 6). These results suggest that GAFF 13,30-32 parameters in general overestimate cation-anion interactions and may lead to aggregation, particularly in solutions with higher salt concentration. The most dramatic changes are seen in solutions of divalent ions: GAFF 13,30-32 parameters lead to very rapid (o1 ns) aggregation of the ions and crystal formation, whereas our parameters yield solubilized systems. For all solutions containing monovalent anions, the optimized parameters decrease the number of contact ion pairs while leaving the number of solvent-shared ion-pairs unchanged. The overestimation of monovalent anion-cation interactions by GAFF is particularly marked for Na + Á Á ÁCH 3 COO À , NH 4 + Á Á Á(CH 3 ) 2 PO 4 À and Na + Á Á Á(CH 3 ) 2 PO 4 À .  GAFF 13,30-32 also particularly overestimates the interactions between CH 3 NH 3 + and CH 3 COO À . This effect is illustrated in Fig. 5b, by comparing the radial distribution of a 3 m solution of glycine zwitterion using GAFF 72 and using optimized parameters. Overestimation of interactions between anionic and cationic side-chains, i.e., salt-bridge formation, has also been observed by others. 73 This artifact can be prevented by using our optimized anion-cation interaction parameters.
3.3 Parameters obtained using the ab initio approach yield correct solution properties As described in the introduction, our approach to optimize parameters based on ab initio calculations relies on an assumption:  that, for similar anions, the difference, DE QM-cl (R min ), between the minimum binding energy of dimers from ab initio calculations and that calculated using optimized classical parameters should be similar for similar dimers (see Fig. 1). To evaluate whether this assumption holds, we compare the classical and ab initio gas phase potential energy scans for the CH 3 COO À Á Á ÁH 2 O and the HSO 4 À Á Á ÁH 2 O systems, and also for CH 3 SO 3 À Á Á ÁNa + and the CH 3 COO À Á Á ÁNa + systems. These are the only systems for which Lennard-Jones parameters were optimized based on experimental data and which contain.
For the anion-water case the scans are performed in the system configuration shown in Fig. 2a; these results are shown in Fig. 7. We find (see Table 2) that DE QM-cl (R min ) = À1.3 kcal mol À1 for HSO 4 À , and DE QM-cl (R min ) = À1.2 kcal mol À1 for CH 3 COO À ; the similarity in the DE AB,QM-cl (R min ) values for these two anions confirms that the ab initio approach proposed here holds for anionwater dimers. The scans also confirm that developing parameters by requiring that the classical potential energy curve overlaps with the ab initio is unsuitable. Parameters developed that way would greatly overestimate anion-water interactions.
For the anion-Na + systems the potential energy scans are performed in the configuration shown in Fig. 2b. We find, as shown in Table 5, that DE QM-cl (R min ) = +5.1 kcal mol À1 for CH 3 SO 3 À and DE QM-cl (R min ) = +5.3 kcal mol À1 for CH 3 COO À .
The two values are very similar, confirming that the approach holds also for anion-cation systems. Our parameters for the (CH 3 ) 2 PO 4 À Á Á ÁCH 3 NH 3 + interaction are very close to recently proposed parameters based that reproduce osmotic pressure measurements made on DNA arrays: we propose that s ON ¼ 1:04 Â s ON;GAFF , which corresponds to a +0.124 Å increase in the s ON value. This value is very close to the 0.14 Å increase proposed by Aksimentiev and Yoo. 22 As a further check of our ab initio approach, in Fig. 5b we also compare the anion-cation rdf for glycine obtained using our optimized parameters for CH 3 COO À Á Á ÁCH 3 NH 3 + interactions, with that obtained using parameters optimized from recent osmotic pressure measurements. 22 The similarity between our parameter values and those of Aksimentiev (see ref. 22, Table 2), and also of the radial distribution function further confirms the success of our approach. Finally, for a few of the salts for which the experimental solution activities are available, we calculate the densities of 0.5 m solutions and compare it against experimental data. The results, shown in Fig. 8, indicate that our parameters improve agreement with experiment, especially in the case of the divalent salts for which the densities obtained using the original GAFF parameters deviated substantially from the reference values.

Concluding remarks
We present an approach to develop classical Lennard-Jones parameters for ions, that can be used when experimental data for parameterization is sparse. The approach is based on MP2 level ab initio calculations but simultaneously integrates experimental data, to create a parameter set for multiple ions that is internally consistent and grounded on experiment. We chose the MP2 level of theory for the gas phase ab initio scans because MP2 reproduces relative binding energies in systems with hydrogen bonding or dispersive interactions well. 74 DFT methods that correct for the dispersion terms, such as DFT-D 75,76 and M06-2X, 77-79 however, might also yield reasonable results at a lower computational cost; this possibility remains to be tested.
Our tests show that our simple approach is very powerful, and is suitable to obtain both ion-water and anion-cation interaction parameters. The approach may be used to rapidly optimize parameters for one or multiple atom types in any species -mono-and polyatomic; mono-and polyvalent. When optimizing multiple atom types, however, it is desirable that multiple target experimental properties exist.
We use this parameterization approach to obtain a set of Lennard-Jones ion-water and anion-cation parameters for salts of various sulfate, sulfonate, phosphate and carboxylate ions and the cations Na + , NH 4 + and CH 3 NH 3 + , in TIP3P water. 29 The entire parameter set is compatible with the GAFF 13,30-32 force field and the sodium model of Joung and Cheatham. 15 GAFF 30 uses the same atom types as the AMBER 13 force field for proteins, so the parameters presented are also compatible with AMBER. The large range of anions tested and parameterized here and the compatibility with GAFF/AMBER should prove useful to investigate ion-specific effects in biology that arise from the presence of multivalent ions.   Our results show that the GAFF 13,30-32 Lennard-Jones parameters, which were developed based on data for neutral species, often yield incorrect ion-TIP3P water and anion-cation interactions; the results suggest that the same may occur for other similar, general, force fields. GAFF 13,30-32 divalent anions consistently interact far too favorably with TIP3P water; in contrast, ion-TIP3P water interactions for monovalent anions are typically well-represented in GAFF. 13,30-32 GAFF 13,30-32 parameters typically overestimate anion-cation attraction. Excessive anion-cation attraction is particularly marked for divalent anions, for which GAFF 13,30-32 parameters lead to complete aggregation of 0.5 m solutions within timescales of nanoseconds. GAFF 13,30-32 also markedly overestimates the attraction between acetate and the Na + ion of Joung and Cheatham, 15 as well as attraction between acetate and CH 3 NH 3 + ; these findings demonstrate that strong Na + -protein or salt bridges between cationic and anionic amino acids in proteins in simulations using AMBER 13 and the Na + ion from Joung and Cheatham 15 are unphysical. Lennard-Jones parameters developed for NH 4 + -anion interactions are often used for primary amines -and thus of the amino acid lysine. 22 Our results indicate, however, that that usage is flawed because the parameters governing the interaction of NH 4 + with the anions tested here are substantially different from those of CH 3 NH 3 + . Applying parameters derived for NH 4 +anion interactions to simulate CH 3 NH 3 + -anion systems will typically overestimate the attraction between the amine group and the anion. Parameters developed for oxygen atoms in sulfate and phosphate are often considered interchangeable. 22 We find this approximation holds for the divalent sulfate or phosphate ions we investigated here (SO 4 2À , CH 3 PO 4

2À
), but not for monovalent variants of those anions (CH 3 SO 4 À , CH 3 SO 3 À , H 2 PO 4 À , (CH 3 ) 2 PO 4 À ). These differences between mono and divalent ions likely result from the fact that the net anion-cation attraction has a larger contribution from the electrostatic component for the divalent cations.