An empirical model for solvation based on surface site interaction points

Surface site interaction points (SSIPs) provide a quantitative description of the non-covalent interactions a molecule makes with the environment based on specific intermolecular contacts, such as H-bonds. Summation of the free energy of interaction of each SSIP across the surface of a molecule allows calculation of solvation energies and partition coefficients. A rule-based approach to the assignment of SSIPs based on chemical structure has been developed, and a combination of experimental data on the formation of 1 : 1 H-bonded complexes in non-polar solvents and partition of solutes between different solvents was used to parameterise the method. The resulting model is simple to implement using just a spreadsheet and accurately describes the transfer of a wide range of different solutes from water to a wide range of different organic solvents (overall rmsd is 1.4 kJ mol−1 for 1713 data points). The hydrophobic effect as well as the properties of perfluorocarbon solvents are described well by the model, and new descriptors have been determined for a range of organic solvents that were not accessible by direct investigation of H-bond formation in non-polar solvents.

Footnote to table: * reported value converted from mole fraction standard state to molar standard state.

Section 3:

Substructure fragments
For each molecule, the SMILES string was analysed and a SMARTS-based substructure code from Table S3 was assigned to each heavy atom. Aromatic groups were assigned an additional code to describe an SSIP in the centre of the π-face of each aromatic six-membered ring.

Section 6). The constants in italic bold were optimised to minimise the rmsd between calculated and experimental free energies in the solvent/water partition models.

Octanol_water comparison
This sheet contains a list of 189 solutes for which the free energy of transfer (ΔG°) from water to wet octanol was available. These values are compared with values calculated using three different methods:
a) Abraham solvation equation [3], using solvent coefficients for octanol taken from reference [4]. Calculated logP values are in column I and are converted to ΔG° in column K. The rmsd between calculated and experimental values is 1.1 kJ mol−1.
b) cLogP calculated using the Advanced Algorithm Builder software [5]. Calculated logP values are in column N and are converted to ΔG° in column P. The rmsd between calculated and experimental values is 0.8 kJ mol−1.
c) ΔG° values calculated by our new method are in column T. The rmsd between calculated and experimental values is 1.6 kJ mol−1.
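The comparison described for this sheet involves two small calculations: converting a logP value to a free energy of transfer, and computing an rmsd against experiment. The following is a minimal sketch of both, not the spreadsheet's actual formulas. It assumes the usual relation ΔG° = −RT ln(10) logP with logP defined as log10 of the octanol/water concentration ratio, and a temperature of 298 K; the function names and the example numbers are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T_ASSUMED = 298.0      # temperature in K (an assumption for this sketch)


def logp_to_delta_g(log_p, temperature=T_ASSUMED):
    """Free energy of transfer from water to octanol in kJ/mol.

    Assumes log_p = log10([solute]octanol / [solute]water), so a positive
    log_p gives a negative (favourable) free energy of transfer.
    """
    return -R_KJ * temperature * math.log(10.0) * log_p


def rmsd(calculated, experimental):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(calculated) != len(experimental) or len(calculated) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical logP values, purely for illustration (not data from the sheet).
    calc_logp = [0.5, 1.2, 2.0]
    expt_logp = [0.6, 1.0, 2.3]
    calc_dg = [logp_to_delta_g(v) for v in calc_logp]
    expt_dg = [logp_to_delta_g(v) for v in expt_logp]
    print(f"rmsd = {rmsd(calc_dg, expt_dg):.2f} kJ/mol")
```

Because the conversion is linear in logP, an rmsd in logP units scales to kJ mol−1 by the same factor RT ln(10), which is about 5.7 kJ mol−1 per log unit at 298 K.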

Expt. gas to solvent
This sheet contains a list of the 219 solutes that were used as the training set for the model. For each solute, the name, SMILES string and experimental free energy of transfer (ΔG°) from gas to 35 different solvents are listed.

Sources of Data
Experimental values of gas to solvent transfer free energies were obtained from literature sources as described below. These values were used to obtain water to solvent transfer free energies.
where data was available for crystalline solutes dissolved in both the anhydrous solvent and water and where the solute gas-to-water partition coefficient, Kw, is known.
Further details can be found in publications by Abraham, Acree and co-workers [6][7][8][9][10][11][12][13]. Experimentally determined values of logK at 298 K were extracted from the appropriate references and converted into the free energy (ΔG°/kJ mol−1) for transfer from the gas phase to solvent by the usual formula, i.e. eqn. (S7.2).

To conduct the feasibility study, it was desirable to limit the size of the initial data set. Compounds were selected only if they had measured gas-to-solvent logK values available for water and several other solvents. An easily manageable set of 219 compounds, covering a variety of common functional groups, was chosen as an initial training set. Another set of 84 similar compounds was identified for which logPoctanol values and experimental gas-to-hexadecane and gas-to-water logK values were available. This set of 84 solutes was used for validation of the parameters derived by analysis of the training set. New descriptors will need to be
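The route from gas-to-solvent logK values to water-to-solvent transfer free energies can be illustrated with a short sketch. Eqn. (S7.2) is not reproduced in this excerpt, so the code assumes the standard relation ΔG° = −RT ln K (with K taken from log10 K) and obtains the water-to-solvent value by difference through the gas phase. The function names are hypothetical, and any standard-state corrections applied in the original work are not handled here.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def logk_to_delta_g(log_k, temperature=298.0):
    """Gas-to-solvent transfer free energy in kJ/mol from log10 K.

    Assumes delta_G = -R * T * ln(K); this stands in for eqn. (S7.2),
    which is not shown in the excerpt.
    """
    return -R_KJ * temperature * math.log(10.0) * log_k


def water_to_solvent_delta_g(log_k_solvent, log_k_water, temperature=298.0):
    """Water-to-solvent transfer free energy via a thermodynamic cycle.

    delta_G(water -> solvent) = delta_G(gas -> solvent) - delta_G(gas -> water)
    """
    return (logk_to_delta_g(log_k_solvent, temperature)
            - logk_to_delta_g(log_k_water, temperature))


if __name__ == "__main__":
    # Hypothetical logK values for one solute, for illustration only.
    print(water_to_solvent_delta_g(log_k_solvent=3.0, log_k_water=1.0))
```

A solute with a larger logK in the organic solvent than in water gives a negative water-to-solvent free energy, i.e. transfer out of water is favourable under this sign convention.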

Section 8: Correlation between Molecular Surface Area and the number of Surface Site Interaction Points
The van der Waals surface areas were determined using the 0.002 e bohr−3 isosurface calculated with NWChem (density functional theory, B3LYP/6-31G* basis set) [1]. The surface areas were calculated by summing the number of points on the isosurface, but scaling the contribution of each point by the local density of points within a radius of 0.5 Å.

Figure S8. Plot of number of SSIPs (x axis) vs. molecular surface area (Å²) (y axis)
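One way to read the density-scaled point sum is sketched below. The exact weighting used in the original work is not given, so this is an assumption: each point contributes the area of a disc of radius 0.5 Å divided by the number of points (itself included) found within that radius, so that densely sampled regions are not over-counted. The function names are hypothetical, and the sphere test data are synthetic rather than an NWChem isosurface.

```python
import math
import random


def density_weighted_area(points, radius=0.5):
    """Estimate the area (in Å^2) of a surface sampled by (x, y, z) points in Å.

    Assumed weighting: each point contributes pi * radius^2 / n_i, where n_i
    is the number of points within `radius` of point i, including itself.
    """
    r2 = radius * radius
    n = len(points)
    counts = [1] * n  # every point counts itself
    for i in range(n):
        xi, yi, zi = points[i]
        for j in range(i + 1, n):
            xj, yj, zj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= r2:
                counts[i] += 1
                counts[j] += 1
    patch = math.pi * r2
    return sum(patch / c for c in counts)


def random_sphere_points(n, sphere_radius, seed=0):
    """Synthetic test surface: n random points on a sphere (not real data)."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        pts.append((sphere_radius * x / norm,
                    sphere_radius * y / norm,
                    sphere_radius * z / norm))
    return pts


if __name__ == "__main__":
    pts = random_sphere_points(2000, 2.0)
    exact = 4.0 * math.pi * 2.0 ** 2
    print(f"estimated {density_weighted_area(pts):.1f} Å^2, exact {exact:.1f} Å^2")
```

The pairwise loop scales quadratically with the number of points, which is acceptable for a sketch; a spatial grid or k-d tree would be the natural replacement for large isosurfaces.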