In silico prediction of linear free energy relationship descriptors of neutral and ionic compounds

Chul-Woong Cho; Stefan Stolte; Yeoung-Sang Yun; Ingo Krossing; Jorg Thöming

doi:10.1039/C5RA13595H

DOI: 10.1039/C5RA13595H (Paper) RSC Adv., 2015, 5, 80634-80642

Chul-Woong Cho^ab, Stefan Stolte^ac, Yeoung-Sang Yun*^b, Ingo Krossing*^de and Jorg Thöming*^a
^aZentrum für Umweltforschung und nachhaltige Technologien (UFT), University of Bremen, Leobener Straße, 28359 Bremen, Germany. E-mail: thoeming@uni-bremen.de; Fax: +49 421 2188297; Tel: +49 421 21863300
^bSchool of Chemical Engineering, Chonbuk National University, Jeonju, Jeonbuk 561-756, Republic of Korea. E-mail: ysyun@jbnu.ac.kr; Fax: +82 63 2702306; Tel: +82 63 2702308
^cFaculty of Chemistry, University of Gdánsk, ul. Sobieskiego 18/19, 80-952, Gdánsk, Poland
^dFreiburger Materialforschungszentrum (FMF), University of Freiburg, Stefan-Meier-Str. 21, 79104 Freiburg, Germany. E-mail: krossing@uni-freiburg.de; Fax: +49 761 2036001; Tel: +49 761 2036122
^eInstitut für Anorganische und Analytische Chemie, University of Freiburg, Albertstraße 21, 79104 Freiburg, Germany

Received 11th July 2015, Accepted 15th September 2015

First published on 16th September 2015

Abstract

We present a prediction model for linear free energy relationship (LFER) descriptors – excess molar refraction (E), dipolarity/polarizability (S), hydrogen bonding acidity (A) and basicity (B), McGowan volume (V), and interaction of cations (J⁺) and anions (J⁻) – of both ionic and neutral compounds on the same scale. From computational calculations using density functional theory, a conductor screening model, and the OBPROP program in Turbomole, we obtained the following physicochemical sub-parameters for 992 molecules and atoms: polar surface area, molecular weight, volume, energy of van der Waals, sigma moments, molar refraction, and hydrogen-bond donor and acceptor abilities of a molecule or an atom. By making selective combinations of these sub-parameters, together with the number of rings, OH groups, and hydrogen atoms attached to nitrogen, we obtained prediction models for the LFER descriptors V, E, S, A, and B with reasonable accuracy, i.e. R² above 0.934 for all training sets. We validated the models by comparing calculated and experimentally determined LFER descriptors of a test set. Using the complete dataset, the following R² and SE values were obtained: E (R² = 0.949, SE = 0.136), S (R² = 0.940, SE = 0.378), A (R² = 0.936, SE = 0.148), B (R² = 0.973, SE = 0.160), J⁺ (R² = 0.816, SE = 0.351), and J⁻ (R² = 0.700, SE = 0.291). Furthermore, we demonstrated the applicability of the calculated LFER descriptors by predicting transfers of neutral and ionic compounds from water to propylene carbonate, sulfolane, and ethylene glycol with good accuracy. These results show that physicochemical properties of ionic and neutral compounds can be reliably predicted with identical LFER descriptors even for chemical entities that do not yet exist.

Introduction

A Linear Free Energy Relationship (LFER) or linear Gibbs energy relationship is a very useful concept for understanding and predicting biological, chemical, and physical activity mechanisms of chemicals. A typical LFER model is the Abraham solvation equation (eqn (1)) whose small number of descriptors provides readily comprehensible chemical potentials and leads to sufficiently accurate predictions.¹


LogSP = c + eE + sS + aA + bB + vV	(1)

where the dependent variable SP, abbreviated as solute property, refers to some partitioning-related properties of a series of solutes in a given system. The descriptors (capital letters) are as follows:

• E – excess molar refraction [(cm³ mol⁻¹)/10]; this represents a solute's polarizability and gives a measure of its ability to interact with a solvent through n- and π-electron pairs.

• S – dipolarity/polarizability (dimensionless); this gives a measure of the solute's ability to stabilize a charge or dipole.

• A and B – hydrogen bond acidity and basicity (dimensionless); A and B measure the extent of hydrogen bonding by the solute in a basic and an acidic solvent respectively.

• V – McGowan characteristic molar volume [(cm³ mol⁻¹)/100].

• System parameters are denoted by small letters (e, s, a, b, v).

For more detailed information on solute descriptors and eqn (1), we recommend the recently published review article by Poole et al. (2013).²

Because eqn (1) has limited applicability to ions, both Abraham et al.^3–9 and our group^10,11 extended it by adding ionic terms. Abraham et al.^3–9 added J⁺ and J⁻ terms for cations and anions respectively, giving eqn (2).


LogSP = c + eE + sS + aA + bB + vV + j⁺J⁺ + j⁻J⁻	(2)

where j⁺ and j⁻ are the system parameters for cations and anions respectively. For the application of eqn (2), Abraham et al. have reported system parameters (e, s, a, b, v, j⁺, j⁻) for a number of partitioning systems^3–9 and environmental and biological behaviors,^12,13 together with the LFER descriptors of 4000–5000 neutral compounds and around 300 ions and ionic species.^4–9
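As an illustration of how eqn (1) and (2) are evaluated, the following minimal Python sketch computes log SP from a set of solute descriptors and system parameters. The class and function names are hypothetical, and the numbers in the usage lines are placeholders rather than values from this work. Setting J⁺ = J⁻ = 0 for a neutral solute reduces eqn (2) to eqn (1).

```python
from dataclasses import dataclass


@dataclass
class SoluteDescriptors:
    """LFER descriptors of one solute (capital letters in eqn (2))."""
    E: float
    S: float
    A: float
    B: float
    V: float
    J_plus: float = 0.0   # zero for neutral compounds and anions
    J_minus: float = 0.0  # zero for neutral compounds and cations


@dataclass
class SystemParameters:
    """System parameters of one partitioning system (small letters in eqn (2))."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float
    j_plus: float = 0.0
    j_minus: float = 0.0


def log_sp(d: SoluteDescriptors, p: SystemParameters) -> float:
    """Evaluate eqn (2); with J+ = J- = 0 this is identical to eqn (1)."""
    return (p.c + p.e * d.E + p.s * d.S + p.a * d.A + p.b * d.B + p.v * d.V
            + p.j_plus * d.J_plus + p.j_minus * d.J_minus)


# Placeholder values, chosen only to make the arithmetic easy to follow.
solute = SoluteDescriptors(E=0.5, S=0.5, A=0.0, B=0.5, V=1.0)
system = SystemParameters(c=1.0, e=1.0, s=1.0, a=1.0, b=1.0, v=1.0)
print(log_sp(solute, system))
```

Keeping descriptors and system parameters in separate records mirrors the separation in the LFER approach: descriptors belong to the solute, parameters to the system.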

However, determining LFER descriptors according to Abraham's approach requires at least four orthogonal systems for the four missing descriptors, i.e. S, A, B, and J⁺ or J⁻. Thus, to simplify the treatment of the charge effect and to reduce the number of systems required to determine the missing terms, we added Z terms to address the ionic interactions of charged compounds in our previous studies:^10,11


LogSP = c + eE + sS^U + aA^U + bB^U + vV + z^cZ^c + z^aZ^a	(3)

where the ionic terms (i.e. Z^c, Z^a) can be set at +1 for a monovalent cation and −1 for a monovalent anion. Based on eqn (3), we experimentally determined the LFER descriptors of 30 monovalent cations¹⁰ and 18 monovalent anions using high performance liquid chromatography (HPLC).¹¹ We then used them to predict environmental properties, i.e. the octanol–water partition coefficient,¹¹ water solubility,¹¹ and hydrophobicity.¹⁰

However, these experimental determinations of the LFER descriptors are time-consuming and labor-intensive, because elaborate analytical techniques are required. Moreover, the molecules of interest must exist physically, so the approach is not suitable for designing chemical structures, e.g. drugs. To overcome these deficiencies, several researchers have used computational approaches, and several prediction methods have been reported.^14–20 But since these were established using only neutral compounds, it was not clear whether the models could be used to calculate the LFER descriptors of ions on the same scale as those of neutral compounds. To check this point, we compared Abraham descriptors with those calculated by the method of Zissimos et al.,²⁰ taken as a typical example, using 707 neutral compounds and 285 ionic compounds as listed in the ESI.† Note that in this study we treat solely the Abraham descriptors of eqn (2), while the UFT descriptors of eqn (3) will be investigated in the future. The method of Zissimos et al.²⁰ uses the sigma moments (σ₂ and σ₃) and the hydrogen bonding donor (HBD₃) and hydrogen bonding acceptor (HBA₃) moments calculated from density functional theory (DFT) and the conductor screening model (COSMO). The equations for S, A, and B are as follows:²⁰


	(4)


	(5)


	(6)

where R² is the coefficient of determination, SE is the standard error, N is the number of compounds, and F is the F-statistic.
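The statistics R², SE and F quoted throughout this work can be sketched as follows. This is a minimal illustration assuming the usual definitions for an ordinary least-squares fit with an intercept (SE as the residual standard error with N − p − 1 degrees of freedom, F derived from R²); the paper does not state the formulas explicitly, and the function name is hypothetical.

```python
import numpy as np


def regression_stats(y_obs, y_calc, n_predictors):
    """Return (R2, SE, F) for observed vs. calculated values.

    Assumes y_calc comes from a least-squares fit with an intercept and
    n_predictors explanatory terms (not counting the intercept).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    dof = y_obs.size - n_predictors - 1           # residual degrees of freedom
    ss_res = np.sum((y_obs - y_calc) ** 2)        # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    se = np.sqrt(ss_res / dof)
    f_stat = (r2 / n_predictors) / ((1.0 - r2) / dof)
    return r2, se, f_stat


# Toy data, not taken from the paper.
r2, se, f_stat = regression_stats([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], 1)
print(r2, se, f_stat)
```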

When the Abraham descriptors were correlated with those calculated according to eqn (4)–(6), a clear problem emerged, as shown in Fig. 1 for S and Fig. S1† for A and B. The values calculated by the method of Zissimos et al.²⁰ fit those for neutral compounds very well, but they do not fit those for ions. Although Abraham et al.^4–6 established some calculation models for a series of carboxylic acids,⁴ phenoxides,⁶ protonated pyridines⁵ and amines,⁴ these models cannot be applied to compounds whose neutral-form LFER descriptors (or, in some cases for phenoxides and protonated pyridines, pK_a values) are unknown. Accordingly, we aimed to develop calculation models for LFER descriptors that integrate both neutral and ionic compounds on the same scale.


	Fig. 1 Correlation between experimentally determined and calculated S values according to eqn (4).

Results and discussion

McGowan volume (V)

The McGowan volume is a very important factor in the prediction of the physicochemical properties of chemical compounds.²¹ In general, V represents the lipophilicity of compounds; in most cases, therefore, an increase in volume reduces water solubility but increases solubility in organic solvents. Since the molecular volume term makes a significant contribution to the prediction of solute^10,11 and solvent²¹ properties, miscalculation of V leads to serious prediction errors. Fortunately, the V term for neutral compounds is easily calculated from the kinds and numbers of atoms and the number of ring structures.²² For an ion it can be calculated by adding 0.0215 (for an anion) to, or subtracting 0.0215 (for a cation) from, the volume of its neutral form, as Abraham et al. demonstrated.^3–9 Nevertheless, we found a more convenient way of calculating the V values: we derived the linear eqn (7) by correlating the McGowan volume values, which range from 0.110 to 3.81, with the COSMO volume obtained by COSMO-RS (V_C).


	(7)

where R² is the coefficient of determination, R_adj² is the adjusted R², SE is the standard error, N is the number of chemicals used and F is the Fisher statistic. Not surprisingly, V_C is linearly correlated with V, with a low standard error (SE) of 0.029. The values calculated according to eqn (7), along with Abraham's McGowan volumes, are given in ESI 2.† The fit is shown in Fig. 2. Here we did not divide the whole dataset into a training set and a test set because the fit over the whole dataset was already good.
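For comparison, the conventional ionic correction of Abraham et al. mentioned above can be written as a small helper. This is only a sketch of the ±0.0215 rule as described in the text; the function name is hypothetical, and how multiply charged ions should be handled is not stated here, so the sketch uses only the sign of the charge.

```python
ION_VOLUME_SHIFT = 0.0215  # in (cm^3 mol^-1)/100, value quoted in the text


def ion_mcgowan_volume(v_neutral: float, charge: int) -> float:
    """McGowan volume of an ion from the volume of its neutral form.

    Adds the shift for an anion and subtracts it for a cation. Only the sign
    of the charge is used (an assumption for multiply charged ions).
    """
    if charge < 0:
        return v_neutral + ION_VOLUME_SHIFT
    if charge > 0:
        return v_neutral - ION_VOLUME_SHIFT
    return v_neutral


print(ion_mcgowan_volume(1.0, -1), ion_mcgowan_volume(1.0, +1))
```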


	Fig. 2 Correlations between the Abraham V and calculated V_C in the unit of [(cm³ mol⁻¹)/100].

Excess molar refraction (E)

E is a straightforward descriptor that reflects dispersion interactions due to electron lone pairs.¹ It can be calculated from the solute molar refraction (MR; [cm³ mol⁻¹]/10) minus the smaller MR of a hypothetical n-alkane with an identical volume, which is given by (2.83195V − 0.52553):¹


E [cm³ mol⁻¹/10] = MR − (2.83195V − 0.52553)	(8)

here MR is defined as eqn (9).


MR = 10[(η² − 1)/(η² + 2)]V	(9)

where η (dimensionless) is the refractive index of the substance as a pure liquid at 20 °C. In general, E can be calculated using ChemSketch (ACD Labs, Toronto, Canada), Absolv (ACD Labs, Toronto, Canada), etc. However, as Poole et al. (2013) stated,² these programs rarely produce the same E values for the same compound, and their applicability to ions is limited. As an alternative, we used the OBPROP program,²³ which is freely available software. Since OBPROP can calculate the MR values, we could skip the calculation step based on eqn (9) using η and use eqn (8) directly. First we correlated the MR from OBPROP (MR_OBP) with Abraham's MR, derived by back-calculation from Abraham's E using eqn (8). The result showed that they are linearly correlated with R² of 0.953 [Fig. S2 and eqn (S1)†]. To check the applicability of the MR calculated by OBPROP, it was scaled by eqn (S1)†, used to calculate E values by eqn (8), and the results were compared with Abraham's E values. The comparison showed that the calculated E values were scattered in the range of low E, as shown in Fig. S3,† with a low R² of 0.665. This is due to the combined effect of the calculation errors in MR and V. Therefore, to enhance the calculation accuracy, we tried to predict the E values directly. Here we randomly divided the whole dataset into a training set and a test set (491 vs. 501). By trial and error, we identified reasonably related terms, i.e. the number of rings (N_Ring), polar surface area (PSA), molar refraction (MR), scaled COSMO-RS volume (V_C), van der Waals energy (E_vdW), molecular weight (MW), and sigma moments (σ₁ and σ₄). The system coefficients of eqn (10) were then determined by multiple linear regression using the selected sub-descriptors. The E values calculated by eqn (10) (prediction coefficients are given as eqn (11) in Table 1) correlated well with the observed E of the training set, with R² = 0.950 and SE = 0.130. By checking the p-value of each term (acceptable if p < 0.05), we were able to ensure that the selected sub-descriptors were reasonably chosen, because their p-values were all reported as 0.000.
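The conventional route to E via eqn (8) amounts to a one-line calculation, sketched below. The function names are hypothetical; MR is expected in (cm³ mol⁻¹)/10 and V in (cm³ mol⁻¹)/100, as defined in the text, and the input values in the usage line are placeholders.

```python
def alkane_molar_refraction(v: float) -> float:
    """MR of the hypothetical n-alkane with the same McGowan volume V (eqn (8))."""
    return 2.83195 * v - 0.52553


def excess_molar_refraction(mr: float, v: float) -> float:
    """E = MR - (2.83195 V - 0.52553), eqn (8)."""
    return mr - alkane_molar_refraction(v)


# Placeholder inputs, not a real compound.
print(excess_molar_refraction(mr=2.0, v=0.5))
```

Because E is a difference of two quantities of similar size, errors in MR and V both propagate into it, which is consistent with the scatter reported for this route.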


E [cm³ mol⁻¹/10] = a^EN_Ring + b^EPSA + c^EMR + d^EV_C/100 + e^EE_vdW + f^EMW + g^Eσ₁ + h^Eσ₄ + i^E	(10)
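The fitting procedure for eqn (10), a random split into training and test sets followed by multiple linear regression, can be sketched with NumPy as below. The data are synthetic stand-ins for the eight sub-descriptors and the observed E values, and all names are hypothetical; the authors used SPSS, so this only illustrates the general technique.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 "compounds" with 8 sub-descriptors each.
n_compounds, n_terms = 200, 8
X = rng.normal(size=(n_compounds, n_terms))
true_coef = np.linspace(-1.0, 1.0, n_terms)
y = X @ true_coef + 0.3 + rng.normal(scale=0.1, size=n_compounds)

# Random split into a training half and a test half.
order = rng.permutation(n_compounds)
train, test = order[:100], order[100:]


def fit_ols(X, y):
    """Least-squares coefficients; the last entry is the constant term."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    return X @ coef[:-1] + coef[-1]


def r_squared(y_obs, y_calc):
    ss_res = np.sum((y_obs - y_calc) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


coef = fit_ols(X[train], y[train])                       # fit on the training set
r2_train = r_squared(y[train], predict(coef, X[train]))
r2_test = r_squared(y[test], predict(coef, X[test]))     # validate on the test set
print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
```

Comparable R² values for the training and test sets indicate that the model is not simply fitted to the peculiarities of the training compounds.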

Table 1 The coefficients and standard error in parenthesis of prediction model eqn (10) for E resulting in eqn (11) and (12)

Eqn	Dataset	a^E	b^E	c^E	d^E	e^E	f^E	g^E	h^E	i^E	R²	SE	N	F
(11)	Training set	0.355 (0.016)	0.007 (0.00)	0.054 (0.003)	−1.736 (0.072)	−0.127 (0.020)	0.255 (0.031)	0.142 (0.014)	0.018 (0.003)	−0.041 (0.034)	0.950	0.130	491	1146.5
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.221
(12)	Total set	0.341 (0.012)	0.007 (0.000)	0.057 (0.002)	−1.762 (0.050)	−0.113 (0.014)	0.275 (0.022)	0.135 (0.009)	0.015 (0.002)	−0.037 (0.024)	0.949	0.136	992	2274.8
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.127

In the next step, eqn (11) was validated by predicting E for the test set of 501 chemicals. Eqn (11) predicted E of the test set with R² = 0.949 and SE = 0.136; the coefficient of determination is as good as that of the training set. For a more comprehensive model, we re-estimated all coefficients of eqn (10) using all data points, i.e. both the training set and the test set. The resulting correlation has R² = 0.949 and SE = 0.136 [eqn (12) in Table 1]. As shown in Table 1, the coefficients, R² and standard error change only slightly. Fig. 3 shows the fit for cations, anions and neutral compounds.
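Applying eqn (12) then reduces to a weighted sum of the sub-descriptors. The sketch below uses the total-set coefficients from Table 1; the dictionary keys and function name are hypothetical, and the inputs must be in the same units as the sub-parameters tabulated in the ESI, which are not restated in this section.

```python
# Total-set coefficients of eqn (12), taken from Table 1.
EQN12_COEFFICIENTS = {
    "N_Ring": 0.341,
    "PSA": 0.007,
    "MR": 0.057,
    "V_C_over_100": -1.762,  # scaled COSMO volume, already divided by 100
    "E_vdW": -0.113,
    "MW": 0.275,
    "sigma1": 0.135,
    "sigma4": 0.015,
}
EQN12_CONSTANT = -0.037


def predict_E(sub_descriptors: dict) -> float:
    """Evaluate eqn (10) with the eqn (12) coefficients.

    Raises KeyError if any of the eight sub-descriptors is missing.
    """
    return EQN12_CONSTANT + sum(
        coef * sub_descriptors[name] for name, coef in EQN12_COEFFICIENTS.items()
    )


# Placeholder input: every sub-descriptor zero except one ring.
example = dict.fromkeys(EQN12_COEFFICIENTS, 0.0)
example["N_Ring"] = 1.0
print(predict_E(example))
```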


	Fig. 3 Correlation between observed E values and calculated E values by eqn (12).

Hydrogen bonding acidity (A; dimensionless)

A is a factor of great importance in biological and environmental systems. To predict it, we used a wide range of A values from 0 to 3.34. By performing multiple linear regressions with the calculated sub-descriptors, we identified the relevant terms: the hydrogen bonding donor moments (HBD_i), their squared forms, the number of hydroxyl groups, and the number of hydrogen atoms attached to nitrogen in a molecule. The hydrogen bonding donor moments contribute most to descriptor A, and their squared forms (HBD_i²) agree better with A than the original HBD_i. Including the number of OH (hydroxyl) groups (N_OH) slightly enhanced the accuracy, which is reasonable because molecules with an OH group have high hydrogen-bond donor potential. Adding the number of hydrogen atoms attached to nitrogen (N_HN) enhanced the accuracy slightly further; Abraham and Acree showed that N_HN can be used to predict A values of protonated amines.⁴

As a result, the combination of sub-parameters in eqn (13) could be used to predict descriptor A of the training set with R² = 0.934 and SE = 0.147. The determined coefficients are given as eqn (14) in Table 2. To check whether the sub-parameters were reasonably chosen, the adjusted R² and the p-values of the sub-parameters were examined. The adjusted R² value is 0.933, similar to the R² of 0.934, and the p-values of all terms were reported as 0.000, indicating that the selection of sub-parameters was rational for the model.
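The comparison between R² and adjusted R² can be reproduced with the standard formula, sketched below. The number of predictors is an assumption: Table 2 lists ten coefficients, of which one may be the constant, so both 9 and 10 predictors are tried; N = 486 is the training-set size from Table 2.

```python
def adjusted_r2(r2: float, n: int, n_predictors: int) -> float:
    """Adjusted R2 for n data points and n_predictors terms (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Training-set values for descriptor A (R2 = 0.934, N = 486).
for p in (9, 10):
    print(p, round(adjusted_r2(0.934, 486, p), 3))
```

Either choice rounds to 0.933, in line with the value quoted above. With several hundred compounds and about ten terms, the penalty for the number of terms is small, which is why the two statistics nearly coincide.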


	(13)

where i and j stand for 1, 3, 4 and 1, 2, 3 respectively, and a_i^A, b_j^A, c^A, d^A, e^A, and g^A are regression coefficients (given in Table 2 along with p-values). We then validated eqn (14) by predicting the A values of the test set. The A values calculated by eqn (14) agree very well with the observed A values of the test set: R² and SE are 0.943 and 0.138 respectively, which are comparable to those obtained using the training set. Re-performing the multiple linear regression of eqn (13) with all compounds gave R² = 0.936 and SE = 0.148; the coefficients are given as eqn (15) in Table 2. Here we identified 18 outliers (1.8%), which are listed in Table S1.† A detailed explanation of the outliers is given in the ESI.† The fit is shown in Fig. 4.

Table 2 Coefficients (standard error in parentheses) and p-values of the model for predicting hydrogen bonding acidity (A) based on eqn (13), using the training set and the whole dataset, resulting in eqn (14) and (15)

Eqn	Dataset	a_i^A			b_j^A			c^A	d^A	e^A	g^A	R²	SE	N	F
		HBD₁²	HBD₃²	HBD₄²	HBD₁	HBD₂	HBD₃
(14)	Training set	217.5 (18.2)	−0.061 (0.005)	0.045 (0.005)	77.60 (5.742)	0.679 (0.048)	0.217 (0.038)	0.265 (0.022)	0.070 (0.016)	−0.022 (0.003)	−0.132 (0.015)	0.934	0.147	486	748.4
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
(15)	Total dataset	171.0 (12.2)	−0.047 (0.003)	0.032 (0.003)	73.511 (4.318)	0.654 (0.037)	0.208 (0.027)	0.203 (0.016)	0.080 (0.011)	−0.019 (0.002)	−0.132 (0.015)	0.936	0.148	976	1626.2
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000


	Fig. 4 Correlation between observed A values and calculated A values by eqn (15).

Hydrogen bonding basicity (B; dimensionless)

B is an important factor representing hydrophilic properties; it therefore has a significant impact on water–solvent partitioning systems. The range of B values used for modelling was from 0 to 5.88. This prediction study was carried out in the same way as for E and A. During the modelling, we found that a combination of the second to sixth sigma moments (σ_2–6) and the second and third hydrogen bonding acceptor moments (HBA₂ and HBA₃) correlated well with the observed B. Here, the HBA₂/V_C and HBA₃/V_C terms agreed better with B than the original HBA terms. Moreover, by adding the squares of the first and second sigma moments, i.e. σ₁² and σ₂², we were able to improve the prediction accuracy further. The equation including the selected sub-parameters is as follows:


B [dimensionless] = Σ_μ a_μ^Bσ_μ + b^BHBA₂/V_C + c^BHBA₃/V_C + d^Bσ₁² + e^Bσ₂² + f^B	(16)

where μ stands for 2–6, and a_μ^B, b^B, c^B, d^B, e^B and f^B are regression coefficients. This combination led to a good correlation for the training set, i.e. R² = 0.975 and SE = 0.162. We checked that the 9 descriptors in total were properly used, as reflected by p < 0.001 and R_adj² = 0.978. The system parameters and the p-value of each term are given in Table 3. Eqn (17) was then validated using the test set. The correlation of the calculated B values with Abraham's B values for the test set gave R² = 0.971 and SE = 0.157, which are comparable to the results obtained with the training set. When the whole dataset was used for modelling B based on eqn (16), the determined coefficients [eqn (18)] changed slightly and the agreement remained good, with R² = 0.973 and SE = 0.160. The coefficients and the fit are shown in Table 3 and Fig. 5 respectively.

Table 3 Coefficients (standard error) of system parameters and probabilities (p-value) and t-statistics of parameters for predicting hydrogen bonding basicity (B) resulting in eqn (17) and (18)

Eqn	Dataset	a_μ^B					b^B	c^B	d^B	e^B	f^B	R²	SE	N	F
		σ₂	σ₃	σ₄	σ₅	σ₆
(17)	Training set	0.243 (0.062)	1.019 (0.043)	0.512 (0.045)	−0.121 (0.010)	−0.073 (0.007)	0.119 (0.036)	−0.099 (0.014)	−0.074 (0.009)	0.007 (0.012)	0.060 (0.020)	0.975	0.163	491	2046
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.020
(18)	Total data	0.391 (0.044)	1.000 (0.029)	0.421 (0.030)	−0.117 (0.007)	−0.055 (0.005)	0.112 (0.023)	−0.149 (0.011)	−0.070 (0.007)	0.074 (0.009)	0.032 (0.014)	0.973	0.160	985	3851
	p-value	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.024


	Fig. 5 Correlations between observed and calculated B of cation, anion and neutral compounds according to eqn (18) in Table 3.

Cationic interaction term (J⁺; dimensionless)

J⁺ represents the cationic interaction with the system; for the 111 cations used, its values range from 0.076 to 2.49. For this descriptor we did not divide the data into a training set and a test set because the dataset is too small. Following the same steps as above, we found that 7 sub-parameters, i.e. E_vdW, σ₃, σ₄, σ₆²/100, HBD₁, HBD₂, and HBA₂/V_C, are related to the J⁺ term. The estimated p-values of all sub-descriptors except the constant are lower than 0.05. This combination allowed us to predict J⁺ values with R² = 0.816 and SE = 0.351.


	(19)

The fitting between observed J⁺ and the calculated J⁺ according to eqn (19) is shown in Fig. 6.


	Fig. 6 Correlations between observed and calculated J⁺ by eqn (19).

Anionic interaction term (J⁻; dimensionless)

J⁻ represents the anionic interaction with the system. For predicting J⁻, 168 anions with values ranging from 0.625 to 4.011 were used. As for the J⁺ term, the data were not divided into a training set and a test set. The prediction model for the J⁻ term, eqn (20), uses 9 sub-parameters, i.e. σ₂, σ₃, σ₂², σ₃², σ₄², HBA₂, HBA₃, N_OH and N_Ring, and gives R² = 0.710 and SE = 0.291. The R² value is rather low, while the SE value is reasonably low. This may be because the observed J⁻ data points are mainly concentrated between 1.5 and 3. The fit between Abraham's and the calculated J⁻ values is shown in Fig. 7.


	(20)


	Fig. 7 Correlations between observed and calculated J⁻ by eqn (20).

Dipolarity/polarisability (S; dimensionless)

The descriptor S has a wide range of values from 0 to 7.75. As above, we constructed a prediction model, eqn (21), employing 8 terms, including some calculated descriptors, that are strongly related to descriptor S. These are σ₁, σ₂², σ₄², HBA₄, Calc. E, Calc. B, Calc. J⁺, and (σ₁ × Calc. J⁻)/Calc. B. In modelling S, we observed that the S values of anions are much larger in magnitude than those of neutral and cationic compounds, and that the S values of ions are mainly related to some of the calculated descriptors, i.e. E, B, J⁺ and (σ₁ × Calc. J⁻)/Calc. B, as shown in eqn (21).


S [dimensionless] = a^Sσ₁ + b^Sσ₂² + c^Sσ₄² + d^SHBA₄ + e^SCalc. E + f^SCalc. B + g^SCalc. J⁺ + h^S(σ₁ × Calc. J⁻)/Calc. B + i^S	(21)

In particular, (σ₁ × Calc. J⁻)/Calc. B makes the largest contribution to the S descriptor. In the prediction study using the training set based on eqn (21), we checked that all terms except the constant have low p-values (reported as 0.000) and that R_adj² is nearly the same as R². The R² and SE are 0.944 and 0.357 respectively, and the coefficients are given as eqn (22) in Table 4. We then validated eqn (22) using the test set. Eqn (22) predicts the S values of the test set with R² = 0.939 and SE = 0.375, which are comparable to those of the training set. Fig. 8 shows the correlation between observed and calculated S values. Additionally, for S calculated by eqn (23), derived from the whole dataset, the R² and SE are 0.940 and 0.378 respectively (see Table 4).

Table 4 Coefficients (standard error in parentheses) and p-values of the parameters of eqn (21) for predicting dipolarity/polarizability (S), resulting in eqn (22) and (23)

Eqn	Dataset	a^S	b^S	c^S	d^S	e^S	f^S	g^S	h^S	i^S	R²	SE	N	F
(22)	Training set	−1.435 (0.119)	0.220 (0.027)	−0.008 (0.001)	−0.126 (0.010)	0.526 (0.032)	1.534 (0.076)	0.844 (0.077)	3.267 (0.141)	−0.110 (0.037)	0.944	0.357	489	1014.5
	p-values	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.003
(23)	Total data	−1.441 (0.089)	0.206 (0.023)	−0.009 (0.001)	−0.122 (0.007)	0.511 (0.024)	1.524 (0.060)	0.856 (0.057)	3.308 (0.103)	−0.099 (0.028)	0.940	0.378	981	1900.7
	p-values	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.003


	Fig. 8 Correlations between Abraham's S and calculated S of all dataset by eqn (23).

Further validation of applications of calculated descriptors by predicting solute properties

To check whether the calculated descriptors can be used to predict solute properties, we used them to predict the transfer of chemicals (log P) from water to propylene carbonate, sulfolane and ethylene glycol. The system parameters constructed by Abraham and Acree^5,8 were not used for the prediction; instead, we newly estimated the system parameters using the calculated LFER descriptors for better prediction.

We performed multiple linear regressions with the LFER descriptors calculated by eqn (7), (12), (15), (18), (19), (20), and (23), i.e. the models established using the total dataset, since their model parameters were derived from more chemical structures. As a result, we found that they have a high potential to predict the transfer of compounds from water to propylene carbonate (the dataset for 67 neutral and 27 ionic compounds was taken from ref. 5) with R² = 0.939 and SE = 0.606 log units as shown in eqn (24). Here the E, S and A terms are not important and can be excluded, as in eqn (24) of Table 5. For that reason, R² slightly decreases to 0.936. The calculated and observed log P values for water–propylene carbonate are given in Table S2† and the fit is shown in Fig. 9. As seen in Table 5, the p-values confirm that the calculated LFER descriptors were reasonably used.

Table 5 System coefficients for predicting the transfers of neutral and ionic compounds from water to propylene carbonate, sulfolane and ethylene glycol using the calculated LFER descriptors resulting in eqn (24)–(26)

Eqn	System	e	s	a	b	v	j⁺	j⁻	c	R²	SE	N	F
(24)	W–P^a	—	—	—	−3.979	3.531	−2.911	0.655	0.133	0.939	0.606	94	340.7
	p-values	—	—	—	0.000	0.000	0.000	0.000	0.406
	t-statistics	—	—	—	−18.60	23.98	−16.04	3.85	0.835
(25)	W–S^b	0.645	0.160	—	−4.744	3.499	−3.518	—	0.074	0.992	0.462	114	2746.5
	p-values	0.001	0.006	—	0.000	0.000	0.000	—	0.573
	t-statistics	3.379	2.785	—	−71.16	20.75	−9.967	—	0.565
(26)	W–E^c	1.177	1.826	1.819	2.374	1.988	−7.676	−7.203	−1.210	0.903	1.012	97	117.81
	p-values	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	t-statistics	3.878	6.305	4.373	5.607	5.658	−12.80	−9.10	−4.205
^a Water–propylene carbonate. ^b Water–sulfolane. ^c Water–ethylene glycol.
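As a worked illustration of Table 5, the water to propylene carbonate model, eqn (24), can be evaluated as follows. The coefficients are those listed in the table; the function name is hypothetical and the descriptor values in the usage line are placeholders, not data from Table S2.

```python
def log_p_water_to_propylene_carbonate(B: float, V: float,
                                       J_plus: float = 0.0,
                                       J_minus: float = 0.0) -> float:
    """Eqn (24): log P = 0.133 - 3.979 B + 3.531 V - 2.911 J+ + 0.655 J-.

    The E, S and A terms were excluded from this model. For neutral
    compounds J+ and J- are zero.
    """
    return 0.133 - 3.979 * B + 3.531 * V - 2.911 * J_plus + 0.655 * J_minus


# Placeholder descriptors for a hypothetical neutral solute.
print(log_p_water_to_propylene_carbonate(B=0.5, V=1.0))
```

The signs of the coefficients show the trend encoded by the model: larger solutes (high V) favor propylene carbonate, while hydrogen-bond basic solutes (high B) and cations (high J⁺) favor water.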


	Fig. 9 Correlations between observed and calculated transfer values of compounds from water to propylene carbonate by eqn (24).

The calculated descriptors were also used to predict the transfer of 75 neutral and 39 ionic compounds from water to sulfolane. The values in the dataset⁸ range widely, from −10.81 to 6.23 log units. The results show very good agreement between experimental and calculated values: R² = 0.992 and SE = 0.465 log units. However, the calculated A and J⁻ terms, with p-values higher than 0.05, are not statistically significant, so they were excluded, as provided in eqn (25) of Table 5; the SE value therefore decreases slightly to 0.462. The fit is shown in Fig. 10.


	Fig. 10 Correlations between observed and calculated transfer values of compounds from water to sulfolane by eqn (25).

Following the same steps as above, the system parameters for predicting the transfer values from water to ethylene glycol were determined using the calculated LFER descriptors of 97 chemicals (72 neutral compounds and 25 ionic compounds). The dataset was taken from ref. 5. In this prediction, 7 calculated LFER descriptors are involved in the model [eqn (26) of Table 5], whose p-values are all 0.000, indicating that they were reasonably used for modelling. Although the SE of the prediction model was rather high (SE = 1.012), we observed a good prediction trend with R² = 0.901 (Fig. 11).


	Fig. 11 Correlations between observed and calculated transfer values of compounds from water to ethylene glycol by eqn (26).

Following a reviewer's suggestion, we compared the predictive ability of the calculated LFER descriptors with that of the partition-coefficient prediction module in COSMOtherm.³¹ Since, of the three systems, only water–propylene carbonate can exist as a real binary phase system, only the transfer values for that system were studied. COSMOtherm³¹ predicted the transfer of compounds from water to propylene carbonate with a correlation coefficient of 0.78 (Fig. S4†), indicating that the LFER descriptors calculated using sub-parameters from COSMOtherm³¹ are more useful for predicting the transfer of compounds from water to propylene carbonate. The COSMOtherm-predicted log P values are given in Table S3.†

Conclusions

We presented prediction models for seven LFER descriptors of ions and neutral compounds. The models are composed of sub-parameters, i.e. molar refraction, polar surface area, energy of van der Waals, sigma moments, and hydrogen-bonding acceptor and donor moments, which were computed by density functional theory, the conductor screening model, and the OBPROP program, together with some chemical structural information, i.e. the number of rings, hydroxyl groups and hydrogen atoms attached to nitrogen. The models were set up with a training set and validated with a test set, and then further verified by predicting transfer values of neutral and ionic compounds from water to propylene carbonate, sulfolane, and ethylene glycol. The accuracy for the complete dataset in terms of R² and SE was as follows: E (R² = 0.949, SE = 0.136), S (R² = 0.940, SE = 0.378), A (R² = 0.936, SE = 0.148), B (R² = 0.973, SE = 0.160), J⁺ (R² = 0.816, SE = 0.351), and J⁻ (R² = 0.700, SE = 0.291). Although the LFER descriptors from the established models are inevitably less accurate than experimentally determined values, these prediction models will help researchers to study the physicochemical behavior of chemicals, including molecules that so far exist only in theory. They will be helpful for understanding quantitatively, and on the same scale, the molecular interactions of ions, neutral compounds, or both in specific environments.

Experimental section

Computational details and calculated sub-parameters

For structural optimization of the molecules, i.e. the neutral compounds and ions, DFT calculations²⁴ were performed in combination with COSMO (COnductor-like Screening MOdel)²⁵ in the TURBOMOLE program package (version 5.10).²⁶ The resolution of identity (RI) approximation²⁷ was applied to obtain the first reasonable starting structure of a single molecule at the BP86/TZVP level. In addition, AOFORCE^28,29 was used to calculate the vibrational frequencies of each molecule. These structures were refined using the TZVP basis set,³⁰ and were fully optimized using the COSMO calculation (ε_r = ∞). The fully optimized molecules were then translated to energy or ccf files and passed to COSMO-RS (COSMO-Real Solvents).²⁵ COSMO-RS then calculated the sub-parameters of the optimized structures using the BP_TZVP_C21_0108 parameterization. The calculated sub-parameters are given in ESI 2† (Excel file).

The sub-parameters are polar surface area (PSA), COSMO volume (V_C), energy of van der Waals (E_vdW), molar refraction (MR), sigma moments (σ_1–6), and hydrogen-bond donor (HBD_1–4) and hydrogen-bond acceptor (HBA_1–4) abilities. The sigma moments³¹ describe the polarization charge density on the surface of the molecule; σ₁: negative of the total charge, σ₂: overall electrostatic polarity of a solute, σ₃: asymmetry of the sigma profile, σ_4–6: no simple physical interpretation; σ₅ has an approximately linear relationship with σ₃, while σ₄ and σ₆ have asymmetry of the σ₃. The hydrogen bond moments are quantitative measures of the acceptor and donor abilities of the molecule, and they can be defined in the same way as the sigma moments.³¹
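To make the meaning of the sigma moments more concrete, the sketch below computes moments of a discretized sigma profile as the sum of p(σ)·σ^i over the bins. This definition follows the common COSMO-RS convention and is an assumption here, since the text does not spell it out; COSMOtherm may additionally apply scaling factors, so absolute values need not match the tabulated sub-parameters. The function name and the toy profile are hypothetical.

```python
import numpy as np


def sigma_moments(sigma, profile, orders=range(1, 7)):
    """Moments of a discretized sigma profile.

    sigma   : screening charge density at the bin centres
    profile : surface area assigned to each bin, p(sigma)
    Returns a dict {i: sum(p(sigma) * sigma**i)} for the requested orders.
    """
    sigma = np.asarray(sigma, dtype=float)
    profile = np.asarray(profile, dtype=float)
    return {i: float(np.sum(profile * sigma ** i)) for i in orders}


# Toy profile that is symmetric around sigma = 0 (hypothetical values).
moments = sigma_moments([-0.01, 0.0, 0.01], [1.0, 2.0, 1.0])
print(moments[1], moments[2], moments[3])
```

For a symmetric profile the odd moments vanish, which matches the interpretation of the first moment as the (negative) net charge and of the third as the asymmetry of the profile.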

Multiple linear regression analyses were carried out using the SPSS software (12.0K for Windows). SigmaPlot (version 10.0) was used for statistical data analysis and plotting.
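The regression step itself can be illustrated without SPSS. The sketch below fits a multiple linear regression by ordinary least squares with NumPy and reports R² and a standard error. It assumes SE means the standard error of the estimate, i.e. the square root of the residual sum of squares divided by the residual degrees of freedom; the function name `fit_mlr` and the synthetic data are hypothetical and only stand in for the sub-parameter and descriptor columns.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit of y = c0 + c1*x1 + ... + cm*xm.

    Returns (coefficients with intercept first, R^2, standard error).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])       # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares solution
    resid = y - A @ coef
    ss_res = float(resid @ resid)                   # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    n, p = A.shape                                  # p counts the intercept
    r2 = 1.0 - ss_res / ss_tot
    se = float(np.sqrt(ss_res / (n - p)))           # assumed SE definition
    return coef, r2, se

# Synthetic, noise-free example: y = 1 + 2*x1 - 3*x2 (not data from the paper).
X = np.array([[0, 1], [1, 0], [2, 1], [3, 3], [4, 2], [5, 5]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
coef, r2, se = fit_mlr(X, y)
print(coef, r2, se)
```

Applying the same function separately to the training set, the test set, and the complete dataset would give the three sets of statistics that such a validation compares.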

Collection of LFER descriptors

For the modelling, we collected LFER descriptors of 992 compounds, i.e. 707 neutral compounds and 285 ionic compounds (114 cations and 171 anions), from the literature.^4–9 The descriptors of all neutral compounds were experimentally measured values, whereas those of the ionic compounds included both experimentally measured and theoretically calculated values.^4–6,12,13 The ion dataset consists mainly of monovalent cations and anions; only seven anions are doubly charged.⁴

In the total dataset, the S, A, and B value ranges of ions are much wider than those of neutral compounds, as shown in Table 6, while the E range of neutral compounds is slightly wider than that of ions.

Table 6 The value ranges of each descriptor (E, S, A, B, V, J⁺, and J⁻) of neutral and ionic compounds

	Neutral compounds	Ions
E	−0.10 to 4.07	−0.16 to 3.18
S	0.00–3.50	1.03–7.75
A	0.00–1.16	0.00–3.34
B	0.00–1.87	0.00–5.88
V	0.17–3.62	0.11–3.81
J⁺	0.00	−0.54 to 4.15
J⁻	0.00	0.63–4.44
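One practical use of these ranges is a simple plausibility check on predicted descriptors. The sketch below encodes the values of Table 6 and flags any descriptor that falls outside the range covered by the dataset. The function name `out_of_range` and the example descriptor values are hypothetical, and the check is only a rough screen, not a formal applicability-domain test. For ions, only the J term relevant to the species (J⁺ for cations, J⁻ for anions) should be passed.

```python
# Descriptor ranges transcribed from Table 6: (minimum, maximum).
TABLE6_RANGES = {
    "neutral": {
        "E": (-0.10, 4.07), "S": (0.00, 3.50), "A": (0.00, 1.16),
        "B": (0.00, 1.87), "V": (0.17, 3.62),
        "J+": (0.00, 0.00), "J-": (0.00, 0.00),
    },
    "ion": {
        "E": (-0.16, 3.18), "S": (1.03, 7.75), "A": (0.00, 3.34),
        "B": (0.00, 5.88), "V": (0.11, 3.81),
        "J+": (-0.54, 4.15), "J-": (0.63, 4.44),
    },
}

def out_of_range(descriptors, kind):
    """Return the names of descriptors lying outside the Table 6 range.

    descriptors: dict such as {"E": 0.8, "S": 1.2}; only the keys given are checked
    kind:        "neutral" or "ion"
    """
    ranges = TABLE6_RANGES[kind]
    flagged = []
    for name, value in descriptors.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged

# Hypothetical predicted descriptors, for illustration only.
candidate = {"S": 4.2, "A": 0.3}
print(out_of_range(candidate, "neutral"))
print(out_of_range(candidate, "ion"))
```

A value of S = 4.2 lies above the neutral range but inside the ion range, which reflects the much wider S range of ions noted above.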

Acknowledgements

This work was supported by the University of Bremen (postdoctoral fellowship) and by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (2014R1A2A1A09007378, 2014R1A1A2008337). The authors acknowledge the use of the computing resources provided by the Black Forest Grid Initiative (BFG) (http://bfg.uni-freiburg.de).

Notes and references

1. M. H. Abraham, A. Ibrahim and A. M. Zissimos, J. Chromatogr. A, 2004, 1037, 29.
2. C. F. Poole, J. Chromatogr. A, 2013, 1317, 85.
3. M. H. Abraham and Y. H. Zhao, J. Org. Chem., 2004, 69, 4677.
4. M. H. Abraham and W. E. Acree Jr, J. Org. Chem., 2010, 75, 1006.
5. M. H. Abraham and W. E. Acree Jr, New J. Chem., 2010, 34, 2298.
6. M. H. Abraham and W. E. Acree Jr, J. Org. Chem., 2010, 75, 3021.
7. M. H. Abraham and W. E. Acree Jr, Thermochim. Acta, 2011, 526, 22.
8. T. W. Stephens, N. E. Rosa, M. Saifullah, S. Ye, V. Chou, A. N. Quay and W. E. Acree Jr, Fluid Phase Equilib., 2011, 309, 30.
9. C.-W. Cho, S. Stolte, J. Ranke, U. Preiss, I. Krossing and J. Thöming, ChemPhysChem, 2014, 15, 2351.
10. M. H. Abraham and W. E. Acree Jr, Phys. Chem. Chem. Phys., 2010, 12, 13182.
11. C.-W. Cho, U. Preiss, C. Jungnickel, S. Stolte, J. Arning, J. Ranke, A. Klamt, I. Krossing and J. Thöming, J. Phys. Chem. B, 2011, 115, 6040.
12. M. H. Abraham and R. P. Austin, Eur. J. Med. Chem., 2012, 47, 202.
13. K. Zhang, M. Chen, G. K. Scriba, M. H. Abraham, A. Fahr and X. Liu, J. Pharm. Sci., 2011, 100, 3105.
14. T. Ghafourian and J. C. Dearden, J. Pharm. Pharmacol., 2000, 52, 603.
15. O. Lamarche, J. A. Platts and A. Hersey, Phys. Chem. Chem. Phys., 2001, 3, 2747.
16. J. A. Platts, Phys. Chem. Chem. Phys., 2000, 2, 973.
17. J. A. Platts, Phys. Chem. Chem. Phys., 2000, 2, 3115.
18. J. A. Schwöbel, R. U. Ebert, R. Kühne and G. Schüürmann, J. Phys. Org. Chem., 2011, 24, 1072.
19. D. Svozil, J. G. K. Ševčík and V. Kvasnička, J. Chem. Inf. Comput. Sci., 1997, 37, 338.
20. M. Zissimos, M. H. Abraham, A. Klamt, F. Eckert and J. A. Wood, J. Chem. Inf. Comput. Sci., 2002, 42, 1320.
21. J. M. Slattery, C. Daguenet, P. J. Dyson, T. J. Schubert and I. Krossing, Angew. Chem., 2007, 119, 5480.
22. Y. H. Zhao, M. H. Abraham and A. M. Zissimos, J. Org. Chem., 2003, 68, 7368.
23. http://openbabel.org/wiki/Main_Page.
24. J. P. Perdew, Phys. Rev. B: Condens. Matter Mater. Phys., 1986, 33, 8822.
25. A. Klamt and G. Schüürmann, J. Chem. Soc., Perkin Trans. 2, 1993, 799.
26. TURBOMOLE V5.10 2008, a development of University of Karlsruhe, 1989-2007, TURBOMOLE GmbH, since 2007 available from http://www.turbomole.com.
27. F. Weigend and M. Häser, Theor. Chem. Acc., 1997, 97, 331.
28. P. Deglmann and F. Furche, J. Chem. Phys., 2002, 117, 9535.
29. P. Deglmann, F. Furche and R. Ahlrichs, Chem. Phys. Lett., 2002, 362, 511.
30. A. Schäfer, C. Huber and R. Ahlrichs, J. Chem. Phys., 1994, 100, 5829.
31. F. Eckert, COSMOtherm reference manual, version C3.0, Release 15.01, Leverkusen, Germany, 1999–2014.

Footnote

† Electronic supplementary information (ESI) available: Supporting information 1 gives the following: a correlation between Abraham and calculated descriptors (A and B) according to eqn (5) and (6) (Fig. S1); a correlation between molecular refraction (MR) and OBPROP molecular refraction (MR_OBP) (Fig. S2) and its equation [eqn (S1)]; a correlation between Abraham E and E calculated by eqn (S2) (Fig. S3); calculated (Calc.) and literature (Obs.) solute descriptors of outliers (Table S1); calculated [by eqn (24)] and observed transfer of compounds from water to propylene carbonate (Table S2); calculated [by eqn (25)] and observed transfer of compounds from water to sulfolane (Table S3); and calculated [by eqn (26)] and observed transfer of compounds from water to ethylene glycol (Table S4). Supporting information 2 (Excel file) gives the sub-parameters calculated from COSMO and OBPROP, and the calculated and observed LFER descriptors of the training and test sets. See DOI: 10.1039/c5ra13595h
