I. E. Kłosowska-Chomiczewska‡^{a}, W. Artichowicz‡^{b}, U. Preiss^{c} and C. Jungnickel*^{a}
^{a}Department of Colloid and Lipid Science, Faculty of Chemistry, Gdańsk University of Technology, Narutowicza St. 11/12, Gdańsk 80-233, Poland. E-mail: christian.jungnickel@pg.gda.pl; Tel: +48 58 347 2469
^{b}Department of Hydraulic Engineering, Faculty of Civil and Environmental Engineering, Gdańsk University of Technology, Narutowicza St. 11/12, Gdańsk 80-233, Poland
^{c}Interdisciplinary Centre for Advanced Materials Simulation (ICAMS), Ruhr-Universität Bochum, Universitätsstraße 150, Bochum 44780, Germany

Received 25th July 2017, Accepted 6th September 2017

First published on 11th September 2017

We created a model to predict the CMC of ILs based on 704 experimental values published in 43 publications since 2000. Our model was able to predict the CMC of a variety of ILs in binary or ternary systems in the presence of salt or alcohol. The molecular volume of the IL (V_{m}), solvent-accessible surface (Ŝ), solvation enthalpy (Δ_{solv}G^{∞}), concentration of salt (C_{s}) or alcohol (C_{a}) and their molecular volumes (V_{m,s} and V_{m,a}, respectively) were chosen as descriptors, and Kernel Support Vector Machine (KSVM) and Evolutionary Algorithm (EA) as the regression methodologies used to create the models. The data were split into training and validation sets (80/20) and subjected to bootstrap aggregation. KSVM provided a better fit, with an average R^{2} of 0.843 and an MSE of 0.608, whereas EA resulted in an R^{2} of 0.794 and an MSE of 0.973. The sensitivity analysis showed that V_{m} and Ŝ have the highest impact on IL micellization in both binary and ternary systems; surprisingly, however, V_{m} becomes insignificant in the presence of alcohol. Whether a descriptor stabilizes or destabilizes micelles depends on the additives. Previous attempts at modelling the CMC of ILs were generally limited to a small number of ILs in simplified (binary) systems. We, however, showed successful prediction of the CMC over a range of different systems (binary and ternary).

The many possible combinations of cations and anions result in a myriad^{2,3} of possible compounds, which makes finding the right compound for a given application challenging. Solving this issue requires accurate prediction of the physicochemical properties of ILs. A number of attempts have been published to design ILs with a specified melting point,^{4–7} solubility,^{8,9} surface composition,^{10} surface tension,^{11} heat capacity,^{5,12} cloud point,^{13,14} density,^{12,15,16} viscosity,^{15–18} conductivity,^{16,18} and hydrophobicity.^{9,19} In addition, to reduce environmental impact, a number of parameters such as toxicity,^{20–30} biodegradation,^{31} and soil sorption^{32–34} have been predicted.

One phenomenological parameter that is often chosen as a prediction target is the critical micelle concentration (CMC). It provides information about a wide variety of other properties, such as the molar solubilization ratio, sorption and toxicity. In addition, the CMC is easily measured and often reported, which makes it an ideal target for prediction.

Previous attempts have been made to model and predict the CMC of ionic surfactants and ILs. Barycki et al.^{35} created a model to predict the CMC based on a dataset of only 59 ILs in 2016. The three descriptors chosen were based on the molecular GEometry, Topology, and Atom-Weights AssemblY (GETAWAY) descriptors, which resulted in an R^{2} of 0.959. The authors incorrectly stated that their attempt was the first to predict the CMC of ionic liquids, when in fact we had already done this in 2009. In that publication we developed a model based on molecular volume, solvent-accessible surface area and various interaction enthalpies, determined by COSMO-RS for 36 ILs, with a resulting R^{2} of 0.994.^{36} Vishnyakov et al. presented a model to predict the CMC of non-ionic surfactants in binary solutions using dissipative particle dynamics simulations.^{37} Kardanpour et al. developed a model to predict the CMC of gemini surfactants from a data set of 94 CMC values, using topological, geometrical, functional group and WHIM descriptors. The optimized model, built with a wavelet neural network, used 12 descriptors, and the highest R^{2} was 0.994.^{38} Jalali-Heravi et al. used multiple linear regression to model the CMC of cationic surfactants based on 30 literature CMC values for alkyltrimethylammonium and alkylpyridinium salts, using topological (Balaban and Randic indices), electronic (total energy of the molecules) and molecular structure descriptors (volume of the tail of the molecule, maximum distance between the atoms, and surface area) and a stepwise regression method.
The highest R^{2} of the models after cross-validation was 0.955.^{39} In another attempt, Roy and Kabir developed a model to predict the CMC of non-ionic surfactants in aqueous solution based on 54 CMC values, using extended topochemical atom (ETA) and non-ETA indices as descriptors, and stepwise multiple linear regression (MLR), genetic function approximation (GFA) and partial least squares (PLS) as chemometric tools. PLS made it possible to avoid inter-correlation among the descriptors. The coefficient of determination R^{2} after external validation for the best model (ETA + non-ETA, PLS) was 0.986.^{40} Huibers et al. created a model to predict the CMC of anionic surfactants (sodium alkyl sulfates and sodium sulfonates) based on 119 literature values. An R^{2} of 0.942 was obtained for a multiple linear model with three descriptors based on molecular topology and constitution.^{41} Katritzky et al. generated a CMC prediction model for 50 cationic surfactants (35 quaternary ammonium salts and 15 quaternary pyridinium salts) using molecular descriptors related to the size and charge of the hydrophobic tail and to the size of the head. They used best multilinear regression and a heuristic algorithm to determine the best multilinear models (mean R^{2} after cross-validation of 0.978), and a nonlinear artificial neural network to develop nonlinear regression models (mean R^{2} after cross-validation of 0.979).^{42} Huibers et al. used three topological descriptors (the size of the hydrophobic group, the size of the hydrophilic group, and the structural complexity of the hydrophobic group) to create a model to predict the CMC based on values for 77 nonionic surfactants.
Multiple linear regression analyses carried out with the heuristic algorithm resulted in an R^{2} of 0.984.^{43} In 2007, Gad used structural, topological and thermodynamic descriptors (namely molecular weight, hydrophobic–hydrophilic fragment molecular weight ratio, polarizability, logP, energy of hydration, surface area, and dipole moment) to create a model to predict the CMC based on 50 CMC values for nonionic surfactants. The models were created with principal component analysis (PCA) and multiple linear regression (MLR), with an R^{2} of 0.9889.^{44} Yuan et al. used four electronic, spatial and thermodynamic descriptors to create a model to predict the CMC based on 37 literature values for nonionic surfactants. As chemometric tools, the authors used stepwise multiple linear regression analysis, multiple simple linear model analysis, multiple linear regression, and genetic function approximation, giving an R^{2} of 0.990.^{45} They also created a similar model based on 37 anionic surfactants and obtained an R^{2} of 0.996.^{46}

All these models have the disadvantage of being created with limited data, usually no more than 100 data points. They usually rely on a single software package to calculate a plethora of descriptors, from which the subset with the most predictive value is eventually chosen. In addition, their applicability is limited because they refer only to binary solutions (i.e. IL + water).

Therefore, in 2013 we (Preiss and Jungnickel) extended the models, for the first time, to include the effect of additives, in this case salts. These ternary systems were modelled with an R^{2} of 0.859 using 151 data points.^{47} However, we were still using descriptors based on quantum chemical calculations. In the paper of Cho et al., we (Preiss and Jungnickel) used a polyparameter linear free energy relationship based on the Abraham equation. This approach is simpler than the previous ones, because it does not require any quantum chemical calculations. The prediction had an R^{2} of 0.994 for the IL/water binary system.^{9} Its disadvantage, however, is the need to determine the Abraham descriptors experimentally for each compound.

Modelling a single type of system, either binary or one particular ternary system, is easier, because within each system a descriptor is generally responsible for one effect. However, when systems containing different additional components (salt or alcohol) are modelled together, the magnitude of a descriptor's impact may change. Therefore, the aim of this paper was to compare several approaches to predicting the CMC of a large set of ILs (704 CMC values) over a wide range of conditions. Here we present a complex, multi-methodological approach to predict the CMC of ILs in binary (IL and water) or ternary systems (IL and water, with a monovalent inorganic salt or an alcohol), as shown in Fig. 1. We aim to show several models that allow the prediction of the CMC based on an IL's molecular volume (V_{m}), solvent-accessible surface (Ŝ), solvation enthalpy at infinite dilution (Δ_{solv}G^{∞}) and temperature. For ternary systems, additional input variables are taken into account, namely the concentration of salt (C_{s}) or alcohol (C_{a}) and their molecular volumes (V_{m,s} and V_{m,a}, respectively).

Based on our previous experience,^{36,47} the following descriptors were considered: molecular volume of the IL (V_{m}, as a sum of the anion and cation, or separately), solvent-accessible surface (Ŝ, as a sum of the anion and cation, or separately), solvation enthalpy (Δ_{solv}G^{∞}), concentration of salt (C_{s}) or alcohol (C_{a}) and their molecular volumes (V_{m,s} and V_{m,a}, respectively). The variables CMC, C_{s}, C_{a} and T were taken from the source publications, while the other input variables were calculated for the purpose of this paper. The range of each parameter used for modelling is presented in Table 1. The complete data set is given in Table S1 (ESI†).

Table 1 Range of each parameter used for modelling

Parameter | CMC, mM | V_{m}, Å^{3} | Ŝ, Å^{2} | Δ_{solv}G^{∞}, kJ mol^{−1} | C_{s}, mM | V_{m,s}, Å^{3} | C_{a}, mM | V_{m,a}, Å^{3} | T, K |
---|---|---|---|---|---|---|---|---|---|
Min. value | 0.01 | 199.04 | 221.36 | −710.65 | 0.10 | 100.15 | 20.49 | 69.12 | 278.15 |
Max. value | 2200 | 1150.50 | 1004.07 | −324.14 | 1000.00 | 229.22 | 1085.34 | 134.96 | 328.15 |
Mean value | 32.01 | 461.46 | 453.26 | −531.24 | 190.28 | 142.52 | 203.99 | 93.84 | 300.08 |
SD | 120.53 | 114.14 | 96.18 | 60.64 | 230.85 | 35.02 | 160.41 | 25.80 | 5.84 |

The program Molconvert was used for conversion between names and chemical structures.^{49} To obtain a reasonable initial set of descriptors, each molecule was optimized in the gas phase with MOPAC2016^{50} using PM6-DH+ and the PRECISE keyword.^{51,52} A vibrational analysis was performed to confirm that each optimized structure was a minimum and not a transition state.^{53} To retain consistency with comparable prediction models,^{54,55} a COSMO geometry optimization in a virtually ideal electrical conductor (ε_{r} = 999), using the same method but not taking molecular symmetry into account, was then appended;^{56} the solvent-accessible (COSMO) surface Ŝ, the molecular volume V_{m} and the free solvation enthalpy in an ideal electric conductor Δ_{solv}G^{∞} were taken directly from the final output.
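The extraction of Ŝ and V_{m} from the final output can be sketched as a small parser. This is a minimal sketch, assuming that MOPAC writes lines of the form `COSMO AREA = <value> SQUARE ANGSTROMS` and `COSMO VOLUME = <value> CUBIC ANGSTROMS` (check this against your own output files). Δ_{solv}G^{∞} is left out because its label and unit in the output are not assumed here. The function names are hypothetical; `ion_pair_descriptors` mirrors the "sum of the anion and cation" variant of the descriptors.

```python
import re

# Assumed MOPAC output format (verify against your own .out files), e.g.
#   COSMO AREA              =    245.31 SQUARE ANGSTROMS
#   COSMO VOLUME            =    262.07 CUBIC ANGSTROMS
NUMBER = r"([-+]?\d+(?:\.\d+)?)"
PATTERNS = {
    "area_A2": re.compile(r"COSMO AREA\s*=\s*" + NUMBER),
    "volume_A3": re.compile(r"COSMO VOLUME\s*=\s*" + NUMBER),
}


def parse_cosmo_descriptors(output_text):
    """Return the last reported COSMO area (Å^2) and volume (Å^3) in a MOPAC output."""
    descriptors = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(output_text)
        if not matches:
            raise ValueError(f"no {name} entry found in MOPAC output")
        # The last occurrence belongs to the final (converged) COSMO geometry.
        descriptors[name] = float(matches[-1])
    return descriptors


def ion_pair_descriptors(cation_output, anion_output):
    """Sum cation and anion descriptors (the 'sum of the ions' variant)."""
    cation = parse_cosmo_descriptors(cation_output)
    anion = parse_cosmo_descriptors(anion_output)
    return {name: cation[name] + anion[name] for name in cation}


# Usage with two hypothetical output snippets:
cation_out = """
 COSMO AREA              =    180.00 SQUARE ANGSTROMS
 COSMO VOLUME            =    200.50 CUBIC ANGSTROMS
"""
anion_out = """
 COSMO AREA              =     60.25 SQUARE ANGSTROMS
 COSMO VOLUME            =     50.00 CUBIC ANGSTROMS
"""
print(ion_pair_descriptors(cation_out, anion_out))
```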

The EA generates clusters of equations for a given target expression using a modified evolutionary algorithm.^{58} For each target expression, the software ran for a minimum of 5 × 10^{9} generations, attempting in each generation to find an equation with optimal complexity and coefficient of determination R^{2}. Because of the size of the data set, 80% of the data was used for training and 20% for validation. To keep the regression model interpretable, only basic mathematical operators (addition, subtraction and multiplication) were allowed. The equations that provided the highest R^{2} were recorded. The end point of the EA calculations was consistently set to 150000 generations for all sets and systems, since no significant improvement in R^{2} was observed beyond that point, as shown in Fig. 3.
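The kind of search described above can be illustrated with a toy genetic-programming symbolic regression. This is a minimal sketch and not the modified algorithm of ref. 58: it evolves expression trees built only from +, − and × over the descriptors and random constants, ranks them by training R^{2}, keeps the best 20% unchanged (elitism) and refills the population by subtree mutation (no crossover). The population size, number of generations, depth limit, mutation probabilities and the synthetic data are all hypothetical choices for illustration.

```python
import math
import random

# Only the three operators allowed in the text: addition, subtraction, multiplication.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
MAX_DEPTH = 4  # hypothetical cap on operator levels, keeps equations interpretable


def random_tree(n_vars, depth, rng):
    """Grow a random expression tree with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))   # descriptor x_i
        return ("c", rng.uniform(-2.0, 2.0))      # numeric constant
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))


def evaluate(tree, row):
    kind = tree[0]
    if kind == "x":
        return row[tree[1]]
    if kind == "c":
        return tree[1]
    return OPS[kind](evaluate(tree[1], row), evaluate(tree[2], row))


def r2(tree, X, y):
    """Coefficient of determination of `tree` on (X, y); -inf if predictions are not finite."""
    preds = [evaluate(tree, row) for row in X]
    if not all(math.isfinite(p) for p in preds):
        return -math.inf
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, preds))
    return 1.0 - ss_res / ss_tot


def mutate(tree, n_vars, rng, depth=0):
    """Replace one randomly chosen subtree; the result never exceeds MAX_DEPTH."""
    if tree[0] in ("x", "c") or rng.random() < 0.2:
        return random_tree(n_vars, MAX_DEPTH - depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_vars, rng, depth + 1), right)
    return (op, left, mutate(right, n_vars, rng, depth + 1))


def evolve(X, y, generations=40, pop_size=60, elite_frac=0.2, seed=0):
    """Elitist evolutionary search; returns the best tree and the best training R^2 per generation."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    population = [random_tree(n_vars, MAX_DEPTH, rng) for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    history = []
    for _ in range(generations):
        scored = sorted(((r2(t, X, y), t) for t in population), key=lambda p: p[0], reverse=True)
        history.append(scored[0][0])
        elite = [t for _, t in scored[:n_elite]]  # best equations survive unchanged
        children = [mutate(rng.choice(elite), n_vars, rng) for _ in range(pop_size - n_elite)]
        population = elite + children
    best = max(population, key=lambda t: r2(t, X, y))
    return best, history


# Usage on synthetic data y = x0*x1 + x0 with an 80/20 training/validation split.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [row[0] * row[1] + row[0] for row in X]
cut = int(0.8 * len(X))
best, history = evolve(X[:cut], y[:cut])
print("training R2:", r2(best, X[:cut], y[:cut]), "validation R2:", r2(best, X[cut:], y[cut:]))
```

Because the elite is copied unchanged, the best training R^{2} can never decrease from one generation to the next, which is why a plateau such as the one in Fig. 3 is a natural stopping criterion.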

KSVM handles multidimensional non-linear relationships well owing to feature space mapping. This procedure creates a new feature space in which the non-linear relationships may become linear, or close to linear. Formally, the mapping is done with a function Φ(x). However, owing to the “kernel trick” and the Lagrange multipliers method, the resulting KSVM formulation does not require explicit knowledge of the mapping function, but only of its scalar product, k(x_{a},x_{b}) = Φ(x_{a})·Φ(x_{b}). Thus the mapping is performed implicitly by means of this scalar product (the kernel).^{64} A detailed discussion of kernel functions in KSVM can be found, for example, in Hofmann et al.^{65} The Gaussian kernel k(x_{a},x_{b}) = exp(−γ‖x_{a} − x_{b}‖^{2}), in which γ is the kernel parameter, is the usual default choice.
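The kernel trick can be checked numerically. In this minimal sketch, the 2-D quadratic kernel k(x, z) = (x·z)^{2} is used purely for illustration (it is not the kernel used in this work), because it has a small explicit feature map Φ(x) = (x_{1}^{2}, √2 x_{1}x_{2}, x_{2}^{2}); the scalar product Φ(x)·Φ(z) reproduces k(x, z), although Φ is never needed in practice. The Gaussian kernel has an infinite-dimensional feature map and can only be evaluated through k. The function names are hypothetical.

```python
import math


def phi_quadratic(x):
    """Explicit feature map of the 2-D quadratic kernel k(x, z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """Same value as dot(phi_quadratic(x), phi_quadratic(z)), without building phi."""
    return dot(x, z) ** 2


def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); its feature map is infinite-dimensional."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(phi_quadratic(x), phi_quadratic(z)), quadratic_kernel(x, z))
print(gaussian_kernel(x, z, gamma=0.5))
```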

The idea of KSVM regression is to find a function f(x) that deviates by at most an arbitrarily chosen value ε from the target values provided in the training set (Fig. 2). Additionally, the sought function is required to be as flat as possible. From a practical point of view, this property provides robustness against perturbations (noise) in the observed data. However, a function that is flat enough and does not exceed the allowed error tolerance may not exist. To overcome this problem, deviations exceeding the assumed level ε are allowed but penalized. For this purpose, slack variables ξ and ξ* are introduced along with the concept of a soft margin loss function.^{66} This situation is depicted in Fig. 2. The slack variables do not appear explicitly in the resulting optimization problem, because the Lagrange multipliers method is used to solve it.
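The ε-tube, the slack variables and the flatness requirement can be made concrete for a linear function f(x) = w·x + b. This is a minimal sketch of the standard soft-margin ε-SVR objective ½‖w‖^{2} + C Σ_{i}(ξ_{i} + ξ_{i}*); the function names are hypothetical, and the linear f is chosen for readability, whereas in KSVM f is linear only in the kernel-induced feature space.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps): no penalty inside the eps-tube."""
    return max(0.0, abs(y_true - y_pred) - eps)


def slack_variables(y_true, y_pred, eps):
    """Return (xi, xi_star), the excursions above and below the eps-tube."""
    residual = y_true - y_pred
    xi = max(0.0, residual - eps)          # point above f(x) + eps
    xi_star = max(0.0, -residual - eps)    # point below f(x) - eps
    return xi, xi_star


def primal_objective(w, b, X, y, C, eps):
    """Soft-margin eps-SVR objective 0.5*||w||^2 + C * sum(xi + xi*) for f(x) = w.x + b."""
    flatness = 0.5 * sum(wi * wi for wi in w)  # small ||w|| means a flat function
    penalty = 0.0
    for row, target in zip(X, y):
        prediction = sum(wj * xj for wj, xj in zip(w, row)) + b
        xi, xi_star = slack_variables(target, prediction, eps)
        penalty += xi + xi_star
    return flatness + C * penalty


# One point inside the tube (no penalty) and one point 0.4 above it.
print(primal_objective(w=[1.0], b=0.0, X=[[1.0], [2.0]], y=[1.05, 2.5], C=10.0, eps=0.1))
```

The regularization coefficient C sets the trade-off: a large C punishes every excursion from the tube heavily, while a small C favours flatness.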

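The sought regression function, expressed by means of the Lagrange multipliers, is obtained by solving the following optimization problem:^{64}

$$
\begin{aligned}
\max_{\boldsymbol{\alpha},\,\boldsymbol{\alpha}^{*}}\quad & -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_{i}-\alpha_{i}^{*})(\alpha_{j}-\alpha_{j}^{*})\,k(x_{i},x_{j})-\varepsilon\sum_{i=1}^{l}(\alpha_{i}+\alpha_{i}^{*})+\sum_{i=1}^{l}y_{i}(\alpha_{i}-\alpha_{i}^{*})\\
\text{subject to}\quad & \sum_{i=1}^{l}(\alpha_{i}-\alpha_{i}^{*})=0,\qquad \alpha_{i},\alpha_{i}^{*}\in[0,C]
\end{aligned}
\tag{1}
$$

where l is the number of training samples (x_{i}, y_{i}) and C is the regularization coefficient.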

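The regression function then takes the form

$$
f(x)=\sum_{i=1}^{l}(\alpha_{i}-\alpha_{i}^{*})\,k(x_{i},x)+b
\tag{2}
$$

where the sum runs over the l training samples x_{i} and b is the bias term.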

The vectors α and α* are the solution of eqn (1). Training points with a non-zero coefficient α_{i} − α_{i}* are support vectors; if α_{i} or α_{i}* lies strictly between 0 and C, the corresponding point lies exactly on the edge of the ε-tube, that is, it is one of the points of the training set on which the margin is built. Example support vectors are highlighted in red in Fig. 4.
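Eqn (2) and the support-vector rule can be sketched directly. This is a minimal sketch with made-up multipliers α and α*, bias b and γ (hypothetical values that satisfy the constraints of eqn (1) but are not a solution of it for the CMC data); in practice these values come from the optimizer, e.g. LIBSVM. Points are labelled as non-support vectors (α_{i} = α_{i}* = 0), margin support vectors (0 < α_{i} or α_{i}* < C, on the edge of the ε-tube) or bounded support vectors (α_{i} or α_{i}* = C, outside the tube).

```python
import math


def rbf(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, X_train, alpha, alpha_star, b, gamma):
    """Eqn (2): f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b."""
    return sum((a - a_s) * rbf(x_i, x, gamma)
               for x_i, a, a_s in zip(X_train, alpha, alpha_star)) + b


def classify_training_points(alpha, alpha_star, C, tol=1e-8):
    """Label each training point from its Lagrange multipliers."""
    labels = []
    for a, a_s in zip(alpha, alpha_star):
        coef = max(a, a_s)  # at the optimum at most one of alpha_i, alpha_i* is non-zero
        if coef <= tol:
            labels.append("not a support vector")  # inside (or on) the eps-tube
        elif coef < C - tol:
            labels.append("margin SV")             # 0 < alpha < C: on the tube edge
        else:
            labels.append("bounded SV")            # alpha = C: outside the tube, xi > 0
    return labels


# Hypothetical multipliers for four 1-D training points (C = 1.0);
# sum(alpha) == sum(alpha_star), as required by the constraint in eqn (1).
X_train = [[0.0], [1.0], [2.0], [3.0]]
alpha = [1.0, 0.0, 0.0, 0.0]
alpha_star = [0.0, 0.0, 0.4, 0.6]
print(classify_training_points(alpha, alpha_star, C=1.0))
print(svr_predict([0.0], X_train, alpha, alpha_star, b=0.1, gamma=0.5))
```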

Fig. 4 The concept of the regression function, the margin of tolerance (ε), slack variables (ξ) and support vectors (red).

For practical use of KSVM regression, a kernel has to be chosen and its parameters provided. It is also necessary to provide the value of the regularization coefficient C and the width of the error tolerance margin ε. Many methods for parameter selection have been proposed; however, to the authors' knowledge, none of them is suitable for all possible applications of KSVM. The two most popular approaches are as follows. The first determines the parameters with an optimization algorithm, such as grid search. A drawback of this procedure is that it can lead to overfitting, which results in poor prediction accuracy on unseen data, and it requires solving an additional optimization problem in at least three variables (the kernel parameter, ε and C), which is often very time-consuming. For the CMC regression considered here, this approach did not provide acceptable results. The second approach is to choose the parameters heuristically, using hints based on the data. Examples of this approach are given by Cherkassky and Mulier^{67} and Cherkassky and Ma.^{68}

In this work, the LIBSVM implementation of KSVM was used.^{69} A Gaussian kernel was chosen, as it is generally a good first choice for non-linear data.^{70} The parameters were chosen heuristically, as suggested in ref. 69 and 70. The kernel parameter was set to the reciprocal of the number of descriptors, γ = 1/n, which is the default value suggested by the LIBSVM developers as suitable for most cases. The regularization parameter C was set to the range of the dependent variable y, following the suggestion of ref. 68. The error tolerance margin was set to ε = 0.1, the default value used in LIBSVM.
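The parameter heuristics above translate into a few lines of code. This is a minimal sketch, assuming the descriptors are held as a list of rows X and the targets as a list y; the helper names and the training file name `train.libsvm` are hypothetical, and any preprocessing or scaling used in this work is not reproduced. The option letters are those of LIBSVM's `svm-train` tool (`-s 3` ε-SVR, `-t 2` RBF kernel, `-g` γ, `-c` C, `-p` ε).

```python
def ksvm_parameters(X, y, eps=0.1):
    """Heuristic KSVM settings as described in the text."""
    n_descriptors = len(X[0])
    return {
        "gamma": 1.0 / n_descriptors,  # LIBSVM default: 1 / number of features
        "C": max(y) - min(y),          # range of the dependent variable
        "epsilon": eps,                # LIBSVM default tube width for epsilon-SVR
    }


def svm_train_command(params, train_file):
    """Argument list for LIBSVM's svm-train: epsilon-SVR (-s 3) with an RBF kernel (-t 2)."""
    return [
        "svm-train", "-s", "3", "-t", "2",
        "-g", f"{params['gamma']:g}",
        "-c", f"{params['C']:g}",
        "-p", f"{params['epsilon']:g}",
        train_file,
    ]


# Hypothetical data: four descriptors per sample, three target values.
X = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 2.5, 3.5], [2.0, 1.0, 0.0, 1.0]]
y = [1.0, -2.0, 3.5]
params = ksvm_parameters(X, y)
print(" ".join(svm_train_command(params, "train.libsvm")))
```

Because none of these settings requires an inner optimization loop, the heuristic route avoids both the cost and the overfitting risk of a grid search discussed above.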

The primary question of this paper, whether a global model can be applied to predict the CMC in a variety of complex systems, can simply be answered in the affirmative. An exemplary result is shown in Fig. 5. Both regression methods produced satisfactory predictive models. The results for all data sets are summarized in Table 2.

KSVM gave a better fit, with a mean coefficient of determination R^{2} of 0.843, whereas the evolutionary algorithm (EA) gave an R^{2} of 0.794. The difference can be explained by the mathematical foundations of the two methods. KSVM unfolds non-linear relationships hidden in the data and handles the problem holistically. Unlike the EA, KSVM does not tailor an explicit mathematical expression to the data, but has a generalized form designed to handle such problems. KSVM regression models are therefore less vulnerable to overfitting than EA models, which makes KSVM a robust regression tool able to give accurate predictions for previously unseen data.
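The reported R^{2} and MSE values are averages over resampled training/validation splits. This is a minimal sketch of such an evaluation loop, using repeated random 80/20 hold-out splits as a simpler stand-in for the bootstrap aggregation mentioned in the abstract; `fit` is any user-supplied training routine (hypothetical here) that returns a prediction function, and `fit_mean` is only a trivial baseline.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean_y) ** 2 for v in y_true)
    ss_res = sum((v - p) ** 2 for v, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((v - p) ** 2 for v, p in zip(y_true, y_pred)) / len(y_true)


def repeated_holdout(X, y, fit, n_repeats=10, train_frac=0.8, seed=0):
    """Mean validation R^2 and MSE over random train/validation splits.

    `fit(X_train, y_train)` must return a function mapping one sample to a prediction.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    r2_values, mse_values = [], []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        cut = int(train_frac * len(indices))
        train, valid = indices[:cut], indices[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_valid = [y[i] for i in valid]
        y_pred = [model(X[i]) for i in valid]
        r2_values.append(r2_score(y_valid, y_pred))
        mse_values.append(mse(y_valid, y_pred))
    return sum(r2_values) / n_repeats, sum(mse_values) / n_repeats


def fit_mean(X_train, y_train):
    """Trivial baseline model: always predicts the training mean."""
    mean_y = sum(y_train) / len(y_train)
    return lambda sample: mean_y


X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
print(repeated_holdout(X, y, fit_mean))
```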

Compared with previously published CMC models for ILs and other surfactants, the fit of our models is satisfactory (R^{2} of 0.843 and 0.794 for KSVM and EA, respectively), but lower than those found in the literature (R^{2} of 0.942–0.996).^{35–46} This is attributable to the diversity of the ILs considered here (Table 1), whereas other CMC prediction studies usually created models based on small samples^{35–46} and focused on a very specific sub-group of surfactants, e.g. only alkyltrimethylammonium and alkylpyridinium salts^{39} or quaternary ammonium and quaternary pyridinium salts^{42} for cationic surfactants, or only sodium alkyl sulfates and sodium sulfonates for anionic surfactants.^{41} With such limited diversity in the data, these models will generally produce a better fit, since the molecules are already similar. In addition, our models are the first to predict the CMC over a range of different systems (binary, and ternary with salt or alcohol), whereas previous models, with the exception of our earlier model for salt-containing systems,^{47} focused only on binary systems. Analysing the sensitivity of the final result to each descriptor reveals similarities between KSVM and EA, as shown in Fig. 6.

Fig. 6 Comparison of sensitivity toward different variables for models created with the evolutionary algorithm (EA) and kernel support vector machine (KSVM). The method of calculating the sensitivity (A) and % positive/negative influence (B) is described by Kłosowska-Chomiczewska et al.^{71} In both approaches V_{m} and Ŝ dominate the effect, with V_{m} being mostly positive (for EA) and Ŝ always negative. However, no clear attributable effect may be observed. Error bars represent 95% CI.

In both models, the surface area (Ŝ) has the highest, and always negative, impact on the CMC. This may be interpreted as follows: the larger the surface area, the larger the water cage surrounding the molecule, and thus the higher the entropic penalty for each dissolved molecule, and hence the lower the CMC. For V_{m}, no clear positive or negative effect is observed in either EA or KSVM. This is due to the different effects that each descriptor has in each system: since this model is a summation of all systems, the sensitivity of each descriptor represents an average of all the effects. The EA and KSVM sensitivities are of similar magnitude and follow similar trends, which shows that the descriptors have similar effects on the CMC regardless of the numerical method.
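Sign statistics of the kind shown in Fig. 6B and 7B can be illustrated with a generic local sensitivity analysis. This is a minimal sketch and explicitly not the method of ref. 71: for descriptor j, the central finite-difference derivative ∂f/∂x_{j} is evaluated at every sample, and the mean |∂f/∂x_{j}| and the percentages of samples with a positive or negative local effect are reported. The toy model, the samples and the step h are hypothetical; in real use h should be scaled to each descriptor's range.

```python
def local_derivatives(model, X, j, h=1e-4):
    """Central finite-difference estimate of df/dx_j at every sample in X."""
    derivatives = []
    for row in X:
        up, down = list(row), list(row)
        up[j] += h
        down[j] -= h
        derivatives.append((model(up) - model(down)) / (2.0 * h))
    return derivatives


def influence_summary(model, X, j, h=1e-4):
    """Mean |df/dx_j| and the % of samples with a positive or negative local effect."""
    d = local_derivatives(model, X, j, h)
    n = len(d)
    return {
        "mean_abs_derivative": sum(abs(v) for v in d) / n,
        "pct_positive": 100.0 * sum(v > 0 for v in d) / n,
        "pct_negative": 100.0 * sum(v < 0 for v in d) / n,
    }


# Hypothetical model f = x0 * x1: the sign of df/dx0 follows the sign of x1.
def toy_model(x):
    return x[0] * x[1]


samples = [[1.0, 1.0], [2.0, -2.0], [0.5, 3.0], [1.5, 0.5]]
print(influence_summary(toy_model, samples, j=0))
```

As the toy model shows, a descriptor can act positively in part of the data and negatively elsewhere, so a model fitted to all systems at once can yield a mixed sign pattern such as the one seen for V_{m}.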

To determine the effect of each descriptor in each system (IL + water, IL + water + salt, and IL + water + alcohol), we repeated the EA modelling for the individual systems. This sheds light on the dominant mechanisms responsible for the micellization of ILs. The results of this analysis are displayed in Fig. 7. Note that in this case the top three EA solutions, rather than only the optimal solution, were taken into account.

From the sensitivity analysis (Fig. 7A and B) we may elucidate the dominant mechanisms responsible for micellization in each system. In the IL + water system, V_{m} and Ŝ have the highest impact on the CMC (predominantly negative for Ŝ, mixed for V_{m}), whereas the influence of Δ_{solv}G^{∞} is minor and usually positive. We may therefore conclude that micelles are stabilized mostly by chain/chain interactions (−36% V_{m}) and by avoidance of hydration of the IL molecules (−91% Ŝ). At the same time, V_{m} contributes to some destabilization of the micellization process, namely through steric hindrance (+64% V_{m}), largely for the long-chained ionic liquids. Additionally, the process is hindered by the interaction of ILs with water described by Δ_{solv}G^{∞}, whose effect is positive (+93% Δ_{solv}G^{∞}), which corresponds to the description given by Varfolomeev.^{74}

The addition of salt to the system has a moderately strong (Fig. 7A) but entirely negative effect on the CMC (−100% C_{s}, Fig. 7B), which agrees well with earlier descriptions of the effect of salt on the CMC.^{75} Moreover, salt dramatically changes the influence of V_{m}: in its presence, the stabilizing effect of chain/chain interactions is no longer prominent, and charge shielding by the salt dominates instead. The strongly positive influence of V_{m} on the CMC therefore now relates to steric hindrance; that is, the bigger the molecule, the more difficult it is to fit into a micelle, and therefore the higher the CMC. In parallel, the effect of Ŝ becomes more pronounced in the presence of salt (Ŝ sensitivity 2.71%, Fig. 7A) and entirely negative, while the influence of Δ_{solv}G^{∞} remains similar, although less positive (+86% Δ_{solv}G^{∞}), with the same justification as for the IL + water system.

The most interesting finding was the effect of alcohol on the micellization of ILs. The V_{m} term is smaller, indicating that chain/chain interactions and steric hindrance are less relevant to the process. At the same time, the influence of Ŝ and Δ_{solv}G^{∞} shifted to entirely negative. The latter is especially interesting, as in this system strong interactions between IL molecules and the solvent no longer destabilize the micelles. This can be explained by the alcohol acting as a cosurfactant: it incorporates between the IL molecules in the micelles, changes the micelle curvature,^{76–78} and thereby makes the formation of aggregates easier.

These findings mirror the common understanding of the effects of salts and alcohols on surfactant micellization.^{79–82} That these effects are so visible in the equations highlights that the models not only allow the prediction of the CMC but also provide insight into the underlying mechanism of micellization.

Finally, to test the robustness of our calculations, we validated a model trained on both ternary systems (IL + water + salt and IL + water + alcohol) against data for the less complicated binary system (IL + water). That is, all ternary data were used as the training set, and the binary data as the validation set. This time EA performed better, giving an R^{2} of 0.566, while the R^{2} obtained with KSVM was 0.566 (Fig. 8A). Nevertheless, both fits are considered satisfactory.^{83}

The justification for this type of calculation is that each of the IL descriptors (V_{m}, Ŝ and Δ_{solv}G^{∞}) should have the same effect on the CMC, with or without salt or alcohol additives. In essence, this experiment is analogous to projecting a 3D surface onto a 2D plane: the essence of the curvature should be maintained, and the minima that the methodologies find should also be similar. Using the EA, we see a relatively good fit of the binary data with a model trained on ternary data. The R^{2} is lower than for the overall model shown in Fig. 3 because the ternary data were much scarcer, so less training data was used (N = 229 for ternary training and N = 475 for binary validation, compared with N = 563 for training and N = 141 for validation in the overall model). To improve the fit, we attempted to “uncouple” the ions; that is, the ionic descriptors were not taken as a sum of the cation and anion, but the cationic and anionic contributions were modelled separately. As can be seen in Fig. 6B, coupling or decoupling the ions of the ionic liquids has some influence on the quality of fit with EA, and a smaller influence on the coefficient of determination for KSVM. This is expected, since fitting the binary data is in effect a reduction in complexity of the ternary system: in the EA equations the salt and alcohol terms simply cancel or are set to zero, so the only remaining terms are those of the binary system. This is not the case for KSVM, because feature space mapping is used. If a variable is included in the learning set but all of its values are zero, the space is not reduced; instead, these points are assigned to some region of the mapped feature space.
Thus, when a non-zero value of such a descriptor appears in unseen data, the projection can be invalid, as the mapping for such a value was not explicitly created, and the prediction can be very inaccurate. In such cases, EA models perform better, as they have better extrapolation abilities than KSVM. However, some extensions of KSVM models to such applications have been reported. The fact that both KSVM and EA regression approaches were capable of predicting these effects correctly indicates the robustness of the descriptors, and highlights that the projection of the multidimensional solution onto a lower-dimensional space is also successful.
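The extrapolation weakness of KSVM can be illustrated with a toy Gaussian-kernel predictor. This is a minimal sketch with made-up support vectors, coefficients c_{i} = α_{i} − α_{i}*, bias and γ (all hypothetical, not fitted to the CMC data): the second descriptor takes only the value 0 in the training set. As a query moves away from the training data along that descriptor, every kernel term exp(−γ‖x_{i} − x‖^{2}) decays to zero and the prediction collapses to the bias b, whatever the true trend is. This shows one mechanism consistent with the explanation above; it is not a reproduction of the authors' analysis.

```python
import math


def rbf(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_predict(x, support, coefs, bias, gamma):
    """f(x) = sum_i c_i k(x_i, x) + b, with c_i = alpha_i - alpha_i*."""
    return sum(c * rbf(s, x, gamma) for s, c in zip(support, coefs)) + bias


# Hypothetical support vectors: the second descriptor is always 0 in training.
SUPPORT = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
COEFS = [1.5, -0.4, 0.8]
BIAS = 0.2
GAMMA = 1.0

for value in (0.0, 1.0, 3.0, 10.0):
    pred = kernel_predict([0.5, value], SUPPORT, COEFS, BIAS, GAMMA)
    print(f"second descriptor = {value:5.1f} -> f(x) = {pred:.4f}")
```

An explicit EA equation, by contrast, simply evaluates its terms at the new value, which is why it extrapolates more gracefully, for better or worse.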

- J. Ranke, S. Stolte, R. Störmann, J. Arning and B. Jastorff, Chem. Rev., 2007, 107, 2183–2206.
- N. Canter, Tribol. Lubr. Technol., 2005, 61, 15.
- R. D. Rogers and K. R. Seddon, Science, 2003, 302, 792–793.
- D. M. Eike, J. F. Brennecke and E. J. Maginn, Green Chem., 2003, 5, 323–328.
- C. P. Fredlake, J. M. Crosthwaite, D. G. Hert, S. N. Aki and J. F. Brennecke, J. Chem. Eng. Data, 2004, 49, 954–964.
- A. R. Katritzky, A. Lomaka, R. Petrukhin, R. Jain, M. Karelson, A. E. Visser and R. D. Rogers, J. Chem. Inf. Comput. Sci., 2002, 42, 71–74.
- C. Yan, M. Han, H. Wan and G. Guan, Fluid Phase Equilib., 2010, 292, 104–109.
- M. G. Freire, C. M. Neves, S. P. Ventura, M. J. Pratas, I. M. Marrucho, J. Oliveira, J. A. Coutinho and A. M. Fernandes, Fluid Phase Equilib., 2010, 294, 234–240.
- C.-W. Cho, U. Preiss, C. Jungnickel, S. Stolte, J. Arning, J. Ranke, A. Klamt, I. Krossing and J. Thöming, J. Phys. Chem. B, 2011, 115, 6040–6050.
- C. Kolbeck, T. Cremer, K. Lovelock, N. Paape, P. Schulz, P. Wasserscheid, F. Maier and H.-P. Steinrück, J. Phys. Chem. B, 2009, 113, 8682–8688.
- R. L. Gardas and J. A. Coutinho, Fluid Phase Equilib., 2008, 265, 57–65.
- U. P. Preiss, J. M. Slattery and I. Krossing, Ind. Eng. Chem. Res., 2009, 48, 2290–2296.
- P. D. Huibers, D. O. Shah and A. R. Katritzky, J. Colloid Interface Sci., 1997, 193, 132–136.
- Y. Ren, H. Liu, X. Yao, M. Liu, Z. Hu and B. Fan, J. Colloid Interface Sci., 2006, 302, 669–672.
- J. Jacquemin, P. Husson, A. A. Padua and V. Majer, Green Chem., 2006, 8, 172–180.
- J. M. Slattery, C. Daguenet, P. J. Dyson, T. J. Schubert and I. Krossing, Angew. Chem., 2007, 119, 5480–5484.
- G. Yu, D. Zhao, L. Wen, S. Yang and X. Chen, AIChE J., 2012, 58, 2885–2899.
- K. Tochigi and H. Yamamoto, J. Phys. Chem. C, 2007, 111, 15989–15994.
- C.-W. Cho, J. Ranke, J. Arning, J. Thöming, U. Preiss, C. Jungnickel, M. Diedenhofen, I. Krossing and S. Stolte, SAR QSAR Environ. Res., 2013, 24, 863–882.
- Y. Zhao, J. Zhao, Y. Huang, Q. Zhou, X. Zhang and S. Zhang, J. Hazard. Mater., 2014, 278, 320–329.
- M. I. Hossain, B. B. Samir, M. El-Harbawi, A. N. Masri, M. A. Mutalib, G. Hefter and C.-Y. Yin, Chemosphere, 2011, 85, 990–994.
- B. Peric, J. Sierra, E. Martí, R. Cruañas and M. A. Garau, Ecotoxicol. Environ. Saf., 2015, 115, 257–262.
- C.-W. Cho, J.-S. Park, S. Stolte and Y.-S. Yun, J. Hazard. Mater., 2016, 311, 168–175.
- D. J. Couling, R. J. Bernot, K. M. Docherty, J. K. Dixon and E. J. Maginn, Green Chem., 2006, 8, 82–90.
- F. Yan, S. Xia, Q. Wang and P. Ma, J. Chem. Eng. Data, 2012, 57, 2252–2257.
- K. Roy, R. N. Das and P. L. Popelier, Chemosphere, 2014, 112, 120–127.
- S. Bruzzone, C. Chiappe, S. Focardi, C. Pretti and M. Renzi, Chem. Eng. J., 2011, 175, 17–23.
- J. S. Torrecilla, J. Palomar, J. Lemus and F. Rodríguez, Green Chem., 2010, 12, 123–134.
- F. Yan, Q. Shang, S. Xia, Q. Wang and P. Ma, J. Hazard. Mater., 2015, 286, 410–415.
- F. Yan, S. Xia, Q. Wang and P. Ma, Ind. Eng. Chem. Res., 2012, 51, 13897–13901.
- Y. Yu, X. Lu, Q. Zhou, K. Dong, H. Yao and S. Zhang, Chem. – Eur. J., 2008, 14, 11174–11182.
- W. Mrozik, C. Jungnickel, T. Ciborowski, W. R. Pitner and P. Stepnowski, in 5th International Conference on Oils & Fuels for Sustainable Development, AUZO 2008, ed. J. Hupka, A. Tonderski, R. Aranowski and C. Jungnickel, Gdansk, Poland, 2008.
- W. Mrozik, C. Jungnickel, T. Ciborowski, W. R. Pitner, J. Kumirska, Z. Kaczyński and P. Stepnowski, J. Soils Sediments, 2009, 9, 237–245.
- W. Mrozik, J. Nichthauser and P. Stepnowski, Pol. J. Environ. Stud., 2008, 17, 383–388.
- M. Barycki, A. Sosnowska and T. Puzyn, J. Colloid Interface Sci., 2017, 487, 475–483.
- U. Preiss, C. Jungnickel, J. Thöming, I. Krossing, J. Łuczak, M. Diedenhofen and A. Klamt, Chem. – Eur. J., 2009, 15, 8880–8885.
- A. Vishnyakov, M.-T. Lee and A. V. Neimark, J. Phys. Chem. Lett., 2013, 4, 797–802.
- Z. Kardanpour, B. Hemmateenejad and T. Khayamian, Anal. Chim. Acta, 2005, 531, 285–291.
- M. Jalali-Heravi and E. Konouz, J. Surfactants Deterg., 2003, 6, 25–30.
- K. Roy and H. Kabir, Chem. Eng. Sci., 2012, 73, 86–98.
- P. D. Huibers, V. S. Lobanov, A. R. Katritzky, D. O. Shah and M. Karelson, J. Colloid Interface Sci., 1997, 187, 113–120.
- A. R. Katritzky, L. M. Pacureanu, S. H. Slavov, D. A. Dobchev, D. O. Shah and M. Karelson, Comput. Chem. Eng., 2009, 33, 321–332.
- P. D. Huibers, V. S. Lobanov, A. R. Katritzky, D. O. Shah and M. Karelson, Langmuir, 1996, 12, 1462–1470.
- E. A. Mahmoud Gad, J. Dispersion Sci. Technol., 2007, 28, 231–237.
- S. Yuan, Z. Cai, G. Xu and Y. Jiang, Colloid Polym. Sci., 2002, 280, 630–636.
- S. Yuan, Z. Cai, G. Xu and Y. Jiang, J. Dispersion Sci. Technol., 2002, 23, 465–472.
- U. P. Preiss, P. Eiden, J. Łuczak and C. Jungnickel, J. Colloid Interface Sci., 2013, 412, 13–16.
- J. Brophy and D. Bawden, Aslib Proceedings, Emerald Group Publishing Limited, 2005, vol. 57, p. 498.
- ChemAxon, Molecule File Converter Molconvert, v.5.11.4, http://www.chemaxon.com, accessed 2017-04-23.
- J. J. Stewart, MOPAC2016, Stewart Computational Chemistry, 2016.
- J. J. Stewart, J. Mol. Model., 2007, 13, 1173–1213.
- M. Korth, J. Chem. Theory Comput., 2010, 6, 3808–3816.
- M. J. Dewar and G. P. Ford, J. Am. Chem. Soc., 1977, 99, 7822–7829.
- W. Beichel, U. P. Preiss, S. P. Verevkin, T. Koslowski and I. Krossing, J. Mol. Liq., 2014, 192, 3–8.
- U. P. Preiss and M. I. Saleh, J. Pharm. Sci., 2013, 102, 1970–1980.
- A. Klamt and G. Schüürmann, J. Chem. Soc., Perkin Trans. 2, 1993, 799–805.
- R. Bini, C. Chiappe, C. Duce, A. Micheli, R. Solaro, A. Starita and M. R. Tiné, Green Chem., 2008, 10, 306–309.
- V. Aryadoust, Psychol. Test Assess. Model., 2015, 57, 301.
- C. Cortes and V. Vapnik, Mach. Learn., 1995, 20, 273–297.
- V. Vapnik, S. E. Golowich and A. Smola, Advances in Neural Information Processing Systems, 1997, 281–287.
- B. E. Boser, I. M. Guyon and V. N. Vapnik, Proceedings of the fifth annual workshop on Computational learning theory, ACM, 1992, p. 144.
- B. Schölkopf and H. A. Mallot, Adaptive Behavior, 1995, 3, 311–348.
- B. Schölkopf, C. Burges and V. Vapnik, Artificial Neural Networks – ICANN 96, 1996, pp. 47–52.
- V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
- T. Hofmann, B. Schölkopf and A. J. Smola, Ann. Stat., 2008, 1171–1220.
- B. Schölkopf, A. J. Smola, R. C. Williamson and P. L. Bartlett, Neural Comput., 2000, 12, 1207–1245.
- V. Cherkassky and F. Mulier, 1998.
- V. Cherkassky and Y. Ma, Artificial Neural Networks – ICANN 2002, 2002, p. 82.
- C.-C. Chang and C.-J. Lin, ACM Transactions on Intelligent Systems and Technology (TIST), 2011, vol. 2, p. 27.
- G. C. Cawley and N. L. Talbot, J. Mach. Learn. Res., 2010, 11, 2079–2107.
- I. Kłosowska-Chomiczewska, K. Mędrzycka, E. Hallmann, E. Karpenko, T. Pokynbroda, A. Macierzanka and C. Jungnickel, J. Colloid Interface Sci., 2017, 488, 10–19.
- V. N. Vapnik and S. Kotz, Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York, 1982.
- V. Vapnik, Nonlinear Modeling, Springer, 1998, pp. 55–85.
- M. A. Varfolomeev, A. A. Khachatrian, B. S. Akhmadeev, B. N. Solomonov, A. V. Yermalayeu and S. P. Verevkin, J. Solution Chem., 2015, 44, 811–823.
- U. P. Preiss, P. Eiden, J. Łuczak and C. Jungnickel, J. Colloid Interface Sci., 2013, 412, 13–16.
- C. Rodriguez-Abreu, K. Aramaki, Y. Tanaka, M. A. Lopez-Quintela, M. Ishitobi and H. Kunieda, J. Colloid Interface Sci., 2005, 291, 560–569.
- S. Chen, D. F. Evans, B. Ninham, D. Mitchell, F. D. Blum and S. Pickup, J. Phys. Chem., 1986, 90, 842–847.
- W. M. Gelbart, W. E. McMullen, A. Masters and A. Ben-Shaul, Langmuir, 1985, 1, 101–103.
- H. Heerklotz and R. M. Epand, Biophys. J., 2001, 80, 271–279.
- E. Dutkiewicz and A. Jakubowska, Colloid Polym. Sci., 2002, 280, 1009–1014.
- S. R. Raghavan, G. Fritz and E. W. Kaler, Langmuir, 2002, 18, 3797–3803.
- S. C. Owen, D. P. Chan and M. S. Shoichet, Nano Today, 2012, 7, 53–65.
- A. L. Edwards, The correlation coefficient, in An Introduction to Linear Regression and Correlation, 1976, vol. 4, pp. 33–46.

## Footnotes

† Electronic supplementary information (ESI) available: complete dataset of ILs used for prediction, comparison of data with and without bootstrap aggregation, and EA equations. See DOI: 10.1039/c7cp05019d

‡ Both authors contributed equally as first authors.

This journal is © the Owner Societies 2017