Rudra Narayan
Das
and
Kunal
Roy
*
Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India. E-mail: kunalroy_in@yahoo.com; kroy@pharma.jdvu.ac.in; Web: http://sites.google.com/site/kunalroyindia/ Fax: +91-33-2837-1078; Tel: +91 98315 94140
First published on 25th July 2012
Ionic liquids (ILs) have important industrial applications due to their unconventional and encouraging properties making them unique chemical species with extremely lowered vapour pressure, high thermal stability, and enhanced solvation characteristics. They are considered as “green solvents” chiefly due to their task-specificity and minimal release into the environment. The assessment of toxicity of ILs to living ecosystems has received considerable attention in recent years. Development of predictive quantitative structure–toxicity relationship (QSTR) models for ionic liquids can help in designing derivatives with a reduced toxicity profile, thereby making them greener and eco-friendlier. The present study attempts to develop a classification model as well as a regression model to capture specific structural information of ionic liquids responsible for their toxic manifestation to Vibrio fischeri. The models were developed using various two-dimensional chemical descriptors along with dummy variables and subjected to rigorous statistical validation employing multiple strategies, whereas other models of ILs for Vibrio fischeri toxicity reported so far have not employed such strict tools. The classification model has been characterized by acceptable Wilk's λ statistics, pharmacological distribution diagram assessment, and receiver operating characteristics (ROC) analysis parameters. The regression models have been judged according to the OECD guidelines, and the best model showed encouraging external predictivity (R2pred = 0.739). The toxicity of ionic liquids to V. fischeri was found to be related to branching, molecular size, and solvation entropy of cations along with a lipophilicity contribution of the anions.
Although ionic liquids potentially correspond to environmental acceptability, there is still a dearth of specific knowledge regarding their behaviour and interaction with the living ecosystem. Apart from the advantages obtained through reduced vapour pressure, the ionic liquids must be subjected to special consideration regarding a sustainable synthesis13i.e., maximum utilization of reaction material (green chemistry principle 2), lowered toxicity profile and harmlessness (green chemistry principle 3), and restricted environmental diligence i.e., easy breakdown into harmless products (green chemistry principle 10).1 Aqueous solubility of many ionic liquids poses the danger of their release and thereby exertion of untoward effects in aquatic ecosystems. The high chemical and thermal stability also highlights the problem of bioaccumulation of these agents. Hence, proper assessment of the toxicological profile of ionic liquids is necessary.14,15 The present study focuses on the predictive modelling of toxicity of ionic liquids against aquatic bacterium, Vibrio fischeri.
Quantitative structure–activity relationship (QSAR) can sufficiently provide a rational basis for the design of suitable ionic liquids with the intended purpose of use. In this study, we have focused on the prediction of toxicity, i.e., quantitative structure–toxicity relationship (QSTR), of diverse ionic liquid species. The practise of QSAR technique for the prediction of activity, toxicity, or physicochemical properties is robustly supported with compounds’ detailed chemistry and interaction with biological system on a strong background of statistical validation.16 Presently, the use of predictive QSAR models is widely accepted by different international regulatory authorities including the Organization for Economic Co-operation and Development (OECD), which has proposed guidelines for developing QSAR models to serve as an alternative strategy of toxicity testing17 on a reasonable basis.
Toxicity data for ionic liquids to different reference aquatic invertebrate organisms namely Vibrio fischeri (bacteria),18Pseudokirchneriella subcapitata (green algae),19Daphnia magna (crustacean),20 vertebrate organisms namely Danio rerio (zebrafish),21Rana nigromaculata (frog),22 and aquatic plant Lemna minor (duckweed)23 are now available in the literature. Here, we have focussed on developing QSTR models on Vibrio fischeri toxicity endpoint which is widely employed for toxicity studies for different environmental issues,24 namely screening of surface or ground waters, wastewaters, soil or sediment extract monitoring, as well as for toxicity testing of chemical compounds.25
Attempts to develop predictive mathematical models on Vibrio fischeri toxicity data for ionic liquids have been quite a few compared to other classes of hazardous chemicals. Couling et al.26 developed a four parameter equation containing E-state indices and surface area descriptors against V. fischeri toxicity of ionic liquids. Bruzzone et al.27 reported a QSTR model on V. fischeri toxicity data of a class of 33 ILs employing the CODESSA treatment. In another study, Viboud et al.28 conducted a QSTR analysis on the V. fischeri toxicity data of 54 ILs using multifactorial component analysis and reported the importance of the planarity of the cation ring on toxicity. A multiple regression model was developed by Luis et al.29 involving V. fischeri toxicity values of 96 ILs and the effect of different structural groups were reported. A number of other studies also exist in the literature where the authors have reported structural correlation of ionic liquids e.g. effect of alkyl chain length of cations with V. fischeri toxicity without developing any true mathematical model.23,30 In none of the above mentioned QSTR studies, strict external validation strategies employing multiple tools have been employed. However, in this present study, we have externally validated the developed models using several metrics so as to judge the reliability and predictivity of the models on untested new ionic liquid agents with greater confidence.
In the present study, we have attempted to develop predictive QSTR regression models as well as a classification model on Vibrio fischeri toxicity data of ionic liquids (collected from the literature)31 in order to derive a structural relationship of the compounds with the toxicity. The objective of the work has been to develop an initial classification model for ionic liquids using linear discriminant analysis32,33 as a primary screening tool enabling preliminary assessment of toxicity towards Vibrio fischeri and finally to develop a robust mathematical regression model to quantitatively predict the toxicity and to derive specific information regarding the contribution of different structural components towards the toxicity. By using the information derived from both the studies, it will be possible to design newer ionic liquids with less toxicity potential to the living ecosystem (specifically Vibrio fischeri), hence making them ‘Greener alternatives’ to other conventional solvents. We have developed and validated the regression models by strictly following the OECD guidelines for QSAR model development.
The measure of goodness of fit of the discriminant classification model was characterized by Wilk's lambda (λ) statistics58 and Mahalanobis distance measure59 with the respective p-values. Other validation measures for the classification analysis include sensitivity (or true positive rate), specificity, precision, accuracy and F-measure criteria60 for the training as well as test set observations. The chi-square (χ2) parameter (higher value signifying greater separability) has been additionally determined for judging the quality of achieved discrimination. The popular receiver operating characteristics (ROC)60–64 analysis was performed on the obtained sensitivity and specificity parameters and the result have been validated employing the area under the ROC curve measure. We have also computed two more recently described validation parameters for the ROC analysis namely the ROC graph Euclidean distance (ROCED) and the ROC graph Euclidean distance corrected with Fitness Function (FIT(λ)) (ROCFIT).65 Other than these, the Matthews correlation coefficient (MCC)66 has been reported. We have additionally determined the pharmacological distribution diagram (PDD)67 with the aim of providing a visual representation of the obtained classification between toxic and non-toxic compounds. Another validation parameter ρ (ratio between the numbers of compounds in the training set to the final descriptor in the model) was reported for the classification model. The theory of the validation parameters can be found at the ESI.†
(1) |
Variables | Coefficient | Wilk's λ | F-remove (df = 1, 97) | p-Value |
---|---|---|---|---|
MSD(cation) | 6.658 | 0.400 | 33.022 | 0.0000 |
FeCl4 | 1.226 | 0.348 | 16.353 | 0.0001 |
SaasC(cation) | 1.555 | 0.346 | 15.440 | 0.0002 |
MOC2MIM | 0.988 | 0.333 | 11.288 | 0.0011 |
C4MMPy | −0.857 | 0.317 | 6.033 | 0.0158 |
MLOGP(anion) | 1.274 | 0.337 | 12.771 | 0.0006 |
Σα(cation) | −4.597 | 0.345 | 15.279 | 0.0002 |
C4(CH3)2N-Py | 0.807 | 0.318 | 6.579 | 0.0118 |
Cap | 1.462 | 0.361 | 20.387 | 0.0000 |
C8MIM | 0.663 | 0.314 | 5.035 | 0.0271 |
12OSO3 | 1.017 | 0.329 | 10.128 | 0.0020 |
C6MPy | 0.653 | 0.311 | 4.226 | 0.0425 |
Constant | −3.638 |
The compounds were further classified into three groups based on their individual discriminant function scores. Those showing DF values greater than 0.5 were assigned as toxic, those with less than 0, i.e., negative values were assigned as non-toxic, and those with values between 0 and 0.5 were assigned in the group of undetermined toxicity. In all the cases the term toxicity corresponds to the V. fischeri toxicity of the ionic liquids with respect to toluene as a threshold. This treatment was performed separately on priorly defined higher toxic and lower toxic groups maintaining the training and test set division of compounds. Hence, from this treatment of grouping, we get the results for training and test set compounds separately for the a priori defined more toxic compounds (46 observations), and less toxic compounds (101 observations). Eqn (1) is characterized by eight indicator parameters of which five are for cationic and three are for anionic species, and four two-dimensional descriptors, of which three belonging to the cationic part and one for the anionic part of the analyzed ionic liquids. The cationic indicator variable C4MMPy and extended topochemical atom (ETA) parameter Σα(cation) for cations bear negative coefficients. Presence of the former parameter C4MMPy, the butyldimethylpyridinium moiety, in an ionic liquid will reduce its toxicity, whereas the ETA parameter Σα(cation) signifies that increase in molecular bulk of cations causes reduction of V. fischeri toxicity of ionic liquids. On the other hand, we have ten more parameters in the best discriminant equation (eqn (1)) reflecting their positive impact in raising toxicity of ILs to V. fischeri. Occurrence of the cationic fragments methoxyethylmethylimidazolium (MOC2MIM), butyl(dimethylamino)pyridinium (C4(CH3)2N-Py), octylmethylimidazolium (C8MIM), and hexylmethylpyridinium (C6MPy); anionic fragments tetrachloroferrate(III) (FeCl4), caprylate (Cap), and o-dodecylsulfate (12OSO3) in structures of ionic liquids cause increase in the toxicity. Three additional topological, connectivity, and thermodynamic parameters in eqn (1) show their positive impact towards toxicity of ILs. MSD(cation) (mean square distance), a parameter derived from the distance matrix, decreases with increased branching in compounds; hence higher values of MSD, i.e., reduced branchedness of the cations has a propensity of raising toxicity. SaasC(cation), an electrotopological state parameter which represents an aromatic carbon atom attached to two other aromatic carbons and one non-hydrogen atom (, the C-atom shown in dot), refers to the aromaticity of carbon atoms. Here, it indicates that carbon atom of such type in cations will positively contribute towards toxicity. MLOGP(anion), the Moriguchi octanol–water partition coefficient, indicates the proportional relationship of lipophilicity of anions towards toxicity ILs to V. fischeri. The structural information provided by MSD and Σα for cations in eqn (1) correspond to each other. Toxicity of ionic liquids to V. fischeri increases when the value of MSD rises and that of Σα reduces. Now, increase of MSD refers to reduced branching, whereas reduction in values of Σα refers to reduced molecular bulk. The hydrophobicity parameter MLOGP for the anionic fragment is directly related to the toxicity; hence, increase in hydrophobicity of anions causes increase in toxicity. The cationic indicator parameters MOC2MIM and C8MIM explain the effect of imidazolium moiety, and C4(CH3)2N-Py, C6MPy, and C4MMPy show the importance of pyridinium nucleus. Hence, from the present study, it is evident that V. fischeri toxicity is explainable in terms of presence of heterocyclic ring with nitrogen atom.
A contribution plot was developed, shown in Fig. 1, for the best discriminating descriptors (eqn (1)) by taking the product of their average values with their corresponding coefficients. From the plot, it can be observed that the cationic indicator parameters MOC2MIM, C4MMPy, C4(CH3)2N-Py, C8MIM, C6Mpy and the anionic parameter 12SO3 share almost equal contributions towards toxic and non-toxic groups, and the other parameters show their effects on both the groups. The ETA parameter Σα(cation) seems to be more important for negative contribution to the toxic group. The variable MSD(cation) bears a marked positive contribution for the toxic class, whereas, the cationic electrotopological state atom parameter SaasC(cation), the anionic indicator variable Cap, FeCl4 and the lipophilicity parameter MLOGP(anion) show more positive contributions towards the toxic group than a negative effect on the non-toxic group. This plot (Fig. 1) demonstrates the marked importance of molecular branching (MSD(cation)) and molecular bulk (Σα(cation)) for predicting the toxicity of ionic liquids towards V. fischeri.
Fig. 1 Contribution plot of the variables present in the classification model (eqn (1)). |
A pharmacological distribution diagram (PDD) was developed between the expectancy values and discriminant function ranges in the form of a frequency distribution diagram. We have identified the training and test sets separately within the priorly defined toxic and non-toxic groups. Fig. 2 shows that the bars representing those of toxic groups and non-toxic groups lie in the ranges of positive and negative values of DF respectively with less degree of overlap, and the maximum values of expectancy for toxic and non-toxic groups lie in the different sides of DF = 0; this infers a good separation between the two classes. The plot also explains that among the classified toxic chemicals, the maximum toxicity value is obtained at DF = 5 for the training set and DF = 6 for the test set and for the non-toxic class, the maximum value is obtained at DF = −4 for both the training and test sets.
Fig. 2 Pharmacological distribution diagram between the expectancy values and discriminant function (DF) intervals for the training and test sets. |
Additional validation parameters namely undetermined toxicity, false toxicity/non-toxicity, overall accuracy, and adjusted accuracy, calculated for the training as well as test sets for both the toxic and non-toxic groups have been indicated in Table S2 in the ESI.†
Division of compounds | N | Sensitivity (%) | Specificity (%) | Accuracy (%) | Precision (%) | F-measure (%) |
---|---|---|---|---|---|---|
Training set | 110 | 87.10 | 96.20 | 93.64 | 90.00 | 88.52 |
Test set | 37 | 60.00 | 86.36 | 75.68 | 75.00 | 66.67 |
We have generated a ROC plot separately for the training and test set compounds. From Fig. 3, it can be observed that all the points lie in the upper left triangle of the ROC space for both the training and test sets, and hence the performance of the model is assumed to be better than random classifier. In case of the training set, the points are more closely aligned to 100% sensitivity as the curve rises upward and the curve possesses an AUC value of 0.976 square units which is highly appreciable (Fig. 3). For the test set, the points are near the liberal zone and also carry an acceptable value of AUC of 0.852 square units.
Fig. 3 Receiver operating characteristics (ROC) curve of the classification model for the training and test set compounds. |
The ROCED parameter was computed using the Euclidean distance values of the training and test set results. The parameter ROCED bears a value of 0 for perfect classifier and 4.5 for random classifier; a value within 2.5 was proposed as good classifier by Pérez-Garrido et al.65 Our model showed a value of 1.021 for ROCED, which corresponds to a good quality of the ROC analysis. ROCFIT was calculated by dividing the ROCED with Wilk's λ value, and an acceptable value of 3.426 was obtained. The Matthews correlation coefficient (MCC) usually varies from −1 to +1 referring to an inverse classification to a perfect classification respectively, whereas a value of 0 corresponds to random classification performance. The present study also showed an acceptable value for the Matthews correlation coefficient (MCC); a value of 0.841 for the training and 0.486 for the test set. The training set shows a near perfect classifier value of MCC (0.841) which is better than that for the test set, where the value is still well above a random classification instance (0.486).
(2) |
The external predictivity metric R2pred is also highly encouraging and it accounts for 73.90% predicted variance for the test set compounds. All the rm2 metric values are well in the acceptable region. The average rm2 () value for external validation (= 0.623) of this model is higher than that for internal validation (), thereby explaining better external predictivity of the model. Thevalue (0.554) is also acceptable for describing the overall predictive performance of the model. The Δrm2 values for external (Δrm2(test) = 0.156) and overall validation (Δrm2(overall) = 0.195) are acceptable excepting internal validation (Δrm2(LOO) = 0.209), where it is just above the proposed tolerance limit of 0.2. Additionally, the value of the metric for randomization, cRp2, is also greater than the threshold value of 0.5. The results of process and model randomization tests are impressively characterized by cRp2 values of 0.634 and 0.669 respectively. Fig. 4 indicates the observed toxicity values in the abscissa plotted against calculated (training set) and/or predicted (test set) toxicity values in the ordinate in the form of a scatter plot using eqn (2). The diagram shows uniform scattering of test and training set chemicals.
Fig. 4 Observed vs. calculated (training set)/predicted (test set) toxicity plot for the regression model (eqn (2)). |
Any negative value of a spline function appearing in a GFA equation indicates zero value i.e., no contribution of the variable. All the variables in eqn (2) have been ranked according to their standardized coefficients in a descending order referring most prior variable appearing first and so forth. Eqn (2) shows presence of five descriptors, three among them correspond to the effect of cations and the remaining two describe the effect of anions on V. fischeri toxicity. The most significant descriptor, as ranked by the standardized coefficient values, is the spline form of ‘GMTI’, indicating that a value of ‘GMTI’ for the cations less than 2305 will be negatively contributing towards toxicity. The parameter specifically accounts for better measure of branching for cyclic structure than acyclic ones.34 Increased branching causes reduction in ‘GMTI’ values. Hence, considering the negative impact of the spline term (<2305−GMTI>) it can be inferred that toxicity will be reduced with raised branchedness among cationic groups up to a ‘GMTI’ value less than 2305. The next important descriptor is the valence connectivity index of fifth order (5χv)34 indicating branching among cationic groups with five bond fragments. The parameter actually refers to the effect of molecular size, because smaller cationic groups having less than five contiguous bond fragments will not have its effect, unlike larger length cyclic or acyclic cations where values of the parameter 5χv(cation) can be calculated. Being a valence connectivity index, the parameter 5χv also shares information from heteroatoms and multiple bonds. A lower value of 5χv chiefly represents reduced molecular size that can be attributed to reduced chain length. As 5χv bears a negative coefficient, toxicity of ILs will decrease if 5χv increases i.e., if the molecular size is raised in terms of acyclic straight or branched chain mostly in the form of side chain substituents to cations where the opportunity of presence of five and more contiguous bonds lies. Next, we have the extended topochemical atom (ETA) index [η′]local for anions that corresponds to the contribution of locally bonded atoms.35–37 Here, local boding refers to the atoms connected with single covalent bonds, i.e., contributions of atoms separated by two or more covalent bonds and any non-bonded interactions are not taken in consideration for this specific index. It represents the local molecular topological nature and local branching for each atom and then sums for all atoms, and its value increases as branching rises. Hence, because of the positive coefficient of [η′]local, reduction in branching of anionic groups will contribute to the reduced toxicity of ionic liquids. The variable ‘NsOH’ is an electrotopological state atom index count of hydroxyl (–OH) group. Here, the negative coefficient of ‘NsOH’ represents that reduction of V. fischeri toxicity value can be achieved with an increased number of hydroxyl group in cations. This reduction of toxic effects with the increase of –OH groups can be attributed to raised hydrophilicity of the cations due to hydroxyl groups, which hinders them to pass through biological membranes. The final parameter 4χs(anion) gives information on the anionic groups. 4χs is the solvation connectivity index of fourth order bond fragment and it describes the solvation entropy and dispersion interactions of molecules34 in solution phase. It also describes branching characteristics for fourth order bond fragments, and such long fragments make 4χs to be related to molecular size as well as molecular surface area. Now, as this parameter is weighted by principal quantum number term, it can be considered as a measure of electronegativity that controls the hydrophilicity of a molecule. Here, it is negatively related to toxicity; hence, a higher value signifying higher electronegativity will reduce toxicity. If electronegativity of a molecule is higher, it will get hydrophilic because of greater separation of molecular charge, and its penetration across biological membrane reduces.
We used the leverage method to define the AD using eqn (2).70Fig. 5 shows the Williams plot,71 where three standard deviation (σ) unit of standardized cross-validated residual value on both positive and negative sides of zero has been identified for detection of any response outlier compounds. The calculated critical HAT value in this study is 0.200. From Fig. 5 it can be observed that the training set compound with serial number 114 possesses standardized cross-validated residual value greater than 3σ unit, but it lies within the critical HAT (h*) value of 0.2. This compound may have erroneous observed toxicity value and it can be identified as an outlier due to wrong value of response variable, i.e., a response outlier. Compounds with serial numbers 75, 85, 139 and 140 belonging to the training set are characterized by high leverage values, but they lie within the assumed 3σ limit of the ordinate. These chemicals are not outliers, but influence the applicability domain to span over larger area and hence can be termed as influential chemicals. The test set compound with serial number 137 is within the 3σ limit of the ordinate, but possess higher leverage value (h > h*). As we can see from Fig. 5 that compound 137 (serial number) is surrounded by other four influential training set chemicals (serial numbers 75, 85, 139 and 140), its prediction can be assumed to be somewhat reliable or dependable.
Fig. 5 Williams plot showing the domain of applicability of the regression model (eqn (2)). |
The present work involves development of a classification as well as a regression model for the prediction of toxicity of ILs to Vibrio fischeri. The dataset31 contains diverse 147 ionic liquids comprising of 30 anions and 64 cations. The most important aspect of our work is that we have performed rigorous validation of both the classification and regression models by applying multiple strategies and in both the cases encouraging results have been obtained for both external validation (compounds not used for model development) and internal validation (compounds used for model development). The classification model is characterized by highly acceptable sensitivity and specificity values of 87.10% and 96.20% for the training set, and 60% and 86.36% for the test set respectively. The values are in a good state of comparison with the results reported in the source dataset.31 The training set showed better overall accuracy and precision values (93.64% and 90% respectively) than the test set (75.68% and 75%) (as mentioned in Table 2). The discriminant model was also explained as of good quality in terms of Wilk's λ statistics (0.298). The best regression model reported in the present study contains much more number of compounds than most of the previous reports. The training set comprises of 90 compounds and, compared to previous reports, we noted additional validation parameters like Q2 (leave-one-out cross-validation) and rm2 metrics besides the quality parameters (R2 or Ra2). The model is characterized by good external predictivity with an appreciable R2pred value of 0.718 for 36 compounds, which has not been reported for the previous models. The rm2 metrics73 were determined for the training, test, as well as the overall sets. Moreover we have performed randomization for the genetic process and for the model as well to obviate the possibility of chance correlation of the model; the process showed encouraging values of the metric cRp2 (Eqn (2)). It is obvious that a true comparison cannot be performed among studies where the dataset composition varies. However, prediction of results must be assessed in terms of higher statistical stability and reliability. The models developed by us consist of appreciable number of observations and they have undergone a good amount of statistical tests, and hence can be considered as statistically more reliable, predictive, robust to other proposed models on the toxicity of ionic liquids to Vibrio fischeri.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c2tx20020a |
This journal is © The Royal Society of Chemistry 2012 |