Prediction of acute toxicity of emerging contaminants on the water flea Daphnia magna by Ant Colony Optimization–Support Vector Machine QSTR models
According to the European REACH Regulation, the acute toxicity towards Daphnia magna should be assessed for any industrial chemical with a market volume of more than 1 t/a. It is therefore highly recommended to determine the toxicity at a certain confidence level, either experimentally or by applying reliable prediction models. To this end, a large dataset was compiled, comprising the experimental acute toxicity values (pLC50) of 1353 compounds in Daphnia magna after 48 h of exposure. A novel quantitative structure–toxicity relationship (QSTR) model was developed, using Ant Colony Optimization (ACO) to select the most relevant set of molecular descriptors and a Support Vector Machine (SVM) to correlate the selected descriptors with the toxicity data. The proposed model showed high performance ($Q^2_{\mathrm{LOO}} = 0.695$, $R^2_{\mathrm{fitting}} = 0.920$ and $R^2_{\mathrm{test}} = 0.831$), with low root mean square errors of 0.498 and 0.707 for the training and test sets, respectively. In addition to hydrophobicity, polarizability and the summation of solute hydrogen-bond basicity affected toxicity positively, while the minimum atom-type E-state of –OH influenced toxicity values in Daphnia magna inversely. The applicability domain of the proposed model was carefully studied, considering the effect of chemical structure and prediction error in terms of leverage values and standardized residuals. In addition, a new method was proposed to identify chemical space failure for a compound of unknown toxicity, so that the corresponding prediction results are not used. The resulting ACO–SVM model was successfully applied to an additional evaluation set, and the prediction results were found to be very accurate for those compounds that fall inside the defined applicability domain.
In fact, compounds commonly found to be difficult to predict, such as quaternary ammonium compounds or organotin compounds, were outside the applicability domain, while five representative homologues of LAS (non-ionic surfactants) were, on average, well predicted within one order of magnitude.
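
The applicability-domain check described above, based on leverage values and standardized residuals, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, and both the warning leverage h* = 3(p + 1)/n and the ±3 limit on standardized residuals are conventions commonly used in QSAR work that are assumed here rather than taken from the text. The new chemical space failure method mentioned in the abstract is not reproduced.

```python
import numpy as np


def leverage(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i of each query row relative to the training set."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # The pseudo-inverse keeps the sketch usable if descriptors are collinear.
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form x^T A x.
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(X_train):
    """Assumed cutoff h* = 3(p + 1)/n for n training compounds and p descriptors."""
    n, p = np.asarray(X_train).shape
    return 3.0 * (p + 1) / n


def standardized_residuals(y_true, y_pred, sigma):
    """Residuals divided by sigma, e.g. the standard deviation of the training residuals."""
    return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) / sigma


def inside_domain(X_train, X_query, std_resid=None, resid_limit=3.0):
    """Boolean mask: True where a query compound passes the leverage (and residual) check."""
    ok = leverage(X_train, X_query) <= warning_leverage(X_train)
    if std_resid is not None:
        # Residuals are only available when an experimental value exists.
        ok &= np.abs(np.asarray(std_resid, dtype=float)) <= resid_limit
    return ok
```

For a compound of unknown toxicity only the leverage test applies, since no residual can be computed. A call such as `inside_domain(X_train, X_new)` then returns a mask that can be used to discard predictions for structurally remote compounds.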