In silico prediction of chemical aquatic toxicity with chemical category approaches and substructural alerts†
Aquatic toxicity is an important endpoint in evaluating the adverse effects of chemicals on ecosystems. In this study, in silico models were developed to predict chemical aquatic toxicity in different fish species. First, a large data set containing 6422 aquatic toxicity data points for 1906 diverse chemicals was constructed. With molecular descriptors and fingerprints used to represent the molecules, local and global models were then developed with five machine learning methods for three fish species (rainbow trout, fathead minnow and bluegill sunfish). For the local models, both binary and ternary classification models were obtained for each of the three species. For the global models, the data of all three species were used together. The predictive accuracy of both the local and global models was around 0.8 on the test sets. Moreover, data for the sheepshead minnow were used as an external validation set. On this set, the best local model (model 2) reached a predictive accuracy of 0.875, while the best global model (model 14) reached 0.872. Model 2 and model 14 produced 18 and 10 false negative (FN) compounds, respectively. Because model 14 yielded fewer false negatives at comparable accuracy, it was considered the best model and could therefore be used to predict toxicity in other fish species. Furthermore, information gain and ChemoTyper methods were used to identify toxic substructures that could correlate significantly with chemical aquatic toxicity. This study provides critical tools for the early evaluation of chemical aquatic toxicity in environmental hazard assessment.
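To illustrate how information gain can rank a candidate substructure, the following minimal Python sketch computes the reduction in class entropy obtained by splitting compounds on the presence or absence of a substructure. It is not the implementation used in this study: the function names `entropy` and `information_gain`, the binary toxic/non-toxic labels and the toy inputs in the usage note are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Entropy reduction from splitting compounds on a substructure flag.

    has_substructure: one truthy/falsy flag per compound (hypothetical input).
    labels: one toxicity class label per compound, in the same order.
    """
    if len(has_substructure) != len(labels):
        raise ValueError("flags and labels must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0
    # Partition the labels by presence or absence of the substructure.
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    # Weighted entropy that remains after the split.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional
```

For example, with made-up data in which a substructure occurs in exactly the toxic compounds, `information_gain([1, 1, 0, 0], [1, 1, 0, 0])` gives 1 bit, whereas a substructure distributed evenly across both classes, as in `information_gain([1, 0, 1, 0], [1, 1, 0, 0])`, gives 0. Substructures with higher information gain would be stronger candidates for structural alerts under this assumed setup.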