Fjodor Melnikovᵃ, Jakub Kostalᵇᶜ, Adelina Voutchkova-Kostalᶜ, Julie B. Zimmermanᵃᵈ and Paul T. Anastas*ᵃᵈ
ᵃ School of Forestry and Environmental Studies, Yale University, New Haven, CT 06520, USA. E-mail: paul.anastas@yale.edu
ᵇ Computational Biology Institute, The George Washington University, Washington, DC, USA
ᶜ Department of Chemistry, The George Washington University, Washington, DC, USA
ᵈ Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06520, USA
First published on 25th May 2016
In silico toxicity models are critical in addressing experimental aquatic toxicity data gaps and prioritizing chemicals for further assessment. Currently, a number of predictive in silico models for aquatic toxicity are available, but most models are challenged to produce accurate predictions across a wide variety of functional chemical classes. Appropriate model selection must be informed by the models’ applicability domain and performance within the chemical space of interest. Herein we assess five predictive models for acute aquatic toxicity to fish (ADMET Predictor™, Computer-Aided Discovery and REdesign for Aquatic Toxicity (CADRE-AT), Ecological Structure Activity Relationships (ECOSAR) v1.11, KAshinhou Tool for Ecotoxicity (KATE) on PAS 2011, and Toxicity Estimation Software Tool (TEST) v.4). The test data set was carefully constructed to include 83 structurally diverse chemicals distinct from the training data sets of the assessed models. The acute aquatic toxicity models that rely on properties related to chemicals’ bioavailability or reactivity performed better than purely statistical algorithms trained on large sets of chemical properties and structural descriptors. Most models showed a marked decrease in performance when assessing insoluble and ionized chemicals. In addition to comparing tool accuracy, this analysis provides insights that can guide selection of modeling tools for specific chemical classes and help inform future model development for improved accuracy.
To mitigate the challenges associated with in vitro and in vivo toxicity testing, global regulations, including the European Chemicals Agency (ECHA) REACH initiative, the U.S. Toxic Substances Control Act (TSCA) and the Canadian Environmental Protection Act (CEPA), encourage increased reliance on in silico approaches.1,7 Similarly, the 2014 National Research Council (NRC) alternatives assessment framework advocates for increased use of in silico methods.8 While not necessarily definitive, in silico models can also inform prioritization of chemicals for further testing.5,7,9–12 The development of reliable in silico models for aquatic toxicity relies on the availability of high-quality toxicity data for a range of fish species. The Office of Chemical Safety and Pollution Prevention (OCSPP) has outlined a list of fish species approved for measuring or estimating toxicity to specific organisms and aquatic systems as a whole.13–15
The cost-benefit advantages and regulatory support of in silico methods16,17 have led to the development of a number of tools for ecotoxicity assessments. Specifically, several Quantitative Structure–Activity Relationships (QSARs), which relate a chemical's structural features and physicochemical properties to biological activity, and read-across models, which estimate the toxicity of a chemical by comparison to structurally similar compounds, have been developed for chemical toxicity to fish and are widely used for ecological risk assessment.7,8 Such tools include the Ecological Structure Activity Relationships (ECOSAR), KAshinhou Tool for Ecotoxicity (KATE) and Toxicity Estimation Software Tool (TEST), which are freely available standalone packages. ECOSAR and TEST were developed by the US EPA and the Syracuse Research Corporation,13 while KATE is a product of the Japanese Ministry of the Environment and the Japanese National Institute for Environmental Studies (NIES).18 ECOSAR and KATE rely on the octanol–water partition coefficient to estimate fish toxicity via a series of linear regression models, while TEST uses a large number of structural and electrotopological properties to estimate acute fish toxicity via a number of statistical algorithms.19 Another tool of interest, ADMET Predictor™, developed by Simulations Plus,20 relies on a range of chemical properties to estimate acute aquatic toxicity using neural networks. CADRE-Aquatic Toxicity (CADRE-AT) uses a small number of mechanistically relevant reactivity and bioavailability parameters to predict a category of concern for both acute and chronic aquatic toxicity.
CADRE-AT is an extension of a set of heuristic rules for the molecular design of chemicals with minimal aquatic toxicity that are based on physicochemical properties and reactivity parameters.21–23 In addition to assessment, CADRE-AT is aimed at helping chemists design (or re-design) compounds to minimize the likelihood of high concern for aquatic toxicity. Unlike the other tools in this evaluation, CADRE-AT is computationally intensive, requiring the use of high-performance computing clusters.23
To ensure model quality and regulatory relevance, the Organization for Economic Cooperation and Development (OECD) created a set of guidelines for model development that require external validation metrics, clear applicability domains, and mechanistic relevance to the modeled biochemical processes.24 Despite these guidelines, lack of external validations and model performance outside the training sets remain a major concern.4,25,26 While clear applicability domain definition ensures that the model assumptions are met and provides a measure of prediction confidence,24,27–29 model overfitting and poor applicability domain definitions may lead to low external prediction accuracy in spite of the high accuracy in the model training set.29–32 Previous validation efforts have suggested that model accuracy for a range of aquatic toxicity endpoints decreases during validation.18,33–37 However, these studies either did not conduct a strictly external validation, relied on small data sets, or evaluated one tool at a time.
This study presents a systematic assessment of the widely used and recently developed software tools to predict acute aquatic toxicity to fish and provides insights into the applicability, accuracy and ease of use (e.g., speed, convenience, and the level of expert knowledge required) of these models. Unlike prior research in the area, the test set used in this evaluation is distinct from the training sets of all evaluated tools. Thus, the assessment gives a common benchmark for model performance and further development. Since best practices in model development dictate that independent variables should be empirically relevant to target endpoints,29 special attention is given to chemical properties considered by each program and their relevance to the current understanding of fish toxicity modes of action (MOAs).
ᵃ Chemical class as identified by the ECOSAR tool. ᵇ Notes about the LC50 measurements. Cat – regulatory category (1–4); AD – indicates whether the chemical is in the applicability domain; CP – indicates whether the prediction is correct based on regulatory categories; NA – not available.

Chemical name | Chemical classᵃ | Experimental LC50ᵇ (mg L−1) | Experimental Cat | ADMET LC50 | ADMET CP | CADRE-AT Cat | CADRE-AT CP | ECOSAR LC50 | ECOSAR AD | ECOSAR CP | KATE LC50 | KATE AD | KATE CP | TEST LC50 | TEST CP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(17beta)Estra-1,3,5(10)triene-3,17-diol | Phenols | 1.55 | 2 | 0.797 | No | 2 | Yes | 1.58 | Yes | Yes | 8.35 | No | Yes | 0.650 | No |
5-Fluoro-2,4(1H,3H)pyrimidinedione | Carbonyl ureas | 2420 | 4 | 1840 | Yes | 4 | Yes | 590 | Yes | Yes | NA | NA | NA | NA | NA |
2-Bromo-2-nitro-1,3-propanediol | Halo Alcohols | 27.6 | 2 | 660 | No | 2 | Yes | 778 | Yes | No | 3230 | No | No | 273 | No |
1-(4-Chlorobenzoyl)-5-methoxy-2-methyl-1H-indole-3-acetic acid | Pyrazoles/pyrroles -acid | 81.9 | 2 | 0.761 | No | 2 | Yes | 0.878 | Yes | No | 2.32 | No | Yes | 0.440 | No |
(2S,5R,6R)-6-[[(2R)-2-Amino-2-phenylacetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid | Aliphatic amines-acid | 1000 | >4 | 17.9 | No | 3 | No | 1534 | Yes | Yes | 7920 | No | Yes | 1.21 | No |
Bromomethane | Neutral organics | 1.82 | 2 | 699 | No | 2 | Yes | 429 | Yes | No | 220 | Yes | No | 554 | No |
Chloromethane | Neutral organics | 550 | 4 | 639 | Yes | 4 | Yes | 274 | Yes | No | 194 | Yes | No | 415 | No |
Iodomethane | Neutral organics | 1.26 | 2 | 391 | No | 2 | Yes | 272 | Yes | No | 185 | No | No | 588 | No |
Methanamine | Aliphatic amines | 237 | 3 | 6349 | No | 3 | Yes | 323 | Yes | Yes | NA | NA | NA | 793 | No |
Ethanamine | Aliphatic amines | 1000 | 4 | 1512 | Yes | 4 | Yes | 223 | Yes | No | NA | NA | NA | 790 | Yes |
Isopropyl amine | Aliphatic amines | 1000 | 4 | 1056 | Yes | 4 | Yes | 155 | Yes | No | NA | NA | NA | 902 | Yes |
2-Methyloxirane | Epoxides, mono | 215 | 3 | 2373 | No | 3 | Yes | 45.0 | Yes | No | 3.86 | Yes | No | 136 | Yes |
Acetone cyanohydrin | Nitrile alpha-OH | 0.570 | 1 | 536 | No | 4 | No | 0.933 | Yes | Yes | 3450 | Yes | No | 377 | No |
Dimethyl sulfate | Esters | 7.50 | 2 | 4119 | No | 2 | Yes | 200 | Yes | No | 1040 | No | No | 67.3 | Yes |
Triethyl phosphate | Esters (phosphate) | 100 | >3 | 582 | Yes | 4 | Yes | 9.83 | Yes | No | 1740 | Yes | Yes | 85.0 | No |
Propanoic acid | Neutral organics-acid | 87.2 | 2 | 2739 | No | 2 | Yes | 11521 | No | No | 1900 | Yes | No | 248 | No |
(2S)-2-Hydroxypropanoic acid | Neutral organics-acid | 130 | 3 | 19093 | No | 3 | Yes | 177000 | No | No | 16.4 | Yes | No | 1680 | No |
Camphene | Neutral organics | 1.17 | 2 | 1.53 | Yes | 2 | Yes | 0.873 | Yes | No | 0.714 | Yes | No | 3.07 | Yes |
6,15-Dihydro-5,9,14,18-anthrazinetetrone | Neutral organics | 46.0 | 2 | 0.026 | No | 2 | Yes | 0.003 | No | No | NA | NA | NA | 0.013 | No |
4-Hydroxy-3-(3-oxo-1-phenylbutyl)-2H-1-benzopyran-2-one | Vinyl/allyl alcohols | 49.2 | 2 | 0.422 | No | 2 | Yes | 5.51 | Yes | Yes | 43.5 | No | Yes | 5.30 | Yes |
1,2-Benzenedicarboxylic acid | Neutral organics-acid | 1000 | >4 | 2079 | Yes | 3 | No | 9323 | No | Yes | NA | NA | NA | 38.4 | No |
o-Chlorobenzaldehyde | Aldehydes (mono) | 2.62 | 2 | 5.62 | Yes | 1 | No | 5.24 | Yes | Yes | 5.75 | Yes | Yes | 6.35 | Yes |
2-Ethyl-1,3-hexanediol | Neutral organics | 624 | 4 | 1187 | Yes | 4 | Yes | 275 | Yes | No | 580 | Yes | Yes | 364 | No |
2-Imidazolidinethione | Thioureas | 502 | >4 | 324 | No | 3 | No | 79928 | Yes | Yes | 2790 | No | Yes | 273 | No |
Benzenesulfonyl chloride | Acid halides | 3.00 | 2 | 3251 | No | 2 | Yes | 3.82 | Yes | Yes | NA | NA | NA | 31.9 | Yes |
alpha-Terpineol | Neutral organics | 6.53 | 2 | 32.1 | Yes | 2 | Yes | 8.07 | Yes | Yes | 32.1 | Yes | Yes | 28.2 | Yes |
1-Chloro-4-nitrobenzene | Neutral organics | 15.0 | 2 | 10.9 | Yes | 1 | No | 50.5 | Yes | Yes | 18.6 | Yes | Yes | 9.67 | Yes |
3-Phenyl-2-propenal | Vinyl/allyl aldehydes | 4.64 | 2 | 1.07 | Yes | 2 | Yes | 0.201 | Yes | No | 8.74 | No | Yes | NA | NA |
1,2-Dibromoethane | Neutral organics | 24.4 | 2 | 27.9 | Yes | 2 | Yes | 151 | Yes | No | 108 | Yes | No | 14.9 | Yes |
Butanoic acid | Neutral organics-acid | 65.0 | 2 | 1405 | No | 2 | Yes | 4963 | No | No | NA | NA | NA | 139 | No |
2,5-Furandione | Neutral organics | 138 | 3 | 3.18 | No | 3 | Yes | 177 | No | Yes | NA | NA | NA | NA | NA |
N-(2-Aminoethyl)-1,2-ethanediamine | Aliphatic amines | 1000 | 4 | 737 | Yes | 4 | Yes | 10281 | Yes | Yes | NA | NA | NA | 1250 | Yes |
1,2-Ethanediol, diacetate | Esters | 90.0 | 2 | 40.4 | Yes | 2 | Yes | 167 | Yes | No | 297 | Yes | No | 130 | No |
1-Octene | Neutral organics | 3.66 | 2 | 1.27 | Yes | 2 | Yes | 1.12 | Yes | Yes | 0.758 | Yes | No | 1.18 | Yes |
2-(2-Methoxyethoxy)ethanol | Neutral organics | 2683 | 4 | 19514 | Yes | 4 | Yes | 71002 | Yes | Yes | 22400 | Yes | Yes | 14800 | Yes |
N,N,N-Trimethyl-1-hexadecanaminium, chloride | Neutral organics | 0.158 | 1 | 0.924 | Yes | 1 | Yes | 22.7 | Yes | No | NA | NA | NA | NA | NA |
1-Bromodecane | Neutral organics | 18.7 | 2 | 0.295 | No | 2 | Yes | 0.107 | No | No | 0.087 | Yes | No | 0.420 | No |
2-(2-Butoxyethoxy)ethanol | Neutral organics | 1300 | 4 | 1285 | Yes | 4 | Yes | 4555 | Yes | Yes | 2000 | Yes | Yes | 750 | Yes |
Benzoic acid, phenylmethyl ester | Esters | 1.40 | 2 | 1.97 | Yes | 2 | Yes | 3.60 | Yes | Yes | 2.80 | No | Yes | 1.65 | Yes |
2-Octanol | Neutral organics | 75.0 | 2 | 36.2 | Yes | 2 | Yes | 23.6 | Yes | Yes | 21.9 | Yes | Yes | 24.2 | Yes |
Decane | Neutral organics | 530 | >4 | 1.47 | No | 1 | No | 0.140 | No | No | 0.063 | Yes | No | 0.590 | No |
Methanesulfonyl chloride | Acid halides | 11.0 | 2 | 38841 | No | 2 | Yes | 16.1 | Yes | Yes | 11.8 | No | Yes | 98.3 | Yes |
N,N-Dimethyl acetamide | Amides | 1000 | 4 | 2339 | Yes | 4 | Yes | 1558 | Yes | Yes | NA | NA | NA | 1180 | Yes |
1-Aminonaphthalene | Anilines (unhindered) | 7.00 | 2 | 6.96 | Yes | 2 | Yes | 13.6 | Yes | Yes | NA | NA | NA | 13.9 | Yes |
Benzyl acetate | Esters | 4.00 | 2 | 13.6 | Yes | 2 | Yes | 18.0 | Yes | Yes | 21.3 | No | Yes | 38.4 | Yes |
1H-1,2,4-Triazole | Triazoles (non-fused) | 498 | 3 | 331 | Yes | 3 | Yes | 3574 | Yes | No | NA | NA | NA | NA | NA |
Cyanamide | Neutral organics | 90.2 | 2 | 190 | No | 4 | No | 11597 | No | No | 568 | Yes | No | NA | NA |
2-(1,3-Dihydro-3-oxo-2H-indol-2-ylidene)-1,2-dihydro-3H-indol-3-one | Vinyl/allyl ketones | 42.0 | 2 | 2.54 | Yes | 1 | No | 37.7 | Yes | Yes | NA | NA | NA | 0.200 | No |
N-[4-[Bis[4-(dimethylamino)phenyl]methylene]-2,5-cyclohexadien-1-ylidene]-N-methylmethanaminium chloride (1:1) | Neutral organics | 0.100 | 1 | 0.107 | Yes | 1 | Yes | 2771 | No | No | NA | NA | NA | NA | NA |
Nitroguanidine | Aliphatic amines | 2268 | >4 | 51.3 | No | 4 | Yes | 5563 | Yes | Yes | 55900 | No | Yes | NA | NA |
2,5-Dichlorophenol | Phenols | 3.30 | 2 | 12.3 | Yes | 2 | Yes | 6.96 | Yes | Yes | 5.42 | Yes | Yes | 5.93 | Yes |
Acetic acid, ammonium salt (1:1) | Neutral organics | 72.0 | 2 | NA | NA | 2 | Yes | 1.27 × 10⁶ | No | No | NA | NA | NA | NA | NA |
2-Methoxy-2-methylbutane | Neutral organics | 100 | >3 | 315 | Yes | 4 | Yes | 99.0 | Yes | No | 167 | Yes | Yes | 403 | Yes |
Carbamic acid, monoammonium salt | Neutral organics | 40.6 | 2 | NA | NA | 4 | No | 7.97 × 10⁶ | No | No | NA | NA | NA | NA | NA |
4-Chloro-2-methylphenol | Phenols | 2.30 | 2 | 15.7 | Yes | 2 | Yes | 7.22 | Yes | Yes | 7.54 | Yes | Yes | 8.53 | Yes |
2-(1,1-Dimethylethyl)-1,4-benzenediol | Hydroquinones | 0.274 | 1 | 10.8 | No | 1 | Yes | 0.099 | Yes | Yes | NA | NA | NA | 21.0 | No |
2-(2,4-Dichlorophenoxy)acetic acid compd. with N-methylmethanamine (1:1) | Neutral organics | 312 | 3 | NA | NA | 3 | Yes | 2427 | No | No | NA | NA | NA | NA | NA |
2-[4-(1,1-Dimethylethyl)phenoxy]cyclohexyl-2-propynyl ester sulfurous acid | Neutral organics | 0.154 | 1 | 0.510 | Yes | NA | NA | 0.180 | No | Yes | 0.218 | No | Yes | 1.10 | No |
N1-(3-aminopropyl)-N1-dodecyl-1,3-propanediamine | Aliphatic amines | 1.01 | 2 | 0.078 | No | 2 | Yes | 1.39 | Yes | Yes | 0.586 | Yes | No | NA | NA |
Tanone 50 | Aliphatic amines | 0.012 | 1 | 107 | No | NA | NA | 627 | Yes | No | NA | NA | NA | NA | NA |
1,2-Dichloro-3-nitrobenzene | Neutral organics | 12.0 | 2 | 7.45 | Yes | 2 | Yes | 16.2 | Yes | Yes | 8.10 | Yes | Yes | 7.82 | Yes |
5-Chloro-2-(4-chlorophenoxy)phenol | NA | 0.460 | >1 | NA | NA | 1 | Yes | NA | NA | NA | NA | NA | NA | NA | NA |
2-(Octylthio)ethanol | Neutral organics | 2.85 | 2 | 1.96 | Yes | 2 | Yes | 8.95 | Yes | Yes | 3.13 | No | Yes | 2.37 | Yes |
CI pigment yellow 83 | Amides | 40.2 | 2 | 0.002 | No | NA | NA | 0.027 | No | No | NA | NA | NA | NA | NA |
1,3-Bis(hydroxymethyl)-5,5-dimethyl-2,4-imidazolidinedione | Carbonyl ureas | 298 | 3 | 11660 | No | 3 | Yes | 6331 | Yes | No | 2970 | Yes | No | 6890 | No |
N-Decyl-N,N-dimethyl-1-decanaminium chloride (1:1) | Neutral organics | 0.750 | 1 | 0.0004 | Yes | 1 | Yes | 1.22 | Yes | No | 4.57 × 10⁻⁶ | No | Yes | NA | NA |
Dimethyldiallylammonium chloride | Neutral organics | 1.10 | 2 | 0.112 | No | 4 | No | 1.43 × 10⁶ | No | No | NA | NA | NA | NA | NA |
7a-Ethyldihydro-1H,3H,5H-oxazolo[3,4-c]oxazole | Aliphatic amines | 169 | 3 | 1555 | No | 3 | Yes | 308 | Yes | Yes | 73.2 | Yes | No | 2410 | No |
1-(1-Butoxypropan-2-yloxy)propan-2-ol | Neutral organics | 50.0 | 2 | 2905 | No | 2 | Yes | 45354 | Yes | No | 7410 | Yes | No | NA | NA |
alpha-(Nonylphenyl)-omega-hydroxypoly(oxy-1,2-ethanediyl) | Neutral organics | 5.54 | 2 | 2.63 | Yes | 2 | Yes | 2.15 | No | Yes | 0.181 | Yes | No | NA | NA |
N,N′-1,2-Ethanediylbis N-acetylacetamide | Imides | 140 | >3 | 52.0 | No | 4 | Yes | 1911 | Yes | Yes | 1820 | Yes | Yes | NA | NA |
1,3-Dichloro-2-propanol phosphate (3:1) | Esters (phosphate) | 3.60 | 2 | 2.57 | Yes | 2 | Yes | 2.47 | Yes | Yes | 902 | No | No | 0.650 | No |
N,N-Dimethyldecanamide | Amides | 21.0 | 2 | 11.8 | Yes | 2 | Yes | 5.53 | Yes | Yes | 6.86 | Yes | Yes | NA | NA |
alpha-Methyl-4-(2-methylpropyl)benzeneacetic acid | Neutral organics-acid | 100 | >3 | 28.6 | No | 3 | Yes | 41.6 | No | No | 19.4 | Yes | No | 2.55 | No |
(2R)-2-(4-Chloro-2-methylphenoxy)propanoic acid | Neutral organics-acid | 97.0 | >2 | 38.5 | Yes | 3 | Yes | 254 | No | Yes | 10.4 | No | Yes | 21.0 | Yes |
Polypropylene glycol | Neutral organics | 4123 | 4 | 9253 | Yes | 4 | Yes | 24424 | Yes | Yes | 19000 | Yes | Yes | NA | NA |
[2S-[2alpha,5alpha,6beta(S*)]]-6-[[Amino(4-hydroxyphenyl)acetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylic acid | Phenol amines -acid | 1000 | >4 | 38.5 | No | 3 | No | 370 | Yes | No | 20400 | No | Yes | 2.27 | No |
Tetrakis(hydroxymethyl)phosphonium, sulfate(2:1)(salt) | Neutral organics | 95.5 | 2 | 1.03 × 10⁶ | No | 4 | No | 4.26 × 10²² | No | No | NA | NA | NA | NA | NA |
5-[2-Chloro-4-(trifluoromethyl)phenoxy]-2-nitrobenzoic acid, 2-ethoxy-1-methyl-2-oxoethyl ester | Esters | 0.528 | 1 | 0.061 | Yes | 2 | Yes | 1.22 | No | No | 0.923 | No | Yes | 0.028 | Yes |
N-[1,3-Bis(hydroxymethyl)-2,5-dioxo-4-imidazolidinyl]-N,N′-bis(hydroxymethyl)urea | Carbonyl ureas | 150 | >3 | 9006 | Yes | 4 | Yes | 6.70 × 10⁶ | Yes | Yes | 9240 | Yes | Yes | 2760 | Yes |
2-[(5-Chloro-8-quinolinyl)oxy]acetic acid, 1-methylhexyl ester | Esters | 13.3 | >2 | 0.809 | No | 2 | Yes | 0.544 | No | No | 0.502 | No | No | 0.630 | No |
2-Chloro-5-[3,6-dihydro-3-methyl-2,6-dioxo-4-(trifluoromethyl)-1(2H)-pyrimidinyl]benzoic acid 1,1-dimethyl-2-oxo-2-(2-propenyloxy)ethyl ester | NA | 6.02 | 2 | NA | NA | 2 | Yes | NA | NA | NA | NA | NA | NA | 0.095 | No |
1-(2,4-Dichlorophenyl)-4,5-dihydro-5-methyl-1H-pyrazole-3,5-dicarboxylic acid 3,5-diethyl ester | Esters | 4.20 | 2 | 0.196 | No | 1 | No | 1.13 | Yes | Yes | 1.97 | No | Yes | 0.370 | No |
Chemical categories were defined based on LC50 values and EPA acute aquatic toxicity categories of concern.39 Category 4 was added to distinguish chemicals with very low hazard potential. The four regulatory categories are: Category 1 – High hazard (LC50 < 1 mg L−1), Category 2 – Moderate hazard (1 < LC50 < 100 mg L−1), Category 3 – Low hazard (100 < LC50 < 500 mg L−1) and Category 4 – No hazard (LC50 > 500 mg L−1). Multiple LC50 thresholds were available for 40 of the 83 chemicals in the data set; the distribution of differences between the minimum and maximum LC50 thresholds is shown in Fig. S2.† When multiple experimental results were available for a single chemical, the geometric mean of the experimental LC50 values was used because LC50 values are typically log-normally distributed; under log-normality, geometric means are better estimators of central tendency than arithmetic means.40 For the vast majority of chemicals, the individual experimental values all fell within a single EPA category of concern for aquatic toxicity; six substances had reported LC50 values that spanned two categories. Experimental LC50 values for the anticoagulant warfarin spanned three regulatory categories, ranging from 0.037 mg L−1 to >1000 mg L−1, and were independent of test duration. The complete data for these seven chemicals are given in Table S2.†
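The averaging and binning steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the replicate values are made up, and the handling of values that fall exactly on a threshold (1, 100 or 500 mg L−1) is an assumption, since the text does not state it.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive LC50 values (mg/L)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("LC50 values must be positive and non-empty")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def hazard_category(lc50):
    """Map an LC50 (mg/L) to the four categories of concern.

    Assumption: a value exactly on a threshold is assigned to the
    less hazardous category; the source does not specify this.
    """
    if lc50 < 1:
        return 1  # high hazard
    if lc50 < 100:
        return 2  # moderate hazard
    if lc50 < 500:
        return 3  # low hazard
    return 4      # no hazard


# Hypothetical replicate measurements for one chemical (mg/L).
replicates = [0.5, 2.0, 8.0]
mean_lc50 = geometric_mean(replicates)
print(f"geometric mean = {mean_lc50:.3g} mg/L, category {hazard_category(mean_lc50)}")
```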
ᵃ The exact number of compounds is not available. ᵇ Number of chemicals in the validation set (N = 83) that are in the AD of each model. OCSPP – Office of Chemical Safety and Pollution Prevention; 2D ANNE – two-dimensional artificial neural network ensemble; AD – applicability domain.

Property | ADMET | CADRE-AT | ECOSAR | KATE | TEST |
---|---|---|---|---|---|
Freeware? | No | No | Yes | Yes | Yes |
Statistical method | 2D ANNE | Classification system | Class-specific linear regression | Class-specific linear regression | Consensus model |
AD definition | Molecular descriptor space | Molecular descriptor space | LogP range and class categorization concerns | LogP range and class categorization concerns | Molecular descriptor space |
Training set size | 490 | 565 | 1000sᵃ | 535 | 823 |
Training set species | Pimephales promelas | Pimephales promelas | All OCSPP-approved species | Oryzias latipes, Pimephales promelas | Pimephales promelas |
Output | LC50 | Toxicity category (n = 4) | LC50 | LC50 | LC50 |
Number of chemicals in the ADᵇ | 78 | 80 | 61 | 35 | 57 |
ADMET Predictor™ estimates acute fish toxicity using a two-dimensional (2D) Artificial Neural Network Ensemble (ANNE). Although only limited details of the ANNE are available owing to the proprietary nature of the algorithm, it is known that the model relies on hundreds of structural, constitutional, topological, and electronic properties as descriptors. Two estimates of logP are available: one based on the internal ANNE model, and another based on the atom fragment contribution (AFC) method outlined elsewhere.46 LogD7.4 is calculated with the ANNE method trained on ionizable compounds.20 ADMET is trained on fathead minnow data available from the US EPA.60 The program requires SMILES strings or 3D structure files as inputs to provide estimates of LC50 values and can process multiple substances in batch mode. ADMET generates predictions only for compounds that fall within its applicability domain, which is assessed automatically on the basis of the descriptor space in the training set.
CADRE-AT uses a series of classification models to bin chemicals into categories of concern for acute and chronic aquatic toxicity. The models are based on mechanistically relevant bioavailability and reactivity parameters that include the distribution coefficient (logD7.4), global quantum-mechanical reactivity indices and other physicochemical descriptors. Reactivity indices include frontier orbital energies, namely those of the lowest unoccupied molecular orbital (LUMO) and the highest occupied molecular orbital (HOMO), and the HOMO–LUMO energy gap (ΔE). These parameters are reflective of non-specific chemical reactivity with macromolecules.23,61 Like ADMET, CADRE-AT was trained on the fathead minnow data available from the US EPA.60 Since descriptors are calculated at a high level of theory and require 3D chemical structures as inputs, CADRE-AT does not provide instantaneous predictions; typical processing times range from seconds to a few days per chemical, depending on the size and conformational flexibility of the structure(s) involved. CADRE-AT does not have an applicability domain and provides predictions for all organic chemicals that are amenable to the required computations.
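To make two of these descriptors concrete, the sketch below computes a HOMO–LUMO gap from given orbital energies and estimates logD at pH 7.4 from logP and pKa. It is an illustration only and does not reproduce CADRE-AT's calculations: the orbital energies would come from a quantum-chemical calculation that is not shown, and the logD expression is the textbook relation for a monoprotic acid or base under the assumption that only the neutral species partitions into octanol. All names and input values are hypothetical.

```python
import math


def homo_lumo_gap(e_homo, e_lumo):
    """Energy gap (same units as the inputs, e.g. eV): E_LUMO - E_HOMO."""
    return e_lumo - e_homo


def log_d(log_p, pka, ph=7.4, acid=True):
    """Textbook logD estimate for a monoprotic acid or base.

    Assumes only the neutral form partitions into octanol:
      acid: logD = logP - log10(1 + 10**(pH - pKa))
      base: logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical inputs: a weak acid with logP = 2.0 and pKa = 4.4.
print(round(log_d(2.0, 4.4), 2))
print(homo_lumo_gap(-9.0, -1.0))
```

For the hypothetical weak acid, ionization at pH 7.4 lowers the effective lipophilicity by about three log units, which is why logP-only models can misjudge the bioavailability of ionized chemicals.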
Ecological Structure Activity Relationships (ECOSAR) v1.11 estimates acute aquatic toxicity via the Meyer–Overton relationship for chemicals within a structurally similar class.13,62 ECOSAR is trained on a large data set of ecotoxicity studies from the ECOTOX database that follow OCSPP guidelines.15 The database is divided into 111 structural classes, and linear regression models between LC50 toxicity estimates and logP were developed for substances in each class. When a chemical belongs to multiple chemical classes, the most conservative (most toxic) estimate is provided based on the principle of excess toxicity. LogP is calculated with the EPI Suite KOWWIN module v. 1.68 using the AFC method.63,64 The KOWWIN module evaluates partitioning of neutral compounds only; thus, the toxicity of organic acids and bases is estimated based on QSARs for non-ionized molecules of the same class. The program requires SMILES strings or CAS numbers as inputs to estimate LC50 thresholds and can process multiple substances in batch mode. ECOSAR is designed to perform best on compounds with logP < 5 and molecular weight < 1000 amu.13,62 Chemicals that do not meet these two criteria, or are structurally dissimilar from the domain of every QSAR model within ECOSAR, are considered outside the applicability domain.
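The class-specific regression logic described above can be illustrated with a short sketch. It is not ECOSAR's implementation: the slopes and intercepts are placeholder numbers rather than ECOSAR's fitted equations, the function and dictionary names are hypothetical, and the regression is assumed to return log10(LC50) directly in mg L−1.

```python
# Placeholder coefficients for log10(LC50) = slope * logP + intercept.
# These values are illustrative assumptions, not ECOSAR's equations.
CLASS_QSARS = {
    "neutral organics": (-0.90, 3.0),
    "phenols": (-0.70, 2.2),
}


def predict_lc50(log_p, mol_weight, classes):
    """Return (LC50 estimate, in_domain) for a chemical.

    When several classes match, the lowest LC50 (most toxic, most
    conservative) estimate is reported. The domain check uses the
    logP < 5 and molecular weight < 1000 criteria from the text.
    """
    estimates = [
        10 ** (slope * log_p + intercept)
        for slope, intercept in (
            CLASS_QSARS[c] for c in classes if c in CLASS_QSARS
        )
    ]
    if not estimates:
        # No class-specific QSAR applies: outside the domain.
        return None, False
    in_domain = log_p < 5 and mol_weight < 1000
    return min(estimates), in_domain


lc50, in_domain = predict_lc50(2.0, 150.0, ["neutral organics", "phenols"])
print(f"LC50 = {lc50:.3g} mg/L, inside AD: {in_domain}")
```

Returning the domain flag alongside the estimate, rather than suppressing the estimate, mirrors the behavior noted later in the text: ECOSAR still reports a value for chemicals outside its applicability domain.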
KAshinhou Tool for Ecotoxicity (KATE) on PAS 2011 estimates acute aquatic toxicity via the Meyer–Overton relationship for chemicals within a structurally similar class, akin to ECOSAR. Forty structural chemical classes are used in KATE. Estimated LC50 values are determined from linear regression models that use logP, which is obtained from an internal experimental database or is estimated with the AFC method.63,64 KATE is trained on the US EPA fathead minnow (Pimephales promelas) and the Japanese Ministry of the Environment Oryzias latipes data sets.65,66 The program requires SMILES strings or CAS numbers as inputs to assess chemical toxicity and can process multiple substances in a single run in batch mode, which is limited to 50 chemicals. The tool is available as a standalone application or as a web plug-in. KATE internally defines the applicability domains by comparing the logP of the test chemical to the range of logP values in each of the structural classes of the training set.18
Toxicity Estimation Software Tool (TEST) v.4.1 consists of a number of models that estimate acute aquatic toxicity thresholds by read-across among structural analogs or via multivariate regression. The models are based on hundreds of structural, constitutional, connectivity, shape, topological, molecular distance, fragment, and electrotopological property descriptors. Several partition coefficient estimates are provided. LogP is calculated with two group contribution methods derived by Ghose45 or Wang.67 TEST is trained on the fathead minnow data set from the EPA ECOTOX database.38,65 The program requires only SMILES strings or CAS numbers as inputs to quickly assess chemical toxicity and can process multiple substances in a single run in batch mode. Each read-across or regression model has its own applicability domain. The program provides an estimated LC50 threshold based on each model's prediction, as well as a consensus average of the component models. Given that the consensus result was previously reported as the most accurate estimate provided by TEST,68 it was used in this validation exercise.
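The idea of a consensus average over component models, each with its own applicability domain, can be sketched as follows. This is an assumption-laden illustration rather than TEST's algorithm: averaging on the log10 scale, skipping models that return no prediction, and the model names and values are all choices made for this example.

```python
import math


def consensus_lc50(predictions):
    """Average the available component predictions on the log10 scale.

    `predictions` maps a model name to an LC50 in mg/L, or to None
    when the chemical is outside that model's applicability domain.
    Returns None if no component model produced a prediction.
    """
    logs = [math.log10(v) for v in predictions.values() if v is not None]
    if not logs:
        return None
    return 10 ** (sum(logs) / len(logs))


# Hypothetical component outputs for one chemical.
components = {"model_a": 10.0, "model_b": 1000.0, "model_c": None}
print(consensus_lc50(components))
```

Averaging in log units keeps one very large or very small component estimate from dominating the result, which matters for an endpoint that spans many orders of magnitude.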
ᵃ Total accuracy is the fraction of chemicals assessed by each tool for which the predicted LC50 falls within the same regulatory category as the measured LC50. ᵇ Similar to total accuracy, predictive power measures the fraction of correct category assignments; however, a missing prediction is treated as an incorrect assignment. ᶜ Cannot be calculated; the software tool provides a regulatory category designation only. ᵈ Parametric correlation might provide a poor estimate of covariance due to extreme outliers. RMSE – root mean squared error.

Measures of predictive accuracy | ADMET | CADRE-AT | ECOSAR | KATE | TEST |
---|---|---|---|---|---|
Total accuracy (%)ᵃ | 53% | 83% | 51% | 58% | 48% |
Predictive power (%)ᵇ | 49% | 80% | 49% | 40% | 35% |
Number of missing predictions | 5 | 3 | 2 | 26 | 23 |
Coefficient of determination (R²) | 0.27 | NAᶜ | 0.11ᵈ | 0.35 | 0.21 |
RMSE (log scale) | 1.60 | NAᶜ | 2.94ᵈ | 1.47 | 1.32 |
Within 1 regulatory category (%) | 80.8 | 92.5 | 85.2 | 85.5 | 88.3 |
Within a factor of 2 (%) | 25.6 | NAᶜ | 25.9 | 26.3 | 30.0 |
Within a factor of 5 (%) | 48.7 | NAᶜ | 54.3 | 47.4 | 50.0 |
Within a factor of 10 (%) | 57.7 | NAᶜ | 63.0 | 64.9 | 63.3 |
Within a factor of 100 (%) | 80.8 | NAᶜ | 76.5 | 82.5 | 85.0 |
Within a factor of 1000 (%) | 91.0 | NAᶜ | 86.4 | 94.7 | 98.3 |
ᵃ This tool provides predictions when chemicals lie outside the applicability domain (AD). ᵇ Total accuracy within the AD is the fraction of chemicals assessed by each tool for which the predicted LC50 falls within the same regulatory category as the measured LC50. The chemicals with AD warnings are excluded from the assessment. ᶜ Cannot be calculated; the tool provides a regulatory category designation only. ᵈ Number of chemicals for which the tool provided toxicity estimates that are also within the tool's AD. ᵉ Parametric correlation might provide a poor estimate of covariance due to extreme outliers.

Measures of predictive accuracy | ADMET | CADRE-AT | ECOSARᵃ | KATEᵃ | TEST |
---|---|---|---|---|---|
Total accuracy inside AD (%)ᵇ | 53% | 83% | 61% | 46% | 48% |
Coefficient of determination (R²) | 0.27 | NAᶜ | 0.13ᵉ | 0.25 | 0.21 |
RMSE (log scale) | 1.60 | NAᶜ | 1.29 | 1.35 | 1.32 |
Number of chemicals (out of 83)ᵈ | 78 | 80 | 59 | 35 | 57 |
No explicit AD analyses for ADMET, CADRE-AT, and TEST were performed because the programs do not allow for predictions outside their respective ADs (ADMET, TEST) or lack applicability domain definitions (CADRE-AT). A narrow AD definition may decrease model performance because it leads to a large fraction of missed predictions and thus low predictive power. Furthermore, models trained on small data sets with narrow ADs may be overfitted, resulting in poor accuracy during evaluation. Among the evaluated tools that estimate LC50 values, ECOSAR showed the narrowest error distribution when the analysis was limited to chemicals within its AD (Table 4).
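The difference between total accuracy and predictive power, and the log-scale error measures reported in the tables above, can be made explicit with a small sketch. The function names are hypothetical and the category boundaries are handled by assumption (a value on a threshold goes to the less hazardous category); the four example pairs are the measured LC50 and ECOSAR estimates for four rows of the chemical table, including one chemical for which ECOSAR gave no prediction.

```python
import math


def category(lc50):
    """Regulatory category from LC50 in mg/L (boundary handling assumed)."""
    return 1 if lc50 < 1 else 2 if lc50 < 100 else 3 if lc50 < 500 else 4


def evaluate(pairs, factor=10):
    """Summarize (measured, predicted) LC50 pairs; predicted is None if missing."""
    scored = [(m, p) for m, p in pairs if p is not None]
    if not scored:
        raise ValueError("no predictions to evaluate")
    correct = sum(category(m) == category(p) for m, p in scored)
    log_errors = [math.log10(p) - math.log10(m) for m, p in scored]
    return {
        # Correct categories among chemicals that received a prediction.
        "total_accuracy": correct / len(scored),
        # Same count, but missing predictions count as incorrect.
        "predictive_power": correct / len(pairs),
        "rmse_log": math.sqrt(sum(e * e for e in log_errors) / len(scored)),
        "within_factor": sum(
            abs(e) <= math.log10(factor) for e in log_errors
        ) / len(scored),
    }


# (measured LC50, ECOSAR estimate) in mg/L for four chemicals in the table.
pairs = [(1.55, 1.58), (27.6, 778.0), (2420.0, 590.0), (0.460, None)]
for name, value in evaluate(pairs).items():
    print(f"{name}: {value:.2f}")
```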
Measures of predictive accuracy | ADMET | CADRE-AT | ECOSAR | KATE | TEST |
---|---|---|---|---|---|
Total accuracy – narcotics (total: 24) | 68% | 88% | 52% | 50% | 71% |
Missing prediction – narcotics | 2/24 | 0/24 | 1/24 | 4/24 | 7/24 |
Total accuracy – neutral narcotics (logP < 5) | 87% | 94% | 53% | 57% | 77% |
Missing prediction – neutral narcotics (logP < 5) | 1/17 | 0/17 | 1/17 | 2/17 | 3/17 |
Total accuracy – reactive chemicals (total: 6) | 33% | 67% | 50% | 40% | 50% |
Missing prediction – reactive chemicals | 0/6 | 0/6 | 0/6 | 1/6 | 2/6 |
The majority of chemicals exert toxicity through non-specific, reversible interactions with biological membranes, known as narcosis.81–83 Chemicals that primarily act via narcosis include aliphatic and aromatic hydrocarbons, chlorinated hydrocarbons, alcohols, ethers, ketones, aldehydes, weak acids and bases, and some aliphatic nitro compounds.84–86 Narcotics have been shown to exert toxic effects on fish at constant target tissue concentrations on the order of 220–470 mmol kg−1 of lipids.87 Thus, the toxicity of a narcotic is related to its ability to partition across the gill and target membranes and intercalate into the lipid bilayer. Conventionally, this process has been modeled using logP, which provides an estimate of the degree of partitioning across the membranes and the affinity for the hydrophobic region of the lipid bilayer.81,88–94
KATE and ECOSAR predict aquatic toxicity based on a single predictor (logP), which has been shown to be mechanistically relevant to compounds acting solely by narcosis.86 Their respective algorithms assume that, although toxicants with particular functional groups induce toxicity in excess of that estimated by logP, the “excess” toxicity is a constant factor for each category and can be adjusted with an appropriate class-specific correction factor.13,18 Surprisingly, KATE and ECOSAR afford lower accuracy than the other three tools for assessing toxicity categories of chemicals identified as narcotics by the Verhaar scheme (Table 5). These results merit further investigation of both the models and the Verhaar classification scheme. It is likely that errors in the logP estimates of ECOSAR and KATE lead to errors in their toxicity estimates. Indeed, underestimation of logP by ECOSAR is directly related to its underestimation of toxicity (overestimation of LC50) in a set of five chemicals (Fig. 5). These chemicals include Crystal Violet dye (CAS# 548-62-9), DMDM Hydantoin (CAS# 6440-58-0), Dowanol 54B (CAS# 78491-02-8), and Butafenacil (CAS# 134605-64-4) (Table S1†). On the other hand, overestimated logP values did not lead to proportionally underestimated LC50 results. These differences may arise from differences in the QSAR equations used by ECOSAR to predict excess toxicity, i.e. toxicity above the prediction afforded by the baseline octanol–water partitioning equation for neutral organic narcotics.13 Furthermore, these differences in ECOSAR and KATE may also be attributed to unstable regression models, as the QSARs for some chemical classes are based on as few as 2 data points (diazonium aromatics).13,31 A previous study of an older ECOSAR version found that 22% of the QSAR equations in the tool were “reliable”.95 In these cases, the tools’ performance would likely be improved if training set chemicals were partitioned into QSAR models by MOA rather than by chemical class.65,96 Further work to test this hypothesis is ongoing. Notably, a significant portion of the chemicals with large toxicity errors in the KATE and ECOSAR predictions carried logP warnings, which should alert the user to discount the accuracy of the predictions for those chemicals (Fig. 4).
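The way a logP error propagates through a single-predictor model with a constant class correction can be sketched in a few lines. This is an illustration of the general functional form only: the slope, intercept, and correction value are hypothetical placeholders, not coefficients taken from ECOSAR, KATE, or any published QSAR, and the function names are our own.

```python
# Illustrative only: slope and intercept are hypothetical placeholders,
# not coefficients from ECOSAR, KATE, or any published QSAR equation.
HYPOTHETICAL_SLOPE = -0.9      # toxicity rises (LC50 falls) as logP increases
HYPOTHETICAL_INTERCEPT = 1.7


def baseline_log_lc50(log_p: float) -> float:
    """log10(LC50) predicted for baseline narcosis from logP alone."""
    return HYPOTHETICAL_SLOPE * log_p + HYPOTHETICAL_INTERCEPT


def class_log_lc50(log_p: float, excess_toxicity: float) -> float:
    """Apply a constant class-specific correction (log units) for excess toxicity."""
    return baseline_log_lc50(log_p) - excess_toxicity


# An underestimated logP shifts the predicted LC50 upward,
# i.e. the chemical appears less toxic than it is.
true_log_p, underestimated_log_p = 4.0, 2.0
log_lc50_error = (class_log_lc50(underestimated_log_p, 1.0)
                  - class_log_lc50(true_log_p, 1.0))
print(f"log10(LC50) overestimated by {log_lc50_error:.1f} log units")
```

Because the class correction is a constant offset, it cancels in the difference: under this functional form, the error in the predicted LC50 depends only on the logP error and the slope.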
Partitioning coefficients alone are poor predictors of acute aquatic toxicity for chemicals acting through specific interactions with biological macromolecules.97 Such interactions include covalent reactivity with protein residues and nucleic acids, non-covalent binding to enzymes and receptors (e.g. acetylcholinesterase and estrogen receptors), oxidative phosphorylation uncoupling, and central nervous system stress.69,98,99 Predictive toxicology models can be improved by considering chemical properties mechanistically relevant to these interactions, such as reactivity and steric parameters.100–103 CADRE-AT uses global reactivity indices (such as frontier orbital energies) and physicochemical properties (such as logD7.4, molecular volume and accessible surface area) to bin chemicals into EPA's categories of concern. Accordingly, the higher performance and broader applicability domain noted for CADRE-AT (uniform predictive accuracy for soluble, insoluble, charged and neutral compounds; Table 6) likely stem from the closer mechanistic relevance of its descriptors.23
Measures of predictive accuracy | ADMET | CADRE-AT | ECOSAR | KATE | TEST |
---|---|---|---|---|---|
Total accuracy – neutral (logP ≤ 5) | 57% | 83% | 63% | 62% | 54% |
Missing predictions – neutral (logP ≤ 5) | 2/53 | 1/53 | 2/53 | 11/53 | 12/53 |
Total accuracy – neutral (logP > 5) | 38% | 83% | 25%* | 33% | 17%• |
Missing predictions – neutral (logP > 5) | 0/8 | 2/8 | 0/8 | 2/8 | 2/8 |
Total accuracy – anionic | 33% | 83% | 33%• | 33% | 57% |
Missing predictions – anionic | 3/12 | 0/12 | 0/12 | 3/12 | 3/12 |
Total accuracy – cationic | 60% | 80% | 30%• | 50% | 75% |
Missing predictions – cationic | 0/10 | 0/10 | 0/10 | 8/10 | 6/10 |

The accuracy of the tool is significantly different from its accuracy for neutral molecules with logP < 5 at α = 0.05 (*) and α = 0.10 (•). A two-sided Wilcoxon test was used.

TEST and ADMET rely on machine learning and consensus models with a diverse array of molecular predictors. Thus, it is not possible to ascertain the mechanistic relevance of the molecular parameters, other than partition coefficients, to MOAs. The ADMET ANNE method outperforms the TEST consensus algorithm in accuracy (52% vs. 48%, respectively) and provides estimates for a wider range of compounds (5 vs. 23 missing predictions, respectively). However, owing to the high number of predictors and the more complex statistical algorithms, it is nearly impossible to identify the sources of misclassification.
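The missing-prediction totals quoted above can be cross-checked against the per-group counts in the accuracy table. The short sketch below simply transcribes those counts and sums them per tool; the variable and function names are our own.

```python
# Missing-prediction counts transcribed from the accuracy table:
# group -> (number of chemicals in group, counts in the order of TOOLS).
TOOLS = ["ADMET", "CADRE-AT", "ECOSAR", "KATE", "TEST"]
MISSING = {
    "neutral (logP <= 5)": (53, [2, 1, 2, 11, 12]),
    "neutral (logP > 5)": (8, [0, 2, 0, 2, 2]),
    "anionic": (12, [3, 0, 0, 3, 3]),
    "cationic": (10, [0, 0, 0, 8, 6]),
}


def total_missing() -> dict:
    """Sum missing predictions per tool over all chemical groups."""
    totals = dict.fromkeys(TOOLS, 0)
    for _, counts in MISSING.values():
        for tool, n in zip(TOOLS, counts):
            totals[tool] += n
    return totals


def total_chemicals() -> int:
    """Total number of chemicals across the four groups."""
    return sum(size for size, _ in MISSING.values())


totals = total_missing()
print(totals["ADMET"], totals["TEST"], total_chemicals())  # prints: 5 23 83
```

The group sizes sum to the 83 chemicals of the test set, and the ADMET and TEST totals reproduce the 5 and 23 missing predictions stated in the text.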
In contrast, ionized species diffuse rapidly through the aqueous phase but partition less readily into tissues. However, logP estimates do not consider the contribution of ionized molecular forms to chemical partitioning and toxicity. For this reason, logD7.4 is likely a better estimate of bioavailability than logP, as the former takes into account the effect of ionization at biologically relevant pH on hydrophobicity. Furthermore, ionized compounds may exhibit stronger interactions with biological membranes than other narcotics,105 or cause toxicity by an entirely different mode of action than their neutral counterparts.106,107 Consequently, estimating the acute toxicity of ionizable compounds requires descriptors that reflect the properties of the compound in its predominant ionization state at biological pH. All tools that do not consider ionization exhibited decreased accuracy or failed predictions for a large fraction of ionizable compounds (Table 6). Notable exceptions to the decreased accuracy include ADMET's assessment of cationic compounds and TEST's assessment of anionic compounds, which are on par with the programs’ accuracy for neutral chemicals. CADRE-AT retained similar accuracy for ionizable compounds, likely because it uses logD7.4 and calculates the reactivity parameters for the predominant species at pH 7.4.
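The difference between logP and logD7.4 can be illustrated with the textbook relation for a monoprotic compound, which assumes that only the neutral form partitions into octanol. This is a minimal sketch under that assumption; it is not the procedure used by CADRE-AT or by any of the assessed tools, and the function name and example values are hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Estimate logD at a given pH for a monoprotic acid or base.

    Assumes only the neutral species partitions into octanol:
      acid: logD = logP - log10(1 + 10**(pH - pKa))
      base: logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with logP = 3.0 and pKa = 4.4: almost fully ionized at pH 7.4,
# so its effective hydrophobicity drops by about three log units.
print(round(log_d(3.0, 4.4, acid=True), 2))
# Hypothetical weak acid with pKa = 10.0: essentially neutral at pH 7.4,
# so logD stays close to logP.
print(round(log_d(3.0, 10.0, acid=True), 2))
```

For a compound that is largely ionized at pH 7.4, a model driven by logP alone would therefore assume far more membrane partitioning than the distribution coefficient supports.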
Footnote |
† Electronic supplementary information (ESI) available: Toxicity and property data for chemicals in the test set and data quality figures. See DOI: 10.1039/c6gc00720a |
This journal is © The Royal Society of Chemistry 2016 |