Ankur Kumar a, Probir Kumar Ojha a and Kunal Roy *b
a Drug Discovery and Development Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
b Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India. E-mail: kunal.roy@jadavpuruniversity.in
First published on 6th March 2024
Humans and other living species of the ecosystem are constantly exposed to a wide range of chemicals of natural as well as synthetic origin. A multitude of compounds exert profound long-term detrimental health effects. The chronic toxicity profile of chemicals is of utmost importance for long-term risk assessment. Experimental testing of the chronic toxicity of compounds is not always feasible, given the sheer number of chemicals, the time and cost involved, and the limited availability of experimental data; this necessitates the use of in silico approaches to overcome these limitations. In this work, QSAR (quantitative structure–activity relationship) models were developed employing the regression-based PLS method with strict adherence to OECD guidelines. For this study, chronic and sub-chronic toxicity datasets with LOEL (lowest observed effect level) and NOEL (no observed effect level) as endpoints were used for model development. The validated models are robust, reliable, and predictive. The statistical results of the models are as follows: R²: 0.6–0.71, Q²(LOO): 0.51–0.635, and Q²(F1): 0.52–0.658. From the validated models, it was concluded that lipophilicity, electronegativity, the presence of aromatic ethers or aliphatic oxime groups, structural complexity, the state of unsaturation in molecules, and the presence of halogens and heavy atoms (phosphorus, sulphur, etc.) are responsible for chronic/sub-chronic toxicity. The QSAR models developed in our study can be utilized for the effective gap-filling of toxicity data sets, the categorization and prioritization of chemicals, and the chronic toxicity prediction of new synthetic compounds. Furthermore, we screened 2568 approved drugs from the DrugBank and PPDB databases using the validated models, which further corroborated the developed models based on the available toxicity data.
Environmental significance
At present, chemicals are an essential part of our daily life. These chemicals are either of natural or synthetic origin, and they enter the environment and living beings in different ways. Therefore, there is a chance that these chemicals may have detrimental effects on the environment and human health. The long-term (chronic), sub-chronic, and short-term (acute) toxicities of environmental chemicals are presently of great concern. Due to the scarcity of information on the chronic toxicological effects of most of the compounds, it is difficult to evaluate their potential impact on human health. Therefore, it is necessary to develop an alternative method to identify chemicals that have long-term toxic effects and assess their toxicity.
To minimize animal testing, test duration, and associated experimental resources, quantitative structure–activity relationship (QSAR) modeling offers an alternative in silico technique for the efficient estimation of the chronic and sub-chronic toxicity of chemicals.5 QSAR correlates the response (activity/toxicity) with a numerical description of molecular structures.6
Some attempts were made earlier to develop in silico models for chronic and sub-chronic toxicity prediction in mammals.4,7–12 However, some of the previous studies reported neither different internal and external validation metrics nor mechanistic interpretations. Due to the lack of structural diversity, the generalizability of the models of some previous studies was also restricted. Some of the previous studies reported different endpoints, duration of the study, different numbers and types of compounds, different reference species, and different modeling algorithms.
In the present study, we assessed the chronic (more than 360 days) (pLOEL value: 3.881 to −1.40; pNOEL value: 4.88 to −1.88) and sub-chronic (180 ± 90 days) (pLOEL value: 3.07 to −1.89; pNOEL value: 3.85 to −1.32) toxicity of diverse organic chemicals in rats and mice using the LOEL (lowest observed effect level) and NOEL (no observed effect level) as the endpoints. Most of the experimental (chronic and sub-chronic) toxicity data (NOEL and LOEL values in mg per kg per day), around 98%, were from rats, with very little data from mice. These organic chemicals cover a diverse range, including pharmaceuticals, industrial waste compounds, food, agricultural and natural chemicals, and compounds meant for daily use. Regression-based partial least squares (PLS) models were developed utilizing only 2D descriptors. Stepwise regression, genetic algorithm, and best subset selection (BSS) were used as the feature selection methods. We also screened additional real-world databases, e.g. the DrugBank database (https://go.drugbank.com/) and PPDB (Pesticide Properties DataBase), for the estimation of chronic and sub-chronic toxicity using the developed PLS models, and checked the quality of the predictions using the Prediction Reliability Indicator (PRI) tool (available from http://teqip.jdvu.ac.in/QSAR_Tools)13 for the developed models. In our present study, the developed QSAR models are accurate, robust, reliable, validated, and predictive, with a wide domain of applicability and mechanistic interpretability. We introduce here new models for the LOEL and NOEL endpoints based on the collected data. However, we may note that the endpoint values depend on the experimental conditions, inter-laboratory variations, the number of samples, and the exposure details. Furthermore, the values depend on the species in which the experiment is performed, and the lowest value is usually used when multiple values are available.7–12
Model | Latent variables | R² (training set) | Q²(LOO) (training set) | MAE(LOO) (training set) | R²pred or Q²(F1) (test set) | Q²(F2) (test set) | MAE(test) (test set) |
---|---|---|---|---|---|---|---|
IM1 | 2 | 0.673 | 0.604 | 0.575 | 0.618 | 0.606 | 0.559 |
IM2 | 2 | 0.711 | 0.635 | 0.545 | 0.658 | 0.598 | 0.542 |
IM3 | 4 | 0.607 | 0.547 | 0.464 | 0.562 | 0.537 | 0.546 |
IM4 | 5 | 0.634 | 0.518 | 0.639 | 0.523 | 0.500 | 0.730 |
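The test-set columns above can be illustrated with a small sketch. The text does not spell out the formulas, so the definitions below are assumed to be the ones commonly used in the QSAR literature: Q²(F1) scales the squared prediction errors by the spread of the test responses around the training-set mean, Q²(F2) by their spread around the test-set mean, and MAE is the mean absolute error. The function names and the toy numbers are hypothetical and are not taken from the study's data.

```python
def mean(values):
    return sum(values) / len(values)

def q2_f1(y_test, y_pred, y_train):
    # Assumed definition: 1 - PRESS / sum((y_test - mean(y_train))^2)
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))
    ref = mean(y_train)
    return 1.0 - press / sum((o - ref) ** 2 for o in y_test)

def q2_f2(y_test, y_pred):
    # Assumed definition: 1 - PRESS / sum((y_test - mean(y_test))^2)
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))
    ref = mean(y_test)
    return 1.0 - press / sum((o - ref) ** 2 for o in y_test)

def mae(y_obs, y_pred):
    # Mean absolute error between observed and predicted responses
    return mean([abs(o - p) for o, p in zip(y_obs, y_pred)])

# Toy values for illustration only (hypothetical, not study data)
y_train = [0.0, 1.0, 2.0]
y_test = [1.0, 3.0]
y_pred = [1.5, 2.5]
print(q2_f1(y_test, y_pred, y_train), q2_f2(y_test, y_pred), mae(y_test, y_pred))
```

The two Q² variants differ only in the reference mean, which is why they diverge when the test set is centred differently from the training set.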
IM1: pLOEL chronic toxicity
pLOEL = −17.04182 + 0.18387 × nHM + 39.64332 × Eta_epsi_3 − 0.13193 × nCb- − 0.90898 × nArCOOH − 0.58155 × nOHp − 0.37378 × C-007 + 0.53198 × B01[N–O] + 0.47263 × B06[C–N] + 0.79515 × B06[C–Cl] + 1.71437 × B01[S–P] − 0.45704 × nArCOOR + 0.88359 × nRSR
IM2: pNOEL chronic toxicity
pNOEL = 0.7256 + 0.26063 × nHM + 0.63801 × C-019 − 0.36212 × O-058 + 1.5621 × B03[C–P] + 0.86576 × B04[C–N] + 0.60945 × nArOR
IM3: pLOEL sub-chronic toxicity
pLOEL = −1.81023 − 0.15185 × nO + 0.55685 × nCsp + 0.18064 × X4v + 2.50839 × nRCNO + 0.10683 × H-048 − 0.37199 × MaxdssC + 0.54049 × MaxssssC + 2.97984 × B01[C–F] − 0.89772 × B05[O–S] + 0.5201 × SAscore + 0.08844 × C-026 + 0.18375 × nCconjX
IM4: pNOEL sub-chronic toxicity
pNOEL = −2.04357 + 0.98599 × nCrq − 0.25671 × H-051 − 1.09359 × minssCH2 + 1.84276 × B01[C–C] − 0.71667 × B03[C–C] + 0.76782 × B04[N–N] + 0.81005 × B05[C–O] + 1.10367 × F01[S–P] − 0.32873 × F02[O–O] + 0.36323 × B07[C–C] + 3.75617 × Eta_alpha_A − 0.69496 × nRCONR2
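Once descriptor values are available, each of the four equations is a plain linear combination. As a minimal sketch, the IM2 equation (pNOEL, chronic toxicity) can be evaluated as follows, with the intercept and coefficients copied from the equation above. The function name and the example descriptor values are hypothetical; in practice the descriptor values would come from descriptor-calculation software, and a prediction is only meaningful for a compound inside the model's applicability domain.

```python
# Intercept and coefficients copied from the IM2 equation in the text
IM2_INTERCEPT = 0.7256
IM2_COEFFICIENTS = {
    "nHM": 0.26063,
    "C-019": 0.63801,
    "O-058": -0.36212,
    "B03[C-P]": 1.5621,
    "B04[C-N]": 0.86576,
    "nArOR": 0.60945,
}

def predict_pnoel_im2(descriptors):
    """Evaluate the IM2 equation; descriptors missing from the dict count as 0."""
    unknown = set(descriptors) - set(IM2_COEFFICIENTS)
    if unknown:
        raise KeyError(f"descriptors not used by IM2: {sorted(unknown)}")
    return IM2_INTERCEPT + sum(
        coef * descriptors.get(name, 0.0)
        for name, coef in IM2_COEFFICIENTS.items()
    )

# Hypothetical compound with two heavy atoms and one aromatic ether group
example = {"nHM": 2, "nArOR": 1}
print(round(predict_pnoel_im2(example), 5))
```

Reading the coefficients from a dictionary keeps the sign of each contribution visible, which mirrors the (+ve)/(−ve) labels used in the mechanistic interpretation.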
Descriptor (contribution in brackets) | Description | Type of descriptor | Fragment | Mechanistic interpretation |
---|---|---|---|---|
The pLOEL model (IM1) | | | | |
nHM (+ve) | Number of heavy atoms | Constitutional index | P, Br, S, Cl | Heavy atoms (P, Br, S, Cl) in a chemical structure are associated with the chronic toxicity of chemicals towards rats, as explained by Kar et al.32 and Singh et al.33 The effect can be observed in compounds 1 (Allura red AC) (given in Fig. 3) and 4 (dibutyl phthalate) |
Eta_epsi_3 (+ve) | ETA electronegativity measure 3 | ETA descriptor | — | This descriptor is related to the electronegativity of the compound. A compound's toxicity can be attributed to its electronegativity,34,35 as evidenced in compounds 9 (ethylphthalyl ethyl glycolate) (given in Fig. 3) and 10 (FD & C blue no. 2) |
nCb- (−ve) | Number of substituted benzene C (sp2) | Functional group count descriptor | [structure image] | The toxicity of substituted benzenes is related to their ability to penetrate the cell through the cell membrane and to the electronic interactions of the chemicals with the active site. In our case, substituted benzenes with hydrophilic groups (prevalent in the present data set) may enhance the tendency of a compound to form hydrogen bonds with water, which in turn might impart hydrophilicity and reduce the compound's toxicity.36,37 The negative regression coefficient of this descriptor (nCb-: the number of substituted groups) indicates an inverse correlation with the toxicity endpoint, as observed in compounds 117 (metalaxyl) and 124 (mirex) (demonstrated in Fig. 3) |
nArCOOH (−ve) | Number of carboxylic acids (aromatic) | Functional group count descriptor | [structure image] | The existence of a carboxylic acid group may be crucial for increasing the compound's hydrophilicity.36,38 Thus, it reduces the toxicity, as displayed in compounds 121 (2-methyl-4-chlorophenoxyacetic acid) (displayed in Fig. 3) and 104 (γ-hexachlorocyclohexane) |
nOHp (−ve) | Number of primary alcohols | Functional group count descriptor | [structure image] | A higher number of hydroxyl groups in a compound increases its solubility and thus its excretion rate.39 Phase II metabolism (conjugation) requires primary hydroxyls for chemical detoxification.40 This phenomenon can be explained by compounds 91 (diquat) (provided in Fig. 3) and 97 (ethyl methyl phenyl glycinate), where the presence of more primary alcohols makes the compounds less toxic |
C-007 (−ve) | CH2X2 | Atom-centered fragment | [structure image] | This descriptor reflects the linkage of methylene groups to electronegative atoms such as phosphorus, nitrogen, sulphur, oxygen, and various halogens.41 In our study, an inverse correlation was found between this descriptor and the chronic toxicity of compounds against rats, as evidenced by the least toxic compounds 133 (oxadiazon) (given in Fig. 3) and 140 (phenformin) |
B06[C–N] (+ve) | Presence/absence of C–N at topological distance 6 | 2D atom pair descriptor | [structure image] | The descriptor is associated with molecular size, and higher values will increase the compound's lipophilicity.42 The occurrence of nitrogen atoms may also enhance the chronic toxicity of the compounds towards rats by imparting electronegativity (the presence of nitrogen will make the compound more electronegative),16,24,43 as shown in compounds 22 (propyl gallate) (presented in Fig. 3) and 26 (N-(1-naphthyl)ethylenediamine dihydrochloride) |
B06[C–Cl] (+ve) | Presence/absence of C–Cl at topological distance 6 | 2D atom pair descriptor | [structure image] | Generally, the presence of a Cl atom (halogen) increases the lipophilicity of chemical compounds. Such compounds can thus easily cross cell membranes, resulting in high chronic toxicity.44,45 This phenomenon is demonstrated in compounds 43 (aspartame) (demonstrated in Fig. 3) and 20 (methyl salicylate) |
B01[S–P] (+ve) | Presence/absence of S–P at topological distance 1 | 2D atom pair descriptor | [structure image] | The presence of phosphorus and sulphur atoms may be responsible for the enhancement of chronic toxicity,46,47 as shown in compounds 14 (p-hydroxybenzoic acid methyl ester) (given in Fig. 3) and 19 (methyl methacrylate) |
nArCOOR (−ve) | Number of esters (aromatic) | Functional group count descriptor | [structure image] | The nArCOOR group is polar in nature (hydrogen bonding of the ester oxygen with water). Polarity and toxicity are inversely related to each other.42 A functional group with a polar fragment like nArCOOR reduces the toxicity16,24,43 of chemicals in rats, as demonstrated in compounds 101 (fluvalinate) (given in Fig. 3) and 105 (hexahydro-1,3,5-trinitro-1,3,5-triazine) |
nRSR (+ve) | Number of sulfides | Functional group count descriptor | [structure image] | The presence of a higher number of sulphur atoms in the molecular structure enhances the toxicity of compounds.48 With the increase in the numerical value of nRSR, the chronic toxicity of a compound increases, as evidenced in compounds 35 (acifluorfen sodium) (illustrated in Fig. 3) and 24 (styrene) |
B01[N–O] (+ve) | Presence/absence of N–O at topological distance 1 | 2D atom pair descriptor | [structure image] | The presence of two electronegative atoms in this descriptor may contribute to the chronic toxicity of chemicals in rats, as suggested by Toropov et al.34 in 2008. Notably, this feature amplifies the toxicity of chemicals, as evidenced in compounds 36 (alachlor) (given in Fig. 3) and 17 (lithocholic acid) |
The pNOEL model (IM2) | | | | |
nHM (+ve) | Number of heavy atoms | Constitutional index | P, Br, S, Cl | The presence of heavy atoms (P, Br, S, Cl) in a chemical structure is associated with chronic toxicity in rats,33 as shown in compounds 21 (chlordane) (given in Fig. 4) and 65 (mirex) |
C-019 (+ve) | =CRX | Atom-centered fragment descriptor | [structure image] | The presence of halogens or heteroatoms (generally electronegative) like oxygen, nitrogen, phosphorus, and various halogens may enhance the toxicity of chemicals to rats.33,44 This can be notably demonstrated by compounds 30 (dieldrin) (provided in Fig. 4) and 21 (chlordane) |
O-058 (−ve) | =O | Atom-centered fragment descriptor | [structure image] | This descriptor is related to hydrophilicity (high potential to form H-bonding).49 There exists an inverse relationship between hydrophilicity and toxicity.50 Thus, the occurrence of this fragment in the backbone structures does not influence the toxicity, as shown by compound 14 (asulam) (illustrated in Fig. 4) |
B03[C–P] (+ve) | Presence/absence of C–P at topological distance 3 | 2D atom pair descriptor | [structure image] | The presence of the phosphate group may influence the toxicity of the chemicals.42,51 B03[C–P] is directly correlated with compound toxicity, as demonstrated by compound 29 (dichlorvos) (displayed in Fig. 4) |
B04[C–N] (+ve) | Presence/absence of C–N at topological distance 4 | 2D atom pair descriptor | [structure image] | The presence of highly electronegative atoms like nitrogen may influence the compounds' toxicity, as shown in compounds 71 (phosmet) (displayed in Fig. 4) and 68 (oxamyl)42 |
nArOR (+ve) | Number of ethers (aromatic) | Functional group count descriptor | [structure image] | Generally, aromatic ethers are toxic in nature.52 Thus, compounds containing such fragments have high pNOEL values (chronic toxicity value), as illustrated in compounds 69 (oxyfluorfen) (displayed in Fig. 4) and 54 (isoxaben) |
The pLOEL model (IM3) | | | | |
SAscore (+ve) | Synthetic accessibility score | Molecular property | — | SAscore signifies the synthetic accessibility score and is linked to the complexity of molecules. A higher value of this descriptor shows that the synthesis of such compounds is complex.42 This, in turn, increases the toxicity of compounds, as shown in compounds 116 (hexachlorobutadiene) (demonstrated in Fig. 5) and 117 (hexachlorocyclopentadiene) |
nO (−ve) | Number of oxygen atoms | Constitutional descriptor | [structure image] | The presence of oxygen in the structure makes it more hydrophilic through the formation of H-bonds.49 This observation can be demonstrated by compounds with low toxicity, like compounds 25 (isomaltitol) (illustrated in Fig. 5) and 138 (oxytetracycline hydrochloride) |
nCsp (+ve) | Number of sp hybridized carbon atoms | Constitutional index descriptor | [structure image] | This feature is related to unsaturation in chemical compounds due to the presence of sp hybridized carbon atoms. Unsaturated compounds are more toxic due to their high reactivity,53 as demonstrated in compounds 151 (pronamide) (shown in Fig. 5) and 154 (pydrin) |
X4v (+ve) | Valence connectivity index of order 4 | Connectivity index descriptor | — | This descriptor is related to the molecular size and shape of the compounds.54 A high numerical value of this descriptor makes the compound more toxic, as shown in compounds 135 (octabromodiphenyl ether) (given in Fig. 5) and 165 (tetraethyldithiopyrophosphate) |
nRCNO (+ve) | Number of oximes (aliphatic) | Functional group count descriptor | [structure image] | The presence of an aliphatic oxime group in the molecular structure might be responsible for the toxicity enhancement.55 This phenomenon is demonstrated by compounds 60 (aldicarb) (shown in Fig. 5) and 61 (aldicarb sulfone) with higher toxicity |
H-048 (+ve) | H attached to C2(sp3)/C1(sp2)/C0(sp) | Atom-centered fragment descriptor | — | This type of hydrogen atom is very reactive in nature56 and may exhibit toxicity towards rats, as shown by compounds 166 (tetrakis(hydroxymethyl)phosphonium chloride (THPC)) (displayed in Fig. 5) and 167 (tetrakis(hydroxymethyl)phosphonium sulphate (THPS)) |
MaxdssC (−ve) | Maximum dssC (maximum atom-type E-state: =C<) | Atom-type E-state descriptor | [structure image] | The negative regression coefficient of this descriptor explains its inverse relationship with toxicity, as observed in compounds 91 (1,3-dichloro-2-propanol) (given in Fig. 5) and 147 (phenylbutazone) |
MaxssssC (+ve) | Maximum ssssC (maximum atom-type E-state: >C<) | Atom-type E-state descriptor | [structure image] | This descriptor is responsible for structural complexity,57 leading to the enhancement of toxicity, as seen in compound 47 (thujone) (given in Fig. 5) |
B01[C–F] (+ve) | Presence/absence of C–F at topological distance 1 | 2D atom pair descriptor | [structure image] | Fluorine (halogen) atoms in the compound tend to increase the toxicity profile of molecules (due to the high electronegativity of fluorine),58 as shown in compounds 158 (sodium fluoroacetate) (illustrated in Fig. 5) and 112 (fluometuron) |
B05[O–S] (−ve) | Presence/absence of O–S at topological distance 5 | 2D atom pair descriptor | [structure image] | The presence of oxygen and sulfur increases the hydrophilicity of compounds due to hydrogen bonding,59 resulting in a reduction of the toxicity of the chemical compounds. This phenomenon is depicted in compounds 59 (acetoacetamide-N-sulfonic acid) (demonstrated in Fig. 5) and 79 (carmoisine) |
nCconjX (+ve) | Number of X on exo-conjugated C | Functional group count descriptor | [structure image] | This fragment enhances the electronegativity of molecules due to the presence of a halogen atom (X), thus enhancing toxicity.16,24,43 This can be explained by compounds 120 (isopropalin) (illustrated in Fig. 5) and 14 (dodecyl gallate) |
C-026 (+ve) | R-CX-R, where X represents an electronegative atom | Atom-centered fragment descriptor | [structure image] | The occurrence of an electronegative atom (P, O, S, N, Se, halogens) makes the compound more electronegative,16,24,43 which in turn enhances the toxicity of compounds, as seen in compounds 135 (octabromodiphenyl ether) (given in Fig. 5) and 150 (promethazine hydrochloride) |
The pNOEL model (IM4) | | | | |
nCrq (+ve) | Number of ring quaternary C (sp3) | Functional group count descriptor | [structure image] | This group is associated with the lipophilic profile of molecules,16,24,43 enabling easy penetration across the cell membrane and thus causing toxicity. This descriptor contributes positively towards the sub-chronic toxicity against rats, which is explained by compounds 32 (thujone) (provided in Fig. 6) and 29 (isobornyl acetate), and vice versa in compounds 12 (ethylbenzene) and 13 (2-ethylbutyric acid) |
H-051 (−ve) | Hydrogen atom attached to alpha-C atom | Atom-centered fragment descriptor | [structure image] | This fragment is associated with the polarity of the compounds.24 This descriptor has a negative correlation with the sub-chronic toxicity of compounds, as inferred from the negative value of the regression coefficient. This was evidenced in compounds 2 (acetone) (displayed in Fig. 6) and 35 (acetoacetamide) |
minssCH2 (−ve) | Minimum ssCH2 (–CH2–) | Atom-type E-state descriptor | [structure image] | The negative regression coefficient associated with minssCH2 (the minimum E-state value of a CH2 group with two single bonds (ss)) indicates a negative correlation with sub-chronic toxicity, as observed in compounds 12 (ethylbenzene) (given in Fig. 6) and 34 (acenaphthene) |
B01[C–C] (+ve) | Presence/absence of C–C at topological distance 1 | 2D atom pair descriptor | [structure image] | This fragment is correlated with the size (long chain) of molecules. Thus, the presence of these fragments may enhance the lipophilicity of the molecules (which then easily cross the cell membrane),16,24,43 ultimately increasing toxicity. This observation can be explained by compounds 41 (bentazon) and 68 (merphos) (displayed in Fig. 6) |
B07[C–C] (+ve) | Presence/absence of C–C at topological distance 7 | 2D atom pair descriptor | [structure image] | The B07[C–C] fragment is directly correlated with the lipophilicity of the molecules (which then easily cross the cell membrane),16,24,43 ultimately increasing toxicity. This phenomenon can be shown in compounds 41 (bentazon) and 68 (merphos) (displayed in Fig. 6) |
B03[C–C] (−ve) | Presence/absence of C–C at topological distance 3 | 2D atom pair descriptor | [structure image] | The presence of such fragments in the molecules reduces the toxicity,49 as shown in compounds 13 (2-ethylbutyric acid) (demonstrated in Fig. 6) and 34 (acenaphthene) |
B04[N–N] (+ve) | Presence/absence of N–N at topological distance 4 | 2D atom pair descriptor | [structure image] | Electronegative atoms (here, two nitrogen atoms), if present in the structure, may enhance the toxicity of compounds.60 This phenomenon is described in compounds 77 (m-phenylenediamine) (demonstrated in Fig. 6) and 73 (olaquindox) |
B05[C–O] (+ve) | Presence/absence of C–O at topological distance 5 | 2D atom pair descriptor | [structure image] | As discussed for B04[N–N] above. This phenomenon is described in compounds 74 (paclobutrazol) and 32 (thujone) (shown in Fig. 6) |
F01[S–P] (+ve) | Frequency of S–P at topological distance 1 | 2D atom pair descriptor | [structure image] | As discussed for B04[N–N] above. This phenomenon is described in compounds 88 (tetraethyldithiopyrophosphate) (illustrated in Fig. 6) and 68 (merphos), and vice versa in 48 (beta-cyclodextrin) and 67 (maleic anhydride) |
F02[O–O] (−ve) | Frequency of O–O at topological distance 2 | 2D atom pair descriptor | [structure image] | The presence of two electron-rich atoms may be responsible for electrostatic repulsion,61 and can thus reduce compound toxicity. This feature is inversely related to the toxicity of compounds, as explained by compounds 40 (azorubine) (presented in Fig. 6) and 48 (beta-cyclodextrin) |
Eta_alpha_A (+ve) | ETA average core count | ETA descriptor | — | The positive regression coefficient of this feature shows that with an increase in the numerical value of this descriptor, the endpoint (pNOEL value) of compounds will also increase, as in compound 49 (1,4-dibromobenzene) (shown in Fig. 6) |
nRCONR2 (−ve) | Number of tertiary amides (aliphatic) in the molecular structure | Functional group count descriptor | [structure image] | The existence of this group may reduce chemical toxicity (due to hydrophilic interaction, since H-bonds may form with N and O), as evidenced in compounds 69 (metolachlor) and 79 (propachlor) (shown in Fig. 6) |
Fig. 3 Mechanistic interpretation of the modeled descriptors against chronic toxicity (pLOEL) in rats.
Fig. 4 Mechanistic interpretation of the modeled descriptors against chronic toxicity (pNOEL) in rats.
Fig. 5 Mechanistic interpretation of the modeled descriptors against sub-chronic toxicity (pLOEL) in rats.
Fig. 6 Mechanistic interpretation of the modeled descriptors against sub-chronic toxicity (pNOEL) in rats.
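Several of the modeled descriptors are 2D atom pairs: B0k[X–Y] records the presence/absence, and F0k[X–Y] the frequency, of an X–Y pair at topological distance k. The sketch below illustrates the idea, assuming that topological distance means the number of bonds on the shortest path in the hydrogen-depleted molecular graph. The input format (a list of element symbols plus a list of bonded index pairs) and the function names are hypothetical, and descriptor software may differ in details.

```python
from collections import deque

def topological_distances(adjacency, start):
    # Breadth-first search: shortest bond count from `start` to every atom
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def pair_frequency(elements, bonds, x, y, k):
    """F-type value: number of unordered X-Y atom pairs at distance k."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    count = 0
    for i in range(len(elements)):
        for j, d in topological_distances(adjacency, i).items():
            # j > i counts each unordered pair exactly once
            if j > i and d == k and {elements[i], elements[j]} == {x, y}:
                count += 1
    return count

def pair_presence(elements, bonds, x, y, k):
    """B-type value: 1 if at least one X-Y pair sits at distance k, else 0."""
    return 1 if pair_frequency(elements, bonds, x, y, k) > 0 else 0

# Ethanol heavy-atom skeleton C-C-O (hydrogens omitted)
elements = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
print(pair_frequency(elements, bonds, "C", "O", 1))  # C-O bonded pair
print(pair_presence(elements, bonds, "C", "O", 3))   # no pair that far apart
```

Because the B-type value saturates at 1, it only signals that a structural motif exists, whereas the F-type value grows with the number of occurrences.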
For observation i, SSE is the sum of squared residuals in a model with A components, K variables, and N observations. A0 is 1 if the model was centered and 0 otherwise. DModX is reported to be approximately F-distributed, so it can be used to check whether an observation deviates significantly from a normal PLS model.26,27,43
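As a rough illustration of the quantities named above, the sketch below computes a normalized DModX in a form commonly given for PLS/PCA residual analysis: the residual standard deviation of observation i, sqrt(SSE_i/(K − A)), divided by the pooled residual standard deviation, sqrt(SSE_total/((N − A − A0)(K − A))). Whether this matches the exact expression used in the study is an assumption, and the function name and toy residual matrix are hypothetical.

```python
from math import sqrt

def normalized_dmodx(residuals, n_components, centered=True):
    """Normalized DModX per observation from an N x K matrix of X-residuals.

    Assumed form: sqrt(SSE_i / (K - A)) / sqrt(SSE_total / ((N - A - A0) * (K - A)))
    """
    n_obs = len(residuals)
    n_vars = len(residuals[0])
    a0 = 1 if centered else 0
    dof_row = n_vars - n_components
    dof_total = (n_obs - n_components - a0) * dof_row
    if dof_row <= 0 or dof_total <= 0:
        raise ValueError("not enough variables or observations for this model size")
    sse_rows = [sum(e * e for e in row) for row in residuals]
    sse_total = sum(sse_rows)
    if sse_total == 0:
        raise ValueError("all residuals are zero; DModX is undefined")
    pooled_sd = sqrt(sse_total / dof_total)
    return [sqrt(sse / dof_row) / pooled_sd for sse in sse_rows]

# Toy residual matrix: 4 observations, 3 variables, 1 component (hypothetical)
toy_residuals = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
print(normalized_dmodx(toy_residuals, n_components=1))
```

An observation whose value is well above those of the training compounds would be flagged as lying outside the model's applicability domain; the critical value itself would come from the F-distribution and is not computed here.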
Sl. no. | DrugBank ID | Generic name | Sl. no. | DrugBank ID | Generic name |
---|---|---|---|---|---|
1 | DB11768 | Zytron | 33 | DB00697 | Tizanidine |
2 | DB12267 | Brigatinib | 34 | DB00878 | Chlorhexidine |
3 | DB00845 | Clofazimine | 35 | DB01243 | Chloroxine |
4 | DB00882 | Clomifene | 36 | DB06234 | Maribavir |
5 | DB09397 | Technetium Tc-99m sulfur colloid | 37 | DB11327 | Dipyrithione |
6 | DB09225 | Zotepine | 38 | DB11632 | Opicapone |
7 | DB00251 | Terconazole | 39 | DB01149 | Nefazodone |
8 | DB09366 | Propyliodone | 40 | DB01233 | Metoclopramide |
9 | DB00239 | Oxiconazole | 41 | DB06237 | Avanafil |
10 | DB01007 | Tioconazole | 42 | DB06480 | Prucalopride |
11 | DB01110 | Miconazole | 43 | DB06155 | Rimonabant |
12 | DB01153 | Sertaconazole | 44 | DB11155 | Triclocarban |
13 | DB08943 | Isoconazole | 45 | DB09063 | Ceritinib |
14 | DB14201 | 2,2′-Dibenzothiazyl disulfide | 46 | DB11995 | Avatrombopag |
15 | DB11691 | Naldemedine | 47 | DB00235 | Milrinone |
16 | DB00373 | Timolol | 48 | DB00360 | Sapropterin |
17 | DB00539 | Toremifene | 49 | DB04864 | Huperzine A |
18 | DB00925 | Phenoxybenzamine | 50 | DB00242 | Cladribine |
19 | DB01403 | Methotrimeprazine | 51 | DB00257 | Clotrimazole |
20 | DB14881 | Oliceridine | 52 | DB00475 | Chlordiazepoxide |
21 | DB01127 | Econazole | 53 | DB00557 | Hydroxyzine |
22 | DB06708 | Lumefantrine | 54 | DB00613 | Amodiaquine |
23 | DB01167 | Itraconazole | 55 | DB00678 | Losartan |
24 | DB00431 | Lindane | 56 | DB00730 | Thiabendazole |
25 | DB00756 | Hexachlorophene | 57 | DB00748 | Carbinoxamine |
26 | DB00295 | Morphine | 58 | DB00800 | Fenoldopam |
27 | DB00844 | Nalbuphine | 59 | DB01131 | Proguanil |
28 | DB06230 | Nalmefene | 60 | DB01215 | Estazolam |
29 | DB11952 | Duvelisib | 61 | DB01608 | Periciazine |
30 | DB08604 | Triclosan | 62 | DB00327 | Hydromorphone |
31 | DB00555 | Lamotrigine | 63 | DB00704 | Naltrexone |
32 | DB00629 | Guanabenz | 64 | DB06800 | Methylnaltrexone |
Mazzatorta et al.7 reported a predictive model for 445 compounds employing multivariate analysis (multiple linear regression or MLR and linear discriminant analysis or LDA) based on two-dimensional physicochemical descriptors. Gadaleta et al.8 reported a study based on the k-NN algorithm for predicting oral sub-chronic toxicity in rats using a training set of 254 chemicals and an external set comprising 179 chemicals. de Julian-Ortiz et al.9 reported MLR and LDA models using a diverse set of 234 chemicals (LOAEL values) with graph-theoretical indices as molecular descriptors. A regression-based model was reported by Mumtaz et al.10 using rat chronic toxicity data and LOAEL as the endpoint. Hisaki et al.11 reported several QSAR models using environmental chemical toxicity data (repeated dose, developmental, and reproductive toxicities) for NOEL predictions. Toropova et al.4 reported a few regression-based QSAR models for the NOAEL (chronic toxicity) calculation using the Monte Carlo technique. Pradeep et al.12 reported several machine learning-based models (mainly k-nearest neighbors, support vector machine, random forest, and gradient boosting regression) using chronic, sub-chronic, and sub-acute toxicity data. Comparisons with previously reported studies (models) with validation metrics are provided in Table 4.
Work | Model | Endpoint | R² (internal) | Q²(LOO) (internal) | MAE(LOO) (internal) | R²pred or Q²(F1) (external) | Q²(F2) (external) | MAE(test) (external) |
---|---|---|---|---|---|---|---|---|
Present work (regression-based) | IM1 (PLS) | LOEL_CHRONIC (≥360 days) | 0.673 | 0.604 | 0.575 | 0.618 | 0.606 | 0.559 |
Present work (regression-based) | IM2 (PLS) | NOEL_CHRONIC (≥360 days) | 0.711 | 0.635 | 0.545 | 0.658 | 0.598 | 0.542 |
Present work (regression-based) | IM3 (PLS) | LOEL_SUB-CHRONIC (180 ± 90 days) | 0.607 | 0.547 | 0.464 | 0.562 | 0.537 | 0.546 |
Present work (regression-based) | IM4 (PLS) | NOEL_SUB-CHRONIC (180 ± 90 days) | 0.634 | 0.518 | 0.639 | 0.523 | 0.500 | 0.730 |
Mazzatorta et al.7 | QSAR models | Chronic_LOAELs | 0.54 | — | — | — | — | — |
Gadaleta et al.8 | k-NN algorithm | Sub-chronic_LOAELs | ≥0.543 | ≥0.632 | — | — | — | — |
de Julian-Ortiz et al.9 | MLR and LDA | Chronic_LOAELs | 0.524 (whole set) | — | — | — | — | — |
Mumtaz et al.10 | Regression model | Chronic_(LOAELs) | — | — | — | 0.84 | — | — |
Hisaki et al.11 | QSAR models | Developmental, and reproductive (NOELs) | — | — | — | — | — | — |
Toropova et al.4 | Monte Carlo technique_regression models | NOAELs | 0.679–0.718 | 0.672–0.712 | — | 0.610–0.627 | — | — |
Pradeep et al.12 | Several machine learning algorithms including k nearest neighbors, support vector machines, random forests, and gradient-boosting regression | Chronic, subchronic, reproductive, developmental, subacute (LEL/LOEL/LOAEL and NEL/NOEL/NOAEL) | (−0.19)–0.54 | — | — | (−0.09)–0.57 | — | — |
In this study, we developed QSAR models for the risk assessment of chronic (more than 360 days) and sub-chronic (180 ± 90 days) toxicity in rats and mice, using the LOEL and NOEL as the endpoints and strictly following the OECD guidelines. The models were built on a large curated dataset of diverse chemicals such as pharmaceuticals, industrial waste compounds, food, agricultural and natural chemicals, and compounds meant for daily use. We considered a higher number of compounds than those considered in previously reported models. We used the genetic algorithm as the descriptor thinning method to extract the structural features that are important for the endpoints. We interpreted the models and identified the structural features that increase chronic toxicity as well as those that reduce it. The internal and external validation metrics of the developed PLS models suggest that the models are reliable, predictive, and mechanistically interpretable, with a wide domain of applicability representing diverse groups of chemicals compared to previous works. It can be inferred that lipophilicity, electronegativity, aromatic ether or aliphatic oxime groups, structural complexity, unsaturation, and the presence of halogens and heavy atoms (phosphorus, sulphur, etc.) are responsible for chronic or sub-chronic toxicity, whereas polar and hydroxyl groups (hydrophilic properties) can reduce it. Therefore, this information should be useful for the development of safer and greener chemicals that help maintain biodiversity. The validated models may be employed for the screening and prioritization of chemicals, pharmaceuticals, and other compounds within the chemical space of the developed models, for screening chemical databases, and for data-gap filling.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3va00265a
This journal is © The Royal Society of Chemistry 2024