Open Access Article
Longzhu Q.
Shen
a,
Fjodor
Melnikov
a,
John
Roethle
b,
Aditya
Gudibanda
c,
Richard S.
Judson
d,
Julie B.
Zimmerman
ae and
Paul T.
Anastas
*ab
aSchool of Forestry and Environmental Studies, New Haven, CT 06511, USA. E-mail: paul.anastas@yale.edu; Fax: +1-203-436-8574; Tel: +1-203-432-5215
bDepartment of Chemistry, Yale University, New Haven, CT 06511, USA
cDepartment of Computer Science, Yale University, New Haven, CT 06511, USA
dU.S. EPA, National Center for Computational Toxicology, RTP NC 27711, USA
eDepartment of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA
First published on 16th September 2016
The NRF2-ARE antioxidant pathway is an important biological sensing and regulating system that responds to xenochemicals. NRF2 senses chemically-caused production of reactive oxygen species (ROS) and electrophilic interactions with chemical species. Upon NRF2 activation, the expression of a wide array of genes will be upregulated to counteract oxidative or electrophilic insults. However, when the external disruption exceeds the inherent resilience of the biological system, cellular damage can occur, eventually leading to cytotoxicity. Induced NRF2 activity in in vitro assays is therefore a signal that a man-made chemical may cause unwanted cellular activity. This was the motivation to derive a chemical design strategy to minimize the risk that new chemicals would perturb this pathway. We constructed a logistic regression model using design variables derived from density functional theory (DFT) calculations and physical properties. The model showed excellent predictive power to distinguish between NRF2-active and inactive chemicals based on the EPA ToxCast high throughput screen (HTS) assay data (tested in the concentration range of 10−3–102 μM). External evaluation showed that the area under the curve (AUC) for the receiver operating characteristic (ROC) of the model is 0.81 and the precision is 0.90. Combining this model with a previously developed cytotoxicity model, we developed a probabilistic design diagram to guide chemical design with the twin goals of minimizing NRF2 antioxidant pathway activity and cytotoxicity. This work initiated a simultaneous design strategy against two toxicity pathways of interest in molecular design research.
Three critical components are needed to build a predictive model of NRF2 activity to be used in green molecular design. First, we need a reliable biological assay that probes the chemically induced activation of the NRF2-ARE pathway. In vitro high throughput screening (HTS) methods answer this need. HTS is a cost-effective technology to examine how chemicals disrupt biological pathways and hence potentially lead to adverse health outcomes. It embodies a paradigm shift in toxicology from using in vivo to using in vitro and in silico testing approaches, as described by the U.S. National Research Council (NRC) in their report on Toxicity Testing in the 21st Century.22,23 In order to evaluate practical approaches to implement the “Tox21” vision, the U.S. National Toxicological Program (NTP), the U.S. Environmental Protection Agency (EPA) and the NIH National Center for Advancing Translational Sciences (NCATS) collaboratively forged a collaborative research partnership. This Tox21 partnership is using HTS methods to test thousands of chemicals in a wide variety of cells, pathways and technologies, relevant to many aspects of chemical toxicity.24–29 The NRF2-ARE antioxidant pathway has been probed using the ToxCast/Tox21 assay battery. Second, we need to construct a design variable space with physically meaningful and accurately calculable parameters. Physical observables and chemically intuitive variables are preferred because of their accessibility from experimental measurements. Additionally, these design variables should also encapsulate mechanistic information regarding the molecular initiating events that lead to the toxicity endpoint of interest. Thirdly, the predictive model needs to be built so that the design variable space can be made searchable and the inverse function of the model can be presented in a probabilistic and complete manner. This requires a careful design of the modeling algorithm.
Previously, acute aquatic toxicity,30,31 mutagenicity/carcinogenicity32 and cytotoxity33 have been investigated as single target toxicity endpoints to derive green molecular design strategies. In this report, we present for the first time in molecular design research a coupled design diagram that simultaneously considers two design goals, in this case the minimization of both cytotoxity and the NRF2-ARE pathway perturbation. This attempt is motivated by the significance that certain chemicals can induce one toxic response without the other. With the progress of molecular design research, it is anticipated that more and more models can and will be linked together to provide a comprehensive design strategy.
ATG_NRF2 is part of a multiplexed assay using an inducible reporter that measures mRNA for gain-of-signal activity at the transcription factor-level as they relate to the gene NFE2L2. This is one of 52 assay component measured or calculated from the ATG_CIS assay. Activity is detected with fluorescence intensity signals by reverse transcription polymerase chain reaction (RT-PCR) and capillary electrophoresis technology. Activity is quantified by the level of mRNA reporter sequence unique to the cis-acting reporter gene response element ARE, which is responsive to the endogenous human nuclear factor, erythroid 2-like 2 [GeneSymbol:NFE2L2|GeneID:4780|Uniprot_SwissProt_Accession:Q16236]. The assay is cell-based and uses HepG2, a human liver cell line, with measurements taken at 24 hours after chemical dosing in a 24-well plate.35
Tox21_ARE uses an inducible reporter to measure gain-of-signal activity related to the gene NFE2L2. The assay measures beta-lactamase induction, as detected with fluorescence intensity signals by GAL4 beta-lactamase reporter gene technology. Changes to fluorescence intensity signals produced from an enzymatic reaction involving the key substrate [CCF4-AM] are indicative of changes in transcriptional gene expression due to agonist activity regulated by NRF2. The assay is cell-based, producing a single-readout that uses HepG2, with measurements taken at 24 hours after chemical dosing in a 1536-well plate.29
The two assays were tested on chemicals with similar concentration ranges, approximately 10−3–102 μM (see Fig. S1 in ESI†). Consequently, our predictive model is strictly applied to the likelihood for chemicals incurring biological responses over this particular concentration range. The testing outcome concordance between these two assays is 0.73 (see Fig. S2 in ESI†). Cytotoxicity was also examined in the ToxCast data. Assay data related to cytotoxicity were used to construct a cytotoxicity model that was linked to the NRF2-ARE toxicity pathway model. Detailed description of the cytotoxicity model is provided in our recent publication.33
1. Molecular weight < 1000. The bioavailability for chemicals with molecular weight greater than 1000 is negligible.36
2. Single compound with a definite structure, excluding geometrical and optical isomers and mixtures. The molecular mechanisms for mixtures can be complex, and are not considered here.
3. Containing no metal elements. The calculation of electronic properties of metals is significantly more challenging than for organics, so the focus here is on the latter.
4. Concordant testing outcomes between ATG_NRF2 and Tox21_ARE assays (both positive or both negative). Concordance is required because both assays have false positives and false negatives, so chemicals with concordant behavior are more likely to be correctly assigned.37
A total of 1261 chemicals satisfied these four criteria. Binary hit calls provided in the ToxCast release34 were used to categorize chemicals into two classes: chemicals detected to be not perturbing the NRF2-ARE toxicity pathway (inactive) and otherwise (active). The chemical membership distribution is biased towards the inactive chemicals, which account for ∼72% of the total sample population.
All chemical data were evenly split into training and test sets with the same ratio between active and inactive chemicals for cross-validation and external evaluation. The first treatment of the chemical structures is to remove counter-ions from the molecules using the open-source OpenBabel chemistry toolbox.38 Subsequently, 3D structures with the lowest energy conformation were generated using the ChemAxon Marvin calculator plugin.39
P (log water–octanol partition coefficient). log
P was calculated using the ChemAxon Marvin calculator plugin.39 The remaining variables were computed based on DFT implemented in Gaussian 09 rev.D.01.40 Boese and Martin's τ-dependent hybrid functional41 and basis set 6-31+G(d) were used for full geometry optimization. Vertical IP and electron affinity (EA) were calculated in vacuum. The species with an extra electron were then transferred into the implicit aqueous environment based on a universal solvation model42 to obtain vertical EA.aq. SOF and EPH were derived according the following formulas.43| SOF = 1/(IP − EA) | (1) |
| EPH = (IP + EA)2/8(IP − EA) | (2) |
Radical stability energy (RSE) is defined as the reaction energy for the reaction:
![]() | (3) |
The numerical distribution of each design variable is plotted in ESI Fig. S3.†
P has been frequently used to estimate such likelihood and has shown acceptable predictivity in quantitative structure–activity relationship (QSARs) of acute aquatic toxicity.46 Subsequent to molecular transport into a cell, xenobiotics proceed to interact with biological targets. This process is driven at least partially by induced electric dipole moments and dispersion forces.47 Polarizability (PLRZ) is a physical quantity that describes the relative tendency towards molecular electron cloud distortion under the influence of an external electric field. This reflects the energy alteration during a molecular recognition process. This provides the physical basis for including PLRZ in the predictive model.
The last variable to mention is radical stability energy (RSE). It approximates the degree to which a chemical is liable to form reactive radicals, which is the trigger for releasing the NRF2–Keap1 interaction. However, the low predictive power of this variable in the model suggests that direct radical activation is not the dominant mechanism of NRF2 activation in chemicals included in this study. It is consistent with the fact that HepG2 cell lines do not possess full metabolic activities48–50 and that biotransformation is an important avenue for generating reactive radicals.51,52
![]() | (4) |
P) were selected based on LASSO regularization54 for the predictive model construction. The logistic regression model was trained on ∼600 chemicals with 10-fold cross validation, i.e. chemicals were randomly partitioned into 10 equal sized groups and a single group is retained as validation set for the model built upon the other nine groups in rotation until every group has been tested. Then the established model was externally tested on the remaining ∼600 chemicals.
Model construction, data analysis, graphics and design diagrams in this study were coded using the Python programming language and the R statistical computing environment.55–67
| precision = truepositive/(truepositive + falsepositive) | (5) |
| accuracy = (truepositive + truenegative)/all | (6) |
Because we are interested in producing ‘safer’ chemicals in the design process, we define chemicals that do not perturb the NRF2-ARE antioxidant pathway as positive in this study. Therefore, type I error is the focus of interest from the green design perspective. Precision measures the probability that a chemical will be a true positive relative to the fraction of chemicals that are predicted to be positive. This probability is of high interest to green chemists. It should be maximized for the purpose of safer chemical design. Accuracy is a measure of the balanced performance of the model. Given that the overall rate for a chemical to be inactive against the NRF2-ARE pathway is imbalanced, the ROC AUC (equivalent to the balanced accuracy) is a more robust measure for the overall classification accuracy of the predictive model. The model performance is summarized in Table 1.
| Measure | Precision | Accuracy | ROC AUC |
|---|---|---|---|
| Cross validation | 0.86 ± 0.04 | 0.80 ± 0.03 | 0.81 ± 0.04 |
| External evaluation | 0.90 | 0.78 | 0.81 |
From Table 1, one can see that three performance indicators are in reasonable agreement. Additionally, the cross-validation and external evaluation results agree well with one other. This suggests that the classifier has reasonable forward predictive performance and can be used for deriving green design probabilities for novel chemicals.
| NRF2-ARE | NRF2-ARE | ||
|---|---|---|---|
| Positive | Negative | ||
| Cytotox | Positive | 209 (30.0%) | 99 (14.3%) |
| Cytotox | Negative | 23 (3.3%) | 364 (52.4%) |
Mechanistically, the NRF2-ARE pathway activation can be triggered by either covalent or redox modifications of the thiol groups on Keap1.15,16 Cytotoxicity can be induced by multiple causes including cell membrane disruption, activation of specific signaling pathways or DNA modification.70,71 Given the multitude of mechanisms of cytotoxicity, it is of interest to investigate the differences between chemicals that induce only one or both of NRF2-ARE activity and cytotoxicity. The following statistical discussion will focus on the chemicals that only cause cytotoxicity (cytotox group, 99 chemicals) or incur both cytotoxicity and NRF2-ARE response (cytotox + ARE group, 209 chemicals).
Table 3 shows the calculated ROC AUC72,73 and p-values from a Mann–Whitney U test74 for the three design variables (SOF, PLRZ and log
P) used in the logistic model. In accordance with the Mann–Whitney U test, SOF strongly suggests that the two groups of chemicals (cytotox and cytotox + ARE group) are drawn from two unequivalent distributions. This is consistent with the ROC AUC result, where SOF exhibits the highest discriminative power among the three design variables. The overlaid SOF histogram (Fig. 1) further illustrates that the distributions of the two chemical groups are staggered, with the cytotox group slightly shifted to the left. Based on the analysis, it follows that the two groups of chemicals have differential selectivity toward invoking the NRF2-ARE pathway and cytotoxicity. The distributions are not totally separated, which is an indication that SOF is not solely responsible for separating the two groups.
| Design variable | ROC AUC | p-Value for Mann–Whitney U test |
|---|---|---|
| SOF | 0.64 | 2.7 × 10−5 |
| PLRZ | 0.53 | 0.19 |
log P |
0.55 | 0.04 |
Many processes leading to toxicity are driven by chemical reactivity toward critical biological targets including DNA, enzymes, lipids and others, and thus should be able to be described by chemical design variables. In the current case, SOF was found to be helpful in discriminating between the cytotox and cytotox + ARE chemical groups. This can be understood from Pearson's HSAB theory75—acid–base reactions preferentially occur between chemicals with a similar level of softness/hardness. Since covalent bond formation plays a crucial role in chemical initiation of cytotoxicity and the NRF2-ARE pathways, it is thus plausible to expect certain discriminative power from SOF in distinguishing the two classes of interest. In a biochemical interaction, active functional groups on proteins and DNA normally act as nucleophiles, while xenobiotics often act as electrophiles.76,77 The sulfhydryl groups on protein cysteine residues are generally softer than the amino and hydroxy groups on nucleic acids.78 In rapidly dividing cells, cytotoxicity has been attributed to chemical modification of DNA.79,80
Interactions between chemicals and critical proteins can also lead to cytotoxicity.45 The thiol group is an important functional unit on many proteins, and is responsible for sensing a variety of the chemical signals. The NRF2 activation mechanism is an example of the thiol group functioning as a molecular switch.81 Therefore, it is reasonable that chemicals with a preference to induce only cytotoxicity may initiate this event primarily through reacting with DNA, which is a ‘harder–harder’ interaction. On the other hand, chemicals that cause both cytotoxicity and NRF2-ARE activity are more likely to achieve this effect through modifying thiol groups,81 which is a ‘softer–softer’ interaction. Thus, it is observed that the SOF distribution for the cytotox chemical group is shifted to the left (harder) compared to the cytotox + ARE group.
![]() | ||
| Fig. 2 Coupled nomograms for NRF2-ARE activity and cytotoxicity.33 P(non-NRF2-ARE): probability of not activating NRF2-ARE (purple). P(non-cytotox): probability of not inducing cytotoxicity (blue). The benign probability axis is colored while other design variables axes are printed black. R1 indicates an auxiliary axis. The numerical range of each design variable axis are calculated using 5 to 95 percentiles of numerical data to produce probabilities corresponding to the range of the benign probability axis. Benign probability above 50% is considered for the green molecular design purpose. | ||
Given the high concordance between cytotoxicity and NRF2-ARE activation, one would expect to observe a degree of similarity between the two diagrams. However, its should be kept in mind that the concordance matrix (Table 2) possesses non-zero off-diagonal elements. This means that the predicted probabilities may vary between the two toxicity endpoints. To obtain a design solution with a target probability of not stimulating the NRF2-ARE pathway and not causing cytotoxicity, one can simply start from desired probabilities for each endpoint and work out values for design variables on the diagrams. A sample solution is given in Fig. 2, using dotted lines. In this example, suppose we have planned probabilities for a chemical not perturbing the NRF2-ARE pathway (73%) and the probability of not incurring cytotoxicity (58%). We start by identifying the positions of these two probabilities on their corresponding probability axes and link them with a dotted line. Next, we put dots on the desired values for softness (0.12) and log
P (2.9) on the design variable axes because those values fit into the feasible design space. Then we connect those dots on the design variable axes with the dots on the probability axes and we arrive at the values for polarizability. This case study shows the different probabilities for non-cytotoxic and non-NRF2-ARE activity for the same set of values of the design variables. Of course, users have the freedom to choose any three fixed points on either diagram to determine the last unknown parameter (either a design variable or a probability). One can also work on one side of the diagram first to determine the benign probability and then to check its probability of not invoking the other endpoint by filling those numbers on the other half of the diagram.
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: 10.1039/C6GC02073A |
| This journal is © The Royal Society of Chemistry 2016 |