TRIumph in nanotoxicology: simplifying transcriptomics into a single predictive variable

Viacheslav Muratov; Karolina Jagiello; Tomasz Puzyn

doi:10.1039/D5NH00330J

DOI: 10.1039/D5NH00330J (Communication) Nanoscale Horiz., 2025, 10, 3116-3126

Viacheslav Muratov ^a, Karolina Jagiello *^ab and Tomasz Puzyn *^ab
^aUniversity of Gdansk, Faculty of Chemistry, Laboratory of Environmental Chemoinformatics, Wita Stwosza 63, 80-308 Gdansk, Poland. E-mail: karolina.jagiello@ug.edu.pl; tomasz.puzyn@ug.edu.pl
^bQSAR Lab Ltd., Trzy lipy 3, 80-172 Gdansk, Poland

Received 12th May 2025 , Accepted 26th August 2025

First published on 28th August 2025

Abstract

The primary aim of our study was to address the complexity of transcriptomic data by introducing a novel transcriptomic response index (TRI), which compresses the entire transcriptomic space into a single variable, and by linking it with the properties of inhaled multiwalled carbon nanotubes (MWCNTs). This methodology allows us to predict fold change values of thousands of differentially expressed genes (DEGs) using a single variable and a single quantitative structure–activity relationship (QSAR) model. In this work, TRI compressed 5167 DEGs into a single variable, capturing 99.9% of the variance in the transcriptomic space. TRI was then linked to the properties of inhaled MWCNTs using a nano-QSAR model with R² = 0.83, Q_CV² = 0.8, and Q² = 0.78, which indicate a high level of goodness-of-fit, robustness, and predictability. By training a nano-QSAR model on fold changes of thousands of DEGs using a single variable, our study contributes to new approach methodologies (NAMs) focused on reducing animal testing and also decreases the computational resources needed to work with complex transcriptomic data. The ChemBioML Platform (https://chembioml.com), software developed during this work, offers researchers a powerful free-to-use tool for training regulatorily acceptable machine learning (ML) models without a strong background in programming. The ChemBioML Platform integrates the ML capabilities of Python with the advanced graphical interface of Unreal Engine 5, creating a bridge between scientific research and the game development industry.

New concepts

We introduce the transcriptomic response index (TRI) – a novel, biologically meaningful metric that compresses complex, high-dimensional gene expression data into a single predictive variable. This approach represents a conceptual breakthrough in the field of nanotoxicogenomics, enabling a direct, quantitative link between the structural properties of nanomaterials and global transcriptomic effects in exposed organisms. In contrast to existing models that either rely on isolated biomarkers or attempt to predict thousands of individual gene responses separately, TRI captures the entire transcriptomic variance (99.9%) in one interpretable parameter. This allows us to construct a robust, regulatorily compliant nano-QSAR model based on standard molecular descriptors (SiRMS), making transcriptomics scalable, reproducible, and predictive. This work uniquely bridges systems biology, chemoinformatics, and machine learning, offering a new paradigm for evaluating nanomaterial-induced biological perturbations. It further advances the principles of new approach methodologies (NAMs) by enabling mechanism-based hazard prediction without the need for animal testing or deep computational resources. The TRI concept opens new directions in nanoinformatics, particularly for safe-by-design strategies, high-throughput screening, and regulatory decision-making in nanotechnology.

Introduction

Given the structural similarities between multiwalled carbon nanotubes (MWCNTs) and asbestos fibers, coupled with their bio-persistence in lung tissue, concerns have been raised about the potential health effects of MWCNT exposure.¹ MWCNTs have been demonstrated to exhibit significant toxicity, with numerous studies indicating adverse health outcomes, particularly in the pulmonary system.^2,3 Inhalation of MWCNTs has been shown to cause sustained inflammation, fibrosis, and lung cancer.^2,3 The fibrogenic activity of MWCNTs has been linked to direct effects on fibroblasts, promoting their proliferation, differentiation, and collagen production.² Additionally, in animal studies, inhalation of MWCNTs has led to fetal malformations, fetal loss, behavioral changes in offspring, and delays in litter delivery.³

Although the adverse effects of MWCNTs are well-documented, the molecular mechanisms driving these effects remain largely unknown. In recent studies, researchers have started applying toxicogenomic approaches to close this gap and investigate the pulmonary toxicity of MWCNTs.^4–8 It was shown that MWCNTs can induce gene expression changes associated with lung fibrosis and inflammation, comparable to those caused by known fibrogenic agents like bleomycin and bacterial infections.⁴ Recent research highlights the potential of gene expression profiling in predicting the pulmonary toxicity of new nanomaterials,^4,5 offering a promising methodology for future research and risk assessment. Building on this genomic insight, advanced computational methods have emerged as powerful tools for correlating material properties with biological outcomes. For instance, Jagiello et al. established a methodology for constructing quantitative structure–activity relationship (QSAR) models based on canonical pathways. The authors proposed a grouping strategy for MWCNTs based on the aspect ratio affecting genes associated with specific pathways.⁶ Following this, Merugu et al. developed a QSAR model to predict a canonical pathway that is induced by MWCNTs and triggers lung fibrosis and atherosclerosis, both initiated through an acute immune response.⁷ Fortino et al. highlight the importance of these findings because MWCNTs cause pulmonary damage in the body after intratracheal instillation.⁸

In parallel, perturbation-based machine learning (PTML) modeling has gained traction as a complementary strategy for toxicity prediction. PTML offers a high degree of interpretability and supports the simultaneous modeling of multiple toxicological endpoints across diverse biological systems, including cells, microbes, and animal models. This flexibility is especially advantageous in nanotoxicology, where experimental conditions often vary and biological responses are heterogeneous.^9–12 By integrating diverse datasets, PTML enables multi-endpoint predictions that improve both the efficiency of hazard assessments and the prioritization of materials for further testing. Moreover, its potential to support regulatory decision-making through transparent and mechanistically informed predictions positions PTML as a valuable asset in the next generation of nanotoxicological research.

The objective of our study is to investigate the potential of combining supervised learning methods, such as ridge regression, with unsupervised methods, like principal component analysis (PCA), to generate comprehensive and accurate predictions for the thousands of differentially expressed genes (DEGs) affected by MWCNT exposure in mouse lungs. In contrast to previous approaches, such as the work by Jagiello et al. and Merugu et al., which linked MWCNT properties with the benchmark dose lower confidence limit (BMDL) of a single transcriptomic pathway,^6,7 here, we introduce a novel transcriptomic response index (TRI) derived from the fold change (FC) values of thousands of DEGs. By compressing the high-dimensional transcriptomic space into a single, informative variable, TRI measures all gene perturbations induced by MWCNT exposure. Biologically, a higher TRI value reflects a more pronounced deviation from baseline gene expression, which may correlate with the activation of key toxicological processes such as inflammation, fibrosis, and oxidative stress. We introduce a two-tiered analytical strategy to calculate TRI and link it to the exposure conditions. First, unsupervised PCA compresses the complex transcriptomic data into a comprehensive index (TRI) that describes overall gene expression changes. Subsequently, a supervised ridge regression model is used to correlate the structural properties of the MWCNTs, quantified using simplex representation of molecular structure (SiRMS) descriptors, with the TRI. Although SiRMS descriptors are traditionally applied to small molecules,^13–17 they have also been successfully adapted to predict metal oxide toxicity.¹⁸ Our study is the first to leverage the full version of SiRMS descriptors to predict the transcriptomic response to MWCNT exposure in mouse lungs.

To ensure the reliability and robustness of our predictive models, we developed the ChemBioML Platform – an integrated machine learning software that features robust data preprocessing, a self-optimizing algorithm for selecting optimal descriptors and hyperparameters, and OECD-compatible validation techniques.¹⁹ This platform is designed to generate accurate, reproducible results that satisfy regulatory and academic standards for QSAR and other ML model evaluations. Thus, our work advances scientific knowledge on the molecular mechanisms underlying MWCNT-induced lung toxicity and offers a practical tool for the academic community. This manuscript and the accompanying GitHub repository (https://chembioml.com, https://github.com/chembiodev/ChemBioML-Platform) provide a robust platform for training ML models that can elucidate the complex molecular interactions following MWCNT exposure.

Experimental

Data collection

The lung transcriptomic data from mice exposed to MWCNTs with varying properties (e.g., size, functionalization) have been previously published.^20,21 As this study uses an already published dataset, we recommend referring to the original publications for detailed experimental procedures. Here, only a brief summary is provided. Gene expression profiles were obtained from lung tissues of 5–7-week-old female C57BL/6 mice exposed intratracheally to nine different MWCNTs with varying properties. Samples were collected on one post-exposure day. Each MWCNT was administered at doses of 18 μg per mouse and 54 μg per mouse; additionally, NM-401 and NRCWE-26 were tested at a higher dose of 162 μg per mouse. Table 1 summarizes the names and key characteristics of the MWCNTs studied (e.g., size, concentration). We have used structurally modified MWCNTs with three different groups (–NH₂, –COOH, –OH), in addition to pristine MWCNTs. However, it is important to note that no additional external coatings or surface layers were applied to the MWCNTs in this work; thus, they should be regarded as uncoated nanomaterials. Comprehensive details on sample collection, animal exposure, and transcriptomic analysis are available in the original studies.^20–25 The exposure levels were selected based on the Danish occupational exposure limit of 3.5 mg m⁻³ for Printex90 carbon black particles.²⁶

Table 1 MWCNT types, their physicochemical characteristics, and exposure conditions

#	MWCNT name	Concentration (μg per mouse)	Length (nm)	Diameter (nm)	Aspect ratio	Functionalization
NP_1	NM-401	18	4050 (±2400)	67 (±26.2)	61	Pristine
NP_2	NM-401	54	4050 (±2400)	67 (±26.2)	61	Pristine
NP_3	NM-401	162	4050 (±2400)	67 (±26.2)	61	Pristine
NP_4	NRCWE-26	18	850 (±457)	11 (±4.5)	77	Pristine
NP_5	NRCWE-26	54	850 (±457)	11 (±4.5)	77	Pristine
NP_6	NRCWE-26	162	850 (±457)	11 (±4.5)	77	Pristine
NP_7	NRCWE-043	18	771.3 (±3471)	26.73 (±6.88)	29	Pristine
NP_8	NRCWE-043	54	771.3 (±3471)	26.73 (±6.88)	29	Pristine
NP_9	NRCWE-044	18	1330 (±2454)	32.55 (±14.44)	41	–OH
NP_10	NRCWE-044	54	1330 (±2454)	32.55 (±14.44)	41	–OH
NP_11	NRCWE-045	18	1553 (±2954)	28.07 (±13.85)	55	–COOH
NP_12	NRCWE-045	54	1553 (±2954)	28.07 (±13.85)	55	–COOH
NP_13	NRCWE-046	18	717.2 (±1214)	17.22 (±5.77)	42	Pristine
NP_14	NRCWE-046	54	717.2 (±1214)	17.22 (±5.77)	42	Pristine
NP_15	NRCWE-047	18	532.5 (±591.9)	12.96 (±4.44)	41	–OH
NP_16	NRCWE-047	54	532.5 (±591.9)	12.96 (±4.44)	41	–OH
NP_17	NRCWE-048	18	1604 (±5609)	15.08 (±4.69)	106	–COOH
NP_18	NRCWE-048	54	1604 (±5609)	15.08 (±4.69)	106	–COOH
NP_19	NRCWE-049	18	731.1 (±1473)	13.85 (±6.09)	53	–NH₂
NP_20	NRCWE-049	54	731.1 (±1473)	13.85 (±6.09)	53	–NH₂

To create a dataset for our study, we filtered differentially expressed genes (DEGs) from two studies^20,21 using the criteria of p-value ≤ 0.05 and an absolute FC ≥ 1.5.^21,22,27–30 DEGs that changed significantly in response to at least one MWCNT were retained. Although a stricter threshold, such as |FC| ≥ 2, has been used in other studies,^31,32 we opted for |FC| ≥ 1.5 to remain consistent with the analysis approach used in the original datasets.^20,21 This ensured methodological alignment with the initial design of the transcriptomic data. This process resulted in a curated dataset containing 5167 DEGs with post-exposure expression data. For each of the 9 MWCNTs, gene expression was measured under 2 or 3 dosing conditions (18 μg per mouse and 54 μg per mouse for all MWCNTs, with NM-401 and NRCWE-26 additionally tested at 162 μg per mouse), yielding a total of 20 distinct experimental conditions (see Table S1).
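The filtering rule described above can be illustrated with a minimal sketch. It assumes the data are held as two gene-by-condition arrays (signed fold changes and p-values); the function name `filter_degs` and the toy values are hypothetical and do not come from the original datasets.

```python
import numpy as np

def filter_degs(fold_change, p_value, fc_cut=1.5, p_cut=0.05):
    """Return a boolean mask of genes that pass both thresholds
    in at least one exposure condition.

    fold_change, p_value: arrays of shape (n_genes, n_conditions).
    Assumption: fold_change is a signed fold change, so |FC| is compared.
    """
    fold_change = np.asarray(fold_change, dtype=float)
    p_value = np.asarray(p_value, dtype=float)
    significant = (np.abs(fold_change) >= fc_cut) & (p_value <= p_cut)
    return significant.any(axis=1)

# Toy example: 3 genes measured under 2 conditions (made-up numbers).
fc = np.array([[1.2, -1.7],   # passes in the second condition
               [2.4,  1.1],   # large FC, but p-value too high
               [1.0,  1.0]])  # never changes
p = np.array([[0.30, 0.01],
              [0.20, 0.60],
              [0.90, 0.80]])
mask = filter_degs(fc, p)
print(mask)
```

Only the first toy gene is retained, because it is the only one that satisfies both criteria in the same condition.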

Unsupervised machine learning methods applied to TRI calculation

To reduce the dimensionality of the dataset containing all DEGs, we applied principal component analysis (PCA).^33,34 This method transforms correlated genes into a set of orthogonal variables, known as principal components (PCs), which are ranked by the amount of variance they capture from the data. The first principal component (PC1) captures the greatest variance in the data, with each subsequent component capturing progressively less. This process simplifies high-dimensional data, making it easier to visualize and analyze.³⁵ After performing PCA on the transcriptomic dataset, we obtained a set of PCs that capture independent dimensions of variability in the gene expression data. Each PC represents a distinct pattern of transcriptomic response, with the first few PCs explaining the majority of variance and subsequent PCs capturing more subtle yet potentially informative variations. By weighting each PC by its explained variance and summing over all PCs, the TRI integrates both dominant and nuanced expression patterns into a single composite metric. The calculation follows eqn (1), where n represents the total number of PCs. This weighted approach provides insight into the significance of each PC in determining the overall transcriptomic response.


TRI = PC1 × Variance₁ + PC2 × Variance₂ + ⋯ + PCn × Variance_n	(1)
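Eqn (1) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the ChemBioML implementation: Variance_k is interpreted here as the fraction of variance explained by PC k, PCA is performed by singular value decomposition of the gene-centred matrix, and the input is synthetic. The function name `compute_tri` is hypothetical.

```python
import numpy as np

def compute_tri(X):
    """X: (n_conditions, n_genes) fold-change matrix.

    Returns the TRI per condition, the PC scores, and the
    explained-variance fraction of each PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # column k holds the PC(k+1) scores
    ratio = s**2 / np.sum(s**2)                  # assumption: Variance_k as a fraction
    tri = scores @ ratio                         # eqn (1): sum_k PC_k * Variance_k
    return tri, scores, ratio

# Synthetic stand-in for the real data: 20 conditions, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
tri, scores, ratio = compute_tri(X)
print(tri.shape, round(float(ratio.sum()), 6))
```

One caveat worth keeping in mind: the sign of each principal component is arbitrary in PCA, so the sign convention used by the PCA routine affects the numerical TRI values and should be fixed before TRI values from different runs are compared.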

Descriptors for carbon-based nanomaterials

In this study, we employed the SiRMS method to numerically characterize the investigated MWCNTs. SiRMS is a chemoinformatic tool widely applied in QSAR modeling to describe and analyze molecular structures, with a strong focus on stereochemistry. It represents molecules as sets of simplex descriptors (tetratomic fragments defined by fixed composition, structure, chirality, and symmetry), effectively capturing the chemical properties of atoms and their interactions.^13–18 Integrating diverse physicochemical properties of atoms, such as electronic charge and lipophilicity, enhances the interpretability of QSAR models and provides comprehensive insights into molecular interactions after MWCNT exposure.^36–38

For our calculations, we applied the open-source SiRMS Python package, available on GitHub [https://github.com/DrrDom/sirms]. This package allows weighting descriptors by various physicochemical properties, including electronic charge, lipophilicity (LogP), refractivity, and hydrogen bonding.^36–38 A concentration-dependent weighting of the simplex descriptors was applied to account for the varying concentrations of MWCNTs in the experiments (18 μg, 54 μg, and 162 μg per mouse). Specifically, we used the 18 μg dose as a baseline for concentration-dependent weighting. Since 54 μg is three times 18 μg and 162 μg is nine times 18 μg, the corresponding descriptors were multiplied by factors of 3 and 9, respectively. This concentration-weighting approach integrates the concentration of inhaled MWCNTs into the molecular descriptors, ensuring that they reflect the dose-dependent variations in the organism's reaction.
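The concentration weighting amounts to multiplying each descriptor vector by the ratio of the administered dose to the 18 μg baseline. A minimal sketch follows; the descriptor values and the helper name `dose_weight` are hypothetical.

```python
import numpy as np

BASELINE_DOSE_UG = 18.0  # baseline dose per mouse, as stated in the text

def dose_weight(descriptors, dose_ug):
    """Scale simplex descriptor values by dose / baseline dose."""
    factor = dose_ug / BASELINE_DOSE_UG
    return np.asarray(descriptors, dtype=float) * factor

# Hypothetical descriptor values for one MWCNT at the baseline dose.
base = np.array([4.0, 0.0, 2.5])
for dose in (18, 54, 162):
    print(dose, dose_weight(base, dose))
```

The factors come out as 1, 3, and 9 for 18, 54, and 162 μg per mouse, matching the scaling described above. Descriptors that are zero at the baseline dose stay zero at every dose.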

Supervised machine learning methods for linking TRI with MWCNTs’ structural properties

To ensure robust model performance, we split the set of 20 data points from Table 1 into training and external validation sets. Four data points were randomly selected and excluded from the training set, leaving 16 data points for training. The held-out points are intended to be representative of the whole dataset and were used only to evaluate the external predictive performance of the model.
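A random 16/4 split of this kind can be sketched as follows. The seed value and the function name `split_external` are assumptions made for illustration; the text does not report how the random selection was seeded.

```python
import numpy as np

def split_external(n_samples, n_external, seed):
    """Randomly pick n_external indices for external validation;
    the remaining indices form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    external = np.sort(idx[:n_external])
    train = np.sort(idx[n_external:])
    return train, external

# 20 data points (NP_1 ... NP_20 in Table 1), 4 held out.
train_idx, ext_idx = split_external(20, 4, seed=42)  # seed is hypothetical
print(len(train_idx), len(ext_idx))
```

Because the held-out indices are fixed before any model fitting, they cannot influence descriptor selection or hyperparameter tuning.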

Ridge regression was used together with a genetic algorithm (GA) feature selection procedure to train the models predicting TRI. Ridge is a linear regression method that applies L2 regularization during the training of the model, allowing us to use cross-correlated descriptors³⁹ and to use a highly dimensional space of independent variables⁴⁰ for modeling. The GA, in turn, is a feature selection method that allows us to identify the best set of descriptors for training the nano-QSAR model.⁴¹
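To make the combination of ridge regression and GA-based descriptor selection more concrete, here is a deliberately small sketch. It is not the ChemBioML Platform algorithm: the fitness function (leave-one-out Q² of a closed-form ridge model), the GA operators (tournament selection, uniform crossover, bit-flip mutation, single-elite carry-over), all settings, and the synthetic data are assumptions chosen only for illustration.

```python
import numpy as np

def loo_q2(X, y, alpha):
    """Leave-one-out Q2 of a closed-form ridge model on centred data."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean = X[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X[keep] - x_mean, y[keep] - y_mean
        coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
        press += (y[i] - (y_mean + (X[i] - x_mean) @ coef)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def fitness(mask, X, y, alphas):
    """Best LOO Q2 over a small alpha grid for one descriptor subset."""
    if not mask.any():
        return -np.inf, None
    scores = [loo_q2(X[:, mask], y, a) for a in alphas]
    best = int(np.argmax(scores))
    return scores[best], alphas[best]

def ga_select(X, y, alphas, pop_size=20, generations=15, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5       # random boolean masks
    best_mask, best_fit, best_alpha = None, -np.inf, None
    for _ in range(generations):
        fits = np.empty(pop_size)
        for k, mask in enumerate(pop):
            fits[k], a = fitness(mask, X, y, alphas)
            if fits[k] > best_fit:
                best_mask, best_fit, best_alpha = mask.copy(), fits[k], a

        def tournament():
            i, j = rng.integers(pop_size, size=2)
            return pop[i] if fits[i] >= fits[j] else pop[j]

        children = [pop[int(np.argmax(fits))].copy()]  # keep the current best
        while len(children) < pop_size:
            child = np.where(rng.random(n_feat) < 0.5, tournament(), tournament())
            child = child ^ (rng.random(n_feat) < p_mut)  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, float(best_fit), best_alpha

# Synthetic data: 16 training points, 8 descriptors, only two informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=16)
mask, q2, alpha = ga_select(X, y, alphas=[0.01, 0.1, 1.0])
print(np.flatnonzero(mask), round(q2, 3), alpha)
```

Scoring subsets by cross-validated rather than fitted performance is one common way to discourage the GA from selecting descriptors that merely memorize a small training set.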

A comprehensive validation was performed to address possible overfitting, which can occur when training the model on such a small dataset. The goodness-of-fit of the models was evaluated using the coefficient of determination (R²) and the root mean square error of calibration (RMSE_C). The robustness of the model was assessed through leave-one-out cross-validation (CV) using the coefficient of determination for CV (Q_CV²) and root mean square error for CV (RMSE_CV). Predictive performance was validated on the external set using the external validation coefficient of determination (Q²) and the root mean square error of external validation (RMSE_EXT). The applicability domain (AD) of the models, defined by the range of responses and structural feature descriptors, was analyzed using the leverage approach and visualized with a Williams plot. This method is widely used to evaluate the AD in QSAR modeling, with well-defined, previously described algorithms.^42,43
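The leverage approach behind a Williams plot can be sketched as below. The leverage of a compound is h = xᵀ(XᵀX)⁻¹x, computed against the training descriptor matrix X. The warning threshold h* = 3(p + 1)/n is a convention often used in QSAR work and is an assumption here, as are the synthetic descriptors (taken to be already autoscaled) and the function names.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i of each query row."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)

def warning_leverage(n_train, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

# Synthetic training set: 16 compounds, 3 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(16, 3))
X_new = np.array([[0.1, -0.2, 0.0],      # close to the training data
                  [10.0, 10.0, 10.0]])   # far outside it

h_train = leverages(X_train, X_train)
h_new = leverages(X_train, X_new)
h_star = warning_leverage(n_train=16, n_descriptors=3)
inside = h_new <= h_star
print(h_star, inside)
```

In a Williams plot these leverages are drawn on the horizontal axis against standardized residuals on the vertical axis, so that structurally unusual compounds and poorly predicted compounds can be identified at a glance.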

Results & discussion

Transcriptomic-based data

Our study mainly aimed to explore the possibilities of applying unsupervised and supervised ML methods to compress and predict expression changes for thousands of DEGs using a single nano-QSAR model. We started the analysis by combining datasets from previously published studies.^20,21 First, the datasets were merged, and DEGs with missing values for some MWCNTs were removed. Subsequently, we selected gene expression profiles from lung tissues of female C57BL/6 mice (aged 5–7 weeks) that were intratracheally exposed to MWCNTs, retaining only those genes with a p-value ≤ 0.05 and an absolute FC ≥ 1.5 in at least one MWCNT condition.^21,22,27–30 This filtering yielded a set of 5167 DEGs significantly affected by MWCNT exposure in the lungs (Table S1).

DEGs compression into a single TRI

To address the challenges posed by highly multidimensional and complex transcriptomic data, such as the 5167 DEGs in our study (Table S1), we developed the TRI and then linked this index with the properties of MWCNTs via a single nano-QSAR model. Traditional computational techniques typically train separate models for each gene, treating each DEG as a separate endpoint.⁴⁴ While feasible for smaller gene sets, this approach becomes computationally expensive when applied to datasets involving thousands of genes. In this context, TRI provides a single endpoint containing all the extractable information about the contributing DEGs.

The dimensionality reduction achieved by consolidating 5167 gene-specific endpoints into a single TRI variable substantially streamlines the modeling process. Rather than training thousands of individual nano-QSAR models, one for each DEG, the TRI approach enables the training of a single, integrated model. Beyond computational efficiency, the TRI approach enhances interpretability and generalizability. By summarizing the global transcriptomic response into a compact form, the TRI is better suited for predictive modeling and can be readily integrated with other omics data or phenotypic outcomes. This systems-level representation is especially advantageous in contexts where understanding coordinated gene activity is more informative than isolated gene-level effects.

To compute TRI, PCA was first applied to the dataset of 5167 DEGs. The calculated values of all principal components and the corresponding TRI scores are provided in Table S2. The first 19 PCs (PC1 through PC19) were selected for TRI calculation, as they collectively explain ∼99.9% of the total variance in the expression matrix of more than 5000 genes. This indicates that they capture nearly the entire structure of the dataset, making them a robust foundation for dimensionality reduction and feature integration.
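The cumulative explained variance that motivates the choice of 19 components can be computed as in the sketch below, again on synthetic data and with hypothetical function names. One general property is worth noting: after centring, a matrix with 20 rows has rank at most 19, so 19 components can always reproduce a 20-condition dataset up to numerical precision.

```python
import numpy as np

def cumulative_variance(X):
    """Cumulative fraction of variance explained by successive PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    return np.cumsum(ratio)

def n_components_for(X, target=0.999):
    """Smallest number of PCs whose cumulative variance reaches target."""
    cum = cumulative_variance(X)
    return int(np.searchsorted(cum, target) + 1)

# Synthetic stand-in: 20 conditions, 500 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
cum = cumulative_variance(X)
print(round(float(cum[1]), 3), n_components_for(X, 0.999))
```

For real transcriptomic data the early components typically explain far more variance than in this uncorrelated synthetic example, which is why the first two PCs reach about 43% in the study.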

In contrast, retaining only the first two PCs captures just ∼43% of the total variance, which would lead to substantial information loss and potentially weaken model performance. Nevertheless, PC1 and PC2 were utilized to construct a simplified two-dimensional “transcriptomic space” to visualize and explore similarities between different MWCNTs based on DEG fold-change patterns (Fig. 1). This 2D representation provides an intuitive overview of the underlying transcriptomic relationships, while the full set of 19 PCs ensures high-fidelity data representation for TRI construction.


	Fig. 1 Two-dimensional transcriptomic space, represented as a line map PC1:PC2.

Examination of Fig. 1 reveals that most inhaled MWCNTs cluster in the negative regions of both PCs, indicating a high degree of similarity. However, differences are evident; PC2 predominantly reflects the variance associated with DEGs influenced by pristine MWCNTs. In contrast, PC1 exhibits a cluster comprising multiple MWCNT variants with low aspect ratios, including those with –OH and –COOH functionalization and pristine forms. This distribution indicates that using either PC1 or PC2 alone as the endpoint for the nano-QSAR model is insufficient, as neither fully captures the transcriptomic response of all studied MWCNTs. To overcome this limitation, we computed the TRI, which integrates 19 principal components weighted by their respective explained variance. This approach ensures that TRI captures the multidimensional structure of the data, providing a more complete and biologically meaningful representation of global transcriptomic shifts across all studied MWCNTs.

TRI is a highly interpretable and reversible PCA-based index. As a linear combination of gene expression values, TRI enables the prediction of global transcriptomic shifts while maintaining the ability to trace individual gene contributions. Specifically, genes with positive loadings (Table S3) will be upregulated as TRI increases, whereas those with negative loadings will be downregulated. This interpretability makes TRI not only transparent but also biologically informative.
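Because each PC score is a linear combination of centred gene expression values, the variance-weighted sum in eqn (1) collapses into a single weight per gene. The sketch below shows this on synthetic data; it assumes Variance_k is the explained-variance fraction, and the names `tri_with_gene_weights` and `gene_weights` are hypothetical.

```python
import numpy as np

def tri_with_gene_weights(X):
    """Return TRI computed from PC scores, the composite weight of
    each gene in TRI, and the centred expression matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)          # assumed weighting of each PC
    tri_from_scores = (U * s) @ ratio    # eqn (1)
    gene_weights = Vt.T @ ratio          # sum_k loading_k * Variance_k per gene
    return tri_from_scores, gene_weights, Xc

# Synthetic stand-in: 20 conditions, 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
tri, gene_weights, Xc = tri_with_gene_weights(X)

# The same TRI is obtained directly from the centred expression values.
tri_from_genes = Xc @ gene_weights
top_positive = np.argsort(gene_weights)[::-1][:5]  # genes pushing TRI upward
print(np.allclose(tri, tri_from_genes), top_positive)
```

Ranking genes by these composite weights is one simple way to trace which genes contribute most strongly to an increase or decrease in TRI.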

In addition to offering high interpretability through its linear structure and transparent formulation, the TRI is also straightforward to validate. Although unsupervised methods based on principal component analysis (PCA) inherently lack explicit quantitative performance metrics such as R² or RMSE, TRI benefits from robust qualitative validation. In our study, this included careful preprocessing of the transcriptomic dataset by filtering genes based on stringent significance thresholds (p-value ≤ 0.05 and |fold change| ≥ 1.5), ensuring the absence of outliers via inspection of the PC1–PC2 projection (Fig. 1), and confirming that TRI effectively captures the transcriptomic variance observed across all relevant differentially expressed genes (Table S1). These steps collectively reinforce confidence that TRI accurately represents the underlying biological signal.

Nevertheless, we recognize that the ideal number and configuration of principal components may differ depending on biological context, dataset heterogeneity, and modeling goals. The fixed use of 19 components in this study was tailored to balance comprehensiveness with interpretability. However, for broader applications, such as large-scale, multi-condition experiments or cross-species comparisons, further refinement of the TRI framework may enhance its performance and generalizability. Potential improvements could include exploring alternative dimensionality reduction approaches (e.g., independent component analysis, factor analysis), integrating non-linear manifold learning techniques such as t-SNE or UMAP, or employing data-driven component selection based on cross-validated model performance. Such refinements would not only optimize model efficiency but also strengthen TRI's utility as a generalizable tool for high-dimensional transcriptomic analysis.

Despite these limitations, the TRI approach provides a novel framework for integrating chemical structure descriptors with transcriptomic data, and it is important to contextualize how it compares with existing methods in the field. While traditional QSAR models can be effective for specific endpoints (e.g., the median lethal dose, LD50), they often struggle to capture complex biological effects and can suffer from well-known QSAR limitations (e.g., limited applicability domain and the “activity cliff” problem).⁴⁵ On the other end of the spectrum, recent advanced methods leverage high-dimensional transcriptomic data directly. For example, deep learning frameworks have been developed to predict entire gene expression profiles from chemical structures.^46,47 Such methods (e.g., CIGER and DeepCOP) use graph neural networks or other architectures to forecast genome-wide expression changes and have shown impressive performance on large datasets.^46,47 Our TRI approach offers a complementary strategy: instead of predicting thousands of individual gene responses, we condense the transcriptomic effect into an interpretable index and model that index using molecular descriptors for the inhaled MWCNTs.

Properties of inhaled MWCNTs and SiRMS descriptors for the carbon-based nanomaterials

Motivated by extensive research underscoring the impact of MWCNTs’ physicochemical properties on toxicity and the need for further investigation into their mode of action (MoA), we focused on calculating comprehensive descriptors for the experimentally investigated nanotubes. We combined nine different MWCNTs, each tested at various concentrations, to generate a total of 20 data points (Table 1). The size of these MWCNTs varies from 532.5 to 4050 nm (length) and 11 to 67 nm (diameter). The studied MWCNTs have four different functionalization options: pristine (unmodified), –OH, –COOH, and –NH₂. Testing MWCNTs at concentrations from 18 μg to 162 μg per mouse increased the training set size, which is crucial for QSAR models, and enabled dose–response analysis. Dose–response modeling is essential in nanotoxicity studies,⁴⁸ and by integrating these data with the MWCNTs’ properties, our approach enables quantification of the dose-dependent impact on the transcriptomic response. This combination promises a more comprehensive understanding of the MoA of pulmonary toxicity induced by inhaled MWCNTs compared to analyzing them separately.

SiRMS descriptors were calculated for the studied MWCNTs. This method is very commonly used for small molecules due to its ability to balance between the accuracy of predictions and the very comprehensive physicochemical interpretation of the obtained QSAR models.^13–17 The SiRMS methodology partitions the molecular structure into simplexes of 1 to 6 atoms, effectively capturing the molecule's fundamental structural elements. These simplexes represent essential features such as functional groups (e.g., hydroxyl, carboxyl groups) and bonding environments (e.g., single, double, and triple bonds). By decomposing the molecule in this manner, SiRMS enables identifying and quantifying specific chemical environments critical to the material's physico-chemical behavior. This detailed representation enhances the model's ability to predict molecular interactions with biological systems, thereby providing valuable insights into the MoA of pulmonary toxicity induced by inhaled MWCNTs.

Previously, SiRMS descriptors were calculated for metal-based engineered nanomaterials (ENMs) in a simplified form, utilizing a one-dimensional method that provided only a general overview of the structures.¹⁸ The simplification in this case was necessitated by the uncertainty surrounding the crystal structure of the ENMs used.¹⁸ In contrast, for MWCNTs, we do not face such structural limitations. This has allowed us to compute two-dimensional SiRMS descriptors, which offer significantly more profound insights into the studied structures.

Moreover, we leveraged the ability of the SiRMS methodology to weight descriptors by specific properties, including charge, LogP, refractivity, and hydrogen bonding, as detailed in the studies published by the script's author.^36–38 Additionally, to account for the varying concentrations of MWCNTs used in our study, we implemented a concentration-dependent weighting of the simplex descriptors. Specifically, the number of simplexes represented in each descriptor was scaled proportionally to the amount of inhaled MWCNTs: for a concentration of 18 μg per mouse, the initial set of simplexes was used; for 54 μg per mouse, the descriptors were multiplied by a factor of three; and for 162 μg per mouse, they were scaled by a factor of nine. This weighting is assumed to reflect the proportional increase in exposure and thus to enable the model to better account for dose-dependent biological responses. The complete set of approximately 3500 calculated SiRMS descriptors is provided in Table S4. These descriptors were subsequently utilized for training and interpreting the nano-QSAR model, which predicts changes in thousands of DEGs compressed in the TRI.

In this study, descriptor values for MWCNT exposures were linearly scaled based on concentration levels (18, 54, and 162 μg per mouse), under the assumption of a linear dose–response relationship. This approach was intended to approximate proportional biological effects corresponding to increasing concentrations. However, biological responses to nanomaterials can be nonlinear or exhibit threshold behaviors, and therefore, this linear scaling assumption requires future experimental validation. Further research should investigate nonlinear modeling approaches to assess the validity and limitations of this assumption.

Applying ML algorithms to link TRI to the MWCNTs’ structural properties

Predicting TRI from structural and dose differences is one of the main steps of this work, allowing a better understanding and description of the MoA of pulmonary pathologies induced by MWCNT inhalation. QSAR is a well-established and regulatorily recognized technique that enables the prediction of biological outcomes from the interaction of a body or a cell with certain chemicals, as well as with ENMs.^13,14,16,19,36–38,42,45,49–54 With proper validation and interpretation, QSAR can provide powerful insights into the MoA of different biological processes after interaction with chemicals or ENMs, which led us to use this method in our study. To obtain such models, we have integrated a multi-step modeling procedure into our developed software, ChemBioML Platform (https://github.com/chembiodev/ChemBioML-Platform). First, the descriptor dataset was pre-processed by removing all descriptors with constant values. Conventional machine learning approaches, such as multiple linear regression (MLR), often force researchers to discard valuable descriptors when their cross-correlation exceeds a certain threshold (e.g., R > 0.8). This can result in the loss of important predictive information. To overcome this limitation, we employed ridge regression, which leverages L2 regularization to effectively manage multicollinearity. By penalizing large coefficients rather than removing correlated descriptors entirely, ridge regression preserves more information, improving both the robustness and predictive power of the resulting model.³⁹ For this reason, we retained descriptors that are highly correlated with one another.
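The two pre-processing points above, dropping constant descriptors and keeping correlated ones, can be illustrated with a closed-form ridge fit. This is a minimal sketch on made-up data, not the platform's code; the function names and the alpha value are assumptions. Two descriptors are exact copies of each other, a case in which the ordinary least-squares normal equations are singular, while the ridge penalty keeps the problem solvable.

```python
import numpy as np

def drop_constant(X):
    """Remove descriptors whose value is identical for every sample."""
    X = np.asarray(X, dtype=float)
    keep = X.max(axis=0) != X.min(axis=0)
    return X[:, keep], keep

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

# Made-up descriptors: column 1 duplicates column 0, column 2 is constant.
x1 = np.arange(10, dtype=float)
X_raw = np.column_stack([x1, x1, np.full(10, 7.0)])
y = 2.0 * x1 + 1.0

X, keep = drop_constant(X_raw)
coef, intercept = ridge_fit(X, y, alpha=1e-3)   # alpha is a hypothetical value
y_hat = X @ coef + intercept
print(keep, np.round(coef, 3), round(float(intercept), 3))
```

The penalty splits the effect evenly between the two identical descriptors instead of failing, which is the behaviour that makes it possible to retain correlated SiRMS descriptors.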

To assess the predictive ability of the obtained model, we randomly selected four data points and moved them to an external validation set. These data points are representative of the whole set and were not used in model training or in any other model optimization procedure. The remaining 16 data points were used for training and internal validation of the model. Predictions for the training set were evaluated with the R² and RMSE_C metrics, and the model showed a good fit. The robustness of the model was evaluated with the corresponding CV metrics (Q_CV² and RMSE_CV) and showed promising results. To evaluate predictability, we calculated Q², the external-validation analog of the training-set R², together with RMSE_EXT. These statistics are listed in Table 2. Overall, they indicate a good level of goodness-of-fit, robustness, and external predictability. Importantly, not only the absolute values of these metrics matter: the difference between R² and Q² should also not exceed 0.3.^{49,52} For our nano-QSAR model, this difference is less than 0.1 (0.83 − 0.78 = 0.05), which indicates that the model is not overfitted. The predicted and observed values are plotted in Fig. 2a. Although the model evaluation employed both external validation and CV, we acknowledge that the relatively small sample size (20 data points) may limit the generalizability of the findings. Further validation using larger, independent datasets is necessary to fully establish the model's predictive performance and mitigate overfitting risks.

Table 2 Statistics of the trained nano-QSAR model predicting TRI

Statistic	Value
R²	0.83
RMSE_C	0.15
Q_CV²	0.8
RMSE_CV	0.16
Q²	0.78
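For readers who wish to reproduce such statistics, the sketch below shows one common way to compute a determination coefficient, an RMSE, and the R² − Q² gap check. The function names are hypothetical, and the choice of reference mean for the external Q² (here the mean of the evaluated set unless another value is passed) is an assumption, since several definitions of external Q² are in use.

```python
import math


def r_squared(observed, predicted, reference_mean=None):
    """1 - SS_res / SS_tot; reference_mean defaults to the mean of `observed`."""
    if reference_mean is None:
        reference_mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def gap_within_limit(r2, q2, max_gap=0.3):
    """Overfitting check: |R2 - Q2| should not exceed max_gap."""
    return abs(r2 - q2) <= max_gap


# Values reported in Table 2.
R2, Q2_EXT = 0.83, 0.78
gap = round(R2 - Q2_EXT, 2)
not_overfitted = gap_within_limit(R2, Q2_EXT)
```

Applied to the values in Table 2, the gap is 0.05, well inside the 0.3 limit.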


Fig. 2 (a) Observed–predicted plot presenting the accuracy of predictions in two-dimensional space; (b) Williams plot presenting the model AD.

After completing the pre-processing steps, we applied a combination of grid search and a genetic algorithm (GA) to find the best ridge hyperparameters and descriptors for training the nano-QSAR model. The result is the ridge regression model in eqn (2), a linear equation that links the TRI to MWCNT structural properties with a good level of goodness-of-fit, robustness, and predictability.


TRI = 0.53 × CHARGE_{A–A–B–B–A–A} − 0.57 × CHARGE_{B–B–B–C} + 0.37 × REFRACTIVITY_B.C − 0.46 × CHARGE_A.A	(2)
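Eqn (2) can be applied directly once the four descriptors are available. The sketch below copies the coefficients from eqn (2); the function name `predict_tri`, the assumption that the inputs are already scaled in the same way as the training descriptors, and the example input are hypothetical.

```python
# Coefficients copied from eqn (2); the equation shows no intercept term.
EQN2_COEFFICIENTS = {
    "CHARGE_{A–A–B–B–A–A}": 0.53,
    "CHARGE_{B–B–B–C}": -0.57,
    "REFRACTIVITY_B.C": 0.37,
    "CHARGE_A.A": -0.46,
}


def predict_tri(descriptors):
    """Evaluate eqn (2) for one MWCNT/dose combination."""
    missing = set(EQN2_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return sum(coef * descriptors[name] for name, coef in EQN2_COEFFICIENTS.items())


# Hypothetical input: every (assumed pre-scaled) descriptor equal to 1.0.
example = {name: 1.0 for name in EQN2_COEFFICIENTS}
tri = predict_tri(example)
```

A prediction obtained this way is only meaningful if the query structure lies inside the applicability domain of the model.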

An essential criterion in QSAR model evaluation is the applicability domain (AD).¹⁹ The AD allows us to assess how similar the structures used for validation or prediction are to those used for model training. It also helps to identify outliers in the training or validation sets, that is, structures that are similar to the others but are predicted inaccurately. To define the AD for our nano-QSAR model, we constructed a Williams plot that displays both the structural differences of the MWCNTs via leverages and the accuracy of predictions via standardized residuals (Fig. 2b).
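The two quantities plotted in a Williams plot can be computed as sketched below. This is a minimal pure-Python illustration with hypothetical function names and toy data; the warning leverage h* = 3(p + 1)/n is a commonly used convention and is stated here as an assumption, since the threshold used for Fig. 2b is not given in this section. Residuals are standardized by the residual standard deviation, which is one of several definitions in use.

```python
import math


def solve(a, b):
    """Solve a·z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(x_train, x_query):
    """h = x (X^T X)^-1 x^T for each query row, with X the training matrix."""
    p = len(x_train[0])
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(p)]
           for i in range(p)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in x_query]


def warning_leverage(n_train, n_descriptors):
    """Conventional threshold h* = 3(p + 1)/n (assumed, see lead-in)."""
    return 3 * (n_descriptors + 1) / n_train


def standardized_residuals(observed, predicted, n_params):
    """Residuals divided by the residual standard deviation."""
    res = [o - p for o, p in zip(observed, predicted)]
    s = math.sqrt(sum(r * r for r in res) / (len(res) - n_params))
    return [r / s for r in res]


# Toy training matrix: intercept column plus one descriptor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
h = leverages(X, X)
z = standardized_residuals([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```

Points with a leverage above h* are structurally far from the training set, and points with an absolute standardized residual above 3 are response outliers.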

In Fig. 2b, all predictions for the training and validation sets have absolute standardized residuals below 3. This indicates that there are no significant outliers, that is, points for which the model's predictions are incorrect even though the structure is similar to others. One data point, NP_3, exhibits a leverage of h = 0.85 (Table S5), indicating that it is structurally distinct from the other MWCNTs in the training set. NP_3 corresponds to the largest MWCNT (NM-401) tested at a dose of 162 μg per mouse. While this is not a significant issue, provided that the prediction is accurate, further investigation of other MWCNTs at similar doses could be valuable for advancing both experimental and computational nanotoxicity studies. Such an extension of the existing data would help fill the gap in the applicability of models predicting the transcriptomic response to MWCNT inhalation.

Thus, the ChemBioML Platform integrates the essential capabilities required for training modern QSAR models. It includes comprehensive data preprocessing, a GA-based feature selection algorithm, a hyperparameter optimizer, and a ridge regressor for training robust, well-predictive QSAR models. Applying L2 regularization addresses common limitations such as predictor multicollinearity and a high-dimensional predictor space.^{39,40} Furthermore, the platform evaluates the robustness and predictivity of the trained models by applying CV, external validation, and AD analysis. Additionally, the ChemBioML Platform offers a graphical representation of the obtained models, as shown in Fig. 2, and, combined with the overall TRI methodology, provides a significant computational saving. A single model training run, including feature selection and validation, took just 2:40 minutes on an AMD Ryzen 5 3500U CPU (a processor typical of office laptops). By comparison, training a separate model for each gene could take ∼230 hours on the same computer, which demonstrates a strong decrease in the computational resources needed to work with complex transcriptomic data.
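The ∼230 hour estimate follows from simple arithmetic, sketched below under the assumption that a per-gene model would cost the same 2:40 minutes as the single TRI model; the variable names are illustrative.

```python
# Reported training time for one model: 2 min 40 s.
SECONDS_PER_MODEL = 2 * 60 + 40

# Number of DEGs compressed into the TRI.
N_DEGS = 5167

# Total time if one model were trained per gene (assumes equal cost per model).
per_gene_hours = SECONDS_PER_MODEL * N_DEGS / 3600

# Time for the single TRI model, in hours, for comparison.
tri_hours = SECONDS_PER_MODEL / 3600
```

That is, 160 s × 5167 = 826 720 s, or about 229.6 hours, against less than 3 minutes for the single TRI model.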

In the nano-QSAR equation (eqn (2)), descriptors weighted by charge have the largest impact on the model's predictions. This might be expected, as charge is a well-known MWCNT parameter affecting their toxicity.^{55,56} Zeinabad et al. investigated the impact of charge on the interaction between CNTs and biological systems such as cells and proteins, using the Tau protein as an example. They hypothesized that the interaction mechanism depends on the charge distribution and protein surfaces.⁵⁷ Our study extends this understanding by demonstrating that the charge of MWCNTs may significantly influence biological outcomes following their inhalation. Furthermore, while “CHARGE_{A–A–B–B–A–A}” has a positive coefficient in eqn (2), the coefficients of the other two charge-weighted descriptors are negative, suggesting that distinct charged regions contribute oppositely to the transcriptomic response. It is well known that the charge distribution on MWCNTs is uneven and is higher at the tube ends.⁵⁸ Our model indicates that regions with differing charges can exert opposing effects on the transcriptomic response. Notably, the descriptors “CHARGE_{A–A–B–B–A–A}” and “CHARGE_A.A” predominantly characterize pristine MWCNTs, whereas “CHARGE_{B–B–B–C}” and “REFRACTIVITY_B.C” are more common in functionalized variants; no descriptor was found to be exclusive to any specific functional group (Table S5). This suggests that the impact of functional groups such as –OH, –COOH, and –NH₂ may be indirect, acting through the charge and refractivity of the MWCNTs, which are crucial for the transcriptomic response. Importantly, in this study, we used functionalized but uncoated MWCNTs. When preparing nano-QSAR models for coated ENMs, an additional set of descriptors can be calculated with the Dragon software⁴³ or with the SiRMS method. This approach would ensure that the relevant parameters of the additional molecules are accounted for during modelling.

Thus, our results highlight that charge-weighted descriptors (i.e., descriptors reflecting the distribution of partial charges in a structure) play a significant role in predicting transcriptomic responses. This finding suggests a mechanistic link between the electronic structure of MWCNTs and the genes they perturb. One plausible explanation is that charge distribution influences how a molecule interacts with cellular components. For instance, highly electrophilic MWCNTs are known to activate cellular stress pathways. Literature examples support this: reactive electrophiles can trigger robust gene expression responses such as heat-shock protein induction⁵⁹ and the activation of detoxification genes via the Nrf2-mediated electrophile response element.⁶⁰ Our observation that charge-based descriptors are predictive aligns with these examples: molecules or materials bearing charged or polar functional groups may more readily form covalent bonds or induce oxidative stress, leading to upregulation of defensive transcripts. Conversely, compounds with certain charge features might have reduced cell permeability or distinct subcellular localization, affecting which genes are up- or down-regulated. For example, strongly polar or ionic compounds may accumulate in specific cellular compartments or interact with membrane proteins, eliciting unique transcriptomic signatures. This notion is consistent with general toxicological principles: baseline cytotoxicity often arises from hydrophobic compounds integrating into membranes,⁶¹ whereas specific gene responses often correlate with reactive functionalities. In line with this, our model's emphasis on charge-weighted descriptors echoes findings from other QSAR studies that identified surface charge properties as key determinants of bioactivity.⁶²

In addition to the TRI combined from PC1–19, we tested the predictability of the index calculated from PC1–5, PC1–10, and PC1–15. Fig. 3 shows that R² and Q_CV² were markedly higher for the TRI combined from PC1–19. This demonstrates that, in our study, the full TRI not only explains 99.9% of the transcriptomic space consisting of 5167 genes but also provides the best basis for training nano-QSAR models that predict transcriptomic responses to MWCNT exposure in the lungs.


Fig. 3 Estimation of R² and Q_CV² for TRI combined from PC1–5 to PC1–19.

Although the developed model provides a powerful framework for exploring the relationship between MWCNT characteristics and biological responses, it has limitations. The primary limitation of this study is the small number of compounds, which constrains the diversity of the chemical space explored. A limited number of MWCNTs increases the risk of overfitting and reduces confidence that the observed structure–response relationships will hold broadly. Indeed, prior analyses have shown that using too few samples can yield unstable gene expression signatures and poor predictive performance.⁶³ Consequently, the AD of our model is narrow; the predictions may not generalize well beyond the MWCNTs represented in the training set.⁶² In our study, we provide and analyze the AD using the Williams plot in Fig. 2b. For all new predictions, the AD also has to be evaluated for the studied MWCNT structures, and only predictions inside the AD can be considered reliable and used for further analysis. This restricted domain makes the model highly specialized and sensitive to outliers and underscores the need for a cautious interpretation of the results. Future experiments should expand the size and chemical diversity of the dataset to address this limitation. For example, including a broader range of MWCNTs would help ensure that the learned relationships are not idiosyncratic to a small subset. Likewise, collecting more samples per nanotube (e.g., across different concentrations) could improve the robustness of transcriptomic signals and reduce experimental variability. By explicitly acknowledging the sample size issue, we emphasize that further validation on larger, independent datasets is essential before the model's predictions can be considered generalizable. This limitation is not unique to our work; it mirrors challenges in many toxicogenomic modeling studies where a paucity of data can undermine external predictivity.⁶²

Conclusions

We have developed a methodology that compresses extensive transcriptomic data into a single endpoint, the TRI, and links it with the structural parameters of the studied MWCNTs. The trained nano-QSAR model can use TRI to predict overall transcriptomic patterns in response to MWCNT inhalation with good predictive performance (R² = 0.83, Q_CV² = 0.8, Q² = 0.78). Furthermore, our initial application of SiRMS descriptors in training nano-QSAR models for MWCNTs demonstrates significant potential for enhancing our understanding of the mode of action underlying interactions between inhaled MWCNTs and biological systems. Thus, the integration of PCA and ridge regression within the ChemBioML Platform shows substantial potential for computational nanotoxicology applications. The proposed TRI is a promising basis for further research in computational nanotoxicity, addressing the need to reduce both animal testing⁶⁴ and computational resource usage in the development of safe-by-design nanomaterials. In the meantime, the developed ChemBioML Platform (https://chembioml.com, https://github.com/chembiodev/ChemBioML-Platform) provides a powerful free-to-use tool for all researchers, thereby making ML research more accessible for laboratories without a strong programming background.

Author contributions

VM and KJ conceptualization. VM performed formal analysis, including data curation, the development of the model, and validation. VM designed and wrote the manuscript with the support of KJ. TP review and editing. TP and KJ funding acquisition and project administration. All authors discussed the results and commented on the manuscript. VM conceptualized, developed, and supports the ChemBioML Platform.

Conflicts of interest

There are no conflicts to declare.

Data availability

The data supporting this article have been included as part of the SI. Supplementary information: Table S1. Fold change values of 5167 DEGs in response to 20 MWCNTs in different combinations. Table S2. Values of 19 PCs with calculated TRI and explained variance. Table S3. Loading values of 5167 DEGs in PC1–19. Table S4. Set of SiRMS descriptors calculated for 20 MWCNTs. Table S5. Model statistics and the selected SiRMS descriptors. See DOI: https://doi.org/10.1039/d5nh00330j

Acknowledgements

This work was funded via the National Science Centre in the frame of the TransNANO project (UMO-2020/37/B/ST5/01894). Figures for this work were prepared using the software https://BioRender.com. VM acknowledges Epic Games, Inc. for providing access to Unreal Engine 5 under their standard licensing terms.

Notes and references

S. S. Gupta, K. P. Singh, S. Gupta, M. Dusinska and Q. Rahman, Nanomaterials, 2022, 12(10), 1708.
G. Vietti, D. Lison and S. van den Brule, Part. Fibre Toxicol., 2016, 13, 11.
N. Kobayashi, H. Izumi and Y. Morimoto, J. Occup. Health, 2017, 59, 394–407.
J. Nikota, A. Williams, C. L. Yauk, H. Wallin, U. Vogel and S. Halappanavar, Part. Fibre Toxicol., 2016, 13, 25.
C. Nishida, H. Izumi, T. Tomonaga, J. Takeshita, K.-Y. Wang, K. Yamasaki, K. Yatera and Y. Morimoto, Nanomaterials, 2020, 10, 2032.
K. Jagiello, S. Halappanavar, A. Rybińska-Fryca, A. Williams, U. Vogel and T. Puzyn, Small, 2021, 17(15), 2003465.
S. Merugu, K. Jagiello, A. Gajewicz-Skretna, S. Halappanavar, A. Williams, U. Vogel and T. Puzyn, Small, 2025, 2501185.
V. Fortino, P. A. S. Kinaret, M. Fratello, et al., Nat. Commun., 2022, 13, 3798.
R. Concu, V. V. Kleandrova, A. Speck-Planche and M. N. D. S. Cordeiro, Nanotoxicology, 2017, 11, 891–906.
F. Luan, V. V. Kleandrova, H. González-Díaz, J. M. Ruso, A. Melo, A. Speck-Planche and M. N. D. S. Cordeiro, Nanoscale, 2014, 6, 10623.
V. V. Kleandrova, F. Luan, H. González-Díaz, J. M. Ruso, A. Melo, A. Speck-Planche and M. N. D. S. Cordeiro, Environ. Int., 2014, 73, 288–294.
V. V. Kleandrova, F. Luan, H. González-Díaz, J. M. Ruso, A. Speck-Planche and M. N. D. S. Cordeiro, Environ. Sci. Technol., 2014, 48, 14686–14694.
P. Polishchuk, E. Mokshyna, A. Kosinskaya, A. Muats, M. Kulinsky, O. Tinkov, L. Ognichenko, T. Khristova, A. Artemenko and V. Kuz’min, Structural, Physicochemical and Stereochemical Interpretation of QSAR Models Based on Simplex Representation of Molecular Structure, 2017, pp. 107–147.
V. M. Alves, T. Bobrowski, C. C. Melo-Filho, D. Korn, S. Auerbach, C. Schmitt, E. N. Muratov and A. Tropsha, Mol. Inf., 2020, 40(1), 2000113.
K. Khan, V. Kumar, E. Colombo, A. Lombardo, E. Benfenati and K. Roy, Environ. Int., 2022, 170, 107625.
V. Kuz’min, A. Artemenko, L. Ognichenko, A. Hromov, A. Kosinskaya, S. Stelmakh, Z. L. Sessions and E. N. Muratov, Struct. Chem., 2021, 32, 1365–1392.
M. Nesterkina, V. Muratov, L. Ognichenko, I. Kravchenko and V. Kuz’min, Open Chem., 2021, 19, 1184–1192.
V. E. Kuz’min, L. N. Ognichenko, N. Sizochenko, V. A. Chapkin, S. I. Stelmakh, A. O. Shyrykalova and J. Leszczynski, Research Anthology on Synthesis, Characterization, and Applications of Nanomaterials, IGI Global, 2021, pp. 317–329.
(Q)SAR Assessment Framework: Guidance for the regulatory assessment of (Quantitative) Structure–Activity Relationship models, predictions, and results based on multiple predictions, 2023.
S. S. Poulsen, A. T. Saber, A. Williams, O. Andersen, C. Købler, R. Atluri, M. E. Pozzebon, S. P. Mucelli, M. Simion, D. Rickerby, A. Mortensen, P. Jackson, Z. O. Kyjovska, K. Mølhave, N. R. Jacobsen, K. A. Jensen, C. L. Yauk, H. Wallin, S. Halappanavar and U. Vogel, Toxicol. Appl. Pharmacol., 2015, 284, 16–32.
S. Halappanavar, L. Rahman, J. Nikota, S. S. Poulsen, Y. Ding, P. Jackson, H. Wallin, O. Schmid, U. Vogel and A. Williams, NanoImpact, 2019, 14, 100158.
S. Halappanavar, A. T. Saber, N. Decan, K. A. Jensen, D. Wu, N. R. Jacobsen, C. Guo, J. Rogowski, I. K. Koponen, M. Levin, A. M. Madsen, R. Atluri, V. Snitka, R. K. Birkedal, D. Rickerby, A. Williams, H. Wallin, C. L. Yauk and U. Vogel, Environ. Mol. Mutagen., 2015, 56, 245–264.
A. Saber, N. Jacobsen, A. Mortensen, J. Szarek, P. Jackson, A. Madsen, K. Jensen, I. K. Koponen, G. Brunborg, K. Gützkow, U. Vogel and H. Wallin, Part. Fibre Toxicol., 2012, 9, 4.
A. Saber, A. Mortensen, J. Szarek, N. Jacobsen, M. Levin, I. Koponen, K. Jensen, U. Vogel and H. Wallin, Hum. Exp. Toxicol., 2019, 38, 11–24.
H. Wallin, Z. O. Kyjovska, S. S. Poulsen, N. R. Jacobsen, A. T. Saber, S. Bengtson, P. Jackson and U. Vogel, Mutagenesis, 2017, 32, 47–57.
J. A. Bourdon, A. T. Saber, N. R. Jacobsen, K. A. Jensen, A. M. Madsen, J. S. Lamson, H. Wallin, P. Møller, S. Loft, C. L. Yauk and U. B. Vogel, Part. Fibre Toxicol., 2012, 9, 5.
D. J. McCarthy and G. K. Smyth, Bioinformatics, 2009, 25, 765–771.
M. J. Peart, G. K. Smyth, R. K. van Laar, D. D. Bowtell, V. M. Richon, P. A. Marks, A. J. Holloway and R. W. Johnstone, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 3697–3702.
A. Raouf, Y. Zhao, K. To, J. Stingl, A. Delaney, M. Barbara, N. Iscove, S. Jones, S. McKinney, J. Emerman, S. Aparicio, M. Marra and C. Eaves, Cell Stem Cell, 2008, 3, 109–118.
S. Halappanavar, P. Jackson, A. Williams, K. A. Jensen, K. S. Hougaard, U. Vogel, C. L. Yauk and H. Wallin, Environ. Mol. Mutagen., 2011, 52, 425–439.
M. Schena, D. Shalon, R. Heller, A. Chai, P. O. Brown and R. W. Davis, Proc. Natl. Acad. Sci. U. S. A., 1996, 93, 10614–10619.
Y. Zhao, D.-F. Guo and K. Rahmouni, Physiology, 2024, 39(S1), DOI: 10.1152/physiol.2024.39.S1.881.
J. Lever, M. Krzywinski and N. Altman, Nat. Methods, 2017, 14, 641–642.
I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.
D. Saha and A. Manickavasagan, Curr. Res. Food Sci., 2021, 4, 28–44.
P. Polishchuk, O. Tinkov, T. Khristova, L. Ognichenko, A. Kosinskaya, A. Varnek and V. Kuz’min, J. Chem. Inf. Model., 2016, 56, 1455–1469.
V. E. Kuz’min, A. G. Artemenko, P. G. Polischuk, E. N. Muratov, A. I. Hromov, A. V. Liahovskiy, S. A. Andronati and S. Y. Makan, J. Mol. Model., 2005, 11, 457–467.
V. E. Kuz’min, A. G. Artemenko and E. N. Muratov, J. Comput.-Aided Mol. Des., 2008, 22, 403–421.
M. S. Mohammed, N. Paraman, A. A.-H. A. Rahman, F. A. Ghaleb, A. Al-Dhamari and M. N. Marsono, IEEE Access, 2021, 9, 124087–124099.
D. Dai, F. Javed, P. Karlsson and K. Månsson, Ann. Oper. Res., 2025, DOI: 10.1007/s10479-025-06486-y.
F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau and C. Gagné, DEAP: Evolutionary Algorithms Made Easy, J. Mach. Learn. Res., 2012, 13, 2171–2175.
A. O. Aptula, N. G. Jeliazkova, T. W. Schultz and M. T. D. Cronin, QSAR Comb. Sci., 2005, 24, 385–396.
S. Sengottiyan, A. Mikolajczyk, K. Jagiełło, M. Swirog and T. Puzyn, ACS Nano, 2023, 17, 1989–1997.
V. Muratov, K. Jagiello, A. Mikolajczyk, P. H. Danielsen, S. Halappanavar, U. Vogel and T. Puzyn, J. Hazard. Mater., 2025, 493, 138240.
Q. Chen, L. Wu, W. Liu, L. Xing and X. Fan, Molecules, 2013, 18, 10789–10801.
T.-H. Pham, Y. Qiu, J. Liu, S. Zimmer, E. O’Neill, L. Xie and P. Zhang, Patterns, 2022, 3, 100441.
G. Woo, M. Fernandez, M. Hsing, N. A. Lack, A. D. Cavga and A. Cherkasov, Bioinformatics, 2020, 36, 813–818.
I. Iavicoli, V. Leso, L. Fontana and E. Calabrese, Int. J. Mol. Sci., 2018, 19, 805.
L. Eriksson, J. Jaworska, A. P. Worth, M. T. D. Cronin, R. M. McDowell and P. Gramatica, Environ. Health Perspect., 2003, 111, 1361–1375.
A. Tropsha, Mol. Inf., 2010, 29, 476–488.
V. E. Kuz’min, A. G. Artemenko, E. N. Muratov, P. G. Polischuk, L. N. Ognichenko, A. V. Liahovsky, A. I. Hromov and E. V. Varlamova, Virtual Screening and Molecular Design Based on Hierarchical Qsar Technology, 2010, pp. 127–176.
T. Puzyn, B. Rasulev, A. Gajewicz, X. Hu, T. P. Dasari, A. Michalkova, H.-M. Hwang, A. Toropov, D. Leszczynska and J. Leszczynski, Nat. Nanotechnol., 2011, 6, 175–178.
G. K. Jillella, P. K. Ojha and K. Roy, Aquat. Toxicol., 2021, 238, 105925.
L. N. Ognichenko, V. E. Kuz’min and A. G. Artemenko, QSAR Comb. Sci., 2009, 28, 939–945.
R. Li, X. Wang, Z. Ji, B. Sun, H. Zhang, C. H. Chang, S. Lin, H. Meng, Y.-P. Liao, M. Wang, Z. Li, A. A. Hwang, T.-B. Song, R. Xu, Y. Yang, J. I. Zink, A. E. Nel and T. Xia, ACS Nano, 2013, 7, 2352–2368.
M.-H. Jang and Y. S. Hwang, PLoS One, 2018, 13, e0194935.
H. A. Zeinabad, A. Zarrabian, A. A. Saboury, A. M. Alizadeh and M. Falahati, Sci. Rep., 2016, 6, 26508.
K. Lönnecke, O. Eberhardt and T. Wallmersperger, Acta Mech., 2023, 234, 1–16.
M. Muench, C.-H. Hsin, E. Ferber, S. Berger and M. J. Mueller, J. Exp. Bot., 2016, 67, 6139–6148.
H. Sasaki, H. Sato, K. Kuriyama-Matsumura, K. Sato, K. Maebara, H. Wang, M. Tamba, K. Itoh, M. Yamamoto and S. Bannai, J. Biol. Chem., 2002, 277, 44765–44771.
J. Huchthausen, J. Braasch, B. I. Escher, M. König and L. Henneberger, Chem. Res. Toxicol., 2024, 37, 744–756.
Z. Cai, M. Zafferani, O. M. Akande and A. E. Hargrove, J. Med. Chem., 2022, 65, 7262–7277.
C. Stretch, S. Khan, N. Asgarian, R. Eisner, S. Vaisipour, S. Damaraju, K. Graham, O. F. Bathe, H. Steed, R. Greiner and V. E. Baracos, PLoS One, 2013, 8, e65380.
European Chemicals Agency, New approach methodologies in regulatory science – Proceedings of a scientific workshop – Helsinki, 19–20 April 2016, ECHA, 2016.
