Characterizing Chemical Toxicity for Life Cycle Assessment Using Machine Learning and Deep Learning Models Based on Environmental Footprint – Methodological Comparison & textile case study

Tianran Ding; Gustavo Larrea-Gallegos; Federico Busio; Antonino Marvuglia; Thomas Schaubroeck

doi:10.1039/D6SU00023A

Abstract

The rapid expansion of registered chemicals, coupled with persistent data gaps, poses a major challenge for toxicity assessment in Life Cycle Assessment (LCA) and Safe and Sustainable by Design (SSbD). This study proposes a data-driven framework to predict toxicity characterization factors (CFs) directly from molecular structures encoded as Simplified Molecular Input Line Entry System (SMILES) strings, using the Environmental Footprint (EF) v3.1 database as the training benchmark. We systematically evaluate five machine-learning (ML) and deep-learning (DL) approaches: random forest, XGBoost, Gaussian process, deep neural networks (DNNs), and graph neural networks (GNNs) in the form of message-passing neural networks (MPNNs). These are compared across three molecular representations: handcrafted physicochemical descriptors (Mordred), molecular graphs, and large-scale pretrained molecular embeddings (GROVER). Predictive performance is strongly target-dependent, with ecotoxicity CFs showing consistently higher predictability (R² = 0.57–0.67) than human toxicity CFs (R² = 0.40–0.60). Mordred-based ML models, particularly XGBoost, exhibit robust performance across multiple targets, while GNN models, especially multi-target MPNNs trained on graph-only representations, achieve comparable or, for several ecotoxicity targets, superior performance. GROVER embeddings reach competitive performance primarily when coupled with DL architectures. These results demonstrate that graph-based and pretrained molecular representations can effectively capture complex structure–toxicity relationships, reducing reliance on manual feature engineering. The framework further integrates applicability domain analysis and chemical clustering to enable domain-consistent, optimized model selection. A textile-sector case study illustrates how predicted CFs for chemicals that previously lacked them can be incorporated into LCA, revealing that omitting chemicals with missing CFs can lead to substantial underestimation of toxicity impacts, by up to an order of magnitude in the examined case.
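The underestimation effect described in the abstract can be illustrated with a minimal sketch. It assumes the usual LCA convention that an impact score is the sum of emitted mass times CF over all flows, and that a flow without a CF silently contributes zero. The chemical names, masses, and CF values below are hypothetical placeholders chosen for illustration; they are not taken from the study or from the EF v3.1 database.

```python
# Illustrative sketch only: every name and number here is hypothetical.

def toxicity_impact(inventory, cfs):
    """Sum mass * CF over all flows; flows with no CF contribute nothing."""
    total = 0.0
    for chemical, mass_kg in inventory.items():
        cf = cfs.get(chemical)
        if cf is not None:
            total += mass_kg * cf
    return total

# Hypothetical emissions per functional unit, in kg.
inventory = {"dye_A": 2.0, "softener_B": 1.0, "auxiliary_C": 0.5}

# Hypothetical CFs (e.g. in CTUe per kg emitted); only one flow is covered
# by the reference database in this toy example.
database_cfs = {"dye_A": 10.0}

# Hypothetical model-predicted CFs for the two flows the database lacks.
predicted_cfs = {"softener_B": 120.0, "auxiliary_C": 160.0}

impact_without = toxicity_impact(inventory, database_cfs)
impact_with = toxicity_impact(inventory, {**database_cfs, **predicted_cfs})
ratio = impact_with / impact_without

print(f"Impact with database CFs only: {impact_without}")
print(f"Impact with predicted CFs added: {impact_with}")
print(f"Ratio: {ratio}")
```

In this toy inventory the uncovered flows dominate the total, so leaving them out understates the score by roughly an order of magnitude. The size of the gap depends entirely on the assumed values.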

Article information

DOI: https://doi.org/10.1039/D6SU00023A
Article type: Paper
Submitted: 12 Jan 2026
Accepted: 31 May 2026
First published: 05 Jun 2026
This article is Open Access

RSC Sustainability, 2026, Accepted Manuscript

T. Ding, G. Larrea-Gallegos, F. Busio, A. Marvuglia and T. Schaubroeck, RSC Sustainability, 2026, Accepted Manuscript, DOI: 10.1039/D6SU00023A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
