Characterizing Chemical Toxicity for Life Cycle Assessment Using Machine Learning and Deep Learning Models Based on Environmental Footprint – Methodological Comparison & textile case study

Abstract

The rapid expansion of registered chemicals, coupled with persistent data gaps, poses a major challenge for toxicity assessment in Life Cycle Assessment (LCA) and Safe and Sustainable by Design (SSbD). This study proposes a data-driven framework that predicts toxicity characterization factors (CFs) directly from molecular structures encoded as Simplified Molecular Input Line Entry System (SMILES) strings, using the Environmental Footprint (EF) v3.1 database as the training benchmark. We systematically evaluate five machine-learning (ML) and deep-learning (DL) approaches: random forest, XGBoost, Gaussian process, deep neural networks (DNNs), and graph neural networks (GNNs) implemented as message-passing neural networks (MPNNs). These are compared across three molecular representations: handcrafted physicochemical descriptors (Mordred), molecular graphs, and large-scale pretrained molecular embeddings (GROVER). Predictive performance is strongly target-dependent, with ecotoxicity CFs showing consistently higher predictability (R² = 0.57–0.67) than human toxicity CFs (R² = 0.40–0.60). Mordred-based ML models, particularly XGBoost, perform robustly across multiple targets, while graph-based models, especially multi-target MPNNs trained on graph-only representations, achieve comparable or, for several ecotoxicity targets, superior performance. GROVER embeddings reach competitive performance primarily when coupled with DL architectures. These results demonstrate that graph-based and pretrained molecular representations can effectively capture complex structure–toxicity relationships, reducing reliance on manual feature engineering. The framework further integrates applicability domain analysis and chemical clustering to enable domain-consistent, optimized model selection.
A textile-sector case study illustrates how predicted CFs can be incorporated into LCA for chemicals that previously lacked them. It shows that omitting chemicals with missing CFs can lead to substantial underestimation of toxicity impacts, by up to an order of magnitude in the examined case.
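The underestimation effect described above can be illustrated with a minimal sketch. It assumes the standard LCA characterization step, in which the impact score is the sum over chemicals of emitted mass times CF. All chemical names, masses, and CF values below are hypothetical and chosen only to show the mechanism; they are not taken from the study, and the function name `toxicity_impact` is illustrative.

```python
from typing import Dict, Optional


def toxicity_impact(inventory_kg: Dict[str, float],
                    cfs: Dict[str, Optional[float]]) -> float:
    """Sum mass * CF over chemicals that have a CF; chemicals without one add nothing."""
    total = 0.0
    for chemical, mass in inventory_kg.items():
        cf = cfs.get(chemical)
        if cf is not None:
            total += mass * cf
    return total


# Hypothetical emissions per functional unit, in kg (not data from the study).
inventory_kg = {"dye_A": 0.5, "surfactant_B": 2.0, "softener_C": 1.0}

# Hypothetical database CFs (impact units per kg); None marks a missing CF.
database_cfs = {"dye_A": 100.0, "surfactant_B": None, "softener_C": None}

# Hypothetical model-predicted CFs for the chemicals the database lacks.
predicted_cfs = {"surfactant_B": 150.0, "softener_C": 150.0}

# Keep database values where they exist; fall back to predictions otherwise.
filled_cfs = {
    name: (cf if cf is not None else predicted_cfs.get(name))
    for name, cf in database_cfs.items()
}

impact_without = toxicity_impact(inventory_kg, database_cfs)
impact_with = toxicity_impact(inventory_kg, filled_cfs)
ratio = impact_with / impact_without

print(f"Impact ignoring missing CFs: {impact_without}")
print(f"Impact with predicted CFs:   {impact_with}")
print(f"Ratio: {ratio}")
```

With these illustrative numbers, the score that ignores missing CFs is one tenth of the score that includes predicted CFs. Skipping a chemical is equivalent to assigning it a CF of zero, which is why gaps in CF coverage bias the result downward.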

Article information

Article type: Paper
Submitted: 12 Jan 2026
Accepted: 31 May 2026
First published: 05 Jun 2026
This article is Open Access
Creative Commons BY license

RSC Sustainability, 2026, Accepted Manuscript

Characterizing Chemical Toxicity for Life Cycle Assessment Using Machine Learning and Deep Learning Models Based on Environmental Footprint – Methodological Comparison & textile case study

T. Ding, G. Larrea-Gallegos, F. Busio, A. Marvuglia and T. Schaubroeck, RSC Sustainability, 2026, Accepted Manuscript, DOI: 10.1039/D6SU00023A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
