Issue 11, 2026, Issue in Progress

Integrated machine learning and positive matrix factorization for the source-specific contamination and predictive risk assessment of potentially toxic elements in multi-land-use soils around an active coal mine

Abstract

Investigating the distribution, sources, and risks of potentially toxic elements (PTEs) in mining-impacted soils is critical for effective environmental monitoring and human health protection. However, traditional assessments often fail to integrate spatial, source-oriented, and predictive approaches, which limits a comprehensive understanding. In this study, 120 soil samples were collected from five land-use types surrounding an active opencast coal mine in the Godavari Valley coalfields, India. Pollution indices revealed severe multi-metal contamination, with Co and Cd emerging as the most consistently enriched elements across land uses, while Zn showed pronounced but spatially restricted enrichment, particularly in coal mine soils. An integrated framework combining positive matrix factorization (PMF), machine learning, and geospatial analysis was developed to identify source-specific contamination patterns. A robust four-factor PMF solution identified mixed industrial-mining activities as the dominant source (∼49%) of contamination. A random forest (RF) model integrating soil properties, spatial variables, and PMF-derived source contributions demonstrated strong to moderate predictive performance (average R² = 0.82) with an average root mean square error (RMSE) of 19.6 mg kg⁻¹. Geostatistical mapping highlighted coal mines and adjacent agricultural areas as persistent contamination hotspots. Ecological risk assessment indicated Cd and Hg as the principal contributors to high ecological risks, particularly in agricultural and roadside soils. Probabilistic health risk assessment revealed unacceptable risks for the local population, with children being the most vulnerable. Cr was identified as the primary driver of carcinogenic risk, contributing ∼81% in children, while Co dominated non-carcinogenic risks, resulting in hazard indices for children approaching unacceptable thresholds across all land uses. Our findings provide a precise, scientific framework for source-specific risk assessment to guide targeted soil remediation and environmental management in mining-impacted landscapes worldwide.
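The abstract summarizes the random forest model's performance with two standard metrics, the coefficient of determination and the RMSE. As a minimal sketch of how these metrics are defined, the snippet below computes both from paired observed and predicted concentrations. The function names and the four sample values are hypothetical and purely illustrative; they are not the authors' code or data.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical concentrations in mg/kg (made up for illustration only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"R^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} mg/kg")
```

Because RMSE carries the units of the measured variable, an average value across several elements depends on each element's concentration range, whereas R² is unitless and comparable across elements.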

Article information

Article type: Paper
Submitted: 18 Dec 2025
Accepted: 03 Feb 2026
First published: 19 Feb 2026
This article is Open Access
Creative Commons BY license

RSC Adv., 2026, 16, 10158-10178

Z. Bashir, D. Raj and R. Selvasembian, RSC Adv., 2026, 16, 10158 DOI: 10.1039/D5RA09789D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
