Machine learning prediction of DOC–water partitioning coefficients for organic pollutants from diverse DOM origins

Ruyue Jin; Yuzhen Liang; Zhenqing Shi

doi:10.1039/D5EM00029G


Ruyue Jin,^ab Yuzhen Liang*^ab and Zhenqing Shi^ab

Author affiliations

* Corresponding authors

^a School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China

^b The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China

Abstract

This study aims to improve the prediction and understanding of dissolved organic carbon–water partitioning coefficients (K_DOC), a crucial parameter in environmental risk assessment. A dataset of 709 datapoints covering 190 unique organic pollutants and various types of dissolved organic matter (DOM) was compiled. Molecular descriptors were calculated with Multiwfn, PaDEL and RDKit to characterize each compound's properties and structure. Separate machine learning models were established for four DOM groupings: all DOM, natural aquatic DOM, natural terrestrial DOM and commercial DOM. These models exhibited excellent goodness-of-fit, internal stability, and predictive performance, with R²_train > 0.771, R²_valid > 0.602, R²_test > 0.629, and RMSE_test ranging from 0.413 to 0.580. Shapley additive explanation (SHAP) analysis identified CrippenLogP and MATS2m as the most influential descriptors. CrippenLogP, which reflects hydrophobicity, positively influenced K_DOC, while MATS2m, which characterizes molecular branching and compactness, had a negative effect. Mor29m, for which lower values indicate a higher abundance of heteroatoms such as halogens, also showed a negative impact, likely due to enhanced interactions with polar DOM groups. SlogP_VSA1, another hydrophobicity-related descriptor, correlated positively with log K_DOC for natural aquatic DOM, while its negative correlation for all DOM may reflect the great diversity of DOM properties in that group. Partial dependence plots revealed that organic pollutants tended to partition more into DOM when CrippenLogP > 6, Mor29m was between 0.45 and 0.52, MATS2m < −0.015, and SlogP_VSA1 < 7. These findings support the application of machine learning models for assessing pollutant interactions with DOM, contributing to improved environmental risk predictions.
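The abstract reports model quality as R² and RMSE on the training, validation and test sets. As a minimal sketch of how such metrics are commonly computed, assuming the standard definitions (the abstract does not state the exact formulas used), the following plain-Python example evaluates both for a handful of log K_DOC values. The numbers and the function names `r_squared` and `rmse` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed standard definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (here log K_DOC)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted log K_DOC values (illustration only,
# not data from the study).
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 3.5, 5.5, 5.5]

print(f"R^2  = {r_squared(observed, predicted):.3f}")  # R^2  = 0.800
print(f"RMSE = {rmse(observed, predicted):.3f}")       # RMSE = 0.500
```

Because RMSE is expressed in log units, a value of 0.5 corresponds to a typical prediction error of about a factor of three (10^0.5 ≈ 3.2) in K_DOC itself, which is a useful way to read the reported RMSE_test range of 0.413 to 0.580.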


Article information

DOI: https://doi.org/10.1039/D5EM00029G
Article type: Paper
Submitted: 11 Jan 2025
Accepted: 29 May 2025
First published: 30 May 2025


Environ. Sci.: Processes Impacts, 2025, 27, 1889-1901


Environmental Science: Processes & Impacts
