Machine learning prediction of DOC–water partitioning coefficients for organic pollutants from diverse DOM origins

Abstract

This study aims to improve the prediction and understanding of dissolved organic carbon–water partitioning coefficients (KDOC), a crucial parameter in environmental risk assessment. A dataset of 709 data points covering 190 unique organic pollutants and various types of dissolved organic matter (DOM) was compiled. Molecular descriptors characterizing each compound's properties and structure were calculated using Multiwfn, PaDEL and RDKit. Separate machine learning models were established for four DOM groups: all DOM, natural aquatic DOM, natural terrestrial DOM and commercial DOM. These models exhibited excellent goodness-of-fit, internal stability, and predictive performance, with training R² > 0.771, validation R² > 0.602, test R² > 0.629, and test RMSE ranging from 0.413 to 0.580. Shapley additive explanation (SHAP) analysis identified CrippenLogP and MATS2m as the most influential descriptors. CrippenLogP, which reflects hydrophobicity, positively influenced KDOC, while MATS2m, which characterizes molecular branching and compactness, had a negative effect. Mor29m, for which lower values indicate a higher abundance of heteroatoms such as halogens, also showed a negative impact, likely due to enhanced interactions with polar DOM groups. SlogP_VSA1, another hydrophobicity-related descriptor, correlated positively with log KDOC for natural aquatic DOM, while its negative correlation for all DOM may reflect the great diversity of DOM properties in that group. Partial dependence plots revealed that organic pollutants tended to partition more into DOM when CrippenLogP > 6, Mor29m was between 0.45 and 0.52, MATS2m < −0.015, and SlogP_VSA1 < 7. These findings support the application of machine learning models for assessing pollutant interactions with DOM, contributing to improved environmental risk predictions.
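
The abstract reports model quality as R² and RMSE on training, validation and test sets. As a minimal sketch of how these two metrics are commonly computed (not the authors' code), the example below uses plain Python. The function names and the four log KDOC values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study's dataset.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (log KDOC)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted log KDOC values (illustration only).
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 3.5, 5.5, 5.5]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Because RMSE is expressed in log units here, a test RMSE of about 0.5 corresponds to a typical prediction error of roughly a factor of three in KDOC itself.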

Article information

Article type: Paper
Submitted: 11 Jan 2025
Accepted: 29 May 2025
First published: 30 May 2025

Environ. Sci.: Processes Impacts, 2025, Advance Article

R. Jin, Y. Liang and Z. Shi, Environ. Sci.: Processes Impacts, 2025, Advance Article, DOI: 10.1039/D5EM00029G
