Machine-learning prediction of DCAA and TCAA concentrations in drinking water

Abstract

Disinfection by-products (DBPs) in drinking water are of significant concern because of their carcinogenic, teratogenic, and mutagenic properties, which makes real-time monitoring essential for ensuring water safety. However, DBPs typically occur at low concentrations, and conventional detection methods are costly and complex, so researchers have increasingly turned to predictive modeling based on easily measurable water quality parameters. This study systematically evaluates the feasibility of machine learning (ML) methods for predicting the concentrations of dichloroacetic acid (DCAA) and trichloroacetic acid (TCAA). Multiple linear regression (MLR), while computationally efficient, is limited by its linear assumptions and exhibited poor predictive performance (test set N25 = 23–54%, R² = 0.353–0.640). Support vector regression (SVR), which leverages kernel functions, provided only marginal improvement (N25 = 46–69%, R² = 0.442–0.595). The backpropagation neural network (BPNN) significantly enhanced prediction accuracy through flexible configuration of the number of hidden layers, the number of nodes, and the activation functions. For DCAA and TCAA, a BPNN with one hidden layer and 15 nodes outperformed both MLR and SVR (test set N25 = 89%, R² = 0.850). Nevertheless, BPNN still suffers from inherent limitations, such as slow convergence due to a fixed learning rate and a tendency to converge to local optima caused by random initialization. To address these issues, this study introduced particle swarm optimization (PSO) to globally optimize the weights of the BPNN, further increasing the prediction accuracy to over 98%. The results demonstrate that high-precision prediction can be achieved using only eight conventional water quality parameters, offering an economical, convenient, and reliable technical approach for monitoring DBPs in water supply systems.
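The idea of letting PSO search the weights of a one-hidden-layer network, and then scoring the result with N25 and R², can be illustrated with a minimal sketch. This is not the authors' implementation. The synthetic data, the tanh activation, the PSO hyperparameters, and the helper names (`make_synthetic_data`, `predict`, `pso_train`, `n25`, `r2`) are all assumptions made for illustration. N25 is assumed here to mean the percentage of predictions within ±25% of the measured value, and PSO alone is used to fit the weights rather than being combined with backpropagation.

```python
import numpy as np

# Eight inputs and 15 hidden nodes follow the abstract; everything else is assumed.
N_INPUTS, N_HIDDEN = 8, 15
N_WEIGHTS = N_INPUTS * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1


def make_synthetic_data(n_samples=120, seed=0):
    """Hypothetical stand-in for water-quality data (strictly positive targets)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, N_INPUTS))
    y = 10.0 + 5.0 * X[:, 0] + 3.0 * np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.2, n_samples)
    return X, y


def predict(w, X):
    """Forward pass of a one-hidden-layer network stored as a flat weight vector."""
    i = N_INPUTS * N_HIDDEN
    W1 = w[:i].reshape(N_INPUTS, N_HIDDEN)
    b1 = w[i:i + N_HIDDEN]
    W2 = w[i + N_HIDDEN:i + 2 * N_HIDDEN]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2


def mse(w, X, y):
    return float(np.mean((predict(w, X) - y) ** 2))


def n25(y_true, y_pred):
    """Assumed definition: percent of predictions within +/-25% of the measured value."""
    rel_err = np.abs(y_pred - y_true) / np.abs(y_true)
    return 100.0 * float(np.mean(rel_err <= 0.25))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def pso_train(X, y, n_particles=30, n_iter=200, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the flat weight vector; returns best weights and loss history."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(0.0, 0.5, size=(n_particles, N_WEIGHTS))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_loss = np.array([mse(p, X, y) for p in pos])
    g = int(np.argmin(pbest_loss))
    gbest, gbest_loss = pbest[g].copy(), float(pbest_loss[g])
    history = [gbest_loss]
    for _ in range(n_iter):
        rand_c = rng.random(pos.shape)  # cognitive (personal-best) random factors
        rand_s = rng.random(pos.shape)  # social (global-best) random factors
        vel = inertia * vel + c1 * rand_c * (pbest - pos) + c2 * rand_s * (gbest - pos)
        pos = pos + vel
        loss = np.array([mse(p, X, y) for p in pos])
        improved = loss < pbest_loss
        pbest[improved] = pos[improved]
        pbest_loss[improved] = loss[improved]
        g = int(np.argmin(pbest_loss))
        if pbest_loss[g] < gbest_loss:
            gbest, gbest_loss = pbest[g].copy(), float(pbest_loss[g])
        history.append(gbest_loss)
    return gbest, history


if __name__ == "__main__":
    X, y = make_synthetic_data()
    weights, history = pso_train(X, y)
    y_hat = predict(weights, X)
    print(f"MSE: {history[0]:.3f} -> {history[-1]:.3f}")
    print(f"N25 = {n25(y, y_hat):.1f}%, R2 = {r2(y, y_hat):.3f}")
```

Because the swarm tracks a global best across many randomly initialized particles, the outcome depends less on any single random starting point, which is the limitation of plain backpropagation that the abstract highlights. In practice the metrics would be computed on a held-out test set rather than on the training data used in this sketch.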

Article information

Article type: Paper
Submitted: 11 Jul 2025
Accepted: 10 Sep 2025
First published: 07 Oct 2025

Environ. Sci.: Water Res. Technol., 2025, Advance Article

M. Liu, W. Huang, C. Yu, H. Zhang, L. Lin and Z. Jiang, Environ. Sci.: Water Res. Technol., 2025, Advance Article, DOI: 10.1039/D5EW00644A
