A Machine Learning Framework to Predict PPCP Removal Through Various Wastewater and Water Reuse Treatment Trains
Abstract
The persistence of pharmaceuticals and personal care products (PPCPs) through wastewater treatment, and the resulting contamination of aquatic environments and drinking water, is a pervasive concern that calls for methods of identifying effective treatment strategies for PPCP removal. In this study, we employed machine learning (ML) models to classify 149 PPCPs based on their chemical properties and to predict their removal via wastewater and water reuse treatment trains. We evaluated two distinct clustering approaches: C1 (clustering based on the most efficient individual treatment process) and C2 (clustering based on the removal pattern of PPCPs across treatments). To form the clusters, we grouped PPCPs by their relative abundances, comparing peak areas measured via non-target profiling using ultra-performance liquid chromatography-tandem mass spectrometry through two field-scale treatment trains. The resulting clusters were then classified using Abraham descriptors and log Kow as inputs to three ML models: Support Vector Machines (SVM), Logistic Regression, and Random Forest (RF). SVM achieved the highest accuracy, 79.1%, in predicting PPCP removal. Notably, a 58-75% overlap was observed between the ML clusters of PPCPs and the Abraham descriptor and log Kow clusters of PPCPs, indicating the potential of using Abraham descriptors and log Kow to predict the fate of PPCPs through various treatment trains. Given the myriad PPCPs of concern, this approach can supplement information gathered from experimental testing to help optimize the design of wastewater and water reuse treatment trains for PPCP removal.
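As an illustration of the C1 idea (grouping each compound by the individual process that removes the most of it), the following minimal sketch assigns a cluster label from a sequence of peak areas. It is not the procedure used in the study: the removal metric (drop in peak area across each process, as a fraction of the influent peak area), the function names, the process names, and the peak-area values are all hypothetical and chosen only for illustration.

```python
def removal_by_process(peak_areas):
    """Fraction of the influent peak area removed across each successive process.

    peak_areas: peak areas starting with the influent, followed by the
    effluent of each process in the train (assumed layout).
    """
    influent = peak_areas[0]
    if influent <= 0:
        raise ValueError("influent peak area must be positive")
    # Pair each sampling point with the next one to get the drop per process.
    return [(before - after) / influent
            for before, after in zip(peak_areas, peak_areas[1:])]


def assign_c1_cluster(peak_areas, process_names):
    """Label a compound with the process that removes the largest fraction."""
    removals = removal_by_process(peak_areas)
    if len(removals) != len(process_names):
        raise ValueError("need one process name per consecutive pair of peak areas")
    best = max(range(len(removals)), key=removals.__getitem__)
    return process_names[best]


# Hypothetical three-process train and made-up peak areas (arbitrary units).
processes = ["process_1", "process_2", "process_3"]
samples = {
    "compound_A": [1000.0, 900.0, 200.0, 150.0],
    "compound_B": [500.0, 100.0, 80.0, 75.0],
    "compound_C": [800.0, 780.0, 760.0, 40.0],
}

clusters = {name: assign_c1_cluster(areas, processes)
            for name, areas in samples.items()}

for name, label in clusters.items():
    print(name, "->", label)
```

Labels of this kind could then serve as classification targets, with Abraham descriptors and log Kow as the input features for a model such as an SVM. Normalizing by the influent peak area is one design choice among several; normalizing each step by the peak area entering that process would weight late-stage removal differently.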