Development of prediction models on the degradation kinetics parameters of antibiotics in aquatic environments with machine learning methods
Abstract
Antibiotics, as emerging contaminants, are increasingly detected in aquatic environments, raising significant concerns about their ecological risks. However, the lack of hydrolysis rate constants (kH) and aqueous hydroxyl radical degradation rate constants (kOH) limits the assessment of the environmental persistence of antibiotics. The present study addresses this gap by developing prediction models using multiple linear regression and three machine learning algorithms (i.e., random forest, support vector machine, and extreme gradient boosting (XGBoost)), based on a dataset of 69 kH and 80 kOH values. The XGBoost models, identified as optimal, were used to fill in missing data in the original dataset. Subsequently, a multi-task model capable of simultaneously predicting kH and kOH values was developed and showed good performance. The applicability domain of the models was characterized by Williams plots. Furthermore, SHapley Additive exPlanations (SHAP) analysis was used to identify the key molecular descriptors influencing degradation rates, providing insights into the underlying degradation mechanisms. This approach not only facilitates the simultaneous prediction of kH and kOH values for various new pollutants, but also enhances the understanding of how molecular structure affects their synergistic degradation kinetics in aquatic environments, thereby contributing significantly to the assessment of the environmental persistence of emerging contaminants.
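As an illustration of the Williams plot mentioned above, the following minimal sketch computes its two axes: the leverage of each compound (the diagonal of the hat matrix) and the standardized residual of each prediction. It is not the authors' implementation. The function names, the toy descriptor values, the warning leverage h* = 3(p + 1)/n, and the choice to standardize residuals by their sample standard deviation are all assumptions that follow common QSAR practice.

```python
# Sketch of the quantities behind a Williams plot (leverage vs. standardized residual).
# All names and the toy data are illustrative assumptions, not taken from the study.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def leverages(X):
    """Diagonal of H = X (X^T X)^-1 X^T, with an intercept column prepended to X."""
    Xd = [[1.0] + list(row) for row in X]
    inv = invert(matmul(transpose(Xd), Xd))
    k = len(Xd[0])
    return [sum(xi[a] * inv[a][b] * xi[b] for a in range(k) for b in range(k))
            for xi in Xd]

def warning_leverage(n_samples, n_descriptors):
    """Assumed threshold h* = 3(p + 1)/n, a common QSAR convention."""
    return 3.0 * (n_descriptors + 1) / n_samples

def standardized_residuals(y_true, y_pred):
    """Residuals divided by their sample standard deviation (one common convention)."""
    res = [t - p for t, p in zip(y_true, y_pred)]
    n = len(res)
    mean = sum(res) / n
    sd = (sum((r - mean) ** 2 for r in res) / (n - 1)) ** 0.5
    return [r / sd for r in res]

# Toy example: one descriptor, ten compounds, the last one structurally unusual.
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [30.0]]
h = leverages(X_toy)
h_star = warning_leverage(n_samples=len(X_toy), n_descriptors=1)
outside_domain = [i for i, hi in enumerate(h) if hi > h_star]
```

In this toy case only the last compound exceeds the warning leverage. In a full Williams plot, a compound is usually treated as inside the applicability domain when its leverage is below h* and its standardized residual lies within about ±3.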