Combining predictive and analytical methods to elucidate pharmaceutical biotransformation in activated sludge †

While man-made chemicals in the environment are ubiquitous and a potential threat to human health and ecosystem integrity, the environmental fate of chemical contaminants such as pharmaceuticals is often poorly understood. Biodegradation processes driven by microbial communities convert chemicals into transformation products (TPs) that may themselves have adverse ecological effects. The detection of TPs formed during biodegradation has been continuously improved thanks to the development of TP prediction algorithms and analytical workflows. Here, we contribute to this advance by (i) reviewing past applications of TP identification workflows, (ii) applying an updated workflow for TP prediction to 42 pharmaceuticals in biodegradation experiments with activated sludge, and (iii) benchmarking 5 different pathway prediction models, comprising 4 prediction models trained on different datasets provided by enviPath, and the state-of-the-art EAWAG pathway prediction system. Using the updated workflow, we could tentatively identify 79 transformation products for 31 pharmaceutical compounds. Compared to previous works, we have further automated several steps that were previously performed by hand. By benchmarking the enviPath prediction system on experimental data, we demonstrate the usefulness of the pathway prediction tool to generate suspect lists for screening, and we propose new avenues to improve their accuracy. Moreover, we provide a well-documented workflow that can be (i) readily applied to detect transformation products in activated sludge and (ii) potentially extended to other environmental studies.


Introduction
The fate of an anthropogenic chemical in the environment is to a large extent determined by its intrinsic capability to be biotransformed by microorganisms. Biodegradation leads to the transient or permanent presence of transformation products (TPs), which can, like their parent compounds, be characterized by their behavior in the environment in terms of persistence, mobility, toxicity, and their ability to bioaccumulate. In certain cases, TPs have been found to be more persistent, mobile and/or toxic than their parent compound, 1-3 which further highlights the importance of considering TPs in the environmental risk assessment of chemicals. Biodegradation studies identifying half-lives and biotransformation products are mandatory for certain classes of chemicals, such as pesticides. 4,5 For pharmaceuticals, in contrast, only the characterization of human metabolites is required by regulation in the European Union, 6 leading to a knowledge gap regarding the fate of active pharmaceutical ingredients (APIs) in the environment. As most APIs reach wastewater treatment plants (WWTP), understanding their fate in activated sludge is essential. However, the identification of TPs is challenging because (i) the TP structures are not known in advance, and (ii) often no analytical standards are available to confirm the exact structure. Helbling et al. were the first to address the issue of TP identification systematically. 7 To detect previously unknown degradation products of micropollutants in activated sludge, the authors presented a workflow combining computational and analytical approaches: (i) automatic generation of a suspect list of potential TPs for each compound, (ii) spiking activated sludge reactors with parent compounds, and (iii) screening the sludge samples for suspected TPs using liquid chromatography coupled to high-resolution tandem mass spectrometry (LC-HR-MSMS).
In the rst step (i), expert-curated biochemical transformation rules were iteratively applied to a chemical structure of interest to predict biodegradation pathways involving potential TPs. Typical pathway prediction tools are PathPred, 8 BNICE, 9 RetroPathRL 10 or the University of Minnesota Pathway Prediction System (UM-PPS). 11 The UM-PPS, which was used in Helbling et al., 7 is specically designed for biodegradation studies and can prioritize likely over less likely biotransformations using prioritization rules (also called relative reasoning rules) to yield biochemically plausible biotransformation pathways and corresponding TPs. 12 In the third step (iii), the generated suspect list was then used to extract single ion chromatograms for matching masses, which were further analyzed for peak formation over time, isotopic t and shared fragments between parent compound and TPs. 13 Still today, the main challenges of this approach are the high number of false positives in the suspect list leading to a low prediction precision, i.e., a low number of correctly predicted TPs per total number of predicted TPs, and the need for individual inspection of the extracted ion chromatogram (XIC), and MS and MS/ MS spectra for each candidate. Without reference standards, considerable efforts are still needed for the resolution of TPs' isomeric structures and TP quantication, such as the development of advanced identication workows and even the development of novel approaches (e.g., machine learning models) to predict ionization efficiencies that can improve the detection of more candidate TPs and the estimation of their concentration. 14 In the past years, this workow was applied and modied by different research groups to identify TPs in samples from biotransformation experiments. 
In particular, the prediction methods and underlying biodegradation databases have evolved to yield more accurate TP predictions: in 2012, the University of Minnesota Biodegradation/Biocatalysis Database and Pathway Prediction System (UM-BBD/PPS) 15 was moved to Eawag and renamed to EAWAG-BBD/PPS, while keeping its original pathway prediction tool (PPS) and biodegradation data obtained from pure or enrichment cultures (BBD). In 2015, Wicker et al. re-implemented the EAWAG-BBD/PPS platform as enviPath, and the original BBD database was transferred to the new platform as EAWAG-BBD data package. 16 In 2017, Latino et al. collected soil biodegradation data for 317 pesticides from regulatory reports and compiled them into the EAWAG-SOIL data package. 17 The latest data addition to enviPath is EAWAG-SLUDGE, which contains biodegradation data for 91 micropollutants in activated sludge collected from scientific literature (https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a). Compared to its predecessors, enviPath not only holds more data, but also provides an improved pathway prediction system where the expert-curated reaction prioritization rules were replaced with a machine-learning algorithm that learns the relative reasoning rules directly from the data. 18,19 On the analytical side, new solutions emerged that facilitate the identification of TPs, in particular to decrease the workload of manually investigating mass matches for long suspect lists: different automated tools (Sieve, Compound Discoverer™ by Thermo Fisher Scientific™, among others) now address this issue by peak prioritization based on intensity, isotopic pattern, mass defect, time course of peak formation and predicted retention time (RT) by quantitative structure retention relationships (QSRR). 20,21 Furthermore, the interpretation of MS spectra is facilitated by spectral library search (e.g., MassBank, 22 NIST, 23 mzCloud 24 ) and in silico fragmentation tools (e.g., Mass Frontier, SIRIUS, 25 CFM, 26 MetFrag 27 ).
These recent developments require a systematic analysis of previous studies to form an accurate picture of the current state-of-the-art in TP identification in biodegradation experiments, and to benchmark the performance of newly available tools against the original methods. To address this need, we (i) provide an overview of previous publications on TP identification in activated sludge or wastewater, (ii) present an updated, partially automated workflow for TP identification (Fig. 1), (iii) apply it to elucidate biotransformation processes of 42 pharmaceuticals, for many of which no TPs have been reported before, in a batch experiment with activated sludge, and (iv) evaluate the accuracy of five different TP prediction algorithms to guide future applications.

Literature search
The objective of the literature search was to collect all publications on TP identification experiments in activated sludge or samples from wastewater treatment plants (WWTP) that used pathway prediction to generate suspect lists. The search terms "biotransformation", "sludge" or "waste water", "pathway prediction system" or "in silico metabolism prediction" or the name of a prediction system (e.g., "Pathpred") or "suspect screening" were used in Reaxys (https://www.reaxys.com, last accessed 29/08/2022) and Clarivate Web of Science (https://www.webofscience.com, last accessed 01/09/2022). Furthermore, a Scopus (https://www.scopus.com, last accessed 02/09/2022) search for citations of the articles by Helbling et al. 7 and Wicker et al. 16 was performed. For each relevant article presenting results on TP identification, we extracted (i) the number of predicted and identified TPs, (ii) the substance class, (iii) the initially spiked concentration of test chemicals (if applicable), (iv) the pathway prediction method, (v) the experimental setup, (vi) the analytical method, and (vii) whether the TP identification was solely based on a suspect list (suspect screening) or whether additional TPs were identified by comparing full-scan MS data from different time points to detect emerging metabolites (non-target screening). Reviews were analyzed separately to identify general trends in analytical and computational methodologies.

TP identication workow
The overall workow for suspect TP identication included six steps ( Fig. 1): (i) predicting TPs using pathway prediction tools, (ii) compiling a suspect list and annotating structures with MSrelevant information, (iii) performing biotransformation experiments, (iv) analyzing samples using liquid chromatography coupled to high-resolution tandem mass spectrometry, (v) identifying TPs from HR-MS data (including suspect screening and assignment of condence levels), (vi) compiling identied TPs into pathways. Each step is described in detail in the next subsections. Compared to the original workow proposed by Helbling et al., 7 the following steps were updated ( Fig. 1, see green circular arrows): (i) suspect and mass list generation, (ii) second LC-MS measurement with stepped collision energy, (iii) spectral library search within Compound Discoverer, (iv) assignment of condence levels according to Schymanski et al., 28 (v) prediction of conjugation reactions using Compound Discoverer, (vi) feedback of curated TP pathways into enviPath.

TP prediction tools
Suspect lists were obtained from EAWAG-PPS and enviPath. For enviPath, 4 pathway prediction models were trained on different combinations of the publicly available enviPath data packages EAWAG-BBD, EAWAG-SOIL, and EAWAG-SLUDGE to study the effect of adding different training data sets on the prediction performance. The following machine-learning models were trained on the respective EAWAG data packages for different purposes: (i) ML-ECC-BBD was trained on pathway data in EAWAG-BBD and considered the standard, reference model. (ii) ML-ECC-BBD + SOIL was trained on pathway data from both EAWAG-BBD and EAWAG-SOIL to study the effect of biasing the model towards biodegradation in soil. (iii) ML-ECC-BBD + SLUDGE was trained on EAWAG-BBD and EAWAG-SLUDGE to study the effect of biasing the model towards biodegradation in activated sludge. (iv) ML-ECC-BBD + SOIL + SLUDGE was trained on all three data packages to see if including a maximum number of training pathways increases model performance. Table 1 shows the size and composition of the training sets used for the different models. All models used MACCS fingerprints as molecular descriptors and were trained using multi-label Ensemble Classifier Chains (ECC). Further details on the training of relative reasoning models can be found elsewhere. 16 For the TP prediction, EAWAG-PPS was run in batch mode using relative reasoning for three iterations with a neutral aerobic likelihood cut-off. The enviPath TP prediction was also run in batch mode (for details, see Methods section on "Data availability"), with a cut-off at 50 TPs per parent compound. The search algorithm employed a greedy pathway search in a weighted network, where the nodes are compounds and the edges are biotransformation reactions weighted with the predicted probability of the reaction to happen, given available data and competing reactions. The reaction probability p_edge is obtained from the ML-based relative reasoning algorithm.
For a child node n generated during the pathway search, the probability p_node,n is calculated as the product of p_edge,n−1/n, the probability of the reaction producing the TP, and the probability of its direct parent node (p_node,n−1). During the search, the nodes are expanded in order of decreasing combined probability until the maximum of 50 TPs is reached, or no more TPs with a combined probability greater than zero are available for further expansion. The node and reaction probabilities are reported for each predicted TP, indicating their probability to be observed experimentally given the underlying relative reasoning model. The pathway search algorithms used by EAWAG-PPS and enviPath are illustrated in Fig. S1 (ESI-I). †
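The greedy expansion described above can be illustrated with a minimal Python sketch. It is not the enviPath implementation: the callback `predict_reactions`, the function name and the toy reaction network are hypothetical stand-ins for the relative reasoning model, and the sketch simply keeps the first probability assigned to each compound.

```python
import heapq

def greedy_pathway_search(root, predict_reactions, max_tps=50):
    """Expand compounds in order of decreasing combined probability.

    predict_reactions(compound) is a hypothetical callback returning
    (product, p_edge) pairs; it stands in for the relative reasoning model.
    """
    p_node = {root: 1.0}        # combined probability per compound
    frontier = [(-1.0, root)]   # max-heap emulated with negated probabilities
    tps = []                    # predicted TPs in order of discovery
    while frontier and len(tps) < max_tps:
        neg_p, compound = heapq.heappop(frontier)
        for product, p_edge in predict_reactions(compound):
            p = -neg_p * p_edge  # p_node,n = p_edge * p_node,n-1
            if p <= 0.0 or product in p_node:
                continue         # skip impossible or already known compounds
            p_node[product] = p
            tps.append(product)
            heapq.heappush(frontier, (-p, product))
            if len(tps) >= max_tps:
                break
    return {tp: p_node[tp] for tp in tps}

# Toy network (hypothetical probabilities), limited to three TPs.
toy_rules = {
    "parent": [("TP_A", 0.8), ("TP_B", 0.5)],
    "TP_A": [("TP_C", 0.5)],
    "TP_B": [("TP_D", 0.9)],
}
toy_result = greedy_pathway_search(
    "parent", lambda c: toy_rules.get(c, []), max_tps=3
)
```

In the toy run, TP_D is never generated because the limit of three TPs is reached while expanding TP_A, which mirrors how a TP cut-off truncates lower-probability branches.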

Compilation of suspect list
Python (version 3.6.13) scripting was used to combine the TPs predicted by the five different models into one suspect list, and to determine their monoisotopic mass, chemical formula, InChIKey, CAS number and structure as mol file using the Python libraries RDKit (version 2020.09) and PubChemPy (version 1.0.4). Some TPs were predicted for several parent compounds, in which case they were merged in the suspect list used for screening but counted separately in the method evaluation and comparison. From the suspect list, we extracted the charged masses for HRMS measurements (inclusion list), and the formulae and Molfiles for TP identification in Compound Discoverer (mass list).
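As a rough illustration of the merging and inclusion-list steps, the following standard-library sketch merges per-model predictions by InChIKey and derives m/z values. The function names, the input layout and the placeholder keys are hypothetical, and the sketch assumes singly charged [M+H]+ and [M−H]− ions; the actual scripts relied on RDKit and PubChemPy and may handle other adducts.

```python
PROTON_MASS = 1.007276  # Da; mass of a hydrogen atom minus one electron

def merge_suspects(predictions):
    """Merge per-model predictions into one suspect list keyed by InChIKey.

    predictions: {model_name: [(parent, inchikey, monoisotopic_mass), ...]}
    """
    merged = {}
    for model, tps in predictions.items():
        for parent, key, mass in tps:
            entry = merged.setdefault(
                key, {"mass": mass, "parents": set(), "models": set()}
            )
            entry["parents"].add(parent)  # one TP may have several parents
            entry["models"].add(model)
    return merged

def inclusion_list(merged):
    """Return (inchikey, [M+H]+ m/z, [M-H]- m/z) rows, assuming charge 1."""
    return [
        (key, round(e["mass"] + PROTON_MASS, 4), round(e["mass"] - PROTON_MASS, 4))
        for key, e in sorted(merged.items())
    ]

# Hypothetical example: KEY-A is predicted by two models for two parents.
example = {
    "model_1": [("parent_X", "KEY-A", 200.0), ("parent_X", "KEY-B", 150.5)],
    "model_2": [("parent_Y", "KEY-A", 200.0)],
}
suspects = merge_suspects(example)
rows = inclusion_list(suspects)
```

Keeping the set of parents per entry allows a TP to be screened once while still being counted separately per parent in the method comparison.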

Experimental setup of sludge reactors
The experimental setup of the sludge reactors was adapted from Gulde et al. 29 In short, sludge-seeded and aerated bottle reactors were spiked with the mixture of 46 selected compounds at an initial concentration of 8 mg L⁻¹ (details in ESI-I Table S3 †). The APIs were selected based on commercial availability, expected measurability using HPLC-HRMS/MS, and predictability of the corresponding TPs. The selected substances show a wide range of structural moieties and diversity in their functional groups. Only irbesartan, 30 valsartan, 7,31-34 metformin 35 and hydrochlorothiazide 34 were previously investigated in biotransformation experiments in activated sludge or wastewater samples. Further, olanzapine, mirtazapine, rivastigmine, aliskiren, atazanavir, efavirenz and rosuvastatin were screened for in wastewater samples. 36-38 The environmental fate of the remaining 35 APIs has not been investigated to the best of our knowledge. Control experiments were used to reveal abiotic degradation, sorption processes, and matrix background (ESI-I Table S4 †). The airflow of half of the reactors was augmented with CO₂ to assess biotransformation at pH 6 in addition to the native pH of [...]

Table 1 Training sets used to build relative reasoning models for pathway prediction in enviPath

TP identication
The Compound Discoverer™ soware (Thermo Scientic™, Version 3.2) was used for TP suspect screening. The procedure included compound detection, comparison to suspect mass list, in silico prediction of fragments and (spectral) library search (mzCloud, ChemSpider), described in more detail in the ESI-I (Section S5 †). The entries of plausible candidates were reviewed manually based on peak shape, isotopic pattern and chromatographic area evolution over time and comparison to controls. Condence levels were reported according to Schymanski et al.: 28 1 (conrmed structure by reference standard), 2a (probable structure by spectral library match), 2b (probable structure by diagnostic evidence), 3 (tentative candidate with reasonable MS2), 4 (unequivocal molecular formula found), 5 (exact mass found). Finally, molecular structures were drawn based on structural evidence.
Compound Discoverer™ was further used to identify TPs resulting from conjugation reactions. N-Acetylation and N-succinylation were shown to be highly relevant for primary and secondary amines, 40 but their prediction is beyond the scope of biodegradation tools, which focus on the breakage (and not formation) of molecular bonds. Conjugation reactions (acetylation, formylation, fumarylation, malonylation and succinylation) were therefore predicted using the Expected Compounds nodes of Compound Discoverer. In addition, we screened the literature to find TPs reported in previous studies. While we did not include TPs arising from conjugation reactions and TPs reported in literature in the suspect list, we still searched for their presence in the LC-HRMS measurements. These TPs were analyzed separately to avoid interfering with our evaluation of TP prediction methods and are therefore discussed separately as manual suspects.

Comparison of prediction methods
To evaluate and compare the performance of the different TP prediction methods, we calculated how many TPs we would have found by applying each method separately. For each method, we determined the precision according to eqn (2). Next, we wanted to know if we could have obtained a better performance in terms of precision if we had stopped the prediction algorithm earlier. To answer this question, we generated smaller suspect lists by only keeping TPs that would have been obtained with a given cut-off threshold, and we evaluated the number of correctly predicted TPs and the precision of these reduced suspect lists. By varying the cut-off threshold for the number of generations for all methods, we obtained the prediction performance for TPs generated in 1, 2 and 3 generations. We further varied the cut-off threshold for the maximum number of TPs to predict from 1 to 50. As EAWAG-PPS does not support setting a threshold for the maximum number of TPs, the analysis of TP ranks was performed for enviPath methods only. The analysis was implemented in Python (see Data availability section for details).
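The threshold analysis can be sketched as follows. This is an illustrative sketch rather than the published analysis code: it takes precision to be the number of correctly predicted TPs divided by the total number of predicted TPs, as defined in the Introduction, and the function names and toy data are hypothetical.

```python
def precision(predicted, found):
    """Fraction of predicted TPs that were found experimentally."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(found)) / len(predicted)

def precision_by_rank(ranked_tps, found, max_rank=50):
    """Hits and precision when only the top-k ranked TPs are kept.

    ranked_tps: TPs of one parent, ordered by decreasing probability.
    Returns a list of (k, hits, precision) tuples for k = 1..max_rank.
    """
    found = set(found)
    rows = []
    for k in range(1, min(max_rank, len(ranked_tps)) + 1):
        hits = sum(tp in found for tp in ranked_tps[:k])
        rows.append((k, hits, hits / k))
    return rows

# Hypothetical ranking for a single parent compound.
ranked = ["tp1", "tp2", "tp3", "tp4"]
observed = {"tp1", "tp3"}
curve = precision_by_rank(ranked, observed)
overall = precision(ranked, observed)
```

Plotting hits and precision against k for such curves makes the trade-off between suspect list length and recovered TPs explicit.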

EAWAG-PPS is the most popular TP prediction tool
To assess the current state-of-the-art in suspect screening of TPs in wastewater or activated sludge systems, we performed a literature search for the timespan between 2010 and 2022, and we found 27 publications that used predicted TPs to screen samples (Table 2 and ESI-I Table S1 †). The most widely used tools for generating suspect lists were UM-PPS and EAWAG-PPS, which were applied in 7 and 12 studies, respectively. PathPred 8 (2 studies, both in combination with EAWAG-PPS) and MetabolitePredict 41 (2 studies, one in combination with EAWAG-PPS) were also applied, even though these tools are not specific to biodegradation and represent general biochemistry and human metabolism. Metaprint2D, 42 O3-PPS (specific to ozonation reactions) 43 and Metabolitepilot (commercial software) were each used in one study only. From this review, we conclude that the UM-PPS and its successor EAWAG-PPS are the most popular tools for TP prediction in activated sludge, as both tools combined were used in 89% of the studies considered. The most common analytical method is LC-HRMS (Q-TOF and Orbitrap technologies, 14 and 12 studies, respectively). Bottle incubations are the most common experimental setup (14 studies), followed by WWTP influent and effluent sampling (8 studies). Most authors combine suspect and non-target screening using LC-MS techniques. In some cases, the analytical method was extended by an NMR spectroscopy approach 44 or by the use of HILIC in addition to reverse-phase columns to improve retention and separation of hydrophilic compounds and, in some cases, isomers. 45 The most common substance classes are pharmaceuticals (18), pesticides (5), and micropollutants in general (4). Even though enviPath has been publicly available since 2016, it has not been used so far to predict biodegradation pathways in wastewater samples, but it has been applied for TP prediction in soil and surface water samples. 46,47 To evaluate the overall success of suspect screening across biodegradation studies, we compared their performance in terms of detected TPs per parent compound. As some studies only looked at very few parent compounds and performed the TP screening in greater detail, we only looked at studies with more than 10 parent compounds for a fair comparison with the workflow presented here. The eight studies that fulfilled these criteria had an average ratio of found TPs per parent compound of 1.5, ranging between 0.3 and 5.3. Finally, it should be noted that our search may have missed relevant articles that did not contain our search terms in the title or abstract.
The search also revealed ve relevant articles that review available tools for pathway prediction from three different angles: (i) metabolite prediction methods for drug metabolism, 48,49 (ii) pathway prediction methods in the context of pathway design for metabolic engineering, 50 and (iii) TP prediction for environmental contaminants. [51][52][53] Comprehensive overviews of existing tools for eld-specic applications are hence available from the indicated reviews and are therefore not further discussed here. Interestingly, some of the tools such as PathPred and EAWAG-PPS/enviPath were mentioned across scientic elds, while others were exclusively applied in their eld of origin. Also, Sveshnikova et al. point out that only few predictive biochemistry frameworks are being actively maintained and continuously applied in experimental work, 50 which is crucial to ensure reproducibility and continued evaluation of the prediction method. Out of the prediction tools applied to TP prediction in activated sludge, only UM-PPS/EAWAG-PPS/ enviPath, PathPred and MetabolitePredict are actively maintained. Out of these, only UM-PPS/EAWAG-PPS/enviPath are specic to microbial biodegradation prediction. As these tools are also the most widely applied methods for TP prediction in the context of environmental chemistry, they are the focus of our study.

Thousands of potential TPs predicted by EAWAG-PPS and enviPath
Based on the results from the literature search, we focused on EAWAG-PPS and its successor platform enviPath to generate suspect lists and to evaluate their respective performances in correctly predicting TPs. We chose EAWAG-PPS as a benchmark and compared it to the four enviPath models trained on different data packages. The enviPath models were trained on four different combinations of the following data packages: EAWAG-BBD containing 220 pathways, EAWAG-SOIL containing 317 pathways, and EAWAG-SLUDGE containing 91 pathways. Models were trained on BBD only, BBD + SOIL, BBD + SLUDGE, and BBD + SOIL + SLUDGE packages (Table 1).
To obtain a suspect list, we applied the five pathway prediction models to the 46 pharmaceuticals. All the prediction systems combined generated a total of 5570 TPs, out of which 348 (6.25%) TPs were predicted by all methods. The EAWAG-PPS predicted an average of 47 TPs per compound, ranging from four to 441 TPs. For example, fingolimod only has two hydroxyl moieties acting as reactive sites, resulting in four predicted TPs. In contrast, naloxegol features a long polyethylene glycol chain that can be cleaved at alternative reactive sites according to reaction rules, leading to 441 predicted TPs. The four enviPath models were limited to a maximum of 50 TPs per compound, which was reached for almost all compounds. One of the exceptions is metformin, for which the enviPath pathway expansion converged at three TPs, meaning that no more reactions occurred according to the available biotransformation and relative reasoning rules. However, metformin may be a special case, as this small molecule only has a few reactive sites and a particular structure that may not be well represented in the training data.

Biodegradation behavior observed for 34 compounds
A total of 42 out of the 46 spiked compounds were detected in the bottle reactors using the Compound Discoverer workflow. Acalabrutinib, ceritinib and orlistat were filtered out by the Compound Discoverer workflow due to low intensity of m/z ions and could only be found by manual exploration of the chromatograms and mass spectra in the raw files of sludge samples or in freshly spiked calibration samples. Ridaforolimus was detected only in pure aqueous standards at 1 mg L⁻¹. This behaviour could be explained by low ionization efficiencies, instability of the API or rapid losses such as volatilization or sorption to glass and/or plastic materials. We therefore excluded these four APIs from further analysis.
Six other APIs, atovaquone, clotrimazole, efavirenz, mometasone, nilotinib and regorafenib, were detected in the samples from the sludge reactors; however, in the biotransformation reactors no clear degradation trend was observed over the time course of the experiment, and in the sorption control reactors these APIs showed a decrease in the area by at least one order of magnitude from time-point 0 h to 24 h (ESI-II, Sections S4.2, S4.3, S4.5, S4.7, S4.9 and S4.10 †). All six compounds have a (predicted) soil adsorption coefficient log K_oc between 3 and 5.5 (ESI-I Table S3 †), which would be consistent with noticeable losses by sorption to sludge. Substantial sorption to soil organic material hinders microbial biotransformation, and hence the formation of TPs, due to low bioavailability. 67 Mometasone and nilotinib also dissipated abiotically in the high pH abiotic controls (ESI-II, Sections S4.7 and S4.9). Finally, atomoxetine, duloxetine, mirtazapine, rivastigmine and terbinafine, all APIs with amine moieties, show non-linear kinetics in the biotransformation reactors at high pH (ESI-II, Sections S3.4, S3.11, S3.19, S3.26 and S3.29 †), which could indicate that some level of ion-trapping occurred in parallel to biotransformation. 68 For the remaining 31 pharmaceuticals, we obtained clear trends of decreasing concentration over time (for details, see ESI-II †). However, we proceeded with TP identification for all APIs, independently of their biotransformation behavior.

Suspect screening identies 67 TPs
A total of 79 TPs were tentatively identified, out of which 67 were found with the help of the suspect list and twelve additional TPs were tentatively identified using the list of manual suspects (see Methods section for details). TPs were found for 31 parent compounds. Confidence levels were assigned to the TPs according to Schymanski et al. during the screening process (Fig. 2). 28 The structures of only seven TPs (9%) were confirmed with a reference standard (level 1) and one additional TP (1.3%) showed a good match with the spectral library mzCloud (level 2a). Diagnostic evidence (level 2b) was found for the structures of eleven TPs (14%). Most TPs (56, 71%) were reported with tentative structures (level 3) and for four (5%) the MS² spectra were not conclusive (level 4). Levels 3 and 4 include TPs for which several isomeric structures were considered possible. For example, Clp_TP_3 is the oxidation product of clopidogrel. Hydroxylation, N-oxidation, S-oxidation or oxidative N-dealkylation are plausible reaction mechanisms for the observed modification to the chemical formula, but not enough structural evidence was found to determine a specific structure and its corresponding reaction mechanism (Fig. 2). Three TPs (Val_TP_5, level 4; Val_TP_7, level 1; and Val_TP_12, level 3) were assigned to both valsartan and irbesartan, since they could originate from both parents and the experimental setup did not allow for distinguishing their origin. These three TPs were therefore counted twice in the results. The confidence levels depend on the availability of reference standards and database spectra, as well as on the quality of reported and measured MS² data. For 34 TPs, the best fragmentation was achieved using a stepped collision energy approach, where the analyte is exposed to three different collision energies for each data-dependent scan.
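The reported shares per confidence level can be reproduced from the counts given above. The short Python check below uses only those counts; the variable names are illustrative.

```python
# Number of TPs per Schymanski confidence level, as reported in the text.
level_counts = {"1": 7, "2a": 1, "2b": 11, "3": 56, "4": 4}

total = sum(level_counts.values())  # all tentatively identified TPs
shares = {level: 100 * n / total for level, n in level_counts.items()}

for level, share in shares.items():
    print(f"level {level}: {level_counts[level]} TPs ({share:.1f}%)")
```

The counts add up to the 79 tentatively identified TPs, and the computed shares round to the percentages quoted in the text.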
In a next step, tentatively identified TPs were manually assembled into pathways with the help of the suspect lists, which contain information on the biotransformation that is responsible for the formation of each TP (ESI-II †). In the manually drawn pathways given in ESI-II, † ambiguous isomeric structures were reported as a general structure with possible modifications on specific moieties. All the resulting pathways and associated experimental parameters have also been made available on enviPath, where they were integrated into the EAWAG-SLUDGE package (https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a). Because enviPath requires unambiguous structural information for compounds, ambiguous isomeric structures are represented by all possible alternative structures, which were merged into a single compound entry in the EAWAG-SLUDGE package. Finally, CAS numbers were found for 27 TPs (34%), out of which 21 TPs (27%) have been previously reported in the context of their parent compound. Of these, 13 (16%) TPs have been found in sludge or wastewater in previous studies (the 3 common TPs of valsartan and irbesartan are counted double). Therefore, 54 TPs associated with 24 APIs are reported here for the first time.
Our suspect screening resulted in a ratio of 1.5 tentatively identified TPs per parent compound, which is similar to the average ratio found in other studies with more than 10 parent compounds (1.5 found TPs per parent) (Table 3 and ESI-I Table S1 †). It should be recognized that this similar ratio was obtained in this work despite performing no systematic non-target screening, and despite operating at low API and, consequently, TP concentrations. For example, the study with the highest ratio of found TPs per parent (5.3) involved non-target screening at a spike concentration of 120 mg L⁻¹. Increasing the concentration could improve the chances of observing TPs, but it would not represent the real WWTP influent concentration of most APIs, 69,70 and degradation kinetics vary at different initial spiked or unspiked concentrations of micropollutants. 71 Thus, the conditions used here are likely more conducive to identifying biotransformation pathways from activated sludge experiments that are relevant to full-scale WWTPs.

enviPath model trained on BBD + SOIL performed best
To evaluate the performance of the different pathway prediction models, we compared their total number of correctly predicted TPs and we found that enviPath models performed best, predicting around 50 identified TPs, while EAWAG-PPS only predicted 43 correctly (Fig. 3). Out of the four enviPath models, those including additional biodegradation data from soil and/or sludge performed slightly better, indicating that additional data can improve model performance. We then traced back which TPs were predicted by which method and found that 22 (32.8%) of all TPs were predicted by all prediction methods. Another twelve (17.9%) of TPs found were correctly predicted by all enviPath methods, which hints at their similarity in predicting TPs. In other words, suspect screening could identify roughly half of the TPs by using any of the enviPath methods. However, some of the TPs were exclusively predicted by one method. Most notably, the EAWAG-PPS exclusively predicted five (7.5%) identifiable TPs that were not covered by any enviPath method. Thus, combining multiple prediction methods leads to the most comprehensive suspect list.
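The tracing of TPs back to prediction methods amounts to set operations on the correctly predicted TPs of each method. The sketch below illustrates this with hypothetical method names and TP identifiers; it is not the published analysis code and assumes at least one method is given.

```python
def overlap_summary(found_by_method):
    """Summarize which found TPs are shared or exclusive across methods.

    found_by_method: {method_name: set of correctly predicted TP ids}
    Returns (all found TPs, TPs predicted by every method,
             {method: TPs predicted by that method only}).
    """
    all_found = set().union(*found_by_method.values())
    by_all = set.intersection(*found_by_method.values())
    exclusive = {}
    for method, tps in found_by_method.items():
        others = set().union(
            *(o for name, o in found_by_method.items() if name != method)
        )
        exclusive[method] = tps - others
    return all_found, by_all, exclusive

# Hypothetical toy data with three methods.
toy_methods = {
    "rule_based": {"t1", "t2", "t5"},
    "ml_model_a": {"t1", "t2", "t3"},
    "ml_model_b": {"t1", "t3", "t4"},
}
all_found, by_all, exclusive = overlap_summary(toy_methods)
```

The share of TPs predicted by all methods is then `len(by_all) / len(all_found)`, and the exclusive sets show what would be lost by dropping a given method from a combined suspect list.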
However, a long suspect list increases the manual workload, and it is therefore crucial to balance the number of detected TPs with the number of suspects to search for. The prediction precision indicates the number of found TPs per predicted TP and can be used as a metric to describe the efficiency of the prediction method. The overall precision of the TP prediction was found to be 1.35%, meaning that more than one in a hundred predicted TPs was correct (Table 3). As the number of predicted TPs is comparable for all substances (except for metformin), the precision mainly reflects the number of correctly predicted TPs. The precision varied for different APIs: for some compounds, such as quetiapine, the precision was as high as 5%, indicating that this compound has many stable transformation products and that its structural features were well represented in the training data of the pathway prediction models, therefore leading to a high number of correctly predicted TPs. All models performed similarly with a prediction precision between 2 and 2.6%, with enviPath models generally performing better than EAWAG-PPS (Table 4). The model trained on the BBD and SOIL packages had the best overall performance regarding the number of TPs found (53) and, consequently, also precision (2.58%).
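As a worked illustration of the precision metric defined above (found TPs per predicted TP, expressed in percent), the short Python sketch below computes per-parent and pooled precision. The function names and all counts are hypothetical and chosen for illustration only; they are not values from Table 3 or Table 4.

```python
from typing import Dict, Tuple


def precision_percent(n_found: int, n_predicted: int) -> float:
    """Precision in percent: identified TPs per predicted TP."""
    if n_predicted <= 0:
        raise ValueError("n_predicted must be a positive integer")
    return 100.0 * n_found / n_predicted


def summarize(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Per-parent and pooled precision from (found, predicted) counts."""
    result = {p: precision_percent(f, n) for p, (f, n) in counts.items()}
    total_found = sum(f for f, _ in counts.values())
    total_predicted = sum(n for _, n in counts.values())
    result["overall"] = precision_percent(total_found, total_predicted)
    return result


# Hypothetical counts for illustration only (not values from this study).
toy_counts = {"parent_A": (5, 100), "parent_B": (1, 100), "parent_C": (0, 100)}
precisions = summarize(toy_counts)
```

Because the number of predicted TPs is the same for every parent in this toy input, the per-parent precision simply tracks the number of found TPs, mirroring the observation made in the text.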

Table footnotes: a TPs that were not predicted by any of the evaluated prediction methods but found in literature or using Compound Discoverer's conjugation reaction prediction are here called manual suspects. b TP count without duplicate TPs from irbesartan and valsartan.

It should be noted that these low values for precision represent a worst-case scenario, as the suspect list can be further filtered to increase the precision. For example, removing compounds with a mass below the quantification limit of the analytical method (100 g mol⁻¹) slightly increases the prediction precision of the suspect list from 1.35 to 1.37%. If a small suspect list is required, the precision can be further increased by adapting the parameters of the pathway search: in EAWAG-BBD, the generation threshold can be set to 1, 2 or 3, and in enviPath the maximum number of TPs to predict can be defined. However, limiting the number of generations or TPs to predict comes at the cost of losing correctly predicted TPs. To characterize this trade-off, we analyzed the effect of different thresholds for these two parameters on the precision and the number of correctly predicted TPs. For the number of generations, the threshold analysis showed that the precision peaks at the first generation for all methods (5.4-7.3%), where EAWAG-PPS correctly predicts 19 TPs and the enviPath models between 26 and 29 TPs (Fig. 3). Regarding the threshold of the maximum number of TPs to predict, the precision peaks between 10.9 and 13.0% if only the top 2 TPs are predicted. The number of correctly predicted TPs reaches a plateau at a threshold of 30 predicted TPs, beyond which the workload increases but not many more TPs are identified. This characterization of the trade-off between precision and correctly predicted TPs can be used as a guide to select the parameters that are best suited to the objective and the resources of a suspect screening project.
To give a practical example, the workload of manual TP conrmation can be cut in half by setting the maximum TP threshold to 25, while still obtaining 86.3-92% of correctly predicted TPs at the maximal threshold explored here (50).
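The threshold analysis described above can be sketched in a few lines of Python. This is a simplified illustration and not the analysis script of this study: it assumes that the predictions for each parent are available as a ranked list (best candidate first), and the function name, parent names and TP labels are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def threshold_tradeoff(
    ranked: Dict[str, List[str]],
    confirmed: Dict[str, Set[str]],
    thresholds: Iterable[int],
) -> List[Tuple[int, int, float]]:
    """For each threshold k, keep the top-k predictions per parent and
    return (k, correctly predicted TPs, precision in percent)."""
    rows = []
    for k in thresholds:
        n_suspects = 0
        n_correct = 0
        for parent, predictions in ranked.items():
            top_k = predictions[:k]
            n_suspects += len(top_k)
            n_correct += len(set(top_k) & confirmed.get(parent, set()))
        precision = 100.0 * n_correct / n_suspects if n_suspects else 0.0
        rows.append((k, n_correct, precision))
    return rows


# Hypothetical ranked predictions and confirmed TPs (illustration only).
ranked = {"parent_1": ["a", "b", "c", "d"], "parent_2": ["e", "f", "g"]}
confirmed = {"parent_1": {"a", "d"}, "parent_2": {"f"}}
rows = threshold_tradeoff(ranked, confirmed, thresholds=[1, 2, 4])
```

Plotting the second and third element of each row against the threshold gives the kind of trade-off curve discussed in the text: the number of correct TPs can only grow with the threshold, whereas the precision typically falls as the suspect list gets longer.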

Observed TPs can be explained by 24 biotransformation rules
A total of 114 different biotransformation rules were applied to predict potential TPs. Interestingly, 24 of these rules were sufficient to predict the biodegradation pathways leading to the overall 45 well-defined and 34 ambiguous TP structures found (Fig. 4, ESI-II Section S3.1†). Assigning a well-defined structure proved most challenging for the products of oxygenation reactions (+O), owing to the multitude of possible isomers. For example, the use of the oxidative N-dealkylation rule (bt0063) only led to well-defined structures in 48% of the cases, because the resulting TPs could not be distinguished from other possible oxidation products. The prediction of hydroxylation of methylene (bt0242) only led to ambiguous structures for the same reason. Elucidating structures from these kinds of reactions would be especially important, because 70% of all found reactions belong to this category. Resolving the structures of TPs that resulted from hydration (+H₂O) or hydrolysis (+H₂O−X) was less challenging and led to well-defined structures in 85% of the cases due to few plausible reaction sites or characteristic cleavage moieties. Desaturation-type reactions (−H₂) were only predicted and found for the oxidation of primary (bt0001) and secondary alcohols (bt0002). The type of reaction could be determined through the atomic modifications relative to the precursor molecule, but the site of transformation was only identifiable in 62% of the cases. The beta-oxidation process (bt0337) was observed once and was not considered in Fig. 4, because it does not fit into any of the proposed categories.

Complementary approaches reveal and fill knowledge gaps in TP prediction models
Careful analysis of the time trends in chromatogram areas revealed TP-like behavior for several unidentified compounds, indicating that not all formed TPs were predicted by the employed pathway prediction methods. To identify the structures of analytes with TP behavior, we searched the literature for known TPs, and we predicted conjugation reactions. APIs are particularly prone to undergo conjugation, as they often contain primary and secondary amines. However, this type of transformation is not covered by any of the TP prediction methods analyzed here, because they all focus exclusively on catabolic reactions. By screening for conjugates, we tentatively identified four TPs that underwent either N-acetylation or N-succinylation. For conjugation reactions, the MS 2 spectra are closely related to those of the parent because they share the same molecular backbone, thus facilitating TP identification. Therefore, screening for conjugates can help identify additional TPs by considering reaction classes that are beyond the scope of the TP prediction tools. Another eight TPs were either previously reported in literature or derived by expert logic (e.g., suspected hydroxylation when observing the corresponding mass signature and TP-like behavior over time). Three of them were previously reported in literature and reference standards were available to the authors, but they were neither predicted nor part of any of the used databases. For example, the TP guanylurea of metformin was not predicted, even though it is known in the literature. 72 These cases highlight the importance of expanding the databases towards more diversity in terms of chemical structure, application class, and biodegradation environment.
In the particular case of pharmaceuticals, it could be helpful to also consider metabolites produced by human metabolism or human microbiomes, because of the potential overlap of degradation mechanisms present in human and wastewater systems. For example, the only detected TP of aliskiren was not predicted by any TP prediction model but has been reported to also occur in human metabolism. 52 Computational tools for predicting human drug metabolism (e.g., Metabolitepredict, 41 NICEdrug.ch, 73 BioTransformer 3.0 (ref. 74)) could therefore be applied to complement environmental TP prediction.
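To illustrate how conjugate suspects such as the N-acetylated and N-succinylated TPs discussed in this section can be added to a suspect list, the Python sketch below derives the expected monoisotopic mass shift for several acyl conjugations. It is not the Compound Discoverer procedure used in this study. The net elemental changes are an assumption based on standard acyl-transfer stoichiometry (one hydrogen replaced by the acyl group), and the parent mass in the example is hypothetical.

```python
from typing import Dict

# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Assumed net elemental change when one hydrogen is replaced by the acyl group.
CONJUGATION_DELTAS = {
    "acetylation": {"C": 2, "H": 2, "O": 1},
    "formylation": {"C": 1, "O": 1},
    "succinylation": {"C": 4, "H": 4, "O": 3},
    "malonylation": {"C": 3, "H": 2, "O": 3},
    "fumarylation": {"C": 4, "H": 2, "O": 3},
}


def mass_shift(delta: Dict[str, int]) -> float:
    """Monoisotopic mass change (Da) for a net elemental change."""
    return sum(MONOISOTOPIC[element] * n for element, n in delta.items())


def conjugate_suspects(parent_mass: float) -> Dict[str, float]:
    """Expected neutral monoisotopic masses of single conjugates."""
    return {
        name: parent_mass + mass_shift(delta)
        for name, delta in CONJUGATION_DELTAS.items()
    }


# Hypothetical parent mass (Da), used only to show the call.
suspects = conjugate_suspects(300.0)
```

The resulting masses are neutral monoisotopic masses; the adduct considered in the MS method (for example protonation) would still have to be applied before matching against measured m/z values.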

Conclusion
We present an updated workow to identify TPs in activated sludge biodegradation experiments using suspect screening. We applied the workow to 46 pharmaceutical substances and tentatively identied 79 TPs for 31 parent compounds. Of these, 66 (83%) are TPs reported for the rst time in activated sludge, and only 13 TPs have previously been reported in similar or wastewater studies. We further compared our workow with a comprehensive list of similar studies, and we discussed limitations of the analytical and computational methodology.
This workow was applied to a specic biotransformation experiment and achieved a good ratio of found TPs per parent despite having an initial spiked concentration of 8 mg L −1 only, which is more than an order of magnitude lower than the concentrations of the original experiment conducted by Helbling et al. 7 and the majority of studies reviewed here. Regarding the analytical methods, 15 out of the 27 analyzed studies complemented suspect screening with non-target screening to detect more TPs. Since conjugation reactions are not currently predicted by the EAWAG-PPS or enviPath, we suggest to complement the suspect list with TPs formed by acetylation, formylation, fumarylation, malonylation and/or succinylation. Another approach to detect more TPs would be to perform a systematic literature review on each parent compound to expand the suspect list towards TPs found in environmental biodegradation studies or mammalian metabolism.
Although our prediction precision is comparable to the precision reported by other studies and sufficient to perform a successful suspect screening, a higher precision would decrease the manual effort required to verify mass spectra. A systematic approach to improve the precision of the TP prediction methods would involve the collection of more high-quality biodegradation data to better cover the chemical diversity of organic micropollutants, and hence to increase the prediction accuracy of the machine learning models. However, if resources are limited, predicting 30 TPs per parent compound with the currently available models will achieve reasonable predictions without any significant loss in sensitivity. Currently, the training data sets for BBD, SOIL and SLUDGE together contain 623 degradation pathways, which only represents a small fraction of the chemical compound space. The combination of all these and the incorporation of the EAWAG-PPS led to the most comprehensive suspect list.
To share our results with the scientic community in a computer readable format, we enriched the EAWAG-SLUDGE data package with the newly obtained biodegradation pathways for 34 pharmaceuticals in activated sludge, thus feeding our learnings back into the design-build-test-learn cycle to evolve towards robust biotransformation prediction tools adapted to different environmental situations. As data acquisition is crucial to develop better models, future work will focus on improving the integration of the prediction platform enviPath with MS screening tools and on facilitating systematic and standardized data upload to enviPath. We hope that our work can guide TP identication efforts in the future and encourage researchers to share biodegradation data openly to improve prediction models.

Disclaimer
This manuscript only reects the authors' views and the JU is not responsible for any use that may be made of the information it contains.

Data availability
The biotransformation pathways were uploaded to the enviPath database and integrated into the publicly accessible EAWAG-SLUDGE package available at https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a. Results are further detailed in the ESI-I and II (Supplementary_Information_I.docx and Supplementary_Information_II-TP_data.docx).† Raw MS output can be obtained from the authors upon reasonable request. All scripts used to predict TPs, create suspect lists, and analyze data are publicly available at https://github.com/FennerLabs/TP_predict. The TP prediction uses the enviPath platform and therefore requires the installation of the enviPath python API (enviPath-python version 0.2.0, https://github.com/enviPath/enviPath-python). Detailed instructions can be found in the README file of the git repository. This resource also provides the code to convert the output of the enviPath pathway prediction and EAWAG-PPS into suspect lists that are compatible with the Compound Discoverer software.

Author contributions
CC, KF and JH designed the study. CC and LT performed sludge experiments, LC-MS measurements and analysis in Compound Discoverer. LT and JH performed data conversion and analysis. JH predicted transformation products. LT, CC, KF and JH wrote the manuscript. KF reviewed all the TP structural assignments and acquired the funding.

Conflicts of interest
There are no conicts of interest to declare.