Rapid direct analysis of river water and machine learning assisted suspect screening of emerging contaminants in passive sampler extracts †

A novel and rapid approach to characterise the occurrence of contaminants of emerging concern (CECs) in river water is presented using multi-residue targeted analysis and machine learning-assisted in silico suspect screening of passive sampler extracts. Passive samplers (Chemcatcher®) configured with hydrophilic-lipophilic balanced (HLB) sorbents were deployed in the Central London region of the tidal River Thames (UK) catchment in winter and summer campaigns in 2018 and 2019. Extracts were analysed by (a) a rapid 5.5 min direct injection targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) method for 164 CECs and (b) a full-scan LC coupled to quadrupole time-of-flight mass spectrometry (QTOF-MS) method using data-independent acquisition over 15 min. From targeted analysis of grab water samples, a total of 33 pharmaceuticals, illicit drugs, drug metabolites, personal care products and pesticides (including several EU Watch-List chemicals) were identified, and mean concentrations determined at 40 ± 37 ng L⁻¹. For targeted analysis of passive sampler extracts, 65 unique compounds were detected with differences observed between summer and winter campaigns. For suspect screening, 59 additional compounds were shortlisted based on mass spectral database matching, followed by machine learning-assisted retention time prediction. Many of these included additional pharmaceuticals and pesticides, but also new metabolites and industrial chemicals. The novelty in this approach lies in the convenience of using passive samplers together with machine learning-assisted chemical analysis methods for rapid, time-integrated catchment monitoring of CECs.


Introduction
Over 350 000 chemicals and mixtures of chemicals are currently registered for commercial production and use globally. 1 Consequently, chemical contamination of the aquatic environment via point and diffuse sources of pollution is increasingly evident. 2–4 However, more comprehensive, flexible and rapid chemical characterisation approaches are required to improve understanding of the environmental risks such chemicals may pose.
The majority of studies characterising CEC occurrence in aquatic media have focussed on the use of grab or composite sampling. Whilst these methods enable near real-time monitoring of CECs, they require time- and labour-intensive monitoring campaigns using repeated sampling to capture the breadth of CEC occurrence and their fluctuation. As an alternative, passive sampling enables time-weighted average occurrence characterisation over extended periods. Analyte accumulation in the sorbent can also improve the analytical performance through enhanced sensitivity using appropriately selective chemistries. However, passive samplers often fail to capture pulsed sources of CECs. Several different passive sampling approaches and formats exist; for CECs, mixed-mode sorbents within metal or plastic housings configured with porous membranes are popular, including polar organic chemical integrative samplers (POCIS) or Chemcatcher® devices.
With the advent and increased availability of full-scan high-resolution accurate mass spectrometry (HRMS), the potential for simultaneous targeted, untargeted and suspect screening of environmental samples for larger numbers of CECs in all environmental compartments has been realised. 5,6 For the latter in particular, suspect screening with HRMS now offers the ability to retrospectively interrogate acquired full-scan sample data to potentially identify additional compounds post hoc. However, by comparison with its application to water (e.g., either directly or following solid-phase extraction), 5,7 few reports of untargeted or suspect screening of passive sampler extracts exist for CECs. Soulier and colleagues analysed POCIS extracts for CECs and demonstrated occurrence of ~30 industrial chemicals, pesticides, pharmaceuticals and personal care products across two selected sites in France using database matching by retention time and HRMS. 6 The likelihood of larger numbers of contaminants being present in passive sampler extracts is high, as demonstrated by the complexity of the untargeted data analysis subsequently performed by these authors. Rimayi et al. recently used Chemcatcher® samplers for suspect screening of CECs in South Africa, revealing the occurrence of >200 compounds including general medicines and psychotropic compounds in wastewater-impacted river catchments, of which ~180 were detected for the first time. 
7 Again, suspect matching was performed using large databases incorporating accurate mass ±5 ppm, isotopic fit and ±0.5 min retention time thresholds. However, in many cases, retention data is either not available for such large numbers of compounds in databases, or the analytical methods used do not match the chromatographic datasets, rendering them unusable for matching. For unknowns, HRMS allows the collection of full-scan data at high sensitivity, mass accuracy and resolution, 8–10 enabling in silico tentative identification to be performed in many cases, either by exact mass matching or through comparison with accurate-mass databases. 10,11 Current mass spectral libraries are extensive, containing reference data for thousands of compounds, thus allowing a single sample to be screened and to deliver a list of potentially matching contaminants in a relatively short period of time. However, in many cases, identification of suspects using HRMS libraries still requires a reference chromatographic retention time for comparison. Obtaining reference standards for suspect compounds can be costly and these are not always commercially available, particularly for metabolites or transformation products. 12 In these circumstances, predictive retention time models have recently been shown to be useful tools to raise assurance much further for shortlisted suspect contaminants identified using HRMS spectral libraries where retention data does not exist or is not usable. For example, previous work in our laboratory showed that the application of retention time prediction reduced the number of suspects shortlisted in untreated wastewater by one third, allowing prioritisation for reference standard purchase. 13 In this way, the combination of a suitably accurate matched predicted retention time and appropriate MS criteria could arguably elevate a lower level match to Level 2(a) "probable structure" classification according to the widely adopted framework proposed by Schymanski et al. 
14 Multiple methods for in silico prediction of liquid chromatography retention times have been published in the literature, ranging from simple log P based models 12,15,16 to complex multivariate quantitative structure-retention relationships (QSRR) models. 17,18 […] 22,23 We have extensively evaluated the latter and even demonstrated good generalisability across multiple reversed-phase LC methods, instruments and sample types for >1100 unique compounds. 21 Recently, we applied this approach to identify retrospectively 37 additional CECs in influent and effluent wastewaters in London in LC-HRMS data. 13 Given that the River Thames is subject to regular wastewater impact from CECs arising from combined sewer overflows (CSOs), 24 the potential combination of passive sampling and machine learning-assisted high-resolution suspect screening analysis could present a powerful new method for CEC characterisation, including the ability to utilise HRMS databases more fully where LC methods do not match or where analyte retention data is lacking. With the constant development and improving performance of analytical tools and methods, especially for large numbers of compounds, better prediction of gradient retention time is now possible.
The aim of this work was to improve understanding of the occurrence of CECs using passive samplers deployed in the River Thames (UK) using both targeted LC-MS/MS analysis and machine learning-assisted in silico LC-HRMS suspect screening. To achieve this, the objectives were: (a) to perform differential targeted analysis of river water and passive sampler extracts using a rapid, direct injection LC-MS/MS method; 25 (b) to develop and apply an artificial neural network (ANN)-based model for multi-analyte retention prediction in a gradient reversed-phase LC method; and (c) to apply the developed LC-HRMS suspect screening workflow to characterise the occurrence of new and additional CECs in two river monitoring campaigns in winter and summer in 2018/19. This new approach is likely to improve the value of passive sampler extract data as a more rapid in silico shortlisting step for new or additional CECs.

Materials and reagents
All reagents were of analytical grade or purer. Acetonitrile (MeCN) and methanol (MeOH) were from Sigma-Aldrich (Gillingham, Dorset, UK). LC-MS grade formic acid, ammonium formate and hydrochloric acid (37% v/v, HCl) were purchased from Millipore (Bedford, USA), Agilent Technologies UK Ltd. (Santa Clara, CA, USA), and Sigma-Aldrich (Steinheim, Germany), respectively. Ultra-pure water was obtained from an 18.2 MΩ cm Millipore Milli-Q water purification system. A mix of n = 164 analytical standards and n = 34 deuterated internal standards (SIL-IS) (purity ≥97%) was used for targeted analysis including confirmatory identification and quantification (full details, including all sources, can be found in the ESI S1†).

Passive sampler preparation, deployment and extraction procedures
Chemcatcher® housings were obtained from AT Engineering (Tadley, UK) and were cleaned as per Castle et al. 26 Briefly, Supor poly(ether sulfone) (PES) 0.2 µm membranes (Pall Europe, Portsmouth, UK) were cut to size (52 mm) using a wad punch. Membranes were soaked for 24 h in MeOH to eliminate manufacturing residues. 27 Post soak, the membranes were washed using fresh MeOH and then soaked for an additional 24 h in water. Hydrophilic-lipophilic balanced (HLB) sorbent disks (47 mm diameter) were purchased from Biotage (Uppsala, Sweden) and Affinisep (Val de Reuil, France). HLB disks were conditioned with MeOH (50 mL) and water (50 mL) before assembly. Chemcatcher® samplers were prepared by placing the HLB disk onto the sampler body and overlaying with a PES membrane before screwing the retaining ring in place. Prior to deployment, assembled samplers were stored in ultrapure water.
Chemcatcher® samplers were deployed on two occasions in the River Thames (UK) at two proximal sites located in Central London. This region of the river is tidal and brackish, and CEC concentrations at both sites were previously found not to be statistically different over a week-long grab sampling period. 24 This sampling area is also close to several CSO vents, which discharge untreated wastewater into the Thames with a frequency of roughly once a week, especially during times of heavy rainfall. During both deployments, Chemcatcher® samplers were fastened via drilled pilot holes and cable ties to 34 × 15 cm solid plastic boards. These were then affixed to pontoons and submerged at a relatively consistent 1 m depth underwater using a 3 kg dive weight. The first campaign (winter, 21st December 2018 to 6th January 2019) was performed at the London Fire Brigade (LFB) Lambeth River Fire Station pontoon (51°29′35.1″ N; 0°07′19.9″ W) using four Chemcatcher® devices. This site allowed secure access away from the shore and over the holiday period to deploy and collect samplers, as needed. The second campaign (summer, 27th August 2019 to 9th September 2019) was located ~2 km downriver at the Transport for London (TFL) Blackfriars Pier (51°30′38.5″ N; 0°06′00.6″ W), again allowing access to a Central London region of the catchment, and three devices were deployed. A field blank was exposed during deployment and retrieval in both campaigns and analysed using LC-MS/MS (as for deployed samplers using LC-HRMS). After the deployment periods, the Chemcatcher® housing was disassembled, the PES membranes were discarded and the HLB disks were removed and air-dried overnight at room temperature alongside the field blank to account for contamination before storage at −20 °C in the dark until analysis. HLB disks (samples and field blanks) were eluted using 40 mL of MeOH at ambient temperature under vacuum. The use of successive elution steps with solvents of different pH was not considered here to minimise 
complexity, but could be used to potentially increase the number of compounds eluted from the sorbent. Extracts were dried using a Genevac centrifugal rotary evaporator (SP Scientific, Ipswich, UK) at 40 °C for 2 h. Prior to instrumental analysis, samples were reconstituted in 1 mL of MeOH. The full procedure is described in Taylor et al. 28

River water sampling, preparation and CEC quantification procedures
Grab samples (500 mL) were collected in pre-rinsed Nalgene® bottles (Sigma-Aldrich) at the start of each passive sampler deployment and were transported to the laboratory, acidified (to pH 2 with HCl) and frozen until analysis. Water samples were prepared for direct injection LC-MS/MS analysis as described by Ng et al. 25 In brief, river water samples (10 mL sub-sample, n = 3) were first centrifuged at 2000 rpm for 10 min. Aliquots (900 µL) of supernatant were spiked with 100 µL of SIL-IS (prepared in MeOH) to give a final concentration of 500 ng L⁻¹. For quantification, background-corrected external matrix-matched calibration was performed using 900 µL of pooled river water from each campaign spiked with constant 100 µL volumes, again containing each analytical standard and SIL-IS (at 500 ng L⁻¹, used only for quality control in LC-MS/MS) to yield final concentrations over the range 10-2000 ng L⁻¹. Following this, samples were filtered directly into deactivated HPLC vials (Agilent A-Line) using BD Plastipak™ syringes (Fisher Scientific UK Ltd., Loughborough, UK) coupled to 0.2 µm Teflon membrane filters.

Instrumentation
For suspect screening, the analysis was performed using an Agilent 1290 (Infinity II) LC system coupled to a 6546 LC/Q-TOF mass spectrometer. Analytical separations were performed on an Agilent Zorbax Eclipse Plus C18, 2.1 × 100 mm, 1.8 µm column. A 15 min binary gradient was used, running from 0.1% formic acid and 5 mM ammonium formate (mobile phase A, MPA) to 0.1% formic acid and 5 mM ammonium formate in MeOH (mobile phase B, MPB). The elution gradient consisted of 100% MPA from 0 to 1 min, followed by a linear increase to 100% MPB from 1 to 12 min. The re-equilibration time was 3 min at 100% MPA. The column was maintained at 40 °C with a constant flow rate of 0.4 mL min⁻¹ and an injection volume of 6 µL. All passive sampler extracts were run separately in both positive and negative mode with the same mobile phases over a scan range of m/z 50-1000. The data were acquired at 10 GHz, giving a resolution range of 30 000-60 000 full width at half maximum (FWHM) over the measured mass range, and the scan rate was 3 Hz. Sheath and drying gas settings were both 12 L min⁻¹, with the temperatures at 350 °C and 250 °C, respectively. All data were acquired using data-independent acquisition (DIA) with alternating collision energies of 0 eV and 20 eV to collect alternating mass spectra for all ions without and with fragmentation, respectively. All data were acquired in centroid mode and processed using Agilent MassHunter software.
Targeted direct injection LC-MS/MS analysis of water samples and extracts from the passive samplers was performed using a Nexera X2 LC system coupled to an LCMS-8060 (Shimadzu Corp., Kyoto, Japan) fitted with an electrospray ionisation source. Rapid separations (Fig. S1†) were performed on a 5 × 3.0 mm, 2.7 µm Raptor biphenyl guard column (Restek, Pennsylvania, USA). The LC method comprised a binary gradient of 0.1% v/v aqueous formic acid (mobile phase C, MPC) and 0.1% v/v formic acid in 50 : 50 MeOH : MeCN (mobile phase D, MPD). The elution profile consisted of 10% MPD for 0.2 min, 10-60% MPD from 0.2 to 3.0 min, and 100% MPD to 4.0 min. The re-equilibration time was 1.5 min at 10% MPD. The column was kept at ambient temperature with a flow rate of 0.5 mL min⁻¹ and an injection volume of 10 µL. Where possible, two transitions for each analyte were monitored using multiple reaction monitoring (MRM), with the dwell time varying between 1 and 20 ms depending on the analyte (Table S1†). The threshold for a retention time match to a reference standard was set to 0.2 min. Further method details can be found in Ng et al. 25 For river water samples, these guard columns were replaced after approximately every 3000 injections. Qualitative and quantitative method performance for detected compounds in water samples was assessed in accordance with the tripartite guidelines published by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). 29

Procedures for suspect screening
For LC-QTOF-MS/MS data, retrospective suspect screening was performed on all passive sampler extracts using three commercial Agilent mass spectral libraries (Forensic Toxicology, Pesticides and Water Screening), each containing curated MS/MS spectra for matching to shortlisted suspects (each also acquired using collision energies of 0 and 20 eV). All samples were screened through each database using the 'find-by-formula' match criteria, where MS features are selected against the mass spectral databases. Initially, a broad data screen of all samples was performed based on the [M + H]⁺ or the [M − H]⁻ ion accurate m/z to within 5 ppm tolerance, without requiring a co-eluting fragment ion. The weighting of the scoring criteria was 100 for the mass accuracy, and 10 for each of isotope spacing (the distance between the ions of the isotope pattern) and abundance (observed height compared to the theoretical). The minimum threshold for MassHunter software to return an identification was set to a 25% match, incorporating these weightings. Subsequent curation and shortlisting of this original search was implemented by further selecting the [M + Na]⁺, [M + NH₄]⁺ and [M + HCOO]⁻ adducts with a co-elution of at least 75% overlap with the [M + H]⁺ or the [M − H]⁻ ion. In terms of match criteria, the weightings for isotope abundance and spacing were increased to 60 and 50, respectively. Similarly, the overall threshold for reporting an identification was then increased to a match of 90%. All extracted ion chromatogram (EIC) peaks for compounds were retained, including those with more than one peak. Assignment of the correct EIC peak was left to the retention modelling step.
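As a rough illustration of the accurate-mass criterion described above, the following Python sketch computes a ppm mass error and checks it against the 5 ppm tolerance. It is not the MassHunter scoring algorithm, whose isotope spacing and abundance terms are not reproduced here. The function names are hypothetical, and the carbamazepine monoisotopic mass (C15H12N2O, 236.094963 Da) is calculated from standard atomic masses rather than taken from this work.

```python
PROTON_MASS = 1.007276  # Da; mass of a proton (hydrogen atom minus one electron)


def mz_protonated(neutral_mass):
    """Theoretical m/z of the singly charged [M + H]+ ion."""
    return neutral_mass + PROTON_MASS


def mz_deprotonated(neutral_mass):
    """Theoretical m/z of the singly charged [M - H]- ion."""
    return neutral_mass - PROTON_MASS


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute ppm error is inside the screening tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Illustrative values only: carbamazepine neutral monoisotopic mass and two
# hypothetical measured m/z values for its [M + H]+ ion.
CARBAMAZEPINE_MASS = 236.094963
theoretical = mz_protonated(CARBAMAZEPINE_MASS)
close_match = within_tolerance(237.1030, theoretical)   # about 3 ppm away
poor_match = within_tolerance(237.1040, theoretical)    # about 7 ppm away
```

In this sketch the first hypothetical measurement falls inside the 5 ppm window and the second does not, which is the behaviour expected of the initial broad screen.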
Following this, and to rene this initial shortlist of compounds further, ANN-based retention time prediction was employed. 21Measured retention data used to train the model were generated from LC-QTOF-MS/MS measurements for a mix of 239 pesticide standards (see S2 for details †) injected in triplicate commercially available through Agilent Technologies UK Ltd.Simplied molecular-input line-entry specications (SMILES) from Pub-Chem were used to generate data on each compound for 16 molecular descriptors.These descriptors were selected based on a combination of curated descriptors relevant to reversed-phase liquid chromatography mechanisms, correlation with t R and genetic feature selection.In addition, and based on 239 compounds used for model development, the ratio of cases to inputs far exceeded the 5 : 1 ratio threshold proposed by Topliss and Costello. 30Full details can be found in Mollerup et al., 31 Munro et al. 24 and Miller et al. 22 Dragon version 7.0 (Kode Chemoinformatics srl, Pisa, Italy) was used to generate data for hydrophilic factor (Hy), unsaturation index (Ui), Ghose-Crippen and Moriguchi log P (Alog P, Mlog P), number of benzene-like rings (nBnz), number of oxygen and carbon atoms (nC, nO), number of double and triple bonds (nDB, nTB) and number of 4-9 membered rings (nR04-nR09).For log D (mobile phase pH ¼ 3.0), data were generated using Percepta PhysChem Proler (ACD Laboratories, Ontario, Canada).See Table S2 † for all molecular descriptor data and selection of such descriptors was based on previous work.The data were used as inputs to train a three-layer multilayer perceptron (3MLP) using Trajan v6.0 (Trajan Soware Ltd., Lincolnshire, UK) with a 16-4-1 architecture (optimised) and with retention time as the output.The training of models was performed in two phases.In Phase 1, the dataset was split into 70 : 15 : 15 (training : verication : test) and, using random sampling, the most appropriate neural network type selected from linear 
models, probabilistic neural networks (PNNs), generalised regression neural networks (GRNNs), radial basis functions (RBFs), 3MLPs and four-layer multilayer perceptrons (4MLPs). Thousands of models were built and evaluated over several separate 10 min training phases and the performance of the best 50 summarised in each case. The best model type was then selected based on the lowest and most consistent error returned across each set. In Phase 2, the architecture of the best model was further optimised. The dataset was partitioned into 70 : 30 and bootstrap sampling was applied in ten replicated rounds of training of 5 min each. The best multilayer perceptron model used conjugate gradient descent and backpropagation to optimise performance 32,33 (in this case, a 3MLP with a 16-4-1 architecture).
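The data partitioning and model size described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random split below uses Python's standard library rather than Trajan's own sampling routine (which is not described here), the function names are hypothetical, and the parameter count assumes a fully connected network with one bias per non-input node.

```python
import random


def split_indices(n_cases, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition case indices into training, verification and test sets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_cases)
    n_verify = round(fractions[1] * n_cases)
    train = indices[:n_train]
    verify = indices[n_train:n_train + n_verify]
    test = indices[n_train + n_verify:]  # remainder goes to the blind test set
    return train, verify, test


def mlp_parameter_count(layer_sizes):
    """Weights plus biases for a fully connected multilayer perceptron."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


N_COMPOUNDS = 239      # pesticide standards used for model development
N_DESCRIPTORS = 16     # molecular descriptors used as inputs

train, verify, test = split_indices(N_COMPOUNDS)
n_parameters = mlp_parameter_count([N_DESCRIPTORS, 4, 1])
cases_to_inputs = N_COMPOUNDS / N_DESCRIPTORS
```

Under these assumptions the split gives 167, 36 and 36 cases, the 16-4-1 network has 73 adjustable parameters, and the overall ratio of cases to inputs is close to 15 : 1, consistent with the statement that the 5 : 1 threshold is comfortably exceeded.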

Rapid targeted CEC occurrence in water using direct injection LC-MS/MS
Abiotic river conditions including pH, temperature, dissolved oxygen, ammonium concentration and flow, as well as rainfall data, are given in the ESI (S4).† Importantly, no sewer overflows were reported before either campaign. Following targeted analysis of water samples across both campaigns, a total of 33 unique CEC compounds were detected (20% of the total number included in the method, see Table 1). These were broadly classified as pharmaceuticals and personal care products (PPCPs), pesticides, controlled drugs, industrial chemicals and drug metabolites (Fig. 1). Within this, 18 contaminants were confirmed in winter river water samples and of these, 12 were quantifiable (a compound was considered quantifiable if the calculated concentration was greater than the LLOQ presented in Table 1). In summer, 33 compounds were confirmed in water samples and 19 were quantifiable. Eleven compounds identified in the water samples (amitriptyline, benzoylecgonine, carbamazepine, cocaine, diclofenac, ketamine, propranolol, sulfapyridine, temazepam, tramadol and trimethoprim) were consistent with previous studies monitoring the Central London region of the Thames. 24,34 […] 36,37 However, the reported concentrations were higher in comparison to this work, between 17 and 140 ng L⁻¹, and this is perhaps unsurprising given the temporal variability associated with grab sampling. 24 Three contaminants (acetamiprid, diclofenac and imidacloprid) are or have been listed on the European WFD Watch Lists, 38 with the neonicotinoids clothianidin and imidacloprid in particular now banned for use in the European Union. 39 In several cases, controlled substances were also detected in water, including abused drugs (e.g., benzodiazepines and ketamine) and pesticides (e.g., fenuron, present in all samples at 37 ± 6 ng L⁻¹), which was also consistent with previous reports of UK river waters. 
40 The average concentrations of all quantified compounds were 33 ± 23 ng L⁻¹ and 44 ± 44 ng L⁻¹ for winter and summer water samples, respectively. In both campaigns, tramadol was, in most cases, the compound with the highest concentration at an average of 164 ± 65 ng L⁻¹. Overall, this direct-injection analytical method performed better in this river water matrix than previous assessments in wastewater. Mean ± standard deviation for limits of detection (LOD) and lower limits of quantification (LLOQ) across all compounds detected were 5 ± 3 ng L⁻¹ and 14 ± 10 ng L⁻¹, respectively. This represented a significant improvement (p < 0.05) over the LLOQs determined in wastewater previously (43 ng L⁻¹, on average) 25 and was most likely due to lower sample complexity. 25 The performance of this method was compared to other direct-injection methods for surface waters found in the literature (Table 1), though it was challenging to find methods with significant analyte commonality. A method by Boix et al. 41 included nine compounds in common with LLOQs ranging from 0.2-37.5 ng L⁻¹, and was more sensitive on the whole by comparison. However, this method employed a ten-fold larger injection volume (100 µL). In comparison to the Hermes et al. 42 method, two compounds are comparable in terms of performance (tramadol and trimethoprim) and six exhibited higher sensitivity than our method (carbamazepine, diclofenac, imidacloprid, lidocaine, terbutryn, venlafaxine). This method also employed a larger injection volume (80 µL). It is unclear how such large injection volumes impact analytical performance across batch analyses of large numbers of samples over extended periods. Martínez Bueno et al. 43 used the same injection volume (10 µL) with six common compounds; the LLOQs reported in this study were significantly higher for three compounds (amphetamine, MDMA and nicotine) by 8-17 fold. For the remaining three, the LLOQs were comparable. In addition to sensitivity at the lower limits of range, the linearity of the method used herein was excellent for most compounds and over three orders of magnitude. This performance was likely aided by the removal of added imprecision from advanced sample preparation steps such as solid-phase extraction. Lastly, the speed of this targeted LC-MS/MS method potentially enables both grab and time-integrated sampling to be performed simultaneously on a much larger catchment scale if necessary (with approximately 260 injections possible in a single 24 h period).

Table 1 Selected performance data for all compounds detected in river water using direct injection LC-MS/MS along with maximum and minimum concentration of CECs quantified in both winter and summer water samples. LLOQ compared with other direct injection methods in surface/river water. For additional accuracy and precision metrics, see […]
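The throughput figure quoted above follows from the cycle time of the targeted method (a 4.0 min gradient plus 1.5 min re-equilibration, 5.5 min in total). A minimal sketch of the arithmetic is given below, assuming back-to-back injections with no autosampler or data-handling overhead; that assumption and the function name are illustrative and not part of the published method.

```python
def injections_per_period(cycle_time_min, period_h=24):
    """Whole number of injections that fit into a period at a fixed cycle time."""
    return int((period_h * 60) // cycle_time_min)


GRADIENT_MIN = 4.0          # end of the 100% MPD step
RE_EQUILIBRATION_MIN = 1.5  # at 10% MPD
cycle_time = GRADIENT_MIN + RE_EQUILIBRATION_MIN

daily_injections = injections_per_period(cycle_time)
```

With a 5.5 min cycle this gives 261 injections in 24 h, in line with the approximate value of 260 stated in the text; any real overhead between injections would lower the figure slightly.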

Rapid targeted LC-MS/MS analysis of passive sampler extracts
Combination of this rapid analytical method with passive sampling offered a new approach to catchment chemical contaminant profiling for potentially large numbers of devices and water samples. Unsurprisingly, and likely due to the enhanced detection capability offered by using an HLB sorbent in Chemcatcher® samplers, a larger number of CECs was sequestered by passive samplers during both deployments (i.e., n = 65 unique compounds across both seasonal deployments, see Fig. S4 for details†). Quantification was not performed for CECs in passive sampler extracts. Extracts of field blanks contained traces of six to eight compounds, but peak intensity was <10⁴ in all cases and negligible in comparison to measured signals in deployed samplers in the river. An additional 43 and 38 compounds were identified in winter and summer passive sampler extracts, respectively, that were not present in their corresponding water samples, most likely due to insufficient method sensitivity. The LLOQ and LOD of the passive samplers were not assessed as no quantification was performed using matrix-matched calibrants. With respect to detection, a signal-to-noise ratio greater than 3 : 1 was checked manually in all cases to ensure that all detected features represented a chromatographic peak.
In addition to those EU Watch List compounds detected in water samples, the macrolide antibiotics (azithromycin and clarithromycin), neonicotinoid pesticides (clothianidin and thiacloprid) and the triazine herbicide (ametryn) were identified in passive sampler extracts across campaigns. 38 […] 35–37 Conversely, 17 and 8 compounds were not detected in the winter and summer Chemcatcher® extracts, respectively, that were present in water samples (18 unique compounds in total). Unfortunately, given the time-integrated averaging nature of passive sampling, pulsed introductions of contaminants are missed, which could partly explain this. However, there were no reported sewer overflow events during either deployment (S3 for more details†).
[…] 45–49 The log D of all 18 compounds unique to water samples covered a range of −0.3 to 4.21. Despite methanol being a recognised solvent for passive sampler sorbent elution, 28 incomplete elution or ion suppression for some compounds may have occurred. However, a stronger solvent is likely to elute more heavily retained matrix components, and successive elution using different solvents or at different pH was considered excessive for practical application. For matrix effects, LC-MS signal stability was relatively low for most analytes following direct measurement of river water samples, despite their brackish nature (Table 1). 25 Therefore, despite this limitation for passive sampler extracts, the combination of both direct injection and passive sampler methods was still considered to be very useful for rapid targeted monitoring of river catchments for a relatively large number of CECs.

In silico suspect screening of passive sampler extract with LC-QTOF-MS/MS
Passive sampler extracts were subjected to suspect screening using LC-QTOF-MS and subsequent data mining. Direct injection LC-QTOF-MS suspect screening of the water samples was not considered but could provide added information for comparison with passive sampler data where the method is suitably sensitive.
The rst step of the suspect screening workow involved comparing passive sampler extract data to the Agilent MS databases (forensic toxicology database ¼ 9002 compounds; pesticide database ¼ 1684 compounds; and water screening database ¼ 1451 compounds).This resulted in an initial shortlist of 8485 unique possible compounds in extracts.When these data were further curated using the methods described in 2.5, this was reduced to 237 unique compounds identied across all passive sampler extracts (149 in winter and 157 in summer).Within this set, multiple matches were returned for 95 compounds.The scale of this occurrence data not only demonstrates the advantages of using HLB-type passive samplers for time-integrated catchment occurrence characterisation but also that the scale of data generated would make routine monitoring impractical.Thus, to prioritise rapidly potential compounds present and increase condence in compound identity, machine learning was employed to predict retention time as a further data curation process to reduce the number of candidates to a practicable number for risk management purposes.Of course, candidate shortlists are all dependent on the database selected.Larger databases such as the US EPA CompTox Chemicals Dashboard would have returned more suspect candidates.Nevertheless, the use of the vendor-supplied database in the rst instance was taken as a starting point to demonstrate the proof of concept.As sensitivity was expected to be poorer for CECs than that of the targeted method, suspect screening of directly injected water samples on the LC-QTOF/MS was not performed.However, this could prove benecial for wider xenobiotic exposure characterisation in future work as technology advances.
The optimised model (a 16-4-1 3MLP) for the prediction of retention time showed excellent correlation and agreement across training, verification and blind test data (coefficient of determination, R² = 0.885, 0.871 and 0.874, respectively (Fig. S5(a)†)). The mean absolute error (MAE) across all cases in the training, verification and blind test sets was 26, 26 and 29 s, respectively (Fig. S5(b)†). The applicability domain of the prediction model was defined by investigating the molecular descriptors used to generate the prediction models using principal component analysis (PCA) in Python (Fig. S6(a)†). Following mass spectral database suspect shortlisting, the model was applied to all 237 compounds tentatively identified in the passive sampler extracts (Table S3†). For all compounds, the retention time difference (Δt_R) between the measured (t_R) and predicted (t_R^P) retention times was calculated. Compounds with Δt_R outside the 75th percentile of model error (52 s) were discarded, as previously proposed by our group. 13 Predictions may have been improved if a more diverse set of training case examples had been used, including other classes of chemicals. Furthermore, ab initio molecular descriptor selection for this specific method was considered, which may have also been similarly successful. However, these descriptors were previously found to generalise well across several reversed-phase LC-based methods and their use was the preferred option. 21 This process resulted in a shortlist of 59 (n = 43 in winter and n = 37 in summer) compounds across all passive sampler extracts with Δt_R data within this threshold (Fig. 2 and Table S4†). The majority of compounds clustered well within a 95% confidence interval of PCA data for molecular descriptors used to define the applicability domain (Fig. 
S6(b) †).A range of classes was tentatively identied including ame retardants, PPCPs, controlled drugs, pesticides, industrial chemicals and metabolites.Of all 59 compounds detected, 21 were common to both winter and summer.The largest class of compounds detected in common overall were PPCPs.Eight compounds were present in all sampler extracts in each campaign and of these, two were present in all samplers from both campaigns, i.e., O-desmethylvenlafaxine (a metabolite of the antidepressant, venlafaxine) and tri-(2-chloroisopropyl)phosphate (TCPP, a ame retardant).Others were only prevalent in the winter campaign, including 4-hydroxyphenyl-pyruvic acid (an intermediate metabolite of phenylalanine), butylacetanilide (insect repellent), aniline (industrial synthetic precursor) and dicamba (a broad-spectrum herbicide).Unique to summer were amisulpride (an antiemetic and antipsychotic) and dilaurylthiodipropionate (an antioxidant prevalent in food and cosmetics).Importantly, nine shortlisted compounds could not be found in the literature for river water (Table 2).
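The retention time filter described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: the 52 s threshold is taken from the text, whereas the function names, candidate names and retention times are invented for the example.

```python
def retention_time_filter(candidates, threshold_s=52.0):
    """Split candidates by the absolute retention time difference (seconds).

    `candidates` is a list of (name, measured_s, predicted_s) tuples.
    Candidates with |measured - predicted| <= threshold_s are kept.
    """
    kept, discarded = [], []
    for name, measured, predicted in candidates:
        delta = abs(measured - predicted)
        (kept if delta <= threshold_s else discarded).append((name, delta))
    return kept, discarded


def mean_absolute_error(candidates):
    """Mean of |measured - predicted| over all candidates, in seconds."""
    return sum(abs(m - p) for _, m, p in candidates) / len(candidates)


# Hypothetical candidates: (name, measured t_R, predicted t_R), in seconds.
candidates = [
    ("candidate A", 312.0, 331.0),  # delta = 19 s
    ("candidate B", 455.0, 442.0),  # delta = 13 s
    ("candidate C", 120.0, 240.0),  # delta = 120 s
    ("candidate D", 610.0, 663.0),  # delta = 53 s, just outside the threshold
]

kept, discarded = retention_time_filter(candidates)
mae = mean_absolute_error(candidates)
```

With these invented values, candidates A and B are retained and C and D are discarded. In practice the threshold would be derived from the error distribution of the model's own test data rather than fixed by hand.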
Among those tentatively identified were a few interesting cases that illustrate the performance of the new in silico suspect screening workflow. Firstly, an active metabolite of lidocaine (3-hydroxylidocaine, 3-HL) was shortlisted in passive sampler extracts. A clear precursor ion was detected at m/z 251.1762 [M + H]+ (Fig. 3(a)). Based on this ion alone, four chromatographic peaks were detected. Application of the predictive retention time model isolated a single chromatographic peak within a 19 s error, which also corresponded to the presence of its qualifier fragment at m/z 89.0964 ([CH2N(CH2CH3)2]+).50 This, therefore, allowed a Level 2(a) identification according to the Schymanski et al. framework. 3-HL is formed in humans by cytochrome P450 enzymes 1A2 and 3A4 and has not been reported in river water before. Its presence is unsurprising, however, given that lidocaine itself was detected in the targeted analysis of river water in both campaigns. Lidocaine is widely used as a local anaesthetic in both animals and humans and is available on prescription and as an over-the-counter medication to treat teething pain in children, skin burns/irritations, poisonous stings/bites and haemorrhoids. Lidocaine is also regularly used as an adulterant in illicit street drugs, such as cocaine.51
A second novel metabolite tentatively identified using in silico suspect screening was 8-hydroxyefavirenz, the primary metabolite of the antiretroviral efavirenz,52 used to treat HIV-1 infection in the UK. A matching [M − H]− isotope abundance, several fragment ions and a matching predicted retention time were all observed (Fig. 3(b)). To our knowledge, this is the first reported environmental occurrence of this metabolite in river water. In human liver microsomal studies, CYP2B6 was shown to play a major role in efavirenz clearance via 8-hydroxylation (~77% (ref. 53)). Globally, reports of efavirenz occurrence are limited,54 but recently, concentrations as high as 37.6 mg L⁻¹ have been measured in wastewater effluent in South Africa despite high sorption potential via sludge treatment.
Lastly, tris(1-chloro-2-propyl)phosphate (TCPP) (Fig. 3(c)) was identified. TCPP is an organophosphate flame retardant that has multiple applications, including in electronics and furniture manufacture. In these applications, TCPP is typically used as a film coating rather than being chemically bonded to the material, and is thus prone to release into the environment.55 TCPP has previously been reported at ng L⁻¹ concentrations in seawater and is known to cause detrimental effects in multiple animal taxa.55,56 In zebrafish, the lethal concentration (LC50) of TCPP at 96 h post fertilisation was observed to be 3.7 mg L⁻¹.57 Exposure to TCPP has resulted in decreases in neurobehavioural responses in fish, invertebrate and rodent species,58–60 as well as endocrine disruption, and developmental and reproductive toxicity.56 When human cells have been exposed to TCPP in in vitro experiments, studies report inhibition of cell viability, growth rate and protein synthesis, as well as cell cycle arrest.56 As such, TCPP is classified as a high hazard by the US EPA.61 Again, the [M + H]+ ion was detected at m/z 327.0081 along with two fragments at m/z 98.9842 ([H4PO4]+) and m/z 174.9921 ([C3H6ClO4PH3]+). No other compound was shortlisted based on mass spectral data alone, and retention time prediction was again accurate, to within 13 s of the detected peak in the extract. Previous work focussing on the evaluation of retention time prediction models for suspect screening in wastewater showed a success rate of between 73 and 83%.24
Of all compounds tentatively identified using suspect screening, 15 more were confirmed using curated database entries which included retention time data. Of these, one compound (phenytoin) was confirmed using database retention times within the Agilent Forensic database. Passive sampler extracts were also analysed on a separate LC-QTOF-MS method which held curated database LC retention time and accurate MS data for 14 more compounds, and their presence was confirmed in all cases (see S4 for method details†). These were amisulpride, atenolol, bicalutamide, celiprolol, disopyramide, erythromycin, flecainide, irbesartan, O-desmethylvenlafaxine, practolol, proguanil, sotalol, sulpiride and tapentadol. Therefore, with respect to the Schymanski et al. identification framework,14 the compounds initially shortlisted using the Agilent HRMS databases were mostly classified between Level 4 (unequivocal molecular formula) and Level 2(a) (probable structure), depending on the presence of unique fragment ions. With the addition of predicted and curated library retention time data, we propose that matching compounds which had only one positive library spectrum match could be elevated to Level 2(a). However, to elevate compounds to Level 1 (confirmed structure), confirmation with an analytical standard is still required. That being said, the workflow presented above rapidly and efficiently aided confirmation of compound occurrence in environmental samples. Furthermore, according to the manufacturer, the LC-MS/MS instrument used for targeted analysis is capable of monitoring 555 transitions simultaneously, so there is sufficient scope to add these and several more compounds to the targeted analytical method if required, including multiple transitions for each (to this point, 292 transitions were monitored, including two for each compound and at least one for each SIL-IS). Even where the number of transitions to be monitored exceeds this threshold, the speed of the LC-MS/MS method leaves scope for the incorporation of multiple rapid injections of the same small sample using different target analyte sets, each of several hundred CECs. Using passive sampling together with both targeted analysis and machine learning-assisted suspect screening, therefore, offers a new, flexible and rapid capability for near time-integrated catchment monitoring of CECs, potentially at large scale.

Conclusion
A new methodology that successfully integrated new, flexible and more rapid approaches for CEC identification and monitoring in river water was demonstrated in two separate river monitoring studies. In particular, targeted direct LC-MS/MS analysis of river water for 164 CECs enabled the detection of 33 compounds at low–mid ng L⁻¹ concentrations using small-volume injection in 5.5 min. To boost sensitivity even further and to perform time-integrated catchment monitoring, passive sampling was also successfully used with this new rapid targeted extract analysis method, together with an in silico LC-QTOF-MS/MS suspect screening workflow, to detect 65 CECs and to shortlist an additional 59 compounds across both campaigns, respectively, including new compounds and metabolites. Specifically, the inclusion of retention time prediction reduced the number of suspects by roughly three quarters (from 237 to 59) in comparison to the use of HRMS database searching alone, offering a new approach to rapidly prioritise reference standard acquisition for confirmation. This workflow offers a new capability to perform near real-time catchment monitoring and/or to triage impacted sites for potential in-depth, time-integrated monitoring of river sites impacted by CECs.

a Represents n ≥ 5 calibrants measured in river water matrix and all tested over the range 10–2000 ng L⁻¹.
b LOD determined using 3 × standard deviation of the regression line divided by the slope.
c LLOQ determined as 3.3 × LOD.
d Represents the mean of n = 6 replicate measures of the percentage of background-subtracted responses measured for a 1000 ng L⁻¹ spiked Thames river water sample compared to a standard at the same concentration (negative values represent suppression and vice versa).
e Frequency represents the number of passive sampler extracts where occurrence was confirmed for that compound.
f Hermes et al. (2018) LLOQ in surface waters.42
g Boix et al. (2015) LLOQ in surface waters.41
h Martínez Bueno et al. (2011) LLOQ in surface waters;43 not detected.
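The LOD and LLOQ definitions in footnotes b and c can be sketched as follows. This is a minimal illustration that assumes "standard deviation of the regression line" means the residual standard deviation of an ordinary least-squares calibration fit; the function names and calibration data are invented for the example.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lod_lloq(conc, response):
    """LOD = 3 * s_yx / slope and LLOQ = 3.3 * LOD, in concentration units.

    s_yx is the residual standard deviation of the fit (n - 2 degrees of
    freedom), which is an assumed reading of the footnote.
    """
    slope, intercept = linear_fit(conc, response)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(conc, response)]
    s_yx = math.sqrt(sum(r * r for r in residuals) / (len(conc) - 2))
    lod = 3 * s_yx / slope
    return lod, 3.3 * lod


# Invented calibration data: concentration (ng/L) against peak area.
conc = [10, 50, 100, 500, 1000, 2000]
area = [25, 95, 210, 990, 2020, 3990]

lod, lloq = lod_lloq(conc, area)
```

Because both quantities scale with the residual scatter, a perfectly linear calibration would give an LOD of zero under this definition, which is one reason such estimates are usually checked against spiked samples.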

Fig. 1 (Top) Mean concentration (ng L⁻¹) of compound classes detected directly in river water using rapid targeted LC-MS/MS analysis for winter (a) and summer (b) samples (standard deviation indicated by the grey rings). Concentrations are represented by concentric rings on logarithmic scales for clarity. (Bottom) Proportions of each chemical class detected in passive sampler extracts in winter (c) and summer (d), again using targeted analysis (qualitative only). n = number of unique compounds within each class.

Fig. 2 (Top) Frequency of detection of compounds in replicate passive sampler extracts (blue = winter, n = 4; green = summer, n = 3 samplers) identified using in silico suspect screening. (Bottom) Differences in compound occurrence between campaigns and overall proportion based on chemical classification. n = number of unique compounds within each class.

a Error in retention time prediction.
b [M + H]+ adduct.
c [M − H]− adduct.
d 'Probable structure' based on precursor ion + at least one product ion and only one database match from MassBank.
e 'Unequivocal molecular formula' based on precursor ion + isotope pattern match to the library.
f 'Tentative candidate(s)' based on precursor ion + product ions.
This journal is © The Royal Society of Chemistry 2021. Anal. Methods, 2021, 13, 595–606. Open Access Article. Published on 30 December 2020. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.

Fig. 3 Extracted ion chromatograms of (a) 3-hydroxylidocaine, (b) 8-hydroxyefavirenz and (c) TCPP in the passive sampler extracts. (Left) Extracted ion chromatograms of the [M + H]+ (a and c) and [M − H]− (b) ions and relevant fragments measured by DIA. (Right) Isotopic fit of the [M + H]+ (a and c) and [M − H]− (b) ions and matching predicted retention times.

Table 2 New compounds tentatively identified for the first time in river water using the in silico LC-QTOF-MS/MS workflow in passive sampler extracts