Provenance of drinking water revealed through compliance sampling †

Understanding water hydrochemistry is essential for maintaining safe drinking supplies. Whilst targeted research surveys have characterised drinking water hydrochemistry, vast compliance datasets are routinely collected but are not interrogated amidst concerns regarding the impact of mixed water sources, treatment, the distribution network and customer pipework. In this paper, we examine whether compliance samples retain hydrochemical signatures of their provenance. We created, and undertook the first hydrochemical analysis of, a novel national database of publicly available drinking water compliance analyses (n = 3 873 941) reported for 2015 across England and Wales. k-means cluster analysis revealed three spatially coherent clusters. Cluster 1 is dominated by groundwater sources, with high nitrate concentrations and mineralisation, and lower organic carbon, residual chlorine and THM formation. Cluster 2 is dominated by surface water sources and characterised by low mineralisation (low conductivity and major ion concentrations), low nitrate and high organic carbon concentrations (and hence residual chlorine and THM formation). Cluster 3 shows a mixture of groundwater overlain by confining layers and superficial deposits (resulting in higher trace metal concentrations and mineralisation) and surface water sources. These analyses demonstrate that, despite extensive processing of drinking water, at the national scale signatures of the provenance of drinking water remain. Analysis of compliance samples is therefore likely to be a helpful tool in the characterisation of processes that may affect drinking water chemistry. The methodology used is generic and can be applied in any area where drinking water chemistry samples are taken.


Introduction
Access to safe drinking water is a human right and a requirement for life. 1 In the developed world, the quality of water supplies has improved substantially in the past 25 years, largely through the introduction of regulation and advances in treatment. 2 In Europe, implementation of the European Union Drinking Water Directive (EUDWD, European Commission 3 ) has resulted in compliance levels of over 99% in 2016. 4 Similar directives are also in place internationally (e.g. Australia, 5 USA 6 and China 7 ).
Against a backdrop of climate change and increased demand, 8 water utilities are increasingly considering the use of raw and treated water transfers to supply customers. 9 Feasibility studies of local, small scale water transfers in the UK are required to establish the viability of a transfer in terms of environmental water resource availability and both drinking water and environmental water quality. 10 However, outside of the UK this is not always the case, as highlighted by the recent Flint Water Crisis. 11 In this case, the addition of highly corrosive surface water into a distribution system without corrosion control resulted in a significant public health incident. 12 Outside of the UK, switching of supply water chemistry may be done without any systematic evaluation, 13 and assessing the impacts of drinking water chemistry on potential future large scale raw and potable transfers is considered a significant research need. 14
The hydrochemical analyses required to support assessment of the water quality implications of transfers are complex. Changes in water quality associated with the mixing of raw water sources, treatment processes, passage through a utility's distribution system and customer plumbing make unambiguous interpretation of drinking water chemistry data challenging. 15 Despite this, numerous studies have characterised drinking water hydrochemistry using specific sampling and laboratory analyses for research purposes. 15-24 A number of studies taking this approach have shown a strong link between drinking water hydrochemistry and raw water sources. Dinelli et al. 17 and Demetriades 25 showed a clear influence of bedrock geology and aquifer composition on major and trace elements in drinking waters in Italy and Greece respectively. Birke et al. 23 showed uranium concentrations in drinking water to have a strong geological control. At the European scale, Banks et al. 21 and Flem et al. 15 showed that drinking water hydrochemistry can be interpreted in terms of source water hydrogeology and land use, as these factors influence raw water chemistry. These authors concluded that drinking water sampling is a highly cost-effective approach to characterise controls on water chemistry at the European scale, with confident interpretation of numerous parameters in terms of hydrogeochemical processes. Stable oxygen and hydrogen isotopes of drinking water have also been shown to be a useful tracer of source waters and hydrological processes both at the national 26-28 and city scale 29 in the USA and China. In the UK, national scale drinking water trends broadly follow the same spatial pattern as unconfined groundwaters. 30
There have been substantial reductions in funding for environmental regulators in recent years in some developed countries. 31,32 Consequently, environmental monitoring programmes have declined. 33 In England and Wales the number of water chemistry measurements taken by the environmental regulator declined by 40% between 1993 and 2014. 34 Environmental water chemistry monitoring is typically devolved to a regional level, which results in substantial spatial bias in sampling, as well as both spatial and temporal variability in sampling methodologies, laboratory methods, standards, reporting procedures and data quality assurance. 35 With a limited and reducing spatiotemporal extent of environmental water chemistry monitoring, it is essential that other data sources are considered for the characterisation of water chemistry required to assess the viability of raw and treated water transfers.
In addition to drinking water datasets collected specifically for research purposes, large drinking water chemistry datasets have been and continue to be collected for regulatory compliance across the developed world (e.g. Europe 4 and USA 36 ).
Under the EUDWD, around 100 000 water supply zones are routinely sampled for regulatory compliance across Europe. 3 The need for data for regulatory compliance results in consistent laboratory standards, extensive data quality assurance and a large spatiotemporal sampling extent. 3,37 These datasets have never been analysed in terms of their hydrochemical characteristics and, potentially, represent a vast and powerful dataset that could complement environmental water chemistry datasets and specific national 17,25 and continental scale drinking water research surveys. 15,21 If water transfers are to be developed to meet future demand, it is essential that the hydrochemistry of the current drinking water distribution is better understood. Moreover, beyond water quality compliance reports, very little public information is available from water utilities on drinking water sources and associated hydrochemistry. To this end, we examined whether drinking water samples for regulatory compliance retain the hydrochemical signatures of their provenance. In this study we present the first national-scale assessment of the hydrochemistry of drinking water based on compliance sampling. Applied to England and Wales, we derived spatially distributed water chemistry datasets based on published water company reports. We then undertook spatial and statistical analyses to determine the likely factors controlling the spatial variation in drinking water chemistry. Finally, we provide an outlook on the use of these datasets for future analysis of drinking water hydrochemistry.

Study area and regulatory context
The countries of England and Wales were used as the study area for the research reported here (Fig. 1). Drinking water supplies are obtained from both surface water and groundwater sources, approximately in the ratio 60 : 40 overall, 38 with raw water characteristics and treatment requirements reflecting these different sources. Most water utilities supply water from both surface water and groundwater sources, although in very different proportions depending on geographical location and underlying geology. The most important aquifers used for water supply in the study area are the Chalk and the Permo-Triassic rocks (referred to as Permo-Triassic or PT herein), which are shown in Fig. 1. At one extreme in East Anglia, one utility draws drinking water supplies only from groundwater and predominantly from the Chalk aquifer, 39 whereas in Wales over 90% of water supplied is from surface water sources. 40 As previously discussed, drinking water quality is regulated under the EUDWD. This is transposed into UK law through primary legislation and regulations as the Water Supply (Water Quality) Regulations. 41 Water is deemed to be wholesome if it does not contain substances which contravene the concentrations listed in the Directive or National monitoring categories in ESI Table 1. † A further group of substances (indicator parameters) are also monitored and reported. Non-regulated substances, such as calcium, magnesium and alkalinity, are measured less frequently and reporting of results is not required.

Water quality sampling
The 27 individual water utilities in England and Wales undertake water quality compliance sampling to meet the requirements of the EUDWD. Measurements are made either at the customer's tap, at a supply point (SP) or at the water treatment works (WTW) exit as set down in the regulations and agreed with the UK Drinking Water Inspectorate (DWI). Monitoring at WTWs and service reservoirs (SR) is undertaken to quantify levels of residual disinfectant and to control microbiological parameters and nitrite. Substances can be monitored at designated SPs instead of taps where concentrations are not deemed to change in the distribution network. ESI Table 1 † shows both compliance and indicator parameters and the location of sampling points. Guidance on the analysis of samples to ensure consistency is provided by the DWI and covers a full range of aspects including analyst training, suitable equipment and calibration, method specification, internal and external analytical quality control and record retention. 37 Pesticides and microbiological parameters are not considered in this assessment.

Data extraction, collation and statistical analysis
Under the Water Supply (Water Quality) Regulations, 41 the water supply utilities in England and Wales provide the results of the routine water quality sampling detailed above as PDF reports to customers on their websites. Water utility supply areas are divided based on operational factors into designated water supply zones (WSZ), which supply up to 100 000 people, have approximately uniform quality and can comprise a combination of small communities in rural areas. Each water quality report is for a defined WSZ and, under normal conditions, on request all customers within a WSZ receive the same report. We extracted WSZ reports using a similar approach to that reported by Ascott et al. 42 Reports can be downloaded using a postcode search. The locations of WSZ boundaries are sometimes available but not consistently across the study area. We downloaded all WSZ water quality reports for water companies in England and Wales for 2015. Where WSZ boundary mapping was not available, we derived WSZ areas based on postcode data. We divided England and Wales into a series of 1 km square grid cells. For each grid cell, the postcode in the centre of the cell was extracted and the name of the corresponding WSZ recorded. We then merged the areas returning the same WSZ report to derive the WSZ area outlines. The downloaded water quality reports for each WSZ were then converted using the Tabula software 43 and collated in an MS Access database.
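The grid-based derivation of WSZ areas described above can be illustrated with a minimal Python sketch. This is not the workflow used in the study: the function name, the cell coordinates and the zone names are hypothetical, and the postcode lookup itself is assumed to have already been done, so that each grid cell is paired with the WSZ name returned for its central postcode (or None where no report was returned).

```python
from collections import defaultdict


def group_cells_by_wsz(cell_to_wsz):
    """Group 1 km grid cells by the WSZ name returned for each cell centre."""
    zones = defaultdict(list)
    for cell, wsz_name in cell_to_wsz.items():
        if wsz_name is not None:  # cells with no report are left unassigned
            zones[wsz_name].append(cell)
    # Sort the cells so that each zone has a reproducible outline definition
    return {name: sorted(cells) for name, cells in zones.items()}


# Hypothetical lookup results: (easting_km, northing_km) -> WSZ name
cells = {(0, 0): "Zone A", (1, 0): "Zone A", (0, 1): "Zone B", (1, 1): None}
areas = group_cells_by_wsz(cells)

# Each cell is 1 km square, so the cell count approximates the area in km^2
area_km2 = {name: len(members) for name, members in areas.items()}
```

Merging cells by report name in this way gives zone outlines at the resolution of the grid, which is why boundary mapping, where available, would be preferred.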
A large number of parameters are reported in the WSZ water quality reports, as listed in ESI Table 1. † From this list we used the following criteria to select parameters which are likely to reflect water provenance at the national scale:
Copper, iron, aluminium, fluoride, lead and manganese were excluded, as these are all parameters that may be significantly impacted by water treatment, the distribution network and customer pipework.
Phosphorus was not considered further due to the widespread practice of phosphate dosing during water treatment. 44 Whilst chlorine and THMs are also artefacts of water treatment processes, these parameters were included in the analysis as chlorination (and subsequent THM formation) is more extensive in treatment of surface waters than groundwaters 15 and thus may be an indicator of provenance.
Parameters were retained only where there were no substantial data gaps at the national scale (<5% of water supply zones with missing data for a given parameter). As analysis for individual pesticides is assessed on a risk basis, monitoring is not consistent across all WSZs, so these were excluded.
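The screening criteria above amount to a simple filter, sketched below in Python for illustration. The function name, the parameter names and the example values are hypothetical; only the 5% missing-data threshold is taken from the text, and the set of treatment-affected parameters would need to be supplied by the analyst.

```python
def retained_parameters(zone_values, excluded, max_missing=0.05):
    """Return parameters that pass the exclusion list and the data-gap rule.

    zone_values maps a parameter name to one value per WSZ, with None
    marking a zone where the parameter was not reported.
    """
    kept = []
    for name, values in zone_values.items():
        if name in excluded:  # e.g. parameters altered by treatment or pipework
            continue
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction < max_missing:  # "<5% of zones with missing data"
            kept.append(name)
    return kept


# Hypothetical data for 40 zones
example = {
    "nitrate": [20.0] * 39 + [None],         # 2.5% missing: retained
    "pesticide_x": [0.01] * 30 + [None] * 10,  # 25% missing: dropped
    "lead": [1.0] * 40,                      # complete, but on the exclusion list
}
kept = retained_parameters(example, excluded={"lead"})
```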
Applying these criteria resulted in 17 parameters that are likely to reflect provenance, as shown in Table 1. We then undertook further statistical analysis of these parameters. Some authors 45 have advocated the use of compositional methods 46 to analyse water quality samples. These approaches acknowledge that the concentrations of constituents in a sample sum to a whole and thus artefacts can arise in standard analyses because an increase in the concentration of one constituent leads directly to a decrease in the concentrations of the other constituents. Also, the sum of independent predictions of each constituent does not generally equal the whole. In a compositional approach these artefacts are avoided since the concentrations are transformed to relative ratios of (often log-transformed) constituents or products of constituents. We do not believe that such an approach is required here for a number of reasons. First, quantities such as pH, turbidity and conductivity do not form part of a composition and could not be included in a compositional analysis. Also, the magnitude of changes to the concentration of one constituent that result from a change to the concentration of a second constituent is not likely to be large because the concentrations of the second constituent will also be small relative to the amount of water in the sample. Furthermore, the primary purpose of compliance monitoring is to determine whether concentrations of individual constituents are above pre-specified thresholds. Breaches of these thresholds will be harder to interpret if the analysis is conducted in a transformed space which focuses on the ratio of concentrations of different constituents of a sample rather than the magnitude of the concentrations.
The statistical analysis required measurements of all parameters in all water supply zones. Across the 17 parameters, data were missing for an average of 2.85% of water supply zones. Where data were missing we infilled using the median value of the same parameter at other sites. The median is a robust measure of the expected value that is not unduly influenced by outliers, and the proportion of data requiring infilling is very small. Thus this infilling is unlikely to introduce artefacts into the eventual clusters.
The mean and standard deviation were calculated for each determinand split by aquifer type (Chalk, Permo-Triassic rocks, and less productive aquifers and non-aquifers). The data were not suitable for a conventional analysis of variance because they were spatially correlated and non-normally distributed. We therefore normalised the data and then followed the approach described by Lark and Cullis 47 to test the significance of any differences in the mean values of each variable for each rock type. Briefly, we transformed the observations of each variable to a normal distribution by a non-parametric (normal-scores) approach and then estimated a linear mixed model of the transformed variable. The fixed effects of that linear mixed model were categorical variables corresponding to the three rock types and the random effects were assumed to have an exponential spatial covariance function. A series of Wald tests were then applied to test for significant differences in the mean value of the transformed variable for each pair of rock types.
The spatial distribution of each parameter was assessed qualitatively by developing national scale maps of the determinands with the outcrop of the principal aquifers overlain. These maps show the raw data across the areal extent of WSZs, with no interpolation undertaken.
The 17 parameters were standardised and we then undertook k-means cluster analysis for k = 2 to k = 5 (ref. 48) using R. 49 As the choice of an appropriate number of clusters is somewhat subjective, we developed a parsimonious, rule based approach. We identified the smallest number of clusters for which (1) cluster membership is spatially coherent at the national scale and (2) the spatial patterns of cluster membership correspond to areas of groundwater and surface water supplies. Using this approach, 3 clusters were identified as representing drinking water provenance on the basis of groundwater and surface water at the national scale. Increasing the number of clusters above 3 resulted in incoherent patterns of cluster membership. Such patterns are likely to represent more local scale hydrochemical processes affecting tap water chemistry which are not the focus of this national scale study.
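For readers who wish to reproduce the general shape of this workflow, the median infilling, standardisation and k-means steps can be sketched as follows. This is an illustrative Python sketch only: the study itself used R, and the function names, the use of the population standard deviation, the random initialisation and the plain Lloyd's iteration are all assumptions made here rather than details taken from the source.

```python
import random
import statistics


def infill_median(column):
    """Replace missing values (None) with the median of the observed values."""
    med = statistics.median(v for v in column if v is not None)
    return [med if v is None else v for v in column]


def standardise(column):
    """Scale a column to zero mean and unit (population) standard deviation.

    Assumes the column is not constant, otherwise the divisor is zero.
    """
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mu) / sd for v in column]


def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # k distinct rows as starting centroids
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centroids[j])))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def cluster_range(rows, ks=range(2, 6)):
    """Run k-means for each candidate k (k = 2 to 5 by default)."""
    return {k: kmeans(rows, k) for k in ks}
```

The resulting labels for each k would then be mapped by WSZ and judged against the two rules given above, a step that is qualitative and is not captured in the sketch.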

Database statistics and regulatory compliance
The database developed covers 1539 supply zones across England and Wales. Based on the downloaded water quality reports, a total of 3 873 941 water chemistry samples were reported in 2015. There are 190 unique determinands within the database. For each determinand within a WSZ, a maximum, minimum and mean concentration is reported, in addition to the number of samples taken in the year and the number that exceeded the drinking water limit. For each water supply zone the number of determinands varies substantially. The maximum and median number of determinands reported for a WSZ was 272 and 75 respectively. This wide range in the number of determinands is the result of different water supply zones having different reporting requirements associated with different population levels. Water companies operating water supply zones which have experienced water quality problems associated with certain parameters may have a regulatory obligation to report these parameters. This is often the case with individual pesticides, which cover 111 of 190 determinands. The sample data, however, show a high level of compliance with DWI and EUDWD standards, with 99.94% of samples compliant. This agrees well with the reported compliance statistics presented by the Drinking Water Inspectorate 2 for 2014 (99.96% for England).
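Because each report gives the number of samples taken and the number exceeding the limit, an overall compliance percentage can be obtained by simple aggregation. The sketch below illustrates the arithmetic in Python; the function name and the record layout are hypothetical, and the example counts are invented for illustration rather than drawn from the database.

```python
def compliance_rate(records):
    """Percentage of samples within limits.

    records is a list of (n_samples, n_exceedances) pairs, one per
    determinand per WSZ.
    """
    total = sum(n for n, _ in records)
    failed = sum(x for _, x in records)
    return 100.0 * (total - failed) / total


# Hypothetical counts: 5000 samples in total, 3 of which exceeded a limit
rate = compliance_rate([(1000, 1), (4000, 2)])
```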

Spatial distribution of determinands
In this section, the spatial distribution of concentration data for key parameters within drinking water is presented. Determinands have been grouped based on similarity in their spatial distribution. Table 1 shows the mean and standard deviation of the determinands analysed split by principal aquifers (Chalk and Permo-Triassic Rocks) and less productive aquifers and non-aquifers. Also shown are the results of the significance test of Lark and Cullis. 47 Statistically significant differences were observed between the rock types for 10 out of the 17 parameters (p < 0.001, for PT-Chalk, PT-other and Chalk-other).
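The grouped summary statistics reported in Table 1 can be illustrated with a short Python sketch. The function name and example values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, as the text does not state which form was used. The sketch does not reproduce the spatial linear mixed model or the Wald tests of Lark and Cullis.

```python
import statistics
from collections import defaultdict


def summarise_by_rock_type(values, rock_types):
    """Mean and sample standard deviation of a determinand per rock type."""
    groups = defaultdict(list)
    for value, rock in zip(values, rock_types):
        groups[rock].append(value)
    # statistics.stdev needs at least two values in every group
    return {
        rock: (statistics.fmean(vals), statistics.stdev(vals))
        for rock, vals in groups.items()
    }


# Hypothetical nitrate values (mg/L) for four zones
summary = summarise_by_rock_type(
    [20.0, 30.0, 10.0, 12.0],
    ["Chalk", "Chalk", "PT", "PT"],
)
```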
3.2.1 Nitrate. Fig. 2 shows the spatial distribution of nitrate concentrations in drinking waters in England and Wales. High nitrate concentrations are present in south and east England corresponding broadly to the outcrop of the Chalk aquifer and some parts of the Permo-Triassic rocks. Analyses of drinking waters from areas of the Chalk show a very different nitrate concentration distribution to those from the Permo-Triassic sandstones, with higher mean values (25.2 mg L⁻¹) and samples most frequently in the 20-40 mg L⁻¹ range for the Chalk, compared to 10.8 mg L⁻¹ and samples in the 0-10 mg L⁻¹ range for the Permo-Triassic. Low concentrations are present where the Chalk is overlain by low-permeability Palaeogene and superficial deposits (primarily till) in East Anglia. Areas shown in white returned no drinking water quality report. These can be considered to be areas where no mains supply is present and drinking water is obtained from local private supplies.

3.2.2 Nickel and selenium. Concentrations of trace substances (Ni, Se, As) are low over most of England and Wales. Elevated concentrations of substances such as Ni and Se are found in areas of East Anglia where the Chalk is not at outcrop (Fig. 3). Mean Ni and Se concentrations are very low in supplies on the Permo-Triassic and approximately double in supplies on the Chalk (Table 1).
3.2.3 TOC, chlorine, THMs and turbidity. Fig. 4 shows TOC, chlorine, THMs and turbidity values for drinking water in the study area. Elevated TOC concentrations (of up to 3 mg L⁻¹) are measured on the northeast coast of England, Anglesey, southwest England, Essex, and an area of Central England around Bedford, Northampton and Peterborough (Fig. 4). Average concentrations in supplies located on the aquifers of the Chalk and the Permo-Triassic are similar, about 1 mg L⁻¹, whereas the average for less productive aquifers and non-aquifers is higher (1.62 mg L⁻¹, Table 1).
The highest residual chlorine concentrations are seen in northwest England (the Lake District Coast and Cheshire) and parts of Wales and southwest England (Fig. 4). Supplies from Chalk areas have the lowest average residual chlorine (0.28 mg L⁻¹), with increasing concentrations on the Permo-Triassic and on less productive aquifers and non-aquifers (0.38 mg L⁻¹, Table 1). Elevated THM concentrations of up to 50 µg L⁻¹ occur in south Wales and southwest England, the Weald, easterly East Anglia and the Pennines (Fig. 4). Average concentrations in supplies on the Chalk are 12.1 µg L⁻¹, whereas on the Permo-Triassic and less productive aquifers and non-aquifers they are in the range 24 to 26 µg L⁻¹ (Table 1). Turbidity values are higher in southwest England and parts of Wales (up to 0.3 NTU) than in eastern England (Fig. 4). Average values are similar across the study area, with the lowest for the Permo-Triassic (0.03 NTU) and the highest on less productive aquifers and non-aquifers (0.06 NTU) (Table 1).
3.2.4 Conductivity, chloride, sodium and sulphate. Drinking water conductivity is lowest along the west coast and highest in eastern East Anglia, where values of up to 900 µS cm⁻¹ are recorded (Fig. 5). Mean conductivity values are considerably higher from areas on the Chalk than on the Permo-Triassic or less productive aquifers and non-aquifers (Table 1). Chloride concentrations follow a similar pattern to conductivity but with additional elevated concentrations in Cheshire and the East Midlands (Fig. 5). Mean chloride concentrations are higher on the Chalk and less productive aquifers and non-aquifers (34-35 mg L⁻¹) than on the Permo-Triassic (24.6 mg L⁻¹) (Table 1). Sodium also follows this spatial pattern, although average concentrations do not behave similarly. Mean sodium concentrations are higher on less productive aquifers and non-aquifers (23.1 mg L⁻¹) than on the Permo-Triassic and on the Chalk (18-19 mg L⁻¹). Sulphate is also similar, with less obvious elevation of concentration in East Anglia and more in the East Midlands and Yorkshire. Average concentrations are in the range 20-30 mg L⁻¹. Like sodium, mean concentrations are considerably higher on less productive aquifers and non-aquifers (49.6 mg L⁻¹) than on the Permo-Triassic and the Chalk (34-39 mg L⁻¹) (Table 1).
Table 1 Mean and standard deviation for determinands for drinking water samples classified according to bedrock geology (principal aquifers (Permo-Triassic (PT) and Chalk) and less productive aquifers and non-aquifers). Results of the significance test of Lark and Cullis 47 are shown in the last 6 columns. A positive sign indicates that the parameter is greater in the first rock type than in the second

3.2.5 Other factors. A small group of the 17 parameters provide only limited insight into hydrochemical processes. Ammonium concentrations are slightly elevated in confined areas of the Chalk in the London area and in East Anglia, with some concentrations above 0.05 mg L⁻¹. Average concentrations range from 0.03 mg L⁻¹ on less productive aquifers and non-aquifers to <LOD in the Permo-Triassic. Arsenic concentrations are elevated in a few localities, in Cheshire and the Bristol area. Average values are highest in the Permo-Triassic, where arsenic can be naturally occurring, and lowest in the Chalk (Table 1). Average antimony concentrations are very low (0.04-0.08 mg L⁻¹) but also exhibit locally higher concentrations in Cheshire. Boron concentrations are also very low (0.01-0.03 mg L⁻¹), with the highest concentrations in the Weald and in southern East Anglia.
Fig. 6 shows the results of the cluster analysis; three spatially coherent clusters can be identified. Cluster 1 comprises WSZs in the south east of England and some parts of the Midlands, with significant areas overlapping the outcrop of Chalk and Permo-Triassic aquifers. Cluster 2 WSZs are located in Wales and the southwest and the north of England, where there are limited groundwater resources. Cluster 3 is more spatially variable, covering parts of East Anglia and the southeast, the East Midlands and northeast England. In these areas there is a combination of groundwater resources (including the confined Chalk of East Anglia and Jurassic limestones) and surface water resources. The centroids (Fig. 7) show the differences between clusters for key determinands. Cluster 1 has high nitrate concentrations and conductivity, and low organic carbon, chlorine and THM concentrations in comparison to cluster 2. Cluster 2 has low nitrate concentrations, conductivity, sodium and chloride concentrations and higher chlorine and THM concentrations. Cluster 3 has higher conductivity, sodium, chloride and sulphate concentrations in addition to higher boron, antimony, nickel and selenium concentrations. Cluster 3 also has relatively low chlorination and THMs, despite higher TOC concentrations.

Hydrogeochemical controls on drinking water typologies
In this section we relate the spatial distributions presented in Section 3.2 to potential controlling factors in water provenance. It should be noted that water utilities use a number of options for ensuring that drinking water is compliant with the water quality regulations. These can include removal/reduction of determinands by water treatment, which can result in regulated substances exhibiting a truncated distribution of concentrations. In this analysis, it is assumed that water comes either from groundwater or surface water. However, in the future drinking water may also be obtained by desalination. In England currently there is only one plant used to desalinate water using the reverse osmosis (RO) process for public supply, on the Thames Estuary, which has operated since 2010 providing up to 150 ML per day during peak times. 50 Drinking water derived from this source will differ significantly in terms of hydrochemistry compared to that from groundwater or surface water sources, because it is derived from the tidal zone of the Thames and has undergone demineralisation. 51
The spatial distribution of nitrate concentrations (Fig. 2) shows a clear influence of both underlying hydrogeology and land use, identifiable in cluster 1 (Fig. 6). Large areas of southern and eastern England obtain the majority of their supplies from groundwater. 52 The high nitrate concentrations in drinking waters derived from the Chalk may reflect the storage of nitrate in the thick Chalk unsaturated zone and slower flushing of nitrate following changes in agricultural management practices. 53-57 This assessment does not include areas of the Chalk where it is not at outcrop, e.g. the eastern part of East Anglia where some elevated values are shown in Fig. 2. Drinking water chemistry demonstrates a residual land use/geology signature despite treatment of water for elevated nitrate. 58 This is unsurprising given that nitrate removal by ion exchange is unlikely to be undertaken on raw waters where concentrations are below 50 mg NO₃ L⁻¹. It would be anticipated that phosphate would be similarly useful were its distribution not obscured by treatment for plumbosolvency. 44,59-61
The spatial distribution of nickel and selenium (Fig. 3) reflects geochemical processes occurring as recharge passes through overlying superficial deposits. For example, Ander et al. 62 showed that oxidation of sulphide minerals (e.g. pyrite) in overlying till deposits in East Anglia is the primary source of high nickel concentrations in Chalk groundwater.

Total organic carbon and other associated parameters (Fig. 4) show a clear influence of surface water, identifiable in cluster 2 (Fig. 6). Higher concentrations of total organic carbon (TOC) would be expected to occur in areas of hard-fractured rocks or sandstones where superficial deposits may be peaty and/or supplies may be predominantly from surface water. 63 These areas correspond to the predominance of surface water supply. Trihalomethanes (THMs) are a long-recognised byproduct of water disinfection by chlorine and result from reaction of chlorine with organic carbon. 64 The reaction is enhanced in the presence of bromide. 65,66 Higher dosing of chlorine is required in water with a higher TOC content to obtain an acceptable residual chlorine concentration. In this dataset, the spatial distribution of THMs shows a qualitative relationship to that of TOC (Fig. 4). Although quantitatively the relationship has substantial scatter (R² = 0.21), this is broadly in agreement with the findings of Valdivia-Garcia et al., 67 who showed dissolved organic carbon to be an important predictor variable in the spatial distribution of THMs. Together these substances (TOC, chlorine and THMs) provide a clear indication of where water derived from surface water predominates in drinking water.
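The scatter in the TOC-THM relationship is summarised above by a coefficient of determination. As an illustration of how such a value can be obtained, the Python sketch below computes R² as the squared Pearson correlation, which equals the R² of a simple linear regression of one variable on the other. That the reported value was derived in this way is an assumption, and the function name and example data are hypothetical.

```python
import statistics


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    Computed as the squared Pearson correlation coefficient.
    """
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical zone-mean TOC and THM values with some scatter
toc = [1.0, 2.0, 3.0, 4.0]
thm = [1.0, 3.0, 2.0, 4.0]
r2 = r_squared(toc, thm)
```

A low R² of this kind indicates that TOC alone explains only part of the variation in THMs between zones, consistent with the role of chlorine dose and bromide noted above.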
Conductivity and associated parameters (Fig. 5) show a strong east-west spatial trend likely to be associated with recharge processes. Rainfall for England and Wales is predominantly from the southwest with highest amounts recorded on upland areas of Wales and the Lake District and low values in Eastern England, including London, East Anglia and Lincolnshire. The distribution of conductivity values appears to be inversely related to recharge 68 and therefore predominantly reects meteorological setting. High chloride concentrations in Cheshire may be associated with halite 4.2 Drinking water compliance sample data for hydrochemical characterisation: an outlook 4.2.1 Benets and limitations. As previously discussed, the interpretation of drinking water datasets for hydrochemistry has been shown to be challenging due to mixing of water sources, treatment, the distribution network and sampling point location. Nevertheless, the cluster analysis and the data discussed above clearly shows that compliance samples do reveal drinking water provenance in terms of the raw water sources that dominate water supply in the study area. For a number of these determinands, a relatively condent interpretation of the environmental controls on the spatial trends can be made. Flem et al. 15 suggested that sampling and centralised analysis of drinking water may be an effective low cost method for gaining insights into processes effecting drinking water chemistry. Building on this, here we suggest that signicant further understanding into these processes can be gained from analysis of compliance samples. Uniform analytical, sampling and reporting standards mean that datasets from different water companies can be compared. The use of compliance water company samples for hydrochemical characterisation over specic centralised sampling 15,21 for research has both advantages and disadvantages. 
Compliance samples cover a much denser sampling network, both spatially and temporally, than specific samples. However, the parameter range for routine samples is restricted to determinands which are of concern for human health. Consequently, a significant number of parameters which would be of hydrogeochemical interest (e.g. alkalinity, dissolved oxygen, calcium, magnesium, potassium) are not consistently reported. As a result, it is unlikely that data from compliance sampling could be used in conventional hydrogeochemical analyses and modelling (e.g. development of Piper/Durov diagrams, PHREEQC modelling). For example, Shand et al. 71 report on baseline groundwater chemistry for England and Wales focussing on major and minor aquifers and Smedley 72 examined UK bottled water chemistry, which tends to reflect the relatively minor aquifers. These studies, to which this work is complementary, discuss primarily major ion chemistry and a range of trace elements not necessarily represented in drinking water regulatory monitoring.

4.2.2 Applications and further work. Drinking water compliance data have been used extensively in regulatory reporting. However, detailed hydrochemical analysis and interpretation of these data have not previously been reported. We consider there to be a wide range of potential applications of the dataset used in the research reported here.
The data could be used to support management decisions regarding the potential water chemistry implications of raw and treated water transfers. Fig. 8 shows the location of the clusters identified in this study and suggested raw and treated water transfers. 14 Where transfers are between clusters, the addition of water of a different hydrochemical typology may have significant implications for both human and environmental health. Without further water treatment, transfers of corrosive surface waters into areas previously supplied by groundwater may result in dissolution of metals from water mains. Where mains water leakage is significant, transfers may result in a flux of water that is hydrochemically different to the water in the environment. Recent work has shown mains water leakage to be a significant source of phosphorus (P) to the environment. 60,61,73,74 Transfer of phosphorus-dosed mains water into an area without historic P dosing, and its subsequent leakage into the environment, could represent a significant additional source of P. The data presented in this study would be an ideal high-level screening tool to evaluate the water quality implications of water transfers at the national scale. At the level of individual transfers, substantial additional work would be required, considering the water quality of both the transferred water and the current water in a supply zone, and the age, material type and location of the distribution network.

The datasets could be reviewed in the context of national-scale health datasets. The Environment and Health Atlas 75 provides detailed maps of both environmental agents and health conditions in England and Wales. This already includes trihalomethanes but could be extended to consider other potential environmental agents which are reported in the drinking water dataset. Drinking water in England and Wales is compliant with current regulations, but such an approach could perhaps provide evidence to be used in future drinking water quality reviews.
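The high-level screening of water transfers described above could be sketched as follows. This is a minimal illustration and not the procedure used in this study. It assumes a lookup from supply zone to cluster number and a list of proposed transfers given as (source zone, receiving zone) pairs. The zone identifiers and the function name `flag_cross_cluster_transfers` are hypothetical.

```python
def flag_cross_cluster_transfers(zone_cluster, transfers):
    """Split transfers into cross-cluster cases and cases with unclassified zones.

    zone_cluster: dict mapping a supply zone identifier to its cluster number.
    transfers: iterable of (source_zone, receiving_zone) pairs.
    """
    flagged, unknown = [], []
    for source, receiver in transfers:
        c_src = zone_cluster.get(source)
        c_rcv = zone_cluster.get(receiver)
        if c_src is None or c_rcv is None:
            # At least one zone has no cluster assignment: cannot be screened.
            unknown.append((source, receiver))
        elif c_src != c_rcv:
            # Different hydrochemical typologies: candidate for detailed study.
            flagged.append((source, receiver, c_src, c_rcv))
    return flagged, unknown


# Hypothetical zones and transfers, for illustration only.
zone_cluster = {"ZONE_A": 2, "ZONE_B": 1, "ZONE_C": 1}
transfers = [("ZONE_A", "ZONE_B"), ("ZONE_B", "ZONE_C"), ("ZONE_A", "ZONE_X")]
flagged, unknown = flag_cross_cluster_transfers(zone_cluster, transfers)
print(flagged)
print(unknown)
```

A flagged transfer only indicates that the source and receiving zones fall in different clusters. Any assessment of an individual transfer would still need the additional work on water quality and network characteristics noted above.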
The data collated in this study could also be compared against raw untreated water samples. This has been undertaken at a continental scale in Europe by Flem et al. 76 but only using a small sample of drinking waters analysed centrally rather than routine compliance samples. Such a comparison would give an indication of the efficiency of treatment processes. Comparison with groundwater and surface water data would also give an indication of whether water lost through leakage would be significantly different from the water in the environment. In some cases (e.g. phosphorus addition), leakage may be a source of nutrients to the environment. In contrast, in cases where treatment has removed contaminants from the water, leakage may dilute the concentration of pollutants already existing within groundwater or surface water.
In addition to the parameters reported here, there are a large number of other non-standard parameters reported on a case-by-case basis. The majority of these parameters (58%) are pesticides. Reporting for pesticides is risk-based and thus some determinands may only be reported for a small number of supply zones. The sporadic nature of these reports would make a statistical analysis such as the methodology presented here challenging, but an overall qualitative interpretation would be possible. Other authors advocate the application of compositional statistical methods to water compliance data, 46 which may yield different insights.
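As an illustration of what a compositional approach involves, the sketch below applies the centred log-ratio (clr) transform, a standard tool in compositional data analysis. It is shown as an assumption about the kind of method meant, since the specific techniques advocated in ref. 46 are not detailed here. The function name `clr` and the example concentrations are hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean of the logs."""
    if len(composition) == 0:
        raise ValueError("composition must not be empty")
    if any(v <= 0 for v in composition):
        # Logs are undefined for zeros or negatives; values below detection
        # limits would need separate treatment before transformation.
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical concentrations of three determinands in one sample (same units).
sample = [25.0, 4.0, 1.0]
print([round(v, 3) for v in clr(sample)])
```

The transformed values sum to zero and are unchanged if every concentration in the sample is multiplied by the same factor, which is why the transform describes relative rather than absolute composition.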
Further work could also explore changes in drinking water chemistry through time. The dataset reported in this study is for 2015. Water utilities have reported similar datasets to regulators as far back as 1993. 77 A wide range of factors are likely to control changes in water quality through time, such as changes in source water quality, treatment processes and water source blending. Consequently, unambiguous interpretation of such time series data is likely to be challenging.
The use of compliance samples to characterise drinking water provenance is likely to be broadly applicable across much of the developed world. In Europe, the EUDWD 3 requires member states to report a number of determinands. High-level compliance summaries are reported by the European Commission. 4 In the USA, national databases 78,79 are available which report compliance failures. Whilst a few countries hold publicly accessible national-scale databases for drinking water quality data (e.g. France 80 ), in both the USA and large parts of Europe water quality data are held at the water company level. Given the high level of fragmentation in the water sector in both the USA and Europe (>50 000 utilities in the USA, 81 >6200 in Germany alone 82 ), data collation from individual companies would be an extremely labour-intensive task. Given that water utilities already report compliance data to regulators, it would be helpful if regulators consistently provided these reports to the public in addition to high-level compliance summaries.

Conclusions
This study has shown that compliance samples reveal the hydrogeochemical provenance of drinking waters for the first time at the national scale. Despite extensive modification of source waters through treatment, blending and pipework, compliance data still show a hydrochemical signature of the source waters. The use of cluster analysis reveals a distinct groundwater-surface water split. The spatial distribution of a number of parameters which control this cluster partition (nitrate, nickel and selenium, TOC, THMs, conductivity) can be interpreted relatively unambiguously in terms of the source water hydrogeology. The approach used in this study is low cost and utilises existing datasets. It is highly generic and can be applied wherever compliance drinking water sampling is undertaken. The limited range of determinands measured during compliance sampling makes this approach complementary to targeted hydrochemical investigations. The datasets developed have a wide range of applications including high-level screening of the hydrochemical impacts of future water transfers, assessment of the impacts of water mains leakage on nutrient fluxes into the environment and comparison with national public health datasets.

Conflicts of interest
There are no conflicts to declare.