SARS-CoV-2 RNA abundance in wastewater as a function of distinct urban sewershed size †

During the COVID-19 pandemic, wastewater-based epidemiology has emerged as a promising approach for monitoring SARS-CoV-2 prevalence at the community level. Although much is known about the utility of these measurements at large wastewater treatment plants, little is known about how they correlate with measurements at finer geographic resolution, such as those obtained through sewershed sub-area catchments. This study aims to identify community wastewater surveillance characteristics between sewershed areas that affect the strength of the association of SARS-CoV-2 RNA detection in a metropolitan area. For this, wastewater from 17 sewershed areas was sampled in Louisville/Jefferson County, Kentucky (USA), from August 2020 to April 2021 (N = 727), covering approximately 97% of the county's households. Solids were collected from the treatment plants from November 2020 to December 2020 (N = 42). Our results indicate that the sewersheds differ in SARS-CoV-2 trends; however, high pairwise correlation spatial trends were not observed, and the mean SARS-CoV-2 RNA concentrations of smaller upstream community sewershed areas did not differ from those of their respective treatment centers. Solid samples could only be collected at treatment plants, so we could not evaluate SARS-CoV-2 abundance in solids as a function of the sewershed scale. The population size sensitivity of SARS-CoV-2 concentration detection is non-linear: at low population levels the measures are too sensitive and generate a high level of variability, whereas at high population levels the estimates are dampened, making small changes in community infection levels more difficult to discern. Our results suggest selecting sampling sites that include a wide population range. This study and its findings may inform other system-wide strategies for sampling wastewater for estimating non-SARS-CoV-2 targets.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a respiratory virus that is often, though not universally, present in the feces of patients with coronavirus disease 2019 (COVID-19). 1,2 Although relatively new, wastewater-based epidemiology (WBE) for monitoring the presence of SARS-CoV-2 at the community level is rapidly expanding. 3 In a piped sanitation system, the generation point for tracking the feces of COVID-19 infected patients is a flush toilet, from which fecal matter travels in one direction from a household; depending on topography, neighborhood-level raw wastewater may pass through a pumping station before ultimately reaching a centralized treatment plant. Based on sanitary sewer system design and function, obtaining a finer resolution of a sub-population involves sampling locations along these pipes from geographically distinct areas (i.e., the catchment areas or sewersheds). 4 Even before the COVID-19 pandemic, the advantage of monitoring wastewater across different sized sewershed areas had been demonstrated in studies on nutrient removal and energy and resource recovery in wastewater treatment, and on monitoring trace organic chemicals. [5][6][7] Community-level sampling scales of SARS-CoV-2 concentration in wastewater have been studied, [8][9][10][11][12] but without explicit consideration for the utility of nested sewershed areas along a range of metropolitan population levels, from thousands to hundreds of thousands, contributing to a common treatment plant to determine how sample pooling affects target properties. Furthermore, existing wastewater regulatory compliance monitoring is not typically conducted at sewershed sub-area catchment scales; rather, it is conducted at centralized treatment facilities or in specific industrial area effluent, which have convenient, controlled access points for sampling.
Moreover, little information is available regarding the utility of sampling large urban sewers across mixed sewershed sizes for epidemiological surveillance.
This study aimed to identify community wastewater surveillance characteristics between large wastewater treatment plants and finer geographic resolutions obtained through sewershed sub-area catchments that affect the strength of the association of SARS-CoV-2 RNA detection in a metropolitan area. Establishing the optimal scale for the sampling sewershed area may help identify locations where the virus could be lingering at a sub-population level as defined by a smaller sewershed area and determine how viral levels of SARS-CoV-2 change when measured at subsequent centralized treatment facilities.

Sewer network areas
The piped sewer system network geographic information system (GIS) shapefiles were provided by the wastewater utility, Louisville/Jefferson County Metropolitan Sewer District (MSD). Based on this, idealized sampling sites including wastewater treatment plants and finer geographic resolutions obtained through corresponding (nested) sewershed sub-area catchments across a range of population levels were selected. After the creation of catchment areas in GIS, the population was quantified based on the 2018 U.S. Census Bureau American Community Survey (ACS) 13 for individuals whose residences fell within each sewershed. For the full sample site selection protocol, see Yeager et al. 4

Wastewater sample collection and handling
The 17 locations in Louisville/Jefferson County, Kentucky, USA (Fig. 1; Table 1), were sampled over 34 weeks to monitor how SARS-CoV-2 levels in wastewater correspond to sewershed area types and sizes. Five of these sites were wastewater treatment plant influents. Twelve were upstream corresponding (nested) sewershed sub-area catchments at community sewer line locations or intermediate pump stations that were primarily residential, with minimal industrial inputs. Samples (N = 823) were collected from August 17, 2020 to April 5, 2021, with the exception of site MSD17, which was added on September 15, 2020. Additionally, no samples were collected in the last two weeks of December (ESI A †). Samples collected by MSD personnel typically consisted of a 125 ml subsample from a 24 h time-weighted composite sampler kept in an ice bath. In the event of a composite sampler equipment malfunction, a grab sample was collected. Samples were transported on ice to the University of Louisville for analysis.

Solids sample collection and handling
Solid samples (N = 42) were collected from five wastewater treatment plants twice a week from November 30, 2020 to December 16, 2020. Site MSD01 uses an anaerobic digestion process, and three sample types within this treatment system were collected: primary sludge, waste activated sludge, and centrifuge cake. The four regional treatment plants (MSD02, MSD03, MSD04, and MSD05) use an aerobic digestion process (with atmospheric air added to keep odors down), so only one type of solid sample could be collected.

Wastewater sample concentration, extraction, and PCR quantification
Our wastewater sample analysis method was published by Rouchka et al. 14 within the guidelines developed by Pecson et al. 15 In brief, we processed samples within 12 h of collection, used polyethylene glycol (PEG) precipitation, and performed quantification in triplicate by quantitative reverse transcription polymerase chain reaction [...] 16 for N1 assays was 7.5 copies per ml.

Solids sample concentration, extraction, and PCR quantification
The solid samples were shipped overnight on ice and analyzed at Verily Life Sciences (California, USA), following the method described by Graham et al. 17 In brief, the process involves concentrating the solids in the sample, removing the water, and then extracting the RNA from the solids. Samples were run with 10 extractions, each with 10 droplet digital polymerase chain reaction assays.
Coronavirus recovery was assessed using bovine coronavirus (BCoV). Data for the SARS-CoV-2 nucleocapsid (N) gene and PMMoV targets were reported as gc g⁻¹ dry weight. Although the N gene assay is not the same as N1 or N2, it partially overlaps with N1. 18

Residences without sewer connections
To calculate the number of homes without sewer connections, and thus residences excluded from wastewater monitoring in the study area, a list of known addresses without a public service connection was provided by MSD for Louisville/Jefferson County as of February 16, 2021. Data included addresses, associated treatment plants, and property class codes. Property class codes of residential households that were most likely to have flushing toilets were isolated. These addresses were then geocoded in ArcGIS to the 17 wastewater sewersheds sampled in this study.  19 At the time of this study, the positive case rate data were not geocoded to the sewershed area.

Statistical analysis
The wastewater data comprised the date, sample location type (i.e., manhole, pump station, or treatment plant), sample collection method (i.e., composite or grab), and sample temperature at collection. The SARS-CoV-2, PMMoV, and CrAssphage concentrations were the outcome variables in this study. PMMoV and CrAssphage are human fecal indicators and are independent of the SARS-CoV-2 concentrations monitored in wastewater. Since the RNA concentrations measured in copies per ml of wastewater were highly skewed, log e transformations were used to normalize the data for statistical analysis.
The solid data comprised the date, sample location (five levels), and sample type (i.e., anaerobic centrifuge cake, anaerobic primary sludge, anaerobic waste activated sludge, or aerobic digesters). The outcome variables included the concentrations of the N gene and PMMoV. Data were log e-transformed to achieve approximate normality. The solid sample data set (N = 42) was too small for statistical analysis.
Only quantifiable wastewater data, that is, data above the limit of quantification (LOQ), were used for statistical analyses. Of the total samples collected (N = 823), 88% (N = 727) had quantifiable data for each of the SARS-CoV-2, PMMoV, and CrAssphage targets (ESI † Table A1). Most of the exclusions (86 of the 96 excluded samples) were attributed to SARS-CoV-2, which was the most important target. First, significant differences among groups were tested using generalized linear models (GLM). Levene's test was used to assess the homogeneity of variances across selected groups. 20 A t-test comparing the means of two groups was conducted only when Levene's test indicated that the within-group variances were homogeneous. If the variances were heterogeneous, the Kruskal-Wallis non-parametric one-way test was used to check for significant effects. Temporal variability over the study period was also analyzed across the different groups. The data were grouped based on site or population group and the best fit of the log e concentrations as a function of time was plotted. A natural cubic spline fit was used when it was significantly different from a linear fit (no slope) model to account for seasonal trends. Correlation analyses were used to test the similarities among the different sites and spatial correlations among the adjacent sites for date pairs. We then determined whether the variability of SARS-CoV-2 RNA concentrations could be attributed to sewershed site type (i.e., manhole, pump station, or treatment plant). Aggregated treatment plant concentrations were compared with the mean of the nested upstream contributing sewersheds. The results were declared significant at a 5% level of significance. 21 The data analysis for this study was generated using Statistical Analysis System (SAS) software (version 9.4; Cary, N.C., USA). 22 The plots were produced in RStudio (version 1.4.1106; R Core Team, Vienna, Austria) 23 using ggplot2. 24 In the figures, l e (log base e) represents the log e-transformed data.
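To make the non-parametric comparison described above more concrete, the following is a minimal Python sketch of the Kruskal-Wallis H statistic with the standard tie correction. It is not the SAS procedure used in this study; the function names (`average_ranks`, `kruskal_wallis_h`) and the example values are hypothetical and are not study data. For k groups, H is conventionally compared against a chi-square distribution with k − 1 degrees of freedom to obtain a p-value.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size; ranks follow pooled order.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie run.
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical log e concentrations by location type (illustration only).
manhole = [5.1, 6.3, 4.8, 7.0]
pump_station = [5.5, 5.9, 6.1]
treatment_plant = [5.7, 5.8, 5.6, 5.9]
h_stat = kruskal_wallis_h([manhole, pump_station, treatment_plant])
print(f"H = {h_stat:.3f} with {3 - 1} degrees of freedom")
```

Because the statistic is rank-based, it does not rely on equal variances, which is why it serves as the fallback when Levene's test indicates heterogeneity.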

Ethics
The University of Louisville Institutional Review Board classified this project as non-human subject research (reference #: 717950).

Results
Over the study period, untransformed triplicate data (i.e., viral gene copies per ml wastewater) for SARS-CoV-2 ranged from 8 to 22 707 copies per ml, PMMoV ranged from 10² to 10⁸ copies per ml, and CrAssphage ranged from 10³ to 10⁸ copies per ml (ESI † Table C1). In the solid samples, untransformed data for the N gene ranged from 2833 to 80 974 gc g⁻¹ dry weight and PMMoV ranged from 10⁷ to 10⁸ gc g⁻¹ dry weight (ESI † Table C2).

Time series trends in wastewater samples
Throughout the study period, the sewershed site-specific log e SARS-CoV-2 concentrations did not have the same temporal trends; seven of the 17 sites had no significant time effect, and all of these were community sites (Fig. 2). Similar trend lines were mostly observed across sites for log e SARS-CoV-2 normalized by the fecal indicators log e PMMoV and log e CrAssphage. Both the number of infected individuals within the county and the mean county-level wastewater results showed a winter season peak (ESI † Fig. C1).
Due to the significant time effect, the comparison of grab and composite sample collection was not direct, yet the grab collection method (collected in the morning) had significantly smaller log e concentrations than the composite method: log e SARS-CoV-2 p-value = 0.040; log e SARS-CoV-2 normalized by log e PMMoV p-value = 0.014; and log e SARS-CoV-2 normalized by log e CrAssphage p-value = 0.017 (ESI †  Table C3).

Concentration of SARS-CoV-2 wastewater across sewershed area
An analysis across the 17 sewershed areas indicated significant variability in the SARS-CoV-2 RNA concentration in the wastewater samples (Fig. 3). The Levene test to check variance homogeneity across sites indicated unequal variances (p-value < 0.001). Further comparison of the log e concentration distributions at different sites using the Kruskal-Wallis non-parametric [...]
A spatial correlation analysis for every pair of sites was conducted to assess similarities by date-pairing samples across the 17 sampling sites (Fig. 4; ESI † Fig. C2 and C3). Twenty-five sampling event pairs were possible. While treatment plants (MSD01 to MSD05) had similar sewage infrastructure, high pairwise correlation values (>0.7) for SARS-CoV-2 RNA (N1) were observed only between MSD02 and MSD03 and between MSD02 and MSD05. Among the community sites (MSD06 to MSD17), sites with similar sewage infrastructure did not universally have a high pairwise correlation for SARS-CoV-2 RNA (N1), and none of the geographically adjacent sites showed high pairwise correlation values (>0.7).
When SARS-CoV-2 RNA levels were grouped based on the sample location type (manhole, pump station, or treatment plant) within the sanitation system, Levene's test was used to check the homogeneity of variance, where a p-value < 0.001 implied considerable heterogeneity. A Kruskal-Wallis test was used to determine any significant differences between [...] indicators log e PMMoV and log e CrAssphage indicated no significant difference in the overall distribution across different location types; the p-values in both cases were greater than 0.1 (Fig. 5).
To examine SARS-CoV-2 RNA concentrations across the sewershed area (comparing the downstream aggregated treatment plant to the nested upstream contributing sewersheds), two treatment plants, MSD01 and MSD02, allowed for a detailed review. At MSD01 (MFWQTC), there were seven corresponding (nested) community sites, and at MSD02 (DRGWQTC), there were five corresponding community sites. In both cases, the distributions of the mean SARS-CoV-2 RNA concentrations in the community sites and in the respective treatment plants were not statistically different (Kruskal-Wallis test; p-values > 0.8). Temporally, there were high pairwise correlation values (>0.7) between the treatment plants and the mean of the contributing community sites (the correlation for treatment plant MSD01 was 0.76 with a p-value < 0.01, and that for treatment plant MSD02 was 0.80 with a p-value < 0.01) (Fig. 6; ESI † Fig. C4 and C5).
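The date-paired correlation comparisons in this section can be sketched as follows. This is an illustrative Python example only: it assumes a Pearson correlation coefficient (the text does not state which coefficient was used), and the helper names (`date_paired`, `pearson_r`), the dates, and the values are hypothetical rather than study data.

```python
import math


def date_paired(series_a, series_b):
    """Align two {date: value} series on the dates sampled at both sites."""
    shared = sorted(set(series_a) & set(series_b))
    return [series_a[d] for d in shared], [series_b[d] for d in shared]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log e N1 concentrations keyed by ISO sampling date.
plant = {"2020-11-02": 5.2, "2020-11-09": 5.9, "2020-11-16": 6.4, "2020-11-23": 6.1}
community_mean = {"2020-11-02": 5.0, "2020-11-09": 6.2, "2020-11-16": 6.3, "2020-11-30": 5.5}

x, y = date_paired(plant, community_mean)
r = pearson_r(x, y)
# 0.7 is the threshold this section uses for a "high" pairwise correlation.
print(f"r = {r:.2f}, high correlation: {r > 0.7}")
```

Pairing on shared dates first matters because sites were not all sampled on every date; correlating unaligned series would compare concentrations from different sampling events.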

Sensitivity of detected wastewater concentrations versus sewershed population size
To understand the relationship between the population number within a sewershed area and the corresponding RNA concentration, the cumulative log e SARS-CoV-2 concentration averaged over time of the measurements was plotted against population size and compared to an estimated expected value (Fig. 7). This was also replicated for log e PMMoV, and log e CrAssphage cumulative concentration.
The time effect was accounted for by grouping data based on the season: summer (August 17-September 16), fall (September 17-December 21), and winter (December 22-March 21). The catchment areas were sorted in increasing order of population size, and the mean over time cumulative l e copies per ml of wastewater SARS-CoV-2 RNA (N1) starting from the least populated site was computed in the left-most point, and iteratively added through to the most populated site in the right-most point. The mean l e copies per ml of SARS-CoV-2 RNA (N1) for a specific site can be obtained by subtracting the Y-axis value for the previous site from that of the current site. The estimation model used to generate this plot is the cumulative log concentration of SARS-CoV-2 RNA as a linear function of the log 10 population (eqn (1)).
$$\text{Cum}\ \log_e(\text{N1 copies per ml}) = \beta_0 + \beta_1 \log_{10}(\text{Population}) \qquad (1)$$
where $\beta_0$ and $\beta_1$ represent the regression coefficients corresponding to the intercept and linear terms, respectively. The mean l e copies per ml of SARS-CoV-2 RNA (N1) relative to the change in population monotonically decreased and was approximately proportional to the inverse of the population. This is also explained by eqn (1), since differentiating (1) by population gives
$$\frac{d\,\text{Cum}\ \log_e(\text{N1 copies per ml})}{d\,\text{Population}} = \frac{\beta_1}{\ln(10) \times \text{Population}}$$
For smaller populations, the cumulative concentration increases rapidly, while the increment decreases for larger population levels. These results indicate that the effect of population on wastewater SARS-CoV-2 concentration is more prominent for smaller than larger sewershed areas. Furthermore, considering temporal variability of detected concentrations versus population, when population is grouped into the intervals ≤30 000, 30 000 to 100 000, and ≥100 000 individuals contributing to a sewershed, the SARS-CoV-2 RNA (N1) and normalized concentrations were significantly different (p-value < 0.01) (ESI † Fig. C6).
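The cumulative construction and the fit of eqn (1) can be illustrated with a short Python sketch. It rests on several assumptions that are not confirmed by the text: that the x-axis value for each point is the population of the site being added, that the coefficients are estimated by ordinary least squares, and that the function names and the site values below are hypothetical placeholders, not study data.

```python
import math


def cumulative_by_population(site_means):
    """Sort (population, mean log e concentration) pairs by population and
    return the populations with the running sum of the site means."""
    pops, cumulative, running = [], [], 0.0
    for population, mean_log in sorted(site_means):
        running += mean_log
        pops.append(population)
        cumulative.append(running)
    return pops, cumulative


def site_means_from_cumulative(cumulative):
    """Recover each site's mean by subtracting the previous cumulative value."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def fit_eqn1(pops, cumulative):
    """Least-squares fit of cumulative = b0 + b1 * log10(population)."""
    x = [math.log10(p) for p in pops]
    mean_x = sum(x) / len(x)
    mean_y = sum(cumulative) / len(cumulative)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, cumulative))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def increment_per_person(b1, population):
    """Derivative of eqn (1) with respect to population: b1 / (ln(10) * P)."""
    return b1 / (math.log(10) * population)


# Hypothetical (population, mean log e N1 copies per ml) pairs.
sites = [(8000, 5.4), (25000, 5.1), (60000, 5.6), (150000, 5.3), (350000, 5.5)]
pops, cum = cumulative_by_population(sites)
b0, b1 = fit_eqn1(pops, cum)
for p in (pops[0], pops[-1]):
    print(p, increment_per_person(b1, p))
```

Under this model the increment per additional person shrinks in proportion to 1/Population, which is the dampening at large sewersheds described above.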

Presence of SARS-CoV-2 in date paired solid and wastewater samples
Sewage solids and influent wastewater results from treatment plants were paired by date (Fig. 8). When increasing SARS-CoV-2 RNA (N1) was observed in influent wastewater, a similar increasing trend was not necessarily observed in solid (N gene) samples at the same treatment center. This was the case for both aerobic and anaerobic solid samples. At MSD01, where different treatment stages could be sampled, solids would turn over every couple of days on a low-flow day and in about 12 h on a high-flow (wet weather) day; still, none of the different treatment stages within the same facility showed trends similar to the 24 h influent wastewater results. Furthermore, at MSD01 (industrial input of approximately 10%; a combined sewer system), the low log e SARS-CoV-2 RNA (N1) normalized by log e PMMoV in influent wastewater was from December 2, a low concentration trend that was not observed in solids despite the short turnover time.
The results indicate that sewage solids and influent wastewater media should not be intermingled when comparing daily concentration variations.

Sample temperature at time of collection
At the time of collection, the log e SARS-CoV-2 RNA (N1) ( p-value < 0.001), log e PMMoV ( p-value = 0.008), and log e CrAssphage ( p-value = 0.014) varied significantly with sample temperature (ESI † Table C7). The log e SARS-CoV-2 RNA (N1) concentration decreased with an increase in sample collection temperature. As composite samples were stored in an ice bath, these samples were more likely to be at a cooler temperature than grab samples at the time of collection. Composite samples (N = 630) had a mean temperature of 38.1°F, whereas grab samples (N = 23) had a mean temperature of 46.2°F. Sewer infrastructure, especially depth from the ground surface and flow rates, as a function of distinct urban sewersheds, may account for some of the range in differences in sample temperature at the time of collection.

Residences without sewer connections, by sewershed
The results indicate 9947 residential properties were without sewer connections in the county (ESI † Table D1). Although some properties contained more than one household, a majority (73%) were classified as single-family residences. Because the county has 316 174 household residences, those without sewer connections represent only approximately 3% of the total residences and were therefore excluded from sampling activities. About half of these properties without sewer connections were within either of the two largest water quality treatment plant service areas within the urban center: MSD01 and MSD02. There is also a geographic gap in MSD service coverage over the rural southeast portion of the county, but the gap does not contain a high number of residences. Residences without sewer connections were not captured in other ways in the sewer system, as MSD does not currently accept hauled waste at any treatment center, although a few permitted industrial users collect from septic tanks, portable toilets, and grease traps, and then perform primary treatment prior to discharge to MSD.
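As a quick check of the coverage figure, the two counts reported above can be combined directly. This treats each property without a sewer connection as a single household, which is an approximation, since some properties contained more than one household:

```latex
\[
\frac{9947}{316\,174} \approx 0.0315 \approx 3\%,
\qquad
100\% - 3\% \approx 97\% \ \text{of household residences covered.}
\]
```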

Discussion
This study fills an important gap in the literature, and these results may inform other system-wide strategies for sampling wastewater for non-SARS-CoV-2 targets across urban sewershed scales. The finding that the changes in SARS-CoV-2 concentration become less dynamic with an increasing population in the sewershed area has important implications for estimating the incidence of SARS-CoV-2 infections. This important aspect of SARS-CoV-2 concentration and population size was also found by Rusiñol et al. 25 for small treatment plants (representing <24 000 individuals) which had lower median loads of SARS-CoV-2 RNA than larger treatment plants. Likewise, our results are consistent with Haak et al. 9 showing nested individual community sites with smaller populations having more peaks and valleys compared to treatment plant concentrations and with Weidhaas et al. 12 that, in some cases, higher SARS-CoV-2 levels were found in contributing sample locations than in aggregate sample locations. Furthermore, in raw wastewater, Nagarkar et al. 11 in a study of three sewersheds (two treatment plants in which one had a sub-sewershed also sampled) noted that normalization factors for correlating wastewater and clinical COVID-19 case data may not universally apply to individual sewersheds. We also did not find a high pairwise correlation spatial trend; although Haak et al. 9 noted that, the most distant sewershed sampling sites were more poorly correlated to treatment plant influent. Beyond SARS-CoV-2 sampling strategies, Teerlink et al. 5 also observed for trace organic chemical concentrations less variable treatment plant influent concentrations associated with dispersion and mixing when compared to sewershed scales. The nonlinear relationship between the sewershed area population and the cumulative amount of SARS-CoV-2, PMMoV, and CrAssphage in the wastewater suggests that the sampling frame has a significant bearing on sensitivity. 
Specifically, small sewershed areas are highly sensitive to incremental contributions, while the largest aggregation sites are relatively less sensitive. The COVID-19 pandemic has involved cycles of infection increase and decrease, which could affect the population sensitivity curve. These cycles roughly corresponded to the seasons, with peak infection in winter. Across seasons, to the left or right of this inflection, the measures are either very sensitive, generating high variability, or too dampened, which indicates that small changes in community infection will be more difficult to discern. This pattern was repeated by the other two fecal markers, which were not likely to be affected by COVID-19 infection patterns.
Nourinejad et al. 26 have advocated for the use of in-sewer SARS-CoV-2 network sensors, and our results suggest that such sensors would be best placed across different population levels rather than in street line locations of similar populations. Additionally, Larson et al. 27 suggested that a sampling of 5 to 10 community locations might be required to iteratively find a COVID-19 "patient zero". Even though a similar number of sample locations were used, our work suggests that there is greater utility in consistently sampling 12 community sites and not moving them.
The solids and wastewater influent data showed inconsistent paired temporal trends, though additional data may provide a clearer picture. The COVID-19 pandemic has often required the triangulation of different environmental health data, and it is expected that solids and wastewater do not have similar results. Wolfe et al. 28 and Graham et al. 17 reported settled solids as a more sensitive approach than measuring SARS-CoV-2 in wastewater influent for the association with clinical case rates of COVID-19. However, in our study, solid samples could be collected only at treatment plants and therefore could not estimate SARS-CoV-2 abundance changes as a function of the urban sewer system sewershed scale from manholes and pump stations; thus, the utility of solid samples in the overall study design as a function of distinct urban sewer size is limited by the logistics of the existing sewer infrastructure.
While Rusiñol et al. 25 proposed that at low COVID-19 community rates, small treatment plants are less informative than large treatment plants, in our study, the limit of detection was not a concern even for sewersheds serving <8000 people, demonstrating the value of capturing sewershed level population variation within the sanitation system. Likewise, determining who was excluded from the study area requires knowing properties related to who is, and who is not connected to the sanitation system being sampled. The United Nations Children's Fund (UNICEF) and World Health Organization 29 report sewer connections in the United States were estimated to account for 85% of the population, with 15% using septic tanks and less than 1% using other improved sanitation facilities (including shared). Some areas of Kentucky are known to have a gap in access, with both straight pipes and failing septic tanks in many areas. 30 However, Louisville/Jefferson County has a higher proportion of sewer connections. Although some places were missed, the sewershed areas studied here offer approximately 97% household coverage, which allowed us to reliably monitor the abundance of SARS-CoV-2 RNA in a large urban area.

Limitations
Despite its many strengths, this study has some limitations. Due to the difficulty in selecting community sewershed areas that had perfectly isolated boundaries, there was some overlap between the above-ground geographic borders. The demonstration and duration of shedding by COVID-19 infected persons is highly individual and can continue even after testing negative for SARS-CoV-2 in respiratory samples. 1,2 In theory, RNA measured in wastewater samples is primarily a subset of infected individuals at an unidentified temporal range of their peak fecal shedding period. The scope of this study was the SARS-CoV-2 RNA abundance in wastewater as a function of distinct urban sewersheds, and future research is required to compare the

Conclusion
The results of this study show that the scale of the urban sewershed selected for sampling matters: data resolution is lost when sampling only a single sewershed size. Our results indicate that the sewersheds differ in SARS-CoV-2 trends; however, high pairwise correlation spatial trends were not observed, and the mean SARS-CoV-2 RNA concentrations of smaller upstream community sewershed areas did not differ from those of their respective treatment centers. Solid samples could only be collected at treatment plants, so we could not evaluate SARS-CoV-2 abundance in solids as a function of the sewershed scale. The population size sensitivity of SARS-CoV-2 concentration detection is non-linear: at low population levels the measures are too sensitive and generate a high level of variability, whereas at high population levels the estimates are dampened, making small changes in community infection levels more difficult to discern. Our results suggest that sampling an equal range of populations over time is most likely to provide robust estimates of changes in prevalence. The results also suggest that there is limited benefit in oversampling small populations. This study and its findings may inform other system-wide strategies for sampling wastewater for estimating non-SARS-CoV-2 targets.

Funding
This work was supported by a contract from the Louisville-Jefferson County Metro Government as a component of the Coronavirus Aid, Relief, and Economic Security Act, as well as grants from the James Graham Brown Foundation and the Owsley Brown II Family Foundation. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.