Open Access Article
Chaojie Li (a), Tamar Kohn (a), Shotaro Torii (a), Htet Kyi Wynn (a), Alexander J. Devaux (b), Charles Gan (b), Timothy R. Julian (b, c, d) and Émile Sylvestre* (e, f)
(a) Laboratory of Environmental Virology, School of Architecture, Civil & Environmental Engineering (ENAC), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
(b) Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
(c) Swiss Tropical and Public Health Institute, Allschwil, Switzerland
(d) University of Basel, Basel, Switzerland
(e) Department of Water Management, Delft University of Technology, Stevinweg 1, Delft, 2628 CN, The Netherlands. E-mail: E.Sylvestre@tudelft.nl
(f) KWR Watercycle Research Institute, P.O. Box 1072, 3430 BB Nieuwegein, Netherlands
First published on 7th October 2025
As more data on virus concentrations in influent water from wastewater treatment plants (WWTPs) become available, establishing best practices for virus measurements, monitoring, and statistical modelling can improve the understanding of virus concentration distributions in wastewater. To support this, we assessed the temporal variability of norovirus, adenovirus, enterovirus, and rotavirus concentrations in influent water across multiple WWTPs in Switzerland, the USA, and Japan. Our findings demonstrate that the lognormal distribution accurately describes temporal variations in concentrations for all viruses at all sites, outperforming the gamma and Weibull distributions, which fail to capture high variability. However, notable differences in variability and uncertainty were observed across systems, underscoring the need for site-specific assessments. Using lognormal parameters, we identified optimal monitoring frequencies that balance cost-effectiveness and precision. For most sites, weekly monitoring was sufficient to estimate the annual average concentration of enteric viruses with a 95% confidence interval no wider than 0.5 log10. We further examined the mechanistic basis of the lognormal distribution, highlighting processes that drive its prevalence and shape the behavior of its upper tail. By integrating these insights, this study provides a statistical foundation for optimizing virus monitoring frameworks and informing public health interventions targeting wastewater systems.
Water impact
This study shows that virus concentrations in wastewater follow predictable patterns, helping explain why they change over time. The proposed approach enables affordable, reliable monitoring and accurate health risk estimates. Its use can support site-specific strategies to track and manage viruses in wastewater, advancing public health protection through improved risk assessment and wastewater-based epidemiology.
log10.2,3 This reduction may not always be sufficient to manage infection risks for people exposed to treated wastewater.4 Quantitative microbial risk assessment (QMRA) can quantify infection risks to support the design and implementation of risk reduction strategies, but accurate risk estimates require reliable virus concentration data. Establishing best practices for virus measurements, monitoring, and statistical modelling can improve understanding of virus concentration distributions and strengthen the application of QMRA in wastewater management.
Quantitative PCR (qPCR) and digital PCR (dPCR) methods are increasingly used to monitor viruses in wastewater because of their sensitivity and specificity. These molecular methods nevertheless introduce uncertainties in the accuracy of quantitative results.5 Factors such as adsorption to particles, PCR inhibition by organic matter, and losses during sample concentration and nucleic acid extraction can affect the reliability of measurements.6 These uncertainties can be accounted for by correcting measurements for the recovery rate of spiked surrogate viruses, such as the enveloped murine hepatitis virus (MHV)7 or the non-enveloped bacteriophages MS2 and PhiX174.8 However, even though spiked viruses are routinely used as process controls, these data are rarely used to correct concentration estimates for enteric viruses.5
The value of virus concentration data sets also depends on the frequency of sample collection. Monitoring intervals, which range from daily to monthly,9 should be guided by monitoring objectives, such as characterizing temporal variability in virus concentrations over a year or detecting short-term peaks indicative of viral outbreaks. High-frequency monitoring can be impractical in many contexts because of the substantial costs and expertise required for virus analyses.10 Cost-effective monitoring strategies and statistical models that balance precision and practicality are therefore essential to improve the utility of virus monitoring programs.
Previous studies have combined enteric virus concentration data from multiple WWTPs into pooled datasets to derive aggregate estimates for use in QMRA.11–13 Because each observation is given equal weight, these pooling approaches do not model between-WWTP heterogeneity and can therefore mask important site-specific differences driven by local shedding patterns, population size, sampling design, and analytical methods. Statistical approaches have been developed to select the most suitable parametric distributions for site-specific temporal variations of E. coli and protozoan pathogens.14–16 These approaches have not been systematically adapted for enteric viruses in wastewater, where distinct challenges, such as higher variability and the influence of virus-specific shedding patterns, must be addressed. By applying these methods and tailoring them to enteric viruses, this study aims to fill this gap and provide a stronger statistical foundation for monitoring and assessing exposure to enteric viruses in wastewater.
In this study, we investigated temporal variations in enteric virus concentrations in influent wastewater to advance monitoring strategies. Using original data sets from two municipal WWTPs in Switzerland, as well as literature data reported for five WWTPs in the USA1 and one in Japan,17 we characterized temporal variability for multiple virus types in diverse geographic contexts. We proposed candidate parametric distributions to model the data and applied information criteria to evaluate their suitability. Additionally, we assessed the impact of incorporating sample-specific analytical recovery rates and varying monitoring frequencies on model accuracy.
| Site location | Population served by WWTP | Virus types | Sample type | Monitoring frequency & duration | Quantification method | Study |
|---|---|---|---|---|---|---|
| Lausanne, Switzerland (STEP Vidy) | 220 000 | Adenovirus, enterovirus, norovirus, rotavirus | 24-hour composite | Monthly/1 year | (RT-)qPCR | Original data |
| Zurich, Switzerland (ARA Werdhölzli) | 471 000 | Enterovirus, norovirus | 24-hour composite | Every 2 days/1 year | RT-dPCR | Original data |
| Matsushima, Japan | 9600 | Norovirus | Grab | Weekly/3 years | RT-qPCR | Kazama et al. (2017)17 |
| California, USA (5 sites) | 300 000–4 000 000 | Adenovirus, enterovirus, norovirus | Grab | Every 2 to 3 weeks/1 year | (RT-)qPCR | Pecson et al. (2022)8 |
Only data quantified by molecular methods—(RT-)qPCR or RT-dPCR—with documented process controls and recoveries were included to maintain methodological consistency. All data sets also fulfilled the following criteria: (i) ≥12 months of routine sampling, (ii) quantification of at least one target enteric virus, and (iii) sufficient sample size (≥10 positive detections) for parametric modelling. The viruses measured included adenovirus, enterovirus, norovirus, and rotavirus at Lausanne; enterovirus and norovirus at Zurich; norovirus at Matsushima; and adenovirus, enterovirus, and norovirus at the five WWTPs in California, USA. Certain studies involved extended monitoring campaigns (e.g., the multi-year, weekly sampling in Matsushima), whereas others had higher-frequency monitoring (e.g., the every-two-day sampling at ARA Werdhölzli). Analyzing data under these different monitoring regimes allowed us to examine whether the modelling framework proposed in this work holds across both short-term, high-frequency datasets and longer-term monitoring programs.
For the WWTP in Zurich, 180 influent samples (24-hour composites) were collected between February 1, 2021, and January 29, 2022. After collection, the samples were shipped on ice and stored at 4 °C for up to eight days before processing. Samples were processed with either protocol 1 (ultrafiltration followed by RNA extraction with the QIAamp Viral RNA Mini kit; samples collected before November 10, 2021)19 or protocol 2 (total nucleic acid extraction; samples collected after November 30, 2021).20 Samples collected between November 10 and November 30, 2021, were processed with both protocols to establish a conversion factor for inter-method differences (see SI, Fig. S3). Measurements from protocol 1 were then multiplied by this factor: 1.17 for enterovirus (EV) and 2.65 for norovirus genogroup II (NoV GII). The extracted RNA was stored at −80 °C for up to 1.5 years before dPCR measurement.
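Applying the inter-protocol correction amounts to a per-target multiplication. The Python sketch below assumes measurements are handled as individual (target, protocol, concentration) values; the helper name `harmonize` is hypothetical, while the factors 1.17 (EV) and 2.65 (NoV GII) are those reported above.

```python
import math

# Multiplicative factors reported for protocol 1 (ultrafiltration + QIAamp
# Viral RNA Mini kit), used to express results on the protocol-2 scale.
PROTOCOL1_FACTORS = {"EV": 1.17, "NoV GII": 2.65}


def harmonize(target: str, protocol: int, concentration: float) -> float:
    """Return a concentration on the protocol-2 scale (hypothetical helper)."""
    if protocol == 2:
        return concentration  # reference protocol: unchanged
    if protocol == 1:
        return concentration * PROTOCOL1_FACTORS[target]
    raise ValueError(f"unknown protocol: {protocol}")


if __name__ == "__main__":
    for target, factor in PROTOCOL1_FACTORS.items():
        # A constant multiplicative factor is a constant shift on the log10 scale.
        print(f"{target}: +{math.log10(factor):.2f} log10")
```

Running it prints shifts of about +0.07 log10 for EV and +0.42 log10 for NoV GII, so the harmonization mainly affects the norovirus series.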
A duplex RT-dPCR assay for human enterovirus (EV) and norovirus genogroup II (NoV GII) RNA quantification was optimized by adapting the thermal cycling conditions and primer and probe concentrations of previously described RT-qPCR assays.21–24 Ten-fold diluted extracts were used as RNA templates. The assay was performed in 12 μL reaction mixtures using the QIAcuity OneStep Advanced Probe kit (QIAGEN) and 8.5k 96-well Nanoplates (QIAGEN). The duplex RT-dPCR mixture contained 3 μL of 4× OneStep Advanced Probe Master Mix, 0.12 μL of OneStep RT Mix, 1.5 μL of GC Enhancer, 1000 nM forward primer (EV: 5′-CCTCCGGCCCCTGAATG-3′, NoV GII: 5′-ATGTTCAGRTGGATGAGRTTCTCWGA-3′), 1000 nM reverse primer (EV: 5′-ACCGGATGGCCAATCCAA-3′, NoV GII: 5′-TCGACGCCATCTTCATTCACA-3′), 500 nM probe (EV: 5′-HEX-CGGAACCGACTACTTTGGGTGTCCGT-BHQ1-3′, NoV GII: 5′-FAM-AGCACGTGGGAGGGCGATCG-TAMRA-3′), 3 or 4 μL of template RNA, and DNase/RNase-free water. All RT-dPCR reactions were performed in duplicate. The nanoplate was loaded onto the QIAcuity One, 2-plex Device (QIAGEN). Thermal cycling consisted of an RT step at 50 °C for 60 min, enzyme activation at 95 °C for 5 min, and 45 cycles of denaturation (95 °C for 15 s) and annealing/extension (60 °C for 60 s). Each run included a negative control (no template) and a positive control (a synthetic DNA gBlock® containing the target sequence; Integrated DNA Technologies, Coralville, IA, USA). Quantities were expressed as genome copies per microliter of reaction (GC μL−1). The RT-dPCR assays were analyzed using automatic threshold and baseline settings. Samples showing a >5-fold difference between duplicate reactions were excluded from further analysis (3 of 180 samples). Further quality control data for the duplex RT-dPCR assay are given in the SI. The completed dMIQE (digital Minimum Information for Publication of Quantitative Digital PCR Experiments) checklist for this RT-dPCR assay is available in Huisman, Scire.19
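The duplicate-consistency rule can be written as a small screening step. This is a hedged Python sketch: the 5-fold threshold comes from the text, but the treatment of duplicates with zero copies and the use of the duplicate mean for retained samples are assumptions, and `screen_duplicates` is a hypothetical helper, not part of the instrument software.

```python
from typing import Dict, List, Tuple

MAX_FOLD = 5.0  # duplicates differing by more than 5-fold are excluded


def fold_difference(r1: float, r2: float) -> float:
    """Ratio of the larger to the smaller duplicate result (always >= 1)."""
    low, high = sorted((r1, r2))
    if high == 0.0:
        return 1.0  # both duplicates zero: treated as concordant (assumption)
    if low <= 0.0:
        return float("inf")  # one zero, one positive: treated as discordant (assumption)
    return high / low


def screen_duplicates(
    duplicates: Dict[str, Tuple[float, float]]
) -> Tuple[Dict[str, float], List[str]]:
    """Split samples into kept (mean of duplicates) and excluded sample IDs."""
    kept: Dict[str, float] = {}
    excluded: List[str] = []
    for sample_id, (r1, r2) in duplicates.items():
        if fold_difference(r1, r2) > MAX_FOLD:
            excluded.append(sample_id)
        else:
            kept[sample_id] = (r1 + r2) / 2.0  # averaging is an assumption
    return kept, excluded
```

Note that a pair differing by exactly 5-fold is retained, since the stated rule excludes only differences greater than 5-fold.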
To assess recovery efficiency in 91 wastewater samples from the Zurich WWTP, murine hepatitis virus (MHV) was spiked into replicate 50 ml aliquots of wastewater at approximately 1 × 10⁶ GC per 50 ml. The preparation of MHV stock solutions is described in Fernandez-Cassi, Scheidegger.25 MHV was quantified with the primers and probes described in Fernandez-Cassi, Scheidegger,25 using a protocol adapted for RT-dPCR on a Naica System (Stilla Technologies, Villejuif, France) with the qScript XLT 1-Step RT-PCR kit (QuantaBio, Beverly, Massachusetts, United States).
In Pecson, Darby,8 grab samples (4 L) of raw wastewater were collected at each site every two to three weeks between December 2019 and January 2021. Commercial laboratories analyzed the samples using standardized methods. Upon receipt, samples were refrigerated at 4 °C and processed within 72 hours of collection. Norovirus (genogroups I and II) and enterovirus were quantified using RT-qPCR, and adenovirus was quantified using qPCR. Nucleic acid extraction was performed with the Zymo Quick-DNA/RNA Viral kit, and assays were run in triplicate, with the average gene copies reported. To evaluate analytical recovery rates, MS2 and PhiX174 were spiked into 1000 ml of the sample at approximately 10⁸ plaque-forming units (PFU). A minimum recovery efficiency threshold of 1% was set for all matrix spikes.
In Kazama, Miura,17 grab samples (1 L) of raw wastewater were collected weekly from a single WWTP between 2013 and 2016. Samples were transported to the laboratory on ice and stored in a deep freezer on the day of collection. Norovirus genogroups I and II were quantified by RNA extraction with the QIAamp Viral RNA Mini kit followed by RT-qPCR. Murine norovirus (MNV) was spiked into each sample as a whole-process control and quantified by qPCR. A minimum recovery efficiency threshold of 1% was applied to all matrix spikes.
For all data sets, concentrations of norovirus genogroups I and II were summed to analyze and compare the overall distribution of norovirus in wastewater across locations.
Two mixed Poisson distributions, the Poisson lognormal distribution (PLN) and the Poisson gamma distribution (PGA), previously used for QMRA,15,16 were used to model concentration variability using virus count and processed water volume data. When only virus concentrations were reported, virus counts were back-calculated using reported concentrations and processed water volume data. In this framework, the Poisson distribution accounts for the uncertainty associated with the spatial (random) dispersion of viruses in the water sample, and the continuous distribution (gamma or lognormal) characterizes temporal variation in virus concentrations.16
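The two-layer structure of the mixed Poisson framework, and the back-calculation of counts from reported concentrations, can be sketched in Python. This is an illustrative simulation under assumed parameters, not the authors' fitting code: each draw first samples a lognormal concentration and then a Poisson count for the processed volume. All function names are hypothetical, and because only the standard library is used, the Poisson sampler is written out explicitly.

```python
import math
import random
from statistics import fmean


def back_calculate_count(concentration: float, processed_volume: float) -> int:
    """Count implied by a reported concentration and the processed volume
    (units must match, e.g. GC per ml and ml)."""
    return round(concentration * processed_volume)


def poisson(rng: random.Random, lam: float) -> int:
    """Exact Poisson draw: Knuth's multiplication method applied to chunks of
    mean <= 30 (Poisson variables add), which avoids exp() underflow."""
    total = 0
    while lam > 0.0:
        step = min(lam, 30.0)
        lam -= step
        threshold = math.exp(-step)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulate_pln(mu: float, sigma: float, volume: float, n: int, seed: int = 1) -> list:
    """Counts from a Poisson-lognormal model: the lognormal layer is the
    day-to-day concentration, the Poisson layer the random capture of
    viruses in the processed volume."""
    rng = random.Random(seed)
    return [poisson(rng, rng.lognormvariate(mu, sigma) * volume) for _ in range(n)]


if __name__ == "__main__":
    counts = simulate_pln(mu=math.log(10.0), sigma=0.5, volume=1.0, n=20_000)
    # Expected mean count: volume * exp(mu + sigma**2 / 2), about 11.3 here.
    print(round(fmean(counts), 2))
```

Separating the layers this way makes explicit which part of the spread reflects true temporal variability (sigma) and which reflects counting noise in a finite volume.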
Virus concentrations in wastewater are expected to follow a lognormal distribution due to multiplicative processes such as shedding, decay, aggregation, and disaggregation, which, according to the central limit theorem, result in a normal distribution for the logarithm of the concentrations.26 The probability function of the Poisson lognormal distribution for a virus count x is:
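$$P(X = x \mid \mu, \sigma) = \int_0^\infty \frac{(cV)^x e^{-cV}}{x!}\,\frac{1}{c\,\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(\ln c - \mu)^2}{2\sigma^2}\right] dc \qquad (1)$$
where c is the virus concentration, V is the processed water volume, and μ and σ are the mean and standard deviation of ln c.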
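Eqn (1) has no closed form, so in practice it is evaluated numerically. A minimal Python sketch, assuming ln c ~ N(μ, σ²) and a processed volume V in units matching c; the trapezoidal quadrature on a standardized grid is an illustrative choice, not necessarily the method used by the authors.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability computed on the log scale for numerical stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def pln_pmf(x: int, mu: float, sigma: float, volume: float, n_grid: int = 801) -> float:
    """P(X = x) under the Poisson lognormal model of eqn (1).

    Substituting ln c = mu + sigma * z turns the integral over c into an
    expectation over a standard-normal z, evaluated with the trapezoidal
    rule on z in [-8, 8]."""
    z_lo, z_hi = -8.0, 8.0
    h = (z_hi - z_lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = z_lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        lam = volume * math.exp(mu + sigma * z)  # expected count at concentration c
        total += weight * density * poisson_pmf(x, lam)
    return total * h


if __name__ == "__main__":
    # P(X = 0) is the model probability of a non-detect in the processed volume.
    print(pln_pmf(0, mu=0.0, sigma=1.0, volume=2.0))
```

Because `pln_pmf(0, ...)` is the probability of observing no copies in the processed volume, non-detects enter the likelihood directly instead of being replaced by an arbitrary substitute value.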
The gamma distribution, on the other hand, has a thinner upper tail than the lognormal, reflecting a more rapid decline in probability at high concentrations. This makes it a suitable candidate for modelling virus concentrations in systems where high concentrations are constrained by a physical limit, such as dilution in wastewater systems, or a biological limit, such as shedding saturation. The probability function of the Poisson gamma distribution for a virus count x is given by:
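$$P(X = x \mid k, \theta) = \int_0^\infty \frac{(cV)^x e^{-cV}}{x!}\,\frac{c^{k-1} e^{-c/\theta}}{\Gamma(k)\,\theta^{k}}\, dc \qquad (2)$$
where k and θ are the shape and scale parameters of the gamma distribution.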
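Unlike eqn (1), the Poisson gamma integral can be solved analytically. Assuming the gamma density is parameterized by shape k and scale θ, and V is the processed volume, collecting powers of c and applying the gamma-function integral gives:

```latex
\begin{aligned}
P(X = x \mid k, \theta)
  &= \frac{V^{x}}{x!\,\Gamma(k)\,\theta^{k}} \int_0^\infty c^{\,x+k-1} e^{-c\,(V + 1/\theta)}\, dc \\
  &= \frac{V^{x}}{x!\,\Gamma(k)\,\theta^{k}} \cdot \frac{\Gamma(x+k)}{(V + 1/\theta)^{x+k}} \\
  &= \frac{\Gamma(x+k)}{x!\,\Gamma(k)} \left(\frac{V\theta}{1 + V\theta}\right)^{x} \left(\frac{1}{1 + V\theta}\right)^{k}
\end{aligned}
```

This is a negative binomial distribution with size k and mean kVθ, which is why the PGA likelihood can be computed directly, whereas the PLN requires numerical integration.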
Three continuous probability distributions (lognormal, gamma, and Weibull) were also used to model temporal variations in the reported concentrations. Although these models do not account for the discrete nature of microbial counts, they offer a simpler alternative to mixed Poisson models. We assessed whether these continuous models could adequately approximate the distributions generated by the more complex mixed Poisson distributions. The probability density function of the Weibull distribution for a reported virus concentration c is:
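$$f(c \mid \beta, \lambda) = \frac{\beta}{\lambda}\left(\frac{c}{\lambda}\right)^{\beta-1}\exp\!\left[-\left(\frac{c}{\lambda}\right)^{\beta}\right] \qquad (3)$$
where β and λ are the shape and scale parameters of the Weibull distribution.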
Furthermore, we quantified how the monitoring frequency affects the uncertainty of the arithmetic mean virus concentration predicted by the PLN model. Following Olsson,36 the 100(1 − α)% confidence interval (CI) of the arithmetic mean of a lognormally distributed variable is approximated by:
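$$[L,\,U] = \exp\!\left(\bar{Y} + \frac{s_Y^2}{2} \pm z_{1-\alpha/2}\sqrt{\frac{s_Y^2}{n} + \frac{s_Y^4}{2(n-1)}}\right) \qquad (4)$$
where Y = ln C is the natural logarithm of the concentration, Ȳ and s²_Y are the sample mean and variance of Y, n is the sample size, and z_{1−α/2} is the standard-normal quantile (1.96 for a 95% CI). Eqn (4) makes explicit the dependence of CI width on the sample size n and the log-scale variance. In our case, we approximate s²_Y using the model-estimated variance parameter σ² of the fitted lognormal distributions. To facilitate comparison of the uncertainty across datasets, we expressed the 95% CI width on a log10 scale:
$$\mathrm{Width}_{95} = \log_{10} U - \log_{10} L \qquad (5)$$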
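Because the log-mean cancels in eqn (5), the CI width depends only on σ and n, which makes it easy to tabulate. A minimal Python sketch of eqns (4) and (5), assuming σ is plugged in for s_Y as described above; the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def olsson_ci(y_mean: float, s2: float, n: int, z: float = Z_95) -> tuple:
    """(L, U) for the arithmetic mean of a lognormal variable, eqn (4).

    y_mean and s2 are the mean and variance of Y = ln C; n is the sample size."""
    half = z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    centre = y_mean + s2 / 2
    return math.exp(centre - half), math.exp(centre + half)


def width95_log10(sigma: float, n: int, z: float = Z_95) -> float:
    """Eqn (5): log10(U) - log10(L) = 2 z sqrt(...) / ln(10).

    The log-mean cancels, so the width depends only on sigma and n."""
    s2 = sigma ** 2
    return 2 * z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1))) / math.log(10)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 1.5, 2.5):
        print(sigma, [round(width95_log10(sigma, n), 2) for n in (12, 26, 52, 180)])
```

With σ = 1.5, for example, this sketch gives a width just under 1.0 log10 at n = 15 to 16 samples.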
Table 2 presents the arithmetic mean concentration and coefficient of variation (CV) for each virus type across Switzerland, Japan, and the USA. Arithmetic mean norovirus concentrations in Japan and Switzerland (approx. 1.0 × 10⁴ GC ml−1) are roughly 10 to 100 times higher than at most USA sites. CVs below 1.0 at the Swiss sites indicate relatively stable concentrations, whereas CVs of 2.5 or more in Japan (Matsushima) and at two USA sites (OCSD and SFPUC) indicate much greater variability. Adenovirus shows high CVs at all USA sites (1.8 to 2.1), indicating substantially larger fluctuations than at Lausanne (CV of 1.0).
| Virus | Location of wastewater treatment plant | Sample size | Sample mean concentration (GC ml−1) | Coefficient of variation | Lognormal parameter (σ) |
|---|---|---|---|---|---|
| Adenovirus | Lausanne, Switzerland | 12 | 5.6 × 10² | 1.0 | 1.5 |
| Adenovirus | California, USA – LACSD | 14 | 5.1 × 10² | 1.9 | 2.2 |
| Adenovirus | California, USA – LASAN | 14 | 6.7 × 10² | 1.8 | 1.9 |
| Adenovirus | California, USA – OCSD | 11 | 2.1 × 10³ | 1.8 | 2.4 |
| Adenovirus | California, USA – SD | 11 | 9.5 × 10² | 2.1 | 2.2 |
| Adenovirus | California, USA – SFPUC | 13 | 1.6 × 10³ | 2.1 | 2.6 |
| Enterovirus | Lausanne, Switzerland | 12 | 4.8 × 10² | 1.1 | 1.4 |
| Enterovirus | Zurich, Switzerland | 180 | 1.5 × 10³ | 0.8 | 0.8 |
| Enterovirus | California, USA – LACSD | 10 | 1.8 × 10² | 0.6 | 1.5 |
| Enterovirus | California, USA – LASAN | 17 | 2.1 × 10² | 1.0 | 0.6 |
| Enterovirus | California, USA – OCSD | 13 | 5.1 × 10² | 2.6 | 1.6 |
| Enterovirus | California, USA – SD | 14 | 2.2 × 10² | 2.0 | 1.2 |
| Enterovirus | California, USA – SFPUC | 17 | 3.5 × 10² | 0.9 | 1.2 |
| Norovirus | Lausanne, Switzerland | 12 | 1.0 × 10⁴ | 0.3 | 0.4 |
| Norovirus | Zurich, Switzerland | 180 | 1.0 × 10⁴ | 0.8 | 0.9 |
| Norovirus | Matsushima, Japan | 131 | 1.8 × 10⁴ | 2.5 | 2.1 |
| Norovirus | California, USA – LACSD | 8 | 2.9 × 10² | 1.5 | 1.6 |
| Norovirus | California, USA – LASAN | 14 | 1.1 × 10² | 1.3 | 1.0 |
| Norovirus | California, USA – OCSD | 11 | 2.0 × 10³ | 3.1 | 2.4 |
| Norovirus | California, USA – SD | 15 | 3.7 × 10² | 1.6 | 2.0 |
| Norovirus | California, USA – SFPUC | 13 | 3.4 × 10² | 2.5 | 1.7 |
| Rotavirus | Lausanne, Switzerland | 12 | 9.7 × 10³ | 1.6 | 2.2 |
In panel A, the PLN and PGA distributions are compared across norovirus datasets. The PLN distribution closely fits concentrations at all locations. In contrast, the PGA distribution underestimates peak concentrations in the Japan (Matsushima) and USA (OCSD) datasets, although it performs well for the Lausanne and Zurich datasets. The USA (OCSD) dataset highlights a limitation of the PGA when an extreme outlier is present: here, the maximum measured concentration is 1.8 log10 higher than the second-highest value. The fitting process forces the gamma distribution to adjust its parameters to accommodate this extreme value. However, because the gamma has a thinner tail than the lognormal, this adjustment distorts the overall fit, causing deviations in how the model represents lower concentrations.
Panel B shows the CCDFs of three continuous distributions (lognormal, gamma, and Weibull) fitted to the reported concentrations. As with the mixed Poisson distributions, the lognormal distribution generally fits well across all datasets, whereas the gamma and Weibull distributions underestimate peak concentrations in Japan and the USA.
These visual comparisons are supported by information criteria (Tables S1 and S2): the PLN and lognormal distributions yield lower DICm and AIC values, respectively, than the PGA, gamma, and Weibull distributions, indicating better overall performance.
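A comparison of continuous fits by AIC can be reproduced from maximum-likelihood estimates for each candidate. The Python sketch below is illustrative and is not the authors' code (dedicated statistical packages would normally be used). To stay within the standard library, it implements the gamma MLE by Newton's method with series approximations of the digamma and trigamma functions, and the Weibull shape MLE by bisection; the synthetic data in the usage block are an assumption.

```python
import math
import random
from statistics import fmean


def _digamma(x: float) -> float:
    """Digamma: recurrence up to x >= 6, then the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))


def _trigamma(x: float) -> float:
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return acc + 1.0 / x + f / 2 + f / x * (1 / 6 - f * (1 / 30 - f * (1 / 42 - f / 30)))


def loglik_lognormal(xs):
    logs = [math.log(x) for x in xs]
    mu = fmean(logs)
    s2 = fmean([(v - mu) ** 2 for v in logs])  # MLE variance (divides by n)
    return -sum(logs) - 0.5 * len(xs) * (math.log(2 * math.pi * s2) + 1)


def loglik_gamma(xs):
    mean, mean_log = fmean(xs), fmean([math.log(x) for x in xs])
    s = math.log(mean) - mean_log  # > 0 unless all values are equal
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # Minka's starting value
    for _ in range(100):  # Newton iterations on ln k - digamma(k) = s
        step = (math.log(k) - _digamma(k) - s) / (1 / k - _trigamma(k))
        k = k - step if k - step > 0 else k / 2
        if abs(step) < 1e-12 * k:
            break
    theta = mean / k  # scale MLE given the shape
    return len(xs) * ((k - 1) * mean_log - k - k * math.log(theta) - math.lgamma(k))


def loglik_weibull(xs):
    logs = [math.log(x) for x in xs]
    top = max(logs)

    def score(k):  # increasing in k; its root is the shape MLE
        w = [math.exp(k * (v - top)) for v in logs]  # x**k rescaled to avoid overflow
        return sum(wi * v for wi, v in zip(w, logs)) / sum(w) - 1 / k - fmean(logs)

    lo, hi = 0.02, 50.0
    for _ in range(100):  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if score(mid) > 0 else (mid, hi)
    k = 0.5 * (lo + hi)
    # scale**k = mean(x**k), computed on the log scale
    log_scale = top + math.log(fmean([math.exp(k * (v - top)) for v in logs])) / k
    return sum(math.log(k) - k * log_scale + (k - 1) * v - math.exp(k * (v - log_scale))
               for v in logs)


def aic_table(xs):
    """AIC = 2 * (number of parameters) - 2 * max log-likelihood; each model has 2."""
    fits = {"lognormal": loglik_lognormal, "gamma": loglik_gamma, "weibull": loglik_weibull}
    return {name: 4 - 2 * fit(xs) for name, fit in fits.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    synthetic = [rng.lognormvariate(math.log(500.0), 2.0) for _ in range(150)]
    for name, aic in sorted(aic_table(synthetic).items(), key=lambda kv: kv[1]):
        print(f"{name:9s} AIC = {aic:8.1f}")
```

Since all three candidates have two parameters, the AIC ranking here is equivalent to ranking by maximized log-likelihood; the penalty term matters only when comparing models of different complexity.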
The impact of recovery correction on concentration distributions can depend on the distribution of the recovery data. For LACSD, where the recovery rate distribution is symmetric, with a mode of 50% and a mean of 48%, applying the recovery correction results in a horizontal shift in the CCDF, increasing concentrations without changing variability. The SFPUC and LASAN data sets exhibit right-skewed recovery rate distributions, with modes around 20% and means around 40%. For SFPUC, recovery adjustment also leads to a horizontal shift in the upper tail of the distribution without affecting variability and uncertainty. For LASAN, however, the recovery correction does not produce a horizontal shift in the upper tail; instead, it reduces variability and uncertainty by compensating for previously underestimated low concentrations.
For the Zurich data set, recovery rates estimated using MHV process control data (Fig. 4) had a mean of 6%, ranging from 2% to 20%. These low recovery rates, when used to adjust enteric virus concentrations, increased the estimated concentration by up to two orders of magnitude. Applying the MHV recovery correction also results in a horizontal shift in the CCDF, increasing concentrations without changing variability.
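Recovery correction divides each measurement by the fractional recovery of the spiked process control, so on a log scale it is an additive shift of −log10(recovery). A minimal Python sketch, assuming recoveries are expressed as fractions in (0, 1]; the 1% acceptance threshold is the one stated for the Pecson and Kazama data sets, and the helper names are hypothetical.

```python
import math

MIN_RECOVERY = 0.01  # 1% acceptance threshold (stated for the Pecson and Kazama data sets)


def recovery_corrected(concentration: float, recovery: float) -> float:
    """Divide a measured concentration by the fractional recovery of a spiked
    process control (e.g., MHV, MS2, or PhiX174).

    Recoveries are assumed to lie in (0, 1]; values under MIN_RECOVERY are
    rejected, mirroring the acceptance threshold described in the text."""
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must be a fraction in (0, 1]")
    if recovery < MIN_RECOVERY:
        raise ValueError("recovery below the acceptance threshold")
    return concentration / recovery


def log10_shift(recovery: float) -> float:
    """Upward shift of a measurement on the log10 scale after correction."""
    return -math.log10(recovery)


if __name__ == "__main__":
    for r in (0.02, 0.06, 0.20):  # minimum, mean, and maximum MHV recovery at Zurich
        print(f"recovery {r:.0%}: +{log10_shift(r):.1f} log10")
```

For the Zurich MHV recoveries quoted above (2%, 6%, and 20%), the script prints shifts of about +1.7, +1.2, and +0.7 log10. A constant recovery shifts the CCDF horizontally, whereas recoveries that vary from sample to sample can also change its spread, as observed for LASAN.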
Fig. 5 Complementary cumulative distribution functions (CCDFs) of sub-sampled (top) enterovirus and (bottom) norovirus measurement data from the ARA Werdhölzli WWTP in Zurich, Switzerland.
Fig. 6 illustrates the relationship between sample size (i.e., the number of samples collected at a given monitoring frequency over a set duration) and the uncertainty of the arithmetic mean, expressed as the width of the 95% CI on a log10 scale, for different values of the lognormal standard deviation σ. At σ = 1.5, approximately 16 samples are required to reduce the 95% CI width of the arithmetic mean below 1.0 log10. For the Zurich WWTP dataset, where σ is 0.97 and the sample size is 180, the 95% CI width of the mean is well below 0.3 log10. The values of σ for other WWTPs range from 0.7 to 2.6 (Table 2), reflecting varying levels of concentration variability across sites. WWTPs with higher σ values therefore require more frequent sampling to achieve a precise mean estimate, whereas sites with lower σ values need fewer samples for the same precision, allowing less frequent sampling without compromising the reliability of the mean estimate.
Our findings demonstrate that the lognormal distribution is well suited to modelling the variability of enteric virus concentrations across different locations and virus types. Posterior predictive checks and model comparison with deviance information criteria supported this selection. In contrast, gamma and Weibull distributions often fail to capture the high variability observed, especially at CV values above 2.5. This limitation is expected because these distributions lack the skewness needed to model extreme variability: their skewness increases much more slowly with CV than that of the lognormal distribution.40 The absence of clear multimodality in the empirical distributions suggests that seasonality may be adequately represented by a single lognormal distribution fitted to annual data. However, most of our data sets spanned only one year, which limits the ability to assess seasonality. Longer-term, higher-resolution datasets are needed to confirm this and to assess whether seasonal analyses could improve model performance.
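The tail argument can be made concrete by expressing each distribution's skewness as a function of its CV. The Python sketch below uses standard closed-form moment expressions; for the Weibull, the shape matching a given CV is found numerically. The function names are hypothetical.

```python
import math


def skew_lognormal(cv: float) -> float:
    """Lognormal: skewness = (CV**2 + 3) * CV."""
    return (cv ** 2 + 3.0) * cv


def skew_gamma(cv: float) -> float:
    """Gamma: skewness = 2 / sqrt(shape) = 2 * CV."""
    return 2.0 * cv


def _weibull_cv(k: float) -> float:
    g1, g2 = math.gamma(1 + 1 / k), math.gamma(1 + 2 / k)
    return math.sqrt(g2 - g1 ** 2) / g1


def skew_weibull(cv: float) -> float:
    """Weibull skewness at the shape k that reproduces `cv`.

    The Weibull CV depends only on k and decreases as k grows, so k is found
    by bisection on [0.05, 20] (valid for roughly 0.06 < CV < 1e5)."""
    lo, hi = 0.05, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _weibull_cv(mid) > cv:
            lo = mid  # CV still too large: increase the shape
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    g1, g2, g3 = (math.gamma(1 + i / k) for i in (1, 2, 3))
    var = g2 - g1 ** 2
    return (g3 - 3 * g1 * var - g1 ** 3) / var ** 1.5


if __name__ == "__main__":
    for cv in (0.5, 1.0, 2.5, 4.0):
        print(cv, round(skew_lognormal(cv), 1), round(skew_weibull(cv), 1), skew_gamma(cv))
```

For CV > 1, the sketch gives skewness ordered lognormal > Weibull > gamma, with the lognormal value growing roughly with CV³ and the gamma value only linearly (2 × CV).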
The prevalence of the lognormal distribution indicates that multiplicative processes shape the variability in virus concentrations and suggests the absence of strict constraints (e.g., physical or biological limits) on maximum virus concentrations under monitored conditions. The variability likely arises from the way enteric viruses are shed by infected people within communities. Enteric virus densities in the stools of infected individuals vary widely, for example from 10⁵ to 10¹¹ genome copies per gram for norovirus.41,42 When aggregated, these highly variable virus loads result in wastewater concentrations that follow lognormal distributions. The upper tail of the distribution may be shaped by rare individuals or clusters shedding exceptionally high virus loads. Localized outbreaks may further affect the upper tail by causing sudden increases in virus shedding in communities. Peak concentrations may be amplified in smaller WWTPs with lower dilution capacities. In our study, norovirus peaks were higher at the WWTP in Matsushima, Japan (serving ∼9600 people), than at the ARA Werdhölzli WWTP in Zurich, Switzerland (serving ∼471 000 people). Not all microorganisms in wastewater follow a lognormal concentration distribution, however. For instance, concentrations of Escherichia coli in influent wastewater from Swiss WWTPs follow a gamma distribution.43 This distinction is likely due to the lower variability of E. coli densities in stools, which typically range from 10⁶ to 10⁹ colony-forming units (CFU) per gram,2 combined with relatively constant shedding rates.
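The multiplicative mechanism invoked here can be illustrated by simulation: multiplying many independent positive factors yields a strongly right-skewed concentration series whose logarithm is approximately normal, as the central limit theorem applied to the sum of log-factors predicts. This Python sketch is purely illustrative; the number of factors and their uniform(0.5, 2) distribution are hypothetical choices, not estimates of real shedding or decay processes.

```python
import math
import random
from statistics import fmean, pstdev


def sample_skewness(values):
    """Moment-based (population) skewness."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def multiplicative_series(n_days: int, n_factors: int, seed: int = 0):
    """Hypothetical model: each daily concentration is a baseline multiplied by
    n_factors independent positive factors (standing in for shedding,
    dilution, decay, and aggregation). The uniform(0.5, 2) factor law is
    an arbitrary illustrative choice."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_days):
        c = 1_000.0
        for _ in range(n_factors):
            c *= rng.uniform(0.5, 2.0)
        series.append(c)
    return series


if __name__ == "__main__":
    conc = multiplicative_series(n_days=4000, n_factors=40)
    print("skewness of concentrations:", round(sample_skewness(conc), 1))
    print("skewness of ln(concentrations):",
          round(sample_skewness([math.log(c) for c in conc]), 2))
```

Running it should show a large positive skewness for the concentrations and a skewness close to zero for their logarithms.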
Accurately estimating the distribution of virus concentrations requires acknowledging that observations are subject to various measurement uncertainties, including sampling uncertainty, detection limits, and analytical recovery rates. Discrete mixed Poisson distributions are better candidate models than continuous distributions, as they can incorporate uncertainty associated with non-detects into parametric analyses.44 In addition, discrete mixture distributions—which directly model virus counts within a sample—could be further developed to address uncertainties related to virus aggregation in wastewater.
When adjusting concentrations for recovery, the reliability of surrogate viruses depends on how closely their morphology and behavior in wastewater align with those of the target virus. In our study, the model enveloped virus MHV and the non-enveloped bacteriophages MS2 and PhiX174 had variable recovery estimates, ranging from 1 to 100%. On average, MHV recovery rates were around 1.0 log10 lower than those of MS2 and PhiX174, likely due to the enveloped structure of MHV. Enveloped viruses have unique adsorption, aggregation, and recovery behaviors in raw wastewater, with studies indicating that they are more strongly associated with solids and are more susceptible to inactivation during recovery processes compared to non-enveloped viruses.7 To improve surrogate selection, evaluating correlations between the recovery rates of these surrogates and those of naturally occurring enteric viruses could provide insights into which surrogates are most reliable for assessing virus concentration distributions.
Our analysis of high-frequency monitoring data from the Zurich WWTP demonstrated that increasing the virus monitoring frequency from monthly to once every two weeks substantially reduced the uncertainty of the modelled concentration distribution. Further increasing to weekly sampling, however, provided minimal additional reduction in uncertainty, although it improved the prediction of peaks for enterovirus. For lognormally distributed concentrations, the optimal monitoring frequency depends on the standard deviation of the natural logarithm of the concentration (σ). For the Zurich WWTP, σ is approximately 1.0 for norovirus and enterovirus, but σ values of up to 2.6 were found at other WWTPs in this study. At such sites, more frequent monitoring may be needed to adequately estimate the distribution for QMRA. Adaptive monitoring strategies that start with high-frequency sampling (e.g., weekly) to assess concentration variability (σ) and then adjust the frequency once the estimated σ stabilizes could make monitoring more cost-efficient and tailored to site-specific conditions.
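The adaptive strategy can be sketched by inverting eqn (5): given a current estimate of σ, find the smallest annual sample size whose 95% CI width meets a target. This minimal Python sketch assumes the Olsson approximation of eqn (4) with σ plugged in for s_Y and a one-year campaign; the helper names, the re-estimation rule, and the 0.5 log10 default target are illustrative assumptions.

```python
import math
from statistics import stdev

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def width95_log10(sigma: float, n: int) -> float:
    """Eqn (5) under the Olsson approximation of eqn (4)."""
    s2 = sigma ** 2
    return 2 * Z_95 * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1))) / math.log(10)


def min_samples(sigma: float, target_width: float = 0.5, n_max: int = 10_000) -> int:
    """Smallest n >= 2 whose 95% CI width is at most `target_width` (log10)."""
    for n in range(2, n_max + 1):
        if width95_log10(sigma, n) <= target_width:
            return n
    raise ValueError("target width not reached within n_max samples")


def suggested_interval_days(log_concentrations, target_width: float = 0.5) -> float:
    """Re-estimate sigma from the ln-concentrations gathered so far and convert
    the required annual sample size into a sampling interval (hypothetical rule)."""
    sigma_hat = stdev(log_concentrations)
    return 365.0 / min_samples(sigma_hat, target_width)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 1.5, 2.0, 2.6):
        n = min_samples(sigma)
        print(f"sigma={sigma}: {n} samples/year, one every {365 / n:.1f} days")
```

Under these assumptions, σ = 1.0 calls for about 18 samples per year for a 0.5 log10 width (roughly one every three weeks), whereas σ = 2.6 calls for several hundred.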
Our findings highlight that site-specific differences in mean concentrations, variability, and distribution shapes of enteric viruses can be substantial. Local shedding patterns, population size, or outbreak dynamics may shape the underlying generative processes at each site. At the same time, methodological factors, such as sampling intervals, sample volumes, and analytical recovery rates, also influence measured concentrations. Although all viruses here were measured using PCR-based methods with process controls and recovery spikes, caution is warranted when comparing results across different WWTPs, as these factors may also affect reported variability. This underscores the importance of carefully modelling site-specific distributions, which can complicate meta-analysis. Although pooling data from multiple sites into a single distribution, as proposed by Darby, Olivieri,12 can offer insights for developing risk-based pathogen treatment requirements in potable reuse, this approach can mask site-specific uncertainty by treating variability as random noise around a single mean. Consequently, sites with distinctly higher (or lower) concentrations may be missed, and peak concentrations, often the key driver in risk assessment, may be underestimated. Random-effects meta-analysis addresses this limitation by allowing each data set to have its own distribution nested within a broader population distribution, preserving site-specific variability.45 Further development of meta-analytic models for microbial risk assessment would help refine how we estimate and manage virus concentration variability across diverse WWTPs.
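As a minimal illustration of random-effects pooling, the classic DerSimonian-Laird estimator combines per-site log-scale means while estimating a between-site variance τ². This Python sketch is a swapped-in, simplified technique, not the model used or proposed by the authors; a full meta-analytic model of site distributions (e.g., a Bayesian hierarchical model that also pools σ) would go further. The site inputs in the usage block are hypothetical.

```python
import math
from typing import List, Tuple


def dersimonian_laird(estimates: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooling of site-level estimates (needs at least two sites).

    estimates: per-site means of ln C; variances: their squared standard
    errors (e.g., sigma**2 / n). Returns (pooled mean, its SE, tau**2)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # between-site variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


if __name__ == "__main__":
    # Hypothetical sites: (mean ln C, sigma, n)
    sites = [(6.0, 1.0, 50), (7.5, 1.5, 12), (5.2, 2.0, 14), (8.9, 2.1, 130)]
    est = [m for m, _, _ in sites]
    var = [s ** 2 / n for _, s, n in sites]
    pooled, se, tau2 = dersimonian_laird(est, var)
    print(f"pooled ln-mean {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.2f}")
```

A τ² well above zero signals that sites differ by more than their sampling error, which is the heterogeneity a single pooled distribution would hide.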
• Enteric virus concentrations in wastewater influent vary widely by virus type and location, with average norovirus concentrations ranging from 1.1 × 10² to 1.8 × 10⁴ genome copies per milliliter across sites, emphasizing the need to preserve site-specific variability rather than relying solely on aggregated estimates.
• The lognormal distribution accurately predicted peak virus concentrations and outperformed gamma and Weibull distributions.
• Recovery-corrected concentrations using MS2, PhiX174, and MHV matrix spike data were successfully incorporated into parametric models.
• Weekly monitoring over a year should be sufficient to estimate the annual average concentration with a 95% confidence interval no wider than 0.5 log10 at most sites. High-variability sites (σ > 2) may need more frequent monitoring to achieve accurate estimates of distributions.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5ew00286a.
This journal is © The Royal Society of Chemistry 2025