Open Access Article
Katherine B. Ensor*a, Jose Palacioa, Sallie A. Kellerb, Rebecca Schneiderc, Kaavya Domakondac, Loren Hopkinsd and Lauren Stadlere
aDepartment of Statistics, Rice University, MS 138, Houston, TX, USA. E-mail: ensor@rice.edu; Tel: +1 713 348 4687
bBiocomplexity Institute, University of Virginia, Charlottesville, VA, USA
cHouston Health Department, Houston, TX, USA
dHouston Health Department & Department of Statistics, Rice University, Houston, TX, USA
eDepartment of Civil & Environmental Engineering, Rice University, Houston, TX, USA
First published on 17th March 2026
Wastewater-based epidemiology (WBE) is an effective tool for tracking community circulation of respiratory viruses. We address a scientific gap: using the measured wastewater viral load of respiratory syncytial virus (RSV) to estimate the effective reproduction number and the number of infections in the population. We advocate a modular approach to the analysis. We first estimate the trend and current level of the RSV viral load and quantify the uncertainty. These estimates become input for our Bayesian renewal model for both the infection rate and the number of infected individuals. The modular approach simplifies the analysis pipeline while maintaining scientific integrity. Further, the modular approach supports translation to other viruses by using disease-specific models for estimated transmission and cases in the second phase of the analysis.
Water impact
Wastewater surveillance provides a powerful approach for tracking viral infection dynamics in a community. Monitoring pathogens through sewer systems yields population-level information that informs public health guidance, resource allocation, and risk assessment. RSV wastewater monitoring provides timely estimates of the effective reproduction number and the number of infected individuals.
For respiratory syncytial virus (RSV), a leading cause of severe respiratory illness in infants, older adults, and immunocompromised people, clinical surveillance faces challenges from underreporting and reporting delays. Many RSV cases are managed at home without hospitalization, so clinical records do not fully reflect community transmission. RSV measured in wastewater provides insight into community disease dynamics.2 In a comprehensive study of 176 sites during the 2022–2023 RSV season, Zulli et al.3 observed that RSV RNA concentrations at state and national levels were linked to infection positivity and hospitalization rates. Allen et al.4 detailed the implementation of a WBE approach in Northern Ireland to track RSV community transmission over the 2021 and 2022 seasons, correlating wastewater RSV levels with clinical cases. Through sequencing and phylogenetic analysis, they compared RSV A and B G-gene sequences from wastewater and clinical samples to elucidate transmission patterns. A study of an active international land border found that wastewater signals for the 2022–2023 RSV season peaked in Detroit (MI, USA) approximately 5 weeks before the peak in Windsor (ON, Canada).5 The authors further found a strong positive relationship between wastewater disease concentrations and hospitalization rates at the Canadian location.
Building on the success of WBE for RSV, we address a scientific gap: using the measured wastewater viral load of RSV to estimate the effective reproduction number and the number of infections in the population. We advocate a modular approach to the analysis. We first estimate the trend and current level of the RSV viral load and quantify the uncertainty in the trend. These estimates become input for a Bayesian renewal model for both the infection rate and the number of infected individuals. The modular approach simplifies the analysis pipeline while maintaining scientific integrity. This approach further supports translation to other viruses by refining the estimate of the trend and using disease-specific models for estimated transmission and cases in the second phase of the analysis.
150 residents. The facility's size and regular sampling ensure that it provides one of the most dependable wastewater signals in the city.
The temporal pattern of the estimated number of infections, $I_t$, is compared to the estimated number of RSV cases for the city during the same time period, proportionally allocated to the population served by the single treatment plant.
Following the approach in ref. 10, let $x_t$ denote the trend in wastewater RNA viral load, measured on a log10 scale, at time $t$. The dynamics of the trend are specified by the state equation as

$$(x_t - x_{t-1}) = (x_{t-1} - x_{t-2}) + w_t, \qquad w_t \sim N(0, \sigma_w^2).$$
The observation equation of the state-space model is given by

$$y_t = x_t + v_t, \qquad v_t \sim N(0, \sigma_v^2).$$
We estimate the parameters of the SSM by maximum likelihood. Parameter estimation is an iterative process: the Kalman filter evaluates the likelihood at a given set of parameters, and nonlinear optimization moves us through the parameter space. For a fixed parameterization, the Kalman filter provides efficient calculation of the filtered states, or conditional means, and their uncertainties, or conditional variances. We implement this process using the MARSS package in R. A useful feature is that this algorithm and software can easily manage missing values in our time series. The SSM framework coupled with the Kalman filter can also manage irregularly spaced observations, as seen in ref. 11 and 12.
The output from stage one of our analysis is the estimated state or filtered value, which we denote by $\hat{x}_{t|t} = E[x_t \mid y_{1,\ldots,t}]$ at time $t$ throughout our time series. In other words, the estimated state at time $t$ is the expected value of the trend (on a log10 scale) at time $t$, given all observations through time $t$. We also obtain the estimated variance of the state at time $t$, $\mathrm{Var}[x_t \mid y_{1,\ldots,t}]$, that is, the conditional variance of $x_t$ given all observations through time $t$.
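As a minimal sketch of how the filtered mean and variance arise, the recursion below applies a Kalman filter to the second-difference trend model with the state vector $(x_t, x_{t-1})$. It is written in plain Python rather than with MARSS, holds the variance parameters fixed instead of estimating them by maximum likelihood, and assumes a diffuse initial state; the function name `kalman_filter_trend`, its arguments, and the numerical values are illustrative and do not come from the study.

```python
def kalman_filter_trend(y, sigma2_w, sigma2_v, x0=0.0, p0=1e6):
    """Filter y_t = x_t + v_t with (x_t - x_{t-1}) = (x_{t-1} - x_{t-2}) + w_t.

    y is a list of observations on the log10 scale; None marks a missing week.
    Returns the filtered means E[x_t | y_1..t] and variances Var[x_t | y_1..t].
    x0 and p0 define an assumed diffuse prior, not a fitted initial state.
    """
    a, b = x0, x0                  # current estimates of x_t and x_{t-1}
    p11, p12, p22 = p0, 0.0, p0    # symmetric 2x2 state covariance
    means, variances = [], []
    for obs in y:
        # Prediction: x_t = 2 x_{t-1} - x_{t-2} + w_t, and x_{t-1} carries over.
        a, b = 2.0 * a - b, a
        q11 = 4.0 * p11 - 4.0 * p12 + p22 + sigma2_w
        q12 = 2.0 * p11 - p12
        q22 = p11
        if obs is None:
            # Missing observation: keep the prediction, skip the update.
            p11, p12, p22 = q11, q12, q22
        else:
            s = q11 + sigma2_v     # innovation variance
            innov = obs - a        # one-step-ahead prediction error
            a += (q11 / s) * innov
            b += (q12 / s) * innov
            p11 = q11 * sigma2_v / s
            p12 = q12 * sigma2_v / s
            p22 = q22 - q12 * q12 / s
        means.append(a)
        variances.append(p11)
    return means, variances


# Illustrative weekly log10 viral loads with one missing week.
series = [4.0, 4.1, 4.3, None, 4.8, 5.0, 4.9, 4.7]
x_hat, x_var = kalman_filter_trend(series, sigma2_w=1e-3, sigma2_v=2e-2)
```

The variance returned for a missing week is the one-step prediction variance, so it is larger than it would have been had that week been observed, which is how the missing value propagates into the stage-two uncertainty.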
Importantly, estimation is performed on the log10 scale and subsequently converted to the natural measurement scale (gc L−1), ensuring consistency with the lognormal likelihood adopted in the measurement layer of the renewal model. This step can be simplified by taking the natural log rather than log10; either choice works equally well because we ultimately transform back to the original measurement scale.
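The change of scale can be sketched as follows. If the log10 value is normally distributed with mean $m$ and variance $v$, then the natural log is normal with mean $m \ln 10$ and variance $v (\ln 10)^2$, and the quantity on the measurement scale is lognormal. The helper names and the input values below are hypothetical and serve only to illustrate the arithmetic.

```python
import math

LN10 = math.log(10.0)


def log10_to_natural_log(mean_log10, var_log10):
    """Convert a normal mean and variance from the log10 scale to the natural-log scale."""
    return mean_log10 * LN10, var_log10 * LN10 ** 2


def lognormal_summary(mean_log10, var_log10):
    """Median and mean on the measurement scale implied by a normal log10 value."""
    mu, sigma2 = log10_to_natural_log(mean_log10, var_log10)
    median = math.exp(mu)                # identical to 10 ** mean_log10
    mean = math.exp(mu + 0.5 * sigma2)   # lognormal mean exceeds the median
    return median, mean


# Illustrative filtered value: log10 mean 4.0 with variance 0.02.
median_gc, mean_gc = lognormal_summary(4.0, 0.02)
```

The median is unaffected by the choice of logarithm base, while the lognormal mean depends on the variance, which is why the variance has to be rescaled together with the mean.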
The unobserved variable for disease transmission, namely $\{z_t\}$, is modeled as a random walk with normally distributed noise. This parameterization provides a parsimonious yet flexible representation of week-to-week changes in disease transmission. Specifically, $z_t = z_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ and $z_1 \sim N(1, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon > 0$. The parameter $k$ controls the curvature of the link between $z_t$ and $R_t$ and regulates how fluctuations in the process translate into changes in transmission.
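To illustrate the role of the curvature parameter, the sketch below simulates the random walk and maps it to a positive reproduction number through a scaled softplus function. The specific form $R_t = \log(1 + e^{k z_t})/k$ is an assumption made for this illustration, chosen because it keeps $R_t$ positive and returns $R_t \approx z_t$ when $k z_t$ is large; the function names and parameter values are likewise hypothetical.

```python
import math
import random


def softplus_link(z, k):
    """Assumed link R = log(1 + exp(k z)) / k, evaluated in a numerically stable way."""
    x = k * z
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / k


def simulate_rt(n_weeks, sigma_eps, k, seed=0):
    """Simulate z_t as a Gaussian random walk started near 1 and map it to R_t."""
    rng = random.Random(seed)
    z = rng.gauss(1.0, sigma_eps)        # z_1 ~ N(1, sigma_eps^2)
    path = []
    for _ in range(n_weeks):
        path.append(softplus_link(z, k))
        z += rng.gauss(0.0, sigma_eps)   # z_t = z_{t-1} + eps_t
    return path


# Two years of weekly values with illustrative settings.
rt_path = simulate_rt(n_weeks=104, sigma_eps=0.08, k=8.0)
```

Under this assumed form, a larger $k$ makes the mapping closer to $\max(z_t, 0)$, while a smaller $k$ smooths the transition near zero.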
Further, let $I_t$ denote the number of new infections during week $t$. Following the renewal formulation,13 the expected number of cases, denoted by $\lambda_t$, is expressed as a convolution of past infections with the transmission-interval distribution.
The weekly temporal weights, $w^{\mathrm{inf}}_g$, for newly infected individuals are set by segments of a gamma probability density function (pdf) for weeks $1, \ldots, G$, and rescaled to sum to one. This construction is standard in renewal-based epidemic models.13–15 The weights are explicitly defined in ref. 16, and demonstrated in the code.
Following ref. 17, we define the infection transmission weights for RSV from a gamma pdf with a mean of 7.5 days and a standard deviation of 2.1 days.
There is not a one-to-one relationship between wastewater RNA copies and the number of infections; rather, this relationship is inferred through our renewal process. The magnitude of $I_t$ depends on our assumed initial conditions. For our purposes, we set the initial number to the citywide average weekly RSV health care encounters, scaled by the share of the city's population in the service area.18 This initial setting is considered conservative, as it does not account for individuals who did not seek medical care.
Similar to the infection transmission weights, the weekly weights $\{w^{\mathrm{shed}}_d\}_{d=0}^{D}$ for the shedding distribution for a given individual are also modeled as a discretized gamma distribution, with a maximum history of $D$ weeks, and rescaled to sum to one. Clinical evidence provided in ref. 19 leads us to assume a mean of 4.6 days and a standard deviation of 2.0 days for the gamma pdf to obtain the temporal weights for the duration of viral shedding. Because our data are aggregated weekly, the interpretation of lag zero differs from that in a daily formulation. Specifically, $d = 0$ corresponds to shedding occurring within the same calendar week as infection, which is biologically plausible given that RSV shedding can begin within the first few days of illness.20,21
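The two sets of weights and the convolutions that use them can be sketched as follows. The sketch assumes that each weekly weight is the gamma pdf integrated over a 7-day segment, that the renewal expectation takes the standard form $\lambda_t = R_t \sum_{g=1}^{G} w^{\mathrm{inf}}_g I_{t-g}$, and that the expected wastewater load is a scaled shedding convolution $\beta \sum_{d=0}^{D} w^{\mathrm{shed}}_d I_{t-d}$. These forms, the history lengths, and all function names are assumptions for illustration; the exact definitions are those of ref. 16 and the study code.

```python
import math


def gamma_pdf(x, mean, sd):
    """Gamma density parameterized by its mean and standard deviation (in days)."""
    if x <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))


def weekly_weights(mean_days, sd_days, n_weeks, steps=700):
    """Integrate the pdf over consecutive 7-day segments (midpoint rule), rescaled to sum to one."""
    h = 7.0 / steps
    raw = []
    for j in range(n_weeks):
        lo = 7.0 * j
        raw.append(h * sum(gamma_pdf(lo + (i + 0.5) * h, mean_days, sd_days)
                           for i in range(steps)))
    total = sum(raw)
    return [r / total for r in raw]


def expected_infections(r_t, past_infections, w_inf):
    """Assumed renewal form; past_infections[-1] is I_{t-1}, past_infections[-2] is I_{t-2}, ..."""
    return r_t * sum(w * past_infections[-g]
                     for g, w in enumerate(w_inf, start=1)
                     if g <= len(past_infections))


def expected_load(beta, infections, w_shed):
    """Assumed shedding convolution; infections[-1] is I_t, i.e. lag d = 0."""
    return beta * sum(w * infections[-1 - d]
                      for d, w in enumerate(w_shed)
                      if d < len(infections))


w_inf = weekly_weights(7.5, 2.1, n_weeks=4)    # weeks g = 1..4 (G = 4 assumed)
w_shed = weekly_weights(4.6, 2.0, n_weeks=3)   # lags d = 0..2 (D = 2 assumed)
history = [64.0, 64.0, 64.0, 64.0]             # constant weekly infections
lam = expected_infections(1.0, history, w_inf) # stays at the constant level when R_t = 1
```

With constant past infections and $R_t = 1$, the expected number of new infections equals that constant, which is a quick check that the weights are normalized.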
The wastewater RNA trend estimate, say ût|t, is provided by step 1, where ût|t follows a lognormal distribution, and the log(ût|t) is normally distributed with mean
t|t and variance
t|t. The latter are the output from the state equation of our SSM, in step 1. This parametrization implies that the expected value of ut is in fact πt, defining our link between the wastewater trend and the infection dynamics.
For the state-space model, we performed the estimations using maximum likelihood with BFGS optimization through the MARSS package,22 which utilizes Kalman filtering for efficient likelihood evaluation. The uncertainty in the parameter estimates for $\sigma_v$ and $\sigma_w$ of the SSM is not incorporated in the Bayesian renewal model. We estimate the parameters of the Bayesian renewal model using nimble23 via MCMC. Prior distributions for the parameters $\beta$, $k$, and $\sigma_\varepsilon$ include a log-scale prior for $\beta$, a half-normal prior for the random-walk volatility $\sigma_\varepsilon$, and a log-normal prior on $k$ to regularize the curvature of the softplus link. Hyperparameters are chosen to ensure weakly informative priors. The MCMC sampler jointly estimated the unobserved values of $R_t$ and $I_t$ along with key distribution parameters, producing posterior samples for each. Multiple chains were implemented with warm-up iterations discarded. Convergence of the MCMC algorithm was assessed using the rank-normalized split $\hat{R}$. Mixing was verified through the relative Monte Carlo standard error (relMCSE).
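For intuition about the convergence diagnostic, the sketch below computes the classic split $\hat{R}$, which compares between-chain and within-chain variability after halving each chain. It is a simplified version that omits the rank-normalization step used in the analysis, and the function name and simulated chains are illustrative only.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains (no rank normalization)."""
    half = min(len(c) for c in chains) // 2
    halves = []
    for c in chains:
        halves.append(list(c[:half]))           # first half of the chain
        halves.append(list(c[half:2 * half]))   # second half of the chain
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = half * statistics.variance([statistics.fmean(h) for h in halves])
    var_plus = (half - 1) / half * within + between / half
    return math.sqrt(var_plus / within)


# Four simulated chains drawn from the same distribution give a value near 1.
rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
rhat_mixed = split_rhat(mixed)

# Chains centered at different values give a value well above 1.
stuck = [[rng.gauss(5.0 * (i % 2), 1.0) for _ in range(1000)] for i in range(4)]
rhat_stuck = split_rhat(stuck)
```

Values close to 1 indicate that the half-chains are statistically indistinguishable, which is the pattern expected of a converged sampler.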
0 = 0.0074 (SE 0.6783),
= 0.02145 (SE 0.0034), and
= 0.0005 (SE 0.0002). The important variance parameters are both small, with small standard errors, indicating that the model separates the trend from the sampling and measurement error. The uncertainty in these parameters is not included in the Bayesian renewal model. The estimated WW trend and its variance are conditional on the point estimates for the variance parameters. The accuracy of the estimates of these parameters supports this analysis decision and does not influence the stage 2 outcomes.
Additional assessment of the SSM assumptions is performed by examining the standardized residuals. The standardized residuals of the fitted SSM do not exhibit any additional autocorrelation over a 4-week history, based on the Ljung–Box test (p-value = 0.33). Further, the Kolmogorov–Smirnov test for normality of the standardized residuals indicates that normality is a reasonable assumption (p-value = 0.09). The SSM does an excellent job of capturing the WW viral trend.
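The autocorrelation check can be sketched as follows. The code computes the Ljung–Box statistic $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2/(n-k)$ and a chi-square p-value using a closed form that holds for an even number of degrees of freedom. Taking the degrees of freedom equal to the number of lags is an assumption here, and the function names and residuals are illustrative.

```python
import math


def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic over lags 1..lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho * rho / (n - k)
    return n * (n + 2) * q


def chi2_sf_even(x, df):
    """Chi-square survival function; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total


# Strongly alternating residuals are flagged as autocorrelated at 4 lags.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
q_stat = ljung_box_q(alternating, lags=4)
p_value = chi2_sf_even(q_stat, df=4)
```

A large p-value, such as the 0.33 reported above, means the statistic gives no evidence of remaining autocorrelation over the chosen lags.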
Table 1 summarizes posterior estimates and convergence diagnostics for the scalar parameters of the Bayesian renewal model. We observe strong convergence, with rank-normalized split $\hat{R}$ values between 1.005 and 1.018. Further, the chains are well mixed, as indicated by the small relative Monte Carlo standard errors (relMCSEs) (≈0.01–0.02) across all parameters. The link-curvature parameter $k$ centers around 8, consistent with a moderately sharp softplus mapping from $z_t$ to $R_t$. The innovation standard deviation $\sigma_\varepsilon$ is small (0.089), suggesting gradual week-to-week changes in transmission. For the wastewater, the scaling factor $\beta$ is 14.865 (SD 3.388, with 95% credible interval [9.465, 22.352]). The variance of the wastewater observation is time-varying and is provided through the SSM.
Table 1 Posterior estimates and convergence diagnostics ($\hat{R}$/relMCSE) for scalar parameters. The SSM-filtered viral-load inputs and their week-specific variances are treated as known
| Parameter | Mean | SD | 95% CI | $\hat{R}$/relMCSE |
|---|---|---|---|---|
| β | 14.737 | 3.969 | [8.591, 23.945] | 1.018/0.021 |
| k | 8.026 | 4.727 | [1.765, 18.600] | 1.005/0.009 |
| σε | 0.078 | 0.027 | [0.032, 0.137] | 1.017/0.019 |
Fig. 2 depicts the time series of the effective reproduction number ($R_t$). For the majority of the two-year period, the reproduction number is approximately 1, indicating no change in disease levels. The reproduction number increases in October 2023 and again in October/November 2024, corresponding to the increase of disease measured in WW. Disease transmission quickly returned to stable levels after the rise, as indicated by reproduction numbers less than one in January 2024, and then stabilized at one.
The results shown in Fig. 1 and 2 were driven solely by wastewater trends. The only case information used in modeling was the initial starting value, which was set to 64 cases based on the observed RSV healthcare encounter cases for the week of January 2, 2023. Additionally, our modeling decisions for the transmission and shedding weights were based on scientific references, not estimated from our own data. For comparison, Fig. 3 shows posterior means and 95% credible bands for the estimated number of infections (It) and the observed time series of weekly RSV healthcare encounter cases, proportionally adjusted for the population served by the wastewater treatment plant.
The estimated number of cases closely follows the observed cases in both magnitude and pattern. The observed cases peak two weeks before the estimated cases during both periods of increased disease prevalence. During the second peak, wastewater levels did not rise as much as expected, leading to a lower estimate of the number of infections, $I_t$.
The mismatch in peak levels between the estimated cases and observed cases could be due to several choices in our modeling framework. The first is selecting a random walk for the transmission process. A modification that includes two weeks of history instead of one may be more appropriate. Additionally, the disease transmission weights and shedding weights for the renewal equations were derived from scientific literature rather than being specifically fitted to the RSV case data for this study area. This was an intentional decision for the study. Our aim was to evaluate how well we could model the case counts and reproduction number relying solely on wastewater.
000 people residing in Houston, TX. The two-stage approach separates estimating the wastewater viral trend from the downstream estimate of the infection counts and dynamics. This analytic approach simplifies the analysis pipeline, bringing actionable science to real-world applications in a timely fashion.
Further, the two-stage approach allows the analyst to focus on two very different modeling challenges. By capturing the trend dynamics in wastewater, the analyst has a path forward for nowcasting and short-term forecasting of wastewater virus levels. The use of the SSM results in a clear separation of the underlying trend from the observation noise driven by both measurement and sampling uncertainty. We have specified a simple SSM to track RSV RNA levels for our study area, and it worked well. However, the SSM framework is very flexible and can handle more complexity as needed for a specific situation. Further, the SSM framework supports missing or irregularly sampled wastewater time series. By handling these data issues early, we streamline the next step of the analysis, understanding the disease dynamics.
The stage-two Bayesian renewal model, which estimates the number of individuals with the disease and the effective reproduction number, can be implemented in an online fashion, providing real-time actionable public health information. It is simpler than a Bayesian methodology that tries to address both modeling steps one and two simultaneously. Again, the input to the Bayesian renewal system is the estimated wastewater viral trend and the uncertainty in that estimate.
An additional input to the stage-two Bayesian renewal model is the initial number of infected persons, representing the starting point of the disease. In our study, we used a proportional allocation of the infected population within Houston at the beginning of the study. This approach worked well, allowing us to accurately capture the levels and disease dynamics of RSV. It is important to note that this starting point serves as an anchor for estimating the number of infected individuals with the Bayesian renewal model. Essentially, we define the population with this setting. Since we relied on healthcare encounter data, our estimated number at later times reflects the same population. However, this estimate is lower than the actual number of infected individuals, as it does not include those who are sick but do not seek medical care. If there is reliable scientific data on the degree of undercounting, this information could be used to refine the initial estimate of infected individuals. We also implicitly assume that the disease dynamics, for example the shedding distribution, do not change over time. If more scientific knowledge is available about the disease, this information can be incorporated into the modeling. Beyond this initial point, our disease trajectory is driven entirely by the wastewater trend estimate and its uncertainty.
The fact that the number of infections estimated from wastewater peaks after the healthcare encounter counts reflects the lag in observed wastewater levels. This feature is worth further study to elevate the public health use of wastewater for understanding RSV spread in the community. The lag may reflect infected individuals in the population who are not seeking medical care. Advances in scientific knowledge about the disease and measurement dynamics can be incorporated into the stage-two model to improve the public health utility of the wastewater-estimated infections present in the community.
Wastewater-based epidemiology (WBE) is a proven, effective tool for tracking community circulation of respiratory viruses. We help close a scientific gap by using the measured wastewater viral load of respiratory syncytial virus (RSV) to estimate the trajectory of the effective reproduction number and the number of infections in the population. Future statistical research is warranted to understand the mismatch in the timing of peak levels between the wastewater-based estimates and the number of RSV health care encounters. Specific areas to address include the assumption that disease transmission is a random walk, and the weights selected from the scientific literature for both the shedding and transmission distributions. Another hypothesis for the timing mismatch is that clinical data capture the onset of the disease by documenting severe cases, while cases estimated through wastewater reflect the spread of RSV to the larger and potentially less vulnerable population. These are questions that require further scientific investigation to answer.
This journal is © The Royal Society of Chemistry 2026