Speech-generated aerosol settling times and viral viability can improve COVID-19 transmission prediction

Droplets produced during human speech have been found to remain suspended in the air for minutes, while studies suggest that the SARS-CoV-2 virus remains infectious in experimentally produced aerosols for more than one hour. However, the absence of a large-scale association between regional outbreaks and weather-influenced characteristics of virus-laden, speech-generated aerosols, such as settling time and viral viability, makes it challenging to set policy on appropriate infection control measures. Here we investigate the correlation between the time series of daily infections and of the settling times of virus-containing particles produced by speaking. Characteristic droplet settling times, determined by the Stokes–Cunningham equation as influenced by daily weather conditions, were estimated from local meteorological data. Daily infection data were calibrated from locally reported cases based on established infection timeframes. Linear regression, vector autoregression, simple recurrent neural network, and long short-term memory models predict transmission rates within one-sigma intervals using the settling times and viral viability over the 5 days before the day of prediction. Corroborating previous health science studies, our results, from the perspective of meteorology-modulated transmission, strengthen the case that airborne aerosol transmission is an important pathway for the spread of SARS-CoV-2. Furthermore, historical weather data can improve the prediction accuracy of infection spreading rates.


Introduction
The novel coronavirus (SARS-CoV-2) has caused more than 240 million infections and 4.8 million deaths globally from COVID-19 as of October 19, 2021. 1 COVID-19 is known to cause a considerable number of asymptomatic infections. Therefore, the ability to predict local COVID-19 outbreaks is imperative for effective public health management. 2 Faster flu transmission during winter months is often linked to the lower temperatures and relative humidities of winter compared with summer. 3 Virus-laden aerosols from infected human hosts evaporate into smaller particles at lower humidity and, as a result, take longer to settle out of the air. In addition, viruses in aerosols survive longer at lower ambient temperatures and thus remain infectious for longer while airborne. 4 Speech-generated aerosols may remain suspended in air for 8 to 14 minutes, 5 while viruses encapsulated in aerosol droplets could remain viable for 49 hours. 6,7 Thus, speech-generated aerosols are widely considered to have contributed to asymptomatic transmission of COVID-19. 5,8,9 The fate and transport of these virus-laden aerosol droplets could therefore be used to predict the spread of COVID-19.
Airborne transmission of COVID-19 has been studied extensively over the past year. 10,11 Previous studies predicting the transmission of COVID-19 and similar airborne diseases used either the infected population (SIR model) 12 or meteorological observations 13 directly as input variables. Considering the non-linear relationships connecting weather to settling time and viral viability, 7,14 using weather-derived settling times and viability as input variables may improve the goodness of fit as well as elucidate additional factors affecting airborne transmission.
Meteorological conditions such as temperature and humidity affect aerosol settling velocity by changing the final size of aerosols after they equilibrate with ambient moisture through evaporation or condensation. The settling velocity of the equilibrated aerosols in the atmosphere is often calculated using Stokes' law, 15 which has traditionally been used to estimate aerosol terminal velocity at ambient temperatures and pressures. Because Stokes' law assumes a no-slip boundary condition, it underestimates the terminal settling velocity of small particles (< 1 μm). In air at 25 °C, the terminal velocity accounting for slip correction is 1.24 times faster than that calculated from uncorrected Stokes' law for a 1 μm-diameter particle, and 2.2 times faster for a 200 nm-diameter particle. Stokes' law also assumes that aerodynamic stress is transferred primarily through viscous exchange, meaning that it is valid only for small Reynolds numbers (Re < 1). Cunningham later introduced a correction factor to account for slip at the particle surface, and the resulting Stokes–Cunningham law applies to aerosols as small as 100 nm at ambient temperature and pressure. 16 Other models, such as those proposed by Epstein 17 and Millikan, 18 are only applicable at Knudsen numbers Kn > 10, corresponding to nm-sized particles in the lower troposphere or micron-sized particles at millibar-level pressures. 19
In addition to settling time, weather also affects the viability of viruses in suspended aerosols. 20 In the case of SARS-CoV-2, higher temperature, higher relative humidity (within the 20–70% range) and ultraviolet B (UVB) light produce higher decay rates, 7 in agreement with previous studies on an enveloped virus. 21 In a study focused on the viability of SARS-CoV-2 on surfaces, investigators reported that viability extended over longer times at low temperatures and humidities. 22,23 Weather also affects influenza A virus viability, though the relationship depends on the specific solution medium. 24
Given aerosol settling times and viral viability as input variables, COVID-19 cases can be forecast using regression analysis or machine learning models. Regression analyses such as linear regression and vector autoregression can identify the key input variables but are limited to linear correlations. 25,26 Machine learning algorithms can find highly non-linear correlations, but they do not reveal intuitive relationships between the input and response variables. Machine learning has been introduced as a promising alternative to existing forecasting models for influenza 27 and SARS-CoV-2 (ref. 28) with temperature, humidity and sunlight intensity as input variables.
Herein, we test the fitting and prediction performance of models of the COVID-19 transmission rate in the US that use the settling times of speech-generated aerosols together with viral viability data. To achieve this goal, weather information, the settling times of evaporated speech aerosols, and viral viability are processed in regression and recurrent neural network (RNN) models to forecast daily SARS-CoV-2 transmission rates. We compare linear regression, vector autoregression (VAR), simple RNN and long short-term memory (LSTM) RNN models in terms of their prediction performance for COVID-19 transmission. We expect that including first principles, such as the Köhler equation, which accounts for the solute-induced vapor pressure reduction that sets the equilibrium size of aqueous aerosols, and an improved settling velocity calculation, should remove some of the non-linearity that the models would otherwise need to accommodate, yielding better fitting and forecasting performance. Good fitting and prediction performance would indicate that speech-generated airborne aerosols are a significant transmission route for COVID-19 and that weather-affected properties of speech-generated aerosols may be incorporated to assist further predictive model development.

Methods
Fig. 1a shows the data flow of the model in this work, from weather data to predicted SARS-CoV-2 transmission. Each component of the model is described in the subsections below.

Data mining
Five counties were selected for inclusion in our model development: Harris County, TX; King County, WA; Los Angeles County, CA; Maricopa County, AZ; and Santa Clara County, CA. The counties are representative of the 20 most populous counties in the United States. None of the five counties had zero-case days throughout April 2020. They also had moderately warm weather, with no temperatures below 0 °C. When temperatures are below 0 °C, additional data on water surface tension and on the partial molal volumes of sodium chloride solutions below the normal melting point are needed. Constraining the predictive model to T > 0 °C avoids the complication of ice crystal formation within aqueous aerosols. 29 Daily local meteorological data, including daily average temperatures and relative humidities (RH), were obtained online from the National Oceanic and Atmospheric Administration (NOAA) for 1 April to 29 August 2020. For counties with more than one station, the station with the most complete coverage of daily temperature and RH was chosen. The station numbers are 12960, 24233, 93134, 23183, and 23293 for Harris County, King County, Los Angeles County, Maricopa County, and Santa Clara County, respectively.
The county-level COVID-19 confirmed case counts were obtained from USAFacts.org, which collected data from the Centers for Disease Control and Prevention (CDC) and the corresponding state- and local-level public health agencies. Data were acquired on 14 September 2020 and contained up-to-date daily confirmed cases. Given the extended asymptomatic period of COVID-19, the daily confirmed case data were processed to reflect daily active cases based on a disease progression timeline (Fig. 1b) that summarizes information provided by the CDC. 30 The number of active cases on a given day is therefore the sum of the daily confirmed cases from 12 days before to 4 days after that day.
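The active-case window can be sketched as a moving sum over the confirmed-case series. This is a minimal illustration, not the authors' code: it assumes a plain list of daily confirmed cases indexed by day, assumes both ends of the window are inclusive (day t itself is counted, 17 days in total), and leaves days whose window runs past the available record as NaN. The function name `daily_active_cases` is hypothetical.

```python
import numpy as np


def daily_active_cases(confirmed, before=12, after=4):
    """Sum confirmed cases over days t-before .. t+after (inclusive) for each day t.

    Days whose window extends beyond the available record are returned as NaN.
    """
    c = np.asarray(confirmed, dtype=float)
    window = before + after + 1  # 17 days with the default 12-day / 4-day split
    active = np.full(c.shape, np.nan)
    if c.size >= window:
        # mode="valid" gives one sum per full window; the window starting at
        # index j covers days j .. j+before+after, i.e. it belongs to day t = j + before.
        active[before : c.size - after] = np.convolve(c, np.ones(window), mode="valid")
    return active


# Usage with a hypothetical series of one confirmed case per day
tnac = daily_active_cases([1] * 20)
```

A convolution with a vector of ones is used here only because it computes every window sum in one call; an explicit loop over days gives the same result.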

Aerosol settling behavior
Given the fast kinetics of water evaporation from micron-sized aerosols (seconds) 31,32 compared with their settling time from a typical human height (minutes), 33 the Köhler equation (eqn (1)) is used to estimate the size of evaporated aerosols. Eqn (1) is expressed in terms of two dimensionless quantities: the ratio of the dry salt diameter to the wet aerosol diameter, and $\xi_{\sigma,0} = r_{\sigma,0}/r_{\mathrm{dry}}$, the ratio of the characteristic length scale of the Kelvin effect to the dry salt diameter, where the characteristic length scale is calculated as follows:
$$r_{\sigma,0} = \frac{2\,\nu_w\,\sigma}{R\,T} \qquad (2)$$
in which $\nu_w$ is the partial molal volume of water in the solution, $\sigma$ is the surface tension of the solution–air interface, $R$ is the gas constant and $T$ is the absolute temperature. The speech-generated aerosols are modelled as sodium chloride solutions at a physiological concentration of 80 mM, a typical salivary sodium concentration. 34 The initial size of speech-generated aerosols before evaporation is taken as 6 μm, the most abundant size according to experimental measurements. 35 The partial molal volume of water in a sodium chloride solution, 36 water vapor pressure, 37 water surface tension, 38 and the binary diffusion constant of water vapor in air 39 are taken from previous experimental data or semi-empirical relationships.
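Eqn (1) itself is not reproduced in this text, so the sketch below swaps in the simpler κ-Köhler parameterization of Petters and Kreidenweis to illustrate the same step: find the wet diameter at which the equilibrium saturation ratio over a NaCl solution droplet equals the ambient RH. It is not the authors' formulation. Assumptions beyond the text: κ = 1.28 for NaCl, a constant surface tension of 0.072 N m⁻¹, the molar volume of pure water (1.8 × 10⁻⁵ m³ mol⁻¹) in place of the partial molal volume, a dry NaCl density of 2165 kg m⁻³, subsaturated RH only, and no deliquescence or efflorescence hysteresis. The Kelvin exponent 4σν_w/(RTD) equals 2r_{σ,0}/D in the notation of eqn (2). All function names are hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def dry_diameter(d0, conc=80.0, molar_mass=0.05844, rho_salt=2165.0):
    """Diameter (m) of the dry NaCl left by a droplet of diameter d0 (m).

    conc is the NaCl concentration in mol m^-3 (80 mol m^-3 = 80 mM); salt mass
    is conserved, so d_dry^3 = d0^3 * conc * M / rho_salt.
    """
    return d0 * (conc * molar_mass / rho_salt) ** (1.0 / 3.0)


def saturation_ratio(d, d_dry, kappa, temp_k, sigma=0.072, v_w=1.8e-5):
    """kappa-Koehler equilibrium saturation ratio over a droplet of diameter d (m)."""
    solute = (d**3 - d_dry**3) / (d**3 - d_dry**3 * (1.0 - kappa))  # Raoult (solute) term
    kelvin = math.exp(4.0 * sigma * v_w / (R_GAS * temp_k * d))      # Kelvin (curvature) term
    return solute * kelvin


def equilibrium_diameter(rh, d_dry, kappa=1.28, temp_k=298.15):
    """Wet diameter (m) in equilibrium with fractional relative humidity rh (< 1)."""
    if not 0.0 < rh < 1.0:
        raise ValueError("this sketch handles subsaturated conditions only")
    lo = d_dry * (1.0 + 1e-9)  # just above the dry size: saturation ratio ~ 0
    hi = 2.0 * d_dry
    while saturation_ratio(hi, d_dry, kappa, temp_k) < rh:
        hi *= 2.0  # expand the upper bound until the root is bracketed
    for _ in range(200):  # bisection; S < rh only below the single crossing
        mid = 0.5 * (lo + hi)
        if saturation_ratio(mid, d_dry, kappa, temp_k) < rh:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: 6 um, 80 mM droplet at 50% RH and 25 C
d_wet = equilibrium_diameter(0.5, dry_diameter(6e-6))
```

Bisection is used because, for RH < 1, the saturation ratio crosses the target exactly once on the rising branch of the Köhler curve, so the bracket always contains a unique root.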
The settling velocity of the evaporated aerosol of a given size is calculated using Stokes' law with the Cunningham correction factor, as shown in eqn (3):
$$V_t = \frac{\rho_p D_p^2\, g\, C_c}{18\,\mu} \qquad (3)$$
where $V_t$ is the terminal settling velocity, $\rho_p$ is the particle density, $D_p$ is the particle diameter, $g$ is the gravitational acceleration, $\mu$ is the viscosity of air, and $C_c$ is the Cunningham correction factor, calculated as follows:
$$C_c = 1 + 2.52\,\mathrm{Kn} \qquad (4)$$
where 2.52 is an empirical constant specific to air, and Kn is the Knudsen number, the ratio of the mean free path of the gas molecules ($\lambda$) to the aerosol diameter ($D_p$), as shown in eqn (5):
$$\mathrm{Kn} = \frac{\lambda}{D_p} \qquad (5)$$
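Eqns (3)–(5) can be sketched directly in Python. This assumes that eqn (4) has the large-particle form written above, which reproduces the 1.24 (1 μm) and 2.2 (200 nm) slip-correction ratios quoted in the Introduction when λ = 95 nm. The air viscosity of 1.85 × 10⁻⁵ Pa s and the unit particle density (1000 kg m⁻³, as used for Fig. 2) are assumed defaults, and the function names are hypothetical.

```python
MFP_AIR_25C = 95e-9  # mean free path of air at 25 C used in this work (m)


def cunningham_factor(d_p, mfp=MFP_AIR_25C):
    """Eqns (4)-(5): C_c = 1 + 2.52 * Kn with Kn = lambda / D_p."""
    kn = mfp / d_p
    return 1.0 + 2.52 * kn


def terminal_velocity(d_p, rho_p=1000.0, mu=1.85e-5, g=9.81, mfp=MFP_AIR_25C):
    """Eqn (3): slip-corrected Stokes terminal settling velocity (m/s)."""
    return rho_p * d_p**2 * g * cunningham_factor(d_p, mfp) / (18.0 * mu)


# Slip correction relative to uncorrected Stokes' law
ratio_1um = cunningham_factor(1e-6)      # about 1.24
ratio_200nm = cunningham_factor(200e-9)  # about 2.2
```

Because C_c multiplies the uncorrected Stokes velocity, the ratio of corrected to uncorrected velocity is simply C_c, which is why the two quoted ratios can be checked with `cunningham_factor` alone.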
Assuming the ideal gas law, the mean free path, $\lambda$, of a given gas is
$$\lambda = \frac{1}{\sqrt{2}\,\pi d^2 N_V} \qquad (6)$$
where $d$ is the molecular diameter of the gas and $N_V$ is its molecular number density. From the aerosol settling velocity, the settling time is calculated assuming that aerosols attain their terminal settling velocity immediately after release at a height of 1.5 m. Because the settling time is used as an intermediate variable in the model depicted in Fig. 1 to check fitting and make predictions, the absolute release height does not affect the conclusions obtained.
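Eqn (6) and the constant-velocity settling time can be sketched as follows, assuming N_V is obtained from the ideal gas law, N_V = P/(k_B T), at 1 atm. With the N₂ molecular diameter of 3.10 × 10⁻¹⁰ m used in this work, this gives λ ≈ 95 nm at 25 °C. The function names are hypothetical; the settling time simply divides the 1.5 m release height by a terminal velocity computed from eqn (3).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def number_density(temp_k=298.15, pressure_pa=101325.0):
    """Ideal-gas molecular number density N_V = P / (k_B T), in m^-3."""
    return pressure_pa / (K_B * temp_k)


def mean_free_path(d_mol=3.10e-10, temp_k=298.15, pressure_pa=101325.0):
    """Eqn (6): lambda = 1 / (sqrt(2) * pi * d^2 * N_V), in m."""
    n_v = number_density(temp_k, pressure_pa)
    return 1.0 / (math.sqrt(2.0) * math.pi * d_mol**2 * n_v)


def settling_time(v_t, height=1.5):
    """Time (s) to fall `height` metres at a constant terminal velocity v_t (m/s)."""
    return height / v_t


lam_25c = mean_free_path()  # about 95 nm for air at 25 C and 1 atm
```

Since the settling time is just the release height divided by V_t, changing the 1.5 m height rescales every settling time by the same factor, which is consistent with the statement that the absolute height does not affect the conclusions.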

Viral viability
Viral viability is calculated using the empirical linear regression with interaction terms reported by Dabisch et al. 7 Because the regression equation was obtained over a limited range of temperature (10–30 °C) and relative humidity (20–70%), we focus on counties with a moderate climate, where the viability calculation is valid.
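The viability step reduces to evaluating a regression for the first-order decay constant k and converting it to a half-life via t½ = ln 2 / k. The sketch below is a hypothetical template only: the coefficient names and the single temperature × RH interaction term are assumptions, the actual terms and coefficients must be taken from Dabisch et al., and no values from that study are reproduced here. The range check mirrors the 10–30 °C and 20–70% RH validity limits stated above.

```python
import math


def decay_constant(temp_c, rh_pct, coef):
    """Decay constant k (per unit time) from a linear model with a T x RH interaction.

    coef is a dict of HYPOTHETICAL coefficient names; the real terms and values
    must be taken from the published regression (Dabisch et al.).
    """
    if not (10.0 <= temp_c <= 30.0 and 20.0 <= rh_pct <= 70.0):
        raise ValueError("outside the 10-30 C / 20-70 % RH range of the regression")
    return (coef["b0"] + coef["b_t"] * temp_c + coef["b_rh"] * rh_pct
            + coef["b_t_rh"] * temp_c * rh_pct)


def half_life(k):
    """First-order decay: t_1/2 = ln 2 / k, in the reciprocal units of k."""
    if k <= 0:
        raise ValueError("decay constant must be positive")
    return math.log(2.0) / k
```

Raising an error outside the fitted range, rather than extrapolating, is one way to enforce the restriction to counties with a moderate climate.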

Transmission model
The variable describing SARS-CoV-2 transmission is the "new case percentage increase (NCP)", calculated as the number of new positive tests on a particular day divided by the "total number of active cases (TNAC)" on that day. The TNAC on a day is estimated by summing all positive tests from 12 days before until 4 days after the day of interest, as stated above.
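Given aligned daily arrays of new positive tests and TNAC, the NCP is an element-wise ratio. The sketch below is illustrative, with a hypothetical function name; it returns the NCP as a decimal fraction (as plotted in Fig. 4) and assumes that days with a missing or zero TNAC should be reported as NaN rather than dropped.

```python
import numpy as np


def new_case_percentage(new_cases, tnac):
    """NCP = new positive tests on a day / total number of active cases (TNAC) that day.

    Returned as a decimal fraction; days with missing or zero TNAC give NaN.
    """
    new_cases = np.asarray(new_cases, dtype=float)
    tnac = np.asarray(tnac, dtype=float)
    ncp = np.full(new_cases.shape, np.nan)
    valid = tnac > 0  # NaN compares as False, so missing TNAC stays NaN
    ncp[valid] = new_cases[valid] / tnac[valid]
    return ncp
```

Dividing only where the mask is true avoids division-by-zero warnings and keeps the output aligned day by day with the weather-derived inputs.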
The time series data for each county are separated into a training set and a test set, with the test data set containing the last 4 days of data and the training set containing the remaining data.VAR and RNN models are developed using the training data.Subsequently, the predictive accuracy of the trained models is tested using the test data.
Linear regression analysis uses the settling times and viral viability over the 5 days before the day of interest as input variables (10 in total). VAR uses the settling times, viral viability, and NCP from 1 to n days before the day of interest as input variables, where n is the order of the VAR model, selected by Akaike's Information Criterion. As an autoregressive algorithm, VAR calculates predictions more than one day ahead from its predictions for the preceding days, rather than from the actual data used by the linear regression and RNN models. The simple RNN uses the same input variables as the linear regression model, one hidden layer of 70 nodes, a maximum of 10⁵ epochs and a learning rate of 10⁻⁴. The LSTM model uses the same input variables as the simple RNN, one LSTM layer of 120 units, a maximum of 10⁶ epochs and a batch size of 72. All models use the NCP on the day of interest, which represents the transmission rate, as the response variable.
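For the linear regression model, the lagged inputs and the chronological train/test split described above can be sketched with NumPy least squares. This assumes the 10 inputs are the settling time and viral viability on each of the 5 days preceding the day of interest (lags 1 to 5), consistent with the Fig. 4 caption, and adds an intercept term. It is an illustration rather than the authors' code, and the helper names are hypothetical.

```python
import numpy as np


def lagged_features(settling, viability, n_lags=5):
    """Row for day t = [s(t-1) .. s(t-n_lags), v(t-1) .. v(t-n_lags)]; rows start at t = n_lags."""
    s = np.asarray(settling, dtype=float)
    v = np.asarray(viability, dtype=float)
    rows = [np.concatenate([s[t - n_lags:t][::-1], v[t - n_lags:t][::-1]])
            for t in range(n_lags, len(s))]
    return np.array(rows)


def fit_predict_lr(settling, viability, ncp, n_lags=5, n_test=4):
    """Fit OLS on all but the last n_test days, then predict those held-out days."""
    X = lagged_features(settling, viability, n_lags)
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    y = np.asarray(ncp, dtype=float)[n_lags:]  # response aligned with the feature rows
    X_tr, X_te = X[:-n_test], X[-n_test:]
    y_tr, y_te = y[:-n_test], y[-n_test:]
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return beta, X_te @ beta, y_te
```

The first `n_lags` days have no fitted value because their lagged inputs would fall before the start of the record, matching the statement in the Fig. 4 caption that the first 5 days of April are not fitted.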

Results and discussion
To investigate the gravitational settling of speech-generated droplets, the settling velocity and the dimensionless numbers of the Stokes–Cunningham modification were estimated for droplets of 6 μm size (Fig. S1), which is used as the peak initial size of speech-generated droplets. 35 This size is comparable to the average diameter of cough-generated droplets, 5 μm. 40 We therefore use a size representative of speech-generated droplets, considering the asymptomatic transmission of SARS-CoV-2, 41 which is most contagious before symptom onset. 42 Fig. 2 shows the estimated terminal settling velocities of an evaporated aerosol, together with the associated Reynolds and Knudsen numbers at that size and velocity in ambient air. The density of the aerosol is set to unity in this chart for illustration purposes; the estimated sodium chloride solution density accounting for evaporation is used in producing all fitting and prediction results. Because the Stokes–Cunningham equation is only applicable for Re < 1 and particle sizes > 100 nm, the estimated terminal velocity is accurate for aerosol sizes from 0.1 μm up to approximately 10 μm. This size spectrum is broad enough to encompass the entire range of sizes produced by equilibrating speech-generated aerosols with ambient moisture (vide infra). For the range of sizes shown in Fig. 2b, Kn ≲ 10. Thus, the Epstein or Millikan equations 17,43 are not applicable over the range covered (Fig. 2b). The decreasing trend of Kn with increasing droplet size also confirms the importance of surface slippage at small droplet sizes.
Droplets with an initial size of 6 μm equilibrate with atmospheric moisture and either evaporate into smaller aerosols or grow by condensation into larger droplets, as shown in Fig. 3a and b. Fig. 3a shows the temperature effect on the size of the aerosol after evaporation or condensation, which is negligible within the temperature range seen in the counties investigated. Although a few seconds are needed for droplets to evaporate to their equilibrium size, 32 we assume instantaneous kinetics; thus, the temperature effects demonstrated in this work are expected to be smaller than in reality. Fig. 3b shows the relative humidity effect on the size of the aerosol after evaporation (below 90% relative humidity) or condensation (at 100% relative humidity). As expected, a higher relative humidity corresponds to a larger equilibrium droplet or aerosol size. An initial size of 6 μm yields a droplet of 1 to 10 μm in equilibrium with moisture, and this final droplet size is used to calculate its settling time from a height of 1.5 m, as shown in Fig. 3c. As expected from Fig. 3a, the temperature effects on the settling velocity are minimal. The relative humidity effect on settling time is significant, yielding settling times as short as 1 min at 100% relative humidity and >20 min at <10% relative humidity. The evaporation and settling calculations agree with the classic Wells model. 33,44 Similarly, the SARS-CoV-2 half-life is plotted as a function of ambient temperature and relative humidity in Fig. 3d. Lower temperatures and humidities yield longer viral half-lives; however, the relationship is highly non-linear. This non-linearity poses a challenge to previous models 13,45,46 that use meteorological data directly as input variables. Current transmission models incorporating weather data as input variables have varying goodness of fit and correlation significance, which may be due to how the meteorological variables were used. 29 For example, humidity has been factored into models as relative humidity, 47 absolute humidity, 48 or dew point. The correlation between humidity and transmission may be related to hydrophilic interactions, via hydrogen bonding, between water and the proteins on the outer surface of the SARS-CoV-2 virus. 50 Virus half-lives range from several minutes to over an hour under the typical temperatures and humidities of April. These results underscore the potential effect of weather on airborne virus transmission: weather affects the fate and transport of speech-generated, virus-laden droplets by changing their settling times and viral half-lives, and these intertwined effects may not be captured by a simple linear model.
To establish an effective weather-based model for predicting the COVID-19 epidemic, regression analyses (LR and VAR) and machine learning models (simple RNN and LSTM) were compared for 5 US counties. Fig. 4 shows the time series of the daily case percentage increase in the different US counties. The model fits follow the major trends of the actual data and capture most of their peaks and troughs; the actual data for the last 4 days also fall inside the one-sigma prediction intervals despite the simplicity of the models used. The goodness of fit and the prediction accuracy generally rank as follows: LSTM > simple RNN > LR > VAR (see r² for fitting and the residual sum of squares (RSS) for prediction in Table S1a and b). A key difference between VAR and the other models is that, from the second day of prediction onward, VAR uses auto-regressively predicted settling times and viral viability rather than actual data. The lowest fitting and prediction accuracy of VAR therefore suggests, as expected, that settling times and viral viability predicted from past data are inaccurate, and that accurate weather-derived inputs are required to predict transmission rates accurately. VAR also includes past transmission rate data as an input, which the other models do not. This suggests that past transmission, normalized as a percentage increase, is a less significant input variable for predicting future transmission than the two weather-derived variables. Improved fitting for LSTM over simple RNN suggests that weather more than 5 days earlier affects current transmission. Better fitting and prediction performance of the neural network models compared with LR suggests non-linearity in the relationship between settling time, viral viability, and transmission rate, even though reasonable linear correlations are observed. For example, the r² values for the counties considered vary from 0.36 to 0.80, with an average of 0.59, achieved using input variables that capture two types of weather influence on transmission. Variability in goodness of fit among the counties may be explained by differences among local residents in the delay between symptom onset and getting a COVID-19 test.
To better understand how weather-derived aerosol settling times and viral viability affect transmission, contours of the model predictions are shown in Fig. 5. The ranges of settling times and virus half-lives are determined in part by the local temperature and RH ranges during April for each county studied. Note that the data points used to generate the contour plots are not uniformly distributed inside the contours, and the data to be predicted may not lie within the range of the training data (see Fig. S2). Although UV intensity is not a direct input variable in this model, it correlates positively with temperature 51 and is therefore indirectly taken into account.
Counties generally show faster transmission at longer aerosol settling times or longer virus half-lives. These results indicate that active-virus-laden aerosols are a major pathway for COVID-19 transmission. The only exception was Santa Clara County, which appeared to show faster transmission at low viral viability and short settling times, leading to a less accurate prediction than for the other counties analysed in Fig. 5. Harris, King and Maricopa counties show faster transmission with longer virus half-lives, while LA County shows increased transmission rates at longer settling times. The LR, VAR and simple RNN predictions show clear trends, whereas the LSTM predictions show hotspots of easy transmission in the 2D space of viral viability and aerosol settling time. This may reflect the small training data set used, given the high accuracy of the LSTM fitting and predictions in Fig. 4. The different trends for LA County versus Harris, King and Maricopa counties may result from differences in policies and human behaviour that are not captured by the input variables in this work. In future work, training an LSTM model with sufficient data over a wide range of weather conditions from all seasons may reveal a clear correlation trend similar to those of the LR, VAR and simple RNN models in this work.
The performance of transmission rate prediction based on aerosol settling times and viral viability was also studied with an extended dataset for Maricopa County from May to August 2020, as shown in Fig. 6. The r² values are 0.172, 0.579, and 0.999956 for linear regression, simple RNN, and LSTM, respectively. As with the April data, LSTM has the closest fit, followed by the simple RNN and then linear regression. All three models have similar prediction accuracies, with RSS values of 0.0110, 0.0156, and 0.0160 for linear regression, simple RNN, and LSTM, respectively. The comparable prediction performance of these 3 models is also observed for the April Maricopa County data. The observed new case percentage increase falls within the one-sigma prediction interval over the last 21 days of available data.
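For the linear regression model, a one-sigma prediction interval of the kind shown in Fig. 4 and Fig. 6 can be sketched with the textbook OLS prediction standard error, σ_pred = s·√(1 + x₀ᵀ(XᵀX)⁻¹x₀). This formula is assumed here rather than taken from the paper, and the RNN and LSTM intervals would need a different procedure. The function name is hypothetical, and both design matrices must already include the intercept column.

```python
import numpy as np


def ols_prediction_interval(X_train, y_train, X_new):
    """One-sigma OLS prediction interval: y_hat +/- s * sqrt(1 + x0^T (X^T X)^-1 x0)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    n, p = X_train.shape
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_train - X_train @ beta
    s2 = resid @ resid / (n - p)  # unbiased residual variance estimate
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    leverage = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)  # x0^T (X^T X)^-1 x0 per row
    y_hat = X_new @ beta
    sigma = np.sqrt(s2 * (1.0 + leverage))  # prediction (not mean-response) std. error
    return y_hat, y_hat - sigma, y_hat + sigma
```

The "1 +" term accounts for the scatter of a new observation around the regression line, so this interval is wider than a confidence interval for the mean response.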
The relationship from weather-driven settling times and viral viability to transmission rate found in this work corroborates previous findings that COVID-19 transmission is faster at low temperatures and humidities in major global cities from Nov 2019 to Feb 2020, 52 in the US using state-level data over Jan–Apr 2020, 53 and in Singapore using data from Jan–May 2020. 13 Respiratory droplets can travel three times farther at lower temperatures and higher humidity than in typical dry and hot environments. 54
It should be noted that not all published work supports a link between weather and transmission. Linear machine learning models failed to establish a correlation between state-level (Italy and US) or country-level (rest of the world) transmission and meteorological data. 47 This is most likely due to the non-linearity linking temperature and humidity data to other variables that are important factors in transmission. For similar reasons, a recent multilinear regression model found no significant correlation between temperature, humidity and the basic reproductive number R₀ of transmission. 55 However, the lack of correlation between meteorological data and COVID-19 transmission in China during early 2020 may be a result of strong policy changes overshadowing any weather effects. 56
Other works have analysed the link between virus-laden aerosol settling and SARS-CoV-2 transmission from different perspectives. 5,9,57 Smith et al. provided a useful model that assesses aerosol transmission of SARS-CoV-2 through respiratory droplet physics. 57 Their study calculated the number of virus particles inhaled via indirect airborne transmission from the persistence (settling time) of cough-generated aerosols, and concluded that aerosol transmission is a possible but not efficient route of SARS-CoV-2 transmission. 57 This conclusion, as well as the evidence presented by Stadnytskyi et al. 5 and Anfinrud et al. 9, agrees with the conclusion of the present work to the extent that indirect airborne transmission is a possible route of SARS-CoV-2 transmission. The WHO, in its most recent update (Apr 30, 2021), has also acknowledged aerosol transmission as one of the major routes of SARS-CoV-2 transmission. 58 These approaches often assume that aerosols are homogeneously distributed in the space studied in order to translate aerosol persistence into the amount of aerosol inhaled, which can be far from reality. 35 One advantage of this work is that, by predicting transmission from aerosol persistence (and virus viability) with data analysis tools, homogeneity is not assumed. Because the infection risk assessment is embedded in the data analysis step connecting aerosol persistence and transmission, mathematical infection risk assessment models such as Wells–Riley and dose–response models are also not required in this work. This approach reduces the uncertainties introduced into the model, as the infection threshold of SARS-CoV-2 is still unclear. 59
A key assumption in the models presented is that the timeframes of virus transmission, disease progression, test-to-result, and hospitalization are consistent across the studied population, location, and time span. However, these timeframes could actually fluctuate and thus undermine the accuracy of our model predictions. For example, reported COVID-19 case data may have inherent time delays due to shortages of test kits. Delays are an important parameter in this study, and thus the model fitting residuals associated with this input variable cannot be eliminated. Another underlying assumption of this study is that the fraction of asymptomatic infections among total infections is constant; however, to the best of our knowledge, this fraction is still unknown. Our models also contain simplifications that may be additional sources of error. These include the assumptions that a sodium chloride solution, used as a surrogate for physiological fluids, is a good proxy for virus-laden aerosols, and that the surface tension of an aerosol droplet is only a function of its temperature and solute concentration. The neural network models initialize the parameters of each neuron randomly, and the optimized result can depend on this initial set of parameters if it is far from the optimal set.
Although the models in this work use outdoor weather input variables while transmission can occur indoors, outdoor temperature correlates positively with the indoor environment. 60,61 The strength of this relationship (the slope of the linear regression), however, depends on the season and location. For example, in Massachusetts the regression slope between indoor and outdoor temperature is ~0.04 at outdoor temperatures below ~10 °C and ~0.41 above ~10 °C, 62 while in South Korea it is ~0.13 below ~15 °C and ~0.47 above ~15 °C. 63 Indoor absolute humidity also tracks outdoor humidity across seasons and diverse locations. 61,64,65 As a result, the outdoor transmission risk predicted in this work tracks with, and can be used as a surrogate for, the indoor transmission risk.
Control measures such as mandatory mask-wearing and lockdowns are not accounted for by the two input variables in this work. We limit the scope of Fig. 4 to April, when the nationwide lockdown was still in effect, to minimize the influence of such measures on transmission. The extended-time analysis of Maricopa County for May–August in Fig. 6 has lower fitting and prediction accuracy than the April results, as shown in Table S1c. The lower accuracy for longer periods of analysis may result from encompassing more non-weather-related events, such as a significant increase in mask-wearing and the mass public protests of 2020. Although it is possible that the models presented in this work are not capable of handling data over longer times, RNN models typically benefit from additional training data, which improves prediction accuracy. They are expected to achieve improved prediction performance over longer study periods if non-weather-related events are represented in the model.

Conclusions
Seasonality of airborne COVID-19 transmission may be explained in part by weather-induced changes in aerosol settling times and virus viability. We use Stokes' sedimentation model with a Cunningham correction factor for surface slippage to estimate the settling times of speech-produced aerosols after evaporation for Re < 1 and Kn ≲ 1. SARS-CoV-2 viral viability is estimated from local historical weather data using an empirical relationship. Linear regression, vector autoregression, and recurrent neural network models using the settling time, viral viability and past transmission rate successfully predict future transmission rates within one-sigma prediction intervals. Our results indicate that transmission via airborne speech-generated aerosols is a significant route for SARS-CoV-2. Including aerosol settling time and viral viability derived from historical weather data as input variables can improve the accuracy of transmission rate prediction. Corroborating publications and public actions over the past year, the findings of this study support the implementation of control measures, including social distancing, enforced mask wearing, and systematic preventive measures such as improved ventilation in both community and healthcare settings.
Overall, the evidence on the influence of weather on transmission has been contradictory and inconclusive. We note that the present work does not aim to prove that aerosol settling time and virus viability are the only important factors in predicting transmission rate. The fitting and prediction performance of the models presented suggests that weather plays a considerable role in transmission. Thus, the incorporation of weather-derived, transmission-mechanism-based input variables, including aerosol settling times and virus viability, into epidemiological prediction models may be worth further investigation. Future model development should also include additional variables that play a role in airborne or surface-based transmission, such as wind speed, turbulence (especially turbulence created by speech, which can lengthen the suspension time by a factor of 30–150 66), and UVB intensity. Datasets should include more locations outside the US, where weather systems may differ. Furthermore, the study periods can be extended to allow better training of the machine learning algorithms.

Fig. 1
Fig. 1 (a) Illustration showing the model data flow in this work.(b) Typical COVID-19 progression around the date of positive test result.The three periods are: the pre-symptomatic contagious period, the wait period to obtain the test result after taking the test, and the recovery period at the end of which the patient is modelled as either recovered and no longer contagious, or entering the intensive care unit (ICU) and isolated from the public.We assume that the patient takes the test at the onset of symptoms.Under this assumption, a positively tested patient is considered contagious in our model from 4 days before until 12 days after the positive test result.

For N₂, the van der Waals molecular diameter d in eqn (6) is 3.10 × 10⁻¹⁰ m, and the molecular number density N_V is 2.46 × 10²⁵ m⁻³ at 25 °C and 1 atm total pressure, giving λ = 95 nm for air at 25 °C.

Fig. 2
Fig. 2 (a) Calculated settling velocities of aerosols of varying sizes using Stokes-Cunningham law.(b) The Reynolds number (Re) and Knudsen number (Kn) of droplets of varying sizes.At Kn < 10, the Stokes-Cunningham law is the most applicable first-principle relationship to calculate settling velocity.

Fig. 3
Fig. 3 Evaporated aerosol sizes derived from the Köhler equation based on an initial size at different ambient (a) temperatures at 50% relative humidity and (b) relative humidities at 25 °C. (c) Calculated settling times obtained from the empirical model using 6 μm as the initial droplet size and (d) viral viability at different ambient temperatures and relative humidities.

Fig. 4
Fig. 4 Time series of the daily case percentage increase (in decimal format) for April 2020 in the counties studied. The predicted daily case increases for the last 4 days are shown as triangles with their associated one-sigma prediction intervals. Dashed lines show the model fitting from the 6th to the 25th day of April. No fitted values are obtained for the first 5 days because they would require weather data from March (up to 5 days prior). LR: linear regression; VAR: vector autoregression; simple RNN: simple recurrent neural network; LSTM: long short-term memory recurrent neural network. The green filled areas represent 95% confidence intervals for LR predictions. The blue patterned areas represent 95% confidence intervals for VAR predictions.

Fig. 5
Fig. 5 Contour plots of the daily case percentage increase as a function of settling time and viral viability (represented by half-life) for different counties. Colour shows the daily case percentage increase in decimal. LR: linear regression; VAR: vector autoregression; simple RNN: simple recurrent neural network; LSTM: long short-term memory recurrent neural network.

Fig. 6
Fig. 6 (a) Fitting and (b) predicted daily new case percentage increase for Maricopa County from May to August 2020. Gaps in (a) are due to gaps in the NOAA weather history data. Error bars show one-sigma prediction intervals. The training data in (a) span 75 days and the testing data in (b) span 21 days. LR: linear regression; simple RNN: simple recurrent neural network; LSTM: long short-term memory recurrent neural network. The green filled areas represent 95% confidence intervals for LR predictions.