Predicting the start week of respiratory syncytial virus outbreaks using real time weather variables
© Walton et al; licensee BioMed Central Ltd. 2010
Received: 21 April 2010
Accepted: 2 November 2010
Published: 2 November 2010
Respiratory Syncytial Virus (RSV), a major cause of bronchiolitis, has a large impact on the census of pediatric hospitals during outbreak seasons. Reliable prediction of the week these outbreaks will start, based on readily available data, could help pediatric hospitals better prepare for large outbreaks.
Naïve Bayes (NB) classifier models were constructed using weather data from 1985-2008, considering only variables that are available in real time, to forecast the week in which an RSV outbreak will occur in Salt Lake County, Utah. Outbreak start dates were determined by a panel of experts using 32,509 records with ICD-9 coded RSV and bronchiolitis diagnoses from Intermountain Healthcare hospitals and clinics for the RSV seasons from 1985 to 2008.
NB models predicted RSV outbreaks up to 3 weeks in advance with an estimated sensitivity of up to 67% and estimated specificities as high as 94% to 100%. Temperature and wind speed were the best overall predictors, but other weather variables also showed relevance depending on how far in advance the predictions were made. The weather conditions predictive of an RSV outbreak in our study were similar to those that lead to temperature inversions in the Salt Lake Valley.
We demonstrate that Naïve Bayes (NB) classifier models based on weather data available in real time have the potential to serve as effective predictive models. These models may be able to predict, with clinical relevance, the week in which an RSV outbreak will occur. Their clinical usefulness will be field tested over the next five years.
Bronchiolitis is a major cause of hospital admissions during the winter and can cause severe hospital overcrowding. Respiratory syncytial virus (RSV) is a respiratory virus that can cause severe infection in infants and young children and is the leading cause of bronchiolitis in children under one year of age in the United States [1–4]. RSV outbreaks cause a significant increase in hospital admissions during the winter season. The ability to predict the start date of an RSV outbreak using readily available data may allow management strategies to be implemented in a more timely, effective, and efficient fashion. Possible improvements include better staff scheduling, better rescheduling of elective procedures, anticipatory resource utilization and mobilization of respiratory supplies, better timing of control measures, and better timing for restricting visitation and grouping patients [5, 6].
RSV has a characteristic biennial outbreak pattern with alternating low peak and high peak seasons. Although exceptions to this biennial pattern have been reported in the literature [7–11], we observed the normal biennial pattern in all the data used in this study. Mathematical models have shown that this pattern of high peak and low peak outbreaks can be explained by variation in the number of susceptible individuals in the population, which is considerably diminished following a large outbreak [1, 4, 12]. 'High peak' seasons tend to occur earlier in the year, have a higher peak, and have a shorter duration. In contrast, 'low peak' seasons occur later, have lower peaks, and last longer. If, as the mathematical models suggest, the different patterns observed for high and low peak seasons are due to the size of the susceptible population, one would also expect a greater lag in a low peak season between the outbreak stimulus and the exponential growth of confirmed RSV cases. The inference is that the outbreak takes longer to infect a large number of individuals because of the immunity already present in the population after the large outbreak of the previous year. Therefore, it is important to develop independent models for high peak and low peak years.
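Building separate models requires each season to be labeled high peak or low peak before training. How the seasons were labeled is not described here, so the following is only a minimal sketch under a hypothetical rule: a season counts as high peak if its peak weekly case count is at or above the median peak across all seasons. The function `split_high_low` and the example counts are made up for illustration.

```python
from statistics import median


def split_high_low(peaks):
    """Split seasons into high peak and low peak groups.

    peaks maps a season's start year to its peak weekly case count.
    Hypothetical rule (an assumption, not the study's method): a season is
    'high peak' if its peak is at or above the median peak of all seasons.
    """
    cutoff = median(peaks.values())
    high = {season for season, peak in peaks.items() if peak >= cutoff}
    low = set(peaks) - high
    return high, low


# Made-up peak counts with an alternating (biennial) pattern.
example_peaks = {1985: 500, 1986: 200, 1987: 520, 1988: 180, 1989: 610}
high, low = split_high_low(example_peaks)
print(sorted(high), sorted(low))  # [1985, 1987, 1989] [1986, 1988]
```

With a clean biennial pattern and an odd number of seasons, the `>=` comparison places the lowest high peak season (which sits at the median) in the high group.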
Although existing mathematical models describe RSV outbreaks, predicting outbreak start dates remains an unsolved problem [1, 3, 4, 12]. Current methods of identifying the outbreak start date are all based on retrospective data. To our knowledge, no studies published in the biomedical literature attempt to predict the start week of an RSV outbreak using variables that can be acquired in real time and that are appropriate for inclusion in a forecast model usable in a clinical setting. Among these variables, weather-related variables (such as temperature, humidity, and precipitation) have great potential to be predictive because many studies have correlated them with RSV outbreaks [13–16]. To construct a predictive model for RSV outbreaks, one must consider the small training set that results from the lack of long-term epidemiological records and from the need to separate the data into low and high peak seasons. Therefore, the objective of our investigation was to explore the feasibility of using weather variables with Naïve Bayes (NB) classifiers to predict RSV outbreaks and the concomitant increase in RSV-related admissions to a pediatric hospital.
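NB classifiers suit small training sets because they assume the predictors are conditionally independent given the class, so each class-conditional distribution is estimated one variable at a time and the number of parameters stays small. The following is a minimal sketch of a Gaussian NB classifier in plain Python. The Gaussian likelihood, the `var_floor` variance floor, the class name, and the toy weekly features are all assumptions for illustration; this is not necessarily the NB variant or configuration used in the study.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Illustrative Gaussian NB: within each class, every feature is modeled
    as an independent normal distribution (an assumption of this sketch)."""

    def fit(self, X, y, var_floor=1e-6):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(y)
        self.params = {}
        for label, rows in groups.items():
            prior = len(rows) / n
            stats = []
            for column in zip(*rows):  # one feature at a time
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                stats.append((mean, max(var, var_floor)))  # avoid zero variance
            self.params[label] = (prior, stats)
        return self

    def log_posterior(self, row):
        """Unnormalized log posterior: log prior + sum of per-feature log likelihoods."""
        scores = {}
        for label, (prior, stats) in self.params.items():
            score = math.log(prior)
            for x, (mean, var) in zip(row, stats):
                score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


# Hypothetical weekly features: [mean temperature (deg C), mean wind speed (m/s)].
# Label 1 = outbreak starts in the target week, 0 = no outbreak start (made-up data).
X = [[-2.0, 1.5], [-3.0, 1.0], [-1.0, 2.0], [8.0, 4.0], [10.0, 5.0], [7.0, 3.5]]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([-2.5, 1.2]))  # 1
print(model.predict([9.0, 4.5]))   # 0
```

Working in log space keeps the product of many small likelihoods from underflowing; with eight weather variables and two classes, the model above would estimate only 32 means and variances plus two priors.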
Weather data from 1985 to 2008 were obtained from the National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center (http://cdo.ncdc.noaa.gov/CDO/cdo). Data from the Salt Lake International Airport weather station were used to represent weather for the entire Salt Lake County region.
Data representing the diagnosis of RSV were obtained from the Intermountain Healthcare Enterprise Data Warehouse (EDW). We selected records for patients residing in Salt Lake County, Utah, who met either of the following criteria: 1) laboratory confirmation of RSV by viral culture, direct fluorescent antibody (DFA) testing, and/or polymerase chain reaction (PCR); or 2) a discharge diagnosis coded with a bronchiolitis or RSV-related ICD-9 code, including 466 (acute bronchitis and bronchiolitis), 466.1 (acute bronchiolitis), 466.11 (acute bronchiolitis due to respiratory syncytial virus [RSV]), and 466.19 (acute bronchiolitis due to other infectious organism). We selected patients of all ages meeting these criteria; patient ages ranged from newborn (0 days) to 95 years, with a median age of 6 months.
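The inclusion rule is a simple OR of two criteria applied to county residents. A minimal sketch follows; the record layout (`county`, `lab_positive_methods`, `discharge_icd9`), the method labels, and the function name are hypothetical and do not reflect the EDW schema. ICD-9 codes are matched exactly as listed above.

```python
# ICD-9 codes named in the selection criteria, matched exactly as listed.
RSV_ICD9_CODES = {"466", "466.1", "466.11", "466.19"}
# Laboratory methods accepted as confirmation of RSV (hypothetical labels).
LAB_METHODS = {"culture", "DFA", "PCR"}


def meets_criteria(record):
    """Return True if a record meets either inclusion criterion.

    `record` is a hypothetical dict with keys 'county', 'lab_positive_methods'
    (methods that returned a positive RSV result) and 'discharge_icd9'.
    """
    if record.get("county") != "Salt Lake":
        return False
    lab_confirmed = bool(set(record.get("lab_positive_methods", ())) & LAB_METHODS)
    coded = bool(set(record.get("discharge_icd9", ())) & RSV_ICD9_CODES)
    return lab_confirmed or coded


records = [
    {"county": "Salt Lake", "lab_positive_methods": ["PCR"]},
    {"county": "Salt Lake", "discharge_icd9": ["466.11"]},
    {"county": "Utah", "discharge_icd9": ["466.11"]},
    {"county": "Salt Lake", "discharge_icd9": ["486"]},
]
print([meets_criteria(r) for r in records])  # [True, True, False, False]
```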
For the purpose of this study, an RSV season was defined as September 20th to July 15th of the following year. We used the diagnostic ICD-9 codes to select records from 1985 through 2008, providing data for 12 high peak and 11 low peak seasons. We used the laboratory criteria defined above to select records from 2002 through 2008. Viral testing laboratory data were not available prior to 2001, and viral testing was performed routinely on patients seen at Primary Children's Medical Center (PCMC) after 2003. Between August 24, 2004 and June 26, 2008, RSV-related laboratory results and ICD-9 codes for non-specific bronchiolitis were highly correlated (Kendall tau correlation statistic = 0.78; Additional file 1 superimposes the RSV and ICD-9 case counts for this period). This indicates that ICD-9 codes are a reasonable proxy for positive laboratory tests. Although some published results show that outbreaks of other respiratory pathogens, such as human metapneumovirus (HMPV), or dual outbreaks may invalidate this assumption, the ICD-9 signal observed in our data shows no evidence of these issues, which gives us confidence that using ICD-9 codes as a proxy for RSV cases is valid for our study.
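Two computations in this paragraph are easy to pin down in code: assigning a date to an RSV season, and the Kendall tau statistic comparing weekly laboratory-confirmed and ICD-9 case counts. The sketch below implements the tau-b variant, which corrects for ties (common in weekly counts), in plain Python. Which tau variant the study used is not stated, and the function names and example counts are hypothetical.

```python
import math
from datetime import date


def season_of(d):
    """Return the start year of the RSV season containing date d, or None.

    A season runs from September 20 of one year to July 15 of the next.
    """
    if d >= date(d.year, 9, 20):
        return d.year
    if d <= date(d.year, 7, 15):
        return d.year - 1
    return None  # July 16 to September 19 falls outside every season


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation with tie correction, O(n^2) pairs."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from numerator and denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only) *
                      (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


lab_counts = [3, 5, 12, 20, 15, 7]      # made-up weekly lab-confirmed cases
icd9_counts = [10, 14, 30, 55, 41, 22]  # made-up weekly ICD-9 coded cases
print(season_of(date(2004, 9, 20)), season_of(date(2005, 7, 15)), season_of(date(2005, 8, 1)))
# 2004 2004 None
print(kendall_tau_b(lab_counts, icd9_counts))  # 1.0
```

Because tau depends only on the ordering of weeks, it rewards two series that rise and fall together even when, as here, the ICD-9 counts are consistently larger than the laboratory counts.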
Determination of the Start Week of the Outbreak
Weather Variable Selection
To select weather variables to be considered for the prediction model, we reviewed the literature and the availability and utility of the variables identified. We identified 40 publications that reported either a positive or negative relationship between RSV activity and weather. Across these publications, the number reporting a correlation versus no correlation with RSV outbreaks (correlated:not correlated) was: humidity (16:12), dew point (4:5), temperature (26:9), wind speed (0:3), wind chill (1:0), atmospheric pressure (3:6), precipitation (8:8), and UV light exposure (7:8). We discarded UV light as a potential variable because insufficient data were available for inclusion in our prediction model. We discarded dew point and wind chill because these variables can be derived from variables already included in the model (dew point from temperature and humidity, and wind chill from wind speed and temperature). The remaining weather variables were retained for consideration in our models. We used two measurements to represent humidity (daily minimum relative humidity and daily maximum relative humidity) and three measurements to represent temperature (mean daily temperature, minimum daily temperature, and maximum daily temperature). Precipitation, atmospheric pressure, and wind speed were each represented by a single daily measurement. Attempts to use feature selection methods to further reduce the number of variables in our models were unsuccessful. Therefore, the following eight measurements (variables) were included in our model: daily minimum relative humidity, daily maximum relative humidity, mean daily temperature, minimum daily temperature, maximum daily temperature, total daily precipitation, average daily atmospheric pressure, and average daily wind speed.
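Dropping dew point and wind chill rests on both being deterministic functions of retained variables. The sketch below makes that concrete with two standard formulas: the Magnus approximation for dew point and the 2001 US National Weather Service wind chill index. These are common textbook forms chosen for illustration, not necessarily the formulas behind the archived NOAA values, and the function names are hypothetical.

```python
import math


def dew_point_c(temp_c, rel_humidity_pct, b=17.62, c=243.12):
    """Dew point (deg C) from air temperature (deg C) and relative humidity (%),
    via the Magnus approximation; b and c are commonly used Magnus coefficients."""
    gamma = math.log(rel_humidity_pct / 100.0) + b * temp_c / (c + temp_c)
    return c * gamma / (b - gamma)


def wind_chill_f(temp_f, wind_mph):
    """US National Weather Service (2001) wind chill index in deg F.

    The formula is defined for temp_f <= 50 and wind_mph >= 3; outside that
    range this sketch simply returns the air temperature (a convention choice).
    """
    if temp_f > 50 or wind_mph < 3:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


print(round(dew_point_c(10.0, 100.0), 6))  # 10.0 (saturated air: dew point equals temperature)
print(round(wind_chill_f(0.0, 15.0), 1))   # -19.4
```

Since each derived quantity is computed entirely from variables already in the model, including it as well would add a redundant, strongly dependent input, which is at odds with the independence assumption of an NB classifier.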
Naive Bayes Classifier
Institutional Review Board approval to perform this study was obtained from Intermountain Healthcare and the University of Utah.
As discussed above, in most cases no single model could clearly be considered the best for predicting the start of an outbreak one, two or three weeks in advance. The tables in Additional file 2 and Figures 4 and 5 show that numerous models can be considered equally appropriate, depending on the selection criteria (e.g., sensitivity, specificity, different combinations of the two, or performance on the test set versus the training set). Given the limited number of outbreaks in our testing data, this finding is not surprising; a larger set of testing data is necessary to describe performance adequately.
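One simple way to frame predictions made one, two or three weeks in advance is to shift the label relative to the weather features, so that the selection criteria reduce to sensitivity and specificity over weeks. The sketch below assumes weekly binary labels in which only the week `lead` weeks before the outbreak start is positive; the function names and the toy season data are hypothetical. It also shows why a small test set yields coarse estimates: with three outbreak weeks, sensitivity can only be 0%, 33%, 67% or 100%.

```python
def lead_labels(n_weeks, start_week, lead):
    """Hypothetical labeling for a `lead`-week-ahead model: week t is positive
    (1) only if the outbreak starts in week t + lead."""
    return [1 if week + lead == start_week else 0 for week in range(n_weeks)]


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Three made-up test seasons of four weeks each, concatenated.
actual = [0, 0, 1, 0] + [0, 1, 0, 0] + [0, 0, 0, 1]
predicted = [0, 0, 1, 0] + [0, 1, 0, 0] + [1, 0, 0, 0]
sens, spec = sensitivity_specificity(actual, predicted)
print(round(sens, 2), round(spec, 2))  # 0.67 0.89
```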
Individual variable performance
Percentage of the highest performance models (minimum sensitivity of 67% and minimum specificity of 94%) that include each variable, for each advance prediction, in high peak seasons; variables listed include average daily pressure, average daily wind speed, and average daily temperature.
For low peak seasons, the analysis of the weather variables included only the models that can reasonably predict an outbreak. The predictions for the week of the outbreak and one week in advance are probably not significant, since each has only one variable combination that achieves the top performance and both perform much worse than any other model considered here. Among the top performing models for low peak seasons, wind speed appears in over 50% of the best models for predictions two weeks in advance and drops to less than 20% for predictions three weeks in advance. Minimum temperature appears in almost 60% of the models for predictions two and three weeks in advance. As in high peak seasons, precipitation was not present in any of the top performing models for low peak seasons. Atmospheric pressure appears in 51% of the models predicting two weeks in advance and drops to 22% of the models predicting three weeks in advance. Maximum relative humidity appears in 51% of the models predicting two weeks in advance, while minimum relative humidity appears in only 2%; for predictions three weeks in advance, both appear in only 22% of the top performing models.
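Percentages like those above come from a simple tally. Candidate models are filtered by performance, then the tally counts how often each variable, or any member of a group such as the three temperature measurements, appears among the survivors. The following is a sketch under stated assumptions. Candidate models are taken to be all non-empty subsets of the eight variables. Thresholds are compared after rounding to whole percentages, so a sensitivity of 2/3 counts as 67%. The short variable names and the performance figures are made up.

```python
from itertools import combinations

# Hypothetical short names for the eight weather measurements used in the models.
VARIABLES = ["min_rh", "max_rh", "mean_temp", "min_temp", "max_temp",
             "precipitation", "pressure", "wind_speed"]
TEMPERATURE = {"mean_temp", "min_temp", "max_temp"}

# Assumption: candidate models are all non-empty subsets of the eight variables.
CANDIDATE_SUBSETS = [frozenset(c) for r in range(1, len(VARIABLES) + 1)
                     for c in combinations(VARIABLES, r)]


def top_models(results, min_sens_pct=67, min_spec_pct=94):
    """Keep subsets whose sensitivity and specificity, rounded to whole
    percentages, meet both thresholds. results maps frozenset -> (sens, spec)."""
    return [subset for subset, (sens, spec) in results.items()
            if round(100 * sens) >= min_sens_pct and round(100 * spec) >= min_spec_pct]


def inclusion_pct(top):
    """Percentage of top models that include each individual variable."""
    if not top:
        return {}
    return {v: 100.0 * sum(v in s for s in top) / len(top) for v in VARIABLES}


def any_inclusion_pct(top, group):
    """Percentage of top models that include at least one variable from group."""
    if not top:
        return 0.0
    return 100.0 * sum(bool(s & group) for s in top) / len(top)


# Made-up performance figures for illustration only.
results = {
    frozenset({"min_temp", "wind_speed"}): (2 / 3, 0.95),
    frozenset({"min_temp", "pressure"}): (2 / 3, 0.94),
    frozenset({"max_temp", "max_rh"}): (2 / 3, 0.97),
    frozenset({"precipitation"}): (1 / 3, 0.99),
}
top = top_models(results)
pct = inclusion_pct(top)
print(len(CANDIDATE_SUBSETS), len(top))                 # 255 3
print(round(pct["min_temp"], 1), pct["precipitation"])  # 66.7 0.0
print(any_inclusion_pct(top, TEMPERATURE))              # 100.0
```

The group tally is how a statement such as "one of the three temperature variables appears in over 90% of the best models" can hold even when no single temperature variable reaches that share on its own.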
Overall, the input variable temperature, expressed as maximum, average or minimum, consistently appears in the best predictive models, with at least one of the three temperature variables appearing in over 90% of the best performing models for every prediction attempted. The importance of temperature as a predictive variable increases the further in advance the predictions are made. Atmospheric pressure appears to be an important factor for predicting the outbreak in its own week and one week in advance, but it appears less frequently in models predicting two and three weeks in advance. These results show that the variables most commonly found in the best models for predicting RSV outbreaks are similar to those associated with the development of temperature inversions in the Salt Lake Valley. It is important to note that these inversions are always associated with a severe increase in levels of air pollutants that have been consistently correlated with respiratory health issues [24–26].
To further our investigation, we attempted to use air pollution indicators as predictive factors for RSV outbreaks. Several attempts to develop NB models using air pollution variables (e.g., PM10 and the concentrations of CO and SO2) were unsuccessful. Unfortunately, the available data on concentrations of PM2.5, an air pollution indicator that has shown some association with bronchiolitis [27–29], are not sufficient to train an NB classifier appropriately. Analysis of only nine years of PM2.5 data suggested that the concentration of these particles may have predictive value for forecasting RSV outbreaks, but definitive answers must wait until sufficient years of retrospective PM2.5 data become available.
According to our literature review, wind speed has not been reported as a good predictor for RSV outbreaks. In contrast, wind speed was predictive in our analysis. The unique geography of the Salt Lake Valley contributing to the common occurrence of inversions in winter may account for this discrepancy, as wind (or lack thereof) plays a vital role in the creation of inversions.
The differences in findings between high and low peak outbreak years agree well with what is expected if the biennial pattern of RSV outbreaks results from changes in the susceptible population. In low peak years, the number of susceptible individuals is lower than in high peak years; therefore, even if the meteorological conditions to start an outbreak exist, the outbreak is expected to take longer to spread because of herd immunity, since transmission is slower when more of the population is immune. This agrees very well with our relative lack of success, during low peak seasons, in predicting outbreaks in the week the outbreak occurred and one week in advance. In contrast, during high peak years, the number of cases ramps up quickly once meteorological conditions appropriate for starting the outbreak exist, which leads to good predictive power for these short-term predictions.
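The argument that a smaller susceptible pool slows the ramp-up can be illustrated with a textbook SIR model. In the sketch below, the transmission rate `beta`, recovery rate `gamma`, initial infected fraction, and detection threshold are arbitrary illustrative values, not estimates for RSV in Salt Lake County, and `weeks_to_threshold` is a hypothetical name. Lowering the initial susceptible fraction from 0.9 to 0.6 reduces the effective reproduction number from 1.8 to 1.2. It also lengthens the time the same initial seeding needs to reach the threshold, which mirrors the longer stimulus-to-outbreak lag expected in low peak seasons.

```python
def weeks_to_threshold(s0, i0=1e-4, beta=2.0, gamma=1.0, threshold=1e-3,
                       dt=0.01, max_weeks=52.0):
    """Basic SIR model in population fractions, integrated with forward Euler.

    s0: initial susceptible fraction; beta, gamma: hypothetical transmission and
    recovery rates per week (R0 = beta / gamma = 2). Returns the first time (in
    weeks) at which the infected fraction reaches `threshold`, or None if it
    does not within `max_weeks`.
    """
    s, i, t = s0, i0, 0.0
    while t < max_weeks:
        if i >= threshold:
            return t
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        t += dt
    return None


high_susceptible = weeks_to_threshold(0.9)  # effective R = 2 * 0.9 = 1.8
low_susceptible = weeks_to_threshold(0.6)   # effective R = 2 * 0.6 = 1.2
print(high_susceptible, low_susceptible)    # the second value is larger
```

If the susceptible fraction falls below gamma / beta (0.5 here), the effective reproduction number drops below 1 and the function returns None, because infections decline instead of growing.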
Our study has limitations. Data limitations allowed only rough performance estimates for individual models. Because of the limited number of seasons on which the models were tested, large changes in sensitivity and specificity may reflect the limited data rather than actual model performance. The question of which model to implement in practical applications remains unanswered; it will depend on refined performance estimates as more data become available, or on the desire to select models with higher sensitivity or higher specificity to meet the operational needs of the hospital. Finally, we acknowledge that serious outbreaks of other viral pathogens can interfere with RSV outbreaks, making the predictions of a weather-based model less reliable. Despite these limitations, the results validate our decision to use different models for high and low peak seasons and are consistent with existing models that explain high and low peak seasons by the size of the susceptible population. To address the limitations we identified, we will evaluate the prediction models prospectively in the Salt Lake County region over the next five years.
Use of the results presented here in other geographic locations would require NB training with local weather data to account for the differing characteristics of RSV outbreaks in regions with different climates and geographic features. However, it is likely that the same climate variables and methods could be used to build a predictive model in other locations, particularly those with climate and geography similar to the Salt Lake County area.
Weather-related measurements available in real time have the potential to predict RSV outbreaks. Our results are consistent with previous studies that indicated low peak and high peak outbreak seasons have different population dynamics, which most likely would lead to a lag between stimulus and event for low peak outbreak seasons. In our study, it appears that weather conditions that lead to outbreaks may be conditions that also lead to the establishment of a temperature inversion in the Salt Lake Valley, which in turn creates a condition of more polluted air known to impact respiratory health. In the future, NB prediction models should include pollution measurements as inputs, and validation should be performed prospectively.
NW was supported by the National Library of Medicine (NLM) Medical Informatics Training Grant #T15-LM007124-11. PG, CS and JCF were partially supported by the Rocky Mountain Center of Excellence in Public Health Informatics # 1P01HK000069-10.
- Arenas AJ, Gonzalez-Parra G, Morano JA: Stochastic modeling of the transmission of respiratory syncytial virus (RSV) in the region of Valencia, Spain. Biosystems. 2009, 96 (3): 206-212. 10.1016/j.biosystems.2009.01.007.
- Lyon JL, Stoddard G, Ferguson D, Caravati M, Kaczmarek A, Thompson G, Hegmann K, Hegmann C: An every other year cyclic epidemic of infants hospitalized with respiratory syncytial virus. Pediatrics. 1996, 97 (1): 152-153.
- Panozzo CA, Fowlkes AL, Anderson LJ: Variation in timing of respiratory syncytial virus outbreaks: lessons from national surveillance. Pediatr Infect Dis J. 2007, 26 (11 Suppl): S41-45.
- Weber A, Weber M, Milligan P: Modeling epidemics caused by respiratory syncytial virus (RSV). Math Biosci. 2001, 172 (2): 95-113. 10.1016/S0025-5564(01)00066-9.
- Bont L: Nosocomial RSV infection control and outbreak management. Paediatr Respir Rev. 2009, 10 (Suppl 1): 16-17. 10.1016/S1526-0542(09)70008-9.
- Forbes M: Strategies for preventing respiratory syncytial virus. Am J Health Syst Pharm. 2008, 65 (23 Suppl 8): S13-19. 10.2146/ajhp080440.
- Mlinaric-Galinovic G, Welliver R, Vilibic-Cavlek T, Ljubin-Sternak S, Drazenovic V, Galinovic I, Tomic V: The biennial cycle of respiratory syncytial virus outbreaks in Croatia. Virology Journal. 2008, 5 (1): 18. 10.1186/1743-422X-5-18.
- Terletskaia-Ladwig E, Enders G, Schalasta G, Enders M: Defining the timing of respiratory syncytial virus (RSV) outbreaks: an epidemiological study. BMC Infectious Diseases. 2005, 5 (1): 20. 10.1186/1471-2334-5-20.
- Galiano MC, Palomo C, Videla CM, Arbiza J, Melero JA, Carballal G: Genetic and Antigenic Variability of Human Respiratory Syncytial Virus (Groups A and B) Isolated over Seven Consecutive Seasons in Argentina (1995 to 2001). J Clin Microbiol. 2005, 43 (5): 2266-2273. 10.1128/JCM.43.5.2266-2273.2005.
- Peret T, Hall C, Schnabel K, Golub J, Anderson L: Circulation patterns of genetically distinct group A and B strains of human respiratory syncytial virus in a community. J Gen Virol. 1998, 79 (9): 2221-2229.
- Goddard NL, Cooke MC, Gupta RK, Nguyen-Van-Tam JS: Timing of monoclonal antibody for seasonal RSV prophylaxis in the United Kingdom. Epidemiol Infect. 2007, 135 (1): 159-162. 10.1017/S0950268806006601.
- Capistran MA, Moreles MA, Lara B: Parameter estimation of some epidemic models. The case of recurrent epidemics caused by respiratory syncytial virus. Bull Math Biol. 2009, 71 (8): 1890-1901. 10.1007/s11538-009-9429-3.
- du Prel JB, Puppe W, Grondahl B, Knuf M, Weigl JA, Schaaff F, Schmitt HJ: Are meteorological parameters associated with acute respiratory tract infections?. Clin Infect Dis. 2009, 49 (6): 861-868. 10.1086/605435.
- Welliver R: The relationship of meteorological conditions to the epidemic activity of respiratory syncytial virus. Paediatr Respir Rev. 2009, 10 (Suppl 1): 6-8. 10.1016/S1526-0542(09)70004-1.
- Omer SB, Sutanto A, Sarwo H, Linehan M, Djelantik IG, Mercer D, Moniaga V, Moulton LH, Widjaya A, Muljati P: Climatic, temporal, and geographic characteristics of respiratory syncytial virus disease in a tropical island population. Epidemiol Infect. 2008, 136 (10): 1319-1327. 10.1017/S0950268807000015.
- Noyola DE, Mandeville PB: Effect of climatological factors on respiratory syncytial virus epidemics. Epidemiol Infect. 2008, 136 (10): 1328-1332. 10.1017/S0950268807000143.
- Aberle SW, Aberle JH, Sandhofer MJ, Pracher E, Popow-Kraupp T: Biennial Spring Activity of Human Metapneumovirus in Austria. The Pediatric Infectious Disease Journal. 2008, 27 (12): 1065-1068. 10.1097/INF.0b013e31817ef4fd.
- Respiratory syncytial virus activity--United States, July 2007-December 2008. MMWR Morb Mortal Wkly Rep. 2008, 57 (50): 1355-1358.
- Watkins RE, Eagleson S, Hall RG, Dailey L, Plant AJ: Approaches to the evaluation of outbreak detection methods. BMC Public Health. 2006, 6: 263. 10.1186/1471-2458-6-263.
- Watkins RE, Eagleson S, Veenendaal B, Wright G, Plant AJ: Applying cusum-based methods for the detection of outbreaks of Ross River virus disease in Western Australia. BMC Med Inform Decis Mak. 2008, 8: 37. 10.1186/1472-6947-8-37.
- Siegrist D, Pavlin J: Bio-ALIRT biosurveillance detection algorithm evaluation. MMWR Morb Mortal Wkly Rep. 2004, 53 (Suppl): 152-158.
- Walton NW: Methods and Tools For RSV Outbreak Prediction. Unpublished Master's Thesis. 2010, Salt Lake City, UT: University of Utah.
- Clements C, Whiteman C, Horel J: Cold-air-pool structure and evolution in a mountain basin: Peter Sinks, Utah. Journal of Applied Meteorology. 2003, 42 (6): 752-768. 10.1175/1520-0450(2003)042<0752:CSAEIA>2.0.CO;2.
- Fuentes-Leonarte V, Tenias JM, Ballester F: Environmental factors affecting children's respiratory health in the first years of life: a review of the scientific literature. Eur J Pediatr. 2008, 167 (10): 1103-1109. 10.1007/s00431-008-0761-7.
- Pope CA: Adverse health effects of air pollutants in a nonsmoking population. Toxicology. 1996, 111 (1-3): 149-155. 10.1016/0300-483X(96)03372-0.
- Pope CA: Particulate pollution and health: a review of the Utah valley experience. J Expo Anal Environ Epidemiol. 1996, 6 (1): 23-34.
- Karr CJ, Rudra CB, Miller KA, Gould TR, Larson T, Sathyanarayana S, Koenig JQ: Infant exposure to fine particulate matter and traffic and risk of hospitalization for RSV bronchiolitis in a region with lower ambient air pollution. Environ Res. 2009, 109 (3): 321-327. 10.1016/j.envres.2008.11.006.
- Karr C, Lumley T, Schreuder A, Davis R, Larson T, Ritz B, Kaufman J: Effects of subchronic and chronic exposure to ambient air pollutants on infant bronchiolitis. Am J Epidemiol. 2007, 165 (5): 553-560. 10.1093/aje/kwk032.
- Segala C, Poizeau D, Mesbah M, Willems S, Maidenberg M: Winter air pollution and infant bronchiolitis in Paris. Environ Res. 2008, 106 (1): 96-100. 10.1016/j.envres.2007.05.003.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/10/68/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.