The record review was approved by the Children's Hospital Boston Committee on Clinical Investigation (Protocol Number M-01-03-040). The setting was Children's Hospital Boston, an urban, tertiary care pediatric teaching facility. Subjects were consecutive patients visiting the emergency department from June 2, 1992 through January 4, 2002. Patients with respiratory illness were identified using a previously validated set of chief complaint and ICD codes.
We built models of ED utilization using a time-series analytic approach. Models were constructed through an iterative process and were trained on roughly the first eight years of data (2,775 days). The remaining two full years of data (730 days) were reserved for testing and validation. The model and the statistical methods used in its development are described below. Complete details are also available from the authors. Our approach consisted of three main phases: signal characterization, model development, and model validation.
Signal characterization
As a first step towards understanding the trends and behaviours of ED utilization, we performed a principal Fourier component analysis on the daily visit totals (Figure 1). A Fast Fourier Transform (FFT) was calculated from the time series data, and the results indicated the presence of strong weekly and yearly periodicities in the data. The analysis was performed using the Matlab software package, Release 12, Version 6.0 [21]. Guided by these findings, we calculated the ensemble average profiles for both the weekly and yearly periodicities (Figures 2 and 3). These profiles represent the average number of visits for each day of the week and year respectively.
Model development
Our model development methods followed established time series forecasting procedures. The signal was broken up into its various components: the overall mean, the weekly trend and the yearly trend. Mean Absolute Percentage Error (MAPE) was used to measure of quality of fit. A MAPE of 0% indicated a perfect fit of the model to the training dataset.
The trimmed mean seasonal model was then constructed from the overall mean, the weekly trend, and the yearly trend, as follows:
Expected number of visits = Overall mean.+ Mean for day of week + Trimmed mean for day of year
We started with a simple-mean model that consistently predicted 137 total visits every day, 48 of which were respiratory-related. This mean was subtracted from the data. Next the weekly average profile of the resulting signal was calculated, yielding a model of the average utilization patterns for each day of the week. The weekly profile was also subtracted out from the data. The yearly average profile of the resulting signal was calculated, yielding a model of the average utilization patterns for each day of the year. To increase the robustness of this model and ensure that it was not overly sensitive to outliers in the training set, a trimmed mean function (ignoring a certain percent of the highest and lowest values) was used for calculating the yearly profiles. Testing a range of trimming magnitudes, we found that excluding the highest and lowest 25% of the values yielded the best results as far as improving the quality of fit (low MAPE).
When analyzing the residuals (forecasting errors) of the trimmed-mean seasonal model, a significant degree of autocorrelation was detected. Significant autocorrelations indicate that, for example, a high utilization rate on one day is associated with a high utilization rate on the following day. To model these localized auto-correlative effects, we fit an Auto-Regressive Integrated Moving Average (ARIMA) time-series forecasting model to the residuals (forecasting errors) of the trimmed mean seasonal model. This class of models is often used in economic time series forecasting, and is described in full detail in [22]. The ARIMA model was fit using the SAS Release 8.2 software package Time Series Forecasting System (Cary, NC) [23].
Three main parameters need to be selected when fitting an ARIMA model: the order of autocorrelation (AR), the order of Integration (I), and the order of Moving Average (MA). The higher-order models are more complex and can usually achieve a better fit of the training data set, while the simpler low-order models are usually less likely to over-fit to training dataset.
The first derivative of the data was taken to ascertain whether Integration would improve the model. Since forecasting was not improved, the optimal degree of integration was determined to be zero (I = 0). Different combinations of AR and MA orders were then tested. Of all the models tested, an ARIMA(2,0,1) model (second-order Auto-Regressive combined with first-order Moving Average) was found to work best for overall ED volume. An ARIMA(1,0,1) model (first-order Auto-Regressive combined with first-order Moving Average) was found to work best for respiratory-related ED volume. The SAS toolkit was used to automatically fit the parameters for the models.
Model validation
The models, trained on approximately the first eight years of data – the training set, were validated using the final two years of the dataset – the validation dataset. Again, a MAPE of 0% indicated a perfect fit of the model to the validation dataset.
We set out to quantify the detection performance of a biosurveillance system based on the above ARIMA model of overall ED volume by measuring its ability to detect outbreaks. Since the historical dataset used here was devoid of any known outbreaks and there is a paucity of data available on actual germ warfare attacks [11], we introduced a set of simulated outbreaks into the historical visit data by adding a certain number of simulated visits to specified days.
The simulated outbreaks lasted seven days each. While real outbreaks may last well beyond these durations, we focused on the first few days since useful detection systems should be able to spot outbreaks within that time-frame.
The outbreaks were spaced 15 days apart and there were 233 simulated 7-day outbreaks in total inserted into the data. This ensured that all the effects of any previous outbreak could be reset from the detection system's memory before the onset of the next outbreak. Furthermore, since the only significant periodicities present in the data were seven day (weekly) and 365.25 day (yearly), the 15-day spacing of outbreaks yielded a good unbiased sample of days for infusing outbreaks.
Each complete 3,505-day simulation used outbreaks of only one size. Different sizes of outbreaks were tested in separate simulations. For each simulation, sensitivity was calculated as the ratio of the number of outbreaks detected (true positives) to the total number of outbreaks. Since there is an inherent tradeoff between sensitivity and specificity, a benchmark specificity of 97% was chosen, producing approximately one false alarm per month. A study that evaluates different detection approaches using simulated outbreaks can be found in [24].