### Data

Data were collected retrospectively in the emergency department (ED) of an urban pediatric tertiary care teaching hospital. All patients with respiratory presenting complaints seen in the ED between August 1, 1992 and July 30, 2004 were included in the study. The data were divided into a six-year training period, and a test period consisting of the final six years. ED chief complaints were selected at triage from among a constrained list, and classified as respiratory or non-respiratory using a previously validated method [17]. The study was approved by the institutional review board.

During the study period, approximately 137 patients were seen each day in the ED. The number of daily visits for respiratory complaints varied from 2 to 78. The mean number of respiratory visits was 21.05, and the standard deviation was 9.03 (see figure 1). These data and other hospital visit data time series have previously been shown to depend significantly on the day of the week and the season of the year [16, 18–20].

### Time series algorithms

We implemented five traditional time series models used for outbreak detection: a simple autoregressive model, a Serfling model, the trimmed seasonal model, a wavelet-based model, and a generalized linear model. In addition, we introduced a model of both the expectation and the variance based on generalized additive modeling techniques. The input to each algorithm was a time series of historical daily ED respiratory visit counts, and each returned a threshold number of visits for the day immediately following the historical period. An alarm occurred when the actual number of visits exceeded the threshold.

### Autoregressive model

The autoregressive model predicted the number of ED respiratory visits using linear regression on the number of visits during the previous seven days:

{E}_{t}={a}_{0}+{\displaystyle \sum _{k=1}^{7}{a}_{k}\cdot {V}_{t-k}},

(1)

where *E*
_{
t
}is the predicted number of visits on day *t*, *V*
_{
t-k
}is the actual number of visits on day *t* - *k*, and the coefficients *a*
_{
k
}were fitted by least squares regression using training data.

### Serfling method

The Serfling method and its variants have been extensively used for surveillance of influenza and other diseases [3, 21, 22]. Our implementation modeled the number of daily visits using linear regression on sine and cosine terms having yearly periodicities to capture seasonal effects, categorical variables for the day of week, and linear and quadratic terms. Under this model, the predicted number of visits on day *t* was

{E}_{t}={\displaystyle \sum _{k=0}^{6}{a}_{k}\cdot {\delta}_{k,\text{dow}(t)}}+{a}_{7}+{a}_{8}\cdot t+{a}_{9}\cdot {t}^{2}+{a}_{10}\cdot \mathrm{sin}\left(\frac{2\pi \cdot \text{doy}(t)}{365}\right)+{a}_{11}\cdot \mathrm{cos}\left(\frac{2\pi \cdot \text{doy}(t)}{365}\right),

(2)

where dow(*t*) is the day of the week from 0 to 6, doy(*t*) is the day of the year from 1 to 365, and the Kronecker delta function *δ*
_{
x,y
}is equal to 1 when *x* = *y* and 0 otherwise. To calculate the day of the year during leap years, each day after February 28 was treated as though it occurred on the previous day.

### Trimmed seasonal model

The trimmed seasonal model is used in the AEGIS system [23] for statewide real-time population health monitoring, and was implemented as previously described [16]. Beginning with training set data, the average number of visits was calculated and subtracted from the data. From this, the average for each day of the week was calculated and again subtracted. To remove seasonal effects, the average for the day of the year was calculated after excluding the highest and lowest 25% of values for each day of the year, and again subtracted from the data. A first-order autoregressive, first-order moving average (ARMA) model was then fitted to the errors. The predicted number of visits *E*
_{
t
}was calculated by summing the overall average, the average for the day of the week, the average for the day of the year, and the ARMA prediction for day *t*.

### Wavelet model

The wavelet-based model was patterned after the wavelet anomaly detector developed by Zhang et al. [15]. The method used the number of daily visits in a training set, *V*
_{1}, *V*
_{2}, ..., *V*
_{
t-1}, to produce a prediction for day *t*. It consisted of the following steps:

1. A low-frequency wavelet component of the visit signal having periodicity of more than 32 days was calculated. This period was selected by Zhang et al. because it removes seasonal effects while preserving higher-frequency information, and because it is a power of 2, which is mathematically convenient for wavelet analysis. We used the Haar wavelet in our implementation of the model [24].

2. This low-frequency baseline was subtracted from the original signal, producing a residual for each day in the training set.

3. The predicted number of visits on day *t* was the value of the low-frequency component on the previous day.

Daily alarm thresholds for the autoregressive, Serfling, trimmed seasonal, and wavelet-based models were calculated as the sum of the expected number of visits and a multiple *λ* of the standard deviation of the model residuals on the historical training data. The value *λ* of was an adjustable parameter that affected the specificity of each model.

### Generalized linear model

The generalized linear model consisted of a Poisson distribution function, an identity link function, and a linear predictor that included day of the week, month of the year, holiday and linear trend terms:

{E}_{t}={\beta}_{0}+{\displaystyle \sum _{k=0}^{5}{\beta}_{k+1}\cdot {\delta}_{k,\text{dow}(t)}}+{\displaystyle \sum _{k=0}^{11}{\beta}_{k+6}\cdot {\delta}_{k,\text{moy}(t)}+{\beta}_{18}{I}_{\text{holiday}}(t)+{\beta}_{19}t,}

(3)

where dow(*t*) and *δ*
_{
x,y
}are described in equation 2, moy(*t*) is the month from 1 (January) to 12 (December), and *I*
_{holiday}(*t*) is an indicator function equal to 1 if day *t* is a holiday, and 0 otherwise. An alarm sounded if the value of the cumulative distribution function of a Poisson random variable with mean *E*
_{
t
}exceeded the desired specificity. This model was found by Jackson et al. [18] to have superior sensitivity to a variety of outbreak types compared to several control-chart and exponential weighted moving average models.

### Expectation-variance model

In addition, we developed and implemented a novel method for outbreak detection that captures changes in the ED visit standard deviation, as well as in the expected number of visits. In contrast to previous surveillance models, which assumed that the variance is constant or proportional to the mean, it did not assume a functional form for the variance. Instead, the dependence of both the mean number of visits and the variance was modeled explicitly. In other applications, several statisticians have modeled the variance as a function of the same or additional covariates used to model the mean using iterative successive relaxation procedures (see, for example, [25] and [26]). We employed a simplified procedure involving two distinct models: an expectation model of the daily expected number *E*
_{
t
}of respiratory ED visits, and a variance model of the daily variance {\sigma}_{t}^{2} of respiratory ED visits. The number of daily visits is then modeled as a Gaussian with mean *E*
_{
t
}and variance {\sigma}_{t}^{2}. Both components are generalized additive models (GAM's): nonparametric extensions of linear regression models having several variants depending on the choice of smoothing technique, the procedure used to find estimates of the nonparametric functions for multivariate models, and the number of degrees of freedom for each covariate [27, 28].

The GAM of the expectation accepted historical daily visit counts as input, and modeled them as a function of linear time to capture a long-term trend, the day of the year to account for seasonal trends, and the day of the week:

*E*
_{
t
}= *f*
_{trend}(*t*) + *f*
_{doy}(doy(*t*)) + *f*
_{dow}(dow(*t*)).

No smoothing was performed for the day-of-week term, since many replicates were available for each day of the week. A Gaussian kernel smoother was used for the trend term, and a Gaussian kernel smoother with circular boundaries was used for the day-of-year term since the day is a periodic covariate. Although a Gaussian was selected for its ease of interpretation, in general the choice of kernel function has little effect on the model compared to the choice of bandwidth [27]. Optimal bandwidths of the two Gaussian smoothers were estimated by a two-step procedure. First, to optimize the bandwidth of the day-of-year Gaussian, the mean predictive squared error (PSE) on a training set consisting of the first six years of ED visit data was calculated for a range of bandwidths using 10-fold cross-validation for a model containing only the day-of-week and day-of-year covariates. The bandwidth minimizing the mean PSE was chosen, corresponding to a Gaussian distribution with a standard deviation of five days. Next, the bandwidth of the kernel used for the trend term was chosen by using 10-fold cross-validation to estimate the mean PSE on the training set of a model containing all three covariates for a range of trend bandwidths, using the previously determined optimal bandwidth of the day-of-year kernel. The minimizing bandwidth was again chosen, corresponding to a standard deviation of eight days. Because the model contained multiple nonparametric functions, an iterative backfitting procedure was used to estimate each until the model converged [27].

The residuals of the expectation GAM on the historical data were squared and used as the input to the variance GAM. This GAM was also a function of linear time, day-of-year, and day-of-week variables:

{\sigma}_{t}^{2}={g}_{\text{trend}}(t)+{g}_{\text{doy}}(\text{doy}(t))+{g}_{\text{dow}}(\text{dow}(t)).

(5)

The Gaussian smoothers were chosen to minimize the PSE on the training data set using the same procedure as above. The optimal smoothers corresponded to Gaussian distributions with standard deviations of 6 and 253 days for the day-of-year and trend terms, respectively.

To set the alarm threshold for a given day, a composite expectation-variance model consisting of the two GAM's was trained on the previous six years of data. The alarm threshold for the next day was calculated as the sum of the expected number of ED visits, as predicted by the expectation GAM, and a multiple *λ* of the expected standard deviation of ED visits, as predicted by the variance GAM:

*A*
_{
t
}= *E*
_{
t
}+ *λ*·*σ*
_{
t
}

={f}_{\text{trend}}(t)+{f}_{\text{doy}}(\text{doy}(t))+{f}_{\text{dow}}(\text{dow}(t))+\lambda \cdot \sqrt{{g}_{\text{trend}}(t)+{g}_{\text{doy}}(\text{doy}(t))+{g}_{\text{dow}}(\text{dow}(t))}.

(7)

The value of *λ* was an adjustable model parameter.

All models were implemented using the Matlab software package, Version 7.0.1 [29]. The Matlab system identification, statistics and wavelet toolboxes were used for the wavelet, generalized linear, and expectation-variance models.

### Model predictions based on historical data

We used the expectation-variance model to generate alarm thresholds for each day during the test period from August 1, 1998 to July 30, 2004, which comprised the last six years of historical data. All of the available data could not be used for testing because a training period was required. To predict each threshold, the model was trained on the previous six years of data, ending the day before the day to be predicted, and was blind to the actual number of ED visits on the prediction day. The backfitting procedures to estimate the model successfully converged for each day of the study period. The model predictions for both the expected number of patients and the variance were always positive numbers throughout the study period. The average absolute predictive error was approximately four patients during the study period.

For each day, an alarm threshold was produced for each desired outbreak detection specificity between 0.01 and 0.99 in 0.01 increments. This was achieved by varying the threshold parameter *λ* appropriately. For example, to generate an alarm threshold with specificity *s* on day T, the model was trained on the historical visit data, *V*
_{
T-2191}, ..., *V*
_{
T-1}. This generated model estimates for the expected number of visits for each day, *E*
_{
T-2191}, ..., *E*
_{
T-1}, *E*
_{
T
}, as well as estimates for the expected standard deviation of visits, *σ*
_{
T-2191}, ..., *σ*
_{
T-1}, *σ*
_{
T
}. The parameter *λ* was chosen so that the fraction of historical days for which the Z-score was at most *λ* was as close as possible to the desired specificity *s*. That is, *λ* was chosen to have the property that

#{*t* : *T* - 2191 ≤ *t* ≤ *T* - 1 and *V*
_{
t
}- *E*
_{
t
}≤ *λ*·*σ*
_{
t
}} ≈ 2191·*s*.

The predicted threshold for day T was *E*
_{
T
}+ *λ*·*σ*
_{
T
}.

Alarm thresholds for each day of the test period and each desired specificity were similarly calculated for the autoregressive, Serfling, trimmed seasonal, and wavelet models. The alarm threshold for the generalized linear model was the largest integer *A*
_{
t
}for which the cumulative distribution function of a Poisson random variable with mean *E*
_{
t
}was at most *s*. With the exception of wavelet model thresholds, all alarm thresholds were calculated using the six years of visit data immediately preceding the prediction day. The wavelet model requires a training period having length equal to a power of two, so 2048 days of training data were used.

### Detecting variability in the specificity

To determine whether a given model at a particular mean specificity had constant specificity as a function of the day of the week, we tabulated the proportion of alarm and non-alarm days at that mean specificity by day of the week. A chi-square analysis was performed under the null hypothesis that all days of the week had an equal fraction of alarm days. A *p*-value less than 0.05 indicated that the specificity was dependent on the day of the week. To determine whether the specificity was constant as a function of month and year, we performed similar chi-square analyses after tallying alarm days by month of the year and by calendar year of the study, respectively.

### Simulated outbreaks

In order to ascertain the sensitivity of the models to outbreaks, we superimposed three synthetic outbreaks on the test data set: a flat outbreak of five additional patients per day for seven days, a linear outbreak which increased from one to five patients over five days, and a spike outbreak of 10 additional patients in one day. For each model, each outbreak type, and each day of the test period, we created a new semisynthetic data set by adding an outbreak beginning on that day to the original data set. We then made an alarm threshold prediction for each of the outbreak days, and for each desired specificity between 0.01 and 0.99, based on training using the semisynthetic data set.

### Estimating sensitivity, specificity, and timeliness of detection

The actual mean specificity for one model at each desired input specificity was determined by running the model on the historical data set. Specificity was estimated by calculating the fraction of days without alarms for each day of the week, month of the year, or calendar year. Sensitivity calculations used the results of applying each of the models to the semisynthetic data sets. The sensitivity was calculated as the fraction of outbreaks for which there was at least one alarm day. Exact 95 percent binomial confidence intervals were calculated for each estimate of sensitivity and specificity. Timeliness of detection was evaluated for each method by calculating the mean lag in days between the start of a flat outbreak and the first alarm sounded. Missed outbreaks, for which no alarms were sounded on any day of the outbreak, were excluded from timeliness calculations. An alarm sounding on the first outbreak day corresponded to a lag of zero. Timeliness calculations were calculated at the benchmark specificity values of 0.85 and 0.97.

### Comparing outbreak detection among models

To compare the outbreak detection performance of the expectation-variance model with the traditional models, receiver-operator (ROC) curves were constructed for all models. ROC curves show the dependence of the mean sensitivity on the mean specificity, and the area under the ROC curve is an indicator of overall performance. The area was estimated by the trapezoidal method.