Bmc Medical Informatics and Decision Making Modeling and Detection of Respiratory-related Outbreak Signatures

Background: Time series methods are commonly used to detect disease outbreak signatures (e.g., signals due to influenza outbreaks and anthrax attacks) from varying respiratory-related diagnostic or syndromic data sources. Typically this involves two components: (i) Using time series methods to model the baseline background distribution (the time series process that is assumed to contain no outbreak signatures), (ii) Detecting outbreak signatures using filter-based time series methods.


Background
Well-known, as well as previously uncharacterized infections continue to (re)emerge around the globe. To avoid casualties from outbreaks of these infections and from the potential criminal uses of bioagents, surveillance systems are needed that have the capacity to identify such outbreaks accurately and rapidly. The accuracy and timeliness of biosurveillance systems rests on the ability to model the uncertainty, severity, and aberrancy of clinical symptoms that are likely to portend disease outbreaks as expressed through the data monitoring system. Shmueli [1] summarizes the problems that biosurveillance systems, in general, pose to traditional statistical monitoring: (a) biosurveillance data may not be independent or stationary; (b) non-traditional data are assumed to contain earlier signature of an outbreak but this signal is weaker compared to actual diagnosis data; (c) since there are no data that contain bioterrorist outbreaks, outbreak patterns particularly as they would manifest in non-traditional data streams are unknown; (d) biosurveillance data are assumed to have no bioterrorist outbreaks, but the natural outbreaks add up to the background noise.
The key issue at hand is to design statistical modeling and detection methods that can address these problems.
Among children, respiratory symptoms are an attractive target for surveillance. These are a prominent feature of many childhood epidemics and an early presentation of diseases like avian influenza, severe acute respiratory syndrome (SARS), and inhalational anthrax that have recently come to the public's attention. Unfortunately, respiratory complaints are also a feature of many common childhood illnesses, reducing the ability of biosurveillance systems to detect epidemics of greater public health concern. What is needed, therefore, is clinical information that is readily accessible and pre-processed in a manner that reflects the severity and aberrancy of respiratory symptoms. Using such data, discrimination between common childhood diseases and more serious respiratory epidemics would be possible.
Chest radiographs (X-rays), because they are readily available and are generally ordered by clinicians to evaluate respiratory complaints that are atypical or severe, have the potential to act as such a bio-monitoring and validation tool. In addition, detection based on models of radiograph ordering can indicate when in-depth follow-up is needed, as may occur when ordering of radiographs by clinicians is excessive for a given time of the year. Such indepth review of radiographs may confirm clinical suspicions of an emerging epidemic or signal the need to perform a targeted review of medical charts to identify anomalous findings or groupings of aberrant findings that might herald the early stages of a respiratory-related outbreak.
In this article we consider time series methods for the modeling and detection of respiratory-related outbreak signatures based on chest radiograph ordering patterns from a number of pediatric emergency departments (EDs) located in the Midwestern region of the United States.
These models include ambient temperature records collected in each city, as a covariate. We use the temperature series as a surrogate measure of the annual influenza season. Also, a patient visit count series is included in the models to account for variations between-EDs (like ED sizes) and within-EDs (day-of-week, for example). Addressing the fact that the underlying process is neither "independent or stationary", our interest is to model the underlying "respiratory-complaint background", using the available covariates and significant temporal dependencies present in the data. Without modeling the spatial dependence directly, we investigate whether or not there is evidence of spatial homogeneity in the statistical models across cities. We use filter-based prediction methods to indicate evidence of respiratory-related outbreaks (e.g., due to anthrax attack) using chest radiograph data. We describe the form and function of various filters that are commonly used to detect outbreak signatures. Using a stochastic model for an anthrax attack, we assess the performance of these methods. Since there are no data that contain outbreak patterns, the use of a model is key to providing realistic outbreak patterns that can accurately be used to evaluate these statistical detection methods.
Reis, Mandl, and others use time series methods to detect evidence of disease outbreaks at Boston Children's Hospital [2][3][4], modeling specific clinical complaints as deterministic trend and seasonalities plus a stationary autoregressive moving average (ARMA) process. They repeat this process for the visit counts instead of considering a joint modeling procedure. The detection algorithm filters one-step-ahead prediction errors [5], and looks for values in this residual process that exceed a predetermined threshold. Their simulations are based on less realistic deterministic outbreak models. Ivanov, et al. [6] use the Exponentially Weighted Moving Average smoother to measure timeliness, sensitivity, and specificity of free-text chief complaints (information describing patient's status on the ED visit). They indicate that the methods are good for detecting relatively large seasonal outbreaks, but not for small outbreaks. Burkom, et al. [7] extend this approach using Bayes Belief Networks to improve detection sensitivity and timeliness. There are also a number of wavelet-based and smoothing-based methods that can be used to monitor and detect abnormalities of unknown form, occurring over different time scales [1,8,9]. The use of scan statistics [10][11][12] has also gained popularity in recent years. Many of the methods based on scan statistics use fixed-effect models for biosurveillance (also see, e.g., [13] and [14] for other methods based on fixed-effect models).
Our manuscript is organized as follows: we start by summarizing the data of interest, and propose a statistical model for the ED data in each city. Using the growing lit-erature on the subject, we outline a stochastic model for an anthrax outbreak along with a healthcare utilization model for simulating people entering the ED. We then describe the methodology and theory for detecting unexpected outbreak signatures in time series sources using filters. We examine the form and function of the filters used for detection. In the results section, we explore the time series models obtained for each Midwest ED included in our study. We assess these detection methods using a simulated anthrax attack, and we end with a discussion.

A chest radiograph model for respiratory-complaints
The data of interest consist of daily counts of ED visits and chest radiographs taken between January 1st, 2003 and September 9th, 2004, in five metropolitan children's hospitals in the Midwest of the USA (Minneapolis/St. Paul, Milwaukee, Chicago, Akron, and Columbus), supplemented with time series of daily average temperature, obtained from the Average Daily Temperature Archive at The University of Dayton [15]. These series are shown in Figure 1. There seems to be strong seasonal component in the chest radiograph and visit series, for all the cities, which is negatively correlated with temperature. Although it could be argued that we do not have enough years of data to prove this empirically, it is expected that a higher number of respiratory complaints is associated with colder temperatures.
We now discuss two aspects of the statistical model for these daily chest radiograph counts: distribution and scale. Although the outcome variable of interest is counts, like other researchers in the field [1][2][3][4][6][7][8][9], we model the data using a Gaussian rather than Poisson process. The reasons are: (i) A normal approximation to the Poisson random variable is reasonable when the mean number of radiograph counts is large, as in this study; (ii) Gaussian time series models are easier to fit, diagnose, and interpret; (iii) The theory for filter-based methods commonly used for detecting the outbreak signature in such data is well developed for Gaussian processes. In some applications, count data are transformed when Gaussian models are used. Transformations, such as log or square root, are used to parameterize multiplicative models or to stabilize the variance. We use the original scale instead of transforming the data, like other researchers in the area, for ease of interpretation. This is especially important when we analyze filter-based detection methods in this article. For the same reason, we do not model the number of chest radiographs per daily ED visits (i.e., the proportion of radiographs). The filter-based theory, allows us to directly assess the effect of additive stochastic outbreaks signatures upon the radiograph series.
In our study, the first ten months of data (Jan to Oct, 2003) were used as the training set, while the remaining ten months (Nov, 2003to Sep, 2004 were used, as a test dataset, to evaluate the model and detection methods. Since the goal is to predict the number of chest radiographs, for each city we fit a linear time series regression model, using the number of chest radiographs as the response, and the number of visits and temperature as predictor variables. We smooth the temperature series, because we believe that long-range temporal trends are more predictive of chest radiograph counts. Let {R k,t } denote the number of chest radiographs for the ED in city k (k = 1, 2, ..., 5) on day t and let Here {V k,t } are the visit counts for city k and {T k,t } is the smoothed time series of temperatures for city k, filtered by taking a thirty day moving average to remove intra-month variation. To complete the model we assume that {X k,t } is a zero mean stationary time series (we discuss the consequences of this assumption in the Discussion), that we shall represent using a seasonal autoregressive integrated moving average process (SARIMA). The SARIMA model (defined in the Appendix) allows for simultaneous modeling of dependencies on both the day as well as the weekly seasonal scales. To aid in the comparison of the dependencies across cities, the order of the time series model, as determined by choosing the autoregressive (p k and P k ) and moving average orders (q k and Q k ), will be the same for each city k.

A stochastic model for an anthrax outbreak
We now propose a simple stochastic model for an inhalational anthrax outbreak, based on the work of Buckeridge, et al. and Brookmeyer et al. [16,17]. As proposed by these authors, our model incorporates two elements: Any inhalational anthrax outbreak starts with the dispersion of the anthrax spores. Once spores are inhaled by a subject, in the incubation stage spores either germinate or are cleared out the lung. For the spores that germinate during incubation, the later stages of the disease are the prodromal and fulminant stages, followed by death. Buckeridge, et al. [16] model the spread of spores over a grid covering the Norfolk, Virginia region using a Gaussian plume model. Their model considers the source and Time series plots  strength of the anthrax attack, along with prevalent wind directions.
Instead of using the region-based approach of Buckeridge, et al., we use the individual-based infection scheme of Brookmeyer, et al. [17]. Although it is known that anthrax spores can survive for long periods of time in the environment, we consider a small scale scenario. Since the population of interest is the individuals attending children's emergency departments, we assume an outbreak that affects a fixed number, N say, of children. Brookmeyer, et al. [17] define the infection probability of an individual exposed to inhalational anthrax using a competing risk model, modeling the dynamics of spore clearance and germination. Let θ represent the hazard rate per unit time (days, say) that a spore is cleared from the lung and λ be the rate of germination. Suppose that each individual inhales a dose of D spores. Then, the probability that at least one spore germinates is called the attack rate (AR) and is calculated using a Poisson approximation, as For a given attack rate, the probability that at least one of the D spores germinates within t days is given by Note that the limit, lim t → ∞ F(t) = AR. Based on a statistical analysis of an anthrax outbreak that occurred in Sverdlovsk, Russia, Brookmeyer, et al. [17] estimate hazard rates of between 0.05 and 0.11 for θ (which is compatible with θ = 0.07 obtained from animal studies for an AR of 0.5). The value of λ is not estimated in their data analysis -based on animal studies they found that the rate λ lies between 5 × 10 -7 and 10 -5 . Buckeridge, et al. [16] propose log normal models for the duration of the prodromal and fulminant stages, based on the 2001 anthrax attack in the United States. The median duration of the prodromal stage was 12.18 days, with a dispersion of 1.41, and the median duration of the fulminant stage was 1.5 days with a dispersion again of 1.41.
Next, we describe a simplified health-utilization model for people entering the ED, based on ideas discussed in [16]. During the incubation stage, we assume that no infected people enter the ED. Non-infected people that enter the ED for other chest problems are part of the background data. During the prodromal and fulminant stages we assume a simple Markov model of utilization: each infected subject is a Bernoulli event, independent across days. At the prodromal stage people enter the ED with probability P d on a weekday and P w on weekends. At the fulminant stage the probability of entering the ED, P f say, is larger than the probabilities in the prodromal stage. The reason is that at the fulminant stage, the anthrax symptoms are similar to those of a heart attack and therefore people enter the ED with higher probability. The differentiation between weekday and weekend is irrelevant at this stage. We suppose that a small percentage of people in the prodromal or fulminant stages are misdiagnosed and thus need to re-enter the system. Also, we assume that people can potentially be misdiagnosed a maximum of two times during the same attack (10% in the first visit and 5% in the second visit). The probabilities of entering the ED after being misdiagnosed are increased by an additive factor, C, for every additional entry. Our model allows for a small probability of drop-out, to account for other ways of leaving the system (e.g., pharmacy visit). The health-utilization model could easily be extended to include, e.g., varying probabilities of entering the ED by stage/time, and/or more advanced ways to exit the system.

Filter-based outbreak signature detection
The main idea of filter-based methods is to create a detection process {D k,t }, for each time point t, which is a weighted average of the diagnostic or syndromic data to be used for the detection of an outbreak signature. The weights that appear in {D k,t } are defined using the form of the time series model and a filter, {a l } say. Extreme positive values of the detection process at a time point indicate a possible outbreak signature. The definition of the detection processes has its origin in process control [5], and is popularly used in the context of biosurveillance [1][2][3][4]6,8]. Naturally, the form of the filter has an important effect in the detection process.
A common parametric approach that we follow in this study is to first obtain the residual process by subtracting off the non-stationary part of the model (i.e., the effect of the covariates). We then define the filter {c k,j }, that first decorrelates (whitens) this residual process using the onestep-ahead prediction errors and second filters this whitened series using {a l } to yield the detection process {D k,t }.
Commonly used examples of the filter {a l } include the differencing, moving average, or exponential filter. For a fixed value of α, let 1 -α denote the specificity (the probability of no detection, when there is no actual outbreak signature). We declare evidence of an outbreak signature at time t if the detection process at that time point, D k,t , exceeds a threshold τ k,α , calculated using the data or the process. Further details are given in the Appendix. Filter plots Figure 2 Filter plots. Plots of the four filters {f k, l } that can be used to calculate the detection process D k, t for the Columbus series. The number in parentheses after the filter name is τ k, α , calculated using a normal approximation. 2. When {a l } is the 7-day filter, {f k,l } is a weighted average of the last 7 days (with most weight on the current day), minus a weighted decaying average of the remaining past days. The weekly seasonal terms are not as strong, compared to the filter that is the convolution of the 1-day filter.
3. When {a l } is the linear filter, {f k,l } is an average of 6 days in the past. Far more weight is put on the current day relative to the previous 6 days. From this we subtract an average of values previous to the 6 days. Most of the weight from the second average comes from around days 8-10 in the past.
4. When {a l } is the exponential filter, {f k,l } is a combination of the 4 most recent days (mostly the current day, very little by the fourth day). We subtract an average of the past values (mostly days 6 -11 in the past).
In Figure 2, the numbers within the parentheses at the top of each panel show the value of the threshold τ k,0.03 , calculated using the normal approximation used in the Appendix (equation 10), assuming an innovation variance of = 1 for the process. We examine the effects upon the sensitivity, specificity, and timeliness to detect anthrax outbreaks in the simulations that we present later.

Time series modeling
We fit the chest radiograph model given by (1) to the first ten months of data for each ED in city k. After specifying a model for {R k,t }, we have a regression model with time series errors, that can be fit using standard maximum likelihood methods. We selected the order of the SARIMA model for the time series errors, {X k,t }, as defined by (4) in the Appendix using standard identification techniques based on the sample autocorrelation and partial autocorrelation (e.g., [18]). To facilitate the comparison of the time series models across cities, we restricted to same order of model for each series. The orders p k = 1 and q k = 1, correspond to a single autoregressive and moving average term on the daily scale, equivalent to observing an autoregressive process of first order with measurement error ( [19], Exercise 2.9). With a seasonal period of seven days (s = 7), we set P k = 1 and Q k = 1, so that the random seasonal component is a combination of an autoregressive and moving average term, each of first order over a period of seven days. The seasonal component of the time series model corresponds to an evolution of a first order autoregressive process with a measurement error over the weeks.
The model fit for the data in each city was assessed using diagnostic plots (time series plots, normal quantile plots, autocorrelation and partial autocorrelation plots, and the spectral estimates) of the estimated innovations of the time series component. Figure 3 shows some of these diagnostic plots for the Akron data (the residuals plots for the other cities were similar). Except for a few extremely positive values, the estimated time series looks stationary. The normal approximation is good, as evidenced by the straight line on the normal quantile plot. The plots of the sample autocorrelation and partial autocorrelations functions of the time series innovations lie inside the dotted confidence bounds for a white noise process (i.e., are samples of uncorrelated time series errors).
The parameter estimates for each city are summarized in

Simulations
In the simulations described in this section we used the following experimental design. For each city, we fit the regression time series model using the first ten months of data (training data). We used the second half of the data from November 1st, 2003 onwards and added outbreakrelated counts to test for the detection of an outbreak signature.
We simulated an outbreak, as previously described, 500 times on 500 individuals using an attack rate of 0.5. The first day of the outbreak was randomly picked from a uniform distribution in the period December 5th, 2003 to June 22nd, 2004. (Making the earliest date later than November 1st, 2003, guarantees enough data for the filter-based methods to work with a prediction window of m = 28 days). In the absence of other information (e.g., Buckeridge, et al. [16] do not report their probabilities) we set the ED utilization probabilities to be: P d = 0.25 (weekday entry probability at the prodromal stage); P w = 0.4 (weekend entry probability at the prodromal stage) and P f = 0.80 (entry probability at the fulminant stage). The daily drop-out probability was set at 0.05. We assumed that with probability 0.9, the infected persons that enter the ED for the first time during a given outbreak receive a chest radiograph, i.e., 10% of the infected persons are misdiagnosed at the first visit. In a subsequent visit, 5% of the infected persons that re-enter the ED are misdiagnosed. The misdiagnosis additive factor, C, was assumed to be 0.05. We added the counts from the ED visits and subsequent number of chest radiographs generated from the healthcare utilization model on each day during the outbreak period. Let {O k,t } denote the number of chest radi-

Partial autocorrelation function
ographs attributable to the anthrax attack on day t. Since each simulated outbreak will be different, we start by summarizing the distribution of {O k,t }. Figure 4 shows a plot of the 0.025, 0.5 (i.e., median), and 0.975 quantiles calculated over the 500 simulated realizations for each day t.
Examining the progression of the quantiles over time allows us to explore the center and tails of the extra radiograph count distribution. The counts increase rapidly in the first week, then stay fairly constant (except for the spikes), for a week and then slowly drop to zero after the second week. The small spikes are due to the different probabilities of entry in the prodromal stage. Although the shapes at each quantile are similar, the magnitude and duration of the extra radiographs counts differ. The counts drop to zero at different time points for the different quantiles. They drop to zero after 3 weeks for the 0.025 quantile, after 4 weeks for the 0.5 quantile, and after 7 weeks for the 0.975 quantile. These patterns in the observed time-varying distribution of the outbreak signature are a strong motivation to not use deterministic outbreak patterns, as used by some authors, since deterministic outbreak patterns do not give a realistic assessment of detection methods.
We declare that there is evidence of an outbreak signature in the radiograph data at time t if the outbreak process D k,t , defined by (2), exceeds a given threshold, τ k,α . Suppose that {Y k,t } is the process defined as the sum of {O k,t }, the extra radiograph counts attributable to the outbreak, and the radiograph process {R k,t } that follows model (1). Removing the non-stationary part due to the estimated effect of the covariates yields By applying the filter to this series we obtain the detection process that we would observe in the presence of extra counts attributable to an outbreak: To examine the effect of the filter upon the outbreak signature we examine the standardized quantity for the filters that we defined previously. We set α = 0.03, which corresponds to a false alarm rate of one per month [2]. Then, as shown in the Appendix, the threshold, τ k,α , is chosen by solving P(D k,t > τ k,α ) = α for τ k,α . Either we can estimate this value from the data using the 1 -α quantile of the {D k,t } process of non-outbreak-based training data, or via a normal approximation, given by (10). There were some differences between the values of τ k,α for the two methods in the simulations we studied. We chose the normal approximation method as it tended to preserve the specificity across the filters and EDs that we considered.
Scaling g k,t by τ k,0.03 and using the same simulated radiograph realizations due to the outbreak, allow us to compare the filtered signals consistently across filters.
The filtered series will be different for each simulated outbreak pattern realization. Just as in Figure 4 with the radiograph series, we calculate the 0.025, 0.5 (median), and 0.975 quantiles of the filtered radiograph series on each day t, based on the model fit to the Akron ED series. Figure  5 shows the time-varying quantiles for each of the four filters. Relative to the no-outbreak situation, positive values .
denote those time points for which there would be a greater chance of setting off a detection. Negative values denote the time periods that would actually decrease the chance of a detection. Except for the 0.975 quantile of the 1-day filter, each quantile of the filtered signal drops below zero for a period, and stabilizes at zero after that.
There are some similarities in the shapes of the three quantiles presented in Figures 4 and 5. Namely, the filtered signals increase over the first week, and then slowly decrease. The spikes in the distributions of the original signal are preserved for the 1-day filter, smoothed out for the 7-day filter, and partial smoothed over for the linear and exponential filters. Smoothing out the spikes tends to lead more sustained peaks (i.e., wider periods of higher sensitivity). The power of detection is smaller for the 1-day and 7-day than the linear and the exponential (as witnessed by smaller maximum heights in each of these stwo curves). For all filters, a drop below zero occurs immediately after the peaks; the counts go slowly back to zero. Periods for which the filtered signals are below zero inhibit detection and thus can mislead the prediction of the end of the outbreak signature.
We calculated the proportion of true non-detects in the absence of an outbreak signature, based on the test data (the specificity), as well as the proportion of true detects during the outbreak (the sensitivity) averaging over the 500 simulated anthrax outbreaks, for each filter and city.
We calculated the actual specificity and sensitivity using the test data, for values of α, in the range 0.01 to 0.10 in steps of 0.005. Figure 6 displays the difference of actual specificity (calculated using the training data) and the nominal specificity (predetermined 1 -α) for different values of α. Curves above or below the horizontal dotted line at zero, indicate departures from the nominal α. The Chicago series preserves the nominal value of the specificity (curves are clustered around the zero line). For the Minneapolis-St.Paul series the actual specificity is biased downwards, and for the other cities, the departure from the nominal value changes as α increases. The choice of filter affected the calibration. To understand the tradeoff between the specificity and the sensitivity we show the receiver operator characteristic (ROC) curves, for each city, in Figure 7. For all cities, the 1-day filter has the poorest sensitivity. Except for Milwaukee, the 7-day filter tends to have higher sensitivities compared to the other filters, but also tends to have the poorest specificity. The exponential filter balances the specificity-sensitivity tradeoff. Tables 2 and 3 display, respectively, the median and maximum detection times for each filter and city for our model. The 1-day filter performed poorly. The other filters are more comparable. We observed (results not shown) that the time to detection increases if the attack rate is reduced and also if the probabilities of entering the ED is delayed.
We now compare the performance of our model, as defined by equation (1) Quantiles of the chest radiograph distribution attributable to an anthrax outbreak Figure 4 Quantiles of the chest radiograph distribution attributable to an anthrax outbreak. Plots of the day-by-day quantiles of the radiograph counts attributable to an anthrax attack, based on 500 simulated anthrax outbreaks.  these four models. For illustration we use the linear filter, which represents a compromise between the actual specificity and sensitivity (Across all four models, the 1-day filter always had specificities closest to nominal and the 7- Quantiles of the filtered chest radiograph distribution attributable to an anthrax outbreak Figure 5 Quantiles of the filtered chest radiograph distribution attributable to an anthrax outbreak. Plots of the day-byday quantiles of the filtered radiograph counts attributable to an anthrax attack, based on 500 simulated anthrax outbreaks. The filtering operation, as defined by (3), is based on the first ten months of training data from Akron, in combination with one of four filters (1-day, 7-day, linear and exponential). Each filtered signal is scaled by τ k, 0.03 , calculated using a normal approximation, for comparisons to be made across filters. The biases in the specificity Figure 6 The biases in the specificity. A plot of the actual specificity (the specificity calculated using the training data) minus the nominal specificity (1 -α) for different values of α. The value of the detection threshold τ k,α , was calculated using a normal approximation. One minus the actual specificity

Mean sensitivity
MinnStPaul day filter had specificities furthest away). Except for Columbus, Model 1 achieved a specificity closer to the nominal value across all EDs (Model 3 outperforms Model 1 in Columbus). Except in Chicago, Model 4 had the largest magnitude of bias. Figure 9 compares the receiver operator characteristic (ROC) curves for the four models across the five EDs using the linear filter (Across all models, the sensitivity for the 1-day filter was the lowest while the 7-day and linear filters tended to have the highest sensitivity). To readily compare the models, we use one minus the nominal specificity on the x-axis. Except for Akron and Minneapolis/ St. Paul, the random weekly and daily time series components in Model 1 yield higher sensitivities, compared to including a day-of-week covariate plus ARMA errors in Model 2, and day-of-week covariate plus no ARMA errors in Model 3. This is due to the insignificance of the day-ofweek fixed effect in Models 2 and 3. In Akron, Model 3 has higher sensitivities than Model 1, suggesting that the time series errors are less relevant in outbreak signature detection for the test data. In Minneapolis/St. Paul Models 1, 2, and 3 have equivalent sensitivities. Across all cities either Model 2 or Model 4 had the lowest sensivities, indicating that these models are less reliable for outbreak signature detection.
In terms of sensitivity, except for Chicago, Model 3 tended to outperform Model 2 for all filters in the cities we studied. This is counter-intuitive, as we would expect that a model that includes the significant ARMA time series component (Model 2) should outperform one that did not contain any time series components (Model 3). In practice there is a tradeoff between estimation and prediction. Table 4 displays the Akaike information criterions (AICs) for the four models, fit to the training data of the five EDs. For each ED, a smaller AIC denotes a model that better fits the data. Model 3 performed better than Model 2 in one-step-ahead predictions (Figure 9), but Model 2 better explains the training data than Model 3 (Table 4).
Using the AIC we find Model 1 fits best to the training data, even though it does not always yield the highest sensitivity.

Discussion
Our intention in this study was to find a flexible set of statistical models that could be applied across a number of emergency departments. We employ time series models that include covariates, such as patient visit counts and ambient temperature, as well as random seasonal terms. We use chest-radiograph ordering data from emergency departments of five regional Midwest children's hospitals to detect signatures of respiratory outbreaks. We include visit counts series as a covariate in the chest radiograph model to account for variations due to, for example, ED sizes, changes of staff within the ED, and even some seasonalities across the time period of interest. We use the temperature series as a surrogate measure of the influenza season -the colder months in the western hemisphere. This is a more accurate measure of the influenza season than using a fixed covariate such as a sinusoid. To reflect uncertainty in the variation of the influenza background over seasons, these models allow for randomness in the seasonal components. The use of random seasonal components is an advantage over traditional fixed effect models, since temporal patterns are not assumed to repeat precisely the same way. Thus, signature detection capabilities are improved for the majority of EDs -sensitivity is higher. For increased accuracy and timeliness, the use of our model for the data analysis should represent one component of an integrated detection system. Once a signal is triggered by any of our models, we recommend the use of clinical follow-up to corroborate or refute the emergence of a bona fide epidemic. For example, radiographs and medical charts will need to be reviewed to identify highly anomalous findings or groupings of aberrant findings that would be expected to be present at early stages of outbreaks. We believe that the approach utilized in this work will aid in this process and is more appropriate than models using fixed periodicities that do not have the ability to capture the underlying variabilities across seasons.
Of note, our study shows that there are similarities in the chest radiographs series from different EDs that can, for the most part, be modeled by similar time series models. Similarities of the time series model across EDs have a number of ramifications for detection of outbreak signatures. First, by borrowing information across the different EDs we can build more complicated multivariate time series models, possibly involving the joint modeling of chest radiograph and visit counts across locations. Second, we could use these models to jointly detect outbreak signatures across large spatial regions. In this context, Diggle et al. [20] use spatio-temporal Cox models to identify anomalies (real and artificial outbreaks) in the space-time distribution of gastrointestinal infections. But, some caution is needed because one potential drawback of aggregating data spatially is that the chance of detection can be reduced (data from unaffected areas will mask the outbreak signature, and increase the detection time) [21]. For localized outbreaks, there is still some utility in building models that borrow strength across EDs, even if the joint detection of outbreak signatures is not meaningful. In terms of these localized outbreaks, a geographically close site could act as a benchmark to judge detections at other sites. For example, under certain circumstances an epidemic detection signal triggered in Columbus, but not in Akron, could imply that some unusual event had occurred in Columbus. It should be noted, however, that even though there are similarities in the time series models, our simulations demonstrate that the sensitivity and timeliness to detect outbreak signatures using chest radiograph counts were different across EDs. These differences may limit the utility of our models in comparing signals across different EDs.
Our study has a number of limitations. We assume that the resulting series {X k,t }, after accounting for these covariates, is stationary. We also ignore the fact that an individual may have multiple ED visits. Furthermore, these data do not contain any known anthrax outbreaks. Instead, outbreak patterns are simulated using a simple stochastic model. Although more care needs to be taken when summarizing simulation results, we agree with Buckeridge, et al. [16] and Brookmeyer, et al. [17] that stochastic outbreaks are more realistic than deterministic ones. Other results (not presented here) indicate that the results are different (and possibly misleading) with a smooth deterministic outbreak rather than a stochastic one. We utilize the four filters used by Reis, et al. [2]. Although filter design is an active area of research, especially in engineering, we decided to only study these four simple and fairly easy to understand filters. In the future we could extend our analysis to different choices of filter. We view wavelet-based biosurveillance methods [1,8] as extensions of this idea to different linear time-invariant filters. An important issue in filter-based-prediction methods is the choice of covariates and the correlation structure for the time series model. Clearly, the properties of the time series model could change with time. One way to improve the prediction of outbreak signatures is to use time-varying models, with parameters that slowly evolve over time. The use of time-varying parameter models is analogous to a periodic review of the parameter values. Window-based estimation methods could be used to re-estimate the model parameters. In this case, the choice of window length is critical as it represents a compromise between the efficiency to estimate the model parameters, and the sensitivity and specificity of detection. Other issues to be taken into account include negative singularities (as defined by [8]) that can cause a drop in the sensitivity. Abnormal points (like holidays or severe weather) can cause "false alarms". Our solution to this problem is to jointly model chest radiograph and visit counts. So far, we have assumed that the visit counts are observed without error. Related to these issues, we are investigating methods that can be used to incorporate the uncertainty in parameter estimation upon the detection procedure.

Conclusion
We present in this paper a stochastic model of chest radiograph ordering patterns and temperature as an adjunct to a biosurveillance system for detecting emerging respiratory-related epidemics, focusing on a potentially highimpact public health hazard such as inhalational anthrax. We show that in time series analysis of respiratory-related data it is important to capture important seasonal effects that are present in such data, as well as to consider the influence of important covariates that can be easily obtained and incorporated into the models. We also demonstrate the importance in assessing the sensitivity and specificity of these methods, of utilizing more realistic stochastic, rather than deterministic, models for outbreaks patterns. We demonstrate spatial homogeneity in chest radiograph data across EDs and suggest ways in which these observations may be used to improve regional biosurveillance for (re)emergent infections.
Regardless of the choice of filter, our simulations demonstrate that the specificity calculated using the training data   The biases in the specificity for the four models Figure 8 The biases in the specificity for the four models. A plot of the actual specificity (the specificity calculated using the training data) minus the nominal specificity (1 -α) for different values of α using a linear filter, for four different models. The value of the detection threshold τ k,α , was calculated using a normal approximation. Receiver operating characteristic curves for the four models Figure 9 Receiver operating characteristic curves for the four models. Receiver operating characteristic curves (averaged over 500 simulations of an anthrax outbreak) for the detection of the anthrax outbreak signature in the chest radiographs at each of the five metropolitan children's hospitals using the four different models. The linear filter is used in each case. MinnStPaul varied across the five EDs studied ( Figure 6). The tradeoff between specificity and sensitivity also varied by the ED (Figures 6 and 7). The 1-day filter, while closely matching the nominal specificity, performed poorly in terms of maximizing the sensitivity to detect an outbreak signature. The 7-day filter, by smoothing over the outbreak signature ( Figure 5), maximized the sensitivity, but at a compromise to the actual specificity obtained (Figures 6  and 7). The linear and exponential filters provided a balance between the specificity and sensitivity. Detection using our covariate-based seasonal time series model performed well across all EDs compared with fixed-effects regression models or time series models that omitted seasonal terms and/or covariates (Figures 8 and 9). denotes the one-step ahead prediction for the SARIMA process and {b k,1 , ..., b k,m } are obtained from the solution of the set of m linear equations: The one-step prediction error process, {E k,t } is where we define c k,0 = 1 and c k,j = -b k,j for j = 1, ..., m. The process {E k,t } is a filtering of {X k,t } and so if {X k,t } is stationary, by the linear time invariant filtering result for stationary processes, {E k,t } is also stationary ( [18], Theorem 4.10.1). Moreover for large enough m, if {X k,t } is an invertible time series process then we can approximate {X k,t } by an autoregressive, AR(m) process ( [18], Theorem 4.4.3). Using this approximation we can argue that {E k,t } is approximately a mean zero Gaussian independent and identically distributed process with variance gamma γ X,k (0). Let {a 0 , ..., a L-1 } denote a pre-specified filter of width L. We define the detection process {D k,t } by filtering the error process {E k,t } using this filter. Thus, By the filtering result for stationary processes, stationarity of {X k,t } implies that {D k,t } is also a stationary process.
For large enough m, since {E k,t } is approximately a white noise process,{D k,t } is approximately a moving average process of order L -1 with coefficients θ 0 = 1, θ j = a j-1 /a 0 for j = 1, ..., L -1, and innovation variance .
We declare evidence of an outbreak signature (test positive), at time t if the observed detection value D k,t is larger than a given threshold. For a fixed value of α, let 1 -α denote the specificity (the probability of a true-negative). Similarly, the sensitivity is defined to be the probability of true positive. The threshold, τ k,α , is chosen by solving P (D k,t > τ k,α ) = α for τ k,α . Either we can estimate this value from the data using the 1 -α quantile of the {D k,t } process of non-outbreak-based training data, or if {X k,t } is Gaussian then for large m, where Z is a standard normal random variable. The solution for the threshold under this approximation is where Φ -1 (·) denotes the inverse cumulative distribution function for a standard normal random variable.
The filter-based detection method requires knowledge of the "true" values of the model parameters. In our work, we replace the parameters with their maximum likelihood estimates.