BioSense is a system of systems, receiving data from federal partners, existing health department systems, hospital systems, and individual hospitals. In 2008, the primary data sources were 532 civilian hospital EDs, 333 DoD facilities, and 770 VA facilities. These data sources vary markedly in their geographic coverage, population coverage, demographics, and timeliness of reporting. Demographically, visits to BioSense EDs are reasonably representative of the U.S. ED population [20, 21]; DoD visits, which are made by active duty personnel, retired persons, and family members, represent a younger population with a higher proportion of males; and VA visits represent an older and heavily male population. Since 2006, most of the growth in the system has come through receipt of data from existing state or local health department systems that provide ED chief complaint data. Time from patient visit to data receipt at CDC was shortest for civilian hospital chief complaint data, especially when the data were sent directly from the hospital rather than via a state or local system. A detailed comparison of ED data from BioSense with a nationally representative survey performed by the National Center for Health Statistics is underway [20, 21]. However, the 2 most common reasons for visit (abdominal pain and chest pain) are the same in both systems. We estimate that in 2008 BioSense captured chief complaints from about 14--15% of all U.S. ED visits.
We have summarized visit rates for both the broader syndromes and narrower sub-syndromes. Because they capture more specific concepts, sub-syndromes may be more useful. For example, during a 3-day period of smoke exposure due to wildfires in San Diego in October 2007, ED chief complaint visits increased 22% above the previous 28 days for the respiratory syndrome; in comparison, increases were larger for 2 respiratory sub-syndromes, dyspnea (50%) and asthma (182%) [22]. Automated surveillance was originally geared to infectious diseases, and many of the sub-syndromes fall under infectious categories such as respiratory and gastrointestinal. However, we also include sub-syndromes related to injuries, chronic diseases, and a number of general concepts not fitting in any of the above categories (e.g., malaise and fatigue). The breadth of concepts that can be monitored, albeit in a simple manner, is a strength of this system.
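For illustration, the kind of baseline comparison underlying the wildfire example can be sketched as follows. The function name, counts, and 28-day baseline window here are hypothetical; this is not the statistical method BioSense uses for anomaly detection, only the simple percent-increase comparison described above.

```python
# Illustrative comparison of event-period visit counts against a prior
# 28-day baseline, as in the wildfire example above. All numbers are
# hypothetical, not actual BioSense data.

def pct_increase(event_counts, baseline_counts):
    """Percent increase of mean daily visits during an event period
    over the mean of the preceding baseline period."""
    event_mean = sum(event_counts) / len(event_counts)
    baseline_mean = sum(baseline_counts) / len(baseline_counts)
    return 100.0 * (event_mean - baseline_mean) / baseline_mean

baseline = [40] * 28   # hypothetical: 40 respiratory visits/day before the event
event = [50, 60, 70]   # hypothetical elevated counts during a 3-day event

print(round(pct_increase(event, baseline)))  # 50
```

Because a sub-syndrome starts from a smaller baseline of more specific visits, the same absolute increase yields a larger percentage signal, which is why the dyspnea and asthma sub-syndromes showed sharper rises than the broader respiratory syndrome.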
At least 3 methods are commonly used to make syndrome assignments from chief complaint data. The Real-Time Outbreak Detection System (RODS), which uses 7 syndromes, employs a training dataset of ED records which have been assigned manually to syndromes by experts; this dataset is then used to train CoCo, a naïve Bayesian classifier, to make the free-text chief-complaint-to-syndrome assignment during production data analysis [15]. The Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE), which uses variable numbers of syndromes and sub-syndromes in different versions, first normalizes the text to remove punctuation and expand abbreviations and then assigns the normalized text to a concept using human-assigned weights [3]. Other systems, including the Early Aberration Reporting System (EARS) [14] and the New York City system [18], use a text-parsing method, similar to that used by BioSense, which scans for keywords, misspellings, and abbreviations and then makes the category assignments.
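A minimal sketch of the text-parsing approach may clarify how it differs from the classifier- and weight-based methods: each free-text chief complaint is scanned for keywords, including common misspellings and abbreviations, and every matching category is assigned. The keyword table below is hypothetical and far smaller than any production mapping table.

```python
# Sketch of keyword-based text parsing for syndrome assignment.
# The keyword table is illustrative, not an actual BioSense mapping.

SYNDROME_KEYWORDS = {
    "respiratory": ["cough", "sob", "short of breath", "wheez"],
    "gastrointestinal": ["nausea", "vomit", "vomitting", "diarrhea", "diarrhoea"],
    "fever": ["fever", "febrile"],
}

def classify(chief_complaint):
    """Return the set of syndromes whose keywords appear in the complaint."""
    text = chief_complaint.lower()
    return {syndrome
            for syndrome, keywords in SYNDROME_KEYWORDS.items()
            if any(kw in text for kw in keywords)}

print(sorted(classify("Fever and cough x 3 days")))  # ['fever', 'respiratory']
```

Naive substring matching of this kind is simple and transparent, but it is also the source of mapping errors of the sort discussed later in this paper, where an isolated keyword inside an unrelated phrase triggers a spurious category assignment.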
In comparing BioSense with other commonly used automated surveillance systems such as EARS, ESSENCE, or RODS, it is useful to distinguish between the BioSense program, which supports a number of activities including collation of numerous data types from around the U.S., and the BioSense application, which displays data and analyses. Other systems do not have a function comparable to the BioSense program, but instead provide software tools for a locality to analyze its own data. In comparison with the BioSense application, other systems use different surveillance concepts (as noted above), different statistical methods for finding data anomalies, and different interfaces for displaying results. An evaluation of the sensitivity and specificity of statistical methods used in BioSense has been published [19]; this includes a comparison with methods used in EARS but not with those used in other systems. A study funded by BioSense found that, because of greater familiarity and engagement, epidemiologists generally preferred to use systems that they developed rather than the BioSense application [23]; these results are informing plans, outlined in the Conclusion, to revise the BioSense program.
The BioSense program includes a number of activities not otherwise described in this manuscript. The BioIntelligence Center, composed of several analysts, reviews the data daily and provides reports to state and local health departments and to the CDC Emergency Operations Center. A specialized Influenza Module [24, 25] summarizes data from 3 traditional sources supplied by the Influenza Division at CDC [26], and from 5 automated sources via BioSense. The sub-syndrome designated "influenza-like illness" captures free-text data that mention "flu" or "influenza" and ICD-9CM code 487; however, the Influenza Module uses the following combination of sub-syndromes for influenza surveillance: influenza-like illness or (fever and cough) or (fever and upper respiratory infection). Computer infrastructure installed by CDC is used to forward notifiable disease laboratory data to state health departments from 40 hospitals and 1 national laboratory. Plans to enable messaging from a second national laboratory are in progress. Funding has also been provided to support original research, evaluation of syndromic surveillance and BioSense [23], and awards to the Centers of Excellence in Public Health Informatics [27].
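The Influenza Module's sub-syndrome combination can be expressed directly as a boolean rule over per-visit sub-syndrome flags. The field names below are hypothetical; only the logic follows the combination stated above.

```python
# The Influenza Module combination as a boolean rule over per-visit
# sub-syndrome flags. Field names are illustrative, not actual BioSense
# data-dictionary names.

def influenza_indicator(visit):
    """True if the visit meets the stated criteria:
    influenza-like illness OR (fever AND cough) OR
    (fever AND upper respiratory infection)."""
    return (visit["influenza_like_illness"]
            or (visit["fever"] and visit["cough"])
            or (visit["fever"] and visit["upper_respiratory_infection"]))

visit = {"influenza_like_illness": False, "fever": True,
         "cough": True, "upper_respiratory_infection": False}
print(influenza_indicator(visit))  # True
```

Note that under this rule a visit with cough alone, or fever alone, does not count toward influenza surveillance; the conjunctions narrow the broader sub-syndromes into a more influenza-specific composite.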
Evaluation of the validity of the data received by automated systems is difficult, but a number of points support the validity of BioSense data. When new data sources are added, a one-week sample of data is scanned by technical personnel for data quality problems and adherence to data dictionary standards; thereafter, if the typical number of incoming records changes, corrective action is taken. As part of a BioSense-funded cooperative agreement, a validation of >9,000 records from 2 North Carolina hospitals showed >99% agreement with data received by CDC [28]. Biologically plausible increases in visits have been found for asthma associated with a wildfire [22], falls associated with winter weather [29], and burns associated with Independence Day [30]. Seasonal trends in influenza-like illness [25], heat injury [31], asthma [32], and gastrointestinal disease [33] follow expected patterns. Finally, a number of 1-day increases in visits at single hospitals have been linked via newspaper reports or personal communications to known incidents [34]; on 3 occasions, such increases were due to artificial records introduced into hospital ED systems during preparedness drills (unpublished information).
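The ongoing record-volume check mentioned above (flagging a data source whose typical number of incoming records changes) can be sketched simply. The tolerance and history window below are illustrative choices, not documented BioSense parameters.

```python
# Sketch of a record-volume data quality check: flag a data source when
# its daily record count departs markedly from its recent typical volume.
# Threshold and window are hypothetical, not BioSense parameters.

def volume_alert(recent_counts, todays_count, tolerance=0.5):
    """Flag if today's count deviates from the recent mean by more than
    the given fractional tolerance (default +/-50%)."""
    mean = sum(recent_counts) / len(recent_counts)
    return abs(todays_count - mean) > tolerance * mean

history = [1000, 980, 1020, 995, 1005]   # hypothetical daily record counts
print(volume_alert(history, 400))   # True: volume dropped well below typical
print(volume_alert(history, 1010))  # False
```

A check of this kind catches silent feed failures (dropped connections, truncated batches) rather than content errors, which is why it complements, rather than replaces, the record-level validation study cited above [28].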
Limitations of this study include representation of only a convenience sample of civilian hospitals and an inability to perform detailed comparisons of visits at individual facilities with data received by the system. The rates per 1000 total visits presented represent proportional morbidity at facilities providing data rather than population-based incidence. Caution should be used in comparing these rates among the data sources because of differences in the numbers of facilities reporting, varying numbers of chief complaints or diagnoses provided per visit, and the use of different reference tables to assign chief complaints vs. diagnoses to syndromes and sub-syndromes. There is potential misclassification because of limitations of patient-reported chief complaints, which are subjective, and diagnosis codes, which have well-recognized limitations. Additionally, the same patient may contribute >1 visit on different days, and follow-up visits for recheck may be particularly high in systems such as the DoD where patients do not have to pay for such visits. While the same visit may be classified as showing >1 disease indicator, counts from these 2 categories are analyzed separately and not added together. BioSense, like other automated systems, can monitor seasonal influenza activity [24, 25] and recognize large increases in visits for some general surveillance concepts; however, a more substantial contribution to public health practice awaits the ability to access data that are more specific than chief complaints and diagnoses. Nevertheless, to our knowledge this report presents the largest collation of automated surveillance data yet assembled.
During its first 6 years of operation, the system has had a number of problems, most of which have been corrected or will be addressed in the near future. First, certain key variables (e.g., diagnosis priority at hospital facilities, and clinic type and patient identifier at DoD facilities) either were not available or were inadvertently excluded. During 2003--2006, the application displayed sentinel alerts based on ICD-9CM coded diagnoses from the VA and DoD systems, in some cases due to miscodes at the facility such as "plaque" being coded as "plague." Current processes avoid this problem. During 2005--2006, program insistence on receipt of data directly from hospitals was expensive and created resentment among some state and local health departments; the current approach, emphasizing access to data through existing state or local systems or Health Information Exchanges, is more fruitful. A number of problems have been identified in the chief complaint and diagnosis mapping tables, e.g., "sore throat" being assigned to the localized cutaneous lesion syndrome because of the word "sore." The BioSense application has expanded functionality but lacks key features such as the ability to perform free-text searches or to create custom syndromes using terms such as "fever and cough." The capability to share datasets with health departments and research partners is hampered by data use agreements as well as technical issues. Finally, procedures for data receipt, warehousing, and pre-processing have not been flexible enough to allow revisions to be made quickly.