Estimation of hospital emergency room data using OTC pharmaceutical sales and least mean square filters
© Najmi and Magruder; licensee BioMed Central Ltd. 2004
Received: 30 October 2003
Accepted: 15 March 2004
Published: 15 March 2004
Surveillance of Over-the-Counter (OTC) pharmaceutical sales as a potential early indicator of developing public health conditions, in particular in cases of interest to bioterrorism, has been suggested in the literature. The data streams of interest are quite non-stationary, and we address this problem from the viewpoint of linear adaptive filter theory: the clinical data form the primary channel, which is to be estimated from the OTC data that form the reference channels.
The OTC data are grouped into a few categories and we estimate the clinical data using each individual category, as well as using a multichannel filter that encompasses all the OTC categories. The estimation (in the least mean square sense) is performed using an FIR (Finite Impulse Response) filter and the normalized LMS algorithm.
We show all estimation results and present a table of effectiveness of each OTC category, as well as the effectiveness of the combined filtering operation. Individual group results clearly show the effectiveness of each particular group in estimating the clinical hospital data and serve as a guide as to which groups have sustained correlations with the clinical data.
Our results indicate that multichannel adaptive FIR least squares filtering is a viable means of estimating public health conditions from OTC sales, and they provide quantitative measures of time-dependent correlations between the clinical data and the OTC data channels.
Surveillance of Over-the-Counter (OTC) pharmaceutical sales as a potential early indicator of developing public health conditions has been suggested in the literature. OTC sales offer several advantages as possible early indicators of public health. They are, first of all, very widely used. According to a recent health survey, 77% of the U.S. population said they had used non-prescription medications to treat a health condition at least once in a 6-month period. This compares to 43% who said they had consulted a physician in the same time period, and 38% who said they had used prescription medications.
A second advantage of OTCs is that reliable and detailed electronic records are made at the time of sale. These records are aggregated regionally for commercial purposes. The only additional burden for health surveillance purposes is to communicate this data to the appropriate public health organizations. The OTC data contain significant information, e.g. sales volume of each of several hundred possible products, and the precise location of the store where they are sold.
A third possible advantage, which has not been so well established, is the timeliness of OTC sales relative to other observable events that might occur when the public health is threatened.
The purpose of this article is to present evidence that when judiciously grouped, the OTC data show time-dependent correlations with clinical data, and that the latter can be reconstructed from the former using a linear filter.
JHU/APL is currently collecting large quantities of daily OTC sales data. We receive sales records of 622 different products under the general category of cold remedies from a single vendor, with similar numbers from other vendors. Many of these products are used to treat very similar conditions. As a starting point to analyze the data, we made use of product groups that had been defined subjectively by a local expert in the domain of pharmacoepidemiology. The groupings are based on a product's presumed use, and are further divided into children's and adult medications. The product groups are summarized in Table 1 (see Additional file 1). We aggregated the sales of individual products within each group to form a time series of daily sales of product packages. No attempt was made to apply different weights to different products, for example by the total dosage contained in a package. Whether such a weighting scheme would be useful remains an open question.
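The aggregation step described above amounts to a group-by-and-sum over daily sales records. The sketch below is a minimal illustration, not the authors' pipeline. It assumes each sales record is a (day, product, packages) tuple, and it uses a hypothetical product-to-group mapping (`PRODUCT_GROUP`, with made-up product IDs) in place of the expert groupings of Table 1. Like the text, it counts every package equally, with no dosage weighting. Days with no recorded sales have to be filled with zeros explicitly before the series can be fed to a filter.

```python
from collections import defaultdict
from datetime import date, timedelta

# HYPOTHETICAL product-to-group mapping for illustration only; the actual
# expert-defined groups are summarized in Table 1 and are not reproduced here.
PRODUCT_GROUP = {
    "sku_001": "adult_cough",
    "sku_002": "adult_cough",
    "sku_003": "child_chest_rub",
}


def daily_group_sales(records, product_group):
    """Sum unweighted package counts per product group and day.

    records: iterable of (day, product_id, packages_sold) tuples.
    Products absent from product_group are skipped.
    Returns {group: {day: total_packages}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, product, packages in records:
        group = product_group.get(product)
        if group is None:
            continue
        # Every package counts equally: no dosage-based weighting.
        totals[group][day] += packages
    return {group: dict(days) for group, days in totals.items()}


def to_dense(day_totals, start, n_days):
    """Turn {day: total} into a list over consecutive days, filling gaps with 0."""
    return [day_totals.get(start + timedelta(days=i), 0) for i in range(n_days)]


records = [
    (date(2002, 1, 1), "sku_001", 5),
    (date(2002, 1, 1), "sku_002", 3),
    (date(2002, 1, 2), "sku_001", 4),
    (date(2002, 1, 1), "sku_003", 2),
    (date(2002, 1, 1), "sku_999", 7),  # unmapped product: ignored
]
series = daily_group_sales(records, PRODUCT_GROUP)
print(to_dense(series["adult_cough"], date(2002, 1, 1), 3))  # [8, 4, 0]
```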
Product sales from some of these product groups are known to be good indicators of the corresponding clinical data. For instance, chest rub sales are highly correlated with the count of physician diagnoses of acute bronchitis or acute bronchiolitis.
Least mean square (LMS) filtering
If we denote the clinical data time series by $y[n]$ and the OTC reference time series channels by $x_j[n]$, where the index $n$ denotes the day number and the index $j$ denotes the OTC product group, then today's and past days' OTC data are used to estimate today's clinical data. In other words, the estimated quantity is $\hat{y}[n \mid n]$, the estimate of $y[n]$ formed from $x_j[m]$, $m \le n$, and it is compared directly to the actual value of the clinical data today. This is referred to as the "filtering" problem, as distinct from the following two problems. The "prediction" problem attempts to estimate future values of the clinical data using today's and past days' values of the OTC channels. The predicted quantity is $\hat{y}[n + k \mid n]$, $k > 0$, which is then compared to the actual value of the clinical data on day number $n + k$. The "smoothing" problem, which is of no interest to us in this application, is to compute past values of the clinical data using today's and past values of the OTC data. The "smoothed" value is $\hat{y}[n - k \mid n]$, $k > 0$, which is compared to the actual value of the clinical data on day number $n - k$.
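The index bookkeeping for the three problems can be made concrete. The sketch below builds, for each day n, the vector of today's and past OTC values and pairs it with the clinical value that the estimate is compared against: y[n] for filtering, y[n+k] for prediction and y[n-|k|] for smoothing. It is an illustration under stated assumptions, not the authors' code. It uses NumPy, a hypothetical filter length `L` (taps per channel), and channel-by-channel stacking of the taps; the text specifies none of these details.

```python
import numpy as np


def regression_pairs(x, y, L, k=0):
    """Pair tapped-delay OTC vectors with the clinical value they are compared to.

    x: array (N, J); column j is OTC channel j, row n is day n.
    y: array (N,); clinical series y[n].
    L: taps per channel (today plus L-1 past days), a hypothetical choice.
    k: 0 -> filtering (target y[n]); k > 0 -> k-step prediction (target y[n+k]);
       k < 0 -> smoothing (target y[n-|k|]).
    Returns (Z, d): Z[i] holds channel 0's values for days n, n-1, ..., n-L+1,
    then channel 1's, and so on; d[i] = y[n+k]. Only days n where all of
    these values exist are included.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N, J = x.shape
    rows, targets = [], []
    for n in range(L - 1, N):
        if not 0 <= n + k < N:
            continue  # target day falls outside the record
        window = x[n - L + 1 : n + 1][::-1]  # shape (L, J), day n first
        rows.append(window.T.reshape(-1))    # stack channel by channel
        targets.append(y[n + k])
    return np.array(rows), np.array(targets)


# Two toy channels over five days and a toy clinical series.
x = np.arange(10).reshape(5, 2)  # column 0: 0,2,4,6,8 ; column 1: 1,3,5,7,9
y = 10.0 * np.arange(5)          # 0,10,20,30,40
Z, d = regression_pairs(x, y, L=2, k=0)
print(Z[0], d)  # [2. 0. 3. 1.] [10. 20. 30. 40.]
```

Changing only `k` switches between the three problems while the OTC input vectors stay the same. The setup in the text differs only in which clinical day each input vector is compared against.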
It should be clear from the description of these three problems that a useful public health surveillance system would be most interested in the "prediction" problem. Cross correlations between the OTC data channels and the clinical data, if they are sufficiently strong, could be used to "predict" the clinical data from the OTC channels. Similarly, autocorrelations of the clinical data itself, if sufficiently strong, could be used to (self) "predict" the clinical data. If both types of predictions can be made reasonably successfully, then one could compare and/or combine them to maximize the predictive ability of a surveillance system. This latter problem is particularly difficult. To motivate a serious attempt at it, we have first set ourselves the simpler task of studying the "filtering" problem. This way we can use the adaptive filter technique to show "time dependent" correlations between the clinical and OTC channels that arise from the non-stationary nature of both data sets.
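The role of lagged cross correlation can be illustrated with a short sketch. It is not the method of the paper. It computes the sample (Pearson) correlation between one OTC channel x[n] and the clinical series y[n+k], both over the whole record and over sliding windows. Because both series are non-stationary, a single global coefficient can hide correlations that strengthen and fade over the season, which is part of the motivation for an adaptive filter. The toy data, the lag and the window length are hypothetical.

```python
import numpy as np


def align(x, y, k):
    """Return the pairs (x[n], y[n+k]) as two equal-length arrays.

    k > 0: x leads y by k days; k < 0: x lags y by |k| days.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if k > 0:
        return x[:-k], y[k:]
    if k < 0:
        return x[-k:], y[:k]
    return x, y


def lagged_corr(x, y, k):
    """Pearson correlation of x[n] with y[n+k] over the whole record."""
    xa, ya = align(x, y, k)
    return float(np.corrcoef(xa, ya)[0, 1])


def windowed_lagged_corr(x, y, k, window):
    """The same correlation over sliding windows of `window` aligned days.

    Returns one value per window start; a window in which either series is
    constant yields nan.
    """
    xa, ya = align(x, y, k)
    return np.array([
        np.corrcoef(xa[s:s + window], ya[s:s + window])[0, 1]
        for s in range(len(xa) - window + 1)
    ])


# Toy data: y repeats x three days later, so x leads y by k = 3.
n = np.arange(60)
x = np.sin(0.3 * n)
y = np.roll(x, 3)
print(round(lagged_corr(x, y, 3), 6))  # 1.0
```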
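The requirement of minimum mean-squared error can be most easily implemented using the Widrow LMS algorithm. A related adaptive filter, which we use in this paper, is the normalized LMS algorithm. It avoids some of the difficulties with the choice of the adaptation parameter. It is the solution to the following constrained optimization problem. We are given the primary channel data $p[n]$ (here the clinical data) and the reference channels' data $z_j[n]$ (here the OTC product groups). The optimal filter coefficients $\mathbf{h}_j[n + 1]$ are found by minimizing the magnitude of the difference $\|\mathbf{h}[n + 1] - \mathbf{h}[n]\|$ subject to the constraint $\mathbf{h}^T[n + 1]\,\mathbf{z}[n] = p[n]$. Here $\mathbf{h}[n]$ stacks the coefficient vectors $\mathbf{h}_j[n]$ of all channels, and $\mathbf{z}[n]$ stacks the corresponding current and past samples of the $z_j[n]$. We show in the Appendix (see Additional file 2) that the adaptive filter must satisfy the following equations:

$$e[n] = p[n] - \mathbf{h}^T[n]\,\mathbf{z}[n]$$

$$\mathbf{h}[n + 1] = \mathbf{h}[n] + \frac{\mu}{a + \|\mathbf{z}[n]\|^2}\, e[n]\, \mathbf{z}[n]$$

where $0 < \mu < 2$ and $0 < a \ll 1$.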
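The update can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. It assumes the standard regularized NLMS update h[n+1] = h[n] + μ e[n] z[n] / (a + ||z[n]||²) with a priori error e[n] = p[n] − hᵀ[n] z[n], zero initial coefficients, and hypothetical default values for `mu` and `a`. The reference channels are assumed to be already stacked into one vector per day, so the same routine covers both a single OTC group and the multichannel filter.

```python
import numpy as np


def nlms(Z, p, mu=0.5, a=1e-3):
    """Regularized normalized LMS (NLMS) adaptive FIR filter.

    Z: array (N, M); row n is the reference vector z[n], i.e. the current and
       past samples of every reference channel stacked into one vector.
    p: array (N,); primary channel p[n].
    mu: step size, 0 < mu < 2.  a: small regularizer, 0 < a << 1.
    Returns (p_hat, e, H): the a priori estimates h[n]^T z[n], the errors
    e[n] = p[n] - p_hat[n], and the coefficient vector after each update.
    """
    Z = np.asarray(Z, dtype=float)
    p = np.asarray(p, dtype=float)
    N, M = Z.shape
    h = np.zeros(M)              # start from an all-zero filter
    p_hat = np.empty(N)
    e = np.empty(N)
    H = np.empty((N, M))
    for n in range(N):
        z = Z[n]
        p_hat[n] = h @ z         # estimate using coefficients h[n]
        e[n] = p[n] - p_hat[n]   # estimation error
        # Step along z[n], scaled by its energy: h[n+1] = h[n] + mu e z / (a + |z|^2)
        h = h + (mu / (a + z @ z)) * e[n] * z
        H[n] = h                 # H[n] stores h[n+1]
    return p_hat, e, H


# Toy check: a noiseless 3-tap linear system is recovered by the filter.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 3))
h_true = np.array([0.5, -1.0, 2.0])
p = Z @ h_true
p_hat, e, H = nlms(Z, p, mu=1.0, a=1e-6)
print(np.round(H[-1], 4))
```

The routine returns the whole coefficient history `H` rather than only the final filter. This lets you see how the weight on each OTC group drifts over the season, which is one way to read time-dependent correlations off an adaptive filter.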
Results and discussion
As is apparent from Figure 4, we have been quite successful at reconstructing the emergency room data from all 10 product groups, except for the large peak at day 20, which has not been estimated well. This indicates that this event was not correlated with any of the OTC data streams on the same days. Individual group results clearly show the effectiveness of each particular group in estimating the hospital data. They serve as a guide to which groups are more likely to produce better predictions of the hospital data, in the sense described in the Methods section above.
The perceived value of OTC sales as a data source for syndromic surveillance would be greatly enhanced if it could be shown that OTC sales provide an earlier indicator of health problems than can be obtained from clinical data. The results presented in this article indicate that sales of over-the-counter flu remedies were well correlated with physician diagnoses of acute respiratory conditions throughout the National Capital Area (and slightly beyond) in the 2001–2002 winter cold season, and that these sales can reproduce those data with rather small error. These results tend to strengthen the hypothesis that some OTC product sales might be used as an early indicator of a general class of human disease known as acute respiratory conditions. This holds provided that the estimation algorithms presented here can be successfully extended to prediction at positive, nonzero time lags.
OTC: over the counter (medications).
This research is sponsored by the Defense Advanced Research Projects Agency and managed under Naval Sea Systems Command (NAVSEA) contract N00024-98-D-8124. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, NAVSEA, or the United States Government.
We would like to thank Sheri Happel-Lewis who provided the OTC cold-product subgroupings used in this study.
- Goldenberg A, Shmueli G, Caruana RA, Fienberg SE: Early statistical detection of Anthrax outbreaks by tracking over-the-counter medication sales. PNAS. 2002, 99: 5237-5240. 10.1073/pnas.042117499.
- Self-care in the New Millennium, report by Roper Starch Worldwide, Inc., prepared for the Consumer Healthcare Products Association, Roper-Starch. 2001.
- Magruder S: Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of public health. Johns Hopkins University Applied Physics Laboratory Technical Digest. 2003, 24: (to appear).
- Diagnostic Coding Essentials. 2001, Ingenix Publishing Group, Salt Lake City, Utah.
- Lombardo J, Burkom H, Elbert E, Magruder S, Happel Lewis S, Loschen W, Sari J, Sniegoski C, Wojcik R, Pavlin J: A Systems Overview of the Electronic Surveillance System for the Early Notification of Community Based Epidemics (ESSENCE II). Journal of Urban Health. 2003, 80: i32-i42.
- Moon T, Stirling W: Mathematical methods and Algorithms for Signal Processing. 2000, Prentice Hall.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/4/5/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.