Evaluation of an automated safety surveillance system using risk adjusted sequential probability ratio testing

Background: Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery.
Methods: A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled if the odds of dying within 30 days after surgery were at least twice the expected odds. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year.
Results: We examined a total of 83 hospital-year observations. The RA-SPRT method signaled 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80).
Conclusions: The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting, providing opportunities for earlier quality improvement interventions.


Background
Public reporting of risk-adjusted mortality rates following cardiac surgery has become an important tool in the evaluation and improvement of quality of patient care [1]. Several states, including Massachusetts, have enacted legislation requiring public reporting of cardiac surgical outcome data [2]. Beginning in 2002, all Massachusetts hospitals performing cardiac surgery were required to submit cardiac surgical outcomes data to the Massachusetts Data Analysis Center (Mass-DAC), a data coordinating center for the Massachusetts Department of Public Health (MA DPH), and to the Society of Thoracic Surgeons (STS) using the STS National Cardiac Database collection tool. Since then, Mass-DAC has published annual public reports on 30-day, all-cause, risk-adjusted mortality rates for isolated coronary artery bypass surgery (CABG) for institutions, and beginning in 2004, for individual cardiac surgeons [3].
Though public reports are intended to provide transparency and public accountability, and to inform consumer choice, there are other consequences to public reporting that have potentially significant long-term, financial and reputational impact on both the institutions and providers. In Massachusetts, two centers were identified as statistically significant high mortality outliers, one of which was identified as such in multiple, consecutive years [4,5]. As a result, this cardiac surgery program was temporarily suspended while quality improvement initiatives were undertaken [6]. Notification of the first year of outlier status for this program was not available publicly until 2 to 2.75 years after data collection (relative to calendar quarter) [5,6].
There are inherent delays between performance of procedures and public reporting of outcomes due to the rigorous data review and manual case adjudication required from both a regulatory and data quality standpoint. Massachusetts has steadily decreased the delay between data collection and public reporting since the program's inception in 2002; for fiscal year 2008, the delay was 1.25 to 2 years (relative to calendar quarter) [7]. However, any delay between the performance of the procedure and the analysis and public reporting of results is undesirable and could expose some patients to increased risk of morbidity and mortality, because procedures continue to be performed while a quality issue exists. An active, automated prospective surveillance system, as an adjunct to the existing rigorous regulatory approach, could provide institutions or physicians with timely internal feedback and opportunities to mitigate these risks well in advance of public release of the data.
We designed the Data Extraction and Longitudinal Trend Analysis (DELTA) system to provide real-time monitoring of clinical data to support continuous quality or safety monitoring of newly approved medical devices, medications, or therapeutic interventions [8]. This system supports a variety of frequentist and Bayesian statistical methods, which can be configured to provide unadjusted and risk-adjusted safety monitoring among prospective and retrospective cohorts [8][9][10][11]. DELTA also incorporates de-identification and encryption algorithms to guard protected health information and control data ownership, and employs flexible alerting mechanisms to trigger notifications via e-mail or through the web interface when an observed event rate exceeds boundaries of risk-adjusted expectations for the event of interest.
Risk-adjusted SPRT, a method for observational cohort safety surveillance, was first proposed by Spiegelhalter and colleagues [12]. This method has been used to evaluate hospital and physician performance among retrospective cohorts for coronary artery bypass patients and percutaneous coronary intervention patients [12][13][14]. While this method has not achieved widespread use, it is well suited for implementation in a prospective, automated system because it analyzes each sequential case and incorporates adjustment for repeated measures on the same subject with explicit type I and II error rates.
In this study, we sought to assess the utility of RA-SPRT when embedded in an automated surveillance tool as compared to the gold standard of retrospective annual quality reports used by the Massachusetts Department of Public Health (MA DPH). The primary outcomes of this analysis were the sensitivity and specificity of the automated implementation as compared with the public reporting methods of Mass-DAC and the MA DPH, assessed for hospital 30-day mortality after isolated coronary artery bypass graft surgery.

Study Setting
Massachusetts regulations require all acute care nonfederal hospitals that provide cardiac surgery to collect data using a standardized data collection instrument based on the Society of Thoracic Surgeons (STS) registry [15]. Each institution is required to submit data on a quarterly basis to Mass-DAC, and participating centers collect the data using a variety of point-of-care collection tools, chart review, and patient follow-up. Mass-DAC performs manual adjudication of all cases with adverse outcomes as well as a sample of all other case records. Yearly reports of hospital and surgeon 30-day mortality performance are published. Additional information and annual public reports are available online [3].
A total of 23,020 isolated adult coronary artery bypass surgery admissions from January 1, 2002 to September 30, 2007 were included. The surgeries did not involve valve replacement or other associated cardiac surgical procedures. We selected these surgeries for our study because the state uses them as the primary index of institution and surgeon quality for cardiovascular surgery. In 2006, Mass-DAC changed reporting from a calendar year basis to a fiscal year basis that runs from October 1 through September 30. Consequently, the 2006 fiscal year analysis included the last three months of the 2005 calendar year. The primary patient outcome of the registry is 30-day all-cause mortality after isolated coronary artery bypass graft surgery. We focused on 30-day all-cause hospital-specific risk-standardized mortality rates. The current study was approved by the Brigham & Women's and Harvard Medical School's Institutional Review Boards.

Gold Standard Statistical Analysis
Mass-DAC reports the data annually utilizing Bayesian hierarchical logistic regression [1]. The model assumes that the log-odds of mortality is linearly related to a set of patient risk factors and permits baseline risk to vary across hospitals through the inclusion of a hospital-specific intercept. Estimates of the model parameters, including the between-hospital variance, the overall mean hospital log-odds, and the regression coefficients of the patient-level risk factors, are obtained via Markov chain Monte Carlo (MCMC) methods. The MCMC method uses the Gibbs sampler to sequentially sample from probability distributions and produces a Markov chain with the joint posterior density as its stationary distribution [16]. This is accomplished by selecting a set of starting values, then performing a number of "burn-in" sampling iterations that are not recorded, followed by the collection and averaging of additional sampling iterations to form the final posterior estimates. The primary analysis used all of the data for the year and declared a hospital an outlier if the lower limit of the 95% posterior interval of the risk-adjusted institutional standardized mortality rate exceeded the unadjusted statewide mortality rate. Because of the small number of cardiac surgery hospitals in Massachusetts, Mass-DAC also performs cross-validation analyses in which each hospital is eliminated in turn and data from the remaining hospitals are used to assess performance in the eliminated hospital. This strategy was developed to prevent one large center from having too great an influence on statewide risk expectations. A hospital was considered an outlier if either the lower limit of the 95% posterior interval from the statewide comparison exceeded the unadjusted statewide mortality rate or the posterior predictive p-value from the cross-validation analysis was 0.01 or smaller.
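In symbols, a hierarchical logistic model of the kind described above can be sketched as follows. This is a generic form, not the exact Mass-DAC specification: the normal distribution for the hospital intercepts and all symbols (Y_ij for the 30-day death indicator of patient i in hospital j, x_ij for the patient risk factors, theta_j for the hospital-specific intercept, gamma for the risk-factor coefficients, mu for the overall mean log-odds, and tau^2 for the between-hospital variance) are assumptions introduced for illustration.

```latex
\begin{aligned}
Y_{ij} \mid p_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), \\
\operatorname{logit}(p_{ij}) &= \theta_{j} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}, \\
\theta_{j} \mid \mu, \tau^{2} &\sim \mathcal{N}(\mu, \tau^{2}).
\end{aligned}
```

Under such a model, the Gibbs sampler draws each of theta_j, gamma, mu, and tau^2 in turn from its conditional distribution given the others, and the retained draws after burn-in approximate the joint posterior.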
Hospital 9 was declared an outlier in the 2002-2005 reports, and hospital 8 was declared an outlier in the 2004 report. All of the hospital outliers were detected only by the cross-validation evaluation performed by Mass-DAC. Although the original 2002 public report did not include the cross-validation analysis method, we have exactly reproduced this analysis protocol on the 2002 data to provide consistency across all years. A summary of the risk-adjusted standardized mortality incidence rate (with upper and lower limits of the 95% interval) by hospital and year is reported in Figure 1. The dotted line in the figure represents the statewide unadjusted mortality incidence rate, and values in red indicate an outlying hospital.

Automated Risk Adjusted Sequential Probability Ratio Testing (RA-SPRT)
The SPRT control chart methodology detects unacceptable event rates by evaluating each unit of analysis sequentially in time [12,13]. The risk hypothesis is whether the observed outcome event rate exceeds the accepted or baseline event rate given a specific odds ratio (OR) and Type I and II error [12]. This method accepts or rejects this hypothesis after each sequential case is evaluated. Risk adjustment is performed through the use of a risk prediction model whereby the cumulative log likelihood ratio is adjusted by the probability of the outcome [17]. Repeated measurement (reanalysis after each additional case) error adjustments are incorporated explicitly in the framework. These features are uncommon in statistical process control methods and are strengths of this method.
The following describes the calculations necessary to construct risk-adjusted SPRT control charts as refined by Rogers and colleagues [13] based on Spiegelhalter's work [12]. The control limits h_0 and h_1 are the cumulative log-likelihood ratio values at which the null hypothesis (H_0) or the alternate hypothesis (H_1), respectively, is accepted. They are determined by the chosen odds ratio (OR), the type I error rate (α), and the type II error rate (β).
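For reference, the classical Wald SPRT limits on the log-likelihood ratio scale take the form below. Mapping them onto h_0 (lower limit, accept H_0) and h_1 (upper limit, accept H_1) is an assumption made here for illustration and may differ in sign or scaling from the convention used in [13].

```latex
h_{1} = \ln\!\left(\frac{1-\beta}{\alpha}\right), \qquad
h_{0} = \ln\!\left(\frac{\beta}{1-\alpha}\right)
```

With α = 0.05 and β = 0.10, these evaluate to h_1 = ln(18) ≈ 2.89 and h_0 = ln(0.10/0.95) ≈ -2.25. On this scale the limits depend only on α and β, and the odds ratio enters through the test statistic. Dividing both the statistic and the limits by ln(OR) gives an equivalent chart for OR > 1 whose limits do depend on OR (approximately 4.17 and -3.25 for OR = 2).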
The cumulative log-likelihood ratio value (T^cum) is calculated in sequence for higher risk detection (OR > 1), starting from T^cum_0 = 0 and updated after each case, where O_i is the observed binary outcome (0 or 1) for the i-th case and p_i is the calculated probability of the outcome for the i-th case as determined by the risk prediction model.
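The sequential calculation can be illustrated with a minimal Python sketch. It assumes the log-likelihood ratio form of the risk-adjusted SPRT, in which each case adds O_i ln(OR) - ln(1 - p_i + OR p_i) to the running total, together with Wald limits on the same scale. The function names, the return labels, and the choice to stop at the first boundary crossing are assumptions for illustration and are not taken from DELTA.

```python
import math

def sprt_limits(alpha=0.05, beta=0.10):
    """Wald limits on the log-likelihood ratio scale (assumed convention)."""
    lower = math.log(beta / (1.0 - alpha))   # at or below: accept H0
    upper = math.log((1.0 - beta) / alpha)   # at or above: accept H1 (alert)
    return lower, upper

def ra_sprt(outcomes, risks, odds_ratio=2.0, alpha=0.05, beta=0.10):
    """Run a risk-adjusted SPRT over cases in time order.

    outcomes: iterable of 0/1 observed outcomes (O_i)
    risks:    iterable of model-predicted probabilities (p_i)
    Returns (status, path), where path holds the cumulative statistic
    after each evaluated case.
    """
    lower, upper = sprt_limits(alpha, beta)
    t_cum = 0.0                               # T^cum_0 = 0
    path = []
    for o, p in zip(outcomes, risks):
        # Log-likelihood ratio of H1 (odds multiplied by OR) versus H0.
        t_cum += o * math.log(odds_ratio) - math.log(1.0 - p + odds_ratio * p)
        path.append(t_cum)
        if t_cum >= upper:
            return "alert", path
        if t_cum <= lower:
            return "accept_h0", path
    return "continue", path

if __name__ == "__main__":
    # Hypothetical data: five consecutive deaths at 2% predicted risk.
    status, path = ra_sprt([1, 1, 1, 1, 1], [0.02] * 5)
    print(status, [round(t, 3) for t in path])
```

Each death moves the statistic up by nearly ln(OR), while each survival moves it down by a small amount that grows with the predicted risk, so deaths among low-risk patients count most heavily toward an alert.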
We have previously described an automated real-time safety monitoring tool, Data Extraction and Longitudinal Trend Analysis (DELTA), that is able to perform large numbers of concurrent prospective analyses using a variety of statistical methodologies and alerting thresholds [18]. The system uses Microsoft SQL Server 2005 (Microsoft Corp., Redmond, WA) to provide internal data storage and configuration information, as well as the capability to integrate with external databases. The user interface was developed in the Microsoft .NET programming environment and is displayed in a web browser from a Microsoft IIS 6.0 Web Server (Microsoft Corp., Redmond, WA). Security of patient data is further addressed by record de-identification steps and user login access restrictions [19].
The RA-SPRT method was implemented directly within DELTA, and the statistical module evaluated the data after setting cohort inclusion and exclusion criteria as well as the necessary statistical parameters, such as the desired odds ratio and the type I and II error.
Parameters and risk variable selection for the logistic regression risk adjustment were then passed through a bi-directional interface to SAS (Version 9.1, Cary, NC) in order to develop the required logistic regression models.
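The Statistical Analysis section describes how these models were refit: the cases in each "current" month were scored by a model built from up to the prior 11 months, with data beginning in January 2002. A minimal Python sketch of that schedule follows; the function and constant names are hypothetical, and the model fitting itself (done in SAS in the study) is not shown.

```python
from datetime import date

FIRST_MONTH = date(2002, 1, 1)  # earliest month with data, per the text
WINDOW = 11                     # months of history used to fit each model

def _month_index(d):
    """Number of months since year 0, ignoring the day of month."""
    return d.year * 12 + (d.month - 1)

def _from_index(i):
    """Inverse of _month_index, returning the first day of the month."""
    return date(i // 12, i % 12 + 1, 1)

def training_months(current):
    """Months whose cases feed the model applied to `current`.

    Returns up to WINDOW months immediately preceding `current`,
    truncated at FIRST_MONTH. An empty list means no model can be
    built, so the month cannot be analyzed.
    """
    cur = _month_index(current)
    start = max(_month_index(FIRST_MONTH), cur - WINDOW)
    return [_from_index(i) for i in range(start, cur)]

if __name__ == "__main__":
    for m in (date(2002, 1, 1), date(2002, 2, 1), date(2002, 12, 1), date(2003, 1, 1)):
        months = training_months(m)
        print(m.isoformat(), len(months), months[0].isoformat() if months else None)
```

The truncated early windows are the ones the authors report as unstable, which is why the first analyzable months rest on far less data than later ones.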

Statistical Analysis
The risk-adjusted sequential probability ratio testing (RA-SPRT) method was used to evaluate the data separately for each calendar or fiscal year. Although one of the strengths of this method is that it can accumulate data continuously until the alerting odds ratio hypothesis is accepted or rejected, analyses were terminated at the end of each calendar or fiscal year and the cumulative log-likelihood ratio was reset to 0 so that the results would be directly comparable to the gold standard. Risk adjustment was performed using standard logistic regression with the same risk factors used by Mass-DAC. Each logistic regression model was developed from data in the prior 11 months and then applied sequentially to each case in the "current" month. This process was repeated throughout the entire range of the data. Data were not available prior to 2002, so the models developed prior to December 2002 used from one to ten months of data (depending on the analysis month). Data from January 2002 were not analyzed by RA-SPRT because they were required to build the first model. The risk models developed from the first two months of data showed regression coefficient instability and poor calibration, as would be expected with small sample sizes; both measures subsequently stabilized for the remainder of the data. A type I error rate of 0.05 and a type II error rate of 0.10 were used in each of the RA-SPRT analyses, and an OR of 2.0 was chosen as a reasonable threshold for concern regarding the clinical quality of the institution evaluated. An outlier was declared if a hospital exceeded the log-likelihood ratio threshold at any point during that calendar or fiscal year.

Results
Figure 2C). The remaining evaluations for each hospital were true negatives (for example, Hospital 6 shown in Figure 2D). Compared with the publicly available reports, RA-SPRT had a sensitivity of 100% (5/5) and a specificity of 98.8% (79/80).
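The reported sensitivity and specificity follow directly from the stated counts. The short Python sketch below reproduces the arithmetic; the function names are illustrative, and the counts are those given in the text (5 of 5 gold-standard outliers detected, 79 of 80 non-outlier evaluations not alerted).

```python
def sensitivity(true_pos, false_neg):
    """Share of gold-standard outliers that the method alerted on."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of gold-standard non-outliers that the method did not alert on."""
    return true_neg / (true_neg + false_pos)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(5, 0):.1%}")
    print(f"specificity = {specificity(79, 1):.2%}")
```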

Discussion
The automated RA-SPRT method within DELTA performed well when applied to statewide clinical registry data over a number of years, compared with the method used by the state to produce public reports of institutional and physician performance. The method detected each of the true outliers and generated a single false positive, which is a desirable profile for an early detection system used for knowledge discovery and internal quality improvement initiatives. Because the method analyzes the data at frequent intervals, the DELTA system would have reported these findings to the local institutions significantly earlier than they were available publicly.
In general, an automated outcomes surveillance system, whether used for post-marketing surveillance or institutional and physician profiling, should be tuned with appropriate error levels and alerting thresholds to be over-sensitive, in a manner similar to a screening test. It is highly desirable to capture all of the true signals, and tolerance of false positive signals depends on the resources that are available to perform root cause analyses and further data exploration. The RA-SPRT method performed in this manner in this data set using stock values for the type I and type II error rates and a common odds ratio threshold. This is encouraging, but further validation in other clinical domains is required to increase the generalizability of these findings. Selection of a desirable false positive versus false negative threshold depends on the clinical domain's desire to avoid missing a true signal and the cost of performing more in-depth analysis of each detected signal. In addition, processing time was very reasonable within the automated application: a quarter of data could be analyzed in seconds, with the most time-consuming step being logistic regression model generation for each month analyzed.
The RA-SPRT method has a number of strengths, including explicit alerting boundary thresholds and incorporation of repeated testing [12][13][14]. It should be noted that the RA-SPRT method uses a simple hypothesis rather than a composite hypothesis that tests all odds ratios greater than 1.0 for statistical significance. This could potentially result in a statistically significant odds ratio less than the selected threshold (such as 1.1 or 1.2) that is not detected as an outlier by the method. However, in many cases, such an alert may not be clinically significant. In this study, we chose an odds ratio threshold that would be clearly clinically significant.
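The effect of testing a single alternative can be illustrated numerically. The Python sketch below computes the expected per-case change in the cumulative log-likelihood ratio for a hospital whose true odds ratio differs from the alerting odds ratio, assuming the log-likelihood ratio form of the risk-adjusted SPRT and, for simplicity, the same predicted risk for every patient. The 2% risk and the function names are hypothetical values chosen for illustration, not figures from this study.

```python
import math

def expected_increment(p, true_or, alert_or=2.0):
    """Expected per-case change in the cumulative statistic.

    p:        model-predicted risk under H0 (same for every case here)
    true_or:  the hospital's actual odds ratio relative to expectation
    alert_or: the odds ratio the chart is configured to detect
    """
    odds = true_or * p / (1.0 - p)
    q = odds / (1.0 + odds)  # actual probability of death
    return q * math.log(alert_or) - math.log(1.0 - p + alert_or * p)

def break_even_or(p, alert_or=2.0):
    """True odds ratio at which the statistic has no expected drift."""
    q = math.log(1.0 - p + alert_or * p) / math.log(alert_or)
    return (q / (1.0 - q)) / (p / (1.0 - p))

if __name__ == "__main__":
    p = 0.02  # hypothetical predicted risk
    for true_or in (1.0, 1.2, 2.0):
        print(true_or, expected_increment(p, true_or))
    print("break-even OR:", break_even_or(p))
```

Under these assumptions, a hospital with a true odds ratio of 1.2 drifts on average toward the lower limit, and the drift turns upward only once the true odds ratio exceeds roughly 1.44. This is consistent with the point that modest excess risk may be statistically demonstrable elsewhere yet never signal on a chart configured for an odds ratio of 2.0.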
There are a number of limitations to this work. The sensitivity and specificity are optimistic in that implementation of this system in a prospective setting would eliminate the time-consuming manual data adjudication and outcome chart review steps that are done by Mass-DAC, at the expense of the accuracy of the data analyzed. Another limitation in the use of this system is the timing of data submission by the sites. Currently, data are submitted quarterly, rather than monthly (as used in this analysis), by MA cardiac institutions. The RA-SPRT method is intended for sequential case-level analysis, and performing the analysis in larger time segments results in over-correction of the repeated measurement adjustment. Real-time data surveillance would be ideal, but requires a regular and relatively clean data stream submission from the source institution. In order to evaluate this potential and the impact of the lack of data adjudication, we are currently conducting percutaneous coronary angiography prospective surveillance in a subset of institutions in the state with the necessary infrastructure for near real-time submission.

Conclusions
The RA-SPRT method implemented within an automated surveillance system was able to detect institutional outliers in a statewide clinical registry. While either a significant electronic health record infrastructure or a state reporting mechanism is required to realize the full utility of this system for outcome profiling, it could result in significant time savings in providing early warnings to local institutions and physicians.

Authors' contributions
MEM participated in the conception and design of the study, analysis and interpretation of the data, drafting of the manuscript, and critical revision of the manuscript. STN participated in the conception and design of the study, acquisition of the data, analysis and interpretation of the data, and critical revision of the manuscript. TPG, DMD, NLB, and SD participated in the conception and design of the study and critical revision of the manuscript. VDV participated in analysis and interpretation of the data and critical revision of the manuscript. FSR participated in the conception and design of the study, analysis and interpretation of the data, and critical revision of the manuscript. All authors read and approved the final manuscript.