Predicting out of intensive care unit cardiopulmonary arrest or death using electronic medical record data

Background Accurate, timely and automated identification of patients at high risk for severe clinical deterioration using readily available clinical information in the electronic medical record (EMR) could inform health systems to target scarce resources and save lives. Methods We identified 7,466 patients admitted to a large, public, urban academic hospital between May 2009 and March 2010. An automated clinical prediction model for out of intensive care unit (ICU) cardiopulmonary arrest and unexpected death was created in the derivation sample (50% randomly selected from total cohort) using multivariable logistic regression. The automated model was then validated in the remaining 50% from the total cohort (validation sample). The primary outcome was a composite of resuscitation events and death (RED). RED included cardiopulmonary arrest, acute respiratory compromise and unexpected death. Predictors were measured using data from the previous 24 hours. Candidate variables included vital signs, laboratory data, physician orders, medications, floor assignment, and the Modified Early Warning Score (MEWS), among other treatment variables. Results RED rates were 1.2% of patient-days for the total cohort. Fourteen variables were independent predictors of RED and included age, oxygenation, diastolic blood pressure, arterial blood gas and laboratory values, emergent orders, and assignment to a high risk floor. The automated model had excellent discrimination (c-statistic=0.85) and calibration and was more sensitive (51.6% vs. 42.2%) and specific (94.3% vs. 91.3%) than the MEWS alone. The automated model predicted RED 15.9 hours before they occurred and earlier than Rapid Response Team (RRT) activation (5.7 hours prior to an event, p=0.003). Conclusion An automated model harnessing EMR data offers great potential for identifying RED and was superior to both a prior risk model and the human judgment-driven RRT.


Background
Out of intensive care unit (ICU) cardiac arrests and unexpected deaths are common despite evidence that patients often show signs of clinical deterioration hours in advance [1][2][3][4]. This has prompted national organizations to recommend the implementation of rapid response teams (RRTs) as a strategy to prevent hospital deaths [5]. Such recommendations were made despite conflicting evidence regarding the benefits of RRTs [3][6][7][8][9][10]. Some have speculated that the indeterminate benefit of RRTs is due to insufficiently predictive activation criteria and poor response time by clinical staff [11]. Early warning systems have been developed to identify deteriorating patients using readily available clinical information [12]. However, these early warning systems may not be adequate because they 1) require monitoring and activation by often overburdened clinical staff, 2) fail to systematically monitor all patients, and 3) demonstrate only modest accuracy identifying which patients are at risk of out of ICU cardiopulmonary arrest and death. Early warning systems that are timely, accurate, automated, and comprehensive in their surveillance are needed.
The increasing use of electronic medical records (EMR) in health care makes the use of computerized prediction models possible. These models could represent powerful avenues to identify patients at high risk of adverse events [13,14]. Though a few studies have examined the accuracy of clinical automation to identify patients at risk of clinical deterioration, they retain limited utility since they do not fully harness the EMR, produce no actionable alerts, define primary outcomes differently, and do not allow for monitoring patients in real time [15,16].
This study sought to 1) derive and validate an automated prediction model based on near real-time EMR data to identify patients at high risk of out of ICU resuscitation events and death (RED), 2) compare the test operating characteristics of the new automated model to the previously published Modified Early Warning Score (MEWS) [12] and the human judgment-activated institutional RRT, and 3) determine if the automated model detected RED events sooner than the human judgment-activated RRT.

Setting and patient population
The automated prediction model was constructed using data from adult patients admitted to Parkland Hospital, a large urban academic hospital in Dallas, TX, between May 18, 2009 and March 31, 2010. Patients were included in the study if they were admitted to the internal medicine ward from either the emergency department (ED) or outpatient clinics. Additionally, patients were included if they were admitted to the ICU from the ED. Patients were excluded if they were directly admitted to the surgical floor or obstetrics or had a do not resuscitate (DNR) order at admission. However, any hospital patient-days prior to a patient consenting to a DNR order were included. To determine if early collection of data was predictive of events, all variables included in the automated model were obtained from the previous calendar day, defined as the time period between 12:00 AM and 11:59 PM. Therefore, events that occurred on the first day of each hospitalization were excluded. We also excluded any data within one hour of an event to make sure the model did not include factors that were early signs of resuscitation care. Patient-days that occurred after an event were excluded. The research protocol was approved by The University of Texas Southwestern Institutional Review Board (IRB), which concluded that the research presented no more than minimal risk of harm to subjects. Therefore, the IRB waived the need for informed consent.
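The patient-day rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's extraction code: the function name `eligible_outcome_days` and its arguments are hypothetical, and the one-hour pre-event exclusion is not modeled here.

```python
from datetime import date, datetime, timedelta


def eligible_outcome_days(admit_day, last_day, event_time=None):
    """Pair each eligible outcome day with the calendar day supplying its predictors.

    admit_day, last_day: datetime.date values for one hospitalization (hypothetical inputs)
    event_time: datetime.datetime of the first RED event, or None if no event occurred
    """
    # Patient-days after an event are excluded, so stop at the event day.
    end_day = last_day if event_time is None else min(last_day, event_time.date())
    pairs = []
    # Start on hospital day 2: day 1 has no previous calendar day of data,
    # so events on the first day are excluded.
    outcome_day = admit_day + timedelta(days=1)
    while outcome_day <= end_day:
        predictor_day = outcome_day - timedelta(days=1)
        pairs.append((outcome_day, predictor_day))
        outcome_day += timedelta(days=1)
    return pairs


# Hypothetical stay: admitted June 1, event on June 3 at 14:00.
stay = eligible_outcome_days(date(2009, 6, 1), date(2009, 6, 5),
                             event_time=datetime(2009, 6, 3, 14, 0))
for outcome_day, predictor_day in stay:
    print(outcome_day, "uses predictors from", predictor_day)
```

In this sketch a stay with an event on the admission day contributes no patient-days, which mirrors the exclusion of first-day events.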

Outcome variables
The primary outcome variable was defined as resuscitation events or death (RED). Resuscitation events were defined as out of ICU hospital codes and unplanned transfers to the ICU. Hospital codes included cardiopulmonary arrests (CPA) and acute respiratory compromise (ARC) events, regardless of location, except those that occurred in the ICU during ICU stays >24 hours. CPA was defined as an event in which chest compressions and/or defibrillation are delivered, and an ARC event was defined as an event requiring emergency assisted ventilation [17]. These events were identified electronically through the hospital's internal registry, which is structured on the American Heart Association's Get With The Guidelines-Resuscitation national registry, formerly known as the National Registry of Cardiopulmonary Resuscitation [17]. This registry collects data on in-hospital resuscitation events from hospitals across the United States to provide feedback on an institution's resuscitation practices and patient outcomes. Unplanned ICU transfers included any transfers from the internal medicine ward or ED to a medical or cardiac ICU requiring an ICU length of stay >24 hours. We used unplanned ICU transfer in the definition of a RED event because these patients were in critical condition and would have a high likelihood of CPA or death had the transfer not occurred. There are no elective admissions to the ICU at this institution. Unexpected death was defined as: 1) an in-hospital death that occurred on the medical ward; or 2) death that occurred in patients transferred to a medical or cardiac ICU team with an ICU length of stay <24 hours. Patient death and transfers to the medical or cardiac ICU were identified electronically in the hospital's EMR. The date and time of bedside RRT activation were extracted from the hospital's systematic log of all RRT calls. Data used to predict the primary outcome were extracted from the previous calendar day.

Predictor variables
We developed a conceptual model of RED events based on a comprehensive review of the literature and expert clinical opinion. Candidate predictor variables for the automated model were those extractable from the hospital EMR (EPIC Systems Corporation, Verona, WI). Data from the previous 24-hour calendar day were used to determine the daily risk. Potential predictor variables included the most abnormal laboratory value or vital sign in the 24-hour period between 12:00 AM and 11:59 PM on each hospital day. We also examined other possible indicators of impending RED events such as STAT physician orders and medications. Medications of interest were those thought to increase risk of serious adverse events according to the Institute for Safe Medication Practices (ISMP). The MEWS is a previously published risk score based on the number and degree of vital sign and level of consciousness (LOC) abnormalities (Additional file 1: Appendix A) [12]. We determined LOC using a text-processing algorithm to read the free text in nursing notes. Finally, we postulated that patients who were more ill or unstable in subtle, hard-to-measure ways could be preferentially admitted to certain non-ICU medical floors, so we classified medicine wards accounting for the top 15% of RED as "high risk floors."
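One way to picture the "most abnormal value per calendar day" step is shown below. This is a sketch under assumptions: the text does not specify how "most abnormal" was operationalized, so the rule used here (the reading farthest outside a reference range) and the 60-100 heart rate range are illustrative choices, and `daily_most_abnormal` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def daily_most_abnormal(readings, normal_low, normal_high):
    """Return {calendar_day: reading farthest outside [normal_low, normal_high]}.

    readings: iterable of (datetime, numeric value) pairs for one patient.
    """
    def deviation(value):
        # Distance outside the reference range; 0 if the value is within range.
        if value < normal_low:
            return normal_low - value
        if value > normal_high:
            return value - normal_high
        return 0.0

    by_day = defaultdict(list)
    for timestamp, value in readings:
        # Grouping by date() gives the 12:00 AM to 11:59 PM window.
        by_day[timestamp.date()].append(value)
    return {day: max(values, key=deviation) for day, values in by_day.items()}


# Hypothetical heart rate readings over two hospital days.
heart_rates = [
    (datetime(2009, 6, 1, 0, 0), 72),
    (datetime(2009, 6, 1, 13, 30), 135),
    (datetime(2009, 6, 1, 23, 59), 118),
    (datetime(2009, 6, 2, 8, 15), 55),
    (datetime(2009, 6, 2, 20, 0), 104),
]
worst = daily_most_abnormal(heart_rates, normal_low=60, normal_high=100)
for day, value in sorted(worst.items()):
    print(day, value)
```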

Derivation and validation of the automated prediction model
The automated model was constructed in stages. First, the total cohort was randomly split into derivation (50%) and validation (50%) subsamples. We constructed the final model using the derivation cohort. Second, recursive partitioning was used to identify significant cutpoints in continuous candidate variables that were associated with an increased rate of RED events. Third, candidate predictors of RED events were identified using univariate logistic regression. Continuous variables were examined for nonlinear effects by testing the contributions of spline functions and variable transformations. Fourth, candidate variables significant at p ≤ 0.20 were entered into a multivariable logistic regression model. Final model variables were selected on the basis of conceptual and statistical significance (p ≤ 0.05). The unit of analysis was the patient-day.
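For readers less familiar with logistic regression, the sketch below shows how a fitted model of this kind turns one patient-day's predictors into a predicted probability. The coefficient values, the intercept, and the variable names (`high_risk_floor`, `stat_order`) are placeholders invented for illustration; they are not the study's fitted estimates.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta_k * x_k))))."""
    linear = intercept + sum(beta * features.get(name, 0.0)
                             for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder coefficients, invented for illustration only (not from the study).
example_coefficients = {"high_risk_floor": 1.2, "stat_order": 0.8}

# One hypothetical patient-day, with predictors taken from the previous calendar day.
patient_day = {"high_risk_floor": 1, "stat_order": 0}
risk = predicted_probability(-4.6, example_coefficients, patient_day)
print(round(risk, 4))
```

Each binary predictor shifts the log-odds by its coefficient, so a patient-day with more risk indicators present receives a higher predicted probability.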
The model based on the derivation dataset was validated by assessing its performance in the validation sample. Model discrimination was assessed with the c-statistic and calibration with the Hosmer-Lemeshow goodness-of-fit test [18]. Using cut-points determined by the derivation subsample, five risk categories were created based on quintiles of predicted risk and graphically assessed in the validation sample. To account for within-patient correlation, we used robust variance-covariance matrix estimators for computing standard errors for model coefficients.
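The two validation ideas above, discrimination and risk stratification, can be illustrated with a small sketch. It assumes binary outcomes coded 0/1 and uses made-up data and cut-points; the function names are hypothetical, and the Hosmer-Lemeshow test and robust standard errors are not reproduced here.

```python
from bisect import bisect_right


def c_statistic(outcomes, probabilities):
    """Share of (event, non-event) pairs in which the event has the higher predicted risk.

    Ties count as half. This equals the area under the ROC curve.
    """
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_rate_by_category(outcomes, probabilities, cutpoints):
    """Observed event rate in each risk category defined by sorted cut-points."""
    counts = [0] * (len(cutpoints) + 1)
    events = [0] * (len(cutpoints) + 1)
    for y, p in zip(outcomes, probabilities):
        category = bisect_right(cutpoints, p)
        counts[category] += 1
        events[category] += y
    return [e / n if n else None for e, n in zip(events, counts)]


# Made-up validation data: four patient-days, two with events.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(y, p))
print(observed_rate_by_category(y, p, cutpoints=[0.3, 0.5]))
```

In practice the cut-points would be fixed from the derivation subsample and then applied unchanged to the validation subsample, as the text describes.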
Prior to model development, we assessed all variables for missing values. For categorical and continuous variables with less than 2% missing data, a missing category was created, and the event rate was compared with and pooled into the most appropriate reference group. For categorical and continuous variables that had greater than 2% missing data and were not measured from one day to the next, a "never measured" category was created and risk was compared to the other categories or cut-points and pooled into the appropriate reference group. Documentation by exception is a common approach in the predictive model literature [13,14,19,20].
We determined relative contribution of each predictor to RED events by examining the marginal increase in the model chi-square accounted for by each predictor as it was added and removed from the final automated model [21,22].

Comparing performance of the automated model to the MEWS
Patients were classified to be at risk of RED events at a probability threshold of 4% as determined by the automated model. Since the baseline risk for RED events was assumed to be 1%, we considered a four times greater than average risk an important threshold for concern. Variables used to calculate the MEWS were obtained in the previous calendar day between 12:00 AM and 11:59 PM. If a patient experienced a RED event, data from the previous calendar day and those up to one hour prior to the event were used to calculate the MEWS. A MEWS of ≥5 was considered the critical threshold based on the literature [12]. Sensitivity, specificity, positive predictive value, and negative predictive value were determined for both the automated model and the MEWS. The test operating characteristics of the automated model and the MEWS were compared using the c-statistic. Confidence intervals were constructed for the c-statistics at the 95% level [18].
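The test operating characteristics named above follow directly from a two-by-two table of flagged versus actual events. The sketch below assumes that a patient-day is flagged when its predicted probability is at or above the 4% threshold; whether the boundary itself counts as flagged is an assumption, and the function name and data are hypothetical.

```python
def operating_characteristics(outcomes, probabilities, threshold=0.04):
    """Sensitivity, specificity, PPV and NPV at a given probability threshold."""
    tp = fp = tn = fn = 0
    for outcome, probability in zip(outcomes, probabilities):
        flagged = probability >= threshold  # assumption: boundary counts as flagged
        if flagged and outcome == 1:
            tp += 1
        elif flagged and outcome == 0:
            fp += 1
        elif not flagged and outcome == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when a row or column of the table is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up patient-days: two with events, three without.
result = operating_characteristics([1, 1, 0, 0, 0], [0.10, 0.02, 0.05, 0.01, 0.03])
print(result)
```

The same function could be applied to a score such as the MEWS by passing the score values and its critical threshold in place of the probabilities.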

Comparing performance of the automated model to the institutional RRT
The institutional RRT is deployed when one or more of the following is present in a patient: 1) heart rate <40 or >130 beats/min, 2) systolic blood pressure <90 mmHg, 3) respiratory rate <8 or >30 breaths/min, 4) partial pressure of oxygen <88% on room air, 5) oxygen requirement >50%, and 6) acute change in mental status. We calculated the sensitivity, specificity, positive predictive value and negative predictive value, along with 95% confidence intervals, for both the automated model and the institutional RRT. Moreover, we evaluated a subgroup of patients who experienced an event, for whom the institutional RRT was activated, and who had reached the 4% predicted-probability threshold for a RED event in the automated model (model activation). In this subgroup, we aimed to determine the difference in time between model activation and RRT deployment. We also evaluated the time difference between the automated trigger of a RED event (patient's predicted probability of a RED event exceeds 4%) and RRT deployment, regardless of an event. Our hypothesis was that the automated model would detect a patient who had a RED event well in advance of the institutional RRT. We compared this time difference using a paired Student's t-test. Analyses were conducted using STATA statistical software (version 10.0; STATA Corp, College Station, TX) and RTREE [23].
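The activation criteria and the paired comparison above can be sketched together. This is an illustration under assumptions: the function and parameter names are hypothetical, the oxygen thresholds are applied as percentages exactly as listed in the text, and the lead times are made-up numbers rather than study data.

```python
import math
from statistics import mean, stdev


def rrt_criteria_met(heart_rate, systolic_bp, resp_rate, oxygen_room_air_pct,
                     oxygen_requirement_pct, acute_mental_status_change):
    """Return the names of the institutional RRT criteria triggered by one assessment."""
    checks = [
        ("heart rate", heart_rate < 40 or heart_rate > 130),
        ("systolic blood pressure", systolic_bp < 90),
        ("respiratory rate", resp_rate < 8 or resp_rate > 30),
        ("oxygen on room air", oxygen_room_air_pct < 88),
        ("oxygen requirement", oxygen_requirement_pct > 50),
        ("mental status", bool(acute_mental_status_change)),
    ]
    return [name for name, triggered in checks if triggered]


def paired_t_statistic(first, second):
    """Paired Student's t statistic and degrees of freedom for first minus second."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# One hypothetical bedside assessment: tachycardia and hypotension.
triggered = rrt_criteria_met(135, 85, 22, 95, 30, False)
print(triggered)

# Made-up lead times (hours before the event) for the same four patients.
model_lead = [20.0, 12.0, 16.0, 10.0]
rrt_lead = [10.0, 8.0, 4.0, 6.0]
t_value, degrees_of_freedom = paired_t_statistic(model_lead, rrt_lead)
print(round(t_value, 3), degrees_of_freedom)
```

The pairing matters because each patient contributes both a model activation time and an RRT deployment time, so the test is run on within-patient differences rather than on two independent groups.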

Patient characteristics
A total of 7,466 hospitalized patients accounted for 46,974 patient-days. The derivation and validation cohorts were evenly matched across demographic, clinical, provider orders, administered medications and summary variables (Table 1). Mean age was 51.2 in the derivation cohort and 51.4 in the validation cohort, and 56.1% and 54% were male, respectively.

Performance of the automated model
The final automated model had good discrimination in both the derivation and validation datasets, with c-statistics of 0.87 (95% CI 0.85-0.89) and 0.85 (95% CI 0.82-0.87), respectively, and was well-calibrated (Hosmer-Lemeshow test p=0.12). It also stratified patients across a wide spectrum of risk from 0.14% in the lowest quintile to 4.3% in the highest one (Figure 1). The principal influencing variables in the automated model as assessed by the uniquely attributable chi-square were high risk floor assignment (37.9%) followed by the MEWS (25.5%), demographics, laboratory and vital signs (18.2%), and physician orders (18.4%).
A total of 17 patients were flagged as at risk of RED events by the automated model, had the institutional RRT deployed, and experienced a RED event. In these patients, the automated model predicted an event 15.9 (±7.7) hours before the actual event occurred, compared to the RRT, which was called a mean of 8.4 (±8.5) hours prior to the actual event (p=0.003). Overall, the automated model also determined a patient to be at risk 5.7 hours (95% CI 3.1-8.3) earlier than the RRT was called for all types of RED events.

Discussion
We developed and validated a novel, automated model using the EMR for predicting RED events in patients admitted to the hospital. From a statistical perspective, the automated model had excellent discrimination, was well-calibrated, and had outstanding specificity (94.3%) and good sensitivity (51.6%). The automated model also had better discrimination, sensitivity and specificity than the previously published MEWS. From a practical standpoint, the model identified patients destined to have a RED event on average 16 hours (or more than one nursing shift) before they actually experienced a major clinical event. Further, the automated model was able to accurately predict RED events using information obtained from the previous 24 hours. Together with its ability to screen all patients systematically and automatically, low false positive rate, and advance notice, the automated model appears to provide both accurate and actionable intelligence.
Since the growing standard of care is to use RRTs to meet this goal, we were particularly interested in the more practical comparison of the new model to the human or manually activated RRT approach used in our hospital. Overall, the automated model had twice the sensitivity of the RRT (51.6% v. 25.8%), demonstrating that computerized surveillance is likely to identify more patients at risk for major adverse events compared to providers' clinical judgment. The automated model achieved this much higher sensitivity with only a small trade-off in specificity (94.3% v. 98.8%). Perhaps of greatest importance from a patient safety viewpoint, the automated model flagged patients 5.7 hours sooner than the RRT. Accurately identifying patients earlier in the course of physiological deterioration should be expected to yield greater opportunity for rescue.
The superior performance of the new model likely came from the richer source of information available in the EMR, which is unavailable to simpler vital sign based models. In addition, monitoring physician orders for ECG, ABG or other STAT orders appears to be an important predictive measure, perhaps reflecting a physician's escalating concern about a patient's stability. Novel variables, such as high risk floor assignment, may be a proxy for nurse staffing ratios, physician team composition, or other unknown system or process-related factors that are associated with increased acuity or risk. We were somewhat surprised that none of the medication variables were included in the final model, despite looking at many candidate predictors. This result may be due to the administration of antidote medicines that occurs late in the process of clinical deterioration. The risk of causing RED events due to use of high risk medicines may be mediated through their effect on vital sign and laboratory abnormalities and partly depend on a patient's underlying hepatic and renal physiological reserve. There is a need to explore more complex drug interactions and their association with adverse events. The 1.3% prevalence in this study is similar to that seen in other studies [3,6]. The performance of the MEWS in this study was also consistent with prior reports (c-statistic=0.75), confirming its moderate predictive capabilities [12,15]. Our institution had an RRT call rate similar to those observed elsewhere [24].
Several limitations are worth noting. First, we used retrospective data from a single urban health system to derive and validate our model. While the rate of RED events and RRT calls in this sample is similar to other studies, the generalizability of this model to other patient populations and health systems is unknown and merits further investigation [3]. Second, the derivation and validation of the novel model was done retrospectively, so the next step would be prospective validation, ideally in more than one setting. Third, and even more importantly, the ultimate value of the automated model will depend on whether it can realistically be used in real time and whether flagging patients at high risk will change clinical management, improve patient outcomes and/or reduce human surveillance burden. While we hypothesize that earlier warning and proper identification of patients at risk will decrease RED events, this has yet to be shown. Fourth, although the automated model achieves a c-statistic of 0.85, there is a moderate false positive rate. However, given the severity of RED events, we accept the false positive rate in exchange for greater model sensitivity. More work is necessary to prevent the activation of overburdened clinical staff by false alerts. Fifth, there may be some difficulty generalizing "high risk floors", although institutions can determine the rate of RED for each floor and establish which areas comprise the top 15% of events. Finally, our model uses data derived from a comprehensive EMR, so it may only be useful in such settings. However, the deployment of integrated EMRs in hospitals has been accelerating greatly due to recent federal investments in health information technology and is expected to continue over the next 5 to 10 years [25][26][27]. While our model has robust predictive capabilities, we believe employing additional technologies such as natural language processing may further improve prediction. Another area of promise involves more sophisticated adverse drug event detection software to further classify risk and improve prediction of poor hospital outcomes.

Conclusion
One in 100 hospitalized medical patients experienced RED events, among the most serious of all adverse patient safety outcomes. The novel, EMR-based model we developed was better at predicting these serious adverse events than prior risk models and the human judgment-based RRT approach. While formal prospective implementation and evaluation of such a computerized RED event risk detection strategy is needed in the form of a controlled trial, this automated prediction model could be a powerful tool in the effort to reduce out of ICU CPA, unplanned transfers to the ICU, and death. Models such as ours may foreshadow higher level "meaningful use" of EMRs to improve inpatient outcomes.