A GP model that used patient data from the first 4 hours after ICU admission, stored in a PDMS, was able to make accurate predictions of the probability of ICU discharge on the day after surgery, and was able to predict the day of discharge. The GP models were constructed in a development cohort, and tested in a previously unseen validation cohort. The GP model showed significantly better discrimination than EuroSCORE and the ICU nurses, and was at least as discriminative as the ICU physicians. The GP models were the best calibrated models, whereas the predictions of EuroSCORE, the ICU nurses and the ICU physicians showed overfitting.
In this era of computerized medical files, a large amount of patient information in the ICU has become available in an accessible format. Until now, only a few applications have fully exploited the data-rich environment of intensive care, with its several information sources. Analyzing such a large quantity of information, and using it for research purposes, remains a major challenge. Since ICU clinicians use the data for vital decision making every day, it can be assumed that PDMS databases are important sources of information on the condition of ICU patients. A clinical PDMS database contains validated laboratory data, clinical observations that are observer dependent, and both validated and unvalidated raw monitoring data, with artifacts and missing values. Several methods exist for imputing missing data; each of them has specific drawbacks. Replacing missing numerical data by the population mean for that value, as was done in the present study, might lead to regression towards the mean. Nevertheless, when a different imputation method was used, such as replacing missing values by values corresponding to a normal healthy condition, the results did not change significantly (data not shown). For categorical data, the value that corresponds to a normal healthy condition was used to replace a missing value, which might have introduced some additional bias. Time series analysis in the present study was done after applying several low-pass filters to the signal, which removed all high-frequency components. We did this under the assumption that the trend of a time series is more predictive of outcome than high-frequency variability, very similar to the way doctors look at continuous parameters.
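The two preprocessing steps described above, mean imputation and low-pass filtering, can be sketched as follows. This is a minimal illustration and not the study's implementation: the function names and example values are hypothetical, and the centered moving average is an assumed stand-in for the low-pass filters, which the text does not specify.

```python
from statistics import mean


def impute_population_mean(values):
    """Replace missing entries (None) with the mean of the observed entries.

    `values` holds one numerical variable across patients.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def moving_average(signal, window=3):
    """Centered moving average: a simple low-pass filter (an assumption here).

    Near the edges the window shrinks so that the output keeps the
    same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical lactate values across four patients, one of them missing.
lactate_by_patient = [1.2, None, 2.0, 0.8]
lactate_filled = impute_population_mean(lactate_by_patient)

# Hypothetical heart rate samples of one patient, with a short spike.
heart_rate = [80, 82, 84, 120, 82, 80, 78]
heart_rate_trend = moving_average(heart_rate, window=3)
```

The imputed entry equals the mean of the observed values, which is what pulls imputed patients towards the population average. The filter flattens the isolated spike, so that the trend rather than the short-term variability is passed on to the model.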
Although the PDMS database was not set up to perform predictions, our results show that a GP model derived from these data was able to predict ICU discharge. At the population level, calibration and accuracy of the second day discharge predictions were good, with aROC well above and Brier score well below the predefined thresholds. At the individual level, the GP models proved to be the only well calibrated models. This is extremely important in clinical practice, when using the models for patient counseling. The exact day of discharge could be predicted in 40% of the patients, and the GP model showed the lowest RMSRE. Figure 3 shows that the GP model tended to overestimate the number of patients discharged on the day after surgery, and to underestimate the LOS in the longer staying patients. The relatively low number of longer staying patients in the development cohort, on which the GP models were trained, explains the higher uncertainty when predicting discharge in these patients.
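For readers less familiar with the two metrics used in the classification task, the sketch below computes a Brier score and an aROC on a handful of made-up predictions. The data and function names are hypothetical, and the study's predefined thresholds are not reproduced here.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1).

    Lower is better; 0 corresponds to perfect predictions.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def area_under_roc(probs, outcomes):
    """aROC as the probability that a randomly chosen positive case is
    ranked above a randomly chosen negative case (ties count as one half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of discharge on the day after surgery,
# and the observed outcome (1 = discharged, 0 = not discharged).
predicted = [0.9, 0.4, 0.3, 0.6, 0.5]
observed = [1, 1, 0, 1, 0]

brier = brier_score(predicted, observed)
aroc = area_under_roc(predicted, observed)
```

The aROC measures discrimination only, that is, whether discharged patients receive higher predicted probabilities than patients who stay. The Brier score also penalizes probabilities that are far from the observed outcome, which is why it is sensitive to poor calibration.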
This is the first study describing the use of GP for the prediction of length of stay in the field of medicine and intensive care. Currently, outcome prediction in the cardiac ICU is based on scoring systems that were developed and validated in large multicentre patient databases, such as EuroSCORE for cardiac surgery patients, or APACHE for ICU patients. These risk scores have been designed as benchmarking tools to compare different patient groups rather than for individual outcome prediction. APACHE IV uses data from the first 24 hours in the ICU. In cardiovascular surgery patients, there was no significant difference between mean predicted and observed ICU LOS, but neither aROC nor calibration was reported in this subgroup. When using the additive EuroSCORE for the prediction of prolonged LOS, aROCs were 0.76, 0.72 and 0.67 when predicting a LOS of > 7 days, > 5 days and > 2 days, respectively. In the present study, the locally developed GP model outperformed the European gold standard risk stratification model, the EuroSCORE. In the classification task, this was demonstrated by the Brier score, which was unacceptably high in the case of EuroSCORE. In the regression task, EuroSCORE predicted too long a median LOS. It is well known that the EuroSCORE tends to overestimate the operative risk, especially in high risk patients. EuroSCORE was developed more than two decades ago, and in the case of mortality prediction, several authors have already proposed a recalibration [33, 34]. Because the GP model was built on the local ICU database, it lacks the generalizability of the severity of illness scoring systems. Nevertheless, any computerized ICU could use the same methodology to build customized predictive models. Furthermore, such models can be systematically recalibrated over time, by relearning the models on an updated development cohort with more recent patients.
The GP models predicted better than the ICU nurses in both the classification and the regression task. Although there was a trend towards better performance than the ICU physicians, this was not statistically significant. When comparing the GP model with ICU nurses and physicians, one should realize that the clinicians had a few major advantages over the computer model. First, we did not obtain clinician predictions within 6 hours in all 499 validation patients. This way, we lost statistical power. On the other hand, not being able to predict in time might be regarded as an extra evaluation criterion, as opposed to the GP model, which was able to generate a prediction in every patient. Second, the predictions by physicians and nurses might have been biased in the sense that they could have postponed their predictions in the more difficult to predict patients, whereas the GP models always delivered a prediction within the allotted time, regardless of the uncertainty. Third, nurses and ICU physicians had an advantage of up to 2 hours of data over the GP model. In practice, it turned out to be impossible to respond immediately to the first pop-up window 4 hours after admission. Fourth, nurses and physicians will always have more information at their disposal than is present in the computerized chart of the patient.
A locally derived risk prediction model will not replace or compete with the validated scoring systems with regard to generalizability and benchmarking. Nevertheless, these locally developed models, based on the analysis of the computerized patient chart with machine learning techniques, have the potential to support the clinician in the care of critically ill patients. First, they could be the basis of an early warning monitor, which alerts the clinician when a patient deviates from the expected path with regard to outcome. Second, in the same sense, they could help the clinician, when counseling a patient or the patient's relatives, to offer a realistic estimate of the expected clinical course of that particular patient. Third, the ICU LOS of a patient following cardiac surgery is, besides its medical and clinical relevance, also important at the management level. In many hospitals, ICU bed capacity is the bottleneck when planning cardiac surgery. In order to make optimal use of the available capacity, and to avoid an excessive number of patients dying while awaiting cardiac surgery, a locally derived predictive model could be the basis of an ICU capacity planner. GP can take full advantage of the different information sources available in the ICU, both static and dynamic, including therapy and the response to therapy. Although the discriminative power of the GP models is not significantly higher than that of the physicians, they can be of added value because they will deliver their predictions in a more reliable and consistent way (not postponing the predictions in the most difficult cases, as the physicians probably did in the present study). In theory, each ICU could build its own predictive models based on its own patient database, because this methodology is able to take into account the specific local situation, and can be adapted and recalibrated over time.
The results from this study should first be confirmed in other centers, and preferably in databases from multiple centers.