A comparative analysis of multi-level computer-assisted decision making systems for traumatic injuries

Ji, Soo-Yeon; Smith, Rebecca; Huynh, Toan; Najarian, Kayvan

doi:10.1186/1472-6947-9-2

Research article
Open access
Published: 14 January 2009

BMC Medical Informatics and Decision Making volume 9, Article number: 2 (2009)

Abstract

Background

This paper focuses on the creation of a predictive computer-assisted decision making system for traumatic injury using machine learning algorithms. Trauma experts must make several difficult decisions based on a large number of patient attributes, usually in a short period of time. The aim is to compare the existing machine learning methods available for medical informatics, and develop reliable, rule-based computer-assisted decision-making systems that provide recommendations for the course of treatment for new patients, based on previously seen cases in trauma databases. Datasets of traumatic brain injury (TBI) patients are used to train and test the decision making algorithm. The work is also applicable to patients with traumatic pelvic injuries.

Methods

Decision-making rules are created by processing patterns discovered in the datasets, using machine learning techniques. More specifically, CART and C4.5 are used, as they provide grammatical expressions of knowledge extracted by applying logical operations to the available features. The resulting rule sets are tested against other machine learning methods, including AdaBoost and SVM. The rule creation algorithm is applied to multiple datasets, both with and without prior filtering to discover significant variables. This filtering is performed via logistic regression prior to the rule discovery process.

Results

For survival prediction using all variables, CART outperformed the other machine learning methods. When using only significant variables, neural networks performed best. A reliable rule-base was generated using combined C4.5/CART. The average predictive rule performance was 82% when using all variables, and approximately 84% when using significant variables only. The average performance of the combined C4.5 and CART system using significant variables was 89.7% in predicting the exact outcome (home or rehabilitation), and 93.1% in predicting the ICU length of stay for airlifted TBI patients.

Conclusion

This study creates an efficient computer-aided rule-based system that can be employed in decision making in TBI cases. The rule-bases apply methods that combine CART and C4.5 with logistic regression to improve rule performance and quality. For final outcome prediction for TBI cases, the resulting rule-bases outperform systems that utilize all available variables.

Background

According to a 2001 National Vital Statistics Report [1], nearly 115,200 deaths occur each year due to traumatic injury, and many patients who survive suffer life-long disabilities. Among all causes of death and permanent disability, traumatic brain injury (TBI) is the most prevalent. Of the 29,000 children who are hospitalized each year with TBI, a significant percentage will suffer from neurological impairment [2]. It has also been reported that traumatic brain injury is the most expensive affliction in the United States, with an estimated cost of $224 billion [3].

Computer-aided systems can significantly improve trauma decision making and resource allocation. Since trauma injuries have specific causes, all with established methods of treatment, fatal complications and long-term disabilities can be reduced by making less subjective and more accurate decisions in trauma units [4]. In addition, it has been suggested that an inclusive trauma system with an emphasis on computer-aided resource utilization and decision making may significantly reduce the cost of trauma care [1].

Since the treatment of traumatic brain injuries is extremely time-sensitive, optimal and prompt decisions during the course of treatment can increase the likelihood of patient survival [5, 6]. The predicted length of stay in the ICU is also believed to be an important factor when deciding on the patient transport method (i.e. ambulance or helicopter), as more critical patients are expected to spend more time in the ICU and stand to benefit the most from helicopter transport. Studies have emphasized the critical impact of helicopter transport on trauma mortality rates, since the speed of ambulance transport is limited by road and weather conditions, and may also be constrained by traffic congestion. However, it is difficult to compare ground and helicopter transportation and the corresponding care provided to the patients [7]. Cunningham [8] attempts a comparison based on the outcome of the treatment given to trauma patients; according to that study, patients in critical condition are more likely to survive if transported via helicopter. However, the high cost of helicopter transport remains a major problem [9, 10]. In a recent study, Gearhart evaluated the cost-effectiveness of helicopter transport for trauma patients and suggested that, on average, the helicopter transport cost is about $2,214 per patient and $15,883 for each additional survivor [11]. Overall, the cost is almost $61,000 per surviving trauma patient. Eckstein [12] states that 33% of patients who are transported by helicopter are discharged home from the emergency department rather than being sent to the ICU. This indicates that a significant number of trauma patients transported by helicopter actually have relatively minor injuries, and it emphasizes the necessity of a comprehensive transport policy based on patient condition and predicted outcome.

Several computer-assisted systems already exist for decision-making in trauma medicine. The majority of these systems [13, 14] are designed to perform a statistical survey of similar cases in trauma databases, based only on patient demographics. As such, they may not be sufficiently accurate and/or specific for practical implementation. Other medical decision making systems employ the predictive capabilities of artificial neural networks [15–17]; however, due to the 'black box' nature of these systems, the reasoning behind the predictions and recommended decisions is obscured. Currently, none of these existing systems are in widespread use in trauma centers. There are three main reasons: the use of non-transparent methods, such as neural networks; the lack of a comprehensive database integrating all relevant available patient information for specific prediction processes; and poor performance due to the exclusion of relevant attributes and the inclusion of those irrelevant to the current task, resulting in rules that are too complicated to be clinically meaningful.

Several machine learning algorithms are commonly applied to medical applications. These include support vector machines (SVM), and decision tree algorithms such as Classification and Regression Trees (CART) and C4.5. Boosting is also employed for improving classification accuracy. However, despite the relatively successful performance of these algorithms in medical applications, they have limited success in separating and identifying important variables in applications where there are a large number of available attributes. This suggests that combining machine learning with a method to identify the most uncorrelated set of attributes can increase our understanding of the patterns in medical data and thus create more reliable rules. The biomedical informatics literature reinforces the benefits of this approach. Andrews et al. [18] use decision tree (DT) and logistic regression (LR) methods to identify the commonalities and differences in medical database variables. Kuhnert [19] emphasizes that non-parametric methods, such as CART and multivariate adaptive regression splines, can provide more informative models. Signorini et al. [20] design a simple model containing variables such as age and GCS, but the small number of attributes may limit the reliability of the generated rules. Guo [21] finds that CART is more effective when combined with the logistic model, and Hasford [22] compares CART and logistic regression, and finds that CART is more successful in outcome prediction than logistic regression alone.

Therefore, a possible approach to create accurate and reliable rules for decision making is to combine machine learning and statistical techniques [23, 24]. This paper analyzes the performance of several combinations of machine learning algorithms and logistic regression, specifically in the extraction of significant variables and the generation of reliable predictions. Though a transparent rule-based system is preferable, other methods (such as neural networks) are also tested in the interest of comparison. A computational model is developed to predict final outcome (home or rehab and alive or dead) and ICU length of stay. In addition, we identify the factors and attributes that most affect decision making in the treatment of traumatic injury.

Our hypotheses are as follows:

1. We hypothesize that a rule-based system, attractive to physicians as the reasoning behind the rules is transparent and easy to understand, can be as accurate as "black-box" methods such as neural networks and SVM.

2. We hypothesize that when trained correctly, a computer-aided decision making system can provide clinically useful rules with a high degree of accuracy.

3. Studies mentioned earlier have examined which variables are most significant in the recommendation/prediction making process. We hypothesize that airway status, age, and pre-existing conditions such as myocardial infarction and coagulopathy are significant variables.

Methods

Rules are created by processing patterns discovered in the traumatic brain injury (TBI) datasets. More specifically, they are generated by analyzing the logical and grammatical relationships among the input features and the resulting outcomes. Rules are formally defined as grammatical expressions of knowledge extracted using specific logical operations on the available features [6].
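
As an illustration of what such a grammatical rule can look like in practice, the following minimal sketch represents a rule as a conjunction of feature comparisons leading to an outcome. The class names (`Condition`, `Rule`), the feature names (`age`, `gcs`), the thresholds, and the first-match evaluation order are all hypothetical choices made for this example; they are not rules or settings reported in this study.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple, Union

Value = Union[float, str]

# Comparison operators allowed inside a rule condition.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    """One logical test on a single feature, e.g. age > 60."""
    feature: str
    op: str
    threshold: Value

    def holds(self, patient: Dict[str, Value]) -> bool:
        return OPS[self.op](patient[self.feature], self.threshold)


@dataclass(frozen=True)
class Rule:
    """IF all conditions hold THEN predict the outcome."""
    conditions: Tuple[Condition, ...]
    outcome: str

    def matches(self, patient: Dict[str, Value]) -> bool:
        return all(c.holds(patient) for c in self.conditions)

    def __str__(self) -> str:
        clauses = " AND ".join(
            f"{c.feature} {c.op} {c.threshold}" for c in self.conditions
        )
        return f"IF {clauses} THEN {self.outcome}"


def predict(rules: Sequence[Rule], patient: Dict[str, Value],
            default: str = "unknown") -> str:
    """Return the outcome of the first matching rule (an assumed policy)."""
    for rule in rules:
        if rule.matches(patient):
            return rule.outcome
    return default


# Hypothetical rule base; thresholds are illustrative only.
rules = [
    Rule((Condition("age", ">", 60), Condition("gcs", "<=", 8)), "rehab"),
    Rule((Condition("gcs", ">", 8),), "home"),
]

print(rules[0])
print(predict(rules, {"age": 72, "gcs": 6}))
```

Because each rule prints as a readable IF/THEN statement, a clinician can inspect exactly which feature tests led to a recommendation, which is the transparency property the rule-based approach is meant to provide.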

CART and C4.5 are among the most popular algorithms for creating reliable rules, but they are limited in their ability to identify the most significant variables. We therefore perform statistical analysis using logistic regression, which is typically effective in discovering statistically significant regression coefficients [24]. Although stepwise regression is designed to find significant variables, it may not perform well with CART when dealing with small scale datasets [25]. Therefore, in this paper, logistic regression with direct maximum likelihood estimation (Direct MLE) is used.
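
The filtering step described above can be sketched as follows: all candidate variables enter a logistic regression at once, the coefficients are estimated by maximum likelihood, and only variables whose coefficients are statistically significant are passed on to rule generation. This is a minimal sketch under stated assumptions, not the implementation used in the study: the Newton-Raphson solver, the Wald test, the threshold `alpha = 0.05`, the variable names, and the synthetic data are all choices made for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum likelihood fit by Newton-Raphson; an intercept is prepended."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    cov = []
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
                 for r in rows]
        # Gradient of the log-likelihood (score vector).
        score = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, probs))
                 for j in range(k)]
        # Observed information matrix (negative Hessian).
        info = [[sum(pi * (1.0 - pi) * r[a] * r[b] for r, pi in zip(rows, probs))
                 for b in range(k)] for a in range(k)]
        cov = invert(info)
        step = [sum(cov[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # cov comes from the last iteration; at convergence it approximates
    # the covariance of the estimates.
    std_err = [math.sqrt(cov[j][j]) for j in range(k)]
    return beta, std_err


def wald_p_values(beta, std_err):
    """Two-sided p-values for the Wald statistic z = beta / std_err."""
    return [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, std_err)]


# Synthetic data: the outcome rate depends on the first variable only.
names = ["informative", "uninformative"]
X, y = [], []
for x1 in (0, 1):
    for x2 in (0, 1):
        positives = 16 if x1 == 1 else 4
        for i in range(20):
            X.append([float(x1), float(x2)])
            y.append(1 if i < positives else 0)

beta, std_err = fit_logistic(X, y)
p_values = wald_p_values(beta, std_err)
alpha = 0.05  # assumed significance threshold
selected = [n for n, p in zip(names, p_values[1:]) if p < alpha]
print(selected)
```

On this synthetic data only the first variable passes the threshold, so a tree learner such as CART or C4.5 would then be trained on that reduced variable set. Note that the solver has no safeguard for perfectly separable data, where maximum likelihood estimates do not exist.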

Dataset

Three different datasets are used in the study: on-site, off-site, and helicopter. The on-site dataset contains data captured at the site of the accident; the off-site dataset is formed at the hospital after patients are admitted; and the helicopter dataset consists of the records for patients who are transported to the hospital by helicopter. The on-site and off-site datasets are used to predict patient survival (dead/alive) and final outcome (home/rehab), and the helicopter dataset is used to predict ICU length of stay, which is a measure used in estimating the need for helicopter transportation. The datasets are provided to us by the Carolinas Healthcare System (CHS) and the National Trauma Data Bank (NTDB).

On-site dataset

When making decisions based on the variables available at the accident scene, one has to consider the unavailability of important factors such as pre-existing conditions (comorbidities). Decisions must therefore be made without knowledge of these factors. Some physiological measurements are also excluded because they are only collected after arrival at the hospital. Table 1 presents the variables collected for this dataset, which consist of four categorical and six numerical attributes.

Table 1 On-site dataset
