Prognostic evaluation of patients without CT scan findings may have limited applicability, given that CT is available in the majority of trauma centers. Nevertheless, prognostic models based on initial clinical data at admission are still worth pursuing and appear to be of practical value in some situations. Although omitting CT data weakens our armamentarium significantly, patients without paraclinical and imaging data represent the real trauma scenes in which medical staff must make an evaluation. As neurosurgeons, we form initial clinical judgments based on our growing experience, and in many situations they prove to be correct. We can expect computers to do something similar and help us. In fact, vague situations such as the primary evaluation of head trauma patients are where ANNs may prove superior to traditional linear modeling. This is one of the reasons why, although many paraclinical and imaging factors are known to have significant predictive value for the outcome of head trauma patients, this study used only clinical measures that are readily available to a physician in the emergency department.
Study limitations
Calculating ISS and AIS is not a simple task and requires training, which can reduce the practicability of the results. Pupillary size and reactivity, clinical signs with known prognostic value, were not considered in the study because the main database lacked these data.
As can be seen, the mortality rate and the percentage of patients with GCS < 8 are the same (7.5%). This is a coincidence, but it emphasizes the need for further studies in populations with different outcome rates.
The mean GCS was 13.5 ± 3. The mean plus one standard deviation (16.5) exceeds the maximum GCS score (15), which means that the GCS values were skewed towards higher levels of consciousness.
The exact time interval between head trauma and admission, which is readily available to the medical staff, was not considered; this is one of the weak points of this study.
The lower intubation rate in this study population relative to the distribution of the GCS may reflect a lower quality of pre-hospital care associated with the hospitals selected for this study. This may make it somewhat difficult to reproduce the results using the same network (downloadable from http://www.tums.ac.ir/download) in settings where pre-hospital care services are more advanced.
Using ISS as an independent variable underlines the role of general trauma in the models used for this study. This was unavoidable, because the original dataset had been a subset of general trauma patients.
Comparison of two models
Currently, logistic regression and artificial neural networks are the most widely used models in biomedicine, as measured by the number of publications indexed in PubMed: 45646 for logistic regression and 8015 for neural networks.
Logistic regression is a commonly accepted statistical tool that can generate excellent models. Its popularity may be attributed to the interpretability of model parameters and ease of use, although it has limitations. For example, logistic regression models use linear combinations of variables and, therefore, are not adept at modeling grossly nonlinear complex interactions such as those demonstrated in biologic and complex epidemiologic systems.
Notwithstanding their limitations, neural networks are appealing for a number of reasons: they seem to "learn" without supervision, they can be created by workers with very little experience in mathematical model building, and software for building neural networks is now readily available. Neural networks perhaps have a special appeal to the medical community because of their superficial resemblance to the human brain (a structure with which most physicians are comfortable), and they seem to promise "prediction" without the difficulties associated with the use of mathematics.
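The linearity described above can be illustrated with a minimal sketch. The coefficients and the two binary predictors below are hypothetical values chosen only for illustration; they are not the model fitted in this study. The sketch shows that, without explicit interaction terms, the effect of one predictor on the log-odds is the same regardless of the value of the other predictor.

```python
import math


def sigmoid(z):
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(coefs, intercept, x):
    """Probability from a logistic model: sigmoid of a linear combination."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return sigmoid(z)


def log_odds(p):
    """Logit transform, the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (assumption, not estimated from any data).
coefs = [0.8, -0.5]
intercept = -1.0

# Change in log-odds when the first predictor goes from 0 to 1,
# evaluated at both values of the second predictor.
effect_when_x2_is_0 = (
    log_odds(logistic_predict(coefs, intercept, [1, 0]))
    - log_odds(logistic_predict(coefs, intercept, [0, 0]))
)
effect_when_x2_is_1 = (
    log_odds(logistic_predict(coefs, intercept, [1, 1]))
    - log_odds(logistic_predict(coefs, intercept, [0, 1]))
)

print(effect_when_x2_is_0, effect_when_x2_is_1)
```

Both printed effects equal the first coefficient (up to floating-point error), which is why an interaction that changes the effect of one variable depending on another has to be added to a logistic model by hand.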
ANNs are rich and flexible nonlinear systems that show robust performance in dealing with noisy or incomplete data and have the ability to generalize from the input data. They may be better suited than other modeling systems to predict outcomes when the relationships between the variables are complex, multidimensional, and nonlinear as found in complex biological systems.
The difficulty in developing models using artificial neural networks is that there are no set methods for constructing the architecture of the network. The most common type of artificial neural network is the feed-forward multilayer perceptron trained with back propagation (used in this study).
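For readers unfamiliar with this type of network, the following is a minimal sketch of a feed-forward network with one hidden layer trained by back propagation. It is not the PDP++ network used in this study: the number of hidden units, the learning rate, the number of epochs, the cross-entropy loss, and the toy XOR data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Random starting weights; the architecture is chosen by the modeler."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Feed-forward pass: inputs -> sigmoid hidden layer -> sigmoid output."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, output


def train_step(net, x, y, lr):
    """One back propagation update for a single example (cross-entropy loss)."""
    hidden, output = forward(net, x)
    delta_out = output - y  # loss gradient at the output pre-activation
    for j, h in enumerate(hidden):
        # Error propagated back to hidden unit j (uses w2 before its update).
        delta_hidden = delta_out * net["w2"][j] * h * (1.0 - h)
        net["w2"][j] -= lr * delta_out * h
        for i, v in enumerate(x):
            net["w1"][j][i] -= lr * delta_hidden * v
        net["b1"][j] -= lr * delta_hidden
    net["b2"] -= lr * delta_out


def mean_loss(net, data):
    """Average cross-entropy over a list of (inputs, outcome) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        _, p = forward(net, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(data)


# Toy XOR data (assumption): a nonlinear pattern with no clinical meaning.
toy_data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
network = init_network(n_inputs=2, n_hidden=3, seed=0)
print("loss before training:", mean_loss(network, toy_data))
for _ in range(2000):
    for inputs, outcome in toy_data:
        train_step(network, inputs, outcome, lr=0.5)
print("loss after training:", mean_loss(network, toy_data))
```

The number of hidden units, the starting weights, and the learning rate are all free choices here, which is the practical meaning of having no set method for constructing the architecture.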
Another limitation of neural network models is that standardized coefficients and odds ratios corresponding to each variable cannot be easily calculated and presented as they are in regression models. Neural network analysis generates weights, which are difficult to interpret as they are affected by the program used to generate them [26]. This lack of interpretability at the level of individual variables (predictors) is one of the most criticized features in neural network models [27].
Several early applications of neural networks in medicine reported an excellent fit of the ANN model to a given set of data. The impressive results were usually derived from overfitted models, in which too many free parameters were allowed. Linear and logistic regression models have less potential for overfitting, primarily because the range of functions they can model is limited.
Neural network models require sophisticated software. The complexity and unfamiliarity of ANNs have been a major drawback of this technique so far. However, as palmtop computing becomes increasingly powerful and popular, the complexity of ANNs may become less onerous in real-time clinical settings [4].
Furthermore, a predictive ANN model has some theoretical advantages over conventional models such as logistic regression.
One such advantage is that an ANN model allows the inclusion of a large number of variables [28]. Another advantage of the neural network approach is that there are not many assumptions (such as normality) that need to be verified before the models can be constructed.
Although one of the strengths of ANNs is their ability to find patterns despite missing data, a dataset with no related missing values was used in this study.
Recently, the comparison between these two models has been addressed from different points of view. Bearing publication bias in mind, several published works in the medical literature have demonstrated the success of ANN approaches. In a review of 28 major studies carried out by Sargent, ANN outperformed regression in 10 cases (36%), was outperformed by regression in 4 cases (14%), and the two methods had similar performance in the remaining 14 cases (50%). Sargent concluded that both methods should continue to be used and explored in a complementary manner [29].
Gaudart et al., using simulated data, compared the performance of ANN and linear regression models for epidemiological data and concluded that both had comparable performance and robustness and that, despite the flexibility of connectionist models (like ANN), their predictions were stable [30].
This study was primarily designed to compare the performance of an ANN and a multivariable logistic regression analysis, with the goal of developing a model for predicting the outcome in head injury and of studying their internal validity (reproducibility). A secondary goal was to set up a standalone practical model for predicting mortality in head trauma patients.
The use of freely downloadable software (PDP++) and the availability of the networks and scripts to other researchers can be regarded as an advantage of this study.
This study showed that the ANN models significantly outperformed the logistic models in both discrimination and calibration, although they lagged behind in accuracy. The calibration values in this study should be treated with some caution, since according to Hosmer and Lemeshow's description of the statistic [31], the HL statistic should be used where at least one of the predictor variables is continuous.
This study clearly shows that, in a single comparison of these models based on the same data, there is a 22.2% chance of obtaining discrimination results contrary to our findings in the majority of comparisons. This ratio is 43.6% for calibration and 32% for accuracy. These figures are practically important and imply that any single comparison between these two models cannot reliably represent their final performance.
Although considerable effort, through much trial and error, was made to optimize the design of the network, the designed ANN models could, and should, be further improved. As with any other predictive model, the findings of this study need to be externally validated. The networks are downloadable, and the results can be studied in other study populations with divergent data and different survival ratios.
So far, there is no single algorithm that performs better than all other algorithms for every given set of head injury data. Therefore, much more work remains to be done before a definite conclusion can be reached.
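As a minimal sketch of how the two measures mentioned above can be computed, the code below evaluates discrimination as the area under the ROC curve (by pairwise comparison of predicted probabilities) and calibration as the Hosmer-Lemeshow statistic. The outcomes and predicted probabilities are hypothetical, the number of groups is an assumption, and this is not necessarily the procedure used in this study. The sketch assumes predicted probabilities strictly between 0 and 1 and returns only the statistic, not its p-value.

```python
def auc(outcomes, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    event receives a higher score than a randomly chosen non-event
    (ties count as one half)."""
    positives = [s for y, s in zip(outcomes, scores) if y == 1]
    negatives = [s for y, s in zip(outcomes, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def hosmer_lemeshow(outcomes, probabilities, groups=10):
    """Hosmer-Lemeshow statistic: cases are sorted by predicted probability,
    split into groups, and observed events are compared with expected events
    in each group."""
    pairs = sorted(zip(probabilities, outcomes))
    n_total = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n_total // groups:(g + 1) * n_total // groups]
        if not chunk:
            continue
        n = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / n))
    return statistic


# Hypothetical outcomes (1 = died, 0 = survived) and predicted probabilities.
example_outcomes = [0, 0, 1, 1]
example_probabilities = [0.1, 0.4, 0.35, 0.8]

print("AUC:", auc(example_outcomes, example_probabilities))
print("HL statistic:",
      hosmer_lemeshow(example_outcomes, example_probabilities, groups=2))
```

A higher AUC indicates better discrimination, whereas a smaller Hosmer-Lemeshow statistic indicates closer agreement between predicted and observed event counts.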
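The idea behind these percentages can be sketched as follows: the comparison is repeated many times, the model that wins in the majority of repetitions is identified, and the share of repetitions in which the other model wins is reported. The function name and the per-run values below are hypothetical and serve only to illustrate the calculation; they are not the data of this study, and the sketch assumes that a higher value of the metric is better.

```python
def contrary_fraction(ann_values, regression_values):
    """Share of repeated comparisons whose winner differs from the model
    that wins in the majority of comparisons. Ties count for neither model."""
    ann_wins = sum(a > r for a, r in zip(ann_values, regression_values))
    regression_wins = sum(r > a for a, r in zip(ann_values, regression_values))
    minority_wins = min(ann_wins, regression_wins)
    return minority_wins / len(ann_values)


# Hypothetical per-run discrimination values for four repeated comparisons.
ann_runs = [0.90, 0.80, 0.85, 0.70]
regression_runs = [0.80, 0.85, 0.80, 0.60]

print(contrary_fraction(ann_runs, regression_runs))
```

In this toy example the ANN wins three of the four runs, so a reader who saw only the single contrary run would draw the opposite conclusion.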
Potential clinical use
Should the results be reproducible in other populations, this model and similar ones, run on a simple preprogrammed calculator (or another programmable computing device) and requiring only minimal training of the personnel, may prove to be of considerable practical value in the triage of patients. At that point, dedicated rather than general-purpose ANN software should be designed for this purpose.
The authors concur with the conclusion arrived at by Tu [25] that logistic regression remains the clear choice when the primary goal of model development is to examine possible causal relationships among variables. However, it appears that ANNs, or some form of hybrid technique incorporating the best features of both logistic regression and neural network models, might lead to the development of optimum prediction models for head-injured patients.