Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention

Background The feature selection (FS) process is essential in medicine because it reduces the effort and time physicians need to measure unnecessary features. Choosing useful variables is difficult in the presence of censoring, the defining characteristic of survival analysis. Most survival FS methods depend on Cox’s proportional hazards model; machine learning techniques (MLT) would be preferable but are not commonly used because of censoring. Techniques that have been proposed to adapt MLT to perform FS on survival data cannot cope with high levels of censoring. The researchers’ previous publications proposed a technique to deal with highly censored data and used existing FS techniques to reduce the dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the previously proposed un-censoring approach to select a reduced set of features and produce a stable predictive model.

Methods In this paper, a FS technique based on an artificial neural network (ANN) MLT is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in the United Kingdom in order to produce a final stable model; almost 91% of the patients are censored. The proposed approach uses a wrapper FS method with an ANN to select a reduced subset of features that predicts the risk of EVAR re-intervention within 5 years for patients from the two centers, so that it can potentially be applied to cross-center prediction. The proposed model is compared with two popular FS techniques used with Cox’s model: the Akaike and Bayesian information criteria (AIC, BIC).

Results The final model outperforms the other methods in distinguishing the high- and low-risk groups: its concordance index and estimated AUC are better than those of the Cox models based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and that those who will need re-intervention can be correctly predicted.

Conclusion The proposed approach will save the time and effort physicians spend collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide on patients’ future observation plans.

Electronic supplementary material The online version of this article (doi:10.1186/s12911-017-0508-3) contains supplementary material, which is available to authorized users.


Akaike Information Criterion (AIC)
AIC was introduced by Akaike in 1977. It measures the quality of each candidate model and is based on minimizing the Kullback–Leibler distance, i.e. the distance between the true model and the candidate models. AIC takes into consideration both the number of free parameters in the candidate model and the goodness of its fit. The chosen model is the one that minimizes the AIC, which is equivalent to the smallest distance to the true model [92]. It can be calculated using equation (1), where L is the maximum likelihood of the model given the data and K is the number of parameters in the model:

AIC = 2K − 2 ln(L)    (1)
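As a concrete illustration, the AIC comparison above can be sketched in a few lines of Python; the candidate models and their maximized log-likelihoods here are hypothetical, not taken from the EVAR data:

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2K - 2*ln(L).

    log_likelihood is ln(L), the maximized log-likelihood of the
    candidate model; k is its number of free parameters.
    """
    return 2 * k - 2 * log_likelihood

# Hypothetical candidate models: (name, maximized log-likelihood, #parameters)
candidates = [("full", -120.4, 10), ("reduced", -123.1, 4)]

# The model with the lowest AIC is preferred: the reduced model's
# slightly worse fit is outweighed by its smaller parameter penalty.
best = min(candidates, key=lambda m: aic(m[1], m[2]))
print(best[0])  # -> reduced
```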

Bayesian information criterion (BIC)
BIC was introduced by Schwarz in 1978. Like AIC, it measures the quality of each candidate model, but its penalty term is greater than that of AIC. BIC also takes into account the number of observations available to each model, which AIC does not.
Therefore, some researchers prefer it when dealing with models with small or different sample sizes. Again, the model that minimizes BIC is the one chosen [93]. It is calculated using equation (2), where L is the maximum likelihood of the model given the data, K is the number of parameters in the model, and n is the number of observations:

BIC = K ln(n) − 2 ln(L)    (2)
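A matching Python sketch of equation (2); the log-likelihood, parameter count, and sample size below are hypothetical, and the final comment notes why BIC penalizes extra parameters more heavily than AIC once n exceeds about 7:

```python
import math

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: BIC = K*ln(n) - 2*ln(L).

    Unlike AIC's constant per-parameter penalty of 2, BIC's penalty
    is ln(n), so it grows with the number of observations.
    """
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical model: ln(L) = -123.1, K = 4 parameters, n = 100 observations
value = bic(-123.1, 4, 100)
print(round(value, 3))

# For n >= 8, ln(n) > 2, so each extra parameter costs more under BIC than AIC.
print(math.log(100) > 2)  # -> True
```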

Least Absolute Shrinkage and Selection Operator (Lasso)
It was introduced by Robert Tibshirani in 1997 [94]. It is an L1-penalized estimation method that shrinks the regression coefficient estimates β of the Cox regression model towards zero using a tuning parameter λ, which imposes a penalty on their absolute values. This leads to the removal of irrelevant variables from the predictive model.
Shrinkage prevents over-fitting that may occur due to collinearity of the variables.
The β coefficients of the predictive model are fitted by maximizing the penalized partial log-likelihood (PPLL) over all the data, with an absolute-value Lasso penalty λ on β:

PPLL(β) = Σᵢ δᵢ [xᵢᵀβ − log Σ_{j: tⱼ ≥ tᵢ} exp(xⱼᵀβ)] − λ‖β‖₁

where δᵢ is the censoring indicator for patient i with covariates xᵢ, λ ≥ 0, and ‖·‖₁ denotes the L1 norm. λ equal to zero means no shrinkage, while λ approaching infinity means complete shrinkage.
The penalized R software package was used for implementing Lasso. The tuning parameter was selected using the likelihood cross-validation optimization method.
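The paper's Cox Lasso was fitted with the penalized R package; as a language-agnostic illustration of how the L1 penalty removes irrelevant variables, the following Python sketch shows the soft-thresholding operator, the proximal map of the L1 norm that underlies Lasso's shrinkage (the example coefficients are hypothetical, and this is not the full penalized Cox fit):

```python
import numpy as np

def soft_threshold(beta: np.ndarray, lam: float) -> np.ndarray:
    """Soft-thresholding operator: the proximal map of the L1 penalty.

    Coefficients with |beta| <= lam are set exactly to zero (variable
    removed); the rest are shrunk towards zero by lam, mirroring
    Lasso's behaviour on regression coefficients.
    """
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

# Hypothetical unpenalized coefficients: two strong effects, two weak ones
beta = np.array([2.5, -0.3, 0.05, -4.0])
print(soft_threshold(beta, 0.5))  # -> [ 2.   0.   0.  -3.5]
```

With λ = 0.5 the two weak coefficients are zeroed out entirely, while the strong ones survive but are pulled towards zero, which is exactly the shrinkage-plus-selection effect described above.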

Smoothly Clipped Absolute Deviation (SCAD)
It was proposed by Fan and Li [46] as a concave penalty corresponding to a quadratic spline function with knots at λ and aλ. SCAD is continuous and differentiable on (−∞, 0) ∪ (0, ∞) but singular at 0, and its derivative is zero outside the range [−aλ, aλ]. This function fulfills three properties when estimating the regression coefficients β: un-biasedness, sparsity, and continuity. SCAD sets small coefficients to zero, penalizes intermediate coefficients towards zero, and leaves the large coefficients as they are. The penalty is defined piecewise, with a > 2:

p_λ(β) = λ|β|  for |β| ≤ λ;  (2aλ|β| − β² − λ²) / (2(a − 1))  for λ < |β| ≤ aλ;  λ²(a + 1)/2  for |β| > aλ.
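A minimal Python sketch of the SCAD penalty function described above, using a = 3.7 (the default value suggested by Fan and Li); this illustrates the spline form only, not the full penalized Cox fit:

```python
import numpy as np

def scad_penalty(beta, lam: float, a: float = 3.7) -> np.ndarray:
    """SCAD penalty of Fan and Li: a quadratic spline with knots at
    lam and a*lam (a > 2).

      |b| <= lam:          lam * |b|                       (Lasso-like near zero)
      lam < |b| <= a*lam:  (2*a*lam*|b| - b^2 - lam^2) / (2*(a-1))
      |b| > a*lam:         lam^2 * (a+1) / 2               (constant: large
                                                            coefficients kept as-is)
    """
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    large = np.full_like(b, lam**2 * (a + 1) / 2)
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))
```

The three pieces join continuously: at |β| = λ both the first and second pieces equal λ², and at |β| = aλ both the second and third equal λ²(a + 1)/2, which is the continuity property noted above.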

Classification Models and Evaluation Metrics
The evaluation metrics that were employed to test the performance of the final selected model are discussed below.
 Log Rank Test is a popular statistical test used in clinical trials to distinguish between the survival probabilities of two groups of patients who were either given different treatments or have different risks of an event occurring. The test statistic is a chi-squared statistic: for each group, the difference between the observed number of events and the expected number, divided by the square root of the variance of the expected number. The result of this test is a p-value. A p-value smaller than 0.05 indicates that the two groups are significantly different, separable, and discriminative [52].
 Concordance Index (CI) is the most frequently used predictive discrimination metric in survival analysis for assessing model performance, as it handles the censoring found in survival data. The CI [86] is the probability that, given two randomly selected patients of whom at least one experienced the event of interest, the patient with the shorter follow-up time to the event has the higher predicted probability of the event occurring. A greater CI indicates better performance: the predictions are more concordant and discriminative. A value of 1 signifies perfect discrimination and concordance, while 0.5 shows no discrimination between the risk groups. In recent decades, the CI has become popular and is used extensively, especially in the biomedical field.
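A minimal Python sketch of Harrell's concordance index as described above, counting comparable pairs under right censoring; the data below are a toy example, not the EVAR dataset:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the patient with the shorter
    follow-up time actually experienced the event (events[i] == 1).
    The pair is concordant when that patient also has the higher
    predicted risk; ties in risk score count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:  # i fails first, uncensored
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Risk scores perfectly reversed with survival time -> perfect concordance
print(concordance_index([2, 4, 6], [1, 1, 0], [0.9, 0.5, 0.1]))  # -> 1.0
```

Note that the third patient is censored (event indicator 0), so pairs in which that patient fails first are excluded from the count, which is how the CI accommodates censoring.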

Simulation Study
A simulation study was performed alongside the real EVAR dataset to demonstrate the effectiveness of the proposed feature selection algorithm. In this study, the linear predictor f(x) is formed by the coefficients [2; 2; 5; 3; 3; 8] on the first six variables, while the remaining variables have no influence on the time to event. Survival time T and censoring time C were assigned to each instance depending on the predictor values: T is generated from the exponential distribution with parameter exp(f(x)), and C is generated from the exponential distribution with parameter equal to 5. The survival data, {(ti = min(Ti, Ci), δi = I(Ti ≤ Ci)) | i = 1, …, n}, were then formed, with approximately 80% right censoring.
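The data-generating mechanism above can be sketched in Python as follows. The sample size, the covariate distribution, and the reading of exp(f(x)) as a rate parameter are assumptions made for this sketch, so the exact censoring rate will differ from the paper's approximately 80% figure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 27 covariates (as in the EVAR feature count), standard
# normal draws; only the first six carry the stated coefficients.
n, p = 200, 27
beta = np.zeros(p)
beta[:6] = [2, 2, 5, 3, 3, 8]

X = rng.normal(size=(n, p))
f = X @ beta  # linear predictor f(x)

# Event times T ~ Exponential with rate exp(f(x)); censoring C ~ Exponential(5).
# NumPy parameterizes the exponential by scale = 1/rate.
T = rng.exponential(scale=1.0 / np.exp(f))
C = rng.exponential(scale=1.0 / 5.0, size=n)

t = np.minimum(T, C)          # observed time t_i = min(T_i, C_i)
delta = (T <= C).astype(int)  # event indicator delta_i = I(T_i <= C_i)
print(f"censoring rate: {1 - delta.mean():.2f}")
```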

Results of the Simulation Study
A Monte Carlo simulation was performed, and the average results of the proposed algorithm over 50 iterations demonstrate its effectiveness. As shown in Table 1, the number of features was reduced from 27 to an average of 5.33 correctly recovered features, while the average number of falsely recovered features was 0.67. Moreover, the final model has an AUC of 0.653, which is greater than the full model's AUC of 0.429, and the p-value improved from 0.05 to 0.009, indicating that the predictions of the reduced model separate the risk groups significantly. The p-values of the models based on the AIC, BIC, and Lasso methods likewise indicate that the two risk groups are successfully separated and distinguished, with the exception of the SCAD model.