
Table 3 Details of methods applied to the analysis in eligible studies

From: Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

| References | Internal validation | Evaluation of model fit/performance | Handling of missing values |
|---|---|---|---|
| Alaa et al. [23] | Training set corresponding to 90% of the sample; ten-fold cross-validation | Area under the receiver operating characteristic curve with 95% confidence intervals (Wilson score intervals), Brier score | Retained as an informative variable; only variables missing for 85% or more of participants were excluded |
| Anderson et al. [48] | Split the data based on site of care | Bayesian Information Criterion, prediction model ensembles, β estimates, predicted probabilities, and area under the receiver operating characteristic curve estimates | Missing covariate values were included in models as a discrete category |
| Azimi et al. [38] | 2:1:1 ratio to generate training, testing, and validation cohorts | Receiver operating characteristic curves, positive predictive value, negative predictive value, area under the curve from the receiver operating characteristic curve analysis, Hosmer-Lemeshow statistic | Cases with missing outcome data were excluded |
| Bannister et al. [22] | Derivation set of approximately 66.67% and a validation set of approximately 33.33% as noted in the text (the abstract states a 70:30 split) | Akaike Information Criterion | Single imputation, followed by multiple imputation in the final model to evaluate differences in model parameters |
| Baxter et al. [24] | Leave-one-out cross-validation (LOOCV), also known as the jackknife method | Area under the receiver operating characteristic curve, sensitivity, specificity, accuracy, Youden index | Not mentioned |
| Bowman [39] | Re-analysis of models using clinic data from a different time period | Area under the receiver operating characteristic curve | Not mentioned |
| Bertsimas et al. [21] | 60/40 split | Positive predictive value, area under the curve, accuracy | Imputed using an optimal-impute algorithm |
| Dong et al. [25] | 90/10 random split; ten-fold cross-validation | Accuracy, precision, F1 score, true-negative rate, area under the receiver operating characteristic curve | Missing values were filled with the mean value of the same attribute from the same patient; if a patient had fewer than 3 records, the mean value of that attribute across all patients was imputed |
| Hearn et al. [40] | 100-iteration Monte Carlo cross-validation, 75/25 split, five-fold cross-validation | Mean area under the receiver operating characteristic curve, true- and false-positive rates, true- and false-negative rates, positive and negative predictive values | Variables with > 10% of values missing were discarded from the analysis, whereas the remaining missing values were filled in using multiple imputation by chained random forests (maximum number of iterations = 5, number of trees = 10). The sole exception to the 10% cutoff was heart rate recovery, which had 32% missing values but was kept in the data set and imputed with the above procedure because of its wide use in prognostication from cardiopulmonary exercise testing |
| Hertroijs et al. [45] | Five-fold cross-validation | Akaike Information Criterion, Bayesian Information Criterion, Lo-Mendell-Rubin likelihood ratio test | The full information maximum likelihood method was used to estimate model parameters in the presence of missing data for model development, but patients with missing covariate values at baseline were excluded from model validation |
| Hill et al. [26] | 2:1 split | Area under the receiver operating characteristic curve, positive predictive value, potential number needed-to-screen | Imputed with last observation carried forward |
| Hische et al. [20] | Ten-fold cross-validation | Sensitivity (models with a mean sensitivity above 90% after cross-validation were selected), specificity, positive predictive value, negative predictive value | Not mentioned |
| Isma'eel et al. [41] | For myocardial perfusion imaging: 120 of the 479 patients who tested positive were added randomly to the derivation cohort; 120 of the remaining 4875 patients who tested negative were also added randomly to the derivation cohort; the remaining 5114 patients were all added to the validation cohort. For coronary artery disease: 93 of the 278 patients who tested positive were added randomly to the derivation cohort and 93 of the remaining 5076 patients who tested negative were also added randomly to the derivation cohort; the remaining 5169 patients were all added to the validation cohort | Sensitivity and specificity, discriminatory power and 95% confidence interval, number of tests avoided, negative predictive value, positive predictive value | Not mentioned |
| Isma'eel et al. [42] | The derivation cohort was randomly chosen as follows: 30 of the 59 patients who tested positive were added randomly to the derivation cohort, and 30 of the remaining 427 patients who tested negative were also added randomly to the derivation cohort; the remaining 426 patients (29 positive, 397 negative) were all added to the testing cohort. During the training phase, the 60 patients used for training were split 80% for pure training and 20% for validation | Negative and positive predictive values, discriminatory power, percentage of avoided tests, sensitivity and specificity | Not mentioned |
| Jovanovic et al. [43] | The sample was randomly divided into 3 parts: training, testing, and validation samples | Area under the receiver operating characteristic curve, sensitivity, specificity, and positive and negative predictive values | Not mentioned |
| Kang et al. [27] | Four-fold cross-validation, 75/25 split (training/validation) | Area under the receiver operating characteristic curve, accuracy, precision, recall | Not mentioned |
| Karhade et al. [28] | Ten-fold cross-validation | Discrimination (c-statistic, or area under the receiver operating characteristic curve), calibration (calibration slope, calibration intercept), and overall performance (Brier score) | Multiple imputation with the missForest methodology was undertaken for variables with less than 30% missing data |
| Kebede et al. [29] | 10% cross-validation, 90/10 split (training/testing) | Area under the receiver operating characteristic curve; classification accuracy (true positives, false positives, precision, recall) | Patients were excluded from the study if their information was incomplete or unreadable or their manual record was lost |
| Khanji et al. [47] | Ten-fold cross-validation | Akaike Information Criterion, area under the receiver operating characteristic curve | Excluded patients with missing data at the end of the study (± 6 months) |
| Kim et al. [30] | 70/30 split (training/testing) | Area under the receiver operating characteristic curve, sensitivity, specificity, precision, accuracy | Not mentioned |
| Kwon et al. [32] | Derivation set (June 2010–July 2016) and validation set (August 2016–2017) split by date | Receiver operating characteristic curve, area under the precision-recall curve, net reclassification index, sensitivity, positive predictive value, negative predictive value, F-measure | Not mentioned |
| Kwon et al. [31] | Split into derivation and validation datasets according to year: the derivation data were the patient data for 2012–2015, and the validation data were the patient data for 2016 | Area under the receiver operating characteristic curve | Excluded patients with missing values |
| Lopez-de-Andres et al. [34] | Random 60/20/20 split, where the third group was selected for model selection purposes prior to validation | Area under the receiver operating characteristic curve, accuracy rate, error rate, sensitivity, specificity, precision, positive likelihood, negative likelihood, F1 score, false-positive rate, false-negative rate, false discovery rate, positive predictive value, negative predictive value, Matthews correlation coefficient, informedness, markedness | Not mentioned |
| Mubeen et al. [19] | Ten-fold cross-validation | Accuracy, sensitivity, and specificity based on prediction of out-of-bag samples, area under the receiver operating characteristic curve | Not mentioned |
| Neefjes et al. [18] | Five-fold cross-validation | Area under the receiver operating characteristic curve | Not mentioned |
| Ng et al. [36] | 60/40 split, five-fold cross-validation | Area under the curve, sensitivity, specificity, accuracy | Excluded patients with missing data |
| Oviedo et al. [46] | 80/20 split (training/testing) | Matthews correlation coefficient | Records with missing data were removed from the model |
| Pei et al. [17] | 70/30 split (training/testing) | True positives, true negatives, false negatives, false positives, area under the receiver operating characteristic curve | Patients with missing data were removed from the model |
| Perez-Gandia et al. [37] | Three subjects (each with two daily profiles) were used for training; the remaining patients were used for validation | Accuracy | Spline techniques were used to impute missing data in the training set |
| Ramezankhani et al. [16] | 70/30 split (training/validation) | Sensitivity, specificity, positive predictive value, negative predictive value, geometric mean, F-measure, area under the curve | Single imputation was used: CART for continuous variables and a weighted k-nearest neighbor approach for categorical variables |
| Rau et al. [35] | 70/30 split, ten-fold cross-validation | Sensitivity, specificity, area under the curve | Not mentioned |
| Scheer et al. [33] | 70/30 split (training/testing) | Accuracy, area under the receiver operating characteristic curve, predictor importance | Missing values within the database were imputed using standard techniques such as mean and median imputation |
| Toussi et al. [15] | Ten-fold cross-validation | Precision (the proportion of true-positive records among true-positive plus false-positive records) | Imputed using a model-based approach |
| Zhou et al. [44] | Training cohort: 142 patients from a prior phase III randomized controlled trial (October 2013–March 2016) plus 182 eligible consecutive patients (January 2012–December 2016); validation cohort: 61 eligible consecutive patients (January 2017–August 2017) | Concordance index (c-index), calculated as the area under the receiver operating characteristic curve; calibration plot | Patients were excluded if clinical data were missing before or 30 days after percutaneous transhepatic biliary drainage |
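Most of the studies summarized in Table 3 pair a resampling-based internal validation scheme (most often ten-fold cross-validation) with the area under the receiver operating characteristic curve as the primary performance measure. The snippet below is a minimal, illustrative sketch of that pairing using scikit-learn on synthetic data; the random-forest estimator, fold count, and dataset are assumptions chosen for demonstration and do not reproduce any specific study in the table.

```python
# Minimal sketch of ten-fold cross-validation evaluated by ROC AUC,
# the most common validation/metric pairing reported in Table 3.
# The synthetic data and random-forest estimator are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a real-world clinical dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Mean AUC across the ten held-out folds.
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC across 10 folds: {auc.mean():.3f} (SD {auc.std():.3f})")
```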
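Handling of missing values in the eligible studies ranged from complete-case exclusion to chained multiple imputation (for example, missForest-style random-forest imputation in Karhade et al. [28] and chained random forests in Hearn et al. [40]). The sketch below approximates that style of imputation with scikit-learn's IterativeImputer wrapped around a random-forest estimator; the synthetic data, missingness rate, and the 10% cutoff (mirroring the rule described by Hearn et al. [40]) are assumptions, not general recommendations.

```python
# Sketch of chained (iterative) imputation with a random-forest estimator,
# loosely analogous to the missForest-style imputation cited in Table 3.
# The synthetic data, missingness rate, and 10% cutoff are assumptions.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=[f"x{i}" for i in range(5)])
df = df.mask(rng.random(df.shape) < 0.05)   # inject ~5% missing values

# Drop variables with more than 10% missing values, then impute the remainder.
keep = df.columns[df.isna().mean() <= 0.10]
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=5, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df[keep]), columns=keep)
print(imputed.isna().sum().sum())  # 0 missing values remain after imputation
```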