Table 3 Details of methods applied to the analysis in eligible studies

From: Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

For each referenced study, the entries below report three fields: internal validation; evaluation of model fit/performance; handling of missing values.

Alaa et al. [23]
Internal validation: Training set corresponding to 90% of the sample; ten-fold cross-validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve with 95% confidence intervals (Wilson score intervals); Brier score.
Handling of missing values: Missingness was retained as informative, and only variables missing for 85% or more of participants were excluded.
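As a rough illustration of the scheme above, the sketch below computes a ten-fold cross-validated AUROC with a Wilson score interval and the Brier score. The data set, model, and the treatment of the AUROC as a binomial proportion for the Wilson interval are assumptions, not details taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, brier_score_loss
from statsmodels.stats.proportion import proportion_confint

X, y = make_classification(n_samples=1000, random_state=0)
# Out-of-fold predicted probabilities from ten-fold cross-validation.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=10, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
brier = brier_score_loss(y, proba)
# Wilson score interval, treating the AUROC as a proportion (an assumption).
lo, hi = proportion_confint(auc * len(y), len(y), alpha=0.05, method="wilson")
print(f"AUROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f}), Brier {brier:.3f}")
```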

Anderson et al. [48]
Internal validation: Split the data based on site of care.
Evaluation of model fit/performance: Bayesian Information Criterion, prediction model ensembles, β estimates, predicted probabilities, and area under the receiver operating characteristic curve estimates.
Handling of missing values: Missing covariate values were included in models as a discrete category.
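A minimal sketch of encoding missing covariate values as their own discrete category before modelling; the covariate name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"smoking": ["never", None, "current", "former", None]})
# Treat missingness itself as a level rather than dropping or imputing it.
df["smoking"] = df["smoking"].fillna("missing").astype("category")
print(pd.get_dummies(df["smoking"], prefix="smoking"))
```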

Azimi et al. [38]
Internal validation: 2:1:1 ratio to generate training, testing, and validation cohorts.
Evaluation of model fit/performance: Receiver operating characteristic curves, positive predictive value, negative predictive value, area under the receiver operating characteristic curve, Hosmer-Lemeshow statistic.
Handling of missing values: Cases with missing outcome data were excluded.
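The Hosmer-Lemeshow statistic has no standard scikit-learn implementation, so the hand-rolled decile-of-risk version below is an illustrative sketch, not the study's code.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic over risk deciles."""
    order = np.argsort(y_prob)
    y_true, y_prob = np.asarray(y_true)[order], np.asarray(y_prob)[order]
    h = 0.0
    for chunk_t, chunk_p in zip(np.array_split(y_true, groups),
                                np.array_split(y_prob, groups)):
        obs, exp, n = chunk_t.sum(), chunk_p.sum(), len(chunk_t)
        # Guard against empty or degenerate groups.
        if n and 0 < exp < n:
            h += (obs - exp) ** 2 / (exp * (1 - exp / n))
    p = chi2.sf(h, groups - 2)  # df = groups - 2 for a development sample
    return h, p
```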

Bannister et al. [22]
Internal validation: Derivation set of approximately two-thirds and validation set of approximately one-third, as described in the text (the abstract states a 70:30 split).
Evaluation of model fit/performance: Akaike Information Criterion.
Handling of missing values: Single imputation, followed by multiple imputation in the final model to evaluate differences in model parameters.

Baxter et al. [24]
Internal validation: Leave-one-out cross-validation (LOOCV), also known as the jackknife method.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, accuracy, the Youden index.
Handling of missing values: Not mentioned.
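A minimal sketch of leave-one-out cross-validation combined with the Youden index (J = sensitivity + specificity - 1); the data set, model, and 0.5 threshold are stand-ins, not details from the study.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
# One held-out prediction per subject (the jackknife scheme).
proba = cross_val_predict(LogisticRegression(max_iter=5000), X, y,
                          cv=LeaveOneOut(), method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
sens, spec = tp / (tp + fn), tn / (tn + fp)
print(f"AUROC={roc_auc_score(y, proba):.3f}  accuracy={(tp + tn) / len(y):.3f}")
print(f"sensitivity={sens:.3f}  specificity={spec:.3f}  Youden J={sens + spec - 1:.3f}")
```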

Bowman [39]
Internal validation: Re-analysis of models using clinic data during a different time period.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Bertsimas et al. [21]
Internal validation: 60/40 split.
Evaluation of model fit/performance: Positive predictive value, area under the curve, accuracy.
Handling of missing values: Imputed using an optimal-impute algorithm.

Dong et al. [25]
Internal validation: 90/10 random split; ten-fold cross-validation.
Evaluation of model fit/performance: Accuracy, precision, F1 score, true negative rate, area under the receiver operating characteristic curve.
Handling of missing values: Missing values were filled with the mean value of the same attribute from the same patient; if a patient had fewer than three records, the mean value of that attribute across all patients was imputed instead.
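A minimal sketch of the two-level mean imputation described above, with hypothetical column names; whether "fewer than 3 records" counts all records or only non-missing ones is not stated in the table, so total records are assumed here.

```python
import pandas as pd

def impute_two_level(df, patient_col, value_cols, min_records=3):
    """Fill NaNs with the patient's own mean, else the across-patient mean."""
    out = df.copy()
    counts = out.groupby(patient_col)[patient_col].transform("size")
    for col in value_cols:
        per_patient = out.groupby(patient_col)[col].transform("mean")
        global_mean = out[col].mean()
        # Patients with enough records use their own mean; others fall back.
        fill = per_patient.where(counts >= min_records, global_mean)
        out[col] = out[col].fillna(fill)
    return out
```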

Hearn et al. [40]
Internal validation: 100-iteration Monte Carlo cross-validation with a 75/25 split; five-fold cross-validation.
Evaluation of model fit/performance: Mean area under the receiver operating characteristic curve, true- and false-positive rates, true- and false-negative rates, positive and negative predictive values.
Handling of missing values: Variables with > 10% missing values were discarded from the analysis, and the remaining missing values were filled in using multiple imputation by chained random forests (maximum number of iterations = 5, number of trees = 10). The sole exception to the 10% cutoff was heart rate recovery, which had 32% missing values but was kept in the data set and imputed with the above procedure because of its wide use in prognostication from cardiopulmonary exercise tests.
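A sketch of the missing-data rule above, under stated assumptions: scikit-learn's IterativeImputer with a random-forest estimator is a single-imputation stand-in for the multiple imputation by chained random forests used in the study, the column name for the exempted variable is hypothetical, and all columns are assumed numeric.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

def impute_chained_rf(df: pd.DataFrame, keep_anyway=("heart_rate_recovery",)):
    # Drop columns with more than 10% missing, except the named exemptions.
    frac = df.isna().mean()
    cols = [c for c in df.columns if frac[c] <= 0.10 or c in keep_anyway]
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=10),  # trees = 10
        max_iter=5,                                        # iterations = 5
        random_state=0,
    )
    return pd.DataFrame(imputer.fit_transform(df[cols]),
                        columns=cols, index=df.index)
```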

Hertroijs et al. [45]
Internal validation: Five-fold cross-validation.
Evaluation of model fit/performance: Akaike Information Criterion, Bayesian Information Criterion, Lo-Mendell-Rubin likelihood ratio test.
Handling of missing values: The full information maximum likelihood method was used to estimate model parameters in the presence of missing data during model development, but patients with missing covariate values at baseline were excluded from model validation.

Hill et al. [26]
Internal validation: 2/1 split.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, positive predictive value, potential number needed-to-screen.
Handling of missing values: Imputed with last-observation-carried-forward.
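A minimal sketch of last-observation-carried-forward imputation within each patient's longitudinal record; the toy frame and column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit": [1, 2, 3, 1, 2],
    "sbp": [120.0, None, None, 135.0, None],
})
df = df.sort_values(["patient_id", "visit"])
# Carry each patient's last observed value forward in time (LOCF).
df["sbp"] = df.groupby("patient_id")["sbp"].ffill()
print(df)
```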

Hische et al. [20]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Sensitivity (models with a mean sensitivity above 90% after cross-validation were selected), specificity, positive predictive value, negative predictive value.
Handling of missing values: Not mentioned.

Isma'eel et al. [41]
Internal validation: For myocardial perfusion imaging, 120 of the 479 patients who tested positive and 120 of the remaining 4875 patients who tested negative were added randomly to the derivation cohort; the remaining 5114 patients were all added to the validation cohort. For coronary artery disease, 93 of the 278 patients who tested positive and 93 of the remaining 5076 patients who tested negative were added randomly to the derivation cohort; the remaining 5169 patients were all added to the validation cohort.
Evaluation of model fit/performance: Sensitivity and specificity, discriminatory power and 95% confidence interval, number of tests avoided, negative predictive value, positive predictive value.
Handling of missing values: Not mentioned.
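A minimal sketch of the balanced derivation-cohort sampling described above: a fixed number of positives and negatives drawn at random for derivation, with everyone else kept for validation. The function name and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_balanced(y, n_per_class=120):
    """Return (derivation, validation) row indices for labels y."""
    y = np.asarray(y)
    pos = rng.permutation(np.flatnonzero(y == 1))[:n_per_class]
    neg = rng.permutation(np.flatnonzero(y == 0))[:n_per_class]
    derivation = np.concatenate([pos, neg])
    # Everyone not drawn into the derivation cohort is used for validation.
    validation = np.setdiff1d(np.arange(len(y)), derivation)
    return derivation, validation
```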

Isma'eel et al. [42]
Internal validation: The derivation cohort was randomly chosen as follows: 30 of the 59 patients who tested positive and 30 of the remaining 427 patients who tested negative were added randomly to the derivation cohort; the remaining 426 patients (29 positive, 397 negative) were all added to the testing cohort. During the training phase, the 60 patients used for training were split 80% for pure training and 20% for validation.
Evaluation of model fit/performance: Negative and positive predictive values, discriminatory power, percentage of avoided tests, sensitivity and specificity.
Handling of missing values: Not mentioned.

Jovanovic et al. [43]
Internal validation: The sample was randomly divided into three parts: training, testing, and validation samples.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, and positive and negative predictive values.
Handling of missing values: Not mentioned.

Kang et al. [27]
Internal validation: Four-fold cross-validation; 75/25 split (training/validation).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, accuracy, precision, recall.
Handling of missing values: Not mentioned.

Karhade et al. [28]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Discrimination (c-statistic, or area under the receiver operating characteristic curve), calibration (calibration slope, calibration intercept), and overall performance (Brier score).
Handling of missing values: Multiple imputation with the missForest methodology was undertaken for variables with less than 30% missing data.
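A sketch of the discrimination/calibration/overall-performance triad above: c-statistic, calibration slope and intercept (logistic recalibration on the logit of the predictions, with the logit as an offset for the intercept), and the Brier score. The data set and model are assumptions, not the study's.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

X, y = make_classification(n_samples=2000, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p = np.clip(p, 1e-6, 1 - 1e-6)
logit = np.log(p / (1 - p))

# Calibration slope: logistic regression of the outcome on the logit.
slope = sm.GLM(y_te, sm.add_constant(logit),
               family=sm.families.Binomial()).fit().params[1]
# Calibration intercept: same model with the slope fixed at 1 via an offset.
intercept = sm.GLM(y_te, np.ones_like(logit), family=sm.families.Binomial(),
                   offset=logit).fit().params[0]
print(f"c-statistic={roc_auc_score(y_te, p):.3f}  slope={slope:.3f}  "
      f"intercept={intercept:.3f}  Brier={brier_score_loss(y_te, p):.3f}")
```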

Kebede et al. [29]
Internal validation: Ten-fold cross-validation; 90/10 split (training/testing).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve; classification accuracy (true positive rate, false positive rate, precision, recall).
Handling of missing values: Patients were excluded from the study if their information was incomplete or unreadable, or if their manual record was lost.

Khanji et al. [47]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Akaike Information Criterion, area under the receiver operating characteristic curve.
Handling of missing values: Excluded patients with missing data at the end of the study (± 6 months).

Kim et al. [30]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, precision, accuracy.
Handling of missing values: Not mentioned.

Kwon et al. [32]
Internal validation: Derivation set (June 2010–July 2016) and validation set (August 2016–2017), split by date.
Evaluation of model fit/performance: Receiver operating characteristic curve, area under the precision-recall curve, net reclassification index, sensitivity, positive predictive value, negative predictive value, F-measure.
Handling of missing values: Not mentioned.
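A minimal sketch of a date-based derivation/validation split of the kind used in the two Kwon et al. entries; the column name, cutoff, and toy records are assumptions.

```python
import pandas as pd

records = pd.DataFrame({
    "admission_date": pd.to_datetime(
        ["2014-05-01", "2016-03-12", "2016-09-30", "2017-02-14"]),
    "outcome": [0, 1, 0, 1],
})
cutoff = pd.Timestamp("2016-08-01")
derivation = records[records["admission_date"] < cutoff]   # earlier care
validation = records[records["admission_date"] >= cutoff]  # later care
```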

Kwon et al. [31]
Internal validation: Split into derivation and validation datasets by year: the derivation data were the patient data for 2012–2015, and the validation data were the patient data for 2016.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Excluded patients with missing values.

Lopez-de-Andres et al. [34]
Internal validation: Random 60/20/20 split, with the third group used for model selection prior to validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, accuracy rate, error rate, sensitivity, specificity, precision, positive likelihood, negative likelihood, F1 score, false positive rate, false negative rate, false discovery rate, positive predictive value, negative predictive value, Matthews correlation, informedness, markedness.
Handling of missing values: Not mentioned.
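A minimal sketch showing how the less common members of the metric suite above (informedness, markedness, Matthews correlation) fall out of the confusion matrix; the labels are illustrative.

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sens = tp / (tp + fn)  # sensitivity / recall
spec = tn / (tn + fp)  # specificity
ppv = tp / (tp + fp)   # positive predictive value / precision
npv = tn / (tn + fn)   # negative predictive value
print("informedness:", sens + spec - 1)  # Youden's J
print("markedness:", ppv + npv - 1)
print("Matthews correlation:", matthews_corrcoef(y_true, y_pred))
```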

Mubeen et al. [19]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Accuracy, sensitivity, and specificity based on prediction of out-of-bag samples; area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Neefjes et al. [18]
Internal validation: Five-fold cross-validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Ng et al. [36]
Internal validation: 60/40 split; five-fold cross-validation.
Evaluation of model fit/performance: Area under the curve, sensitivity, specificity, accuracy.
Handling of missing values: Excluded patients with missing data.

Oviedo et al. [46]
Internal validation: 80/20 split (training/testing).
Evaluation of model fit/performance: Matthews correlation coefficient.
Handling of missing values: Records with missing data were removed from the model.

Pei et al. [17]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: True positives, true negatives, false negatives, false positives, area under the receiver operating characteristic curve.
Handling of missing values: Patients with missing data were removed from the model.

Perez-Gandia et al. [37]
Internal validation: Three subjects (each with two daily profiles) were used for training; the remaining patients were used for validation.
Evaluation of model fit/performance: Accuracy.
Handling of missing values: Spline techniques were used to impute missing data in the training set.
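A minimal sketch of spline-based imputation of gaps in a glucose time series; the values are synthetic, and pandas' spline interpolation requires SciPy.

```python
import numpy as np
import pandas as pd

glucose = pd.Series(
    [110, 114, np.nan, np.nan, 131, 138, 140, np.nan, 128, 121, np.nan, 112],
    index=pd.RangeIndex(12), dtype=float)
# Cubic-spline interpolation across the missing samples.
filled = glucose.interpolate(method="spline", order=3)
print(filled.round(1))
```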

Ramezankhani et al. [16]
Internal validation: 70/30 split (training/validation).
Evaluation of model fit/performance: Sensitivity, specificity, positive predictive value, negative predictive value, geometric mean, F-measure, area under the curve.
Handling of missing values: Single imputation was used: CART for continuous variables and a weighted k-nearest-neighbor approach for categorical variables.

Rau et al. [35]
Internal validation: 70/30 split; ten-fold cross-validation.
Evaluation of model fit/performance: Sensitivity, specificity, area under the curve.
Handling of missing values: Not mentioned.

Scheer et al. [33]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: Accuracy, area under the receiver operating characteristic curve, predictor importance.
Handling of missing values: Missing values within the database were imputed using standard techniques such as mean and median imputation.

Toussi et al. [15]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Precision (proportion of true-positive records among all true-positive and false-positive records).
Handling of missing values: Imputed using a model-based approach.

Zhou et al. [44]
Internal validation: Training cohort: 142 patients from a prior phase III randomized controlled trial (Oct 2013–Mar 2016) plus 182 eligible consecutive patients (Jan 2012–Dec 2016). Validation cohort: 61 eligible consecutive patients (Jan 2017–Aug 2017).
Evaluation of model fit/performance: Concordance index (c-index), calculated as the area under the receiver operating characteristic curve; calibration plot.
Handling of missing values: Patients were excluded if clinical data were missing before or 30 days after percutaneous transhepatic biliary drainage.
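For a binary endpoint the c-index coincides with the AUROC, and a calibration plot compares predicted with observed risk in probability bins. The sketch below illustrates both on synthetic data; it is not the study's analysis.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1500, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("c-index (AUROC):", round(roc_auc_score(y_te, p), 3))

# Observed event rate per bin of predicted risk, against the identity line.
obs, pred = calibration_curve(y_te, p, n_bins=10)
plt.plot(pred, obs, marker="o", label="model")
plt.plot([0, 1], [0, 1], "--", label="ideal")
plt.xlabel("Predicted risk")
plt.ylabel("Observed risk")
plt.legend()
plt.show()
```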