Table 3 Details of methods applied to the analysis in eligible studies

From: Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

For each referenced study, the entries below report three fields: internal validation; evaluation of model fit/performance; handling of missing values.

Alaa et al. [23]
Internal validation: Training set corresponding to 90% of the sample; ten-fold cross-validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve with 95% confidence intervals (Wilson score intervals); Brier score.
Handling of missing values: Missingness was retained as informative, and only variables missing for 85% or more of participants were excluded.
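As a rough illustration of the scheme above, the sketch below computes a ten-fold cross-validated AUROC with a Wilson score interval and the Brier score. The data set, model, and the treatment of the AUROC as a binomial proportion for the Wilson interval are assumptions, not details taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, brier_score_loss
from statsmodels.stats.proportion import proportion_confint

X, y = make_classification(n_samples=1000, random_state=0)
# Out-of-fold predicted probabilities from ten-fold cross-validation.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=10, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
brier = brier_score_loss(y, proba)
# Wilson score interval, treating the AUROC as a proportion (an assumption).
lo, hi = proportion_confint(auc * len(y), len(y), alpha=0.05, method="wilson")
print(f"AUROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f}), Brier {brier:.3f}")
```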

Anderson et al. [48]
Internal validation: Split the data based on site of care.
Evaluation of model fit/performance: Bayesian Information Criterion, prediction model ensembles, β estimates, predicted probabilities, and area under the receiver operating characteristic curve estimates.
Handling of missing values: Missing covariate values were included in models as a discrete category.
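A minimal sketch of encoding missing covariate values as their own discrete category before modelling; the covariate name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"smoking": ["never", None, "current", "former", None]})
# Treat missingness itself as a level rather than dropping or imputing it.
df["smoking"] = df["smoking"].fillna("missing").astype("category")
print(pd.get_dummies(df["smoking"], prefix="smoking"))
```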

Azimi et al. [38]
Internal validation: 2:1:1 ratio to generate training, testing, and validation cohorts.
Evaluation of model fit/performance: Receiver operating characteristic curves, positive predictive value, negative predictive value, area under the receiver operating characteristic curve, Hosmer-Lemeshow statistic.
Handling of missing values: Cases with missing outcome data were excluded.
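The Hosmer-Lemeshow statistic has no standard scikit-learn implementation, so the hand-rolled decile-of-risk version below is an illustrative sketch, not the study's code.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic over risk deciles."""
    order = np.argsort(y_prob)
    y_true, y_prob = np.asarray(y_true)[order], np.asarray(y_prob)[order]
    h = 0.0
    for chunk_t, chunk_p in zip(np.array_split(y_true, groups),
                                np.array_split(y_prob, groups)):
        obs, exp, n = chunk_t.sum(), chunk_p.sum(), len(chunk_t)
        # Guard against empty or degenerate groups.
        if n and 0 < exp < n:
            h += (obs - exp) ** 2 / (exp * (1 - exp / n))
    p = chi2.sf(h, groups - 2)  # df = groups - 2 for a development sample
    return h, p
```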

Bannister et al. [22]
Internal validation: Derivation set of approximately two-thirds and validation set of approximately one-third, as described in the text (the abstract states a 70:30 split).
Evaluation of model fit/performance: Akaike Information Criterion.
Handling of missing values: Single imputation, followed by multiple imputation in the final model to evaluate differences in model parameters.

Baxter et al. [24]
Internal validation: Leave-one-out cross-validation (LOOCV), also known as the jackknife method.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, accuracy, the Youden index.
Handling of missing values: Not mentioned.
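A minimal sketch of leave-one-out cross-validation combined with the Youden index (J = sensitivity + specificity - 1); the data set, model, and 0.5 threshold are stand-ins, not details from the study.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
# One held-out prediction per subject (the jackknife scheme).
proba = cross_val_predict(LogisticRegression(max_iter=5000), X, y,
                          cv=LeaveOneOut(), method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
sens, spec = tp / (tp + fn), tn / (tn + fp)
print(f"AUROC={roc_auc_score(y, proba):.3f}  accuracy={(tp + tn) / len(y):.3f}")
print(f"sensitivity={sens:.3f}  specificity={spec:.3f}  Youden J={sens + spec - 1:.3f}")
```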

Bowman [39]
Internal validation: Re-analysis of models using clinic data during a different time period.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Bertsimas et al. [21]
Internal validation: 60/40 split.
Evaluation of model fit/performance: Positive predictive value, area under the curve, accuracy.
Handling of missing values: Imputed using an optimal-impute algorithm.

Dong et al. [25]
Internal validation: 90/10 random split; ten-fold cross-validation.
Evaluation of model fit/performance: Accuracy, precision, F1 score, true negative rate, area under the receiver operating characteristic curve.
Handling of missing values: Missing values were filled with the mean value of the same attribute from the same patient; if a patient had fewer than three records, the mean value of that attribute across all patients was imputed instead.
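A minimal sketch of the two-level mean imputation described above, with hypothetical column names; whether "fewer than 3 records" counts all records or only non-missing ones is not stated in the table, so total records are assumed here.

```python
import pandas as pd

def impute_two_level(df, patient_col, value_cols, min_records=3):
    """Fill NaNs with the patient's own mean, else the across-patient mean."""
    out = df.copy()
    counts = out.groupby(patient_col)[patient_col].transform("size")
    for col in value_cols:
        per_patient = out.groupby(patient_col)[col].transform("mean")
        global_mean = out[col].mean()
        # Patients with enough records use their own mean; others fall back.
        fill = per_patient.where(counts >= min_records, global_mean)
        out[col] = out[col].fillna(fill)
    return out
```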

Hearn et al. [40]
Internal validation: 100-iteration Monte Carlo cross-validation with a 75/25 split; five-fold cross-validation.
Evaluation of model fit/performance: Mean area under the receiver operating characteristic curve, true- and false-positive rates, true- and false-negative rates, positive and negative predictive values.
Handling of missing values: Variables with > 10% missing values were discarded from the analysis, and the remaining missing values were filled in using multiple imputation by chained random forests (maximum number of iterations = 5, number of trees = 10). The sole exception to the 10% cutoff was heart rate recovery, which had 32% missing values but was kept in the data set and imputed with the above procedure because of its wide use in prognostication from cardiopulmonary exercise tests.
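A sketch of the missing-data rule above, under stated assumptions: scikit-learn's IterativeImputer with a random-forest estimator is a single-imputation stand-in for the multiple imputation by chained random forests used in the study, the column name for the exempted variable is hypothetical, and all columns are assumed numeric.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

def impute_chained_rf(df: pd.DataFrame, keep_anyway=("heart_rate_recovery",)):
    # Drop columns with more than 10% missing, except the named exemptions.
    frac = df.isna().mean()
    cols = [c for c in df.columns if frac[c] <= 0.10 or c in keep_anyway]
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=10),  # trees = 10
        max_iter=5,                                        # iterations = 5
        random_state=0,
    )
    return pd.DataFrame(imputer.fit_transform(df[cols]),
                        columns=cols, index=df.index)
```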

Hertroijs et al. [45]
Internal validation: Five-fold cross-validation.
Evaluation of model fit/performance: Akaike Information Criterion, Bayesian Information Criterion, Lo-Mendell-Rubin likelihood ratio test.
Handling of missing values: The full information maximum likelihood method was used to estimate model parameters in the presence of missing data during model development, but patients with missing covariate values at baseline were excluded from model validation.

Hill et al. [26]
Internal validation: 2/1 split.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, positive predictive value, potential number needed-to-screen.
Handling of missing values: Imputed with last-observation-carried-forward.
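A minimal sketch of last-observation-carried-forward imputation within each patient's longitudinal record; the toy frame and column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit": [1, 2, 3, 1, 2],
    "sbp": [120.0, None, None, 135.0, None],
})
df = df.sort_values(["patient_id", "visit"])
# Carry each patient's last observed value forward in time (LOCF).
df["sbp"] = df.groupby("patient_id")["sbp"].ffill()
print(df)
```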

Hische et al. [20]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Sensitivity (models with a mean sensitivity above 90% after cross-validation were selected), specificity, positive predictive value, negative predictive value.
Handling of missing values: Not mentioned.

Isma'eel et al. [41]
Internal validation: For myocardial perfusion imaging, 120 of the 479 patients who tested positive and 120 of the remaining 4875 patients who tested negative were added randomly to the derivation cohort; the remaining 5114 patients were all added to the validation cohort. For coronary artery disease, 93 of the 278 patients who tested positive and 93 of the remaining 5076 patients who tested negative were added randomly to the derivation cohort; the remaining 5169 patients were all added to the validation cohort.
Evaluation of model fit/performance: Sensitivity and specificity, discriminatory power and 95% confidence interval, number of tests avoided, negative predictive value, positive predictive value.
Handling of missing values: Not mentioned.
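A minimal sketch of the balanced derivation-cohort sampling described above: a fixed number of positives and negatives drawn at random for derivation, with everyone else kept for validation. The function name and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_balanced(y, n_per_class=120):
    """Return (derivation, validation) row indices for labels y."""
    y = np.asarray(y)
    pos = rng.permutation(np.flatnonzero(y == 1))[:n_per_class]
    neg = rng.permutation(np.flatnonzero(y == 0))[:n_per_class]
    derivation = np.concatenate([pos, neg])
    # Everyone not drawn into the derivation cohort is used for validation.
    validation = np.setdiff1d(np.arange(len(y)), derivation)
    return derivation, validation
```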

Isma'eel et al. [42]
Internal validation: The derivation cohort was randomly chosen as follows: 30 of the 59 patients who tested positive and 30 of the remaining 427 patients who tested negative were added randomly to the derivation cohort; the remaining 426 patients (29 positive, 397 negative) were all added to the testing cohort. During the training phase, the 60 patients used for training were split 80% for pure training and 20% for validation.
Evaluation of model fit/performance: Negative and positive predictive values, discriminatory power, percentage of avoided tests, sensitivity and specificity.
Handling of missing values: Not mentioned.

Jovanovic et al. [43]
Internal validation: The sample was randomly divided into three parts: training, testing, and validation samples.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, and positive and negative predictive values.
Handling of missing values: Not mentioned.

Kang et al. [27]
Internal validation: Four-fold cross-validation; 75/25 split (training/validation).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, accuracy, precision, recall.
Handling of missing values: Not mentioned.

Karhade et al. [28]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Discrimination (c-statistic, or area under the receiver operating characteristic curve), calibration (calibration slope, calibration intercept), and overall performance (Brier score).
Handling of missing values: Multiple imputation with the missForest methodology was undertaken for variables with less than 30% missing data.
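A sketch of the discrimination/calibration/overall-performance triad above: c-statistic, calibration slope and intercept (logistic recalibration on the logit of the predictions, with the logit as an offset for the intercept), and the Brier score. The data set and model are assumptions, not the study's.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

X, y = make_classification(n_samples=2000, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p = np.clip(p, 1e-6, 1 - 1e-6)
logit = np.log(p / (1 - p))

# Calibration slope: logistic regression of the outcome on the logit.
slope = sm.GLM(y_te, sm.add_constant(logit),
               family=sm.families.Binomial()).fit().params[1]
# Calibration intercept: same model with the slope fixed at 1 via an offset.
intercept = sm.GLM(y_te, np.ones_like(logit), family=sm.families.Binomial(),
                   offset=logit).fit().params[0]
print(f"c-statistic={roc_auc_score(y_te, p):.3f}  slope={slope:.3f}  "
      f"intercept={intercept:.3f}  Brier={brier_score_loss(y_te, p):.3f}")
```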

Kebede et al. [29]
Internal validation: Ten-fold cross-validation; 90/10 split (training/testing).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve; classification accuracy (true positive rate, false positive rate, precision, recall).
Handling of missing values: Patients were excluded from the study if their information was incomplete or unreadable, or if their manual record was lost.

Khanji et al. [47]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Akaike Information Criterion, area under the receiver operating characteristic curve.
Handling of missing values: Excluded patients with missing data at the end of the study (± 6 months).

Kim et al. [30]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, sensitivity, specificity, precision, accuracy.
Handling of missing values: Not mentioned.

Kwon et al. [32]
Internal validation: Derivation set (June 2010–July 2016) and validation set (August 2016–2017), split by date.
Evaluation of model fit/performance: Receiver operating characteristic curve, area under the precision-recall curve, net reclassification index, sensitivity, positive predictive value, negative predictive value, F-measure.
Handling of missing values: Not mentioned.
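A minimal sketch of a date-based derivation/validation split of the kind used in the two Kwon et al. entries; the column name, cutoff, and toy records are assumptions.

```python
import pandas as pd

records = pd.DataFrame({
    "admission_date": pd.to_datetime(
        ["2014-05-01", "2016-03-12", "2016-09-30", "2017-02-14"]),
    "outcome": [0, 1, 0, 1],
})
cutoff = pd.Timestamp("2016-08-01")
derivation = records[records["admission_date"] < cutoff]   # earlier care
validation = records[records["admission_date"] >= cutoff]  # later care
```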

Kwon et al. [31]
Internal validation: Split into derivation and validation datasets by year: the derivation data were the patient data for 2012–2015, and the validation data were the patient data for 2016.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Excluded patients with missing values.

Lopez-de-Andres et al. [34]
Internal validation: Random 60/20/20 split, with the third group used for model selection prior to validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve, accuracy rate, error rate, sensitivity, specificity, precision, positive likelihood, negative likelihood, F1 score, false positive rate, false negative rate, false discovery rate, positive predictive value, negative predictive value, Matthews correlation, informedness, markedness.
Handling of missing values: Not mentioned.
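A minimal sketch showing how the less common members of the metric suite above (informedness, markedness, Matthews correlation) fall out of the confusion matrix; the labels are illustrative.

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sens = tp / (tp + fn)  # sensitivity / recall
spec = tn / (tn + fp)  # specificity
ppv = tp / (tp + fp)   # positive predictive value / precision
npv = tn / (tn + fn)   # negative predictive value
print("informedness:", sens + spec - 1)  # Youden's J
print("markedness:", ppv + npv - 1)
print("Matthews correlation:", matthews_corrcoef(y_true, y_pred))
```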

Mubeen et al. [19]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Accuracy, sensitivity, and specificity based on prediction of out-of-bag samples; area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Neefjes et al. [18]
Internal validation: Five-fold cross-validation.
Evaluation of model fit/performance: Area under the receiver operating characteristic curve.
Handling of missing values: Not mentioned.

Ng et al. [36]
Internal validation: 60/40 split; five-fold cross-validation.
Evaluation of model fit/performance: Area under the curve, sensitivity, specificity, accuracy.
Handling of missing values: Excluded patients with missing data.

Oviedo et al. [46]
Internal validation: 80/20 split (training/testing).
Evaluation of model fit/performance: Matthews correlation coefficient.
Handling of missing values: Records with missing data were removed from the model.

Pei et al. [17]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: True positives, true negatives, false negatives, false positives, area under the receiver operating characteristic curve.
Handling of missing values: Patients with missing data were removed from the model.

Perez-Gandia et al. [37]
Internal validation: Three subjects (each with two daily profiles) were used for training; the remaining patients were used for validation.
Evaluation of model fit/performance: Accuracy.
Handling of missing values: Spline techniques were used to impute missing data in the training set.
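A minimal sketch of spline-based imputation of gaps in a glucose time series; the values are synthetic, and pandas' spline interpolation requires SciPy.

```python
import numpy as np
import pandas as pd

glucose = pd.Series(
    [110, 114, np.nan, np.nan, 131, 138, 140, np.nan, 128, 121, np.nan, 112],
    index=pd.RangeIndex(12), dtype=float)
# Cubic-spline interpolation across the missing samples.
filled = glucose.interpolate(method="spline", order=3)
print(filled.round(1))
```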

Ramezankhani et al. [16]
Internal validation: 70/30 split (training/validation).
Evaluation of model fit/performance: Sensitivity, specificity, positive predictive value, negative predictive value, geometric mean, F-measure, area under the curve.
Handling of missing values: Single imputation was used: CART for continuous variables and a weighted k-nearest-neighbor approach for categorical variables.

Rau et al. [35]
Internal validation: 70/30 split; ten-fold cross-validation.
Evaluation of model fit/performance: Sensitivity, specificity, area under the curve.
Handling of missing values: Not mentioned.

Scheer et al. [33]
Internal validation: 70/30 split (training/testing).
Evaluation of model fit/performance: Accuracy, area under the receiver operating characteristic curve, predictor importance.
Handling of missing values: Missing values within the database were imputed using standard techniques such as mean and median imputation.

Toussi et al. [15]
Internal validation: Ten-fold cross-validation.
Evaluation of model fit/performance: Precision (proportion of true-positive records among all true-positive and false-positive records).
Handling of missing values: Imputed using a model-based approach.

Zhou et al. [44]
Internal validation: Training cohort: 142 patients from a prior phase III randomized controlled trial (Oct 2013–Mar 2016) plus 182 eligible consecutive patients (Jan 2012–Dec 2016). Validation cohort: 61 eligible consecutive patients (Jan 2017–Aug 2017).
Evaluation of model fit/performance: Concordance index (c-index), calculated as the area under the receiver operating characteristic curve; calibration plot.
Handling of missing values: Patients were excluded if clinical data were missing before or 30 days after percutaneous transhepatic biliary drainage.
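For a binary endpoint the c-index coincides with the AUROC, and a calibration plot compares predicted with observed risk in probability bins. The sketch below illustrates both on synthetic data; it is not the study's analysis.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1500, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("c-index (AUROC):", round(roc_auc_score(y_te, p), 3))

# Observed event rate per bin of predicted risk, against the identity line.
obs, pred = calibration_curve(y_te, p, n_bins=10)
plt.plot(pred, obs, marker="o", label="model")
plt.plot([0, 1], [0, 1], "--", label="ideal")
plt.xlabel("Predicted risk")
plt.ylabel("Observed risk")
plt.legend()
plt.show()
```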