
Table 2 Summary of eligible publications

From: Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

| References | Study objective | Disease state | Data source(s) | Statistical modeling method(s) | Software | Sample size | Number of different models |
|---|---|---|---|---|---|---|---|
| Alaa et al. [23] | To develop machine-learning-based risk prediction models | Cardiovascular disease | Prospective cohort study: UK Biobank | Cox proportional hazards models, linear support vector machines, random forest, neural networks, AdaBoost, and gradient boosting machines | Python | 423,604 | 6 |
| Anderson et al. [48] | To identify patient characteristics that predict progression to prediabetes and type 2 diabetes in a US adult population | Diabetes | Retrospective database: electronic health records (Humedica) | Novel analytical platform based on a Bayesian approach | Reverse Engineering and Forward Simulation (REFS™) | 24,331 | 1 |
| Azimi et al. [38] | To select patients for surgical or non-surgical treatment | Neurology | Retrospective database | Logistic regression, artificial neural network | SPSS for Windows (version 17.0), STATISTICA 10.0 Neural Networks | 346 | 2 |
| Bannister et al. [22] | To determine the utility of genetic programming for the automatic development of clinical prediction models | Cardiovascular disease | Prospective observational cohort | Cox regression models, tree-based genetic programming | R package versions 3.0.1 and 3.1.2 | 3,873 | 2 |
| Baxter et al. [24] | To predict the need for surgical intervention in patients with primary open-angle glaucoma | Neurology | Retrospective database: electronic health records (EpicCare) | Logistic regression, random forest, artificial neural network | Random Forest R package; nnet package in R | 385 | 3 |
| Bertsimas et al. [21] | To identify patients at high risk of mortality before the start of treatment regimens | Oncology | Retrospective database: electronic health records and Social Security Death Index | Logistic regression, decision tree analysis (gradient-boosted trees, optimal classification trees, and classification and regression trees [CART]) | Not specified | 23,983 | 2 |
| Bowman et al. [39] | To develop and validate a comprehensive, multivariate prognostic model for carpal tunnel surgery | Neurology | Retrospective database: clinic data | Logistic regression, artificial neural network | Stata v14, MATLAB v8.3.0.532 | 200 | 2 |
| Dong et al. [25] | To present and validate a novel surgical predictive model to facilitate therapeutic decision-making | Inflammatory bowel disease | Retrospective database: electronic health records | Random forest, logistic regression, decision tree, support vector machine, artificial neural network | Python 3.6 | 239 | 5 |
| Hearn et al. [40] | To assess whether prognostication of heart failure patients improves when the entirety of the cardiopulmonary exercise test data is used, as opposed to summary indicators alone | Cardiovascular disease | Retrospective database: electronic health records and exercise test data | Logistic regression, least absolute shrinkage and selection operator (LASSO) model, generalized additive model, feedforward neural network | R Project for Statistical Computing v3.4.2 and Python Programming Language v3.6.2 | 1,156 | 4 |
| Hertroijs et al. [45] | To identify subgroups of people with newly diagnosed type 2 diabetes who have distinct glycaemic trajectories, and to predict trajectory membership from patient characteristics | Diabetes | Retrospective database: electronic health records | Latent growth mixture modeling, k-nearest neighbor/Parzen, Fisher, linear/quadratic discriminant classifiers, support vector machine, radial basis function, logistic regression | Mplus version 7.1 | 14,305 | 7 |
| Hill et al. [26] | To develop a clinically applicable risk prediction model that identifies associations of baseline and time-varying factors with the identification of atrial fibrillation | Cardiovascular disease | Retrospective database: electronic health records (Clinical Practice Research Datalink, CPRD) | Logistic least absolute shrinkage and selection operator (LASSO), random forests, support vector machines, neural networks | R v3.3.1 | 2,994,837 (baseline model); 162,672 (time-varying model) | 4 |
| Hische et al. [20] | To create a simple and reliable tool to identify individuals with impaired glucose metabolism | Diabetes | Cross-sectional study | Decision tree analysis | Quinlan's C4.5 | 1,737 | 1 |
| Isma'eel et al. [41] | To investigate the use of artificial neural networks to improve risk stratification and the prediction of myocardial perfusion imaging and angiographic results | Cardiovascular disease | Retrospective medical records | Artificial neural network | Not specified | 5,354 | 1 |
| Isma'eel et al. [42] | To compare artificial neural network-based prediction models with other risk models used in clinical practice | Cardiovascular disease | Prospective cohort | Artificial neural network | Not specified | 486 | 1 |
| Jovanovic et al. [43] | To determine whether an artificial neural network model could accurately predict the need for therapeutic endoscopic retrograde cholangiopancreatography (ERCP) in patients with a firm clinical suspicion of common bile duct stones, and to compare it with the authors' previously reported predictive model | Choledocholithiasis | Prospective cohort | Artificial neural network | SPSS v20.0 | 291 | 1 |
| Kang et al. [27] | To investigate the feasibility of developing a machine-learning model to predict postinduction hypotension | Cardiovascular disease | Retrospective database: electronic health records | Naïve Bayes, logistic regression, random forest, and artificial neural network | caret R package | 222 | 4 |
| Karhade et al. [28] | To develop algorithms for predicting prolonged opioid prescription after surgery for lumbar disc herniation | Lumbar disc herniation | Retrospective chart review | Random forest, stochastic gradient boosting, neural network, support vector machine, elastic-net penalized logistic regression | Anaconda Distribution, R version 3.5.0, RStudio version 1.1.453, and Python version 3.6 | 5,413 | 5 |
| Kebede et al. [29] | To predict CD4 count changes and to identify predictors of CD4 count changes among patients on antiretroviral therapy (ART) | HIV/AIDS | Retrospective database and chart review | J48 decision tree/random forest, neural network | WEKA 3.8 | 3,104 | 2 |
| Khanji et al. [47] | To identify an effective method for building prediction models and to assess the predictive validity of pre-defined indicators | Cardiovascular disease | Observational trial, cluster randomization | Logistic regression, LASSO regression, hybrid approach (combination of both approaches) | SAS version 9.1, SPSS version 24, and R version 3.3.2 | 759 | 2 |
| Kim et al. [30] | To develop a machine-learning prediction tool for high- or low-risk Oncotype DX criteria | Oncology | Retrospective chart review | Two-class Decision Forest, Two-class Decision Jungle, Two-class Bayes Point Machine, Two-class Support Vector Machine, Two-class Neural Network | SAS 9.4; Azure Machine Learning Platform | 284 | 4 |
| Kwon et al. [32] | To predict cardiac arrest using deep learning | Cardiovascular disease | Retrospective database: electronic health records | Random forest, logistic regression, recurrent neural network | Not specified | 52,131 | 3 |
| Kwon et al. [31] | To predict the prognosis of out-of-hospital cardiac arrest using deep learning | Cardiovascular disease | Retrospective database: registry | Logistic regression, support vector machine, random forest | Not specified | 36,190 | 3 |
| Lopez-de-Andres et al. [34] | To estimate predictive factors of in-hospital mortality in patients with type 2 diabetes after major lower extremity amputation | Diabetes | Retrospective database: hospital discharge database | Artificial neural network | Neural Designer; Stata MP version 10.1 | 40,857 lower extremity amputation events | 1 |
| Mubeen et al. [19] | To assess the risk of developing Alzheimer's disease in subjects with mild cognitive impairment by classifying them into two groups: those who would remain stable and those who would progress to Alzheimer's disease | Alzheimer's disease | Retrospective database: Alzheimer's Disease Neuroimaging Initiative | Random forest algorithm | Random Forest R package | 247 | 1 |
| Neefjes et al. [18] | To develop a prediction model to identify patients with cancer at high risk for delirium | Oncology | Retrospective database: hospital inpatient data | Decision tree analysis | R package rpart version 3.1; Statistical Package for the Social Sciences (SPSS) v20.0 | 574 | 1 |
| Ng et al. [36] | To create a clinical decision support tool to predict survival beyond 120 days after palliative chemotherapy in cancer patients | Oncology | Retrospective database: electronic health records and case notes | Naïve Bayes, neural network, and support vector machine | SIMCA-P+ version 12.0.1; SPSS version 19.0; RapidMiner version 5.0.010 | 325 | 3 |
| Oviedo et al. [46] | To develop patient-specific prediction of hypoglycemic events | Diabetes | Retrospective database: hospital clinic data | Support vector classifier | Python | 10 | 1 |
| Pei et al. [17] | To identify individuals with potential diabetes | Diabetes | Retrospective medical record review | Decision tree analysis | WEKA 3.8.1 and SPSS version 20.0 | 10,436 | 1 |
| Perez-Gandia et al. [37] | To predict future glucose concentrations from continuous glucose monitoring data | Diabetes | Retrospective database: device dataset | Artificial neural network | Not specified | 15 | 1 |
| Ramezankhani et al. [16] | To gain more information on interactions between factors contributing to the incidence of type 2 diabetes | Diabetes | Prospective cohort | Decision tree analysis (CART, Quick Unbiased Efficient Statistical Tree [QUEST], and commercial version [C5.0]) | IBM SPSS Modeler 14.2 | 6,647 | 1 |
| Rau et al. [35] | To predict the development of liver cancer within 6 years of diagnosis with type 2 diabetes | Diabetes and oncology | Retrospective database: claims linked to registry data | Logistic regression, artificial neural network, support vector machine, and decision tree analysis | STATISTICA, version 10 | 2,060 | 4 |
| Scheer et al. [33] | To develop a model based on baseline demographic, radiographic, and surgical factors that predicts whether patients will sustain a major intraoperative or perioperative complication | Spinal deformity | Retrospective database | Decision tree analysis | SPSS version 22; SPSS Modeler version 16 | 557 | 1 |
| Toussi et al. [15] | To identify knowledge gaps in guidelines and to explore physicians' therapeutic decisions using data mining techniques to fill these gaps | Diabetes | Retrospective database: electronic health records | Decision tree analysis | Quinlan's C5.0 decision-tree learning algorithm; SPSS Clementine software version 10.1 | 463 | 1 |
| Zhou et al. [44] | To assess pre-procedural independent risk factors and to establish a “Risk Prediction for Early Biliary Infection” nomogram for patients with malignant biliary obstruction who underwent percutaneous transhepatic biliary stent placement | Oncology | Retrospective medical record and trial data | Logistic regression, artificial neural network | SPSS version 22; R package (version 3.4.3) | 243 | 2 |