References | Study objective | Disease state | Data source(s) | Statistical modeling method(s) | Software | Sample size | Number of different models |
---|---|---|---|---|---|---|---|
Alaa et al. [23] | To develop machine-learning-based risk prediction models | Cardiovascular disease | Prospective cohort study—UK Biobank | Cox proportional hazards models, linear support vector machines, random forest, neural networks, AdaBoost, and gradient boosting machines | Python | 423,604 | 6 |
Anderson et al. [48] | To identify patient characteristics that predict progression to prediabetes and type 2 diabetes in a US adult population | Diabetes | Retrospective database—electronic health records (Humedica) | Novel analytical platform based on a Bayesian approach | Reverse Engineering and Forward Simulation (REFS™) | 24,331 | 1 |
Azimi et al. [38] | To select patients for surgery or non-surgical options | Neurology | Retrospective database | Logistic regression, artificial neural network | SPSS for Windows (Version 17.0), STATISTICA 10.0 Neural Networks | 346 | 2 |
Bannister et al. [22] | To determine the utility of genetic programming for the automatic development of clinical prediction models | Cardiovascular disease | Prospective observational cohort | Cox regression models, Tree-based genetic programming | R Package version 3.0.1 and 3.1.2 | 3873 | 2 |
Baxter et al. [24] | To predict the need for surgical intervention in patients with primary open-angle glaucoma | Neurology | Retrospective database—electronic health records (EpicCare) | Logistic regression, random forest, artificial neural network | Random Forest R package; nnet package in R | 385 | 3 |
Bertsimas et al. [21] | To predict patients at high risk of mortality before the start of treatment regimes | Oncology | Retrospective database—electronic health records and social security death index | Logistic regression, Decision tree analysis (Gradient boosted, optimal classification, and Classification and Regression Tree [CART]) | Not specified | 23,983 | 2 |
Bowman et al. [39] | To develop and validate a comprehensive, multivariate prognostic model for carpal tunnel surgery | Neurology | Retrospective database—clinic data | Logistic regression, artificial neural network | Stata v 14, MATLAB v. 8.3.0.532 | 200 | 2 |
Dong et al. [25] | To present and validate a novel surgical predictive model to facilitate therapeutic decision-making | Inflammatory bowel disease | Retrospective database – electronic health records | Random forest, logistic regression, decision tree, support vector machine, artificial neural network | Python 3.6 | 239 | 5 |
Hearn et al. [40] | To assess whether the prognostication of heart failure patients using cardiopulmonary exercise test data could be improved by considering the entirety of the data generated during a cardiopulmonary exercise test, as opposed to using summary indicators alone | Cardiovascular disease | Retrospective database – electronic health records and exercise test data | Logistic regression, least absolute shrinkage and selection operator (LASSO) model, generalized additive model, feedforward neural network | R Project for Statistical Computing v3.4.2, and Python Programming Language v3.6.2 | 1156 | 4 |
Hertroijs et al. [45] | To identify subgroups of people with newly diagnosed type 2 diabetes with distinct glycaemic trajectories; to predict trajectory membership using patient characteristics | Diabetes | Retrospective database – electronic health records | Latent growth mixture modeling, K-nearest neighbor/Parzen, Fisher, linear/quadratic discriminant classifier, support vector machine, radial basis function, logistic regression | Mplus Version 7.1 | 14,305 | 7 |
Hill et al. [26] | To develop a clinically applicable risk prediction model to identify associations between baseline and time-varying factors and the identification of atrial fibrillation | Cardiovascular disease | Retrospective database—electronic health records (Clinical Practice Research Datalink, CPRD) | Logistic least absolute shrinkage and selection operator (LASSO), random forests, support vector machines, neural networks | R v3.3.1 | 2,994,837 (baseline model); 162,672 (time-varying model) | 4 |
Hische et al. [20] | To create a simple and reliable tool to identify individuals with impaired glucose metabolism | Diabetes | Cross-sectional study | Decision tree analysis | Quinlan C4.5 | 1737 | 1 |
Isma'eel et al. [41] | To investigate the use of artificial neural networks to improve risk stratification and prediction of myocardial perfusion imaging and angiographic results | Cardiovascular disease | Retrospective medical records | Artificial neural network | Not specified | 5354 | 1 |
Isma'eel et al. [42] | To compare artificial neural network-based prediction models to other risk models that are being used in clinical practice | Cardiovascular disease | Prospective cohort | Artificial neural network | Not specified | 486 | 1 |
Jovanovic et al. [43] | To determine whether an artificial neural network model could be constructed to accurately predict the need for therapeutic ERCP in patients with a firm clinical suspicion of having common bile duct stones and to compare it with our previously reported predictive model | Choledocholithiasis | Prospective cohort | Artificial neural network | SPSS v20.0 | 291 | 1 |
Kang et al. [27] | To investigate the feasibility of developing a machine-learning model to predict postinduction hypotension | Cardiovascular disease | Retrospective database—electronic health records | Naïve Bayes, logistic regression, random forest, and artificial neural network | caret R package | 222 | 4 |
Karhade et al. [28] | To develop algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation | Lumbar disc herniation | Retrospective chart review | Random forest, stochastic gradient boosting, neural network, support vector machine, elastic-net penalized logistic regression | Anaconda Distribution, R version 3.5.0, RStudio version 1.1.453, and Python version 3.6 | 5413 | 5 |
Kebede et al. [29] | To predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART | HIV/AIDS | Retrospective database and chart review | J48 decision tree/random forest, neural network | WEKA 3.8 | 3104 | 2 |
Khanji et al. [47] | To identify an effective method to build prediction models and assess predictive validity of pre-defined indicators | Cardiovascular disease | Observational trial, cluster randomization | Logistic regression, LASSO regression, Hybrid approach (combination of both approaches) | SAS version 9.1, SPSS version 24, and R version 3.3.2 | 759 | 2 |
Kim et al. [30] | To develop a prediction tool using machine learning for high- or low-risk Oncotype DX criteria | Oncology | Retrospective chart review | Two-class Decision Forest, Two-class Decision Jungle, Two-class Bayes Point Machine, Two-class Support Vector Machine, Two-class Neural Network | SAS 9.4; Azure Machine Learning Platform | 284 | 4 |
Kwon et al. [32] | To predict cardiac arrest using deep learning | Cardiovascular disease | Retrospective database—electronic health records | Random forest, logistic regression, recurrent neural network | Not specified | 52,131 | 3 |
Kwon et al. [31] | To predict prognosis of out-of-hospital cardiac arrest using deep learning | Cardiovascular disease | Retrospective database – registry | Logistic regression, support vector machine, random forest | Not specified | 36,190 | 3 |
Lopez-de-Andres et al. [34] | To estimate predictive factors of in-hospital mortality in patients with type 2 diabetes after major lower extremity amputation | Diabetes | Retrospective database—hospital discharge database | Artificial neural network | Neural Designer; Stata MP version 10.1 | 40,857 lower extremity amputation events | 1 |
Mubeen et al. [19] | To assess risk of developing Alzheimer’s disease in mildly cognitively impaired subjects; to classify subjects in two groups: those who would remain stable and those who would progress to develop Alzheimer’s disease | Alzheimer’s disease | Retrospective database (Alzheimer’s Disease Neuroimaging Initiative) | Random forest algorithm | Random Forest R package | 247 | 1 |
Neefjes et al. [18] | To develop a prediction model to identify patients with cancer at high risk for delirium | Oncology | Retrospective database – hospital inpatient data | Decision tree analysis | R program Rpart version 3.1; Statistical Package for the Social Sciences (SPSS) v20.0 | 574 | 1 |
Ng et al. [36] | To create a clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy | Oncology | Retrospective database—electronic health records and case notes | Naïve Bayes, neural network, and support vector machine | SIMCA-P+ version 12.0.1; SPSS version 19.0; RapidMiner version 5.0.010 | 325 | 3 |
Oviedo et al. [46] | To focus on patient-specific prediction of hypoglycemic events | Diabetes | Retrospective database – hospital clinic data | Support vector classifier | Python | 10 | 1 |
Pei et al. [17] | To identify individuals with potential diabetes | Diabetes | Retrospective medical records review | Decision tree analysis | WEKA 3.8.1 and SPSS version 20.0 | 10,436 | 1 |
Perez-Gandia et al. [37] | To predict future glucose concentration levels from continuous glucose monitoring data | Diabetes | Retrospective database—device dataset | Artificial neural network | Not specified | 15 | 1 |
Ramezankhani et al. [16] | To gain more information on interactions between factors contributing to the incidence of type 2 diabetes | Diabetes | Prospective cohort | Decision tree analysis (CART, Quick Unbiased Efficient Statistical Tree [QUEST], and commercial version [C5.0]) | IBM SPSS Modeler 14.2 | 6647 | 1 |
Rau et al. [35] | To predict the development of liver cancer within 6 years of diagnosis with type 2 diabetes | Diabetes and Oncology | Retrospective database—claims linked to registry data | Logistic regression, artificial neural network, support vector machine, and decision tree analysis | STATISTICA, version 10 | 2060 | 4 |
Scheer et al. [33] | To develop a model based on baseline demographic, radiographic, and surgical factors that can predict if patients will sustain an intraoperative or perioperative major complication | Spinal deformity | Retrospective database | Decision tree analysis | SPSS version 22; SPSS Modeler version 16 | 557 | 1 |
Toussi et al. [15] | To identify knowledge gaps in guidelines and to explore physicians' therapeutic decisions using data mining techniques to fill these knowledge gaps | Diabetes | Retrospective database—electronic health records | Decision tree analysis | Quinlan’s C5.0 decision-tree learning algorithm; SPSS Clementine software version 10.1 | 463 | 1 |
Zhou et al. [44] | To assess pre-procedural independent risk factors and to establish a “Risk Prediction for Early Biliary Infection” nomogram for patients with malignant biliary obstruction who underwent percutaneous transhepatic biliary stent | Oncology | Retrospective medical record and trial data | Logistic regression, artificial neural network | SPSS version 22; R package (version 3.4.3) | 243 | 2 |