Table 2 Summary of eligible publications

From: Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

| References | Study objective | Disease state | Data source(s) | Statistical modeling method(s) | Software | Sample size | Number of different models |
|---|---|---|---|---|---|---|---|
| Alaa et al. [23] | To develop machine-learning-based risk prediction models | Cardiovascular disease | Prospective cohort study: UK Biobank | Cox proportional hazards models, linear support vector machines, random forest, neural networks, AdaBoost, and gradient boosting machines | Python | 423,604 | 6 |
| Anderson et al. [48] | To identify patient characteristics that predict progression to prediabetes and type 2 diabetes in a US adult population | Diabetes | Retrospective database: electronic health records (Humedica) | Novel analytical platform based on a Bayesian approach | Reverse Engineering and Forward Simulation (REFS™) | 24,331 | 1 |
| Azimi et al. [38] | To select patients for surgery or non-surgical options | Neurology | Retrospective database | Logistic regression, artificial neural network | SPSS for Windows (Version 17.0), STATISTICA 10.0 Neural Networks | 346 | 2 |
| Bannister et al. [22] | To determine the utility of genetic programming for the automatic development of clinical prediction models | Cardiovascular disease | Prospective observational cohort | Cox regression models, tree-based genetic programming | R package version 3.0.1 and 3.1.2 | 3873 | 2 |
| Baxter et al. [24] | To predict the need for surgical intervention in patients with primary open-angle glaucoma | Neurology | Retrospective database: electronic health records (EpicCare) | Logistic regression, random forest, artificial neural network | Random Forest R package; nnet package in R | 385 | 3 |
| Bertsimas et al. [21] | To predict patients at high risk of mortality before the start of treatment regimes | Oncology | Retrospective database: electronic health records and Social Security Death Index | Logistic regression, decision tree analysis (gradient boosted, optimal classification, and Classification and Regression Tree [CART]) | Not specified | 23,983 | 2 |
| Bowman et al. [39] | To develop and validate a comprehensive, multivariate prognostic model for carpal tunnel surgery | Neurology | Retrospective database: clinic data | Logistic regression, artificial neural network | Stata v14, MATLAB v8.3.0.532 | 200 | 2 |
| Dong et al. [25] | To present and validate a novel surgical predictive model to facilitate therapeutic decision-making | Inflammatory bowel disease | Retrospective database: electronic health records | Random forest, logistic regression, decision tree, support vector machine, artificial neural network | Python 3.6 | 239 | 5 |
| Hearn et al. [40] | To assess whether the prognostication of heart failure patients using cardiopulmonary exercise test data could be improved by considering the entirety of the data generated during a cardiopulmonary exercise test, as opposed to using summary indicators alone | Cardiovascular disease | Retrospective database: electronic health records and exercise test data | Logistic regression, least absolute shrinkage and selection operator (LASSO) model, generalized additive model, feedforward neural network | R Project for Statistical Computing v3.4.2 and Python Programming Language v3.6.2 | 1156 | 4 |
| Hertroijs et al. [45] | To identify subgroups of people with newly diagnosed type 2 diabetes with distinct glycaemic trajectories; to predict trajectory membership using patient characteristics | Diabetes | Retrospective database: electronic health records | Latent growth mixture modeling, K-nearest neighbor/Parzen, Fisher, linear/quadratic discriminant classifier, support vector machine, radial basis function, logistic regression | Mplus Version 7.1 | 14,305 | 7 |
| Hill et al. [26] | To develop a clinically applicable risk prediction model to identify associations between baseline and time-varying factors and the identification of atrial fibrillation | Cardiovascular disease | Retrospective database: electronic health records (Clinical Practice Research Datalink, CPRD) | Logistic least absolute shrinkage and selection operator (LASSO), random forests, support vector machines, neural networks | R v3.3.1 | 2,994,837 (baseline model); 162,672 (time-varying model) | 4 |
| Hische et al. [20] | To create a simple and reliable tool to identify individuals with impaired glucose metabolism | Diabetes | Cross-sectional study | Decision tree analysis | Quinlan C4.5 | 1737 | 1 |
| Isma'eel et al. [41] | To investigate the use of artificial neural networks to improve risk stratification and prediction of myocardial perfusion imaging and angiographic results | Cardiovascular disease | Retrospective medical records | Artificial neural network | Not specified | 5354 | 1 |
| Isma'eel et al. [42] | To compare artificial neural network-based prediction models to other risk models that are being used in clinical practice | Cardiovascular disease | Prospective cohort | Artificial neural network | Not specified | 486 | 1 |
| Jovanovic et al. [43] | To determine whether an artificial neural network model could be constructed to accurately predict the need for therapeutic ERCP in patients with a firm clinical suspicion of having common bile duct stones and to compare it with our previously reported predictive model | Choledocholithiasis | Prospective cohort | Artificial neural network | SPSS v20.0 | 291 | 1 |
| Kang et al. [27] | To investigate the feasibility of developing a machine-learning model to predict postinduction hypotension | Cardiovascular disease | Retrospective database: electronic health records | Naïve Bayes, logistic regression, random forest, and artificial neural network | caret R package | 222 | 4 |
| Karhade et al. [28] | To develop algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation | Lumbar disc herniation | Retrospective chart review | Random forest, stochastic gradient boosting, neural network, support vector machine, elastic-net penalized logistic regression | Anaconda Distribution, R version 3.5.0, RStudio version 1.1.453, and Python version 3.6 | 5413 | 5 |
| Kebede et al. [29] | To predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART | HIV/AIDS | Retrospective database and chart review | J48 decision tree/random forest, neural network | WEKA 3.8 | 3104 | 2 |
| Khanji et al. [47] | To identify an effective method to build prediction models and assess predictive validity of pre-defined indicators | Cardiovascular disease | Observational trial, cluster randomization | Logistic regression, LASSO regression, hybrid approach (combination of both approaches) | SAS version 9.1, SPSS version 24, and R version 3.3.2 | 759 | 2 |
| Kim et al. [30] | To develop a prediction tool using machine learning for high- or low-risk Oncotype DX criteria | Oncology | Retrospective chart review | Two-class Decision Forest, Two-class Decision Jungle, Two-class Bayes Point Machine, Two-class Support Vector Machine, Two-class Neural Network | SAS 9.4; Azure Machine Learning Platform | 284 | 4 |
| Kwon et al. [32] | To predict cardiac arrest using deep learning | Cardiovascular disease | Retrospective database: electronic health records | Random forest, logistic regression, recurrent neural network | Not specified | 52,131 | 3 |
| Kwon et al. [31] | To predict prognosis of out-of-hospital cardiac arrest using deep learning | Cardiovascular disease | Retrospective database: registry | Logistic regression, support vector machine, random forest | Not specified | 36,190 | 3 |
| Lopez-de-Andres et al. [34] | To estimate predictive factors of in-hospital mortality in patients with type 2 diabetes after major lower extremity amputation | Diabetes | Retrospective database: hospital discharge database | Artificial neural network | Neural Designer; Stata MP version 10.1 | 40,857 lower extremity amputation events | 1 |
| Mubeen et al. [19] | To assess risk of developing Alzheimer’s disease in mildly cognitively impaired subjects; to classify subjects in two groups: those who would remain stable and those who would progress to develop Alzheimer’s disease | Alzheimer’s disease | Retrospective database (Alzheimer’s Disease Neuroimaging Project) | Random forest algorithm | Random Forest R package | 247 | 1 |
| Neefjes et al. [18] | To develop a prediction model to identify patients with cancer at high risk for delirium | Oncology | Retrospective database: hospital inpatient data | Decision tree analysis | R program Rpart version 3.1; Statistical Package for the Social Sciences (SPSS) v20.0 | 574 | 1 |
| Ng et al. [36] | To create a clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy | Oncology | Retrospective database: electronic health records and case notes | Naïve Bayes, neural network, and support vector machine | SIMCA-P+ version 12.0.1; SPSS version 19.0; RapidMiner version 5.0.010 | 325 | 3 |
| Oviedo et al. [46] | To focus on patient-specific prediction of hypoglycemic events | Diabetes | Retrospective database: hospital clinic data | Support vector classifier | Python | 10 | 1 |
| Pei et al. [17] | To identify individuals with potential diabetes | Diabetes | Retrospective medical records review | Decision tree analysis | WEKA 3.8.1 and SPSS version 20.0 | 10,436 | 1 |
| Perez-Gandia et al. [37] | To predict future glucose concentration levels from continuous glucose monitoring data | Diabetes | Retrospective database: device dataset | Artificial neural network | Not specified | 15 | 1 |
| Ramezankhani et al. [16] | To gain more information on interactions between factors contributing to the incidence of type 2 diabetes | Diabetes | Prospective cohort | Decision tree analysis (CART, Quick Unbiased Efficient Statistical Tree [QUEST], and commercial version [C5.0]) | IBM SPSS Modeler 14.2 | 6647 | 1 |
| Rau et al. [35] | To predict the development of liver cancer within 6 years of diagnosis with type 2 diabetes | Diabetes and Oncology | Retrospective database: claims linked to registry data | Logistic regression, artificial neural network, support vector machine, and decision tree analysis | STATISTICA, version 10 | 2060 | 4 |
| Scheer et al. [33] | To develop a model based on baseline demographic, radiographic, and surgical factors that can predict if patients will sustain an intraoperative or perioperative major complication | Spinal deformity | Retrospective database | Decision tree analysis | SPSS version 22; SPSS Modeler version 16 | 557 | 1 |
| Toussi et al. [15] | To identify knowledge gaps in guidelines and to explore physicians' therapeutic decisions using data mining techniques to fill these knowledge gaps | Diabetes | Retrospective database: electronic health records | Decision tree analysis | Quinlan’s C5.0 decision-tree learning algorithm; SPSS Clementine software version 10.1 | 463 | 1 |
| Zhou et al. [44] | To assess pre-procedural independent risk factors and to establish a “Risk Prediction for Early Biliary Infection” nomogram for patients with malignant biliary obstruction who underwent percutaneous transhepatic biliary stent | Oncology | Retrospective medical record and trial data | Logistic regression, artificial neural network | SPSS version 22; R package (version 3.4.3) | 243 | 2 |