
Table 5 Number of selected features and influence of each feature selection method on evaluation metrics

From: Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods

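Machine learning methods: Support vector machine-RBF (SVM-RBF), K-nearest neighbor (KNN) and Random forest (RF).

| FS method | Number of selected features | SVM-RBF P | SVM-RBF R | SVM-RBF F1 | SVM-RBF Acc | SVM-RBF AUROC | SVM-RBF AUPRC | KNN P | KNN R | KNN F1 | KNN Acc | KNN AUROC | KNN AUPRC | RF P | RF R | RF F1 | RF Acc | RF AUROC | RF AUPRC |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Base selectors | | | | | | | | | | | | | | | | | | | |
| Baseline | 361 | 0.62 | 0.30 | 0.41 | 0.80 | 0.69 | 0.40 | 0.69 | 0.49 | 0.57 | 0.86 | 0.86 | 0.54 | 0.77 | 0.65 | 0.71 | 0.90 | 0.93 | 0.96 |
| Chi2 | 106 | 0.83 | 0.41 | 0.55 | 0.88 | 0.81 | 0.55 | 0.72 | 0.59 | 0.65 | 0.89 | 0.89 | 0.65 | 0.87 | 0.70 | 0.77 | 0.93 | 0.95 | 0.77 |
| MI | 172 | 0.75 | 0.43 | 0.54 | 0.87 | 0.76 | 0.54 | 0.67 | 0.50 | 0.57 | 0.87 | 0.88 | 0.57 | 0.77 | 0.64 | 0.70 | 0.90 | 0.94 | 0.70 |
| FIS | 107 | 0.78 | 0.44 | 0.56 | 0.88 | 0.81 | 0.56 | 0.69 | 0.71 | 0.76 | 0.88 | 0.89 | 0.65 | 0.82 | 0.71 | 0.76 | 0.92 | 0.95 | 0.76 |
| FGS | 66 | 0.74 | 0.36 | 0.48 | 0.87 | 0.79 | 0.48 | 0.68 | 0.52 | 0.59 | 0.87 | 0.89 | 0.59 | 0.79 | 0.71 | 0.75 | 0.92 | 0.95 | 0.48 |
| RFE-RF | 131 | 0.79 | 0.64 | 0.71 | 0.91 | 0.90 | 0.66 | 0.79 | 0.74 | 0.76 | 0.92 | 0.95 | 0.74 | 0.83 | 0.85 | 0.84 | 0.94 | 0.97 | 0.83 |
| Ensemble selectors | | | | | | | | | | | | | | | | | | | |
| IFFS | 58 | 0.73 | 0.33 | 0.45 | 0.86 | 0.74 | 0.45 | 0.64 | 0.53 | 0.58 | 0.87 | 0.82 | 0.57 | 0.74 | 0.61 | 0.67 | 0.88 | 0.91 | 0.66 |
| UFFS | 223 | 0.81 | 0.56 | 0.66 | 0.87 | 0.90 | 0.66 | 0.75 | 0.73 | 0.74 | 0.94 | 0.91 | 0.74 | 0.87 | 0.81 | 0.84 | 0.94 | 0.98 | 0.83 |
| IFS | 19 | 0.82 | 0.17 | 0.27 | 0.85 | 0.62 | 0.27 | 0.49 | 0.47 | 0.48 | 0.82 | 0.79 | 0.48 | 0.75 | 0.36 | 0.49 | 0.87 | 0.89 | 0.48 |
| UFS | 272 | 0.81 | 0.65 | 0.72 | 0.91 | 0.91 | 0.72 | 0.80 | 0.75 | 0.77 | 0.92 | 0.95 | 0.77 | 0.89 | 0.91 | 0.90 | 0.96 | 0.98 | 0.89 |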

  1. Bold values indicate better predictive performance and a better subset of features
  2. Precision (P), Recall (R), F1-score (F1), Accuracy (Acc), Area Under the Receiver Operating Characteristic Curve (AUROC), Area Under the Precision-Recall Curve (AUPRC), Feature selection (FS), Chi-squared (Chi2), Mutual information (MI), Fisher score (FIS), Forward Greedy Search (FGS), Recursive Feature Elimination using a Random Forest (RFE-RF), Union Filter FS (UFFS), Intersection Filter FS (IFFS), Union FS (UFS), Intersection FS (IFS)
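The ensemble selectors listed above can be read as set operations over the feature subsets returned by the base selectors. The sketch below assumes exactly that: a union keeps every feature chosen by at least one selector and an intersection keeps only features chosen by all of them. The exact aggregation rule is not stated in this table, and the feature names are hypothetical placeholders. The same sketch recomputes F1 as the harmonic mean of precision and recall for a few random forest rows; because P and R are reported to two decimals, agreement is only expected to within about 0.01.

```python
"""Illustrative sketch: union/intersection feature-selection ensembles and an F1 check.

Assumption: UFFS/IFFS are plain unions/intersections of the filter selectors' subsets.
Feature names are hypothetical placeholders, not features from the study.
"""

# Hypothetical subsets returned by the three filter selectors (Chi2, MI, FIS).
chi2_subset = {"feature_A", "feature_B", "feature_C", "feature_D"}
mi_subset = {"feature_A", "feature_B", "feature_E", "feature_F"}
fis_subset = {"feature_A", "feature_B", "feature_C", "feature_G"}
filter_subsets = [chi2_subset, mi_subset, fis_subset]


def union_ensemble(subsets):
    """Keep every feature chosen by at least one base selector."""
    return set().union(*subsets)


def intersection_ensemble(subsets):
    """Keep only the features chosen by every base selector."""
    return set.intersection(*subsets)


def f1_from_pr(precision, recall):
    """F1-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


uffs_like = union_ensemble(filter_subsets)
iffs_like = intersection_ensemble(filter_subsets)
print("union:", sorted(uffs_like))
print("intersection:", sorted(iffs_like))

# (method, P, R, reported F1) taken from the random forest columns of the table.
rf_rows = [("Chi2", 0.87, 0.70, 0.77), ("RFE-RF", 0.83, 0.85, 0.84), ("UFS", 0.89, 0.91, 0.90)]
for method, p, r, f1_reported in rf_rows:
    f1_recomputed = f1_from_pr(p, r)
    # P and R are rounded to two decimals, so allow a 0.01 tolerance.
    status = "consistent" if abs(f1_recomputed - f1_reported) <= 0.01 else "check"
    print(f"{method}: recomputed {f1_recomputed:.3f}, reported {f1_reported:.2f} ({status})")
```

Under this reading, a union can only enlarge the feature set and an intersection can only shrink it, which fits the reported counts: UFFS keeps 223 features, more than any single filter (at most 172, for MI), while IFFS keeps 58, fewer than any single filter (at least 106, for Chi2).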