Table 3 Coding results for Fuwai dataset with \(f_s = 200\)

From: Comparison of different feature extraction methods for applicable automated ICD coding

| Feature extraction & classifiers | Macro-F1 (%) | Micro-F1 (%) | Macro-AUC (%) | Micro-AUC (%) |
|---|---|---|---|---|
| BoW | | | | |
| LR_uni | 84.44 | 91.54 | 88.58 | 93.75 |
| SVM_uni | 84.69 | 91.78 | 89.27 | 94.10 |
| LR_uni_bi | 84.83 | 92.27 | 89.08 | 94.41 |
| SVM_uni_bi | 83.02 | 91.57 | 88.23 | 93.93 |
| LR_uni_bi_tri | 83.01 | 91.50 | 88.00 | 93.88 |
| SVM_uni_bi_tri | 78.21 | 89.45 | 85.20 | 92.19 |
| W2V | | | | |
| LR_word | 53.14 | 75.07 | 71.60 | 82.05 |
| SVM_word | 35.73 | 64.92 | 64.09 | 75.10 |
| LR_char | 48.03 | 70.54 | 68.77 | 79.04 |
| SVM_char | 26.30 | 58.86 | 60.37 | 71.64 |
| LR_comb | 61.73 | 80.27 | 75.75 | 85.47 |
| SVM_comb | 46.26 | 73.68 | 69.17 | 80.51 |
| RoBERTa_embeddings | | | | |
| LR_char | 64.56 | 78.59 | 77.90 | 85.51 |
| SVM_char | 51.30 | 75.24 | 71.86 | 82.45 |
| LR_comb | 72.41 | 84.20 | 82.23 | 89.07 |
| SVM_comb | 64.25 | 81.41 | 77.57 | 86.44 |
| RoBERTa_finetune | | | | |
| top_layer | 4.31 | 40.59 | 69.56 | 80.32 |
| whole | 83.39 | 93.87 | 98.65 | 99.55 |

  1. For BoW, _uni, _uni_bi and _uni_bi_tri denote unigram, unigram+bigram and unigram+bigram+trigram features, respectively. For W2V, _char and _word denote character-only and word-only embeddings, and _comb denotes the concatenation of character and word embeddings. For RoBERTa_embeddings, _char denotes the RoBERTa-Mini embeddings alone, and _comb denotes the concatenation of the RoBERTa-Mini embeddings and the W2V word embeddings. For RoBERTa_finetune, whole and top_layer denote fine-tuning the whole network and fine-tuning only the top fully connected layer, respectively.
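
To make the BoW suffixes concrete, the following is a minimal sketch of how the _uni, _uni_bi and _uni_bi_tri feature sets differ. It is not the authors' pipeline: the function name `ngram_counts`, the example tokens, and the choice of whitespace-joined word tokens are all assumptions made for illustration, and the source does not state how the Fuwai text was tokenized.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every n-gram of length 1..max_n in a token sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the tokens.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical tokenized diagnosis text (not taken from the Fuwai dataset).
tokens = ["acute", "myocardial", "infarction"]

uni = ngram_counts(tokens, 1)         # corresponds to the _uni setting
uni_bi = ngram_counts(tokens, 2)      # corresponds to the _uni_bi setting
uni_bi_tri = ngram_counts(tokens, 3)  # corresponds to the _uni_bi_tri setting

# Number of distinct features in each setting for this one example.
print(len(uni), len(uni_bi), len(uni_bi_tri))
```

Each added n-gram order enlarges the feature vocabulary, which may be one reason the trigram settings in the table do not improve on the unigram+bigram ones; the table itself does not establish the cause.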