
Table 3 Coding results for Fuwai dataset with \(f_s = 200\)

From: Comparison of different feature extraction methods for applicable automated ICD coding

| Feature extraction & classifiers | Macro-F1 (%) | Micro-F1 (%) | Macro-AUC (%) | Micro-AUC (%) |
|---|---|---|---|---|
| **BoW** | | | | |
| LR_uni | 84.44 | 91.54 | 88.58 | 93.75 |
| SVM_uni | 84.69 | 91.78 | 89.27 | 94.10 |
| LR_uni_bi | 84.83 | 92.27 | 89.08 | 94.41 |
| SVM_uni_bi | 83.02 | 91.57 | 88.23 | 93.93 |
| LR_uni_bi_tri | 83.01 | 91.50 | 88.00 | 93.88 |
| SVM_uni_bi_tri | 78.21 | 89.45 | 85.20 | 92.19 |
| **W2V** | | | | |
| LR_word | 53.14 | 75.07 | 71.60 | 82.05 |
| SVM_word | 35.73 | 64.92 | 64.09 | 75.10 |
| LR_char | 48.03 | 70.54 | 68.77 | 79.04 |
| SVM_char | 26.30 | 58.86 | 60.37 | 71.64 |
| LR_comb | 61.73 | 80.27 | 75.75 | 85.47 |
| SVM_comb | 46.26 | 73.68 | 69.17 | 80.51 |
| **RoBERTa_embeddings** | | | | |
| LR_char | 64.56 | 78.59 | 77.90 | 85.51 |
| SVM_char | 51.30 | 75.24 | 71.86 | 82.45 |
| LR_comb | 72.41 | 84.20 | 82.23 | 89.07 |
| SVM_comb | 64.25 | 81.41 | 77.57 | 86.44 |
| **RoBERTa_finetune** | | | | |
| top_layer | 4.31 | 40.59 | 69.56 | 80.32 |
| whole | 83.39 | 93.87 | 98.65 | 99.55 |
  1. For BoW, _uni, _uni_bi and _uni_bi_tri mean unigram, unigram+bigram and unigram+bigram+trigram respectively. For W2V, _comb means concatenating character and word embeddings, while _char (_word) means character (word) embeddings only. For RoBERTa_embeddings, _char means the RoBERTa-Mini embeddings only, and _comb means concatenating the RoBERTa-Mini embeddings and the W2V word embeddings. For RoBERTa_finetune, whole and top_layer mean fine-tuning the whole network and only the top fully connected layer respectively
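To make the BoW rows concrete, the sketch below shows one way an entry such as LR_uni_bi could be reproduced in spirit with scikit-learn: unigram+bigram bag-of-words features feeding a one-vs-rest logistic regression, scored with macro- and micro-averaged F1. The toy documents and two-label ICD targets are invented stand-ins (the Fuwai dataset is not public), so the numbers it prints are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

# Toy corpus standing in for clinical notes; labels are a fake
# multi-label ICD target matrix (one column per code).
docs = [
    "chest pain on exertion",
    "chest pain at rest",
    "shortness of breath",
    "breath sounds clear",
]
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# _uni_bi in the table: unigram + bigram bag-of-words features
vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(docs)

# One binary logistic-regression classifier per ICD code
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, labels)
pred = clf.predict(X)

# Macro-F1 averages the per-label F1 scores equally;
# micro-F1 pools every (document, label) decision before computing F1.
macro = f1_score(labels, pred, average="macro", zero_division=0)
micro = f1_score(labels, pred, average="micro", zero_division=0)
print(f"macro-F1: {macro:.2%}  micro-F1: {micro:.2%}")
```

The macro/micro distinction matters for ICD coding because code frequencies are highly skewed: macro-F1 weights rare codes as heavily as common ones, which is why it sits well below micro-F1 for the weaker models in the table.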