Comparison of different feature extraction methods for applicable automated ICD coding

Table 4 Coding results for CodiEsp dataset with \(f_s = 180\)

Feature extraction & classifiers	Macro-F1 (%)	Micro-F1 (%)	Macro-AUC (%)	Micro-AUC (%)
BoW
LR_uni	63.55	63.68	70.44	70.04
SVM_uni	70.68	70.34	75.13	74.45
LR_uni_bi	63.93	63.85	72.36	71.08
SVM_uni_bi	72.46	72.27	77.22	75.95
LR_uni_bi_tri	62.41	62.26	71.39	69.97
SVM_uni_bi_tri	69.48	69.26	75.39	73.90
W2V
LR_word	56.07	56.07	64.39	64.62
SVM_word	59.52	59.63	66.86	67.11
BERT_embeddings
LR_word	64.00	63.90	69.33	68.61
SVM_word	59.15	59.02	64.29	64.11
LR_comb	61.26	60.91	66.32	65.85
SVM_comb	62.52	62.45	67.68	67.73
BERT_finetune
top_layer	17.21	22.19	48.79	49.40
whole	85.32	85.41	91.44	92.82

Except for BERT_embeddings, the suffixes have the same meanings as in Table 3. For BERT_embeddings, _word denotes the BERT-mini embeddings alone, and _comb denotes the concatenation of the BERT-mini embeddings and the W2V word embeddings.
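The _comb variant described in the note can be illustrated with a minimal sketch. It assumes each document is first reduced to one vector per embedding source by mean pooling, which the table note does not state. The function names `mean_pool` and `combine_features` and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average token vectors into one document vector.

    Mean pooling is an assumption made for this sketch; the source does
    not say how document-level vectors are built.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def combine_features(bert_doc: Sequence[float], w2v_doc: Sequence[float]) -> List[float]:
    """Concatenate the two document vectors, as the _comb suffix describes."""
    return list(bert_doc) + list(w2v_doc)


# Toy token vectors (hypothetical values, tiny dimensions for readability).
bert_tokens = [[0.25, 0.75], [0.75, 0.25]]
w2v_tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]

comb = combine_features(mean_pool(bert_tokens), mean_pool(w2v_tokens))
print(comb)
```

The combined vector has as many dimensions as the two sources together, so a classifier such as LR or SVM trained on it sees both feature sets side by side.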

ISSN: 1472-6947