Table 1 Descriptive statistics of the datasets

From: Comparison of different feature extraction methods for applicable automated ICD coding

                 Fuwai                             CodiEsp
                 Word      Character   Code        Word      Code
Token size       691,418   1,557,769   44,366      161,078   11,158
Vocabulary size  9130      1768        1532        14,885    2557
Average length   99.5      224.2       6.4         161.1     11.2