Table 1 Descriptive statistics of the datasets

From: Comparison of different feature extraction methods for applicable automated ICD coding

 

                  Fuwai                               CodiEsp
                  Word        Character   Code        Word        Code
Token size        691,418     1,557,769   44,366      161,078     11,158
Vocabulary size   9,130       1,768       1,532       14,885      2,557
Average length    99.5        224.2       6.4         161.1       11.2