Table 1 Corpora statistics

From: When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification

 

Statistic                 HaoDaiFu   ChinaRe
# of documents            51,374     86,663
# of diseases             805        44
# of rare diseases        89         5
Vocabulary size           59,879     41,087
Average # of words/doc    27         30