Table 1 Corpora statistics

From: When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification

Statistic                 HaoDaiFu   ChinaRe
# of documents            51,374     86,663
# of diseases             805        44
# of rare diseases        89         5
Vocabulary size           59,879     41,087
Average # of words/doc    27         30