Table 1 Corpora statistics

From: Improving rare disease classification using imperfect knowledge graph

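| Statistic | HaoDaiFu | ChinaRe |
|---|---|---|
| Number of documents | 51,374 | 86,663 |
| Number of classes (diseases) | 805 | 44 |
| Vocabulary size | 59,879 | 41,087 |
| Average number of words per document | 26.7 | 29.7 |
| Average number of knowledge terms per document (note 1) | 10.8 | 4.0 |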

  1. A “knowledge term” is a term that appears in the medical knowledge graph (see the “Acquiring knowledge features from KG entities” section).
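The statistics in Table 1 can be reproduced in outline with a short Python sketch. It rests on several assumptions that the table does not confirm. Each document is taken to be already tokenized, since the tokenizer is not specified here. A "knowledge term" is treated as a single token that appears in a set of knowledge-graph entity names, and every occurrence is counted rather than only distinct terms. The function `corpus_statistics`, the `CorpusStats` fields, and the toy data are all hypothetical. This is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CorpusStats:
    # Hypothetical container mirroring the rows of Table 1.
    num_documents: int
    num_classes: int
    vocabulary_size: int
    avg_words_per_doc: float
    avg_knowledge_terms_per_doc: float


def corpus_statistics(
    docs: Sequence[Sequence[str]],
    labels: Sequence[str],
    kg_terms: set[str],
) -> CorpusStats:
    """Compute Table 1-style statistics (hypothetical helper).

    Assumptions: documents are pre-tokenized; a knowledge term is a single
    token found in `kg_terms`; every occurrence counts, not only distinct terms.
    """
    if len(docs) != len(labels):
        raise ValueError("docs and labels must have the same length")
    if not docs:
        raise ValueError("corpus is empty")

    vocabulary: set[str] = set()
    total_words = 0
    total_knowledge_terms = 0
    for tokens in docs:
        vocabulary.update(tokens)             # distinct word types across the corpus
        total_words += len(tokens)            # word tokens in this document
        total_knowledge_terms += sum(1 for t in tokens if t in kg_terms)

    n = len(docs)
    return CorpusStats(
        num_documents=n,
        num_classes=len(set(labels)),         # each disease label is one class
        vocabulary_size=len(vocabulary),
        avg_words_per_doc=total_words / n,
        avg_knowledge_terms_per_doc=total_knowledge_terms / n,
    )


# Toy corpus (invented for illustration, not data from the paper).
docs = [["fever", "cough", "rash"], ["rash", "fatigue"]]
labels = ["disease_a", "disease_b"]
kg_terms = {"fever", "rash"}

stats = corpus_statistics(docs, labels, kg_terms)
print(stats)
```

If knowledge-graph entities can span several tokens, the membership test would need to become a phrase or n-gram match. That change would alter the per-document counts.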