
Table 3 Statistics comparing our versions of the GENIA-MK and EU-ADR corpora, both annotated with new knowledge and research hypothesis labels

From: Identification of research hypotheses and new knowledge from scientific literature

 

| | GENIA-MK | EU-ADR |
|---|---|---|
| Base type for annotations | Events | Relations |
| Number of annotations | 6899 | 622 |
| Number of abstracts | 150 | 159 |
| Number of new knowledge annotations | 2356 (34.2%) | 406 (65.3%) |
| Number of research hypothesis annotations | 366 (5.31%) | 38 (6.11%) |

  1. The GENIA-MK corpus is much more densely annotated than the EU-ADR corpus, with over ten times as many annotated events in the former (6899) as annotated relations in the latter (622). Research Hypotheses are sparse in both corpora, constituting 5.31% of annotated events in GENIA-MK and 6.11% of annotated relations in EU-ADR. The proportion of New Knowledge differs markedly between the two corpora (34.2% versus 65.3%), in part because the EU-ADR corpus appeared to favour the annotation of relationships denoting New Knowledge.
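
The proportions and the "over ten times" comparison in the note above can be rechecked from the raw counts. The following is a minimal sketch, not part of the original study: the dictionary keys, the `share` helper, and the per-abstract density figure are illustrative assumptions (the last reflecting one possible reading of "densely annotated"). The computed values should agree with the table up to rounding in the last reported digit.

```python
# Counts copied from Table 3. All names below are illustrative, not from the paper.
corpora = {
    "GENIA-MK": {"annotations": 6899, "abstracts": 150,
                 "new_knowledge": 2356, "hypothesis": 366},
    "EU-ADR": {"annotations": 622, "abstracts": 159,
               "new_knowledge": 406, "hypothesis": 38},
}


def share(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


summary = {}
for name, c in corpora.items():
    summary[name] = {
        # Label proportions relative to all annotations in the corpus.
        "new_knowledge_pct": share(c["new_knowledge"], c["annotations"]),
        "hypothesis_pct": share(c["hypothesis"], c["annotations"]),
        # Assumed density measure: annotations per abstract.
        "annotations_per_abstract": c["annotations"] / c["abstracts"],
    }

# How many times more annotations GENIA-MK has than EU-ADR.
ratio = corpora["GENIA-MK"]["annotations"] / corpora["EU-ADR"]["annotations"]

for name, s in summary.items():
    print(f"{name}: New Knowledge {s['new_knowledge_pct']:.2f}%, "
          f"Research Hypothesis {s['hypothesis_pct']:.2f}%, "
          f"{s['annotations_per_abstract']:.1f} annotations per abstract")
print(f"GENIA-MK / EU-ADR annotation ratio: {ratio:.1f}")
```

Note that the event and relation counts are not strictly comparable units, so the ratio is only a rough indication of relative annotation volume.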