Table 4 Relation extraction results over Variome corpus relations with at least 100 examples, based on 10-fold cross-validation and assuming gold-standard entity annotations

From: Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts

Relation N Baseline-P Baseline-R Baseline-F PKDE4J-P PKDE4J-R PKDE4J-F
Disease-has-ConceptIdeas 431 0.704 0.922 0.764 0.799 0.746 0.781
Disease-has-Physiology 188 0.752 0.873 0.788 0.885 0.684 0.806
Disease-has-Disorders 349 0.754 0.884 0.793 0.773 0.713 0.752
Disease-relatedTo-BodyPart 445 0.763 0.853 0.791 0.794 0.666 0.746
Mutation-relatedTo-Disease 126 0.702 0.986 0.777 0.683 0.973 0.758
Gene-has-Physiology 180 0.844 0.897 0.861 0.857 0.723 0.807
Gene-has-Mutation 538 0.835 0.866 0.845 0.910 0.569 0.758
Cohort-has-Mutation 307 0.873 0.921 0.888 0.909 0.726 0.839
Cohort-has-Disease 717 0.715 0.813 0.745 0.865 0.654 0.781
Cohort-has-Size 669 0.857 0.793 0.835 0.910 0.736 0.844
Cohort-has-Disorders 119 0.859 0.918 0.878 0.903 0.748 0.845
  1. P = precision, R = recall, F = F-score with β = 0.5 (F0.5), which weights precision more heavily than recall because precision matters more for this task. The Baseline columns report results from a simple co-occurrence baseline. The best F score for each relation (Baseline or PKDE4J) is shown in bold in the original table.
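
The F0.5 score mentioned in the footnote can be illustrated with a minimal sketch of the standard F-beta formula, F = (1 + β²)·P·R / (β²·P + R). The function name `f_beta` and the zero-denominator convention are assumptions made for illustration, not the authors' code. Note also that F values recomputed from the rounded P and R in the table need not match the reported F values; one possible reason (an assumption here, not stated in the footnote) is that scores were averaged across the 10 cross-validation folds.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily than recall;
    beta = 1 gives the usual F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Assumed convention: define the score as 0 when both P and R are 0.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


if __name__ == "__main__":
    # P and R taken from the Gene-has-Mutation row (PKDE4J columns).
    p, r = 0.910, 0.569
    print(f"F0.5 = {f_beta(p, r, beta=0.5):.3f}")
    print(f"F1   = {f_beta(p, r, beta=1.0):.3f}")
```

Because β = 0.5 favours precision, a system with high precision and lower recall scores higher under F0.5 than it would under F1.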