
Table 4 Relation extraction results over Variome corpus relations with at least 100 examples, based on 10-fold cross-validation and assuming gold-standard entity annotations

From: Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts

| Relation | N | Baseline P | Baseline R | Baseline F | PKDE4J P | PKDE4J R | PKDE4J F |
|---|---|---|---|---|---|---|---|
| Disease-has-ConceptIdeas | 431 | 0.704 | 0.922 | 0.764 | 0.799 | 0.746 | **0.781** |
| Disease-has-Physiology | 188 | 0.752 | 0.873 | 0.788 | 0.885 | 0.684 | **0.806** |
| Disease-has-Disorders | 349 | 0.754 | 0.884 | **0.793** | 0.773 | 0.713 | 0.752 |
| Disease-relatedTo-BodyPart | 445 | 0.763 | 0.853 | **0.791** | 0.794 | 0.666 | 0.746 |
| Mutation-relatedTo-Disease | 126 | 0.702 | 0.986 | **0.777** | 0.683 | 0.973 | 0.758 |
| Gene-has-Physiology | 180 | 0.844 | 0.897 | **0.861** | 0.857 | 0.723 | 0.807 |
| Gene-has-Mutation | 538 | 0.835 | 0.866 | **0.845** | 0.910 | 0.569 | 0.758 |
| Cohort-has-Mutation | 307 | 0.873 | 0.921 | **0.888** | 0.909 | 0.726 | 0.839 |
| Cohort-has-Disease | 717 | 0.715 | 0.813 | 0.745 | 0.865 | 0.654 | **0.781** |
| Cohort-has-Size | 669 | 0.857 | 0.793 | 0.835 | 0.910 | 0.736 | **0.844** |
| Cohort-has-Disorders | 119 | 0.859 | 0.918 | **0.878** | 0.903 | 0.748 | 0.845 |

  1. P = precision, R = recall, F = F0.5 score (which weights P more heavily than R because of the importance of precision). The Baseline columns report results from a simple co-occurrence baseline. The best F score for each relation (Baseline or PKDE4J) is bolded
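
As an illustration of the F0.5 score named in the footnote, the sketch below uses the standard F-beta definition, F = (1 + β²)·P·R / (β²·P + R), with β = 0.5. The function name `f_beta` is hypothetical and is not taken from the paper. Scores recomputed this way from the P and R values in the table need not match the reported F values exactly; one possible reason, which is an assumption here, is that the reported scores were averaged over the ten cross-validation folds.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily than recall; beta = 0.5 gives
    the F0.5 score. `f_beta` is an illustrative name, not the paper's code.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# High precision with low recall scores better under F0.5 than the reverse.
high_precision = f_beta(0.9, 0.5)
high_recall = f_beta(0.5, 0.9)
print(f"{high_precision:.3f} vs {high_recall:.3f}")
```

With β = 1 the same function reduces to the familiar balanced F1 score.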