
Table 9 Results of 10-fold cross validation on both datasets for Research Hypothesis and New Knowledge

From: Identification of research hypotheses and new knowledge from scientific literature

   

| Corpus | Method | Class | P | R | F1 |
|---|---|---|---|---|---|
| GENIA-MK | Majority Baseline | New Knowledge | 0.000 | 0.000 | 0.000 |
| | | Other knowledge | 0.659 | 1.000 | 0.794 |
| | | Average | 0.329 | 0.500 | 0.397 |
| | | Hypothetical | 0.000 | 0.000 | 0.000 |
| | | Non-Hypothetical | 0.947 | 1.000 | 0.973 |
| | | Average | 0.473 | 0.500 | 0.486 |
| | Rule-based Baseline | New Knowledge | 0.580 | 0.767 | 0.660 |
| | | Other knowledge | 0.855 | 0.712 | 0.777 |
| | | Average | 0.717 | 0.739 | 0.719 |
| | | Hypothetical | 0.054 | 0.077 | 0.063 |
| | | Non-Hypothetical | 0.947 | 0.924 | 0.936 |
| | | Average | 0.500 | 0.500 | 0.499 |
| | Random Forest | New Knowledge | 0.863 | 0.920 | 0.891 |
| | | Other knowledge | 0.823 | 0.719 | 0.767 |
| | | Average | 0.843 | 0.819 | 0.829 |
| | | Hypothetical | 0.928 | 0.762 | 0.836 |
| | | Non-Hypothetical | 0.987 | 0.997 | 0.992 |
| | | Average | 0.958 | 0.880 | 0.914 |
| EU-ADR | Majority Baseline | New Knowledge | 0.644 | 1.000 | 0.784 |
| | | Other knowledge | 0.000 | 0.000 | 0.000 |
| | | Average | 0.322 | 0.5 | 0.392 |
| | | Hypothetical | 0.000 | 0.000 | 0.000 |
| | | Non-Hypothetical | 0.939 | 1.000 | 0.968 |
| | | Average | 0.469 | 0.500 | 0.484 |
| | Random Forest | New Knowledge | 0.853 | 0.921 | 0.884 |
| | | Other knowledge | 0.831 | 0.692 | 0.748 |
| | | Average | 0.842 | 0.807 | 0.816 |
| | | Hypothetical | 1.00 | 0.533 | 0.668 |
| | | Non-Hypothetical | 0.970 | 1.00 | 0.9848 |
| | | Average | 0.985 | 0.767 | 0.827 |

  1. We report precision (P), recall (R) and F1-score. In each major row of the table, the first two sub-rows give the macro-average over 10-fold cross-validation for each class, and the third sub-row gives the average of the two classes above it. A majority class baseline is included for comparison; it was calculated by assigning every event to the majority class and then computing precision, recall and F1-score. The majority class is the negative class for both New Knowledge and Hypothesis in the GENIA-MK corpus. In the EU-ADR corpus, the majority class is the positive class for New Knowledge and the negative class for Hypothesis. In addition, we include results for the rule-based baseline from Thompson et al. [28], as described previously.
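
The majority class baseline described in the note can be illustrated with a minimal Python sketch. It assigns every event to the most frequent class, computes per-class precision, recall and F1-score, and then averages the two classes. The function names (`precision_recall_f1`, `majority_baseline`) and the label counts (659 "other" versus 341 "new") are assumptions made for illustration only; the counts were chosen so that the majority-class precision comes out at 0.659 and are not the actual corpus sizes, and the sketch ignores the cross-validation folds.

```python
from collections import Counter


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def majority_baseline(gold):
    """Score a classifier that labels every event with the majority class."""
    majority = Counter(gold).most_common(1)[0][0]
    predicted = [majority] * len(gold)

    per_class = {}
    for cls in sorted(set(gold)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == cls and p == cls for g, p in pairs)
        fp = sum(g != cls and p == cls for g, p in pairs)
        fn = sum(g == cls and p != cls for g, p in pairs)
        per_class[cls] = precision_recall_f1(tp, fp, fn)

    # Third sub-row: unweighted mean of the per-class scores.
    n = len(per_class)
    average = tuple(sum(m[i] for m in per_class.values()) / n for i in range(3))
    return per_class, average


# Hypothetical label distribution (NOT the real corpus counts).
gold = ["other"] * 659 + ["new"] * 341
per_class, average = majority_baseline(gold)

for cls, (p, r, f1) in per_class.items():
    print(f"{cls}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
print("average: P={:.3f} R={:.3f} F1={:.3f}".format(*average))
```

With these toy counts the majority class scores roughly 0.659 / 1.000 / 0.794 and the minority class scores zero throughout, which mirrors the pattern of the majority baseline rows. Note that the averaged F1 here is the mean of the two class F1-scores, not the harmonic mean of the averaged precision and recall.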