Table 4 Types of features used in training the Knowledge Type classification model

From: Identification of research hypotheses and new knowledge from scientific literature

Feature type

Features

Sentence

SE1: length in words; SE2: length in characters; SE3: mean number of characters per word; SE4: median number of characters per word; POS tag ratios (SE5: noun-to-verb, SE6: noun-to-adjective, SE7: noun-to-adverb, SE8: verb-to-adjective, SE9: verb-to-adverb, SE10: adjective-to-adverb)

Structural

ST1: whether any participant is an event; ST2: the number of the sentence containing this event; ST3: whether this event is a participant in another event; ST4: whether the event is a noun phrase; ST5: whether the event is an instance of “regulation”; ST6: total number of themes; ST7: total number of causes

Participant

PA1: POS tag of the first participant; PA2: POS tag of the first cause; PA3: whether any theme is an event; PA4: whether any cause is an event; PA5: POS tag of the word in a governing dependency over the theme; PA6: POS tag of the word in a governing dependency over the cause

Lexical

L1: distance between nearest clue and event trigger; L2: whether sentence contains at least one clue; L-N: which clues (in a precompiled list) are matched within the sentence; features of matched clue (L3: surface form, L4: POS tag, L5: position relative to trigger, L6: whether in auxiliary form); L7: whether trigger contains a cue; features of nearest clue (L8: tense, L9: aspect, L10: voice); L11-L15: whether clue usually occurs in the context of each Knowledge Type; L16: number of matched clues

Constituency

Relationships between clue and event trigger (C1: s-commands, C2: vp-commands, C3: np-commands); relationships between clue and any event participant (C4: s-commands, C5: vp-commands, C6: np-commands); C7: whether scope of any clue is within the same scope as the trigger

Dependency

Direct dependencies (D1: between clue and trigger, D2: between clue and any event participant); one-hop dependencies (D3: between clue and trigger, D4: between clue and any event participant); two-hop dependencies (D5: between clue and trigger, D6: between clue and any event participant)

Parse Tree

Distances: PT1: between theme and furthest leaf node; PT2: between cause and furthest leaf node; PT3: between theme and root node; PT4: between cause and root node

  1. A detailed explanation of each feature with examples is given in the Additional files
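
As an illustration of the Sentence features in the table, the following is a minimal Python sketch of how SE1 to SE10 might be computed from an already POS-tagged sentence. It is not the authors' implementation. The function names (`ratio`, `sentence_features`), the coarse tag labels (`NOUN`, `VERB`, `ADJ`, `ADV`), the choice to count spaces in SE2, and the convention of returning 0.0 when a ratio's denominator is zero are all assumptions made for this example.

```python
from statistics import mean, median


def ratio(a, b):
    """Return a / b, or 0.0 when b is zero (assumed convention)."""
    return a / b if b else 0.0


def sentence_features(tagged):
    """Compute SE1-SE10 for one sentence.

    tagged: list of (word, coarse_pos_tag) pairs. The tag labels used
    here (NOUN, VERB, ADJ, ADV) are hypothetical; any tagger's output
    can be mapped onto them beforehand.
    """
    words = [word for word, _ in tagged]
    lengths = [len(word) for word in words]

    # Count only the four POS classes that the ratio features use.
    counts = {"NOUN": 0, "VERB": 0, "ADJ": 0, "ADV": 0}
    for _, tag in tagged:
        if tag in counts:
            counts[tag] += 1

    return {
        "SE1": len(words),                      # length in words
        "SE2": len(" ".join(words)),            # length in characters (spaces counted, assumed)
        "SE3": mean(lengths) if lengths else 0.0,    # mean characters per word
        "SE4": median(lengths) if lengths else 0.0,  # median characters per word
        "SE5": ratio(counts["NOUN"], counts["VERB"]),
        "SE6": ratio(counts["NOUN"], counts["ADJ"]),
        "SE7": ratio(counts["NOUN"], counts["ADV"]),
        "SE8": ratio(counts["VERB"], counts["ADJ"]),
        "SE9": ratio(counts["VERB"], counts["ADV"]),
        "SE10": ratio(counts["ADJ"], counts["ADV"]),
    }


# Hypothetical tagged sentence, used only to exercise the function.
example = [
    ("IL-2", "NOUN"),
    ("strongly", "ADV"),
    ("activates", "VERB"),
    ("the", "DET"),
    ("receptor", "NOUN"),
]
features = sentence_features(example)
```

For the example sentence, `features["SE1"]` is 5 and `features["SE5"]` is 2.0 (two nouns, one verb). The zero-denominator convention matters in practice, since short sentences often lack adjectives or adverbs; a real system might instead use smoothing or a sentinel value.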