Table 1 EBM and CT corpora

From: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

| Corpus | Text type and size | Annotations (count) |
| --- | --- | --- |
| NICTA-PIBOSO [14] | 10 000 sentences from 1000 MEDLINE abstracts | Sentences classified in the PIBOSO model: Population (812), Intervention (690), Background (2557), Outcome (4523), Study design (233) and Other (1564) |
| Deléger et al. [16] | 52 FDA labels (96 675 tokens), 3503 clinical notes (>1M tokens) and CTAs (241 annotated with drugs, 51 793 tokens; 3000 annotated with disorders/symptoms, 647 246 tokens) | Disease and symptoms (12 388), medications and drug attributes (74 507) |
| EBM corpus [17] | Clinical Enquiries section from the Journal of Family Practice, and excerpts from PubMed | Medical questions (456), bottom-line answers (1396), justifications (3036); these are matched to 2908 abstracts |
| EBM-NLP [18] | 5000 abstracts about clinical trials from PubMed (>1M tokens) | Entities corresponding to PICO elements (counts not reported) |
| Evidence Inference corpus [19] | More than 10 137 evidence questions (prompts) matched to 2419 PubMed articles about RCTs | Intervention results significantly increase (2428), significantly decrease (4470) or show no significant difference (3239) |
| EBMSASS [22] | 1000 pairs of sentences of clinical evidence | Elements from the PIBOSO model (200 pairs for each class) |
| Koroleva et al. [20] | Sentences from clinical trial studies in PubMed Central | Outcomes: Primary (2000 sentences) and Reported (1940) |
| Chia [24] | 1000 texts from ClinicalTrials.gov (12 409 eligibility criteria) | 15 entity types (41 487) and 12 different relationships (25 017) |