Table 1 EBM and CT corpora

From: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

| Corpus | Text type and size | Annotations (count) |
| --- | --- | --- |
| NICTA-PIBOSO [14] | 10 000 sentences from 1000 MEDLINE abstracts | Sentences classified in the PIBOSO model: Population (812), Intervention (690), Background (2557), Outcome (4523), Study design (233) and Other (1564) |
| Deléger et al. [16] | 52 FDA labels (96 675 tokens), 3503 clinical notes (>1M tokens) and CTAs (241 annotated with drugs, 51 793 tokens; 3000 annotated with disorders/symptoms, 647 246 tokens) | Disease and symptoms (12 388), medications and drug attributes (74 507) |
| EBM corpus [17] | Clinical Enquiries section from the Journal of Family Practice, and excerpts from PubMed | Medical questions (456), bottom-line answers (1396), justifications (3036); these are matched to 2908 abstracts |
| EBM-NLP [18] | 5000 abstracts about clinical trials from PubMed (>1M tokens) | Entities corresponding to PICO elements (counts not reported) |
| Evidence Inference corpus [19] | More than 10 137 evidence questions (prompts) matched to 2419 PubMed articles about RCTs | Intervention results significantly increase (2428), significantly decrease (4470) or show no significant difference (3239) |
| EBMSASS [22] | 1000 pairs of sentences of clinical evidence | Elements from the PIBOSO model (200 pairs for each class) |
| Koroleva et al. [20] | Sentences from clinical trial studies in PubMed Central | Outcomes: Primary (2000 sentences) and Reported (1940) |
| Chia [24] | 1000 texts from ClinicalTrials.gov (12 409 eligibility criteria) | 15 entity types (41 487) and 12 different relationships (25 017) |