Table 2 BioNLP corpora in Spanish

From: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

| Corpus | Text type and size | Annotated entities (count) |
| --- | --- | --- |
| MultiMedica [27] | Technical/popularizing texts; 4204 in Spanish, >4M tokens | No entities annotated, only part-of-speech |
| MANTRA corpus [28] | Multilingual; in Spanish, texts from EMA (100; 1961 tokens) & Medline (100; 1087 tokens) | UMLS semantic types and CUIs; 5530 total annotations (756 in Spanish) |
| IxaMedGS [29] | 75 clinical reports (41 633 tokens) | Disease (2766), Drug (1191) and adverse drug reactions relations (228) |
| Spanish ADR [30] | 397 texts from ForumClinic (26 519 tokens) | Drugs (187) and adverse drug reactions (636) |
| Drug Semantics [31] | 30 texts from Summaries of Product Characteristics (226 729 tokens) | Disease (724), Drug (657), Measurement (557), Excipient (66), Composition (62), Dose Form (45), Route (42), Medicament (37), Food (31), Therapeutic Action (20) |
| IULA-SCRC [32] | 3194 sentences from 300 anonymized clinical records | Body part (7), Substance (14), Finding (1064), Procedure (93), Negation (1207) |
| Cotik et al. [33] | 513 radiology reports | Anatomy (4398), Finding (2637), Location (722), Measure (3210), Texture (1890), Measure Type (1127), Negation (1207), Uncertainty (109), Abbreviation (880), Temporal (35), Multiword (788); 9 relation types (10 987) |
| BARR2 [34] | 3563 report cases (1 433 685 tokens) | Abbreviations, acronyms and expanded terms (9552 annotations) |
| SPACCC [35] | 1000 clinical cases published in journals from SciELO (396 988 tokens) | PharmaCoNER: Proteins (3009), Normalizable to SNOMED CT (4398), Not-normalizable (50), Unclear (167). CODIESP: 18 483 ICD-10 codes |
| eHealth Discovery | 1173 Spanish health-related sentences from MedlinePlus | Entities (7188), Roles (3586) and 4 types of relations (2339) |
| NUBes [41] | 29 682 sentences from 7019 anonymized EHRs | Negation (7567 sentences) and Speculation (2219 sentences) |
| CWLC [42] | 1912 sentences (36 157 tokens) from 900 referrals | 9029 entities (Symptom, Diagnostic, Therapeutic or Laboratory Procedure, Family Member, Disease, Body part, Medication, Result, Abbreviation), 385 attributes (5 types), 284 relations |