Corpus | Text type and size | Annotated entities (count)
---|---|---
MultiMedica [27] | Technical and popularizing texts; 4204 in Spanish, >4M tokens | No entities annotated; part-of-speech tags only |
MANTRA corpus [28] | Multilingual; in Spanish, texts from EMA (100; 1961 tokens) and Medline (100; 1087 tokens) | UMLS semantic types and CUIs; 5530 total annotations (756 in Spanish) |
IxaMedGS [29] | 75 clinical reports (41 633 tokens) | Disease (2766), Drug (1191) and adverse drug reaction relations (228) |
Spanish ADR [30] | 397 texts from ForumClinic (26 519 tokens) | Drugs (187) and adverse drug reactions (636) |
Drug Semantics [31] | 30 texts from Summaries of Product Characteristics (226 729 tokens) | Disease (724), Drug (657), Measurement (557), Excipient (66), Composition (62), Dose Form (45), Route (42), Medicament (37), Food (31), Therapeutic Action (20) |
IULA-SCRC [32] | 3194 sentences from 300 anonymized clinical records | Body part (7), Substance (14), Finding (1064), Procedure (93), Negation (1207) |
Cotik et al. [33] | 513 radiology reports | Anatomy (4398), Finding (2637), Location (722), Measure (3210), Texture (1890), Measure Type (1127), Negation (1207), Uncertainty (109), Abbreviation (880), Temporal (35), Multiword (788); 9 relation types (10 987) |
BARR2 [34] | 3563 report cases (1 433 685 tokens) | Abbreviations, acronyms and expanded terms (9552 annotations) |
SPACCC [35] | 1000 clinical cases published in journals from SciELO (396 988 tokens) | PharmaCoNER: Proteins (3009), Normalizable to SNOMED CT (4398), Not-normalizable (50), Unclear (167). CODIESP: 18 483 ICD-10 codes |
eHealth Discovery | 1173 Spanish health-related sentences from MedlinePlus | Entities (7188), Roles (3586) and 4 types of relations (2339) |
NUBes [41] | 29 682 sentences from 7019 anonymized EHRs | Negation (7567 sentences) and Speculation (2219 sentences) |
CWLC [42] | 1912 sentences (36 157 tokens) from 900 referrals | 9029 entities (Symptom, Diagnostic, Therapeutic or Laboratory Procedure, Family Member, Disease, Body part, Medication, Result, Abbreviation), 385 attributes (5 types), 284 relations |