Statistic | Abstracts | EudraCT | Total |
---|---|---|---|
Texts | 500 | 700 | 1200 |
Sentences | 7160 | 11 995 | 19 155 |
Sentences per text, M (SD) | 14.32 (±4.24) | 17.14 (±5.24) | 15.96 (±5.04) |
Annotated sentences | 5444 | 8607 | 14 051 |
Annotated sentences per text, M (SD) | 10.89 (±3.00) | 12.29 (±4.63) | 11.71 (±4.09) |
Tokens | 141 245 | 150 928 | 292 173 |
Tokens per text, M (SD) | 282.49 (±70.21) | 215.61 (±69.38) | 243.48 (±77.11) |
Entities | 20 031 | 26 668 | 46 699 |
Entities per text, M (SD) | 40.06 (±13.67) | 38.10 (±14.39) | 38.92 (±14.12) |
Nested entities | 2613 (13.04%) | 3914 (14.68%) | 6527 (13.98%) |
Normalized to UMLS CUIs | 13 627 (68.03%) | 19 382 (72.68%) | 33 009 (70.68%) |
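
The per-text means and the entity percentages in the table follow directly from the raw counts. The short Python sketch below recomputes them as a consistency check. The counts are copied from the table; the dictionary keys and the helper names `per_text_mean` and `share_of_entities` are illustrative assumptions, not part of the original work. Standard deviations cannot be recovered from totals alone, so they are not recomputed.

```python
# Counts copied from the table, ordered as (Abstracts, EudraCT, Total).
counts = {
    "texts": (500, 700, 1200),
    "sentences": (7160, 11995, 19155),
    "tokens": (141245, 150928, 292173),
    "entities": (20031, 26668, 46699),
    "nested": (2613, 3914, 6527),
    "normalized": (13627, 19382, 33009),
}


def per_text_mean(key):
    """Mean count per text for each column, rounded to two decimals."""
    return [round(n / t, 2) for n, t in zip(counts[key], counts["texts"])]


def share_of_entities(key):
    """Percentage of all entities in each column, rounded to two decimals."""
    return [round(100 * n / e, 2) for n, e in zip(counts[key], counts["entities"])]


if __name__ == "__main__":
    for key in ("sentences", "tokens", "entities"):
        print(key, "per text:", per_text_mean(key))
    for key in ("nested", "normalized"):
        print(key, "% of entities:", share_of_entities(key))
```

In every row the Abstracts and EudraCT counts sum to the Total column, which the same dictionary makes easy to verify.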