
Table 13 Analysis of annotated entities (mean ± standard deviation) per label

From: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

 

|  | ANAT | CHEM | DISO | PROC |
|---|---|---|---|---|
| Mean tokens | 1.20 (±0.57) | 1.33 (±0.91) | 2.06 (±1.35) | 2.20 (±1.83) |
| Mean characters | 9.32 (±5.09) | 11.53 (±6.88) | 16.74 (±10.28) | 18.31 (±13.55) |
| Coordination | 0.28% (±5.31) | 0.15% (±3.89) | 2.38% (±15.24) | 3.86% (±19.27) |
| Has hyphen | 0.30% (±5.44) | 4.66% (±21.08) | 2.88% (±16.72) | 2.04% (±14.12) |
| Has numerals | 0.37% (±6.08) | 6.97% (±25.47) | 3.24% (±17.70) | 2.13% (±14.45) |
| Has punctuation | 0.03% (±1.72) | 0.18% (±4.29) | 0.80% (±8.89) | 3.48% (±18.34) |
| Has stop words | 1.72% (±13.02) | 6.35% (±24.39) | 18.33% (±38.69) | 22.26% (±41.60) |
| Uppercase | 3.67% (±18.81) | 13.65% (±34.33) | 10.55% (±30.73) | 10.46% (±30.60) |
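
The percentage rows pair small means with large standard deviations, which can look odd at first. One possible reading, offered here as an assumption because the table does not state how the deviations were computed, is that each percentage is the mean of a per-entity 0/1 flag (for example, "this entity contains a stop word"). Under that assumption the standard deviation is determined by the percentage alone, and the sketch below checks this against the "Has stop words" row. The function name `indicator_sd_percent` is hypothetical and used only for illustration.

```python
import math


def indicator_sd_percent(pct: float) -> float:
    """Population SD, in percentage points, of a 0/1 flag whose mean is pct percent.

    Assumption (not stated in the table): each entity contributes 0 or 1,
    so the variance is p * (1 - p) with p = pct / 100.
    """
    p = pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p))


# Reported (percentage, SD) pairs copied from the "Has stop words" row.
reported = {
    "ANAT": (1.72, 13.02),
    "CHEM": (6.35, 24.39),
    "DISO": (18.33, 38.69),
    "PROC": (22.26, 41.60),
}

for label, (pct, sd) in reported.items():
    implied = indicator_sd_percent(pct)
    # Print the SD implied by the percentage next to the reported SD.
    print(f"{label}: implied {implied:.2f}, reported {sd:.2f}")
```

For this row the implied and reported values agree to within a few hundredths of a percentage point, which is the size of difference that rounding the percentages to two decimals would produce. The "Mean tokens" and "Mean characters" rows are ordinary means of counts, so this relation does not apply to them.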