Table 13 Analysis of annotated entities (mean ± standard deviation) per label

From: A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

Feature          ANAT             CHEM             DISO             PROC
Mean tokens      1.20 (±0.57)     1.33 (±0.91)     2.06 (±1.35)     2.20 (±1.83)
Mean characters  9.32 (±5.09)     11.53 (±6.88)    16.74 (±10.28)   18.31 (±13.55)
Coordination     0.28% (±5.31)    0.15% (±3.89)    2.38% (±15.24)   3.86% (±19.27)
Has hyphen       0.30% (±5.44)    4.66% (±21.08)   2.88% (±16.72)   2.04% (±14.12)
Has numerals     0.37% (±6.08)    6.97% (±25.47)   3.24% (±17.70)   2.13% (±14.45)
Has punctuation  0.03% (±1.72)    0.18% (±4.29)    0.80% (±8.89)    3.48% (±18.34)
Has stop words   1.72% (±13.02)   6.35% (±24.39)   18.33% (±38.69)  22.26% (±41.60)
Uppercase        3.67% (±18.81)   13.65% (±34.33)  10.55% (±30.73)  10.46% (±30.60)
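
The percentage rows have standard deviations much larger than their means, which is what one would expect if each feature is recorded as a 0/1 indicator per entity and then averaged. For an indicator with proportion p, the population standard deviation is sqrt(p(1 - p)); with p = 0.2226 this gives about 41.60 percentage points, in line with the PROC value for stop words. The sketch below illustrates that reading for a subset of the rows. It is not the authors' procedure: the function names, the whitespace tokenization, and the example entities are assumptions made for illustration only.

```python
import statistics

# Features treated as 0/1 indicators and reported as percentages.
INDICATORS = {"has_hyphen", "has_numerals"}


def entity_features(text):
    """Surface features for one annotated entity (hypothetical helper)."""
    tokens = text.split()  # assumption: whitespace tokenization
    return {
        "tokens": len(tokens),
        "characters": len(text),
        "has_hyphen": int("-" in text),
        "has_numerals": int(any(ch.isdigit() for ch in text)),
    }


def summarize(entities):
    """Return {feature: (mean, population SD)} over a list of entity strings.

    Indicator features are scaled to percentages, so a 0/1 column with
    proportion p yields a mean of 100*p and an SD of 100*sqrt(p*(1-p)).
    """
    rows = [entity_features(e) for e in entities]
    summary = {}
    for key in rows[0]:
        values = [row[key] for row in rows]
        scale = 100 if key in INDICATORS else 1
        summary[key] = (
            statistics.mean(values) * scale,
            statistics.pstdev(values) * scale,
        )
    return summary


if __name__ == "__main__":
    # Made-up entities, not taken from the corpus.
    sample = ["COVID-19", "type 2 diabetes", "liver", "beta-blocker"]
    for feature, (mean, sd) in summarize(sample).items():
        print(f"{feature}: {mean:.2f} (±{sd:.2f})")
```

Whether the published figures use the population or the sample standard deviation is not stated here; with thousands of entities per label the two would differ only marginally.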