
Table 2 The corpora used to generate the embeddings

From: Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches

 

| Corpora | Swedish: Size | Swedish: Vocabulary size | Spanish: Size | Spanish: Vocabulary size |
| --- | --- | --- | --- | --- |
| Out-of-domain (gen) | 2.89 GB | 1 040 025 | 8.3 GB | 1 000 655 |
| General medical (genMed) | 130 MB | 118 683 | 176 MB | 168 500 |
| EHR | 1.2 GB | 300 825 | 1.1 GB | 286 986 |