Table 2 The corpora used to generate the embeddings

From: Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches

| Corpora | Size | Vocabulary size | Size | Vocabulary size |
|---|---|---|---|---|
| Out-of-domain (gen) | 2.89 GB | 1 040 025 | 8.3 GB | 1 000 655 |
| General medical (genMed) | 130 MB | 118 683 | 176 MB | 168 500 |
| EHR | 1.2 GB | 300 825 | 1.1 GB | 286 986 |