Table 1. Extrinsic evaluation of Word2vec embeddings trained with identical hyperparameters but on different training corpora

From: Supporting the classification of patients in public hospitals in Chile by designing, deploying and validating a system based on natural language processing

| Training corpus | Vocabulary size (tokens) | ROC AUC |
| --- | --- | --- |
| General dataset | 57,112 | 0.94 |
| Biomedical literature | 183,766 | 0.90 |
| General Spanish language | 1,000,653 | 0.90 |