Table 1 Number of distinct vocabularies in each dataset

From: Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec

Percentage  # of vocabularies  # of pairs identified (sim)  # of pairs identified (rel)
10%         1,451,218          374 (66%)                    368 (63%)
20%         2,339,000          406 (72%)                    399 (68%)
30%         3,313,239          432 (76%)                    429 (73%)
40%         3,961,051          447 (79%)                    444 (76%)
50%         4,572,957          467 (83%)                    465 (79%)
60%         5,319,879          479 (85%)                    480 (82%)
70%         5,856,126          486 (86%)                    487 (83%)
80%         6,369,803          486 (86%)                    488 (83%)
90%         7,016,215          491 (87%)                    494 (84%)
100%        7,797,722          491 (87%)                    494 (84%)