Table 2 Dimension of feature sets using different data representations

From: Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

| Dimension of the feature set | iDASH | MGH |
| --- | --- | --- |
| Bag-of-words (vocabulary size) | 8704 | 145,991 |
| UMLS concepts | 4751 | 25,457 |
| UMLS concepts restricted to five semantic groups | 4532 | 24,458 |
| UMLS concepts restricted to 15 semantic types | 3635 | 18,521 |
| Bag-of-words + UMLS concepts | 13,455 | 171,448 |
| Bag-of-words + UMLS concepts restricted to five semantic groups | 13,236 | 170,449 |
| Bag-of-words + UMLS concepts restricted to 15 semantic types | 12,339 | 164,512 |
| Paragraph vector (distributed memory model) | 600 | 600 |
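
In each "Bag-of-words + UMLS concepts" row, the reported dimension equals the bag-of-words dimension plus the dimension of the corresponding concept set. That is what one would expect if the combined representation were formed by concatenating the two feature vectors, although this is inferred from the numbers and not stated in the table. The short Python sketch below checks this arithmetic; the dictionary keys and function names are hypothetical labels chosen for illustration, and only the numbers come from Table 2.

```python
# Feature-set dimensions copied from Table 2, as (iDASH, MGH) pairs.
# The keys are illustrative labels, not names used by the paper.
DIMS = {
    "bow": (8704, 145991),
    "umls_all": (4751, 25457),
    "umls_5_groups": (4532, 24458),
    "umls_15_types": (3635, 18521),
}

# Dimensions reported for the combined "Bag-of-words + ..." rows.
REPORTED_COMBINED = {
    "umls_all": (13455, 171448),
    "umls_5_groups": (13236, 170449),
    "umls_15_types": (12339, 164512),
}


def combined_dims(concept_key):
    """Dimension of bag-of-words plus concept features.

    Assumption: the two feature vectors are simply concatenated,
    so the dimensions add.
    """
    return tuple(b + c for b, c in zip(DIMS["bow"], DIMS[concept_key]))


def check_all():
    """Compare the computed sums with the values reported in the table."""
    return {key: combined_dims(key) == reported
            for key, reported in REPORTED_COMBINED.items()}


if __name__ == "__main__":
    for key, ok in check_all().items():
        status = "matches table" if ok else "differs from table"
        print(key, combined_dims(key), status)
```

The paragraph vector row is the exception to this pattern: its dimension is the same fixed value (600) for both datasets and does not grow with vocabulary or concept count.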