Table 4 Top 25 word stems for BOWs according to the variable importance extracted from scikit-learn’s ExtraTreesClassifier and stemmed using nltk’s SnowballStemmer [17, 22]

From: Word2Vec inversion and traditional text classifiers for phenotyping lupus

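| Rank | Word | VIMP |
|------|------|------|
| 1 | C3 | 0.0311 |
| 2 | Sle | 0.0225 |
| 3 | Graviti | 0.0172 |
| 4 | Sole | 0.0126 |
| 5 | Phurin | 0.0113 |
| 6 | Epitheli | 0.0084 |
| 7 | C4 | 0.0065 |
| 8 | Yet | 0.0065 |
| 9 | Lymph | 0.0063 |
| 10 | Hemlymph | 0.0059 |
| 11 | Educ | 0.0059 |
| 12 | Resolv | 0.0054 |
| 13 | 912 | 0.0054 |
| 14 | Fatigu | 0.0050 |
| 15 | Thrombocytopenia | 0.0047 |
| 16 | 2500 | 0.0047 |
| 17 | Need | 0.0047 |
| 18 | Naugl | 0.0047 |
| 19 | Clot | 0.0043 |
| 20 | Screen | 0.0042 |
| 21 | Antidoubl | 0.0040 |
| 22 | Beat | 0.0040 |
| 23 | Acut | 0.0038 |
| 24 | 843identificationremov | 0.0038 |
| 25 | Pregnanc | 0.0036 |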

  1. A graph of the degradation of variable importance for these word stems can be found in Fig. 3.