Table 1 Variable selection methods to be used

From: Quad-phased data mining modeling for dementia diagnosis

| Variable selection method | Definition | Selection criterion |
|---|---|---|
| Chi-square test (univariate) | \( \chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j} \), where \( O_j \) is the observed frequency and \( E_j \) is the expected frequency of class \( j \) | p value < 0.05 |
| Decision tree: CHAID | Based on the chi-square test | Importance > 0.001 |
| Decision tree: CART | \( \mathrm{GINI}(t) = 1 - \sum_j [p(j \mid t)]^2 \), where \( p(j \mid t) \) is the relative frequency of class \( j \) at node \( t \); \( \mathrm{GINI}_{split} = \sum_{i=1}^{k} \frac{n_i}{n} \mathrm{GINI}(i) \), where the parent node is split into \( k \) partitions, \( n_i \) is the number of records in partition \( i \), and \( n \) is the number of records at the parent node | Importance > 0.001 |
| Decision tree: C4.5 | \( \mathrm{Entropy}(t) = -\sum_j p(j \mid t) \log p(j \mid t) \); \( \mathrm{GAIN}_{split} = \mathrm{Entropy}(p) - \sum_{i=1}^{k} \frac{n_i}{n} \mathrm{Entropy}(i) \), where \( p \) denotes the parent node and \( k \), \( n_i \), \( n \) are as defined for CART | Importance > 0.001 |
| Logistic regression: LR (1) | \( F(x) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)\right)} \) | p value < 0.05 |
| Logistic regression: LR (2) | Same model as LR (1) | p value < 0.01 |
* Note that the ‘importance’ used as the selection criterion for the decision trees differs from the ‘importance’ mentioned earlier. Here it is simply the weight assigned to a variable according to how much it contributes to classifying the samples as the tree grows.
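The chi-square screening row can be illustrated with a small, dependency-free sketch. It assumes that the univariate test is run as a test of independence between one categorical variable and the class label, so the sum in the formula runs over every cell of the variable-by-class contingency table, with expected counts taken as row total × column total / n. The helper names (`chi2_sf`, `chi_square_test`, `select_by_chi_square`) and the toy variables are hypothetical, for illustration only.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with df degrees of freedom."""
    if x <= 0 or df <= 0:
        return 1.0
    # Start from the closed form for df = 1 or df = 2, then step up two degrees of freedom
    # at a time: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    q, k = (math.erfc(math.sqrt(x / 2)), 1) if df % 2 else (math.exp(-x / 2), 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return min(q, 1.0)

def chi_square_test(values, labels):
    """Chi-square test of independence between one categorical variable and the class label."""
    n = len(labels)
    value_totals = Counter(values)
    class_totals = Counter(labels)
    observed = Counter(zip(values, labels))
    stat = 0.0
    for v, n_v in value_totals.items():
        for c, n_c in class_totals.items():
            expected = n_v * n_c / n          # E under independence
            stat += (observed[(v, c)] - expected) ** 2 / expected
    df = (len(value_totals) - 1) * (len(class_totals) - 1)
    return stat, chi2_sf(stat, df)

def select_by_chi_square(variables, labels, alpha=0.05):
    """Keep the variables whose chi-square p value is below alpha."""
    return [name for name, values in variables.items()
            if chi_square_test(values, labels)[1] < alpha]

# Hypothetical toy data: "D" and "N" are two class labels.
labels = ["D"] * 8 + ["N"] * 2 + ["D"] * 2 + ["N"] * 8
variables = {
    "var_a": ["a"] * 10 + ["b"] * 10,   # strongly associated with the label
    "var_b": ["x", "y"] * 10,           # balanced across both classes
}
print(select_by_chi_square(variables, labels))  # -> ['var_a']
```

In practice a library routine such as `scipy.stats.chi2_contingency` computes the same statistic; note that it applies Yates' continuity correction to 2 × 2 tables by default (`correction=True`), so its p values can differ slightly from this uncorrected sketch.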
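The Gini index and entropy/information-gain formulas listed for the decision-tree methods can be computed directly from the class labels at each node. The sketch below is a minimal illustration, not the tree-growing procedure used in the study. It assumes base-2 logarithms, since the table does not state a base; changing the base rescales entropy and gain by a constant factor and does not change which split scores best. All function names and the toy split are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(t) = -sum_j p(j|t) log2 p(j|t) for the class labels at node t."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """GINI(t) = 1 - sum_j p(j|t)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted(partitions, impurity):
    """sum_i (n_i / n) * impurity(i) over the k child partitions."""
    n = sum(len(part) for part in partitions)
    return sum(len(part) / n * impurity(part) for part in partitions)

def gain_split(parent, partitions):
    """GAIN_split = Entropy(parent) - weighted entropy of the children."""
    return entropy(parent) - weighted(partitions, entropy)

def gini_split(partitions):
    """GINI_split = weighted Gini index of the children."""
    return weighted(partitions, gini)

# Hypothetical split of a 10-record parent node into two children.
parent = ["D"] * 5 + ["N"] * 5
left, right = ["D"] * 4 + ["N"], ["D"] + ["N"] * 4
print(round(gain_split(parent, [left, right]), 4))  # -> 0.2781
print(round(gini_split([left, right]), 4))          # -> 0.32
```

A higher gain or a lower weighted Gini index marks a better split. Libraries expose the same choice; for example, scikit-learn's `DecisionTreeClassifier` takes `criterion="gini"` or `criterion="entropy"`, and its impurity-based `feature_importances_` attribute is one possible way to apply an importance threshold such as 0.001, though the table does not say which software produced its importance values.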
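For the logistic-regression rows, a p value per variable is needed before the 0.05 or 0.01 threshold can be applied. The table does not say how these p values were obtained, so the sketch below assumes one multivariable model \( F(x) = 1 / (1 + \exp(-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n))) \), fitted by maximum likelihood with Newton-Raphson, and a two-sided Wald test \( z = \hat\beta / \mathrm{SE}(\hat\beta) \) for each coefficient. It is written in plain Python to stay self-contained; the function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable F(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def invert(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def score_and_information(beta, rows, y):
    """Gradient of the log-likelihood and the Fisher information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, r)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += w * r[i] * r[j]
    return grad, info

def fit_logistic(X, y, iters=25, tol=1e-10):
    """Newton-Raphson maximum-likelihood fit; returns (beta, Wald p values)."""
    rows = [[1.0] + [float(v) for v in x] for x in X]   # column of 1s for beta_0
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        grad, info = score_and_information(beta, rows, y)
        step = [sum(c * g for c, g in zip(row, grad)) for row in invert(info)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    cov = invert(score_and_information(beta, rows, y)[1])
    # Two-sided Wald test: p = P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    pvals = [math.erfc(abs(b) / math.sqrt(cov[i][i]) / math.sqrt(2)) for i, b in enumerate(beta)]
    return beta, pvals

def select_by_logistic(X, y, names, alpha=0.05):
    """Keep variables whose Wald p value is below alpha (0.05 or 0.01 in Table 1)."""
    _, pvals = fit_logistic(X, y)
    return [name for name, p in zip(names, pvals[1:]) if p < alpha]   # skip intercept

# Hypothetical toy data: x1 is associated with the label, x2 is balanced noise.
X, y = [], []
for x1, label, count in [(1, 1, 16), (1, 0, 4), (0, 1, 4), (0, 0, 16)]:
    for i in range(count):
        X.append((x1, i % 2))   # x2 alternates 0/1 within every cell
        y.append(label)
print(select_by_logistic(X, y, ["x1", "x2"]))  # -> ['x1']
```

Newton-Raphson is used here only because it also yields the Fisher information needed for the standard errors; statistical packages (for example, the `Logit` model in statsmodels) report the same kind of Wald z tests. A stepwise or univariate procedure would be an equally plausible reading of the table and could select different variables.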