From: Quad-phased data mining modeling for dementia diagnosis
Variable Selection Method | Algorithm | Definition | Selection Criterion |
---|---|---|---|
Chi-square Test (univariate) | | \( \chi^2=\sum_j \frac{\left(O_j-E_j\right)^2}{E_j} \), where \( O_j \) is the observed frequency and \( E_j \) is the expected frequency of class \( j \) | p value < 0.05 |
Decision Tree | CHAID | (based on Chi-square Test) | Importance > 0.001 |
Decision Tree | CART | \( \mathrm{Entropy}(t)=-\sum_j p\left(j \mid t\right)\log p\left(j \mid t\right) \); \( \mathrm{GAIN}_{split}=\mathrm{Entropy}(p)-\left(\sum_{i=1}^{k}\frac{n_i}{n}\,\mathrm{Entropy}(i)\right) \) | Importance > 0.001 |
Decision Tree | C4.5 | \( \mathrm{GINI}(t)=1-\sum_j \left[p\left(j \mid t\right)\right]^2 \), where \( p\left(j \mid t\right) \) is the relative frequency of class \( j \) at node \( t \); \( \mathrm{GINI}_{split}=\sum_{i=1}^{k}\frac{n_i}{n}\,\mathrm{GINI}(i) \) | Importance > 0.001 |
Logistic Regression | LR (1) | \( F(x)=\frac{1}{1+\exp\left(-\left(\beta_0+\beta_1 x_1+\dots+\beta_n x_n\right)\right)} \) | p value < 0.05 |
Logistic Regression | LR (1) | \( F(x)=\frac{1}{1+\exp\left(-\left(\beta_0+\beta_1 x_1+\dots+\beta_n x_n\right)\right)} \) | p value < 0.01 |
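
The quantities defined in the table can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy labels, and the use of a base-2 logarithm (the table does not state a base) are assumptions made for illustration only, and the logistic function is written in its standard form \( 1/(1+\exp(-z)) \).

```python
import math
from collections import Counter


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over j of (O_j - E_j)^2 / E_j."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def entropy(labels):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t); base 2 is an assumption."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """GINI(t) = 1 - sum_j p(j|t)^2 for the class labels at node t."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain_split(parent, children):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gini_split(children):
    """GINI_split = sum_i (n_i / n) * GINI(child_i)."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)


def logistic(x, beta0, betas):
    """Standard logistic function applied to beta0 + sum_i beta_i * x_i."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy node: 4 dementia ("D") and 4 normal ("N") cases,
# split into two child nodes by some candidate variable.
parent = ["D"] * 4 + ["N"] * 4
children = [["D", "D", "D", "N"], ["D", "N", "N", "N"]]

if __name__ == "__main__":
    print("chi-square:", chi_square([30, 10], [20, 20]))
    print("gain of split:", gain_split(parent, children))
    print("gini of split:", gini_split(children))
    print("logistic:", logistic([0.0], 0.0, [1.0]))
```

A larger entropy gain or a smaller weighted Gini value indicates a purer split, which is the sense in which a candidate variable contributes importance in a tree-based selection step.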