Quad-phased data mining modeling for dementia diagnosis

Table 1 Variable selection methods to be used

Variable Selection Method		Definition	Selection Criterion
Chi-square Test (univariate)		\( \chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j} \), where \( O_j \) is the observed frequency and \( E_j \) is the expected frequency of class \( j \)	p value < 0.05
Decision Tree	CHAID	(based on Chi-square Test)	Importance > 0.001
	CART	\( \begin{array}{l} \mathrm{GINI}(t) = 1 - \sum_j \left[ p(j \mid t) \right]^2 \\ \mathrm{GINI}_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\, \mathrm{GINI}(i) \end{array} \), where \( p(j \mid t) \) is the relative frequency of class \( j \) at node \( t \)	Importance > 0.001
	C4.5	\( \begin{array}{l} \mathrm{Entropy}(t) = -\sum_j p(j \mid t) \log p(j \mid t) \\ \mathrm{GAIN}_{split} = \mathrm{Entropy}(p) - \sum_{i=1}^{k} \frac{n_i}{n}\, \mathrm{Entropy}(i) \end{array} \)	Importance > 0.001
Logistic Regression	LR (1)	\( F(x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n\right)\right)} \)	p value < 0.05
	LR (2)		p value < 0.01

* Note that the 'importance' used as the selection criterion for the decision trees differs from the aforementioned 'importance'. Here it is simply the weight assigned to a variable according to how much it contributes to classifying the samples as the tree grows.
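
As a rough illustration of the quantities defined in Table 1, the following Python sketch computes the chi-square statistic, the entropy and Gini measures with their split scores, and the logistic function. It is only a minimal sketch under stated assumptions: the function names are hypothetical, the logarithm is taken to base 2 (the table does not state a base), the logistic function is written in its standard form 1 / (1 + exp(-z)), and nothing here reproduces the software or data used in the study.

```python
import math
from collections import Counter


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over classes of (O_j - E_j)^2 / E_j."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def entropy(labels):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t) for the class labels at a node.

    Assumption: base-2 logarithm; the table leaves the base unspecified.
    """
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """GINI(t) = 1 - sum_j p(j|t)^2 for the class labels at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain_split(parent, children):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gini_split(children):
    """GINI_split = sum_i (n_i / n) * GINI(child_i)."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)


def logistic(x, beta0, betas):
    """F(x) = 1 / (1 + exp(-(beta0 + beta1*x1 + ... + betan*xn)))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy labels (not study data): a node with two classes that a
# candidate variable splits into two pure child nodes.
parent = ["dementia", "dementia", "normal", "normal"]
children = [["dementia", "dementia"], ["normal", "normal"]]
split_gain = gain_split(parent, children)
split_gini = gini_split(children)
```

A variable that separates the classes well yields a large `gain_split` or a small `gini_split`, which is the kind of contribution the tree-based importance scores in the table are meant to reflect.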

ISSN: 1472-6947