Table 1 Common feature reduction approaches for supervised machine learning

From: Creating sparser prediction models of treatment outcome in depression: a proof-of-concept study using simultaneous feature selection and hyperparameter tuning

| Method | Description | Examples | Evaluation |
| --- | --- | --- | --- |
| **Feature selection** | | | |
| Intrinsic/embedded methods | Feature selection is built into the learning algorithm and performed during training | Regularized regression models; decision trees | Computationally efficient; interconnected with the learning algorithm; no guarantee of optimal sparsity |
| Filter methods | Feature selection based on associations with the target variable | Associations are calculated using, e.g., correlations or ANOVA; the top N features (or N%) are retained for training | Computationally efficient; relations between features are ignored; independent of the learning algorithm |
| Wrapper methods | Selection of the best-performing subset of features | Recursive feature elimination; sequential forward selection | Extensive search over the input feature space; interconnected with the learning algorithm; considers relations between features; computationally expensive |
| **Feature transformation** | | | |
| Projection into a lower-dimensional feature space | Data are transformed and new features are created | Principal component analysis; multidimensional scaling; matrix factorization | |
| **Further methods of dimensionality reduction** | Alternative approaches to feature selection | | |

ANOVA, analysis of variance
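The sketches below illustrate each method family from the table. All datasets, parameter values, and variable names are illustrative assumptions, not details from the paper. First, embedded selection: an L1-regularized (lasso) regression performs selection during training by driving uninformative coefficients to exactly zero.

```python
# Minimal sketch of embedded feature selection via lasso regression.
# The synthetic dataset and alpha=1.0 are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=50, n_informative=5, random_state=0)

# The L1 penalty zeroes out weak coefficients during fitting, so feature
# selection happens inside the learning algorithm itself.
model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} of {X.shape[1]} features retained:", selected)
```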
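A filter method, by contrast, scores each feature against the target independently of any learning algorithm. A minimal sketch, assuming an ANOVA F-test score and an arbitrary choice of N = 10:

```python
# Minimal sketch of filter-based selection: keep the top N features by ANOVA F-score.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

# Each feature is scored against the target on its own, so relations between
# features are ignored and no learning algorithm is involved in the selection.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
X_reduced = selector.transform(X)
print("Reduced shape:", X_reduced.shape)  # (200, 10)
print("Kept feature indices:", selector.get_support(indices=True))
```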
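Wrapper methods search over feature subsets by repeatedly refitting a learning algorithm. A sketch using recursive feature elimination; the logistic regression estimator and the target of 10 features are illustrative assumptions:

```python
# Minimal sketch of wrapper-based selection via recursive feature elimination (RFE).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

# RFE refits the estimator and drops the weakest feature(s) at each step, so the
# search is tied to the learning algorithm and accounts for feature interplay,
# at a correspondingly higher computational cost.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)
print("Kept feature indices:", rfe.get_support(indices=True))
```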
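Finally, feature transformation creates new features rather than selecting existing ones. A sketch using principal component analysis, with an assumed choice of 10 components:

```python
# Minimal sketch of feature transformation via principal component analysis (PCA).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=200, n_features=50, random_state=0)

# Unlike feature selection, PCA constructs new features: each principal component
# is a linear combination of all original inputs, projecting the data into a
# lower-dimensional space that preserves maximal variance.
pca = PCA(n_components=10).fit(X)
X_projected = pca.transform(X)
print("Projected shape:", X_projected.shape)  # (200, 10)
print("Variance explained:", pca.explained_variance_ratio_.sum().round(3))
```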