Table 2 Workflow for SVM analyses

From: Enhancement of hepatitis virus immunoassay outcome predictions in imbalanced routine pathology data by data balancing and feature selection before the application of support vector machines

HBV

 Extract 9170 individuals with HBV recorded, of which 172 were positive and 8998 negative

 Split data into training (70%) and testing (30%) sets, giving 120 positive and 6300 negative in the training set of each split

 Either

Downsize the training data into 52 sets of 120 positive plus 120 negative

 Or

SMOTE the training data at 400% oversampling and 100% undersampling, leading to 52 sets of 3960 individuals with 1920 positive, 2040 negative

 Or

Multiply downsize the training data into 11 sets of 120 positive and 120 negative

 Then either

grow a random forest, pick the top five variables, then apply SVM using only those five variables

 Or

proceed straight to SVM
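
The set counts in the downsizing step follow from dividing the negative training cases into chunks the size of the positive class. The sketch below illustrates that arithmetic under two assumptions not stated in the table: that the negative chunks do not overlap, and that leftover negatives too few to fill a chunk are dropped. The function name `downsize_sets` and the placeholder identifiers are hypothetical.

```python
import random

def downsize_sets(positive_ids, negative_ids, seed=0):
    """Pair all positives with successive, non-overlapping chunks of negatives.

    Assumption: chunks are disjoint and any remainder smaller than one
    full chunk is discarded.
    """
    rng = random.Random(seed)
    negatives = list(negative_ids)
    rng.shuffle(negatives)
    size = len(positive_ids)
    n_sets = len(negatives) // size
    return [
        list(positive_ids) + negatives[i * size:(i + 1) * size]
        for i in range(n_sets)
    ]

# Placeholder identifiers sized like the HBV training data (120 positive, 6300 negative).
hbv_positive = [f"pos{i}" for i in range(120)]
hbv_negative = [f"neg{i}" for i in range(6300)]
hbv_sets = downsize_sets(hbv_positive, hbv_negative)
print(len(hbv_sets), len(hbv_sets[0]))  # 52 240
```

With 373 positives and 5100 negatives, the same division gives the 13 sets listed for HCV.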

HCV

 Extract 7820 individuals with HCV recorded with 533 positive, 7287 negative

 Split data into training (70%) and testing (30%) sets, giving 373 positive and 5100 negative in the training set of each split

 Either

Downsize the training data into 13 sets of 373 positive, 373 negative

 Or

SMOTE the training data at 400% oversampling and 100% undersampling, leading to 13 sets of 4797 individuals with 1492 positive, 1865 negative

 Or

Multiply downsize the training data into 11 sets of 373 positive and 373 negative

 Then either

grow a random forest, pick the top five variables, then apply SVM using only those five variables

 Or

proceed straight to SVM
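
For readers unfamiliar with the SMOTE option in the workflow above, the following is a minimal sketch of its oversampling half only: each synthetic case is placed at a random point on the line between a minority case and one of its nearest minority neighbours. It is not the implementation used in the study, which may differ in detail. The function name `smote`, the neighbour count `k=5`, and the toy data are assumptions made for illustration. Setting `n_per_sample=4` corresponds to 400% oversampling.

```python
import math
import random

def smote(minority, n_per_sample=4, k=5, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours.

    n_per_sample=4 mirrors 400% oversampling: four synthetic points
    per original minority point. k=5 is an assumed neighbour count.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, point in enumerate(minority):
        # Brute-force k nearest minority neighbours, excluding the point itself.
        others = [q for j, q in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda q: math.dist(point, q))[:k]
        for _ in range(n_per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment point -> neighbour
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(point, neighbour))
            )
    return synthetic

# Toy two-feature minority class; values are illustrative, not from the study.
toy_rng = random.Random(1)
toy_minority = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(20)]
toy_synthetic = smote(toy_minority)
print(len(toy_synthetic))  # 80
```

Because every synthetic point is a convex combination of two real minority cases, it always lies between them in feature space. The undersampling of the majority class is a separate step and is not shown here.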