HBV | |
 Extract 9170 individuals with HBV recorded, of which 172 positive and 8998 negative | |
 Split data into training (70%) and testing (30%), with 120 positive and 6300 negative in the training split | |
 Either | Downsize the training data into 52 sets of 120 positive plus 120 negative |
 Or | SMOTE the training data at 400% oversampling and 100% undersampling, leading to 52 sets of 3960 individuals with 1920 positive, 2040 negative |
 Or | Multiply downsize the training data into 11 sets of 120 positive and 120 negative |
 Then either | Grow a random forest, pick the top five variables, then apply SVM using those five variables |
 Or | Proceed straight to SVM |
HCV | |
 Extract 7820 individuals with HCV recorded, of which 533 positive and 7287 negative | |
 Split data into training (70%) and testing (30%), with 373 positive and 5100 negative in the training split | |
 Either | Downsize the training data into 13 sets of 373 positive, 373 negative |
 Or | SMOTE the training data at 400% oversampling and 100% undersampling, leading to 13 sets of 4797 individuals with 1492 positive, 1865 negative |
 Or | Multiply downsize the training data into 11 sets of 373 positive and 373 negative |
 Then either | Grow a random forest, pick the top five variables, then apply SVM using those five variables |
 Or | Proceed straight to SVM |
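
The set counts in the first "Either" rows (52 for HBV, 13 for HCV) match what one gets by cutting the training negatives into non-overlapping chunks the size of the positive class and pairing each chunk with all positives. The sketch below illustrates that reading in Python. It is an assumption about how the sets were formed, not a description of the original implementation; the function name `balanced_subsets`, the seed, and the placeholder IDs are hypothetical, and the 11-set "multiply downsize" variant and the SMOTE variant are not covered.

```python
import random

def balanced_subsets(positives, negatives, seed=0):
    """Pair every positive with disjoint, equally sized chunks of negatives.

    Assumption: negatives are shuffled once and split without overlap;
    any negatives left over after the last full chunk are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    k = len(positives)
    n_sets = len(shuffled) // k
    return [
        (list(positives), shuffled[i * k:(i + 1) * k])
        for i in range(n_sets)
    ]

# Training-split sizes taken from the table; the IDs are placeholders.
hbv_pos = [f"hbv_pos_{i}" for i in range(120)]
hbv_neg = [f"hbv_neg_{i}" for i in range(6300)]
hbv_sets = balanced_subsets(hbv_pos, hbv_neg)

hcv_pos = [f"hcv_pos_{i}" for i in range(373)]
hcv_neg = [f"hcv_neg_{i}" for i in range(5100)]
hcv_sets = balanced_subsets(hcv_pos, hcv_neg)

print(len(hbv_sets), len(hcv_sets))
```

Each returned pair is one balanced training set, so a classifier (for example the random forest followed by SVM, or SVM alone) could be fitted once per set and the results combined across sets.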