Process | Step | Variables (+Target Classes) | Patients (N) |
---|---|---|---|
First CRC Dataset | | 142 (+ 1) | 1511 |
Chart Review | 1) Check extraction method and location | 142 (+ 1) | 1508 |
Chart Review | 2) Check for inappropriate data | 142 (+ 1) | 1496 |
Chart Review | 3) Select priority variables (First Processed CRC Dataset) | 40 (+ 1) | 1496 |
Data Preprocessing | 1) Drop redundant variables | 37 (+ 1) | 1496 |
Data Preprocessing | 2) Drop variables including 90% ↑ missing values | 32 (+ 1) | 1496 |
Data Preprocessing | 3) Drop instances containing missing values | 32 (+ 1) | 1169 |
Data Preprocessing | 4) One-hot encoding (Final CRC Dataset) | 54 (+ 5) | 1169 |
Data Split | 1) Data split (training/testing) | 54 (+ 5) | 935 / 234 |
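
The preprocessing and split rows of the table can be illustrated with a minimal plain-Python sketch. It is not the authors' code: the function names, the toy column names (`age`, `stage`, `mostly_missing`), the reading of "90% ↑" as "at least 90%", the 20% test fraction, and the random seed are all assumptions made for illustration. The table itself only reports the resulting counts; an 80/20 split of 1169 patients happens to give 935 / 234, which matches the last row.

```python
import random


def drop_sparse_columns(rows, threshold=0.9):
    """Drop columns whose fraction of missing (None) values is >= threshold.

    Assumption: "90% ↑" in the table is read here as "at least 90%".
    """
    if not rows:
        return []
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(r[col] is None for r in rows) / n < threshold
    ]
    return [{col: r[col] for col in keep} for r in rows]


def drop_incomplete_rows(rows):
    """Drop every instance that still contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot_encode(rows, categorical):
    """Replace each categorical column with one 0/1 column per observed level."""
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical}
        for col in categorical:
            for level in levels[col]:
                out[f"{col}_{level}"] = 1 if r[col] == level else 0
        encoded.append(out)
    return encoded


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test).

    Assumption: the 20% test fraction and the seed are illustrative.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy records with hypothetical variables (not the CRC dataset's columns).
toy_rows = [
    {"age": 61, "stage": "II", "mostly_missing": None},
    {"age": 70, "stage": "III", "mostly_missing": None},
    {"age": None, "stage": "II", "mostly_missing": None},
    {"age": 55, "stage": "I", "mostly_missing": None},
]

processed = drop_sparse_columns(toy_rows)            # removes "mostly_missing"
processed = drop_incomplete_rows(processed)          # removes the row with age=None
processed = one_hot_encode(processed, ["stage"])     # stage -> stage_I, stage_II, stage_III
train, test = train_test_split(processed)
```

One-hot encoding is the step that increases the column count in the table (32 to 54 variables, 1 to 5 target columns), since each categorical variable expands into one indicator column per level. In practice a data-frame library would usually replace these hand-written helpers.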