Table 2 Dataset changes due to chart review and data preprocessing

From: A data-driven approach to a chemotherapy recommendation model based on deep learning for patients with colorectal cancer in Korea

| Process | Variables (+ Target Classes) | Patients (N) |
| --- | --- | --- |
| First CRC Dataset | 142 (+ 1) | 1511 |
| Chart Review | | |
| 1) Check extraction method and location | 142 (+ 1) | 1508 |
| 2) Check for inappropriate data | 142 (+ 1) | 1496 |
| 3) Select priority variables (First Processed CRC Dataset) | 40 (+ 1) | 1496 |
| Data Preprocessing | | |
| 1) Drop redundant variables | 37 (+ 1) | 1496 |
| 2) Drop variables with ≥ 90% missing values | 32 (+ 1) | 1496 |
| 3) Drop instances containing missing values | 32 (+ 1) | 1169 |
| 4) One-hot encoding (Final CRC Dataset) | 54 (+ 5) | 1169 |
| Data Split | | |
| 1) Data split (training / testing) | 54 (+ 5) | 935 / 234 |
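
The preprocessing and split rows above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the toy records, the column names (`age`, `stage`, `marker`, `note`), the use of `None` for missing values, the random seed, and the 0.8 training fraction are all assumptions made for illustration. The table reports only the resulting counts, and 935 / 234 is consistent with an approximately 80/20 split of 1169 patients. The "drop redundant variables" step is omitted because the table does not say which variables were redundant.

```python
import random

MISSING = None  # assumed marker for a missing value in this toy example


def drop_sparse_columns(rows, threshold=0.9):
    """Drop columns whose fraction of missing values is >= threshold."""
    columns = list(rows[0].keys())
    keep = [
        c for c in columns
        if sum(r[c] is MISSING for r in rows) / len(rows) < threshold
    ]
    return [{c: r[c] for c in keep} for r in rows]


def drop_incomplete_rows(rows):
    """Drop instances that still contain any missing value."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]


def one_hot(rows, categorical):
    """Replace each categorical column with one 0/1 column per category."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical}
        for c in categorical:
            for level in levels[c]:
                out[f"{c}_{level}"] = int(r[c] == level)
        encoded.append(out)
    return encoded


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy and split it; the training size is rounded down."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical records; "note" is missing everywhere and gets dropped.
rows = [
    {"age": 61, "stage": "II", "marker": None, "note": None},
    {"age": 55, "stage": "III", "marker": 1.2, "note": None},
    {"age": None, "stage": "II", "marker": 0.8, "note": None},
    {"age": 70, "stage": "III", "marker": 2.1, "note": None},
    {"age": 48, "stage": "II", "marker": 1.7, "note": None},
]

dense = drop_sparse_columns(rows)        # removes the "note" column
complete = drop_incomplete_rows(dense)   # removes the two incomplete rows
encoded = one_hot(complete, ["stage"])   # "stage" becomes stage_II, stage_III
train, test = train_test_split(encoded)
```

As in the table, dropping sparse columns reduces the variable count without changing the number of patients, dropping incomplete instances does the reverse, and one-hot encoding increases the column count (32 to 54 variables and 1 to 5 target classes in the table) while leaving the patient count unchanged.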