Table 4 Completeness and correctness before and after data cleaning (top 10 variables with the largest volume)

From: An automated data cleaning method for Electronic Health Records by incorporating clinical knowledge

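Completeness is reported as the percentage of missing values (%); correctness is reported as the percentage of normal values (%).

| Test name | Completeness: original | Completeness: after preprocessing | Completeness: after unit change | Completeness: after all steps | Correctness: original | Correctness: after all steps | Number of observations |
|---|---|---|---|---|---|---|---|
| Hemoglobin | 0 | 0.02 | 0.02 | 0.03 | 92.68 | 92.71 | 1,061,333 |
| Lymphocyte | 0 | 0.03 | 0.03 | 0.04 | 14.23 | 54.10 | 1,060,664 |
| Eosinophils | 0 | 0.03 | 0.03 | 3.87 | 15.71 | 96.43 | 1,055,109 |
| Monocyte | 0 | 0.03 | 0.03 | 0.26 | 14.12 | 69.93 | 1,053,768 |
| Basophil | 0 | 0.06 | 0.06 | 0.06 | 17.91 | 73.09 | 1,027,615 |
| Hematocrit | 0 | 0.02 | 0.02 | 0.02 | 94.02 | 97.17 | 1,025,484 |
| Erythrocyte | 0 | 0.08 | 0.08 | 0.09 | 70.61 | 74.82 | 1,025,399 |
| Leukocyte | 0 | 0.10 | 0.10 | 9.70 | 34.04 | 98.21 | 1,013,012 |
| MCV | 0 | 0.03 | 0.03 | 0.24 | 92.53 | 92.80 | 1,003,326 |
| MCH | 0 | 0.03 | 0.03 | 0.24 | 84.16 | 84.42 | 997,867 |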
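
The two metrics in the table can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the sample values, and the reference range below are hypothetical. The sketch also assumes that correctness is computed over non-missing values only, which the table does not state.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_pct(values):
    """Completeness metric: percentage of entries that are missing."""
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(1 for v in values if _is_missing(v))
    return 100.0 * missing / len(values)


def normal_pct(values, low, high):
    """Correctness metric: percentage of non-missing entries in [low, high].

    The denominator (non-missing entries) is an assumption of this sketch.
    """
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return 0.0
    normal = sum(1 for v in present if low <= v <= high)
    return 100.0 * normal / len(present)


# Hypothetical hemoglobin readings and reference range, for illustration only.
readings = [13.5, None, 9.0, 15.2, 30.0]
print(missing_pct(readings))             # 1 of 5 entries is missing
print(normal_pct(readings, 12.0, 17.5))  # 2 of 4 present entries are in range
```

Applying both functions before and after each cleaning step would give one row of the table per test, provided the reference range matches the unit of the recorded values.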