Table 3 PHI category breakdown in gold standard corpus.

From: Automated de-identification of free-text medical records

| PHI Type | Original Count/Distribution | Added PHI (Enrichment) | Total Count/Distribution After Enrichment |
|---|---|---|---|
| Patient Name | 34 (2.17%) | 20 | 54 (3.04%) |
| Patient Name Initial | 2 (0.13%) | 0 | 2 (0.11%) |
| Relative/Proxy Name | 125 (7.97%) | 50 | 175 (9.84%) |
| Clinician Name | 518 (33.04%) | 75 | 593 (33.33%) |
| Date (not year) | 475 (30.29%) | 6 | 482 (27.09%) |
| Year | 42 (2.68%) | 4 | 46 (2.59%) |
| Location | 328 (20.92%) | 40 | 367 (20.63%) |
| Phone | 37 (2.36%) | 16 | 53 (2.98%) |
| Age over 89 | 4 (0.26%) | 0 | 4 (0.22%) |
| Undefined | 3 (0.19%) | 0 | 3 (0.17%) |
| Total | 1,568 | 211 | 1,779 |
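
The distribution percentages in the table appear to be each category's share of its column total (1,568 before enrichment, 1,779 after). A minimal Python sketch recomputes them and flags any row whose printed total differs from original plus added. The counts are transcribed from the table; the names `TABLE3`, `shares`, and `mismatched` are chosen here for illustration and do not come from the source.

```python
# Counts transcribed from Table 3: (original, added, total after enrichment).
TABLE3 = {
    "Patient Name": (34, 20, 54),
    "Patient Name Initial": (2, 0, 2),
    "Relative/Proxy Name": (125, 50, 175),
    "Clinician Name": (518, 75, 593),
    "Date (not year)": (475, 6, 482),
    "Year": (42, 4, 46),
    "Location": (328, 40, 367),
    "Phone": (37, 16, 53),
    "Age over 89": (4, 0, 4),
    "Undefined": (3, 0, 3),
}


def shares(counts):
    """Return each category's percentage of the column total, rounded to 2 places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Assumption: percentages are taken relative to the column totals.
original_pct = shares({name: row[0] for name, row in TABLE3.items()})
enriched_pct = shares({name: row[2] for name, row in TABLE3.items()})

# Rows where original + added does not equal the printed post-enrichment total.
mismatched = [name for name, (orig, added, total) in TABLE3.items()
              if orig + added != total]

for name in TABLE3:
    print(f"{name}: {original_pct[name]}% -> {enriched_pct[name]}%")
print("Rows to double-check:", mismatched)
```

With the counts as printed, the recomputed shares match the table to two decimals. The Date (not year) and Location rows are flagged: 475 + 6 and 328 + 40 each differ from the printed totals by one, while the column totals still agree.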