Table 3 PHI category breakdown in gold standard corpus.

From: Automated de-identification of free-text medical records

| PHI Type | Original Count/Distribution | Added PHI (Enrichment) | Total Count/Distribution After Enrichment |
|---|---|---|---|
| Patient Name | 34 (2.17%) | 20 | 54 (3.04%) |
| Patient Name Initial | 2 (0.13%) | 0 | 2 (0.11%) |
| Relative/Proxy Name | 125 (7.97%) | 50 | 175 (9.84%) |
| Clinician Name | 518 (33.04%) | 75 | 593 (33.33%) |
| Date (not year) | 475 (30.29%) | 6 | 482 (27.09%) |
| Year | 42 (2.68%) | 4 | 46 (2.59%) |
| Location | 328 (20.92%) | 40 | 367 (20.63%) |
| Phone | 37 (2.36%) | 16 | 53 (2.98%) |
| Age over 89 | 4 (0.26%) | 0 | 4 (0.22%) |
| Undefined | 3 (0.19%) | 0 | 3 (0.17%) |
| Total | 1,568 | 211 | 1,779 |
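
The percentages in Table 3 appear to be each category's share of its column total (1,568 before enrichment, 1,779 after). The following is a minimal sketch of that arithmetic, not code from the paper; the function name `share` and the variable names are illustrative assumptions, and only a few rows from the table are reproduced.

```python
# Counts copied from Table 3; names below are illustrative, not from the paper.
ORIGINAL_TOTAL = 1568   # total PHI instances before enrichment
ENRICHED_TOTAL = 1779   # total PHI instances after enrichment


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# (category, original count, count after enrichment)
rows = [
    ("Patient Name", 34, 54),
    ("Clinician Name", 518, 593),
    ("Phone", 37, 53),
]

for name, before, after in rows:
    print(f"{name}: {share(before, ORIGINAL_TOTAL)}% -> {share(after, ENRICHED_TOTAL)}%")
```

Under this reading, Clinician Name stays the largest category both before and after enrichment, at roughly one third of all PHI instances.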