Table 3 Performance of de-identification on simulated data

From: CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital

| Mutator type | True positives | False positives | False negatives | Precision (%) | Recall (%) |
| --- | --- | --- | --- | --- | --- |
| Character substitution (3%) | 8 191 | 538 | 391 | 93.9 | 95.5 |
| Character substitution (10%) | 7 740 | 447 | 826 | 94.6 | 90.4 |
| Character substitution (20%) | 6 969 | 271 | 1 537 | 96.3 | 82 |
| Address alias substitution | 8 171 | 486 | 455 | 94.4 | 94.8 |
| Address token removal | 2 761 | 99 | 237 | 96.6 | 92.1 |
| OCR (3% char. sub., 3% white space) | 8 464 | 160 | 1 555 | 98.2 | 84.5 |
| OCR (10% char. sub., 10% white space) | 5 327 | 180 | 7 282 | 96.8 | 42.3 |
| OCR (20% char. sub., 20% white space) | 1 802 | 151 | 14 719 | 92.3 | 11.0 |
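
The precision and recall columns follow from the true positive, false positive and false negative counts in each row. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), expressed as percentages. The function name `precision_recall` is hypothetical and is not taken from the CogStack code base. The source does not state its rounding convention, and values computed this way can differ from the published percentages by about 0.1.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) as percentages from raw counts.

    Hypothetical helper for illustration; uses the standard definitions
    precision = TP / (TP + FP) and recall = TP / (TP + FN).
    """
    # Guard against empty denominators so a row of zeros does not raise.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts taken from the "Character substitution (3%)" row of the table.
p, r = precision_recall(tp=8191, fp=538, fn=391)
print(f"precision={p:.1f}% recall={r:.1f}%")
```

Applying the same function to the OCR rows shows why recall falls so sharply there: the false negative count grows from 1 555 to 14 719 while the false positive count stays nearly flat, so precision remains high as recall drops.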