
Table 3 Performance of de-identification on simulated data

From: CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital

Mutator type                           True positives  False positives  False negatives  Precision (%)  Recall (%)
Character substitution (3%)            8,191           538              391              93.9           95.5
Character substitution (10%)           7,740           447              826              94.6           90.4
Character substitution (20%)           6,969           271              1,537            96.3           82.0
Address alias substitution             8,171           486              455              94.4           94.8
Address token removal                  2,761           99               237              96.6           92.1
OCR (3% char. sub., 3% white space)    8,464           160              1,555            98.2           84.5
OCR (10% char. sub., 10% white space)  5,327           180              7,282            96.8           42.3
OCR (20% char. sub., 20% white space)  1,802           151              14,719           92.3           11.0
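The precision and recall columns can be checked against the raw counts. The sketch below assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), expressed as percentages. The table does not state these formulas, so treat them as an assumption. The counts and published percentages are copied from the table; the names `TABLE_3`, `precision` and `recall` are illustrative only. Recomputed this way, every value agrees with the published figure to within 0.1 percentage points, so small last-digit differences (e.g. 96.5 vs 96.6 for address token removal) are rounding effects rather than errors in the counts.

```python
"""Recompute precision and recall for the de-identification results in Table 3.

Counts and published percentages are copied from the table. The metric
definitions are the standard ones and are an assumption, not quoted from the paper.
"""

# (mutator, true positives, false positives, false negatives,
#  published precision %, published recall %)
TABLE_3 = [
    ("Character substitution (3%)", 8191, 538, 391, 93.9, 95.5),
    ("Character substitution (10%)", 7740, 447, 826, 94.6, 90.4),
    ("Character substitution (20%)", 6969, 271, 1537, 96.3, 82.0),
    ("Address alias substitution", 8171, 486, 455, 94.4, 94.8),
    ("Address token removal", 2761, 99, 237, 96.6, 92.1),
    ("OCR (3% char. sub., 3% white space)", 8464, 160, 1555, 98.2, 84.5),
    ("OCR (10% char. sub., 10% white space)", 5327, 180, 7282, 96.8, 42.3),
    ("OCR (20% char. sub., 20% white space)", 1802, 151, 14719, 92.3, 11.0),
]


def precision(tp: int, fp: int) -> float:
    """Percentage of flagged items that were true identifiers: TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Percentage of true identifiers that were flagged: TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


if __name__ == "__main__":
    # Print recomputed values next to the published ones for comparison.
    for name, tp, fp, fn, pub_p, pub_r in TABLE_3:
        p, r = precision(tp, fp), recall(tp, fn)
        print(f"{name:<38} P={p:5.1f} (published {pub_p})  R={r:5.1f} (published {pub_r})")
```

Running it makes the pattern in the table easy to see: as OCR noise increases, precision stays above 92% while recall falls from 84.5% to 11.0%, because false negatives grow from 1,555 to 14,719.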