Figure 2 | BMC Medical Informatics and Decision Making

Figure 2

From: Improved de-identification of physician notes through integrative modeling of both public and private medical text

Phases of the Scrubber annotation pipeline. Lexical Phase: split document into sentences, tag part of speech for each token. Frequency Phase: calculate term frequency with and without part of speech tag. Dictionary Phase: search for each word/phrase in ten medical dictionaries. Known PHI Phase: match US census names and textual patterns for each PHI type.
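
The four phases in the caption can be read as successive feature extractors applied to every token. The following is a minimal Python sketch of that idea, not the Scrubber implementation: the word lists, the two regular expressions, the `placeholder_tag` function (a crude stand-in for a real part-of-speech tagger), and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the caption refers to ten medical dictionaries and
# US census name lists; these tiny sets only illustrate the lookups.
MEDICAL_TERMS = {"aspirin", "hypertension"}
CENSUS_NAMES = {"john", "smith"}
PHI_PATTERNS = {
    "DATE": re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"),
    "PHONE": re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def placeholder_tag(token):
    """Crude placeholder, not a real part-of-speech tagger."""
    if token[0].isdigit():
        return "NUM"
    return "CAP" if token[0].isupper() else "LOWER"


def lexical_phase(text):
    """Lexical phase: split into sentences, then tag each token."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tagged = []
    for sentence in sentences:
        for token in re.findall(r"[A-Za-z]+|\d+(?:[/-]\d+)*", sentence):
            tagged.append((token, placeholder_tag(token)))
    return tagged


def frequency_phase(tagged):
    """Frequency phase: term counts without and with the tag."""
    by_term = Counter(token.lower() for token, _ in tagged)
    by_term_and_tag = Counter((token.lower(), tag) for token, tag in tagged)
    return by_term, by_term_and_tag


def dictionary_phase(token):
    """Dictionary phase: is the token a known medical term?"""
    return token.lower() in MEDICAL_TERMS


def known_phi_phase(token):
    """Known PHI phase: match name lists, then textual patterns."""
    if token.lower() in CENSUS_NAMES:
        return "NAME"
    for phi_type, pattern in PHI_PATTERNS.items():
        if pattern.fullmatch(token):
            return phi_type
    return None


def annotate(text):
    """Run all four phases and return one feature dict per token."""
    tagged = lexical_phase(text)
    by_term, by_term_and_tag = frequency_phase(tagged)
    return [
        {
            "token": token,
            "tag": tag,
            "tf": by_term[token.lower()],
            "tf_tag": by_term_and_tag[(token.lower(), tag)],
            "in_dictionary": dictionary_phase(token),
            "phi_type": known_phi_phase(token),
        }
        for token, tag in tagged
    ]


rows = annotate("John Smith takes aspirin. Seen on 3/14/2012.")
```

Each entry in `rows` carries the features from all four phases for one token, which is the kind of per-token record a downstream classifier could consume.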
