Figure 2. From: Improved de-identification of physician notes through integrative modeling of both public and private medical text

Phases of the Scrubber annotation pipeline. Lexical Phase: split the document into sentences and tag the part of speech of each token. Frequency Phase: calculate term frequencies both with and without part-of-speech tags. Dictionary Phase: search for each word or phrase in ten medical dictionaries. Known PHI Phase: match US census names and textual patterns for each type of protected health information (PHI).
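The four phases can be sketched in Python. This is a minimal toy sketch under stated assumptions, not the Scrubber implementation. The capitalization-based tagger `toy_pos_tag`, the two-entry `MEDICAL_DICTIONARIES`, the three-name `CENSUS_NAMES` sample, and the DATE/PHONE regexes in `PHI_PATTERNS` are all hypothetical stand-ins for the real part-of-speech tagger, the ten medical dictionaries, the US census name lists, and the per-type PHI patterns.

```python
import re
from collections import Counter

# Hypothetical stand-ins (assumptions, not the real resources): the actual
# pipeline uses a trained POS tagger, ten medical dictionaries, and full
# US census name lists.
MEDICAL_DICTIONARIES = {
    "drugs": {"aspirin", "metformin"},
    "conditions": {"diabetes", "hypertension"},
}
CENSUS_NAMES = {"smith", "john", "mary"}
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def toy_pos_tag(token):
    """Crude placeholder tagger: digits -> CD, capitalized -> NNP, else NN."""
    if token.isdigit():
        return "CD"
    if token[:1].isupper():
        return "NNP"
    return "NN"


def lexical_phase(text):
    """Split text into sentences, then tag each token with a part of speech."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [[(tok, toy_pos_tag(tok)) for tok in re.findall(r"[A-Za-z0-9]+", s)]
            for s in sentences]


def frequency_phase(tagged_sentences):
    """Count lowercased term frequencies both without and with the POS tag."""
    plain, with_pos = Counter(), Counter()
    for sentence in tagged_sentences:
        for tok, tag in sentence:
            plain[tok.lower()] += 1
            with_pos[(tok.lower(), tag)] += 1
    return plain, with_pos


def dictionary_phase(tagged_sentences):
    """Map each token found in any dictionary to the names of those dictionaries."""
    hits = {}
    for sentence in tagged_sentences:
        for tok, _ in sentence:
            found = [name for name, terms in MEDICAL_DICTIONARIES.items()
                     if tok.lower() in terms]
            if found:
                hits[tok.lower()] = found
    return hits


def known_phi_phase(text, tagged_sentences):
    """Flag tokens that are census names, plus regex matches per PHI type."""
    names = {tok for s in tagged_sentences for tok, _ in s
             if tok.lower() in CENSUS_NAMES}
    patterns = {phi: pat.findall(text) for phi, pat in PHI_PATTERNS.items()}
    return names, patterns


note = "Mary Smith was seen on 03/14/2009. She takes metformin for diabetes."
tagged = lexical_phase(note)
plain, with_pos = frequency_phase(tagged)
print(len(tagged))
print(dictionary_phase(tagged))
print(known_phi_phase(note, tagged))
```

Keeping counts both with and without the tag lets a later step tell apart, say, a word tagged as a proper noun from the same word tagged otherwise; how the real pipeline uses these counts is not specified in this caption.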