Figure 3 | BMC Medical Informatics and Decision Making

Figure 3

From: Improved de-identification of physician notes through integrative modeling of both public and private medical text

Classifier results. Lexical features include part of speech, capitalization usage, and token length. Frequency features refer to each token's frequency across 10,000 medical journal publications. UMLS features refer to the number of matches for each token or phrase in ten medical dictionaries. Known PHI features include the US Census List and the patterns previously provided by Beckwith et al. The baseline classifier uses all feature groups with the J48 algorithm. Boosting used ten iterations of the AdaBoost method. The false positive filter was applied in the final score to address potential false positives created during the boosting process.
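
As an illustration of the lexical feature group named in the caption, the sketch below computes capitalization usage and token length for a single token. It is a minimal example, not the authors' implementation: the function name `lexical_features`, the category labels, and the dictionary keys are all assumptions made here, and part of speech is omitted because it requires a tagger that the caption does not specify.

```python
def lexical_features(token: str) -> dict:
    """Return two simple lexical features for one token.

    The feature names and capitalization categories are hypothetical;
    the source only states that capitalization usage and token length
    are among the lexical features.
    """
    if token.isupper():
        # Every cased character is uppercase, e.g. "MRI"
        capitalization = "all_caps"
    elif token[:1].isupper():
        # Only the first character is known to be uppercase, e.g. "Smith"
        capitalization = "initial_cap"
    elif any(ch.isupper() for ch in token):
        # An uppercase letter appears after the first position, e.g. "mRNA"
        capitalization = "mixed"
    else:
        # No uppercase letters at all, e.g. "mg" or "120"
        capitalization = "lower"
    return {"capitalization": capitalization, "length": len(token)}


if __name__ == "__main__":
    for tok in ["Smith", "MRI", "mRNA", "mg"]:
        print(tok, lexical_features(tok))
```

Features of this kind would be combined with the frequency, UMLS, and known PHI feature groups before being passed to a classifier.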