Natural language processing to identify lupus nephritis phenotype in electronic health records

Table 2 Model performance

Dataset	Algorithm	Sensitivity	Specificity	PPV	NPV	F Measure
NU (testing set)	Baseline	0.43	0.60	0.39	0.64	0.41
NU (testing set)	Regex + structured	0.49	0.93	0.81	0.76	0.61
NU (testing set)	Full MetaMap (binary)	0.63	0.92	0.82	0.81	0.71
NU (testing set)	Full MetaMap (counts)	0.60	0.95	0.88	0.80	0.71
NU (testing set)	MetaMap mixed	0.74	0.92	0.84	0.86	0.79
VUMC	Baseline	0.86	0.67	0.38	0.95	0.52
VUMC	MetaMap mixed	0.93	0.98	0.93	0.98	0.93

For logistic regression-based models, a predicted probability of 0.5 is used as the classification threshold.
Abbreviations: SLE systemic lupus erythematosus, NU Northwestern University, VUMC Vanderbilt University Medical Center, NLP natural language processing, PPV positive predictive value, NPV negative predictive value
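The paper's evaluation code is not shown here, so the following is only a minimal sketch, assuming binary patient-level gold labels and one predicted probability per patient. It illustrates how the 0.5 threshold in the footnote turns probabilities into labels and how the five Table 2 metrics follow from the resulting confusion matrix. The names `classify`, `evaluate`, `f_measure` and `Metrics` are hypothetical, and counting a probability exactly equal to 0.5 as positive is an assumption. The F-measure is computed as the harmonic mean of PPV and sensitivity, which is consistent with the reported values (for NU MetaMap mixed, 2 × 0.84 × 0.74 / (0.84 + 0.74) ≈ 0.79); small mismatches in other rows, such as the VUMC baseline (≈ 0.53 vs 0.52), presumably reflect rounding of the reported inputs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f_measure: float


def classify(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a case positive (1) when its predicted probability meets the threshold.

    Assumption: a probability exactly equal to the threshold counts as positive.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def f_measure(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute sensitivity, specificity, PPV, NPV and F-measure from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    sens = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return Metrics(
        sensitivity=sens,
        specificity=ratio(tn, tn + fp),
        ppv=ppv,
        npv=ratio(tn, tn + fn),
        f_measure=f_measure(ppv, sens),
    )


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    gold = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    print(evaluate(gold, classify(probs)))
    # Consistency check against the NU "MetaMap mixed" row of Table 2.
    print(round(f_measure(0.84, 0.74), 2))  # 0.79
```

Because PPV and NPV depend on how common lupus nephritis is in each cohort, they are not directly comparable across NU and VUMC in the way sensitivity and specificity are.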

ISSN: 1472-6947