Table 1 Summary of Testing and Training Data Available for Algorithm Development

From: Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes

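| Smoking status¹ | i2b2 Smoking Status, Train (n = 398) | i2b2 Smoking Status, Test (n = 104) | Local EHR Smoking Status, Train (N = 533) | Local EHR Smoking Status, Test (N = 223) | Local EHR Pack years, Train (N = 84) | Local EHR Pack years, Test (N = 36) | Local EHR Cessation Date, Train (N = 54) | Local EHR Cessation Date, Test (N = 19) |
|---|---|---|---|---|---|---|---|---|
| Never | 66 | 16 | 117 | 51 | | | | |
| Ever | 80 | 25 | 139 | 64 | 84 | 26 | 54 | 19 |
| Former | 36 | 11 | 71 | 30 | 38 | 12 | 54 | 19 |
| Current | 35 | 11 | 58 | 31 | 39 | 23 | | |
| Smoker | 9 | 3 | 10 | 3 | 7 | 1 | | |
| Unknown | 252 | 63 | 277 | 108 | | | | |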
  1. Distribution of annotations for smoking status, pack years, and cessation date in the training and testing data from the i2b2 Challenge and our local EHR. Smoking status was determined by manual review, with each note classified as never smoker, former smoker, current smoker, smoker with temporality unknown (referred to as smoker), or no smoking status information (referred to as unknown). For the local EHR pack-year and cessation-date counts, we indicate the number of notes in which this information was identified by manual review.
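
As a reading aid, the smoking-status columns can be cross-checked with a short script. This is a minimal sketch, not part of the original article: the `counts` dictionary and the `split_totals` helper are hypothetical names introduced here. It assumes that the "Ever" row is the sum of the Former, Current, and Smoker rows, and that each split total is Never + Ever + Unknown. Both relationships are inferred from the counts and are not stated in the source.

```python
# Smoking-status counts copied from Table 1 as (train, test) pairs per data source.
counts = {
    "i2b2": {
        "Never": (66, 16),
        "Former": (36, 11),
        "Current": (35, 11),
        "Smoker": (9, 3),
        "Unknown": (252, 63),
    },
    "local_ehr": {
        "Never": (117, 51),
        "Former": (71, 30),
        "Current": (58, 31),
        "Smoker": (10, 3),
        "Unknown": (277, 108),
    },
}


def split_totals(by_class):
    """Return [(ever, total) for train, (ever, total) for test].

    Assumption (inferred from the table, not stated in the source):
    ever = Former + Current + Smoker, and total = Never + ever + Unknown.
    """
    result = []
    for i in range(2):  # 0 = train, 1 = test
        ever = sum(by_class[c][i] for c in ("Former", "Current", "Smoker"))
        total = ever + by_class["Never"][i] + by_class["Unknown"][i]
        result.append((ever, total))
    return result


for source, by_class in counts.items():
    (ever_train, n_train), (ever_test, n_test) = split_totals(by_class)
    print(source, "train:", ever_train, n_train, "test:", ever_test, n_test)
```

Under these assumptions, the computed values can be compared with the "Ever" row and with the n/N values in the column headers for the two smoking-status column groups.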