
Table 4 Classification performance results for diseases of interest: Influenza, Diabetes, Pneumonia and HIV

From: Automatic classification of diseases from free-text death certificates for real-time surveillance

   

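(a) Rule-based

| Disease | Precision | Recall | F-measure | Confusion matrix: classifier - | Confusion matrix: classifier + | Ground truth |
|---|---|---|---|---|---|---|
| Influenza | 0.94 | 0.89 | 0.92 | 68430 | 2 | - |
| | | | | 4 | 34 | + |
| Pneumonia | 0.98 | 0.97 | 0.97 | 59351 | 215 | - |
| | | | | 274 | 8630 | + |
| Diabetes | 0.98 | 0.96 | 0.97 | 62519 | 100 | - |
| | | | | 212 | 5639 | + |
| HIV | 0.93 | 0.85 | 0.89 | 68373 | 6 | - |
| | | | | 14 | 77 | + |
| Macro-average^a | 0.94 | 0.96 | 0.95 | | | |
| Micro-average^b | 0.98 | 0.98 | 0.98 | | | |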

    
   

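(b) Machine learning

| Disease | Precision | Recall | F-measure | Confusion matrix: classifier - | Confusion matrix: classifier + | Ground truth |
|---|---|---|---|---|---|---|
| Influenza | 0.84 | 0.95 | 0.89 | 68425 | 7 | - |
| | | | | 2 | 36 | + |
| Pneumonia | 0.98 | 0.97 | 0.97 | 59364 | 202 | - |
| | | | | 279 | 8625 | + |
| Diabetes | 0.98 | 0.99* | 0.99* | 62522 | 97 | - |
| | | | | 72 | 5779 | + |
| HIV | 0.91 | 0.96 | 0.93 | 68370 | 9 | - |
| | | | | 4 | 87 | + |
| Macro-average | 0.93 | 0.97 | 0.94 | | | |
| Micro-average | 0.98 | 0.98 | 0.98 | | | |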

    
^a Macro-average is the mean of the precision, recall, and F-measure values of the four classes above.

^b Micro-average aggregates the confusion-matrix counts of all classes and calculates the measures over all the data.

\* Statistically significant difference between the rule-based and machine learning classifiers, as measured with a two-tailed z-test (p < 0.05).
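
The footnotes describe how the per-class measures are combined, and the arithmetic can be illustrated with a short sketch. The code below is not taken from the article: the names `Confusion`, `prf`, `macro_average`, `micro_average`, and `ML_COUNTS` are hypothetical, and it assumes the standard definitions of precision (TP / (TP + FP)), recall (TP / (TP + FN)), and F-measure (harmonic mean of the two). The counts are copied from panel (b), reading each confusion matrix as true negatives and false positives in the ground-truth "-" row and false negatives and true positives in the ground-truth "+" row.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts for one disease (hypothetical helper)."""
    tn: int  # ground truth -, classifier -
    fp: int  # ground truth -, classifier +
    fn: int  # ground truth +, classifier -
    tp: int  # ground truth +, classifier +


def prf(c: Confusion) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure); assumes tp > 0."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts copied from panel (b), machine learning.
ML_COUNTS = {
    "Influenza": Confusion(tn=68425, fp=7, fn=2, tp=36),
    "Pneumonia": Confusion(tn=59364, fp=202, fn=279, tp=8625),
    "Diabetes": Confusion(tn=62522, fp=97, fn=72, tp=5779),
    "HIV": Confusion(tn=68370, fp=9, fn=4, tp=87),
}


def macro_average(counts: dict[str, Confusion]) -> tuple[float, float, float]:
    """Mean of the per-class precision, recall, and F-measure values."""
    per_class = [prf(c) for c in counts.values()]
    return tuple(mean(column) for column in zip(*per_class))


def micro_average(counts: dict[str, Confusion]) -> tuple[float, float, float]:
    """Sum the counts over all classes, then compute the measures once."""
    pooled = Confusion(
        tn=sum(c.tn for c in counts.values()),
        fp=sum(c.fp for c in counts.values()),
        fn=sum(c.fn for c in counts.values()),
        tp=sum(c.tp for c in counts.values()),
    )
    return prf(pooled)


if __name__ == "__main__":
    for disease, c in ML_COUNTS.items():
        print(disease, [round(v, 2) for v in prf(c)])
    print("Macro-average", [round(v, 2) for v in macro_average(ML_COUNTS)])
    print("Micro-average", [round(v, 2) for v in micro_average(ML_COUNTS)])
```

Because the micro-average pools the counts, it is dominated by the frequent classes (Pneumonia and Diabetes), whereas the macro-average gives the rare classes (Influenza and HIV) equal weight. A macro-average computed this way can differ in the last digit from one computed on per-class values that were rounded first.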