Table 3 Comparison of performance for heuristic and machine learning models tested on holdout data

From: Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections

| Model Name | AUC Score | Accuracy (%) | p-value** | PPV | NPV | Sensitivity (%), All Patients | Sensitivity (%), Pregnant | Sensitivity (%), Children < 11 Yrs | Specificity (%) | Relative Workload Reduction (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Heuristic model (30 WBC/μl or 100 bacteria/μl) | | 63.92 | NA | 42.73 [± 0.51] | 97.01 [± 0.28] | 95.70 [± 0.15] | 85.9 [± 0.72] | 91.5 [± 0.92] | 52.10 [± 0.36] | 39.06 [± 0.38] |
| Random Forest (Class weight - 1:20) | 0.908 | 71.96 | < 0.001 | 40.47 [± 0.54] | 97.67 [± 0.25] | 95.95 [± 0.23] | 70.5 [± 2.14] | 89.8 [± 1.49] | 63.40 [± 0.54] | 47.58 [± 0.39] |
| Neural Network | 0.906 | 85.00 | < 0.001 | 71.70 [± 0.46] | 90.18 [± 0.50] | 74.03 [± 0.64] | 27.6 [± 5.74] | 69.3 [± 3.38] | 89.09 [± 0.29] | 71.98 [± 0.35] |
| Neural Network (with resampling*) | 0.904 | 79.35 | < 0.001 | 57.66 [± 0.74] | 95.54 [± 0.19] | 90.60 [± 0.35] | 56.6 [± 3.43] | 84.8 [± 2.04] | 75.16 [± 0.44] | 57.33 [± 0.38] |
| XGBoost (Class weight - 1:20) | 0.910 | 65.68 | < 0.001 | 44.05 [± 0.74] | 97.77 [± 0.13] | 96.70 [± 0.18] | 77.1 [± 1.65] | 93.1 [± 1.13] | 54.14 [± 0.61] | 40.36 [± 0.38] |

  1. Values in square brackets are 95% confidence intervals
  2. *Resampling (without replacement) at a ratio of 2:1 for positive samples to offset class imbalance
  3. ** p-values obtained by comparison to heuristic model by McNemar test
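
As an illustration of the McNemar comparison mentioned in the footnote, the following is a minimal sketch of how paired predictions from two classifiers on the same holdout samples could be compared. The source does not say which variant of the test was used, so both the exact binomial form and the continuity-corrected chi-square form are shown. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the study.

```python
from math import comb, erfc, sqrt

def mcnemar_test(y_true, pred_a, pred_b, exact=True):
    """McNemar test on paired predictions from two classifiers.

    Returns (b, c, p_value), where b and c are the discordant counts.
    Hypothetical helper for illustration; not the authors' code.
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two classifiers cannot be distinguished.
        return b, c, 1.0
    if exact:
        # Two-sided exact binomial test with success probability 0.5.
        tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
        p_value = min(1.0, 2 * tail)
    else:
        # Chi-square statistic with continuity correction, 1 degree of freedom.
        stat = (abs(b - c) - 1) ** 2 / n
        # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
        p_value = erfc(sqrt(stat / 2))
    return b, c, p_value

# Toy data (assumed for illustration only): 1 = culture positive, 0 = negative.
y_true    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
heuristic = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
model     = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

b, c, p_exact = mcnemar_test(y_true, heuristic, model)
_, _, p_chi2 = mcnemar_test(y_true, heuristic, model, exact=False)
print(b, c, p_exact, p_chi2)
```

Only the discordant pairs (samples where exactly one of the two classifiers is correct) enter the test, which is why it suits a comparison of two models scored on the same holdout set. With a holdout set as large as the one implied by the narrow confidence intervals in the table, the chi-square form would normally be adequate; the exact form is the safer choice when discordant pairs are few.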