Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification

Table 4 Average sensitivity, specificity and F-measure attained by different classification methods on the task of assigning a CKD severity level to patients’ records. All classifiers are based on random forests (RF): our hierarchical meta-classifier (Hier-MC); a simple meta-classifier that does not employ hierarchical stage partitioning (MC); random under-sampling (Under-Sampling); over-sampling using SMOTE (SMOTE); and a simple random-forest baseline classifier (Baseline). Classifiers were trained on office visit records gathered during 2007–2014, and records from 2015 were used as the test set. All patient records were represented using the set of 455 features. The highest value for each measure is shown in boldface. Standard deviation is shown in parentheses. See Fig. 3 for a detailed analysis of performance per stage.

Methods	Sensitivity	Specificity	F-measure
RF-Hier-MC (Our Method)	.93 (0.02)	.97 (0.02)	.93 (0.02)
RF-MC	.90 (0.04)	.85 (0.04)	.78 (0.04)
RF-Under-Sampling	.83 (0.08)	.91 (0.07)	.83 (0.08)
RF-SMOTE	.92 (0.06)	.95 (0.06)	.92 (0.06)
RF-Baseline	.92 (0.02)	.94 (0.02)	.92 (0.02)
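
The three measures reported in Table 4 can be illustrated with a minimal sketch. It assumes each measure is computed one-vs-rest for every CKD stage and then macro-averaged across stages. The caption does not state the averaging scheme, so that choice is an assumption, as are the function names `per_class_metrics` and `macro_average` and the toy labels, none of which come from the study.

```python
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, Dict[str, float]]:
    """One-vs-rest sensitivity, specificity and F-measure for each label."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics: Dict[int, Dict[str, float]] = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        # Guard against empty denominators (e.g. a stage never predicted).
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f_measure = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[c] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "f_measure": f_measure,
        }
    return metrics


def macro_average(metrics: Dict[int, Dict[str, float]], key: str) -> float:
    """Unweighted mean of one measure across all labels (assumed scheme)."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


if __name__ == "__main__":
    # Toy stage labels for illustration only; not data from the study.
    toy_true = [1, 1, 2, 2, 3, 3]
    toy_pred = [1, 1, 2, 3, 3, 3]
    scores = per_class_metrics(toy_true, toy_pred)
    for name in ("sensitivity", "specificity", "f_measure"):
        print(name, round(macro_average(scores, name), 3))
```

Macro-averaging weights every stage equally, which is one common choice when stages are imbalanced. A support-weighted average would instead be dominated by the most frequent stages.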

ISSN: 1472-6947