Fig. 1 | BMC Medical Informatics and Decision Making

Fig. 1

From: Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records

Main experiment steps. (1) Split the cohort into Training and Test sets; (2) Apply multiple correspondence analysis (MCA) to the Training set using all 15 potential cluster-generating features, yielding 3 components; (3) Use the 3 MCA components from Step 2 as input to the k-means algorithm, yielding k = 5 clusters; (4) Split the Training set into decision tree classifier (DTC) Training and DTC Test sets, with the cluster labels obtained from k-means as the prediction target; (5) Train and validate the DTC; (6) Apply the DTC to the Test set to predict cluster labels; (7) Apply MCA to the Test set as in Step 2, yielding 3 components; (8) Use the 3 MCA components from Step 7 as input to the k-means algorithm, yielding k = 5 clusters; (9) Compare the Test set cluster assignments from Steps 6 and 8 by calculating the Jaccard Index (% of patients overlapping in the same cluster between the two solutions).
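
The comparison in Step 9 could be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`jaccard`, `members`, `compare_solutions`) are hypothetical, and the caption does not say how clusters from the two solutions were paired. Because k-means label numbers are arbitrary, the sketch assumes clusters are paired by the one-to-one matching that maximises the summed Jaccard index, which is feasible by brute force for k = 5 (120 permutations).

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of patient indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def members(labels):
    """Map each cluster label to the set of patient indices assigned to it."""
    groups = {}
    for patient, label in enumerate(labels):
        groups.setdefault(label, set()).add(patient)
    return groups


def compare_solutions(dtc_labels, kmeans_labels):
    """Per-cluster Jaccard index between two labelings of the same patients.

    Assumption (not stated in the caption): clusters are paired by the
    permutation of k-means labels that maximises the total Jaccard index.
    Returns {dtc_label: (matched_kmeans_label, jaccard_index)}.
    """
    if len(dtc_labels) != len(kmeans_labels):
        raise ValueError("both labelings must cover the same patients")
    dtc, km = members(dtc_labels), members(kmeans_labels)
    dtc_ids, km_ids = sorted(dtc), sorted(km)
    if len(dtc_ids) != len(km_ids):
        raise ValueError("both solutions must have the same number of clusters")

    def total(perm):
        return sum(jaccard(dtc[d], km[k]) for d, k in zip(dtc_ids, perm))

    best = max(permutations(km_ids), key=total)
    return {d: (k, jaccard(dtc[d], km[k])) for d, k in zip(dtc_ids, best)}


# Toy example with 8 patients and 3 clusters (illustrative data only).
predicted = [0, 0, 0, 1, 1, 2, 2, 2]   # labels predicted by the DTC (Step 6)
reclustered = [2, 2, 1, 1, 1, 0, 0, 0]  # labels from k-means on the Test set (Step 8)
result = compare_solutions(predicted, reclustered)
```

In the toy data, DTC cluster 2 and k-means cluster 0 contain exactly the same patients, so their Jaccard index is 1.0, while the other two pairs share 2 of 3 patients each.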
