Skip to main content

Application of multi-label classification models for the diagnosis of diabetic complications



Early diagnosis for the diabetes complications is clinically demanding with great significancy. Regarding the complexity of diabetes complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data in the modern electronic health records (EHRs), and leveraged the correlations between the complications to further improve the prediction accuracy.


We obtained the demographic characteristics and laboratory data from the EHRs for patients admitted to Changzhou No. 2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China from May 2013 to June 2020. The data included 93 biochemical indicators and 9,765 patients. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We used an MLC model, based on the Random Forest (RF) technique, to leverage these correlations and predict four complications simultaneously. We explored four different MLC models; a Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR). We used traditional Binary Relevance (BR) as a comparison. We used 11 different performance metrics and the area under the receiver operating characteristic curve (AUROC) to evaluate these models. We analyzed the weights of the learned model and illustrated (1) the top 10 key indicators of different complications and (2) the correlations between different diabetic complications.


The MLC models including CC, ECC and CLR outperformed the traditional BR method in most performance metrics; the ECC models performed the best in Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231). The two diabetic complication correlation matrices drawn from the PCC analysis and the MLC models were consistent with each other and indicated that the complications correlated to different extents. The top 10 key indicators given by the model are valuable in medical application.


Our MLC model can effectively utilize the potential correlation between different diabetic complications to further improve the prediction accuracy. This model should be explored further in other complex diseases with multiple complications.

Peer Review reports


Complications of diabetes are the leading cause of death in diabetic patients [1], with 76.4% of diabetic patients reporting at least one complication [2]. These include cardio- or cerebrovascular diseases, neuropathy, nephropathy, retinopathy, and foot disease [3], which compromise patient quality of life and bring an economic burden to the healthcare system [4]. Therefore, how to quickly and accurately diagnose and analyze diabetic complications is a topic worthy studying. Modern electronic health records (EHRs) [5, 6] is a rich resource for clinical data from which newer physical indicators can be identified as predictors of diabetic complications to assist in treatment planning. However, comprehensive analysis of such data remains a challenge.

Machine learning has great advantages when dealing with massive data with both high dimensional attributes and tremendous number of instances, which has been widely applied in disease prediction [7]. The binary classification model is typically adopted by most approaches [8,9,10,11,12,13,14] and shows promising results for the prediction of diabetic complications. However, each diabetic complication was modeled and predicted independently in these studies, making it impossible to leverage the potential correlations among diabetes complications.

Multi-label classification (MLC) models have shown great promise in text categorization, image classification, automatic annotation for multimedia content, bioinformatics, web mining, rule mining, information retrieval, tag recommendation, and other diverse fields [15].By modeling multiple labels simultaneously, multi-label learning can identify correlations among different labels more easily. Therefore, we sought to apply an MLC model to leverage the correlations among diabetic complications and further improve their predictive performance.

Related work

Compared to traditional medical studies with a hypothesis-driven perspective, [16,17,18] data mining based on data features is better suited for studying nonlinear interactions in large diabetes data sets and can more-accurately assess and predict disease risk. For predicting the outcomes of diabetes-related complications and death, the Risk Equations for Complications of Type 2 Diabetes equations [19, 20], were derived from the Action to Control Cardiovascular Risk in Diabetes randomized trial [21] and the UK Prospective Diabetes Study Outcomes Model 2 [22] that used data of 3,642 patients from the United Kingdom Prospective Diabetes Study. These Cox proportional hazards models have good discrimination and calibration, and they are used widely. However, they were developed based on clinical trial data with limited case numbers and potential study-specific biases. Conversely, machine learning predicts diabetes complications by using EHRs, a preferable data set with an abundance of valuable information describing a patient’s healthcare experience.

While the machine learning models [8,9,10,11,12,13,14]are widely used in predictions of diabetic complications, a more suitable method, multi-label learning, has been used rarely. Single-label methods predict diabetes complications separately based on whether or not one complication occurs. MLC is a machine learning method used in many clinical applications. A chronic disease risk dataset containing 110,300 patients, 62 symptoms, and six disease labels (hypertension, diabetes, fatty liver, cholecystitis, heart disease, and obesity) was transformed into a multiclass classification, and the MLC classifiers proved to have good performance [23]. To explore a diagnostic method that classifies and evaluates each psychotic disorder simultaneously, Folorunso et al. used an MLC and demonstrated that the MLC was more efficient than a previous study [24]. In the diabetes field, MLC has been applied to label the retinal image of normal/abnormal regions, patient age, ethnicity, race, and diabetic type for diabetic retinopathy differentiation and can improve retinal image classification [25].

Diabetic complications are highly related. Retinal vascular trait changes have been linked with cardiovascular disease [26, 27]and stroke [28] and structural variations in retinal vasculature can predict cardiovascular risk [29,30,31]. Additionally, Xu et al. found that microalbuminuria in type 2 diabetes is associated with both retinal vascular caliber and geometry, and that retinal and renal microvasculature share similar pathophysiological changes during early diabetes due to abnormal glucose metabolism and other processes [32]. Interactions among diabetes complications need to be identified and applied to facilitate early diagnoses. Since diabetes patients may suffer from multiple diabetic complications, the diagnosis of complications (labels) using indicators (attributes) is an MLC problem. These class/label variables usually exhibit conditionally dependent relationships among themselves, which must be modeled and learned. To frame the diabetic complication classification into an MLC problem, multi-label methods were first applied to predict diabetes complications from EHRs by Bai et al. The results indicated that random k-label sets and chained classifiers performed better than binary relevance, least combination, and pruned sets [33].We aimed to identify the best MLC model to predict diabetic complications and inform clinical decisions that could help personalize type 2 diabetes management.


Traditional methods utilized the binary classification model to predict each diabetic complication independently and failed to leverage the potential correlations among diabetic complications. Through statistical analysis of the data in EHRs, we aimed to: (1) find correlations between different diabetic complications and (2) leverage these correlations to improve the predictive performance via an MLC model.

Material and methods

Data source

The study used demographic characteristics and laboratory data obtained from EHRs data, from May 2013 to June 2020, for patients admitted to Changzhou No.2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China. This study was approved by the ethics committee of Changzhou No.2 People’s Hospital, the affiliated hospital of Nanjing Medical University. A total of 9765 adult patients with diabetes, were eligible for this study. The demographic variables of patients with diabetes are summarized in Table 1. Due to the absence of BMI values for some patients, Table 1 only displays BMI characteristics for 8,831 patients.

Table 1 Overview of demographic characteristics

Data preprocessing

There are total 141 indicators in the raw data of EHRs. However, there are 47 indicators that are missing in more than 95% patients. These indicators can be seen as useless, therefore we deleted them in our study. Besides, we combined the height and weight to be BMI, and finally 93 clinical parameters collected from EHRs are considered as possible risk factors (see Additional file 1: Table 1). The data assignment methods are shown in Additional file 1: Table 2. Besides, for indicators which have few missing values in some patients, we adopt the mean value and mode to fill the continuous and discrete indicators, respectively.

Multi-label classification models

In general, multi-label classification algorithms can be divided into two categories: problem transformation methods and algorithm adaptation methods [15]. As the name suggests, problem transformation methods tackle the multi-label classification problem by transforming it into other well-established learning problems. Representative algorithms include first-order approach Binary Relevance (BR) [34] and high-order approach Classifier Chains (CC) [35, 36] which transform the task of multi-label learning into the task of binary classification, second-order approach Calibrated Label Ranking (CLR) [37] which transforms the task of multi-label learning into the task of label ranking, and high-order approaches Random k-Label Sets (RAkEL) and Label Power Set (LP) [38] which transform the task of multi-label learning into the task of multi-class classification. Different from problem transformation methods, algorithm adaptation methods tackle multi-label learning problem by adapting some existing learning algorithms to the multi-label learning scenario directly. Representative algorithms include first-order approach Multi-Label k-Nearest Neighbor (ML-kNN) [39] which adapts the k-nearest neighbor learning algorithms, Multi-Label Decision Tree (ML-DT) [40] which adapts decision tree techniques, second-order approach Ranking Support Vector Machine (Rank-SVM) [41] which adapts kernel techniques, and second-order approach Collective Multi-Label Classifier (CML) [42] which adapts information-theoretic techniques. In this paper, we compare some representative methods which are detailly introduced in the following sections.

Binary relevance

The basic idea of binary relevance is to decompose the multi-label classification problem into multiple independent binary classification problems, where each binary classification problem corresponds to a possible label in the label space [34]. For class j, binary relevance method first constructs a binary training set by the following metric:

$$\begin{aligned} & {\mathcal{D}}_{j} = \left\{ {\left( {x_{i} ,\phi \left( {Y_{i} ,y_{j} } \right)} \right)\left| {1 \le i \le m} \right.} \right\} \\ { } & {\text{where }}\,\phi \left( {Y_{i} ,y_{j} } \right) = \left\{ {\begin{array}{*{20}c} { + 1,} & {{\text{if }}\,y_{j} \in Y_{i} } \\ { - 1,} & {{\text{otherwise}}} \\ \end{array} .} \right. \\ \end{aligned}$$

Then, a binary classifier for class j is built using existing binary classification algorithms to justify if the instance belongs to class j. Finally, total Q binary classifiers will be built where Q is the number of all possible classes. Therefore, in multi-label classification learning, the instance will be classified by Q binary classifiers. The final classification result for an instance is the combination of all binary classifiers.

Label power set

This method takes each unique subset (distinct label set) of labels that exists in multi-label dataset as a single label. Therefore, label power set introduces new sets of labels and transforms the multi-label classification into a multi-class classification task [38]. For each test instance, the classifier of label power set predicts it into one label which is originally a set of multiple labels in the multi-label dataset. Label power set is simple but it may introduce some even many classes which only have a few samples, and further aggravate the class imbalance.

Classifier chains

The basic idea of classifier chains is to transform the multi-label learning problem into a chain of binary classification problems, where subsequent binary classifiers in the chain are built upon the predictions of previous ones [35, 36]. For example, when building the j-th binary classifier, the input of the model will be

$$x_{i}^{*} = [x_{i} ,p_{1} ,p_{2} ,...,p_{j - 1} ]$$

where \(x_{i}\) is the original feature, \(p_{k}\) is the prediction of the kth binary classifier, and \(x_{i}^{*}\) is the concatenation of \(x_{i}\) and \(p_{1} ,p_{2} ,...,p_{j - 1}\). Then, the jth binary classifier will be built on this concatenated feature and produce the probability of this sample belonging to the jth class. For test instances, the associated label set is predicted by traversing the classifier chain iteratively. It is obvious that, for the classifier chain method, its effectiveness will be largely affected by the ordering of classes during building the classifier chains. Besides, classifier chains has the advantage of exploiting the label correlations while loses the opportunity of parallel implementation due to the chaining property [35, 36].

Calibrated label ranking

The basic idea of this algorithm is to transform the multi-label learning problem into the label ranking problem, where ranking among labels is fulfilled by techniques of pairwise comparison. For multi-label learning problem with Q classes, a total of Q(Q-1)/2 binary classifiers will be constructed, one for each label pair [37]. In detail, for each pair, we first construct a corresponding binary training set, where each instance has distinct relevance to the two labels in the label pair and only belongs to one label in the label pair. Then, some binary learning algorithms will be utilized to induce a binary classifier for each label pair. For testing, each instance will first be fed into the Q(Q−1)/2 learned binary classifiers to obtain the overall votes on each possible class label. And then the labels can be ranked according to their obtained votes.

Model evaluation metrics

In the literature, there is no generally adopted metrics for evaluating multi-label classification models. Therefore, several measures from multi-class classification and information retrieval were usually adopted and adapted to measure multi-label classification performance [43]. In our experiments, we used various evaluation measures that have been suggested by previous studies [43], which are defined in Table 2. The evaluation metrics can be divided into two types: example-based metrics and label-based metrics. In Table 2, \(y_{i}\) denotes the set of true labels of example \(x_{i}\); \(h\left( {x_{i} } \right)\) denotes the set of predicted labels for the same sample; \(\Delta\) stands for the symmetric difference between the two sets; N is the number of examples; Q is the total number of possible class labels; \(tp_{j}\) is the number of true positive for label \(j\); \(fp_{j}\) is the number of false positive for label \(j\); \(p_{j}\) and \(r_{j}\) are the precision and recall for class \(j\). It should be noted that, the accuracy and subset accuracy used here for multi-label classification are example-based measures and computed on the label set, which are different from the general accuracy used in binary or multi-class classification. Besides, the macro average metrics compute the metric individually for each class and then take an average, and the micro average metrics aggregate the contributions of all classes to compute the average metric. Following previous studies [40], we use both macro and micro metrics to evaluate the model.

Table 2 Evaluation metrics

Correlation analysis

The Pearson correlation coefficient (PCC) is a statistic that measures the linear correlation between two variables X and Y, with values between + 1 and − 1. A value of + 1 is a 100% positive linear correlation, 0 is no linear correlation, and − 1 is a 100% negative linear correlation. Pearson correlation coefficients were calculated using the EHRs raw data to analyze the statistical relationships among different diabetic complications, as shown in Fig. 1a. This motivated the introduction of an MLC model to leverage this correlation toward further predictions.

Fig. 1

a Pearson Correlation Coefficient; b Model Correlation Coefficient. Different complications share correlation to different extend. In a, a relatively correlation (coefficient > 0.1) is marked in deep orange while a relatively weak correlation (coefficient < 0.1) is marked in light orange. In b the coefficient below 0.001 (low correlation) is marked in light orange and the others marked in deep orange

The relationships among diabetic complications could also be evaluated using our models. In the CC model, the label of the previous class in the chain was used as an input to predict the current class. We used the weight (importance) of the previous class as the correlation coefficient of the two classes. The correlation coefficient matrix obtained in the CC model is presented in Fig. 1b.

The relative strength of both correlation coefficient matrices was surprisingly consistent (Fig. 1a, b), which strongly suggests the validity of our model. Our model further indicates that nephropathy, retinopathy, and peripheral vascular disease are all correlated with each other directly, except for retinopathy with neuropathy.


We compared the binary classification model with 4 MLC models under 11 different performance evaluation metrics. Then, we leveraged the receiver operating characteristic (ROC) curve and area under the curve (AUC) to further evaluate the performance of different models. Besides, we provided the top 10 key indicators for each complication. Finally, we provided a comparison between the correlation coefficient matrix derived from our model and statistical analysis of the EHRs’ raw data.

Default experimental setup

For all the experiments, we used classes 0, 1, 2, 3, and 4 to represent five different categories: (0) diabetes without complications, (1) retinopathy, (2) nephropathy, (3) neuropathy and (4) peripheral vascular disease. The whole dataset was randomly split into two disjoint subsets: training set (75%) and test set (25%). Then, five-fold cross validation was employed for all experiments to tune the hyperparameters within the training dataset. As for the base model, we mainly adopted Random Forest technique as the base model to conduct the classification task, which is usually taken as the base model in previous related studies and performed best in compared models [13], and is easy to implement. Besides, we also conducted experiments using XGBoost as the base model, whose results are displayed in Additional materials (see Additional file 1: Table 3 and Fig. 1). In addition, to alleviate the class imbalance problem, we set different weights for different categories according to the proportion of samples belonging to each category.

Machine learning experiment results

We conducted comparative experiments to evaluate the prediction introduced by the MLC models. The prediction results, shown in Table 3, were evaluated using 11 performance evaluation metrics: hamming loss, accuracy, f1 score, precision, recall, f1 micro, f1 macro, precision micro, precision macro, recall micro, and recall macro. As shown in Table 3, apart from the label power set (LP), all MLC models outperformed the binary classification model in most evaluation performance metrics. Among the four MLC models, the ensemble classifier chains (ECC) yielded the best performance for most metrics. The LP method performed the worst, which may be because it introduced many classes with only a few samples, which aggravated the class imbalance.

Table 3 Experiment results on 5 different models including 11 performance evaluation metrics

Analysis of area under the receiver operating characteristic curve

We leveraged the AUC and ROC curve to further evaluate the performance of different models for each diabetic complication. The ROC curve is a graphical representation for showing the trade-off between the recall/sensitivity/true positive rate and false-positive rate; the precision-recall curve is a graphical representation for showing the trade-off between precision and recall. The experimental results of four different models, binary relevance (BR), LP, classifier chains (CC), and ECC, are shown in Fig. 2a–d. It should be noted that the area under the receiver operating characteristic curve (AUROC) result for calibrated label ranking (CLR) was not available, as it could not provide the prediction probability. As shown in Fig. 2a–d, the ECC model yielded the highest AUROC, and all the MLC models, except LR, outperformed the BR model. It can be seen that nephropathy (class 2) received the highest AUROC among all diabetic complications, while retinopathy (class 1) yielded a relatively low AUROC.

Fig. 2

The experimental results of 4 different models, BR, LP, CC, and ECC. In each figure, the ROC curves of different complications are marked by different colors accordingly

Key features analysis

The 10 most correlated indicators given by the learned random forest model for the five categories of diabetic states are listed in Table 4, in which a higher ranking indicates a higher correlation coefficient.

Table 4 Top10 key features in prediction

The majority of indicators were consistent with known predictive factors, for example; age, levels of glucose, serum/urine creatinine (Cr), and lipids (Table 4). Several predictors were consistent with the latest medical research: fibrinogen, plasma albumin, hematocrit for nephropathy [44,45,46,47], and adenosine deaminase-2 for retinopathy [44,45,46,47]. These results suggest validation of our models. Since this is the first application of an MLC model to the prediction of diabetic complications, new indicators not identified in medical research were identified that suggest future studies for medical verification. For example, creatine phosphokinase (CPK), creatine kinase-MB, erythrocyte sedimentation rate, total carbon dioxide in the blood, international normalized ratio, and monocyte percentage for diabetes retinopathy; urinary microalbumin, urine Cr, serum Calcium (Ca), urine red blood cells, crystalluria, serum phosphorous (P), creatine kinase, and hyaline cast cholinesterase for diabetes neuropathy; and urine Cr, urinary microalbumin, serum P, serum Ca, urinary squamous epithelial cells, and serum Cr for peripheral vascular disease.


To our knowledge, this is the first study to characterize the risk of developing diabetic complications and assess the relationship of these complications in the Chinese population.

This is the first application of an MLC model to leverage the correlations among diabetic complications for better predictive performance. The multi-label model can simultaneously predict multiple factors, including neuropathy, nephropathy, retinopathy, and peripheral vascular disease. This type of multiple-complication screening is widely used in real-world diagnosis, since patients can have multiple simultaneous complications. Most of the MLC models outperformed the BR model, which was used in previous studies. With the MLC model, high predictive performance was achieved, evaluated under multiple evaluation metrics and AUC, as shown in Table 3 and Fig. 2a, b. The ECC model yielded the highest AUROC of 0.826 and led in most of the evaluation metrics. The experimental results further indicate that multi-label models are efficient for leveraging correlations among diabetes complications.

On analyzing the correlation between and the predictive factors of diabetes complications from pathological perspective. The novel factors identified in our models are consistent with recent studies. Adenosine deaminase-2 was identified as a predictor of diabetic retinopathy, consistent with the work of Elsherbiny et al. [48]. It is shown that fibrinogen [44] can be used as a predictor of end-stage renal disease in type 2 diabetes, and plasma albumin [45, 46] can be used as an indicator of the prognosis of renal disease. Robles et al. used hematocrit, urea, and sex to predict the progression of diabetic nephropathy [47]. Issar et al. reported that patients with chronic kidney disease had severe neuropathy phenotypes and shared nerve dysfunction features with CKD [49]. These evidences further proved the validity of our model. In addition, our multi-label model identified several novel predictors that were not found by previous risk assessment methods. For example, CPK, and cholinesterase as predictors of diabetic neuropathy have not been indicated previously. The renal injury-related factors (urinary microalbumin, urinary Cr, Ca, crystallization, and P) identified as diabetes neuropathy and peripheral vascular disease predictors were rarely identified in diabetic medical research. Thus, our study simultaneously suggests the correlation between nephropathy and neuropathy/ peripheral vascular disease. Besides, the predictors of retinopathy such as phosphocreatine kinase CPK, creatine kinase-MB mass, urinary microalbumin, and urinary microalbumin creatinine demonstrated a correlation between retinal vascular changes and heart [26, 27]renal injury [29, 30]. These results show the efficiency of our model for disease prevention and medical research assistance.

There are still several limitations to this study. First, this model is designed for diagnose diabetes and complications. It is disease nature that limits its application in non-diabetic patients. Second, we used large amounts of data, and most factors become statistically significant but clinically irrelevant. To impact clinical usage, medical knowledge and related researches are needed. Finally, as we assembled the data only from one hospital, this action limits the generalizability of our model to other group of people who are distinguished by some other essential predictors. Our primary intention was just to demonstrate the feasibility of the MLC model. Further investigation on multi-center study is in our horizon.


In this study, we demonstrated the correlations between different diabetic complications from two perspectives: statistical analysis (PCC) and machine learning (MLC). We illustrated that the MLC model is effective to leverage these correlations and outperforms the traditional binary classification model in predicting diabetic complications. This study is essential not only because we provided a better model in predicting diabetic complications but also because our model could be adapted and applied to leverage correlations among complications and predict outcomes in other complex diseases. In the future, we intend to extend our method to other diseases.

Availability of data and materials

The datasets generated and analyzed during the current study are not publicly available due to data privacy laws, but are available from the corresponding author on reasonable request.



Type 2diabetes mellitus


Electronic health record


Multi-label classification


Alanine aminotransferase


Aspartate aminotransferase


R glutamyl transferase






Creatine phosphokinase


Erythrocyte sedimentation rate


Creatine kinase-MB


PT international standardized ratio


Albumin-to-creatinine ratio


Blood urea nitrogen






Red blood cell

Apo A1:

Apolipoprotein A1

Apo B:

Apolipoprotein B


Urinary Squamous Epithelial Cells


Label Power Set


Classifier Chains


Ensemble Classifier Chains


Calibrated Label Ranking


Binary Relevance


  1. 1.

    An Y, Zhang P, Wang J, et al. Cardiovascular and all-cause mortality over a 23-year period among chinese with newly diagnosed diabetes in the da qing igt and diabetes study. Diabetes Care. 2015;38(7):1365–71.

    Article  Google Scholar 

  2. 2.

    Hu H, Sawhney M, Shi L, et al. A systematic review of the direct economic burden of type 2 diabetes in china. Diabetes Ther. 2015;6(1):7–16.

    Article  Google Scholar 

  3. 3.

    Liu Z, Fu C, Wang W, Xu B. Prevalence of chronic complications of type 2 diabetes mellitus in outpatients: a cross-sectional hospital based survey in urban China. Health Qual Life Outcomes. 2010;8:62.

    Article  Google Scholar 

  4. 4.

    Mao W, Yip CW, Chen W. Complications of diabetes in China: health system and economic implications. BMC Public Health. 2019;19(1):269.

    Article  Google Scholar 

  5. 5.

    Zheng T, Xie W, Xu L, He X, Zhang Y, You M, et al. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.

    Article  PubMed  Google Scholar 

  6. 6.

    Preo N, Capobianco E. Significant EHR feature-driven t2d inference: predictive machine learning and networks. Front Big Data. 2019;2:30.

    Article  Google Scholar 

  7. 7.

    Lan K, Wang DT, Fong S, Liu LS, Wong K, Dey N. A survey of data mining and deep learning in bioinformatics. J Med Syst. 2018;42:139.

    Article  Google Scholar 

  8. 8.

    Belur Nagaraj S, Pena MJ, Ju W, Heerspink HL. Machine-learning-based early prediction of end-stage renal disease in patients with diabetic kidney disease using clinical trials data. Diabetes Obes Metab. 2020;22(12):2479–86.

    CAS  Article  Google Scholar 

  9. 9.

    Makino M, Yoshimoto R, Ono M, et al. Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Sci Rep. 2019;9(1):11862.

    Article  Google Scholar 

  10. 10.

    Song X, Waitman LR, Yu AS, Robbins DC, Hu Y, Liu M. Longitudinal risk prediction of chronic kidney disease in diabetic patients using a temporal-enhanced gradient boosting machine: retrospective cohort study. JMIR Med Inform. 2020;8(1):e15510.

    Article  Google Scholar 

  11. 11.

    Jonnagaddala J, Liaw ST, Ray P, Kumar M, Dai HJ, Hsu CY. Identification and progression of heart disease risk factors in diabetic patients from longitudinal electronic health records. Biomed Res Int. 2015;2015:636371.

    Article  Google Scholar 

  12. 12.

    Ogunyemi OI, Gandhi M, Tayek C. Predictive models for diabetic retinopathy from non-image teleretinal screening data. AMIA Jt Summits Transl Sci Proc. 2019;2019:472–7.

    PubMed  PubMed Central  Google Scholar 

  13. 13.

    Dagliati A, Marini S, Sacchi L, et al. Machine learning methods to predict diabetes complications. J Diabetes Sci Technol. 2018;12(2):295–302.

    Article  Google Scholar 

  14. 14.

    Kim E, Pieczkiewicz DS, Castro MR, Caraballo PJ, Simon GJ. Multi-task learning to identify outcome-specific risk factors that distinguish individual micro and macrovascular complications of type 2 diabetes. AMIA Jt Summits Transl Sci Proc. 2018;2017:122–31.

    PubMed  Google Scholar 

  15. 15.

    Zhang M, Zhou Z. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng. 2014;26:1819.

    Article  Google Scholar 

  16. 16.

    Ganz T, Wainstein J, Gilad S, Limor R, Boaz M, Stern N. Serum asymmetric dimethylarginine and arginine levels predict microvascular and macrovascular complications in type 2 diabetes mellitus. Diabetes Metab Res Rev. 2017;33(2):2017.

    Article  Google Scholar 

  17. 17.

    Zhao Y, Lin W, Li Z, et al. High expression of mannose-binding lectin and the risk of vascular complications of diabetes: evidence from a meta-analysis. Diabetes Technol Ther. 2015;17(7):490–7.

    CAS  Article  Google Scholar 

  18. 18.

    Miller RG, Costacou T, Orchard TJ. Risk factor modeling for cardiovascular disease in type 1 diabetes in the pittsburgh epidemiology of diabetes complications (EDC) study: a comparison with the diabetes control and complications trial/epidemiology of diabetes interventions and complications study (DCCT/EDIC). Diabetes. 2019;68(2):409–19.

    CAS  Article  Google Scholar 

  19. 19.

    Basu S, Sussman JB, Berkowitz SA, Hayward RA, Yudkin JS. Development and validation of risk equations for complications of type 2 diabetes (RECODe) using individual participant data from randomised trials. Lancet Diabetes Endocrinol. 2017;5(10):788–98.

    Article  Google Scholar 

  20. 20.

    Basu S, Sussman JB, Berkowitz SA, et al. Validation of risk equations for complications of type 2 diabetes (RECODe) using individual participant data from diverse longitudinal cohorts in the US. Diabetes Care. 2018;41(3):586–95.

    Article  Google Scholar 

  21. 21.

    Gerstein HC, Miller ME, Byington RP, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med. 2008;358(24):2545–59.

    CAS  Article  Google Scholar 

  22. 22.

    Hayes AJ, Leal J, Gray AM, Holman RR, Clarke PM. UKPDS outcomes model 2: a new version of a model to simulate lifetime health outcomes of patients with type 2 diabetes mellitus using data from the 30 year United Kingdom Prospective Diabetes Study: UKPDS 82. Diabetologia. 2013;56(9):1925–33.

    CAS  Article  Google Scholar 

  23. 23.

    Maxwell A, Li R, Yang B, et al. Deep learning architectures for multi-label classification of intelligent health risk prediction. BMC Bioinform. 2017;18(Suppl 14):523.

    Article  Google Scholar 

  24. 24.

    Folorunso SO, Fashoto SG, Olaomi J, Fashoto OY. A multi-label learning model for psychotic diseases in Nigeria. Inform Med Unlocked. 2020;19:100326.

    Article  Google Scholar 

  25. 25.

    Omar M, Tahir M, Khelifi F. Multi-label learning model for improving retinal image classification in diabetic retinopathy. 2017. 0202.

  26. 26.

    Lagani V, Chiarugi F, Manousos D, et al. Realization of a service for the long-term risk assessment of diabetes-related complications. J Diabetes Compl. 2015;29(5):691–8.

    Article  Google Scholar 

  27. 27.

    Flammer J, Konieczka K, Bruno RM, Virdis A, Flammer AJ, Taddei S. The eye and the heart. Eur Heart J. 2013;34(17):1270–8.

    Article  Google Scholar 

  28. 28.

    Rim TH, Teo A, Yang H, Cheung CY, Wong TY. Retinal vascular signs and cerebrovascular diseases. J Neuroophthalmol. 2020;40:44–59.

    Article  Google Scholar 

  29. 29.

    Gopinath B, Wang SB, Liew G, Phan K, Joachim N, Burlutsky G, et al. Retinal vascular geometry and the prevalence of atrial fibrillation and heart failure in a clinic-based sample. Heart Lung Circ. 2019;28:1631–7.

    Article  PubMed  Google Scholar 

  30. 30.

    Nägele MP, Barthelmes J, Ludovici V, et al. Retinal microvascular dysfunction in heart failure. Eur Heart J. 2018;39(1):47–56.

    Article  Google Scholar 

  31. 31.

    Wang J, Jiang J, Zhang Y, Qian YW, Zhang JF, Wang ZL. Retinal and choroidal vascular changes in coronary heart disease: an optical coherence tomography angiography study. Biomed Opt Express. 2019;10:1532–44.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Xu X, Sun F, Wang Q, et al. Comprehensive retinal vascular measurements: a novel association with renal function in type 2 diabetic patients in China. Sci Rep. 2020;10(1):13737.

    CAS  Article  Google Scholar 

  33. 33.

    Bai BM, Mangathayaru N, Rani BP. Diabetes complications prediction using different multi-label classification algorithms-MEKA. ICICCT 2019: system reliability, quality control, safety, maintenance and management. 2020.

  34. 34.

    Boutell M, Luo J, Shen X, Brown C. Learning multi-label scene classification. Pattern Recognit. 2004;37:1757.

    Article  Google Scholar 

  35. 35.

    Read J, Pfahringer B, Holmes G, Frank E. Classifier Chains for Multi-label Classification. 2009.

  36. 36.

    Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Mach Learn. 2011;85(3):333–59.

    Article  Google Scholar 

  37. 37.

    Fürnkranz J, Hüllermeier E, Loza Mencía E, Brinker K. Multilabel classification via calibrated label ranking. Mach Learn. 2008;73(2):133.

    Article  Google Scholar 

  38. 38.

    Tsoumakas G, Vlahavas I. Random k-Labelsets: An Ensemble Method for Multilabel Classification. Berlin, Heidelberg,2007.

  39. 39.

    Zhang M, Zhou Z. ML-KNN: A lazy learning approach to multi-label leaming. Pattern Recognit. 2007;40:2038.

    Article  Google Scholar 

  40. 40.

    Veloso A, Jr WM. Multi-Label Associative Classification. Springerbriefs in Computer Science. 2011: 53–59.

  41. 41.

    Elisseeff A, Weston J. A Kernel Method for Multi-Labelled Classification. 2002.

  42. 42.

    Ghamrawi N, Mccallum A. Collective multi-label classification. 2005. 195.

  43. 43.

    Elkafrawy P, Mausad A, Esmail H. Experimental comparison of methods for multi-label classification in different application domains. Int J Comput Appl. 2015;114:1.

    Google Scholar 

  44. 44.

    Zhang J, Wang Y, Zhang R, et al. Serum fibrinogen predicts diabetic ESRD in patients with type 2 diabetes mellitus. Diabetes Res Clin Pract. 2018;141:1–9.

    CAS  Article  Google Scholar 

  45. 45.

    Zhang J, Zhang R, Wang Y, et al. The level of serum albumin is associated with renal prognosis in patients with diabetic nephropathy. J Diabetes Res. 2019;2019:7825804.

    PubMed  PubMed Central  Google Scholar 

  46. 46.

    Tessari P, Kiwanuka E, Barazzoni R, Vettore M, Zanetti M. Diabetic nephropathy is associated with increased albumin and fibrinogen production in patients with type 2 diabetes. Diabetologia. 2006;49(8):1955–61.

    CAS  Article  Google Scholar 

  47. 47.

    Robles NR, Ferreira F, Martinez-Gallardo R, et al. Hematocrit, urea and gender: the Hematocrit, Urea and GEnder formula for prognosing progressive renal failure in diabetic nephropathy. Eur J Intern Med. 2012;23(3):283–6.

    Article  Google Scholar 

  48. 48.

    Samra YA, Saleh HM, Hussein KA, et al. Adenosine deaminase-2-induced hyperpermeability in human retinal vascular endothelial cells is suppressed by MicroRNA-146b-3p. Invest Ophthalmol Vis Sci. 2017;58(2):933–43.

    CAS  Article  Google Scholar 

  49. 49.

    Issar T, Arnold R, Kwai N, et al. Relative contributions of diabetes and chronic kidney disease to neuropathy development in diabetic nephropathy patients. Clin Neurophysiol. 2019;130(11):2088–95.

    Article  Google Scholar 

Download references


We thank Fei Ye ( and Yexun Zhang ( ) for providing the machine learning algorithms. We would like to thank Editage ( for English language editing.


We acknowledge support from Jiangsu Modern Hospital Management Research Fund (JSY-1-2018-117).

Author information




LZ and XZ equally contributed to the work design, results analyses and manuscript writing. DY contributed to the manuscript writing. YW contributed to compile and prepare data from patients and manuscript writing. XB contributed to manuscript writing. XY as the principal supervisor, contributed to work design, carry out each experimental test and manuscript writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xinhua Ye.

Ethics declarations

Ethics approval and consent to participate

This study has been approved from the ethics committee of the Affiliated Changzhou No. 2 People’s Hospital of Nanjing Medical University (in these studies). Informed consent was obtained from all participants prior to participating in the interview study. All methods were carried out in accordance with relevant guidelines and regulations (e.g. Helsinki guidelines).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table 1

. 93 clinical parameters collected from EHR. Table 2. Qualitative data assignments. Table 3. Experimental results on 5 different models including 11 performance evaluation metrics, taking XGBoost as the base model. Fig. 1. The ROC curves using XGBoost as the base model. The experimental results of 4 different models, BR, LP, CC, and ECC. In each figure, the ROC curves of different complications are marked by different colors accordingly.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhou, L., Zheng, X., Yang, D. et al. Application of multi-label classification models for the diagnosis of diabetic complications. BMC Med Inform Decis Mak 21, 182 (2021).

Download citation


  • Diabetic complication
  • Machine learning
  • Multi-label classification
  • Correlation
  • Key indicators