Skip to main content

Study on risk factors of diabetic peripheral neuropathy and establishment of a prediction model by machine learning



Diabetic peripheral neuropathy (DPN) is a common complication of diabetes. Predicting the risk of developing DPN is important for clinical decision-making and designing clinical trials.


We retrospectively reviewed the data of 1278 patients with diabetes treated in two central hospitals from 2020 to 2022. The data included medical history, physical examination, and biochemical index test results. After feature selection and data balancing, the cohort was divided into training and internal validation datasets at a 7:3 ratio. Training was made in logistic regression, k-nearest neighbor, decision tree, naive bayes, random forest, and extreme gradient boosting (XGBoost) based on machine learning. The k-fold cross-validation was used for model assessment, and the accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC) were adopted to validate the models’ discrimination and clinical practicality. The SHapley Additive exPlanation (SHAP) was used to interpret the best-performing model.


The XGBoost model outperformed other models, which had an accuracy of 0·746, precision of 0·765, recall of 0·711, F1-score of 0·736, and AUC of 0·813. The SHAP results indicated that age, disease duration, glycated hemoglobin, insulin resistance index, 24-h urine protein quantification, and urine protein concentration were risk factors for DPN, while the ratio between 2-h postprandial C-peptide and fasting C-peptide(C2/C0), total cholesterol, activated partial thromboplastin time, and creatinine were protective factors.


The machine learning approach helped established a DPN risk prediction model with good performance. The model identified the factors most closely related to DPN.

Peer Review reports


The incidence of diabetes has been increasing worldwide in recent years. About 6·7 million people died from diabetes or its complications in 2021 [1]. It is estimated that by 2045, China will have 174 million people with diabetes, ranking first in the world. The burden of diabetes-related complications is expected to increase with the increase in diabetes prevalence. Diabetic peripheral neuropathy (DPN) is the most common microvascular complication of diabetes [2], potentially leading to foot deformity, ulceration, and even amputation. It can also damage the central nervous system and increase the risk of all-cause and cardiovascular mortality [3,4,5,6]. Recent studies have found that DPN starts progressing in prediabetes [7]. However, a nerve conduction study could not assess its subtle damage to nerve fibers, resulting in delayed DPN diagnosis, intervention, and treatment administration [8, 9]. Identifying risk factors for DPN is crucial for clinical management.

Studies have identified age, diabetes duration, and glycosylated hemoglobin as risk factors for DPN progression [10, 11], and various other DPN-related factors are still being explored. However, due to differences in evaluation criteria among studies, many contrasting conclusions have been reported [12, 13], resulting in poor clinical applicability of DPN predictors. With the development of science and technology, the establishment of accurate predictive models through machine learning and risk factor acquisition are now applied clinically [14].

Machine learning does not require model structure pre-specification; rather, machine learning searches for the optimal fit within certain constraints. This approach can result in an accurate final prediction model that analyzes the complex interactions between many features [15]. Only a few DPN prediction models have been developed based on machine learning. Considering that DPN prevalence varies among countries [14], the existing DPN prediction models for China are limited by their small sample and single-center nature [16]. Our study aimed to discover the link between laboratory indicators and DPN through machine learning, and help clinicians quickly and accurately predict the risk of developing DPN.


Study population

We retrieved the data of 1278 patients with T2DM treated at Jiangsu Provincial Hospital of Traditional Chinese Medicine (n = 1093) and Jiangsu Provincial Governmental Hospital (n = 185) between February 2020 and July 2022. The data included 192 indicators, including the patient’s basic characteristics, complications, routine blood values, blood biochemistry, immune testing, thyroid function, coagulation function, routine urine values, urine biochemistry, routine stool values, insulin measurement, tumor screening, and sex hormone testing (see Supplementary Table S1).The data were divided into those with and those without DPN according to the nerve conduction test results. See Fig. 1 for the entire research process, including feature selection, data balancing, model construction, model comparison, and optimal model selection and interpretation. This study followed the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Jiangsu Provincial Government Hospitals (2022 Hospital Ethics Review No. 030). All the above data have passed ethical verification.

Fig. 1
figure 1

Study design for building a machine learning model to predict diabetic peripheral neuropathy. Abbreviations and definitions: XGBoost, Extreme Gradient Boosting; NB, Naive Bayes; LR, Logistic Regression; KNN, K-Nearest-Neighbor; RF, Random Forest; DT, Decision Tree; K-Fold, K-Fold cross validation; SHAP, SHapley Additive exPlanations

Sample size

In order to achieve better performances of ML models in predicting the risk factors of diabetic peripheral neuropathy in this study, we included all records who fulfilled our inclusion criteria to fully train the models.

Data extraction

Screening of patient records according to the following inclusion and exclusion criteria.

Inclusion criteria

T2DM Patients diagnosed with DPN and had the diagnostic basis of EMG.

Exclusion criteria

Patients with neuropathy caused by other factors or patients with intervertebral disc disease, spinal nerve root disease and other neuropathy diseases.

Data exploration

We invited medical experts to manually select the features with a significant impact on DPN diagnosis, reducing the 192 features in the dataset to 53. The features were selected based on their clinical relevance and previous research and included the patient’s age, weight, glycosylated hemoglobin, blood lipids, and other essential characteristics. Some of these commonly used medical indicators are calculated by other indicators, as shown in the following formulae. Data mining treated the variables as complete (no without values) or incomplete (with missing values). Important incomplete variables were dealt with by deletion, imputation, or no processing. We eliminated ten features whose missing data rate exceeded 30%. After eliminating features, the remaining categorical variables with missing data were completed with the variables’ modes, and the remaining continuous variables were filled with the variables’ mean values. This study included 748 samples with DPN and 530 samples without DPN.

$$\mathrm C2/\mathrm C0=\frac{2\;\mathrm{hours}\;\mathrm{postprandial}\;\mathrm C-{\mathrm{peptide}\left(\mathrm{ng}/\mathrm{mL}\right)}}{\mathrm{fasting}\;\mathrm C-\mathrm{peptide}\left(\mathrm{ng}/\mathrm{mL}\right)}$$
$$\mathrm{HOMA}-\mathrm{IR}=\frac{\mathrm{fasting}\;\mathrm{plasmaglucose}(\mathrm{mmol}/\mathrm L)\times\mathrm{fasting}\;\mathrm{seruminsulin}(\mathrm{mU}/\mathrm L)}{22\cdot5}$$
$$\mathrm{NLR}=\frac{\mathrm{number}\;\mathrm{of}\;\mathrm{neutrophils}(10^9/\mathrm L)}{\mathrm{number}\;\mathrm{of}\;\mathrm{lymphocytes}(10^9/\mathrm L)}$$

Feature selection and data balancing

Feature selection, critical in feature engineering, can filter out highly correlated features to improve model performance and reduce training time. Feature selection can be divided into the filtering, wrapping, and embedding methods. The embedding feature selection method employed in this study uses a machine learning model to automatically select features and integrate them into the training process.

We balanced the data as most machine learning algorithms do not work well with unbalanced datasets. For unbalanced data, over-sampling, under-sampling or both can be used to achieve positive and negative sample balance. The SMOTETomek method [17, 18] was used for data balancing. It is a composite method that performs an over-sampling operation on a large proportion of samples and an under-sampling operation on a small proportion of samples.

Research technology

Statistical analysis was performed using IBM SPSS Statistics for Windows, Version 27.0 (IBM Corp., Armonk, NY, USA). We used the Kolmogorov-Smirnov test to assess the continuous variables and the Chi-Squared test to assess the categorical variables (see Supplementary Table S2 and S3). As the continuous variables were not normally distributed, they were compared by the Mann-Whitney U test. Feature selection and data preprocessing, balancing, modeling, and evaluation were performed using Python Software Foundation. Python Language Reference, version 3.9. Available at

We divided the data into a 7:3 ratio of training set and test set, using the training set to train the prediction model and the test set for the model evaluation. Six machine learning algorithms were used to build the prediction models, including logistic regression, k-nearest neighbor, decision tree, naive Bayes, random forest, and extreme gradient boosting (XGBoost). We used the open-source package sklearn 0.24.2 for model realization and evaluation [19]. Model performances were assessed by the indicators’ accuracy, precision, recall, F1-score, confusion matrix, and the area under the receiver operating characteristic curve (AUC) under 10-fold cross-validation.

Model interpretation

The interpretation of the model is a very important step that helps one to understand the process of model classification. The SHapley Additive exPlanations(SHAP) originated from cooperative game theory and have a solid theoretical foundation [20]. The method is a model-independent solution to model interpretability. We synthetically selected the best-performing model, and used SHAP to calculate the marginal contribution of features to explain the model output, identify significant features in the various classifications, and indicate whether they were positively or negatively correlated.



We used the collected data of 1278 diabetic patients for modeling of machine learning, all patients have completed EMG results. Nerve conduction abnormalities involving one or more nerves was defined as nerve injury, grouped according to the patient’s nerve conduction outcome. 748 (58.53%) of which did and 530 (41.47%) did not develop DPN. The description of the general situation of the study subjects is presented in Supplementary Table S3.

Feature selection

The 43 variables included in this study comprised 34 continuous variables and nine categorical variables. We used the embedded method and random forest as primary learners to further filter the features and select 16. Supplementary Table S4 presents the weights of these 16 features.

As shown in Table 1, age, alanine aminotransferase, albumin, total bilirubin, urea, total cholesterol, glycosylated hemoglobin, activated partial thromboplastin time (aPTT), 24-h urine protein quantification, urine protein concentration, diabetes duration, neutrophil-to-lymphocyte ratio, and the homeostatic model assessment of insulin resistance (HOMA-IR) index were statistically significant (P<0·05).

Table 1 Single factor analysis of DPN selection variables

Data balancing

The data distribution before and after balancing is shown in Table 2.

Table 2 Comparison of positive and negative samples before and after data balance

Modeling and evaluation

The evaluation results of the six machine learning algorithms are shown in Table 3. The results showed that XGBoost had the best accuracy (0·753 ± 0·032), recall (0·721 ± 0·050), and F1-value (0·744 ± 0·036). K-nearest neighbor showed the highest precision (0·858 ± 0·070) but performed poorly in the other indicators.

Table 3 Comparison of classification results of different models (mean ± std)

The confusion matrix serves as a formalized method for evaluating machine learning models, reflecting the results presented in Table 3. As can be easily deduced from Fig. 2, the overall performance of the Random Forest and XGBoost models is significantly superior to that of the other models, with the XGBoost model having a slight edge over the Random Forest model. Specifically, the XGBoost model slightly outperforms the Random Forest model in terms of accuracy, precision, and recall metrics.

Fig. 2
figure 2

Model classification confusion matrix. XGBoost, Extreme Gradient Boosting; NB, Naive Bayes; LR, Logistic Regression; KNN, K-Nearest-Neighbor; RF, Random Forest; DT, Decision Tree

The performance of the six machine learning algorithms in predicting DPN is shown in Fig. 3. The AUC of XGBoost (0·818) was the largest, followed by random forest (0·804). The decision tree model had the smallest AUC (0·636).

Fig. 3
figure 3

ROC curves for different classification models. XGBoost, Extreme Gradient Boosting; NB, Naive Bayes; LR, Logistic Regression; KNN, K-Nearest-Neighbor; RF, Random Forest; DT, Decision Tree; ROC curve,receiver operating characteristic curve

Descriptive statistics of AUC values for different models were shown in Table 4. Due to the small sample size and non-normal distribution of AUC, we conducted a Wilcoxon signed-rank test on the XGBoost model’s and other models’ AUC values. The AUC values of XGBoost were statistically significantly different from those of LR, KNN, NB, and DT, but not significantly different from RF, as shown in Table 5. Cross-validation results were often used for model selection, and based on the average values of various metrics in Table 3, we have chosen XGBoost.

Table 4 Descriptive Statistics of AUC for Different Models
Table 5 Significance Testing between XGBoost and Other Models

A comprehensive performance analysis of the six models indicated that XGBoost was the optimal model for predicting DPN. We used SHAP to elucidate the relationship between the features and the output of the XGBoost model. From Fig. 4, we can get the top 10 indicators that have the greatest impact on classification, all of which P < 0̵·05. Figure 4 shows that age, disease duration, C2/C0, and total cholesterol were essential features for DPN prediction by XGBoost and significantly impacted the classification results. Age, disease duration, glycated hemoglobin, insulin resistance (IR) index, 24-h urine protein quantification, and urine protein concentration were risk factors, while C2/C0, total cholesterol, aPTT, and creatinine were protective factors.

Fig. 4
figure 4

Comparison of XGBoost Model Interpretations using SHAP across Different Dataset Splits


DPN is the most common complication of diabetes in China, in addition to cardiovascular and cerebrovascular diseases. DPN is often neglected in its early stages because nerve conduction studies cannot detect small fiber lesions. The nerve damage caused by DPN is irreversible by the time the disease is fully established. Therefore, it is very important to identify the clinical indicators, predictors, and risk factors of DPN. This study built a DPN prediction model based on XGBoost through machine learning. The high recall rate of the model shows that it has good reliability.

The developed model has several advantages. First, data from two medical centers in Nanjing, China, were used for analysis. DPN diagnosis in all patients followed the nerve conduction study classification, and their pathological data were tested by standardized laboratory techniques, to ensuring the accuracy of the variables in the prediction model. Second, unbalanced datasets make models highly dependent on some specific indicators, leading to over- or underfitting. We used the SMOTETomek method to solve this problem. Third, we constructed multiple classical machine learning DPN prediction models. We determined that the XGBoost model was the best after comparing the models by 10-fold cross-validations. Fourth, the SHAP method was used to explain the relationship between the input features and the output variables of the XGBoost model, ranking the features according to their contribution to the model output. We found several important indicators highly correlated with the progression to DPN. These are highly relevant since the key indicators of disease diagnosis were objectively extracted from real clinical data.

Previous studies have attempted to build predictive models of DPN. Wu et al. [16] established four predictive nomographs and selected the optimal model, but the Toronto Clinical Neuropathy Scoring System score suggested the diagnosis groups included false positives. Kazemi et al. [21] built a DPN prediction model based on multicategory support vector machine; however, by directly selecting the research model, they could not compare it to others, possibly reducing the model’s accuracy. Baskozos et al. [22] used the DOLORisk project dataset, applied machine learning to classify painful and painless DPN, built models, and identified predictive factors. Although their study used one of the largest and most comprehensively phenotyped cohort of people with DPN, its disadvantage was in its uncertain data quality. Metsker et al. [23] used EMRS data to build various machine-learning models for DPN. They found the most effective method for each research task to ensure the high accuracy of the research results. Therefore, it would be useful to build a better DPN prediction model using well-structured, large, and balanced datasets. The previously determined risk factors of DPN include age, diabetes duration, glycosylated hemoglobin, 24-h urine protein quantification, and urine protein concentration [24], consistent with our results. Our research has additional new findings.

The C2/C0 index represents a protective factor of DPN. C2/C0 reflecting the secretory function of pancreatic β cells. The higher the C2/C0 value represents the higher increase in C-peptide after meals. C-peptide is a polypeptide released by pancreatic β cells into the blood at a concentration equal to that of insulin. Laboratory studies have shown that C-peptide can increase endothelial nitric oxide synthase to improve endothelial function, blood flow, and neural function [25]. It can also improve neural function and structural abnormalities by increasing the activity of nerve Na+/K + ATPase and reducing Na + retention [26]. Furthermore, it can improve the gene expression of nerve growth factor, insulin-like growth factor-1, and neurotrophin-3 receptors correcting the mRNA and protein expression of neurofilament and tubulin and normalizing the abnormal phosphorylation of neurofilament [27, 28]. Clinical studies of T1DM showed that C-peptide has a good neuroprotective effect [29]. As T2DM is related to IR, changes in insulin and C-peptide concentrations will occur with the development of the disease. Therefore, there is no clear clinical evidence of the role of C-peptide in T2DM. Most patients in our cohort had T2DM. C2 and C0 were correlated with DPN when we started analyzing the data, but they were not significant enough. Their importance became apparent once they were combined into an index. Our finding confirms the neuroprotective effect C-peptide has in patients with diabetes and further explains the relationship between the secretion function and health of pancreatic β cells and DPN.

Total cholesterol protects against DPN in patients with diabetes. Total cholesterol is the sum of HDL-C, LDL-C, and the small amount of free cholesterol. Cholesterol is closely related to nerves and is an indispensable resource for myelin sheath development [30]. The myelin sheath can ensure the rapid transmission of nerve impulses and maintain normal nerve function in the peripheral nervous system (PNS) [31]. The myelin sheath in the PNS is mainly formed by a repeated wrapping of the Schwann cell membrane around the axon. Most of the cholesterol required to form the myelin sheath is synthesized in the endoplasmic reticulum of the neuronal cell body. However, when a very long axon is damaged away from the cell body, the Schwann cells need to take cholesterol from the circulation to form the myelin sheath [32, 33]. It was suggested that lower serum cholesterol levels might hamper peripheral nerve regeneration [34]. It was proposed that nerve swelling due to changes in Schwann cell lipid components for lack of cholesterol affects axon regeneration after nerve injury [33, 35]. Furthermore, the effect of daily medication on the lipid profile in patients with diabetes should not be ignored. Most patients with diabetes are treated with insulin, which can increase the high-density lipoprotein cholesterol (HDL-C) level, but not the low-density lipoprotein cholesterol (LDL-C) level [36]. Metformin, a commonly used clinical hypoglycemic drug, can reduce LDL-C [37]. It was shown that LDL-C might cause nerve damage [32, 38, 39] while HDL-C protects against it [40]. HDL-C or LDL-C alone did not show a strong correlation with DPN in this study due to the diverse medications used by the patients. However, it can be assumed that changes in the lipid profile after diabetes medication use might affect the nerves. In addition to hypoglycemic drugs, the intake of statins and the related reduction in serum cholesterol level are also associated with accelerated deterioration of neurological symptoms, microvascular injury, and peripheral nerve fiber injury [34, 41, 42]. But in some studies, hyperlipidemia is a risk factor for DPN in patients with T2DM [43]. In clinical practice, most patients need medications to control blood lipids status. Consequently, a lingering question is whether we can predict DPN development based on the proportion of cholesterol in blood lipids, considering the use of drugs to control blood lipid within a certain range.

HOMA-IR is a predictor of DPN in patients with diabetes. It is an index calculated using fasting blood glucose and insulin [44]. HOMA-IR is a good indicator of the degree of IR in the body and has been used in large-scale clinical and epidemiological studies. IR is the main internal environment state in patients with T2DM. Neuronal IR can lead to low insulin signaling and induce DPN progression. IR reduces the Akt signaling transduction by destroying insulin signaling in Schwann cells of the PNS [45]. Alterations in the Akt signaling pathway affect the neuronal mitochondrial function and lead to the subsequent increase in oxidative stress [46]. Glucose can mediate oxidative stress and promote the progression of DPN by inducing mitochondrial biogenesis and fission [47]. Therefore, the disruption to insulin signaling induced by IR makes the PNS neurons more susceptible to metabolic damage. Moreover, Akt regulates the myelination of the PNS nerve fibers by activating Rac1 to enhance membrane encapsulation and synthesizing myelin protein through mammalian target of rapamycin complex 1 [48, 49]. The reduction in Akt signaling transduction caused by IR impairs myelination and enhances DPN progression. IR was associated with DPN in laboratory studies [50], but related clinical data analysis is rare. This study was the first to use the HOMA-IR index as an indicator and explored the correlation between IR and DPN in clinical data.

Our model indicated that aPTT and creatinine were protective factors of DPN. aPTT represents the coagulation ability in the body; the lower the value, the more likely there is a hypercoagulable state. Therefore, it is possible that the nerves have a better blood supply when the body is not in a hypercoagulable state, and the better blood supply delays the progress to DPN. However, considering the obstacles to coagulation in patients with diabetes and the few related studies, this conclusion remains to be explored. Creatinine was mostly associated with diabetic nephropathy in studies on diabetic complications. Its association with DPN should be further explored as there are too few studies on the topic.

The results obtained by the model helped better understand the importance of each feature to the model’s prediction. Among the indicators detected by the model, the ten most closely related to DPN were age, diabetes duration, C2/C0, total cholesterol, glycosylated hemoglobin, HOMA-IR index, aPTT, 24-h urine protein quantification, creatinine, and urine protein concentration. The high correlation between age, diabetes duration, and DPN further highlighted the importance of early intervention to prevent complications in patients with diabetes.

Our study had several limitations that must be considered. First, all participants had diabetes, so we only explored diabetes-related indicators. The prediction results could have been different if we had included a control group of healthy individuals. Second, we lack the collection and analysis of patients’ subjective description. These will need to be addressed in future research. Third, all the indicators in this study were continuous variables, and the data analysis may be segmental. For example, the protective effects of aPTT and creatinine on DPN should be restricted to a certain threshold range. Future studies should focus on the impact of indicators within different thresholds. Fourth, the prediction model data came mostly from Nanjing, Jiangsu Province. The applicability of our results to other regions remains to be verified.

In conclusion, we established a DPN risk prediction model, which showed good performance. Through the model, we identified the factors most closely related to DPN. Our team will address the existing problems and strive to establish a better DPN prediction model through future research to help doctors quickly and accurately judge the corresponding prognosis for improved, reliable, and convenient personalized treatment and management of patients with diabetes.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.



Diabetic peripheral neuropathy


The ratio between 2-h postprandial C-peptide and fasting C-peptide


Non-diabetic peripheral neuropathy


Electronic medical record system


Logistic regression


K-nearest neighbor


Decision tree


Naive bayes


Random forest


Extreme gradient boosting


Shapley additive explanation


True negative value


False positive value


False negative value


True positive value




Type 1 diabetes mellitus


Type 2 diabetes mellitus


High-density lipoprotein cholesterol


Low-density lipoprotein cholesterol


The homeostatic model assessment of insulin resistance


Peripheral nervous system


Insulin resistance


Activated partial thromboplastin time


  1. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, Stein C, Basit A, Chan JCN, Mbanya JC, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119.

    Article  PubMed  Google Scholar 

  2. Liu Z, Fu C, Wang W, Xu B. Prevalence of chronic complications of type 2 diabetes mellitus in outpatients - a cross-sectional hospital based survey in urban China. Health Qual Life Outcomes. 2010;8:62.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Biessels GJ, Despa F. Cognitive decline and dementia in diabetes mellitus: mechanisms and clinical implications. Nat reviews Endocrinol. 2018;14(10):591–604.

    Article  Google Scholar 

  4. O’Brien PD, Hinder LM, Callaghan BC, Feldman EL. Neurological consequences of obesity. Lancet Neurol. 2017;16(6):465–77.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Lian J, Wang H, Cui R, Zhang C, Fu J. Status of analgesic drugs and quality of Life results for Diabetic Peripheral Neuropathy in China. Front Endocrinol. 2021;12:813210.

    Article  Google Scholar 

  6. Hicks CW, Wang D, Matsushita K, Windham BG, Selvin E. Peripheral neuropathy and all-cause and Cardiovascular Mortality in U.S. adults: a prospective cohort study. Ann Intern Med. 2021;174(2):167–74.

    Article  PubMed  Google Scholar 

  7. Papanas N, Vinik AI, Ziegler D. Neuropathy in prediabetes: does the clock start ticking early? Nat reviews Endocrinol. 2011;7(11):682–90.

    Article  CAS  Google Scholar 

  8. Laverdet B, Danigo A, Girard D, Magy L, Demiot C, Desmoulière A. Skin innervation: important roles during normal and pathological cutaneous repair. Histol Histopathol. 2015;30(8):875–92.

    CAS  PubMed  Google Scholar 

  9. Malik RA. Diabetic neuropathy: a focus on small fibres. Diab/Metab Res Rev. 2020;36 Suppl 1:e3255.

    Google Scholar 

  10. Tesfaye S, Stevens LK, Stephenson JM, Fuller JH, Plater M, Ionescu-Tirgoviste C, Nuber A, Pozza G, Ward JD. Prevalence of diabetic peripheral neuropathy and its relation to glycaemic control and potential risk factors: the EURODIAB IDDM Complications Study. Diabetologia. 1996;39(11):1377–84.

    Article  CAS  PubMed  Google Scholar 

  11. Risk Factors for Diabetic Peripheral Neuropathy in Adolescents. Young adults with type 2 diabetes: results from the TODAY Study. Diabetes Care. 2021;45(5):1065–72.

    Google Scholar 

  12. Christensen DH, Knudsen ST, Gylfadottir SS, Christensen LB, Nielsen JS, Beck-Nielsen H, Sørensen HT, Andersen H, Callaghan BC, Feldman EL, et al. Metabolic factors, Lifestyle Habits, and possible polyneuropathy in early type 2 diabetes: a nationwide study of 5,249 patients in the danish centre for Strategic Research in Type 2 diabetes (DD2) cohort. Diabetes Care. 2020;43(6):1266–75.

    Article  PubMed  Google Scholar 

  13. van der Velde J, Koster A, Strotmeyer ES, Mess WH, Hilkman D, Reulen JPH, Stehouwer CDA, Henry RMA, Schram MT, van der Kallen CJH, et al. Cardiometabolic risk factors as determinants of peripheral nerve function: the Maastricht Study. Diabetologia. 2020;63(8):1648–58.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Beam AL, Kohane IS. Big Data and Machine Learning in Health Care. JAMA. 2018;319(13):1317–18.

    Article  PubMed  Google Scholar 

  15. Chen T, Li X, Li Y, Xia E, Qin Y, Liang S, Xu F, Liang D, Zeng C, Liu Z. Prediction and risk stratification of kidney outcomes in IgA Nephropathy. Am J kidney diseases: official J Natl Kidney Foundation. 2019;74(3):300–9.

    Article  Google Scholar 

  16. Wu B, Niu Z, Hu F. Study on risk factors of Peripheral Neuropathy in type 2 diabetes Mellitus and Establishment of Prediction Model. Diabetes & metabolism journal. 2021;45(4):526–38.

    Article  Google Scholar 

  17. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.

    Article  Google Scholar 

  18. Tomek I. Two modifications of CNN. 1976, vol. 6: 769–72.

  19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

    Google Scholar 

  20. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From local explanations to Global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Kazemi M, Moghimbeigi A, Kiani J, Mahjub H, Faradmal J. Diabetic peripheral neuropathy class prediction by multicategory support vector machine model: a cross-sectional study. Epidemiol health. 2016;38:e2016011.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Baskozos G, Themistocleous AC, Hebert HL, Pascal MMV, John J, Callaghan BC, Laycock H, Granovsky Y, Crombez G, Yarnitsky D, et al. Classification of painful or painless diabetic peripheral neuropathy and identification of the most powerful predictors using machine learning models in large cross-sectional cohorts. BMC Med Inf Decis Mak. 2022;22(1):144.

    Article  Google Scholar 

  23. Metsker O, Magoev K, Yakovlev A, Yanishevskiy S, Kopanitsa G, Kovalchuk S, Krzhizhanovskaya VV. Identification of risk factors for patients with diabetes: diabetic polyneuropathy case study. BMC Med Inf Decis Mak. 2020;20(1):201.

    Article  Google Scholar 

  24. Pai YW, Lin CH, Lee IT, Chang MH. Prevalence and biochemical risk factors of diabetic peripheral neuropathy with or without neuropathic pain in taiwanese adults with type 2 diabetes mellitus. Diabetes Metab Syndr. 2018;12(2):111–16.

    Article  PubMed  Google Scholar 

  25. Cotter MA, Ekberg K, Wahren J, Cameron NE. Effects of proinsulin C-peptide in experimental diabetic neuropathy: vascular actions and modulation by nitric oxide synthase inhibition. Diabetes. 2003;52(7):1812–17.

    Article  CAS  PubMed  Google Scholar 

  26. Stevens MJ, Zhang W, Li F, Sima AA. C-peptide corrects endoneurial blood flow but not oxidative stress in type 1 BB/Wor rats. Am J Physiol Endocrinol metabolism. 2004;287(3):E497–505.

    Article  CAS  Google Scholar 

  27. Wahren J, Larsson C. C-peptide: new findings and therapeutic possibilities. Diabetes Res Clin Pract. 2015;107(3):309–19.

    Article  CAS  PubMed  Google Scholar 

  28. Pierson CR, Zhang W, Sima AA. Proinsulin C-peptide replacement in type 1 diabetic BB/Wor-rats prevents deficits in nerve fiber regeneration. J Neuropathol Exp Neurol. 2003;62(7):765–79.

    Article  CAS  PubMed  Google Scholar 

  29. Kamiya H, Zhang W, Ekberg K, Wahren J, Sima AA. C-Peptide reverses nociceptive neuropathy in type 1 diabetes. Diabetes. 2006;55(12):3581–87.

    Article  CAS  PubMed  Google Scholar 

  30. Saher G, Brügger B, Lappe-Siefke C, Möbius W, Tozawa R, Wehr MC, Wieland F, Ishibashi S, Nave KA. High cholesterol level is essential for myelin membrane growth. Nat Neurosci. 2005;8(4):468–75.

    Article  CAS  PubMed  Google Scholar 

  31. Ackerman SD, Luo R, Poitelon Y, Mogha A, Harty BL, D’Rozario M, Sanchez NE, Lakkaraju AKK, Gamble P, Li J, et al. GPR56/ADGRG1 regulates development and maintenance of peripheral myelin. J Exp Med. 2018;215(3):941–61.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Iqbal Z, Bashir B, Ferdousi M, Kalteniece A, Alam U, Malik RA, Soran H. Lipids and peripheral neuropathy. Curr Opin Lipidol. 2021;32(4):249–257.

    Article  CAS  PubMed  Google Scholar 

  33. de Chaves EI, Rusiñol AE, Vance DE, Campenot RB, Vance JE. Role of lipoproteins in the delivery of lipids to axons during axonal regeneration. J Biol Chem. 1997;272(49):30766–773.

    Article  PubMed  Google Scholar 

  34. Jende JME, Groener JB, Rother C, Kender Z, Hahn A, Hilgenfeld T, Juerchott A, Preisner F, Heiland S, Kopf S, et al. Association of serum cholesterol levels with peripheral nerve damage in patients with type 2 diabetes. JAMA Netw open. 2019;2(5):e194798.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Cermenati G, Audano M, Giatti S, Carozzi V, Porretta-Serapiglia C, Pettinato E, Ferri C, D’Antonio M, De Fabiani E, Crestani M, et al. Lack of sterol regulatory element binding factor-1c imposes glial fatty acid utilization leading to peripheral neuropathy. Cell Metabol. 2015;21(4):571–83.

    Article  CAS  Google Scholar 

  36. Aslan I, Kucuksayan E, Aslan M. Effect of insulin analog initiation therapy on LDL/HDL subfraction profile and HDL associated enzymes in type 2 diabetic patients. Lipids Health Dis. 2013;12:54.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Keidan B, Hsia J, Katz R. Plasma lipids and antidiabetic agents: a brief overview. Br J Diabetes Vascular Disease. 2002;2(1):40–3.

    Article  CAS  Google Scholar 

  38. Wiggin TD, Sullivan KA, Pop-Busui R, Amato A, Sima AA, Feldman EL. Elevated triglycerides correlate with progression of diabetic neuropathy. Diabetes. 2009;58(7):1634–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Andersen ST, Witte DR, Dalsgaard EM, Andersen H, Nawroth P, Fleming T, Jensen TM, Finnerup NB, Jensen TS, Lauritzen T, et al. Risk factors for Incident Diabetic Polyneuropathy in a Cohort with screen-detected type 2 diabetes followed for 13 years: ADDITION-Denmark. Diabetes Care. 2018;41(5):1068–75.

    Article  CAS  PubMed  Google Scholar 

  40. Jaiswal M, Divers J, Dabelea D, Isom S, Bell RA, Martin CL, Pettitt DJ, Saydah S, Pihoker C, Standiford DA, et al. Prevalence of and risk factors for Diabetic Peripheral Neuropathy in Youth with Type 1 and type 2 diabetes: SEARCH for diabetes in Youth Study. Diabetes Care. 2017;40(9):1226–32.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Gaist D, Jeppesen U, Andersen M, García Rodríguez LA, Hallas J, Sindrup SH. Statins and risk of polyneuropathy: a case-control study. Neurology. 2002;58(9):1333–37.

    Article  CAS  PubMed  Google Scholar 

  42. Novak P, Pimentel DA, Sundar B, Moonis M, Qin L, Novak V. Association of Statins with sensory and autonomic Ganglionopathy. Front Aging Neurosci. 2015;7:191.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Smith AG, Singleton JR. Obesity and hyperlipidemia are risk factors for early diabetic neuropathy. J Diabetes Complications. 2013;27(5):436–42.

    Article  PubMed  PubMed Central  Google Scholar 

  44. Ip MS, Lam B, Ng MM, Lam WK, Tsang KW, Lam KS. Obstructive sleep apnea is independently associated with insulin resistance. Am J Respir Crit Care Med. 2002;165(5):670–76.

    Article  PubMed  Google Scholar 

  45. Boucher J, Kleinridders A, Kahn CR. Insulin receptor signaling in normal and insulin-resistant states. Cold Spring Harb Perspect Biol. 2014;6(1):a009191.

  46. Kim B, Feldman EL. Insulin resistance in the nervous system. Trends Endocrinol Metab. 2012;23(3):133–41.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Vincent AM, Edwards JL, McLean LL, Hong Y, Cerri F, Lopez I, Quattrini A, Feldman EL. Mitochondrial biogenesis and fission in axons in cell culture and animal models of diabetic neuropathy. Acta Neuropathol. 2010;120(4):477–89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Domènech-Estévez E, Baloui H, Meng X, Zhang Y, Deinhardt K, Dupree JL, Einheber S, Chrast R, Salzer JL. Akt regulates Axon wrapping and myelin sheath thickness in the PNS. J Neurosci. 2016;36(16):4506–21.

    Article  PubMed  PubMed Central  Google Scholar 

  49. Hackett AR, Strickland A, Milbrandt J. Disrupting insulin signaling in Schwann cells impairs myelination and induces a sensory neuropathy. Glia. 2020;68(5):963–78.

    Article  PubMed  Google Scholar 

  50. Grote CW, Groover AL, Ryals JM, Geiger PC, Feldman EL, Wright DE. Peripheral nervous system insulin resistance in ob/ob mice. Acta Neuropathol Commun. 2013;1:15.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


Not applicable.


This work was supported by a grant from the Jiangsu Provincial Science and Technology Development Plan of Traditional Chinese Medicine (grant number MS2022138) and Qing Lan Project of Jiangsu Province 2021.

Author information

Authors and Affiliations



XYL, TY contributed to the concept and design of this study. JCZ, JZQ and MW were responsible for statistical analysis and writing of the report, MQY, XJL and GL assisted in statistical analysis, XYL, TY reviewed the article and provided critical feedback to improve and structure the report. All authors agree to be fully accountable for ensuring the integrity and accuracy of the work, and read and approved the final manuscript.

Corresponding authors

Correspondence to Tao Yang or Jingchen Zhong.

Ethics declarations

Ethics approval and consent to participate

This study followed the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Jiangsu Provincial Government Hospitals (2022 Hospital Ethics Review No. 030). The need for informed consent was waived by the ethics committee Review Board of Jiangsu Provincial Governmental Hospital, because of the retrospective nature of the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table S1.

All the features in the raw data.

Additional file 2: Supplementary Table S2.

Kolmogorov-Smirnov test for continuous variables.

Additional file 3: Supplementary Table S3.

Patient demographics and univariate analysis

Additional file 4: Supplementary Table S4.

Feature weights in embedded feature selection.

Additional file 5: Supplementary Table S5.

Hyperparameters combinations in optuna of the 6 models.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lian, X., Qi, J., Yuan, M. et al. Study on risk factors of diabetic peripheral neuropathy and establishment of a prediction model by machine learning. BMC Med Inform Decis Mak 23, 146 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: