A data-driven approach to predicting diabetes and cardiovascular disease with machine learning

Table 5 Results using 10-fold cross-validation for diabetes classification

Lab	Year & Case	Model	AUC	Precision	Recall	F1
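No lab	1999-2014, Diab. Case I	Logistic Reg.	0.827	0.75	0.75	0.75
No lab	1999-2014, Diab. Case I	SVM	0.849	0.77	0.77	0.77
No lab	1999-2014, Diab. Case I	Random Forest	0.855	0.78	0.78	0.78
No lab	1999-2014, Diab. Case I	XGBoost	0.862	0.78	0.78	0.78
No lab	1999-2014, Diab. Case I	Ensemble	0.859	0.78	0.78	0.78
No lab	1999-2014, Diab. Case II	Logistic Reg.	0.732	0.67	0.67	0.67
No lab	1999-2014, Diab. Case II	SVM	0.734	0.68	0.68	0.68
No lab	1999-2014, Diab. Case II	Random Forest	0.731	0.67	0.67	0.67
No lab	1999-2014, Diab. Case II	XGBoost	0.734	0.67	0.67	0.67
No lab	1999-2014, Diab. Case II	Ensemble	0.737	0.68	0.68	0.68
No lab	2003-2014, Diab. Case I	Logistic Reg.	0.800	0.72	0.72	0.72
No lab	2003-2014, Diab. Case I	SVM	0.822	0.75	0.75	0.75
No lab	2003-2014, Diab. Case I	Random Forest	0.841	0.77	0.76	0.76
No lab	2003-2014, Diab. Case I	XGBoost	0.837	0.75	0.75	0.75
No lab	2003-2014, Diab. Case I	Ensemble	0.834	0.75	0.75	0.75
No lab	2003-2014, Diab. Case II	Logistic Reg.	0.718	0.66	0.66	0.66
No lab	2003-2014, Diab. Case II	SVM	0.716	0.66	0.66	0.66
No lab	2003-2014, Diab. Case II	Random Forest	0.719	0.67	0.67	0.66
No lab	2003-2014, Diab. Case II	XGBoost	0.725	0.67	0.67	0.67
No lab	2003-2014, Diab. Case II	Ensemble	0.725	0.66	0.66	0.66
With lab	1999-2014, Diab. Case I	Logistic Reg.	0.866	0.79	0.79	0.79
With lab	1999-2014, Diab. Case I	SVM	0.887	0.81	0.81	0.81
With lab	1999-2014, Diab. Case I	Random Forest	0.937	0.86	0.86	0.86
With lab	1999-2014, Diab. Case I	XGBoost	0.957	0.89	0.89	0.89
With lab	1999-2014, Diab. Case I	Ensemble	0.944	0.87	0.87	0.87
With lab	1999-2014, Diab. Case II	Logistic Reg.	0.724	0.67	0.67	0.67
With lab	1999-2014, Diab. Case II	SVM	0.737	0.68	0.68	0.68
With lab	1999-2014, Diab. Case II	Random Forest	0.738	0.68	0.68	0.68
With lab	1999-2014, Diab. Case II	XGBoost	0.802	0.74	0.74	0.74
With lab	1999-2014, Diab. Case II	Ensemble	0.783	0.71	0.71	0.71
With lab	2003-2014, Diab. Case I	Logistic Reg.	0.877	0.80	0.80	0.80
With lab	2003-2014, Diab. Case I	SVM	0.882	0.81	0.80	0.80
With lab	2003-2014, Diab. Case I	Random Forest	0.939	0.86	0.86	0.86
With lab	2003-2014, Diab. Case I	XGBoost	0.962	0.89	0.89	0.89
With lab	2003-2014, Diab. Case I	Ensemble	0.948	0.88	0.88	0.88
With lab	2003-2014, Diab. Case II	Logistic Reg.	0.738	0.68	0.68	0.68
With lab	2003-2014, Diab. Case II	SVM	0.737	0.68	0.68	0.68
With lab	2003-2014, Diab. Case II	Random Forest	0.740	0.68	0.68	0.67
With lab	2003-2014, Diab. Case II	XGBoost	0.834	0.75	0.75	0.75
With lab	2003-2014, Diab. Case II	Ensemble	0.798	0.72	0.72	0.72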

AUC: area under the curve. \(\text{Precision} = \frac{TP}{TP + FP}\) and \(\text{Recall} = \frac{TP}{TP + FN}\), where TP, FP, and FN denote true positives, false positives, and false negatives. \(F_1 = 2\,\frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\). Boldface signifies the best-performing model result.
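
The precision, recall, and F1 definitions in the table note can be checked with a few lines of Python. This is a minimal sketch, not the authors' evaluation code: the function name `precision_recall_f1`, the convention of returning 0.0 when a denominator is zero, and the confusion-matrix counts in the usage example are all assumptions made for illustration and are not taken from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    # Precision = TP / (TP + FP)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall = TP / (TP + FN)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 = 2 * precision * recall / (precision + recall)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    p, r, f = precision_recall_f1(tp=75, fp=25, fn=25)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # prints: precision=0.75 recall=0.75 f1=0.75
```

Because F1 is the harmonic mean of precision and recall, it equals both whenever the two are equal, as in this example.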

ISSN: 1472-6947