Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

Table 9 Performance of the classifiers with the highest sensitivity and a specificity of at least 0.5 on the hepatobiliary disease and acute renal failure data sets

Data set	Algorithm	Baseline Sens.	Baseline Spec.	Under-sampling Sens.	Under-sampling Spec.	Over-sampling Sens.	Over-sampling Spec.	Cost-sensitive Sens.	Cost-sensitive Spec.
Hepatobiliary disease	SVM	0.89	0.77	0.94	0.52	0.93	0.65	0.87	0.79
	MyC	0.92	0.68	0.95	0.56	0.94	0.54	0.95	0.54
	C4.5	0.90	0.79	0.93	0.59	0.94	0.56	0.92	0.66
	RIPPER	0.90	0.71	0.93	0.72	0.94	0.51	0.93	0.67
Acute renal failure	SVM	0.62	0.92	0.86	0.56	0.84	0.54	0.59	0.92
	MyC	0.69	0.90	0.83	0.70	0.89	0.51	0.81	0.63
	C4.5	0.69	0.88	0.86	0.77	0.83	0.61	0.78	0.60
	RIPPER	0.71	0.89	0.84	0.68	0.89	0.59	0.78	0.80
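
The selection rule in the caption of Table 9 (highest sensitivity, subject to a specificity of at least 0.5) can be sketched as follows. This is a minimal illustration, not the authors' code. The names `Result`, `sensitivity`, `specificity`, and `select_most_sensitive` are hypothetical, and breaking ties by higher specificity is an assumption. The table does not state which candidate classifiers were compared, so the rule is applied here to the four SVM entries for acute renal failure purely to show how it works.

```python
from typing import NamedTuple, Optional, Sequence


class Result(NamedTuple):
    label: str          # e.g. a sampling strategy or a parameter setting
    sensitivity: float
    specificity: float


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positive cases that the classifier identifies."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negative cases that the classifier rejects."""
    return tn / (tn + fp)


def select_most_sensitive(
    results: Sequence[Result], min_specificity: float = 0.5
) -> Optional[Result]:
    """Return the most sensitive result whose specificity meets the floor."""
    eligible = [r for r in results if r.specificity >= min_specificity]
    if not eligible:
        return None
    # Assumption: ties on sensitivity are broken by higher specificity.
    return max(eligible, key=lambda r: (r.sensitivity, r.specificity))


# SVM values for acute renal failure, copied from Table 9.
svm_arf = [
    Result("Baseline", 0.62, 0.92),
    Result("Under-sampling", 0.86, 0.56),
    Result("Over-sampling", 0.84, 0.54),
    Result("Cost-sensitive", 0.59, 0.92),
]

best = select_most_sensitive(svm_arf)
print(best.label)  # Under-sampling
```

The specificity floor excludes classifiers that reach high sensitivity only by labelling nearly every record as a case. Raising `min_specificity` trades sensitivity for fewer false positives.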

ISSN: 1472-6947