Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

Table 5 Sensitivity and specificity of various classifiers trained on the hepatobiliary data set for different percentages of over-sampling

Over-sampling (%)	SVM Sens.	SVM Spec.	MyC Sens.	MyC Spec.	RIPPER Sens.	RIPPER Spec.	C4.5 Sens.	C4.5 Spec.	Imbalance ratio
0	0.89	0.77	0.92	0.68	0.91	0.71	0.90	0.79	42
100	0.90	0.72	0.96	0.52	0.94	0.64	0.93	0.73	21
200	0.90	0.70	0.96	0.47	0.96	0.56	0.94	0.67	14
300	0.91	0.70	0.97	0.44	0.96	0.54	0.95	0.65	11
400	0.91	0.71	0.98	0.45	0.97	0.50	0.95	0.63	8
500	0.92	0.69	0.98	0.43	0.97	0.48	0.95	0.62	7
600	0.92	0.68	0.97	0.35	0.96	0.47	0.95	0.61	6
700	0.92	0.67	0.98	0.34	0.97	0.47	0.95	0.60	5
800	0.92	0.65	0.97	0.34	0.97	0.47	0.95	0.61	5
900	0.93	0.65	0.97	0.34	0.97	0.45	0.95	0.59	4
1000	0.93	0.64	0.97	0.35	0.96	0.44	0.95	0.59	4
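
The imbalance-ratio column appears consistent with simple arithmetic on the over-sampling percentage. The sketch below assumes that the ratio counts negative cases per positive case and that over-sampling by p% adds p/100 extra copies of each positive case. Both are inferences from the numbers in the table, not definitions stated here, and the function name `imbalance_ratio` is hypothetical.

```python
def imbalance_ratio(base_ratio: float, oversampling_pct: float) -> float:
    """Estimate negatives per positive after over-sampling the positives.

    Assumption (not stated in the table): over-sampling by p% multiplies
    the number of positive cases by (1 + p/100) and leaves the negative
    cases untouched.
    """
    if oversampling_pct < 0:
        raise ValueError("oversampling_pct must be non-negative")
    return base_ratio / (1 + oversampling_pct / 100)


# Over-sampling percentages and imbalance ratios as reported in Table 5.
REPORTED = {
    0: 42, 100: 21, 200: 14, 300: 11, 400: 8, 500: 7,
    600: 6, 700: 5, 800: 5, 900: 4, 1000: 4,
}

if __name__ == "__main__":
    for pct, reported in REPORTED.items():
        estimate = imbalance_ratio(REPORTED[0], pct)
        print(f"{pct:>5}%  estimated {estimate:5.2f}  reported {reported}")
```

Under these assumptions the estimates stay within one unit of every reported ratio. The small gaps are expected, because the starting value of 42 is itself presumably rounded.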

ISSN: 1472-6947