An ensemble model for predicting dispositions of emergency department patients

Table 6 Performance comparison of predictive models

		Training dataset					Test dataset
Model	Text representation	Accuracy (SD)	AUROC (SD)	F1 (SD)	Precision (SD)	Recall (SD)	Accuracy (95% C.I.)	AUROC (95% C.I.)	F1 (95% C.I.)	Precision (95% C.I.)	Recall (95% C.I.)
Random Forest (Structured data)		0.794 (0.005)	0.846 (0.004)	0.794 (0.005)	0.795 (0.005)	0.794 (0.005)	0.791* (0.784–0.798)	0.844* (0.838–0.849)	0.791* (0.784–0.799)	0.792* (0.784–0.799)	0.791* (0.784–0.798)
Random Forest (Unstructured data)	BOW	0.792 (0.008)	0.844 (0.006)	0.792 (0.008)	0.793 (0.008)	0.792 (0.008)	0.793* (0.786–0.800)	0.845* (0.839–0.850)	0.793* (0.786–0.800)	0.794* (0.786–0.801)	0.793* (0.786–0.800)
Random Forest (Unstructured data)	TF-IDF	0.794 (0.005)	0.846 (0.004)	0.795 (0.005)	0.795 (0.005)	0.794 (0.005)	0.792* (0.785–0.799)	0.844* (0.839–0.849)	0.793* (0.786–0.799)	0.793* (0.787–0.800)	0.792* (0.785–0.799)
Multilayer Perceptron (Structured + Unstructured data)	BOW	0.939 (0.007)	0.973 (0.002)	0.896 (0.094)	0.942 (0.000)	0.936 (0.015)	0.937* (0.932–0.943)	0.971* (0.969–0.974)	0.906* (0.857–0.945)	0.937* (0.932–0.943)	0.937* (0.932–0.943)
Multilayer Perceptron (Structured + Unstructured data)	TF-IDF	0.936 (0.008)	0.972 (0.002)	0.849 (0.090)	0.938 (0.000)	0.933 (0.016)	0.938* (0.933–0.944)	0.972* (0.970–0.975)	0.896* (0.840–0.936)	0.938* (0.933–0.944)	0.938* (0.933–0.944)

Notes 1. BOW means Bag-of-Words, TF-IDF means term frequency–inverse document frequency, AUROC means area under the receiver operating characteristic curve, F1 means F1 score, SD means standard deviation, and C.I. means confidence interval
2. * indicates p < 0.001
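
For readers who want to see how the threshold-based metrics in Table 6 relate to one another, the following is a minimal sketch, not the authors' code. It computes accuracy, precision, recall, and F1 from binary confusion-matrix counts and attaches a normal-approximation (Wald) interval to the accuracy. The function names, the example counts, and the choice of a Wald interval are all assumptions made for illustration; the table does not state how the reported intervals or class averages were obtained. AUROC is omitted because it requires predicted scores rather than counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def wald_interval(p, n, z=1.96):
    """Normal-approximation interval for a proportion p observed on n cases.

    z = 1.96 corresponds to a 95% interval. This is an assumed method,
    not necessarily the one behind the intervals reported in the table.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only; they are not from the study.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
low, high = wald_interval(metrics["accuracy"], n=200)

print({name: round(value, 3) for name, value in metrics.items()})
print(f"accuracy 95% interval: ({low:.3f}, {high:.3f})")
```

With a multi-class or class-weighted evaluation, each metric would instead be averaged over classes, and a bootstrap over test cases is a common alternative way to obtain intervals for F1, precision, and recall.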

ISSN: 1472-6947