The advanced machine learner XGBoost did not reduce prehospital trauma mistriage compared with logistic regression: a simulation study

Table 1 Characteristics of study data and sizes of data sets

Characteristics	NTDB	SweTrau
Total number of observations	813,567	30,577
Number of missing observations	422,416	10,411
Number of included observations	368,810	16,547
Proportion major trauma	0.21	0.12
Proportion female	0.38	0.35
Age, median [IQR]	51 [30, 69]	41 [25, 59]

GCS Category	NTDB proportion of observations	SweTrau proportion of observations
13–15	0.9098	0.9173
9–12	0.0384	0.0401
6–8	0.0175	0.0180
4–5	0.0061	0.0090
3	0.0283	0.0156

RR Category	NTDB proportion of observations	SweTrau proportion of observations
30–67	0.0191	0.0543
10–29	0.9677	0.9407
6–9	0.0063	0.0036
0–5	0.0021	0.0011

SBP Category	NTDB proportion of observations	SweTrau proportion of observations
90–300	0.9707	0.9798
76–89	0.0192	0.0126
50–75	0.0091	0.0068
1–49	0.0011	0.0007

Training set size, by events per free parameter	NTDB	SweTrau
10	714	1250
25	1786	3125
100	7143	12,500
1000	71,429	Missing
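
The training set sizes above appear consistent with dividing the required number of events by the proportion of major trauma in each data set. The following is a minimal sketch of that arithmetic, not the authors' code. The value of 15 free parameters is an assumption inferred from the tabulated sizes (the table does not state it), the reading of "Missing" as "larger than the included observations" is also an assumption, and the name `training_set_size` is hypothetical.

```python
# Assumption: 15 free parameters, inferred from the table
# (714 observations * 0.21 is roughly 150 events at 10 events per parameter).
N_FREE_PARAMETERS = 15

# Values taken from Table 1.
PROPORTION_MAJOR_TRAUMA = {"NTDB": 0.21, "SweTrau": 0.12}
INCLUDED_OBSERVATIONS = {"NTDB": 368_810, "SweTrau": 16_547}


def training_set_size(events_per_parameter, dataset):
    """Return the number of observations needed, or None if the data set is too small."""
    required_events = events_per_parameter * N_FREE_PARAMETERS
    size = round(required_events / PROPORTION_MAJOR_TRAUMA[dataset])
    # Assumption: "Missing" in the table means the size exceeds the included observations.
    if size > INCLUDED_OBSERVATIONS[dataset]:
        return None
    return size


for epp in (10, 25, 100, 1000):
    print(epp, training_set_size(epp, "NTDB"), training_set_size(epp, "SweTrau"))
```

Under these assumptions the sketch reproduces every tabulated size, and it returns `None` for SweTrau at 1000 events per free parameter, where 125,000 observations would be needed but only 16,547 are included.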

Validation and test set size	NTDB	SweTrau
200 / proportion of events	952	1667
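
The validation and test set sizes follow from the row label: 200 events divided by the proportion of major trauma. As a minimal sketch of that calculation, assuming the result is rounded to the nearest whole observation (the function name `validation_set_size` is hypothetical):

```python
# Proportions of major trauma taken from Table 1.
PROPORTION_MAJOR_TRAUMA = {"NTDB": 0.21, "SweTrau": 0.12}

# Number of events required, taken from the row label in the table.
EVENTS_REQUIRED = 200


def validation_set_size(dataset):
    """Return the number of observations expected to contain EVENTS_REQUIRED events."""
    # Assumption: sizes are rounded to the nearest whole observation.
    return round(EVENTS_REQUIRED / PROPORTION_MAJOR_TRAUMA[dataset])


print(validation_set_size("NTDB"), validation_set_size("SweTrau"))
```

With the proportions from the table this gives 952 for NTDB and 1667 for SweTrau, matching the tabulated values.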

NTDB, National Trauma Data Bank; SweTrau, Swedish Trauma Registry; GCS, Glasgow Coma Scale; RR, Respiratory Rate; SBP, Systolic Blood Pressure

ISSN: 1472-6947