Detection of sentence boundaries and abbreviations in clinical narratives

Table 11 Sentence detection.

Top 10	[1-4]	w²	[1-5]	w²
1	Capitalization	11.34	LT	674.21
2	LT	10.52	Mean-LT	674.21
3	Mean-LT	10.52	Capitalization	637.85
4	No "\n"	4.82	Rippenanteile_RC	627.54
5	∈ CCDict	4.08	Lymphknoten_RC	356.25
6	Double "\n"	3.77	Double "\n"	336.64
7	All upper case	0.97	Lungengerüstzeichnung_RC	332.50
8	Contains digit	0.71	Integument_RC	321.86
9	> b3	0.31	No "\n"	300.18
10	Contains period	0.19	Normale_RC	277.68
Top 10	[1-6]	w²	[1-7]	w²
1	Capitalization	971.25	Abbreviation	1326.41
2	Mean-LT	840.45	Capitalization	867.06
3	LT	840.45	o.B.	382.83
4	Double "\n"	341.46	Double "\n"	374.57
5	No "\n"	324.25	No "\n"	364.32
6	o.B.	259.13	bds.	282.13
7	Rippenanteile_RC	254.91	CT_RC	266.54
8	mitresez.	254.91	Leberlappen_RC	225.08
9	CT_RC	251.41	A.	206.77
10	Leberlappen_RC	236.26	Narbige_RC	191.01

Top 10 feature rankings for each combination of feature sets, [1-4] to [1-7] (1: language features; 2: rule-based features; 3: text format features; 4: word length features; 5: right context word type features (RC); 6: word type features; 7: abbreviation feature). LT: length; w²: weight-based feature relevance criterion.
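As an illustration of how a ranking like the one in Table 11 could be produced, the sketch below orders features by the square of their weight. It assumes that w² denotes the squared weight a linear model assigns to each feature, which this excerpt does not state explicitly. The function name `rank_by_squared_weight` and the example weights are hypothetical and are not taken from the paper.

```python
def rank_by_squared_weight(weights, top_k=10):
    """Return the top_k (feature, w**2) pairs, largest squared weight first.

    `weights` maps a feature name to its (signed) weight in a linear model.
    Squaring removes the sign, so strongly negative and strongly positive
    weights both count as highly relevant.
    """
    scored = [(name, w * w) for name, w in weights.items()]
    # Sort by score, descending; Python's sort is stable, so ties keep
    # the order in which the features were given.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical weights for illustration only (not values from Table 11).
example_weights = {
    "Capitalization": -3.0,
    "LT": 2.5,
    "Double newline": 1.5,
    "Contains digit": -0.5,
}

ranking = rank_by_squared_weight(example_weights, top_k=3)
for rank, (feature, score) in enumerate(ranking, start=1):
    print(f"{rank}\t{feature}\t{score:.2f}")
```

Under this reading, a feature's sign is irrelevant to its rank, and scores are only comparable within one column of the table, since each feature set combination corresponds to a separately trained model.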

ISSN: 1472-6947