Detection of sentence boundaries and abbreviations in clinical narratives

Table 3 Abbreviation detection.

Rank	1 Rule-based	w²	2 Statistical	w²	3 Scaling	w²
1	Contains period	0.30	C(L_norm, •)	1.34	S2	3897.48
2	All upper case	0.02	log λ	0.80	S3	3222.35
3	Contains digit	0.01	C(L_norm, ¬•)	0.43	S4	2592.76
4	-	-	C(¬L_norm, ¬•)	0.31	S2, S3	2329.77
5	-	-	C(¬L_norm, •)	0.19	S4, S5	847.88
6	-	-	-	-	S5	706.98
7	-	-	-	-	S2, S4, S5	511.38
8	-	-	-	-	S2, S5	412.86
9	-	-	-	-	S3, S4	204.80
10	-	-	-	-	S2, S3, S4	139.36
Rank	4 Language-dependent	w²	5 Length	w²	6 Word type	w²
1	∈ MDDict	0.34	LT border b₂	16.15	St.p.	409.58
2	-	-	LT border b₁	16.15	Amb.	409.51
3	-	-	LT border b₃	16.15	o.B.	409.09
4	-	-	LT	8.74	re.	407.87
5	-	-	Mean-LT	8.74	Z.n.	407.35
6	-	-	> b₁	0.54	li.	407.28
7	-	-	> b₃	0.16	ca.	407.00
8	-	-	> b₂	0.10	unauff.	406.94
9	-	-	-	-	bds.	406.19
10	-	-	-	-	Pat.	405.75

Top 10 feature rankings per feature set (1: rule-based features; 2: statistical features; 3: scaling features; 4: language-dependent features; 5: length features; 6: word type features). LT: length; w²: weight-based feature relevance criterion.
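The caption defines w² only as a weight-based feature relevance criterion. As a minimal sketch, assuming w² is the squared coefficient that a trained linear classifier assigns to each feature (as in SVM-based feature ranking), the per-set ranking and the three rule-based features of column 1 could be computed as follows. The function names, the 1.0/0.0 feature encoding, and the sample weights are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def rule_based_features(token: str) -> List[float]:
    """Hypothetical encoding of the three rule-based features (1.0 = true, 0.0 = false)."""
    return [
        1.0 if "." in token else 0.0,                       # contains period
        1.0 if token.isupper() else 0.0,                    # all upper case
        1.0 if any(ch.isdigit() for ch in token) else 0.0,  # contains digit
    ]


def rank_by_w2(names: Sequence[str], weights: Sequence[float],
               top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank features by the squared weight w_i^2 of an (assumed) linear model."""
    if len(names) != len(weights):
        raise ValueError("names and weights must have the same length")
    # Squaring discards the sign: a strongly negative weight is as relevant
    # as a strongly positive one.
    scored = [(name, w * w) for name, w in zip(names, weights)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(rule_based_features("St.p."))
    # Illustrative weights only; these are not values from the study.
    names = ["Contains period", "All upper case", "Contains digit"]
    print(rank_by_w2(names, [0.8, -0.3, 0.1]))
```

Because only the magnitude of w enters the criterion, the ranking says how much a feature influences the decision, not whether it votes for or against the abbreviation class.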

ISSN: 1472-6947