Table 3 Abbreviation detection.

From: Detection of sentence boundaries and abbreviations in clinical narratives

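| Top 10 | 1 (rule-based) | w2 | 2 (statistical) | w2 | 3 (scaling) | w2 |
|---|---|---|---|---|---|---|
| 1 | Contains period | 0.30 | C(L_norm, •) | 1.34 | S2 | 3897.48 |
| 2 | All upper case | 0.02 | log λ | 0.80 | S3 | 3222.35 |
| 3 | Contains digit | 0.01 | C(L_norm, ¬•) | 0.43 | S4 | 2592.76 |
| 4 | - | - | C(¬L_norm, ¬•) | 0.31 | S2, S3 | 2329.77 |
| 5 | - | - | C(¬L_norm, •) | 0.19 | S4, S5 | 847.88 |
| 6 | - | - | - | - | S5 | 706.98 |
| 7 | - | - | - | - | S2, S4, S5 | 511.38 |
| 8 | - | - | - | - | S2, S5 | 412.86 |
| 9 | - | - | - | - | S3, S4 | 204.80 |
| 10 | - | - | - | - | S2, S3, S4 | 139.36 |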

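| Top 10 | 4 (language-dependent) | w2 | 5 (length) | w2 | 6 (word type) | w2 |
|---|---|---|---|---|---|---|
| 1 | ∈ MDDict | 0.34 | LT border b2 | 16.15 | St.p. | 409.58 |
| 2 | - | - | LT border b1 | 16.15 | Amb. | 409.51 |
| 3 | - | - | LT border b3 | 16.15 | o.B. | 409.09 |
| 4 | - | - | LT | 8.74 | re. | 407.87 |
| 5 | - | - | Mean-LT | 8.74 | Z.n. | 407.35 |
| 6 | - | - | > b1 | 0.54 | li. | 407.28 |
| 7 | - | - | > b3 | 0.16 | ca. | 407.00 |
| 8 | - | - | > b2 | 0.10 | unauff. | 406.94 |
| 9 | - | - | - | - | bds. | 406.19 |
| 10 | - | - | - | - | Pat. | 405.75 |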

Top 10 feature rankings per feature set (1: rule-based features; 2: statistical features; 3: scaling features; 4: language-dependent features; 5: length features; 6: word type features). LT: length; w2: weight-based feature relevance criterion.
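
As an illustration of feature set 1, the three rule-based features named in the table (contains period, all upper case, contains digit) could be computed per token roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `rule_based_features`, the dictionary keys, and the exact definition of "all upper case" (here Python's `str.isupper`, which ignores periods and digits) are assumptions made for the example.

```python
def rule_based_features(token: str) -> dict[str, bool]:
    """Hypothetical helper: boolean rule-based features for one token."""
    return {
        # True if the token has a period anywhere, e.g. "St.p." or "Z.n."
        "contains_period": "." in token,
        # Assumption: "all upper case" means every cased character is
        # upper case and there is at least one cased character.
        "all_upper_case": token.isupper(),
        # True if any character is a digit, e.g. "L4"
        "contains_digit": any(ch.isdigit() for ch in token),
    }


if __name__ == "__main__":
    # Example tokens; "St.p." and "Pat." appear in the word type column above.
    for tok in ["St.p.", "Pat.", "EKG", "L4"]:
        print(tok, rule_based_features(tok))
```

Each feature is a simple surface test on the token string, so it needs no training data, unlike the statistical or word type features.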