Table 11 Sentence detection.

From: Detection of sentence boundaries and abbreviations in clinical narratives

| Top 10 | [1-4] | w2 | [1-5] | w2 |
|---|---|---|---|---|
| 1 | Capitalization | 11.34 | LT | 674.21 |
| 2 | LT | 10.52 | Mean-LT | 674.21 |
| 3 | Mean-LT | 10.52 | Capitalization | 637.85 |
| 4 | No "\n" | 4.82 | Rippenanteile RC | 627.54 |
| 5 | ∈ CCDict | 4.08 | Lymphknoten RC | 356.25 |
| 6 | Double "\n" | 3.77 | Double "\n" | 336.64 |
| 7 | All upper case | 0.97 | Lungengerüstzeichnung RC | 332.50 |
| 8 | Contains digit | 0.71 | Integument RC | 321.86 |
| 9 | > b3 | 0.31 | No "\n" | 300.18 |
| 10 | Contains period | 0.19 | Normale RC | 277.68 |

| Top 10 | [1-6] | w2 | [1-7] | w2 |
|---|---|---|---|---|
| 1 | Capitalization | 971.25 | Abbreviation | 1326.41 |
| 2 | Mean-LT | 840.45 | Capitalization | 867.06 |
| 3 | LT | 840.45 | o.B. | 382.83 |
| 4 | Double "\n" | 341.46 | Double "\n" | 374.57 |
| 5 | No "\n" | 324.25 | No "\n" | 364.32 |
| 6 | o.B. | 259.13 | bds. | 282.13 |
| 7 | Rippenanteile RC | 254.91 | CT RC | 266.54 |
| 8 | mitresez. | 254.91 | Leberlappen RC | 225.08 |
| 9 | CT RC | 251.41 | A. | 206.77 |
| 10 | Leberlappen RC | 236.26 | Narbige RC | 191.01 |

  1. Top 10 feature rankings per feature set (1 Language features; 2 Rule-based features; 3 Text format features; 4 Word length features; 5 Right context word type features (RC); 6 Word type features; 7 Abbreviation feature). LT: length; w2: weight-based feature relevance criterion.
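
The footnote names the ranking criterion only as "w2". Assuming this denotes the squared weight that a linear classifier assigns to each feature (an assumption, not stated here), a top-10 list like those in the table could be produced roughly as in the following minimal sketch. The function name and the example weights are hypothetical and are not values from the table.

```python
def rank_features_by_squared_weight(weights, top_k=10):
    """Return the top_k (feature, w**2) pairs, largest squared weight first.

    `weights` maps a feature name to its signed weight in a linear model.
    Squaring discards the sign, so strongly negative features rank high too.
    """
    scored = [(name, w * w) for name, w in weights.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical weights for illustration only (assumed, not from the source).
example_weights = {
    "Capitalization": -3.0,
    "LT": 2.0,
    "Double newline": 1.5,
    "Contains digit": -0.5,
}

top = rank_features_by_squared_weight(example_weights, top_k=3)
for rank, (name, score) in enumerate(top, start=1):
    print(rank, name, score)
```

Under this reading, features with identical scores, such as LT and Mean-LT in the table, would simply have weights of equal magnitude.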