
Table 4 Summary of the feature settings. (w denotes the window size; if no value is given, only the feature of the current token is used. n denotes the n of the n-gram. len denotes the affix length. The Matching features denote the results of controlled-vocabulary matching.)

From: Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition

| Set | Token | Norm-token | n-gram | Character affix | Capitalization | POS/Chunk | Matching |
|---|---|---|---|---|---|---|---|
| #1-context | w = 3 | w = 3 | | | | | |
| #2-morph | w = 3 | w = 3 | | len = 2~3 | w = 3 | | |
| #3-i2b2 | w = 5 | w = 5 | n = 2, w = 5 | len = 2~7 | w = 3 | w = 1 | |
| #3-snuh | w = 5 | w = 3 | n = 2, w = 5 | len = 2~3 | | | modifier/control |
| #3-conll | w = 5 | | | len = 3~4 | w = 5 | w = 5, n = 1 | |
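The window, n-gram and affix settings above can be illustrated with a minimal Python sketch of per-token feature extraction. It is not the paper's implementation, and several choices are assumptions: a window of size w is centred on the current token (w = 3 covers offsets -1 to +1, w = 1 only the current token), "Norm-token" is taken to mean lowercasing with digits mapped to 0, affixes are read from the current token only, and an n-gram window indexes the start positions of the n-grams. The `PAD` symbol and all function and parameter names are hypothetical.

```python
import re
from typing import Dict, List, Optional

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def offsets(w: int) -> range:
    """Offsets of a window of w tokens centred on the current token.
    Assumption: w = 1 -> [0], w = 3 -> [-1, 0, 1], w = 5 -> [-2..2]."""
    half = w // 2
    return range(-half, half + 1)


def normalize(token: str) -> str:
    """Hypothetical normalisation: lowercase and map every digit to '0'."""
    return re.sub(r"\d", "0", token.lower())


def cap_shape(token: str) -> str:
    """Coarse capitalization class of a token."""
    if token == PAD:
        return "pad"
    if token.isupper():
        return "upper"
    if token[:1].isupper():
        return "init"
    if token.islower():
        return "lower"
    return "other"


def extract_features(tokens: List[str], i: int, token_w: int = 0,
                     norm_w: int = 0, ngram_n: int = 0, ngram_w: int = 1,
                     affix_len: Optional[range] = None,
                     cap_w: int = 0) -> Dict[str, str]:
    """Feature dict for tokens[i]; a window of 0 (or affix_len=None)
    disables that feature group, mirroring an empty table cell."""

    def tok(j: int) -> str:
        return tokens[j] if 0 <= j < len(tokens) else PAD

    feats: Dict[str, str] = {}
    if token_w:
        for k in offsets(token_w):
            feats[f"tok[{k}]"] = tok(i + k)
    if norm_w:
        for k in offsets(norm_w):
            t = tok(i + k)
            feats[f"norm[{k}]"] = t if t == PAD else normalize(t)
    if ngram_n:
        # token n-grams starting at each offset of the n-gram window (assumption)
        for k in offsets(ngram_w):
            gram = "_".join(tok(i + k + m) for m in range(ngram_n))
            feats[f"{ngram_n}gram[{k}]"] = gram
    if affix_len is not None:
        # prefixes/suffixes of the current token only, for each length in the range
        cur = tokens[i]
        for n in affix_len:
            if len(cur) >= n:
                feats[f"pre{n}"] = cur[:n]
                feats[f"suf{n}"] = cur[-n:]
    if cap_w:
        for k in offsets(cap_w):
            feats[f"cap[{k}]"] = cap_shape(tok(i + k))
    return feats
```

Under these assumptions, the #2-morph setting corresponds to `extract_features(tokens, i, token_w=3, norm_w=3, affix_len=range(2, 4), cap_w=3)`, where `range(2, 4)` encodes "len = 2~3". Returning a flat dictionary of string-valued features per token keeps each feature group independent, so the rows of the table can be compared simply by switching groups on or off.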