Table 1 The statistics of the NCBI dataset for disease NER

From: SBLC: a hybrid model for disease named entity recognition based on semantic bidirectional LSTMs and conditional random fields

| Characteristics | Training | Developing | Testing | Total |
|---|---|---|---|---|
| # of PubMed article abstracts | 593 | 100 | 100 | 793 |
| # of annotated disease mentions | 5145 | 787 | 960 | 6892 |
| # of unique annotated disease mentions | 1710 | 368 | 427 | 2136 |
| Avg. sentences/abstract | 10 | 10 | 10 | 10 |
| Avg. words/sentence | 20 | 22 | 22 | 21 |
| Avg. words/abstract | 217 | 226 | 232 | 225 |