
Table 5 Training and test datasets

From: Deep learning approach to detection of colonoscopic information from unstructured reports

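| Dataset (a)          | D1     | D2      | D3      | D4      | D5      |
|----------------------|--------|---------|---------|---------|---------|
| Number of documents  | 1,000  | 2,000   | 3,000   | 4,000   | 5,000   |
| Number of sentences  | 16,417 | 32,821  | 49,048  | 65,279  | 81,668  |
| Number of words      | 92,315 | 184,928 | 277,266 | 369,063 | 461,713 |
| Number of word types | 2,001  | 2,771   | 3,410   | 3,922   | 4,478   |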

(a) The dataset size was increased in steps of 1,000 documents (D1 to D5) to compare performance as a function of the amount of data. Fivefold cross-validation was used for evaluation.
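A minimal sketch of how fivefold cross-validation partitions each dataset, assuming the documents are split into five contiguous, near-equal folds without shuffling (the source does not say how folds were formed). The names `DATASET_SIZES` and `kfold_indices` are hypothetical, introduced only for illustration. With D1 (1,000 documents), each fold trains on 800 documents and tests on 200; with D5 (5,000 documents), it trains on 4,000 and tests on 1,000.

```python
from typing import Iterator, List, Tuple

# Document counts for D1-D5 as listed in Table 5 (steps of 1,000 documents).
DATASET_SIZES = {f"D{i}": 1000 * i for i in range(1, 6)}


def kfold_indices(n_docs: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    The documents are divided into k contiguous folds of near-equal size.
    Each fold serves once as the test set while the other k-1 folds form
    the training set. No shuffling is done (an assumption, not from the source).
    """
    indices = list(range(n_docs))
    base, extra = divmod(n_docs, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional document when n_docs % k != 0.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for name, n_docs in DATASET_SIZES.items():
        train, test = next(kfold_indices(n_docs))
        print(f"{name}: {n_docs} documents -> train {len(train)}, test {len(test)} per fold")
```

In practice, a library routine such as scikit-learn's `KFold(n_splits=5)` would typically be used instead, often with `shuffle=True` and a fixed `random_state`. The hand-written version only makes the fold arithmetic explicit.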