Deep learning approach to detection of colonoscopic information from unstructured reports

Table 5 Training and test datasets

Dataset ^a	D1	D2	D3	D4	D5
Number of documents	1,000	2,000	3,000	4,000	5,000
Number of sentences	16,417	32,821	49,048	65,279	81,668
Number of words	92,315	184,928	277,266	369,063	461,713
Number of unique words (types)	2,001	2,771	3,410	3,922	4,478

^a Dataset size was increased in steps of 1,000 documents to compare performance as a function of the amount of data. Fivefold cross-validation was used for evaluation.
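The evaluation setup described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that D1 to D5 are nested subsets (each adds 1,000 documents to the previous one) and that folds are drawn by shuffling at the document level; the source states neither. The names `nested_subsets`, `k_fold_splits`, and `doc_ids` are hypothetical.

```python
import random

def nested_subsets(doc_ids, step=1000):
    """Return cumulative subsets growing by `step` documents.

    Assumption: each dataset contains the previous one (not stated in the source).
    """
    return [doc_ids[:n] for n in range(step, len(doc_ids) + 1, step)]

def k_fold_splits(items, k=5, seed=0):
    """Yield k (train, test) pairs; every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Striding by k gives folds whose sizes differ by at most one item.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical document identifiers standing in for the 5,000 reports.
doc_ids = list(range(5000))
datasets = nested_subsets(doc_ids)

for name, subset in zip(["D1", "D2", "D3", "D4", "D5"], datasets):
    train, test = next(k_fold_splits(subset))
    print(name, len(subset), len(train), len(test))
```

Under these assumptions, each fold trains on 80% of the documents in a dataset and tests on the remaining 20%, so D1 uses 800 training and 200 test documents per fold and D5 uses 4,000 and 1,000.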

ISSN: 1472-6947