Table 4 Summary of the training and testing datasets

From: A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts

Language  Dataset   #Texts  #Sentences  #Clauses  #Temporal expressions  Avg. temporal expressions/text
Chinese   Training     276       7,747    23,423                  3,525                           12.77
Chinese   Testing      134       4,196    11,827                  1,778                           13.27
Chinese   Total        400      11,943    35,250                  5,303                           13.26
English   Training     100         257       694                    155                            1.55
English   Testing      300         787     1,703                    398                            1.33
English   Total        400       1,044     2,397                    553                            1.38
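The last column appears to be the number of temporal expressions divided by the number of texts in each row. The following is a minimal Python sketch that recomputes it from the counts in the table. It assumes half-up rounding to two decimal places; the source does not state its rounding rule, and the helper name `avg_per_text` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# (language, split, #texts, #temporal expressions), copied from Table 4
ROWS = [
    ("Chinese", "Training", 276, 3525),
    ("Chinese", "Testing", 134, 1778),
    ("Chinese", "Total", 400, 5303),
    ("English", "Training", 100, 155),
    ("English", "Testing", 300, 398),
    ("English", "Total", 400, 553),
]


def avg_per_text(n_texts: int, n_exprs: int) -> Decimal:
    """Hypothetical helper: temporal expressions per text.

    Uses exact decimal division, then rounds half-up to 2 places
    (the rounding rule is an assumption, not stated in the source).
    """
    ratio = Decimal(n_exprs) / Decimal(n_texts)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for lang, split, texts, exprs in ROWS:
        print(f"{lang:8} {split:9} {avg_per_text(texts, exprs)}")
```

Run as a script, each printed value matches the table's last column. `Decimal` is used instead of `float` so that a boundary case such as 5303 / 400 = 13.2575 rounds predictably.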