
Table 4 The data summary of the training and testing datasets

From: A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts

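| Language | Dataset | #texts | #sentences | #clauses | #temp. exp. | #ave. temp. exp./text |
|---|---|---|---|---|---|---|
| Chinese | Training | 276 | 7747 | 23,423 | 3525 | 12.77 |
| Chinese | Testing | 134 | 4196 | 11,827 | 1778 | 13.27 |
| Chinese | Total | 400 | 11,943 | 35,250 | 5303 | 13.26 |
| English | Training | 100 | 257 | 694 | 155 | 1.55 |
| English | Testing | 300 | 787 | 1703 | 398 | 1.33 |
| English | Total | 400 | 1044 | 2397 | 553 | 1.38 |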
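
The average-per-text column of Table 4 appears to be the number of temporal expressions divided by the number of texts. The table does not state this definition, so the Python sketch below treats it as an assumption. The names `ROWS` and `avg_per_text` are illustrative and do not come from the paper.

```python
# Rows copied from Table 4:
# (language, dataset, #texts, #temp. exp., reported average per text)
ROWS = [
    ("Chinese", "Training", 276, 3525, 12.77),
    ("Chinese", "Testing", 134, 1778, 13.27),
    ("Chinese", "Total", 400, 5303, 13.26),
    ("English", "Training", 100, 155, 1.55),
    ("English", "Testing", 300, 398, 1.33),
    ("English", "Total", 400, 553, 1.38),
]


def avg_per_text(num_texts: int, num_temp_exp: int) -> float:
    """Assumed definition: temporal expressions per text."""
    return num_temp_exp / num_texts


def matches_reported(num_texts: int, num_temp_exp: int, reported: float) -> bool:
    """True if the recomputed average agrees with the reported value
    to within two-decimal rounding (small slack for floating point)."""
    return abs(avg_per_text(num_texts, num_temp_exp) - reported) < 0.006


if __name__ == "__main__":
    for language, dataset, texts, temp_exp, reported in ROWS:
        computed = avg_per_text(texts, temp_exp)
        print(f"{language:8s} {dataset:9s} computed={computed:.4f} reported={reported}")
```

Under this assumed definition, every row's recomputed value agrees with the reported average to two decimal places.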