From: A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text
Dataset     Notes  Sentences  Words
Training    1440   6867       158,035
Validation  180    813        19,290
Test        180    857        21,472
Total       1800   8537       198,797