From: ParaMed: a parallel corpus for English–Chinese translation in the biomedical domain
Language  Articles  Sentences  Avg. Len.  Tokens     Unique Tokens
English   1,966     97,441     31.08      3,028,434  55,673
Chinese   1,966     97,441     29.93      2,916,779  46,700
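
As a quick consistency check on these figures, the sketch below recomputes the Avg. Len. column as tokens divided by sentences. It assumes that Avg. Len. means average tokens per sentence and that the 97,441 sentence count applies to both languages, which is what a sentence-aligned parallel corpus implies; the names `SENTENCES`, `TOKENS`, and `average_length` are illustrative only and do not come from the source.

```python
# Figures copied from the table above.
# Assumption: the sentence count is shared by both languages (parallel corpus).
SENTENCES = 97_441
TOKENS = {"English": 3_028_434, "Chinese": 2_916_779}


def average_length(tokens: int, sentences: int) -> float:
    """Return the mean number of tokens per sentence."""
    return tokens / sentences


# Round to two decimals to match the precision reported in the table.
avg_len = {
    lang: round(average_length(count, SENTENCES), 2)
    for lang, count in TOKENS.items()
}

for lang, value in avg_len.items():
    print(f"{lang}: {value}")
```

Under these assumptions the recomputed values agree with the reported 31.08 (English) and 29.93 (Chinese), which supports reading the Articles and Sentences counts as shared across the two rows.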