Data split/annotation method | #Sentences | #DOCTOR | #PATIENT | #DATE | #VILLE | #ZIP | #STR | #PHONE | |
---|---|---|---|---|---|---|---|---|---|
training/automatic | 4,948,186 | 3,883,360 | 1,853,646 | 4,948,519 | 2,544,287 | 1,305,402 | 1,165,009 | 276,208 | 2,210,577 |
validation/automatic | 608,305 | 479,821 | 229,925 | 607,383 | 315,771 | 162,791 | 144,654 | 35,288 | 271,081 |
test/automatic | 620,581 | 489,008 | 232,030 | 620,028 | 322,279 | 165,492 | 147,432 | 35,322 | 276,873 |
test/manual | 23,196 | 1,206 | 510 | 2,078 | 764 | 293 | 234 | 96 | 545 |