Table 5 Token length distribution in all training datasets.

From: Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT)

| Finding | Label | Average token length (± standard deviation) | Number of samples |
|---|---|---|---|
| Normal finding | 0 | 182.92 ± 12.62 | 1100 |
| Unrelated finding | 1 | 237.73 ± 28.45 | 2851 |
| Related finding | 2 | 262.52 ± 47.13 | 1913 |
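The sketch below illustrates how per-label summaries like those in Table 5 might be produced, and how the three rows combine into dataset-level figures. It assumes each report has already been tokenized (for example with a BERT tokenizer) into a per-report token count. The source does not say whether a sample or population standard deviation was used, so the choice of sample SD here is an assumption. The function names `summarize_token_lengths` and `weighted_mean_length` and the toy input are hypothetical. The overall figures (5864 samples, mean of about 235.54 tokens) are simple arithmetic on the table's values and are not reported by the source.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_token_lengths(token_counts, labels):
    """Group per-report token counts by label.

    Returns {label: (mean, sd, n)}. Hypothetical helper; token counts are
    assumed to come from a tokenizer applied to each report beforehand.
    """
    if len(token_counts) != len(labels):
        raise ValueError("token_counts and labels must have the same length")
    groups = defaultdict(list)
    for count, label in zip(token_counts, labels):
        groups[label].append(count)
    summary = {}
    for label in sorted(groups):
        counts = groups[label]
        # Sample standard deviation (n - 1 denominator); an assumption, since
        # the source does not state which variant it reports.
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[label] = (float(mean(counts)), sd, len(counts))
    return summary


def weighted_mean_length(rows):
    """Sample-weighted mean token length over (mean, n) pairs."""
    rows = list(rows)
    total = sum(n for _, n in rows)
    return sum(m * n for m, n in rows) / total


# Values copied from Table 5: label -> (average token length, number of samples).
TABLE_5 = {0: (182.92, 1100), 1: (237.73, 2851), 2: (262.52, 1913)}

if __name__ == "__main__":
    # Toy input: six tokenized reports, two per label (illustrative only).
    toy = summarize_token_lengths([180, 186, 230, 245, 260, 265], [0, 0, 1, 1, 2, 2])
    for label, (m, sd, n) in toy.items():
        print(f"label {label}: {m:.2f} ± {sd:.2f} (n={n})")
    total = sum(n for _, n in TABLE_5.values())
    print(f"Table 5 total samples: {total}")
    print(f"Table 5 weighted mean length: {weighted_mean_length(TABLE_5.values()):.2f}")
```

Weighting each class mean by its sample count matters here because the classes are imbalanced (label 1 has more than twice as many reports as label 0), so an unweighted average of the three means would understate the typical report length.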