Table 4 Hyperparameters

From: Chinese medical entity recognition based on the dual-branch TENER model

Parameter name                     Value
Gradient clipping threshold        2
Number of TENER encoder layers     24
Dropout                            0.3
Hidden size                        768
Weight decay                       0.01
Number of hidden layers            12
Gradient accumulation steps        2
Number of attention heads          12
Hidden activation function         GELU
Epochs                             200
Maximum sequence length            128
Batch size                         16
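The table's values can be collected into a typed config. A few derived quantities are easy to misread from the raw numbers, so the config also computes them. The following is a minimal sketch, not the authors' code. The class name `TenerHyperparams` and every field name are hypothetical. Three assumptions apply: "Batch size" is the per-step micro-batch, gradient accumulation multiplies it into the effective batch per optimizer update, and hidden size and attention heads follow the standard even multi-head split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenerHyperparams:
    """Values transcribed from Table 4; field names are hypothetical."""

    grad_clip: float = 2.0          # gradient clipping threshold
    tener_encoder_layers: int = 24  # number of TENER encoder layers
    dropout: float = 0.3
    hidden_size: int = 768
    weight_decay: float = 0.01
    num_hidden_layers: int = 12
    grad_accum_steps: int = 2
    num_attention_heads: int = 12
    hidden_act: str = "gelu"
    epochs: int = 200
    max_seq_len: int = 128
    batch_size: int = 16            # assumed to be the per-step micro-batch

    def __post_init__(self) -> None:
        # Assumption: standard multi-head attention splits the hidden
        # vector evenly across heads, so the sizes must divide exactly.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")

    @property
    def head_dim(self) -> int:
        # Width of each attention head: 768 // 12 = 64.
        return self.hidden_size // self.num_attention_heads

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to one optimizer update: 16 * 2 = 32.
        return self.batch_size * self.grad_accum_steps

    def optimizer_steps_per_epoch(self, num_train_examples: int) -> int:
        # Ceiling division, assuming a final partial accumulation window
        # still triggers one optimizer update.
        return -(-num_train_examples // self.effective_batch_size)


hp = TenerHyperparams()
print(hp.head_dim, hp.effective_batch_size)  # 64 32
```

With these values, each optimizer update sees 32 examples rather than 16. This matters when comparing learning-rate or weight-decay settings against setups that do not use gradient accumulation. The training-set size passed to `optimizer_steps_per_epoch` is left as a parameter because the table does not state it.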