From: Chinese medical entity recognition based on the dual-branch TENER model
Parameter Name | Value |
---|---|
Gradient clipping | 2 |
Number of TENER encoder layers | 24 |
Dropout | 0.3 |
Hidden size | 768 |
Weight decay | 0.01 |
Number of hidden layers | 12 |
Gradient accumulation steps | 2 |
Number of attention heads | 12 |
Hidden activation function | GELU |
Epochs | 200 |
Maximum sequence length | 128 |
Batch size | 16 |
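As a sketch, these settings can be gathered into one configuration object. The class `TenerNERConfig`, its field names, and the helper `clip_by_global_norm` are hypothetical and not taken from the paper. The table also does not say whether gradient clipping is applied by norm or by value, so clipping by global L2 norm (the behaviour of PyTorch's `clip_grad_norm_`) is an assumption here. The derived properties make two consequences of the table explicit: each attention head works on 768 / 12 = 64 dimensions, and accumulating gradients over 2 steps with a batch size of 16 gives an effective batch of 32 sequences per optimizer update.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TenerNERConfig:
    """Hyperparameters from the table (hypothetical field names)."""
    grad_clip: float = 2.0
    tener_encoder_layers: int = 24
    dropout: float = 0.3
    hidden_size: int = 768
    weight_decay: float = 0.01
    num_hidden_layers: int = 12
    grad_accumulation_steps: int = 2
    num_attention_heads: int = 12
    hidden_act: str = "gelu"
    epochs: int = 200
    max_seq_len: int = 128
    batch_size: int = 16

    def __post_init__(self) -> None:
        # Multi-head attention splits the hidden vector evenly across heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")

    @property
    def head_dim(self) -> int:
        # 768 // 12 = 64 dimensions per attention head
        return self.hidden_size // self.num_attention_heads

    @property
    def effective_batch_size(self) -> int:
        # Gradients from 2 mini-batches of 16 are accumulated before each optimizer step.
        return self.batch_size * self.grad_accumulation_steps


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    """Rescale a flat gradient vector so its L2 norm is at most max_norm.

    Assumption: the table's clipping value of 2 is a global-norm threshold.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within the threshold, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


cfg = TenerNERConfig()
print(cfg.head_dim, cfg.effective_batch_size)  # 64 32
print(clip_by_global_norm([3.0, 4.0], cfg.grad_clip))
```

In the last line the gradient `[3.0, 4.0]` has norm 5, so it is scaled by 2 / 5 to bring its norm down to the clipping threshold of 2 while keeping its direction.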