Table 2 Model comparison on the development set with different pre-trained models

From: Drug knowledge discovery via multi-task learning and pre-trained models

| Task | Model | P | R | F1 |
| --- | --- | --- | --- | --- |
| Trigger words recognition | BiLSTM + CRF | 0.478 | 0.408 | 0.440 |
| | BERT-base | 0.497 | 0.448 | 0.471 |
| | NCBI BERT | 0.553 | 0.453 | 0.498 |
| | ClinicalBERT | 0.523 | 0.486 | 0.504 |
| | BioBERT | 0.511 | 0.529 | 0.519 |
| Thematic roles identification | BERT-base | 0.758 | 0.890 | 0.818 |
| | NCBI BERT | 0.778 | 0.879 | 0.826 |
| | ClinicalBERT | 0.796 | 0.913 | 0.850 |
| | BioBERT | 0.807 | 0.891 | 0.847 |
| | ClinicalBERT-TS | 0.810 | 0.917 | 0.860 |
| | BioBERT-TS | 0.813 | 0.894 | 0.852 |

  1. All models except BiLSTM + CRF are jointly trained using the NCBI dataset, the BC5CDR dataset, and our training set. BioBERT outperforms the other models in Task 1 (trigger words recognition), while ClinicalBERT achieves the best F1 in Task 2 (thematic roles identification). The two-step training process (TS) further improves performance.
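The F1 column is the harmonic mean of the precision (P) and recall (R) columns. A quick check against a table row, here the ClinicalBERT row of the thematic roles identification task:

```python
# F1 is the harmonic mean of precision (P) and recall (R).
def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# ClinicalBERT, thematic roles identification: P = 0.796, R = 0.913
print(f"{f1(0.796, 0.913):.3f}")  # 0.850, matching the table
```

Small discrepancies in other rows (e.g., BioBERT trigger recognition: 0.519 vs. a recomputed 0.520) come from P and R themselves being rounded to three decimals.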
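For concreteness, the sketch below shows one common way to set up the joint training described in the note: a shared pre-trained encoder with one token-classification head per task. This is only a minimal illustration; the paper's exact architecture, label sets, and head sizes are not given here, so the label counts and the Hugging Face model id are assumptions.

```python
import torch
from torch import nn
from transformers import AutoModel

class MultiTaskTagger(nn.Module):
    """Shared pre-trained encoder with a separate head per task.

    A minimal sketch of joint (multi-task) fine-tuning; label counts and
    the encoder checkpoint name are illustrative assumptions.
    """

    def __init__(self,
                 encoder_name: str = "dmis-lab/biobert-base-cased-v1.1",
                 num_trigger_labels: int = 3,   # assumed BIO tag set
                 num_role_labels: int = 5):     # assumed role label set
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Task 1: trigger words recognition (token-level tagging)
        self.trigger_head = nn.Linear(hidden, num_trigger_labels)
        # Task 2: thematic roles identification (token-level tagging)
        self.role_head = nn.Linear(hidden, num_role_labels)

    def forward(self, input_ids: torch.Tensor,
                attention_mask: torch.Tensor, task: str) -> torch.Tensor:
        # Both tasks share the encoder; only the output head differs.
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        head = self.trigger_head if task == "trigger" else self.role_head
        return head(states)  # (batch, seq_len, num_labels) logits
```

During joint training, batches drawn from the different datasets (NCBI, BC5CDR, and the paper's training set) would be routed through the task-appropriate head while gradients update the shared encoder.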