Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods

Table 5 Performance comparison for relation extraction models

Settings	NLP Models	Strict Precision	Strict Recall	Strict F1 score	Lenient Precision	Lenient Recall	Lenient F1 score
Use gold-standard concepts	BERT_general	0.9199	0.9437	0.9316	0.9199	0.9437	0.9316
	RoBERTa_general	0.9024	0.9574	0.9291	0.9024	0.9574	0.9291
	BERT_MIMIC	0.9254	0.9254	0.9254	0.9254	0.9254	0.9254
	RoBERTa_MIMIC	0.9147	0.9467	0.9304	0.9147	0.9467	0.9304
End-to-end	BERT_general_e2e	0.8397	0.8767	0.8578	0.8712	0.9056	0.8881
	RoBERTa_general_e2e	0.8274	0.8904	0.8578	0.8565	0.9178	0.8861
	BERT_MIMIC_e2e	0.8282	0.8584	0.8430	0.8584	0.8858	0.8719
	RoBERTa_MIMIC_e2e	0.8362	0.8782	0.8567	0.8688	0.9072	0.8876

* Best precision, recall, and F1 score are highlighted in bold. The strict and lenient scores are identical for the ‘gold-standard’ settings because the gold-standard annotations for concepts and attributes were used.
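The F1 scores in Table 5 appear consistent with the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following is a minimal sketch for checking this, assuming that definition; the paper's own evaluation script is not shown here, and the names `f1_score`, `STRICT_ROWS`, and `max_f1_gap` are hypothetical helpers introduced only for illustration. The rows are copied from the strict columns of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, precision, recall, reported F1), copied from the strict columns of Table 5.
STRICT_ROWS = [
    ("BERT_general", 0.9199, 0.9437, 0.9316),
    ("RoBERTa_general", 0.9024, 0.9574, 0.9291),
    ("BERT_MIMIC", 0.9254, 0.9254, 0.9254),
    ("RoBERTa_MIMIC", 0.9147, 0.9467, 0.9304),
    ("BERT_general_e2e", 0.8397, 0.8767, 0.8578),
    ("RoBERTa_general_e2e", 0.8274, 0.8904, 0.8578),
    ("BERT_MIMIC_e2e", 0.8282, 0.8584, 0.8430),
    ("RoBERTa_MIMIC_e2e", 0.8362, 0.8782, 0.8567),
]


def max_f1_gap(rows) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - reported) for _, p, r, reported in rows)


if __name__ == "__main__":
    for model, p, r, reported in STRICT_ROWS:
        print(f"{model}: recomputed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Because the table reports precision and recall rounded to four decimals, the recomputed values can differ from the reported F1 in the last digit; a tolerance on the order of 5e-4 is a reasonable assumption when comparing them.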

ISSN: 1472-6947