Table 5 Performance comparison for relation extraction models

From: Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods

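| Settings | NLP Models | Strict Precision | Strict Recall | Strict F1 score | Lenient Precision | Lenient Recall | Lenient F1 score |
|---|---|---|---|---|---|---|---|
| Use gold-standard concepts | BERT_general | 0.9199 | 0.9437 | 0.9316 | 0.9199 | 0.9437 | 0.9316 |
| Use gold-standard concepts | RoBERTa_general | 0.9024 | 0.9574 | 0.9291 | 0.9024 | 0.9574 | 0.9291 |
| Use gold-standard concepts | BERT_MIMIC | 0.9254 | 0.9254 | 0.9254 | 0.9254 | 0.9254 | 0.9254 |
| Use gold-standard concepts | RoBERTa_MIMIC | 0.9147 | 0.9467 | 0.9304 | 0.9147 | 0.9467 | 0.9304 |
| End-to-end | BERT_general_e2e | 0.8397 | 0.8767 | 0.8578 | 0.8712 | 0.9056 | 0.8881 |
| End-to-end | RoBERTa_general_e2e | 0.8274 | 0.8904 | 0.8578 | 0.8565 | 0.9178 | 0.8861 |
| End-to-end | BERT_MIMIC_e2e | 0.8282 | 0.8584 | 0.843 | 0.8584 | 0.8858 | 0.8719 |
| End-to-end | RoBERTa_MIMIC_e2e | 0.8362 | 0.8782 | 0.8567 | 0.8688 | 0.9072 | 0.8876 |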

  1. *Best precision, recall, and F1 score are highlighted in bold. The strict and lenient scores are identical for the ‘gold-standard’ setting because the gold-standard annotations for concepts and attributes were used.
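
The F1 scores in the table are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The following is a minimal Python sketch under that assumption (the table does not state the formula itself). The names `f1_score` and `rows` are illustrative, and the four rows are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "BERT_general (gold-standard concepts)": (0.9199, 0.9437, 0.9316),
    "RoBERTa_MIMIC (gold-standard concepts)": (0.9147, 0.9467, 0.9304),
    "BERT_general_e2e (strict)": (0.8397, 0.8767, 0.8578),
    "RoBERTa_MIMIC_e2e (lenient)": (0.8688, 0.9072, 0.8876),
}

for name, (p, r, reported) in rows.items():
    # Recompute F1 from the published precision and recall for comparison.
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Because the published precision and recall are already rounded to four decimals, a recomputed F1 can differ from the reported value in the last digit.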