Table 3 Measured precision, recall and F1-score performance on the three NLP tasks implemented in the pipeline, evaluated on test sets

From: LiSA: an assisted literature search pipeline for detecting serious adverse drug events with deep learning

 

|                  | AE-Drug relationship classification |      |      | Named Entity Recognition |      |      | Seriousness classification |      |      |
| Model            | P        | R        | F1       | P        | R        | F1       | P        | R        | F1       |
|------------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| UMLSBERT         | 0.94     | **0.93** | **0.93** | 0.94     | **0.96** | 0.95     | 0.89     | 0.87     | 0.88     |
| bioBERT          | 0.91     | **0.93** | 0.92     | 0.96     | 0.95     | 0.95     | 0.89     | 0.90     | **0.89** |
| blueBERT         | 0.93     | 0.89     | 0.91     | 0.96     | 0.93     | 0.94     | 0.73     | 0.83     | 0.78     |
| sciBERT          | 0.94     | 0.92     | **0.93** | 0.95     | 0.95     | 0.95     | **0.92** | 0.81     | 0.86     |
| Bio_ClinicalBERT | 0.94     | 0.92     | **0.93** | **0.97** | 0.92     | 0.94     | 0.68     | **0.93** | 0.79     |
| BERT             | 0.90     | 0.89     | 0.90     | 0.95     | 0.92     | 0.93     | 0.76     | 0.74     | 0.75     |
| PubMedBERT       | **0.95** | 0.90     | 0.92     | 0.96     | 0.95     | **0.96** | 0.87     | 0.91     | **0.89** |
  1. The best value per column is in bold. For the drug/AE entity recognition task, the displayed metrics concern only the AE class. The best model was selected for each task: PubMedBERT for NER and seriousness classification, and UMLSBERT for AE-Drug relationship classification.
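
P, R and F1 denote the standard precision, recall and F1-score. As an illustrative sketch only (not code from the paper; the label arrays below are made-up placeholders), metrics of this kind can be computed for a binary task such as seriousness classification with scikit-learn:

```python
# Hypothetical example: precision/recall/F1 for a binary classification task.
# The gold labels and predictions are placeholders, not data from the study.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # gold labels (1 = serious, 0 = non-serious)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```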