Table 3 Mention-level evaluation results of state-of-the-art methods for phenotype concept recognition, with the results of the best-performing GPT models included for reference

From: An evaluation of GPT models for phenotype concept recognition

 

| Tool | HPO-GS Precision | HPO-GS Recall | HPO-GS F1 | BioC-GS Precision | BioC-GS Recall | BioC-GS F1 |
|---|---|---|---|---|---|---|
| PhenoTagger [12] | 0.77 | 0.68 | 0.72 | 0.74 | 0.52 | 0.61 |
| ClinPheno [26] | 0.73 | 0.36 | 0.48 | 0.47 | 0.57 | 0.52 |
| Doc2HPO [25] | 0.81 | 0.50 | 0.62 | 0.84 | 0.29 | 0.43 |
| Monarch Annotator [9] | 0.82 | 0.50 | 0.62 | 0.47 | 0.46 | 0.46 |
| NCBO Annotator [27] | 0.66 | 0.49 | 0.56 | 0.78 | 0.41 | 0.54 |
| Best non-in-context-learning GPT (gpt-4, Prompt 4) | 0.32 | 0.23 | 0.27 | 0.43 | 0.46 | 0.44 |
| Best gpt-4 (Prompt 7; in-context learning) | 0.73 | 0.30 | 0.43 | 0.77 | 0.64 | 0.70 |
| Best gpt-3.5 (Prompt 7; in-context learning) | 0.28 | 0.25 | 0.26 | 0.54 | 0.49 | 0.51 |
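
For readers checking or reusing these figures, the F1 values in the table are the standard harmonic mean of precision and recall. The short Python sketch below is illustrative only (it is not the paper's evaluation code); the example values are taken from the PhenoTagger HPO-GS row and the best gpt-4 BioC-GS row above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# PhenoTagger on HPO-GS: precision 0.77, recall 0.68
print(round(f1_score(0.77, 0.68), 2))  # -> 0.72, matching the table

# Best gpt-4 (Prompt 7; in-context learning) on BioC-GS: precision 0.77, recall 0.64
print(round(f1_score(0.77, 0.64), 2))  # -> 0.70
```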