From: An evaluation of GPT models for phenotype concept recognition
GPT version | Prompt | Document-level Precision | Document-level Recall | Document-level F1 | Mention-level Precision | Mention-level Recall | Mention-level F1 |
---|---|---|---|---|---|---|---|
3.5 | 1 | 0.51 | 0.12 | 0.19 | 0.5 | 0.11 | 0.18 |
3.5 | 2 | 0.68 | 0.05 | 0.09 | 0.68 | 0.05 | 0.09 |
3.5 | 3 | 0.27 | 0.29 | 0.28 | 0.26 | 0.25 | 0.25 |
3.5 | 4 | 0.26 | 0.33 | 0.29 | 0.22 | 0.29 | 0.25 |
3.5 | 5 | 0.31 | 0.2 | 0.24 | 0.3 | 0.17 | 0.22 |
3.5 | 6 | 0.31 | 0.2 | 0.24 | 0.3 | 0.17 | 0.22 |
3.5 | 7 | 0.56 | 0.56 | 0.56 | 0.54 | 0.49 | 0.51 |
4 | 1 | 0.46 | 0.44 | 0.45 | 0.45 | 0.39 | 0.42 |
4 | 2 | 0.44 | 0.44 | 0.44 | 0.43 | 0.38 | 0.4 |
4 | 3 | 0.47 | 0.43 | 0.45 | 0.47 | 0.37 | 0.41 |
4 | 4 | 0.43 | 0.53 | 0.47 | 0.43 | 0.46 | 0.44 |
4 | 5 | 0.44 | 0.27 | 0.33 | 0.43 | 0.24 | 0.31 |
4 | 6 | 0.44 | 0.27 | 0.33 | 0.43 | 0.24 | 0.31 |
4 | 7 | 0.78 | 0.73 | 0.75 | 0.77 | 0.64 | 0.7 |
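
The F1 columns in the table above can be read as the harmonic mean of the precision and recall beside them. The sketch below recomputes F1 for a few rows copied from the table, assuming the standard definition F1 = 2PR / (P + R); the source does not state how its scores were averaged, so this is an illustration rather than the authors' evaluation code. The helper name `f1_score` and the `rows` list are hypothetical. Because the inputs are already rounded to two decimals, a recomputed value may differ from a reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (GPT version, prompt, level, precision, recall, reported F1), copied from the table.
rows = [
    ("3.5", 1, "document", 0.51, 0.12, 0.19),
    ("3.5", 7, "document", 0.56, 0.56, 0.56),
    ("4", 7, "document", 0.78, 0.73, 0.75),
    ("4", 7, "mention", 0.77, 0.64, 0.70),
]

for version, prompt, level, p, r, reported in rows:
    computed = f1_score(p, r)
    print(
        f"GPT-{version} prompt {prompt} ({level}-level): "
        f"computed F1 = {computed:.2f}, reported F1 = {reported:.2f}"
    )
```

Comparing precision and recall through F1 in this way shows why a high-precision, low-recall row such as GPT-3.5 with prompt 2 (0.68 precision, 0.05 recall) still ends up with a low F1: the harmonic mean is dominated by the smaller of the two values.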