Table 7 Automatic evaluation metrics (ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore) for BERTSUM, BART, and ChatGPT summaries

From: Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences

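| Model               | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore |
|---------------------|---------|---------|---------|-----------|
| BERTSUM_Classifier  | 34.51   | 13.94   | 24.47   | 63.53     |
| BERTSUM_Transformer | 33.52   | 13.21   | 24.06   | 63.21     |
| BERTSUM_RNN         | 33.73   | 13.62   | 24.17   | 63.18     |
| BART                | 55.39   | 40.38   | 54.21   | 78.32     |
| ChatGPT(Prompt_T.7) | 48.19   | 25.41   | 40.81   | 73.38     |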
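
For readers unfamiliar with the ROUGE-N columns, the following is a minimal sketch of how an n-gram overlap score of this kind can be computed for one candidate summary against one reference. It is illustrative only: the function names `ngrams` and `rouge_n_f1` are hypothetical, and the lowercased whitespace tokenization, the F1 aggregation, and the 0-100 scaling are assumptions, since the table does not state which ROUGE variant or toolkit produced the reported numbers. The example sentences are invented and are not drawn from the study's data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 on a 0-100 scale.

    Assumes lowercased whitespace tokenization and no stemming;
    standard ROUGE toolkits may preprocess text differently.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example sentences, for illustration only.
    candidate = "the patient has a fever"
    reference = "the patient has a mild fever"
    print(round(rouge_n_f1(candidate, reference, n=1), 2))
    print(round(rouge_n_f1(candidate, reference, n=2), 2))
```

ROUGE-L replaces the fixed-length n-gram overlap with the longest common subsequence between candidate and reference, and BERTScore compares contextual token embeddings rather than surface strings, so neither is covered by this sketch.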