Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences

Table 7 Automatic evaluation of BERTSUM, BART, and ChatGPT summaries: ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore scores

Model	ROUGE-1	ROUGE-2	ROUGE-L	BERTScore
BERTSUM_Classifier	34.51	13.94	24.47	63.53
BERTSUM_Transformer	33.52	13.21	24.06	63.21
BERTSUM_RNN	33.73	13.62	24.17	63.18
BART	55.39	40.38	54.21	78.32
ChatGPT(Prompt_T.7)	48.19	25.41	40.81	73.38
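
As a reading aid for the ROUGE-1 and ROUGE-2 columns, the following is a minimal sketch of how a ROUGE-N score can be computed from clipped n-gram overlap between a reference summary and a generated one. It is not the authors' evaluation script: the lowercase whitespace tokenization, the absence of stemming, and the example sentences are assumptions made for illustration. The table does not state whether recall or F1 is reported, so the hypothetical helper `rouge_n` returns precision, recall, and F1 as fractions between 0 and 1, whereas the table's values appear to be on a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # which is the clipped overlap used by ROUGE-N.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical example sentences, not taken from the study's data.
reference = "the patient has a mild fever"
candidate = "the patient has fever"
p1, r1, f1_1 = rouge_n(reference, candidate, n=1)
p2, r2, f1_2 = rouge_n(reference, candidate, n=2)
print(f"ROUGE-1 F1: {f1_1:.2f}, ROUGE-2 F1: {f1_2:.2f}")
```

ROUGE-L (longest common subsequence) and BERTScore (embedding similarity) are computed differently and are not covered by this sketch.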

ISSN: 1472-6947