Classifying patient and professional voice in social media health posts

Table 4 Inter-annotator agreement scores per domain and data source, reported as average per-label F1 scores, macro-averaged F1, and accuracy (standard deviation in parentheses)

	Cardio/Reddit	Cardio/Twitter	Skin/Reddit	Skin/Twitter	All
F1: Other	0.90 (0.01)	0.93 (0.03)	0.89 (0.02)	0.95 (0.03)	0.93 (0.03)
F1: Patient voice	0.96 (0.01)	0.69 (0.09)	0.97 (0.01)	0.53 (0.19)	0.93 (0.03)
F1: Professional voice	0.85 (0.03)	0.59 (0.07)	0.18 (0.15)	– (–)	0.59 (0.06)
Macro-averaged F1	0.90 (0.03)	0.73 (0.06)	0.68 (0.05)	0.74 (0.11)	0.81 (0.04)
Accuracy	0.94 (0.01)	0.87 (0.04)	0.95 (0.01)	0.91 (0.05)	0.92 (0.03)
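
The following is a minimal sketch of how agreement figures of this kind can be computed for one pair of annotators. It is an illustration under stated assumptions, not the procedure used to produce the table: the function names and the toy annotations are hypothetical, one annotator is arbitrarily treated as the reference, and labels that neither annotator used are skipped when averaging. That last choice appears consistent with the Skin/Twitter column, where the professional voice label has no score and the macro-averaged F1 of 0.74 equals the mean of 0.95 and 0.53.

```python
def per_label_f1(reference, predicted, labels):
    """Return {label: F1 or None}, treating `reference` as the gold annotation.

    F1 is symmetric in the two annotators, so the choice of reference
    does not change the scores.
    """
    scores = {}
    pairs = list(zip(reference, predicted))
    for label in labels:
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: F1 is left undefined when neither annotator used the label.
        scores[label] = None if denom == 0 else 2 * tp / denom
    return scores


def macro_f1(scores):
    """Unweighted mean of the defined per-label F1 scores."""
    defined = [s for s in scores.values() if s is not None]
    return sum(defined) / len(defined)


def accuracy(reference, predicted):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical toy annotations (not data from the study).
LABELS = ["other", "patient", "professional"]
annotator_a = ["patient", "patient", "other", "professional", "other", "patient"]
annotator_b = ["patient", "other", "other", "professional", "other", "patient"]

scores = per_label_f1(annotator_a, annotator_b, LABELS)
macro = macro_f1(scores)
acc = accuracy(annotator_a, annotator_b)
print(scores, round(macro, 2), round(acc, 2))
```

With more than two annotators, one plausible way to obtain means and standard deviations like those in the table is to repeat this for every annotator pair and aggregate the resulting scores; whether the study did exactly that is not stated in this excerpt.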

ISSN: 1472-6947