
Table 4 Inter-annotator agreement per domain and data source, reported as average per-label F1 scores, macro-averaged F1 and accuracy (standard deviation in brackets)

From: Classifying patient and professional voice in social media health posts

 

| | Cardio/Reddit | Cardio/Twitter | Skin/Reddit | Skin/Twitter | All |
|---|---|---|---|---|---|
| F1: Other | 0.90 (0.01) | 0.93 (0.03) | 0.89 (0.02) | 0.95 (0.03) | 0.93 (0.03) |
| F1: Patient voice | 0.96 (0.01) | 0.69 (0.09) | 0.97 (0.01) | 0.53 (0.19) | 0.93 (0.03) |
| F1: Professional voice | 0.85 (0.03) | 0.59 (0.07) | 0.18 (0.15) | – (–) | 0.59 (0.06) |
| Macro-averaged F1 | 0.90 (0.03) | 0.73 (0.06) | 0.68 (0.05) | 0.74 (0.11) | 0.81 (0.04) |
| Accuracy (%) | 0.94 (0.01) | 0.87 (0.04) | 0.95 (0.01) | 0.91 (0.05) | 0.92 (0.03) |
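
As an illustration of how the three kinds of score in the table relate, the sketch below computes per-label F1, macro-averaged F1 and accuracy for a single pair of annotators. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the label names, the toy annotations and the helper names `per_label_f1` and `agreement` are all hypothetical, and a label used by neither annotator is assumed to have no defined F1 and is left out of the macro average.

```python
from statistics import mean

# Hypothetical label names standing in for the three classes in the table.
LABELS = ["other", "patient", "professional"]


def per_label_f1(ref, hyp, label):
    """F1 for one label, treating `ref` as the reference annotation.

    F1 = 2*TP / (2*TP + FP + FN) is symmetric in the two annotators,
    so the choice of reference does not change the score.
    """
    tp = sum(1 for r, h in zip(ref, hyp) if r == label and h == label)
    fp = sum(1 for r, h in zip(ref, hyp) if r != label and h == label)
    fn = sum(1 for r, h in zip(ref, hyp) if r == label and h != label)
    if tp + fp + fn == 0:
        return None  # assumption: label used by neither annotator, F1 undefined
    return 2 * tp / (2 * tp + fp + fn)


def agreement(ref, hyp, labels=LABELS):
    """Return (per-label F1 dict, macro-averaged F1, accuracy) for one pair."""
    f1 = {label: per_label_f1(ref, hyp, label) for label in labels}
    defined = [score for score in f1.values() if score is not None]
    macro_f1 = mean(defined)  # unweighted mean over labels with a defined F1
    accuracy = mean(1.0 if r == h else 0.0 for r, h in zip(ref, hyp))
    return f1, macro_f1, accuracy


# Toy annotations (invented for illustration only).
annotator_a = ["other", "patient", "patient", "professional", "other", "patient"]
annotator_b = ["other", "patient", "other", "professional", "other", "patient"]

f1, macro_f1, accuracy = agreement(annotator_a, annotator_b)
print(f1, round(macro_f1, 2), round(accuracy, 2))
```

Because the macro average weights every label equally, a rare label with low agreement pulls it well below accuracy, which is the pattern visible in the Skin/Reddit column.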