Table 8 Results for the Reddit and Twitter models on in- and out-of-source test data sets, compared with the baseline model trained on all of the data

From: Classifying patient and professional voice in social media health posts

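| Model(s) | Other: F1 | Patient voice: F1 | Prof. voice: F1 | Macro F1 | Acc. | Test |
|---|---|---|---|---|---|---|
| Reddit | 0.94 | 0.95 | 0.86 | 0.92 | 0.95 | Reddit: 3933 |
| Twitter | 0.74 | 0.69 | 0.00 | 0.47 | 0.71 | Reddit: 3933 |
| All | 0.85 | 0.88 | 0.30 | 0.68 | 0.86 | Reddit: 3933 |
| Reddit | 0.83 | 0.50 | 0.00 | 0.44 | 0.73 | Twitter: 1941 |
| Twitter | 0.98 | 0.90 | 0.90 | 0.93 | 0.96 | Twitter: 1941 |
| All | 0.90 | 0.64 | 0.26 | 0.60 | 0.83 | Twitter: 1941 |
| Reddit&Twitter | 0.96 | 0.95 | 0.88 | 0.92 | 0.95 | All: 5474 |
| All | 0.87 | 0.85 | 0.28 | 0.66 | 0.85 | All: 5474 |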

  1. We also include the results for both models, each tested on its in-source test data, with the results combined, compared with the baseline model trained on all the data (last two rows). We report F1 scores per label, macro-average F1, and accuracy across all three label types, as well as the size of the test set.
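
As a rough illustration of the macro-average F1 mentioned in the note, the sketch below takes the unweighted mean of the three per-label F1 scores. It assumes the usual definition of macro averaging; the exact computation used for Table 8 is not shown here, and the function name `macro_f1` is hypothetical. Because the inputs are the rounded per-label values from Table 8, a recomputed figure can differ from the reported Macro F1 by about 0.01.

```python
from statistics import mean


def macro_f1(per_label_f1):
    """Unweighted mean of per-label F1 scores (assumed macro-average definition)."""
    return mean(per_label_f1)


# Per-label F1 scores (Other, Patient voice, Prof. voice) copied from Table 8.
reddit_model_on_reddit = [0.94, 0.95, 0.86]
twitter_model_on_reddit = [0.74, 0.69, 0.00]

# In-source case: the Reddit model on Reddit test data.
print(round(macro_f1(reddit_model_on_reddit), 2))

# Out-of-source case: one label with F1 = 0.00 pulls the macro average down
# sharply, because every label counts equally regardless of how common it is.
print(round(macro_f1(twitter_model_on_reddit), 2))
```

Macro averaging weights each label equally, which is why an F1 of 0.00 on the professional voice label lowers the out-of-source Macro F1 far more than it lowers accuracy.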