Table 1 Description of the dataset by health topic

From: AutoDiscern: rating the quality of online health information with hierarchical encoder attention-based neural networks

| Topic | Breast Cancer | Arthritis | Depression |
|---|---|---|---|
| Number of Articles | 79 | 88 | 102 |
| Number of Sentences | 10,170 | 10,950 | 13,790 |
| Number of Tokens | 125,891 | 129,759 | 160,597 |
| Avg Sentences per Article | 129 | 124 | 135 |
| Avg Tokens per Article | 1,549 | 1,475 | 1,574 |
| Positive Class Prevalence | | | |
| Q4: References | 13% | 14% | 14% |
| Q5: Date | 20% | 26% | 24% |
| Q9: How Treatment Works | 85% | 28% | 52% |
| Q10: Treatment Benefits | 89% | 80% | 65% |
| Q11: Treatment Risks | 63% | 16% | 33% |
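
The "Avg Sentences per Article" row appears to be the sentence total divided by the article count for each topic. The following is a minimal sketch of that arithmetic in Python, assuming rounding to the nearest integer. The dictionary layout and the function name `avg_sentences_per_article` are illustrative and do not come from the source.

```python
# Totals copied from Table 1; the dictionary layout is an illustrative assumption.
totals = {
    "Breast Cancer": {"articles": 79, "sentences": 10_170},
    "Arthritis": {"articles": 88, "sentences": 10_950},
    "Depression": {"articles": 102, "sentences": 13_790},
}


def avg_sentences_per_article(stats):
    """Return sentences divided by articles, rounded to the nearest integer."""
    return round(stats["sentences"] / stats["articles"])


# Hypothetical helper applied to each topic column of the table.
averages = {topic: avg_sentences_per_article(s) for topic, s in totals.items()}
print(averages)
```

Under these assumptions the computed values agree with the reported 129, 124, and 135 sentences per article.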