The QUEST for quality online health information: validation of a short quantitative tool

Background: Online health information is unregulated and can be of highly variable quality. There is currently no singular quantitative tool that has undergone a validation process, can be used for a broad range of health information, and strikes a balance between ease of use, concision and comprehensiveness. To address this gap, we developed the QUality Evaluation Scoring Tool (QUEST). Here we report on the analysis of the reliability and validity of the QUEST in assessing the quality of online health information.

Methods: The QUEST and three existing tools designed to measure the quality of online health information were applied to two randomized samples of articles containing information about the treatment (n = 16) and prevention (n = 29) of Alzheimer disease as a sample health condition. Inter-rater reliability was assessed using a weighted Cohen’s kappa (κ) for each item of the QUEST. To compare the quality scores generated by each pair of tools, convergent validity was measured using Kendall’s tau (τ) ranked correlation.

Results: The QUEST demonstrated high levels of inter-rater reliability for the seven quality items included in the tool (κ ranging from 0.7387 to 1.0, P < .05). The tool was also found to demonstrate high convergent validity. For both treatment- and prevention-related articles, all six pairs of tests exhibited a strong correlation between the tools (τ ranging from 0.41 to 0.65, P < .05).

Conclusions: Our findings support the QUEST as a reliable and valid tool to evaluate online articles about health. Results provide evidence that the QUEST integrates the strengths of existing tools and evaluates quality with equal efficacy using a concise, seven-item questionnaire. The QUEST can serve as a rapid, effective, and accessible method of appraising the quality of online health information for researchers and clinicians alike.
Electronic supplementary material The online version of this article (10.1186/s12911-018-0668-9) contains supplementary material, which is available to authorized users.


Background
The Internet has revolutionized how information is distributed and has led to the rapid expansion of health resources from a wide variety of content providers, ranging from government organizations to for-profit companies. Consulting online health information is an increasingly popular behavior, with 80% of Internet users engaging in this activity [1]. Health information consumers worldwide, particularly those in developing countries and remote areas, may benefit from accessible and immediate retrieval of up-to-date information [2,3]. This new information gateway also promotes autonomy by allowing patients to be more active in their health [4].
The dynamic nature of the Internet, however, introduces important concerns in parallel with these benefits. Online information is unregulated and can be of highly variable quality [5]. This has critical implications for users as it is estimated that over half of the adult population in the United States and Canada does not possess an adequate level of health literacy [6,7], and low health literacy is negatively correlated with the ability to discriminate between high- and low-quality eHealth information [8]. Compounding this issue, there is a growing number of individuals who use online information to guide health care decisions, either for themselves or on behalf of another person. It is therefore crucial to develop effective methods to evaluate online health information [9]. To this end, there have been many efforts to develop tools that assess the quality of online health information; while such tools will not solve the issue of regulation, they can assist end-users, health care professionals and researchers in differentiating between high- and low-quality online sources.
A scoping review of the literature on the evaluation of health information was conducted using Arksey and O'Malley's six-stage methodological framework [10]. The scoping review aimed to identify existing health information evaluation tools and information available in the literature on their demonstrated validity and reliability. An iterative team approach was used to determine a search strategy balancing feasibility and comprehensiveness. Data were collected via keyword searches and citation searches on Google Scholar and PubMed. Seven combinations of the following keywords were used: online, health information, evaluate, evaluation, tool, quality, validity, testing, validation, and assessment. A total of 49 records were retrieved between January 15, 2016 and February 5, 2016. Thirty-six of these articles (see endnote 1) were included in the review based on the following inclusion criteria: 1) the article is in the English language; 2) validation of an assessment tool related to quality of health information was the focus of the article. Fifteen tools currently available in the literature (see endnote 2) were identified in the scoping review. A follow-up search was conducted on September 10, 2018, yielding three additional tools: the Quality Index for health-related Media Reports (QIMR) [11], the "Date, Author, References, Type, and Sponsor" (DARTS) tool [12] and the Index of Scientific Quality (ISQ) [13]. The tools identified range from generic assessments, intended for use across multiple domains of online health information, to assessments targeted to a specific: 1) health condition [14,15]; 2) aspect of a condition such as treatment [12,16]; 3) audience [17,18]; or 4) type of media [11,13]. As such, a disadvantage of existing tools is that they are limited in the scope of their application.
Many of the existing tools identified, with some notable exceptions, are lengthy and potentially arduous to use, outdated, or no longer available online [3]. Some tools consist of sets of criteria or checklists that do not provide a quantitative result, making it difficult to compare information from different sources. Finally, while there are many studies evaluating online health information using existing quality evaluation tools, studies assessing the validity, reliability, and efficacy of the tools themselves are lacking in the medical informatics literature.
At present, there is no clear universal standard for evaluating the quality of online health information [3]. Many researchers and regulatory bodies, including the World Health Organization, have called for the establishment of such a standard [9]. Quality criteria across existing tools often overlap and thus may serve as the basis for developing a universalized set of criteria. Aslani et al. distilled a total of 34 criteria from five evaluation tools into 10 general criteria, subdivided into four categories: author, sponsors, and individual(s) responsible for the website; purpose of the website and supporting evidence; design, ease of use, privacy, and interactibility of the website; and date of update [19]. These aggregate criteria largely correspond to groupings of criteria generated in previous reviews of the literature [20,21]. The criteria also align with the "5 C's" of website quality (credibility, currency, content, construction, and clarity) outlined by Roberts [22].
Of the many criteria-based assessment tools that have been developed, only a fraction have been tested for inter-rater reliability and even fewer have been validated [23]. Of tools that have reported measuring inter-rater reliability, few have consistently achieved acceptable levels of agreement across all criteria [24]. Gagliardi and Jadad [25] found that only five of 51 rating instruments they evaluated provided explicit evaluation criteria and none were validated. In a more recent review of 12 instruments by Breckons et al. [23], only two tools, DISCERN and the LIDA Minervation tool, contained any measure of reliability and validity. The DISCERN tool is the only tool currently available online for which substantive validation data is publicly available. During development of the tool, a questionnaire administered to information providers and self-help organizations was used to establish face and content validity and inter-rater reliability [16]. Additionally, external assessments indicated significant correlation with content coverage and correctness [26], good internal consistency, and significant inter-rater reliability [27]. Past comparisons to other tools, including the Mitretek Information Quality Tool (IQT) [27], Sandvik quality scale [28], EQIP [17], and DARTS [26], found significant convergent validity with DISCERN. However, DISCERN is limited in its scope of application as it is focused on treatment information and as such is not applicable to online content about other aspects of health and illness including prevention and diagnosis.
There is currently no singular quantitative tool that has undergone a validation process, can be used for a broad range of health information, and strikes a balance between ease of use, concision and comprehensiveness (Fig. 1). To address these gaps, we developed the QUality Evaluation Scoring Tool (QUEST). The QUEST quantitatively measures six aspects of the quality of online health information: authorship, attribution, conflict of interest, currency, complementarity, and tone (Fig. 2), yielding an overall quality score between 0 and 28. Attribution is measured through two items, yielding a seven-item evaluation for six measures of health information quality. The criteria were chosen based on a review of existing tools used to evaluate the quality of online information by Chumber et al. [29], Sandvik et al. [28], and Silberg et al. [30]; content analysis was used to capture the overarching categories assessed by these tools [31].

Fig. 1 Review of existing quality evaluation tools (n = 16). Adapted from the CONSORT 2010 Flow Diagram available at http://www.consort-statement.org/consort-statement/flow-diagram

When applying the QUEST, each of the seven quality items is assigned a weighted score. The weighting of each criterion was developed based on two factors: (i) how critical it is to the overall quality of the article, established by a preliminary analysis of a sample of websites, and (ii) consideration of the criterion's ethical implications. One criterion, attribution, is measured through a two-step process by identifying (1) the presence of references to scientific studies and (2) the type of studies referenced, if any (e.g., animal models, observational studies, meta-analyses, clinical trials). The second item, which assigns a ranking based on the types of studies included, is in accordance with the GRADE criteria for clinical evidence [32]. This item is scored as a support to the overall quality of the health information presented, not as a judgment of the referenced studies' quality.
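The weighted scoring can be illustrated with a short sketch. It takes the weights and maximum levels of the seven items from the QUEST criteria (Fig. 2) and checks that they sum to the stated maximum of 28. The function and variable names are hypothetical, and scoring "type of study" as 0 when attribution is below 2 is an assumption: the tool only states that this item is rated for articles scoring 2 or 3 on attribution.

```python
# (weight, maximum raw level) for each of the seven QUEST items, as listed in Fig. 2.
QUEST_ITEMS = {
    "authorship": (1, 2),
    "attribution": (3, 3),
    "type_of_study": (1, 2),
    "conflict_of_interest": (3, 2),
    "currency": (1, 2),
    "complementarity": (1, 1),
    "tone": (3, 2),
}

# Highest attainable total: 2 + 9 + 2 + 6 + 2 + 1 + 6 = 28.
MAX_SCORE = sum(weight * max_level for weight, max_level in QUEST_ITEMS.values())


def quest_score(ratings):
    """Return the weighted QUEST total for a dict of raw item ratings."""
    ratings = dict(ratings)
    # Assumption: type of study contributes 0 unless attribution is 2 or 3.
    if ratings.get("attribution", 0) < 2:
        ratings["type_of_study"] = 0
    total = 0
    for item, (weight, max_level) in QUEST_ITEMS.items():
        level = ratings[item]
        if not 0 <= level <= max_level:
            raise ValueError(f"{item}: level {level} outside 0..{max_level}")
        total += weight * level
    return total


def quest_percent(ratings):
    """Express the total as a percentage of the maximum, for comparison across tools."""
    return 100 * quest_score(ratings) / MAX_SCORE


# Hypothetical article: named and qualified author, vague sourcing, unbiased,
# recent, supports the patient-physician relationship, mainly supported tone.
example = {
    "authorship": 2,
    "attribution": 1,
    "conflict_of_interest": 2,
    "currency": 2,
    "complementarity": 1,
    "tone": 1,
}
print(quest_score(example), MAX_SCORE)  # 17 28
```

Because attribution, conflict of interest, and tone each carry a weight of 3, they account for 21 of the 28 available points in this sketch.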
The aim of the present study was to evaluate whether the QUEST reliably measures a similar concept of quality to existing tools. Here we present the results of the inter-rater reliability and convergent validity analyses.

Sample
For the purposes of this study, Alzheimer disease (AD) was used as the reference health condition as there is an abundance of online articles on this topic [33,34], and there are established methodologies for sampling in this field [31]. Online articles containing AD treatment information were retrieved using a location-disabled search on Google.com/ncr (no country redirect) to avoid localized results. Searches were conducted on an application that prevents the collection of browsing history and cookies during the search, and browsing history and cookies were cleared before each search to ensure that search results were not influenced by these factors. Forty-eight different combinations of search terms related to the treatment of AD were used. Articles were extracted from the first three pages of search results, based on analyses of aggregate data on online activity patterns indicating that most Internet users tend not to view past the third page of search results [35]. Each page of search results consisted of nine articles, totalling 27 articles for each key word combination. Inclusion criteria for the articles were: 1) the article is in the English language; 2) no payment or login is required to access the article; 3) treatment of AD is the main focus of the article as determined by the content of the headline and lead paragraph; and 4) treatment interventions discussed in the article are not solely based on animal experiments. An automatic number generator was used to obtain random 10% samples of articles that met these inclusion criteria in this present study. In a separate sample, online articles containing information about the prevention of AD were retrieved using similar methods. To retrieve these articles, 105 combinations of search terms related to AD prevention were used. Articles were screened according to criteria 1, 2, and 3 of the inclusion criteria used for treatment articles, with the exception that criterion 3 focused on prevention rather than treatment.
As with the treatment-related articles, a random 10% sample of relevant articles was used for validation. In the present study, an article is defined as the heading on a webpage and the text associated with it, excluding links, images and advertising outside of the main body of text. We selected this sampling strategy based on previous investigations of inter-rater reliability and validity of similar tools that have assessed samples of 12 to 40 websites [23,26,27,36,37].

Fig. 2 (items, weights, and scoring levels of the QUEST)

Authorship (Score x 1)
0 - No indication of authorship or username
1 - All other indications of authorship
2 - Author's name and qualification clearly stated

Attribution (Score x 3)
0 - No sources
1 - Mention of expert source, research findings (though with insufficient information to identify the specific studies), links to various sites, advocacy body, or other
2 - Reference to at least one identifiable scientific study, regardless of format (e.g., information in text, reference list)
3 - Reference to mainly identifiable scientific studies, regardless of format (in >50% of claims)

For all articles scoring 2 or 3 on Attribution: Type of study (Score x 1)
0 - In vitro, animal models, or editorials
1 - All observational work
2 - Meta-analyses, randomized controlled trials, clinical studies

Conflict of interest (Score x 3)
0 - Endorsement or promotion of intervention designed to prevent or treat condition (e.g., supplements, brain training games, foods) within the article
1 - Endorsement or promotion of educational products & services (e.g., books, care home services)
2 - Unbiased information

Currency (Score x 1)
0 - No date present
1 - Article is dated but 5 years or older
2 - Article is dated within the last 5 years

Complementarity (Score x 1)
0 - No support of the patient-physician relationship
1 - Support of the patient-physician relationship

Tone (includes title) (Score x 3)
0 - Fully supported (authors fully and unequivocally support the claims, strong vocabulary such as "cure", "guarantee", and "easy", mostly use of non-conditional verb tenses ("can", "will"), no discussion of limitations)
1 - Mainly supported (authors mainly support their claims but with more cautious vocabulary such as "can reduce your risk" or "may help prevent", no discussion of limitations)
2 - Balanced/cautious support (authors' claims are balanced by caution, includes statements of limitations and/or contrasting findings)

Reliability analysis
The QUEST was applied to each sample of online articles by two independent raters (JJ and TF for the prevention sample and JJ and JL for the treatment sample). Two of the three raters were naïve to tool development. To evaluate inter-rater agreement between the two reviewers, a weighted Cohen's kappa (κ) was calculated for each item of the tool. Agreement was interpreted according to Landis and Koch, where a κ-value of 0.0 to 0.2 indicates slight agreement, 0.21 to 0.40 indicates fair agreement, 0.41 to 0.60 indicates moderate agreement, 0.61 to 0.80 indicates substantial agreement, and 0.81 to 1.0 indicates almost perfect or perfect agreement [38]. Following initial ratings of the samples, remaining disagreements were resolved by discussion to achieve 100% agreement.
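As an illustration of the agreement statistic, the following is a minimal sketch of a weighted Cohen's kappa for one ordinal item, together with the Landis and Koch bands quoted above. The text does not state which weighting scheme was used, so linear weights are an assumption here (quadratic weights are offered as an option); the function names and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_levels, weighting="linear"):
    """Weighted Cohen's kappa for two raters scoring items on levels 0..n_levels-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of articles")
    n = len(rater_a)

    # Observed joint proportions of (rater A level, rater B level).
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def disagreement(i, j):
        d = abs(i - j) / (n_levels - 1)
        return d if weighting == "linear" else d * d

    levels = range(n_levels)
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i in levels for j in levels)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i in levels for j in levels)
    if expected_dis == 0:
        return 1.0  # both raters used one identical level throughout
    return 1 - observed_dis / expected_dis


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch agreement bands."""
    if kappa < 0:
        return "poor"  # below-chance agreement; not among the bands quoted in the text
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of six articles on a three-level item (levels 0, 1, 2).
rater_1 = [0, 1, 2, 2, 1, 0]
rater_2 = [0, 1, 2, 1, 1, 0]
print(round(weighted_kappa(rater_1, rater_2, n_levels=3), 2))  # 0.8
```

Weighting matters for items with more than two levels: a disagreement between adjacent levels is penalized less than one between the extremes, whereas for a two-level item such as complementarity the weighted and unweighted statistics coincide.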

Validity analysis
Three tools were selected for comparison with the QUEST based on availability, ubiquity of use, and relatedness of quality criteria and were applied to both samples. The Health on the Net Foundation's HONcode Code of Conduct and the DISCERN instrument [16] are two of the most widely used and cited quality evaluation tools [5]. The DISCERN instrument is a 16-item questionnaire intended specifically for evaluation of health information on treatment choices, and has been found to demonstrate good inter-rater reliability and face and content validity. The HONcode Code of Conduct is a set of eight criteria used to certify websites containing health information [5]; its creators also developed a Health Website Evaluation Tool, which was used in this analysis due to its closer similarity in purpose and format to the QUEST and other tools. General quality items developed by Sandvik comprised the final tool for comparison [28]. All three tools selected for comparison are criteria-based, can be applied by a non-expert user, and contain quality criteria that, in general, align categorically with each other and the QUEST (Table 1).
The QUEST and the three tools for comparison were applied to the 10% sample of treatment-related articles and the 10% sample of prevention-related articles by one investigator. The numeric scores obtained by each tool were converted to percentage scores to facilitate comparison across tools. The distribution of quality scores generated by the QUEST was plotted as a histogram to determine whether a spectrum of quality was captured by the sample (see Fig. 1, Robillard and Feng 2017 [31]).
For each tool, the articles were ranked based on their scores and rankings were compared across tools in order to measure convergence. To accomplish this, a two-tailed Kendall's tau (τ) ranked correlation [39] was used to measure convergence at α = .05. Confidence intervals (CI) of 95% for τ were calculated using Z_0.05. Six correlational tests, each comparing a unique pair of tools, were performed to compare the results of the QUEST, HONcode, Sandvik, and DISCERN tools. This process was carried out for both the samples of treatment- and prevention-related articles.
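The pairwise comparison can be sketched as follows. The text does not say which variant of Kendall's tau was computed; this sketch assumes tau-b, which adjusts for tied scores, and it omits the significance tests and confidence intervals. The function names and the percentage scores are hypothetical and are not taken from the study data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long score lists."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both tools: excluded from both denominator terms
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator if denominator else float("nan")


def pairwise_tau(scores_by_tool):
    """Compute tau-b for every unique pair of tools (four tools give six pairs)."""
    return {
        (a, b): kendall_tau_b(scores_by_tool[a], scores_by_tool[b])
        for a, b in combinations(scores_by_tool, 2)
    }


# Hypothetical percentage scores for five articles under each tool.
scores = {
    "QUEST": [25, 50, 71, 86, 100],
    "HONcode": [15, 62, 55, 80, 100],
    "Sandvik": [43, 57, 86, 86, 100],
    "DISCERN": [45, 52, 59, 70, 86],
}
for pair, tau in pairwise_tau(scores).items():
    print(pair, round(tau, 2))
```

Because the statistic depends only on the ordering of articles, converting raw scores to percentages does not change τ; the conversion only makes score ranges and medians comparable across tools.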

Sample
A total of 496 treatment articles were retrieved, 163 of which met the criteria for inclusion in the analysis; the random 10% sample consisted of 16 articles (Additional file 1). Similarly, 308 prevention articles were collected, 296 of which met the inclusion criteria; 29 articles were included in the random 10% sample (Additional file 2). These articles were analyzed using the QUEST in previous quality analysis studies of articles about the prevention of AD [31]. The scores generated by each of the tools for the treatment and prevention samples are included in additional files [see Additional files 1 and 2].
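The reported sample sizes are consistent with taking 10% of the eligible articles and rounding down (163 gives 16; 296 gives 29). The sketch below shows one way such a draw could be made. The rounding rule is inferred from those counts rather than stated in the text, and the function name, the seed, and the use of Python's random module are assumptions; the study reports only that an automatic number generator was used.

```python
import random


def ten_percent_sample(articles, seed=None):
    """Draw a random 10% sample without replacement, rounding the size down."""
    sample_size = len(articles) // 10  # assumption: round down, matching 163 -> 16
    rng = random.Random(seed)          # seeded only to make the sketch reproducible
    return rng.sample(articles, sample_size)


# Hypothetical article identifiers standing in for the eligible articles.
treatment_ids = list(range(163))
prevention_ids = list(range(296))
print(len(ten_percent_sample(treatment_ids, seed=1)),
      len(ten_percent_sample(prevention_ids, seed=1)))  # 16 29
```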

Reliability analysis
Treatment
The level of inter-rater reliability was substantial between the reviewers for attribution (κ = 0.79), high to near perfect for authorship, currency, complementarity and tone (κ ranging from 0.86 to 0.91), and perfect for type of study and conflict of interest (Table 2).

Prevention
Inter-rater reliability between the two reviewers ranged from substantial to perfect agreement for each of the seven items included in the QUEST (κ ranging from 0.74 to 1.0; Table 3).

Validity analysis
Treatment
Scores obtained using HONcode had the widest range, 15-100%. Scores obtained using the Sandvik criteria had a narrower range, 43-100%. The DISCERN instrument returned the narrowest range of scores, 45-86%. The QUEST generated a range of scores (25-100%) wider than those generated by both the DISCERN tool and Sandvik criteria, but narrower than that of HONcode.
The median percentage scores returned by the DISCERN and HONcode tools were 59% and 62% respectively, while the Sandvik criteria generated a median score of 86%. Again, the median score generated by the QUEST, 71%, fell between those of the other instruments.
Quality analysis of the prevention-related articles generated similar results. HONcode generated the widest range of scores (22-100%), while DISCERN returned the narrowest range (30-88%). The range of scores obtained using the Sandvik criteria (29-93%) fell between the ranges generated by the HONcode and DISCERN instruments. The QUEST generated a range of scores (29-96%) wider than those of DISCERN and Sandvik, but narrower than that of HONcode.
On the lower end, the median percentage score obtained using the DISCERN criteria was 54%. On the upper end, the median score generated by HONcode was 68%. Between these values, both the Sandvik criteria and the QUEST returned a median score of 64%.
Of the six correlational tests performed between unique pairs of tools on the articles related to treatment, all six tests demonstrated a significant correlation between the tools (Table 4). Values of τ ranged from 0.47 (QUEST and HONcode) and 0.53 (HONcode and Sandvik) on the lower end to 0.62 (QUEST and Sandvik) and 0.65 (QUEST and DISCERN) on the higher end (P < .05 for all tests).

Prevention
Similarly, all six correlational tests performed on the prevention sample demonstrated a significant correlation between the tools (P < .05; Table 5). The weakest correlations were found between Sandvik and DISCERN, and the QUEST and DISCERN, which produced τ values of 0.41 and 0.55 respectively. The strongest correlations were found between the QUEST and Sandvik (τ = 0.62) and the QUEST and HONcode (τ = 0.64).

Discussion
In the present study to validate a novel tool to assess the quality of health information available on the Internet, we find the QUEST to have high inter-rater reliability and convergent validity when applied to two samples of online articles on AD. The results of the validity analysis of treatment and prevention samples indicate that the rankings of quality scores generated by the QUEST converge with those generated by three other tools: the HONcode Health Website Evaluation tool, the DISCERN instrument, and the Sandvik criteria.
For the sample of articles on AD treatment, the strong correlation between the QUEST and the DISCERN instrument suggests that these tools evaluate a similar concept of quality. As past findings indicate that the DISCERN tool is itself a valid tool for assessing treatment information, its high level of convergence with the QUEST confers promising preliminary evidence for the validity of the QUEST. One limitation of the DISCERN tool is the ambiguity in applying a Likert scale to the data. The QUEST addresses this limitation by providing specific descriptions of the criteria for each possible score for a given item. The QUEST's lower level of convergence with the HONcode's evaluation of treatment-related articles may indicate a wider gap between interpretations of the concept of quality evaluated by these two tools. The HONcode tool places emphasis on aspects that are not assessed by the QUEST, such as the website's mission, target audience, privacy policy, and interactivity [40], all of which expand on the concept of quality but increase the time required to apply the tool. However, there may be other factors that account for the discrepancy between the tools' rankings. There exist some ambiguities in scoring websites using HONcode that are intrinsic to the design of the tool. For example, with a few exceptions, the HONcode rates questions on a dichotomous scale (Yes/No). This rating system, unlike the Likert-type scales used by the QUEST, DISCERN, and Sandvik [28], does not allow for an assessment beyond an absence or presence of criteria. Finally, some criteria are only marginally or not applicable to many websites' content. For example, one question asks the responder to evaluate banner content, and website design has moved away from these types of site elements.
Analysis of the scores generated from the sample of prevention-related articles found the strongest correlation between the QUEST and HONcode. Conversely, the QUEST displayed the poorest convergence with the DISCERN instrument. The discrepancy between these findings and those from the treatment sample, which found the strongest convergence between the QUEST and DISCERN and the weakest between the QUEST and HONcode, may reflect intrinsic differences in the purpose of the tools. The DISCERN instrument was developed specifically for the quality evaluation of treatment information, whereas the QUEST, HONcode and Sandvik criteria were developed for health information more broadly.
Overall findings demonstrate a high degree of inter-rater reliability for all seven items of the QUEST. In their evaluation of the DISCERN instrument, Charnock et al. [16] found that lower agreement scores were generally associated with criteria that required more subjective assessment, such as ratings about areas of uncertainty or questions requiring scaled responses. Results from the current study indicate that more subjective items in the QUEST, such as attribution, conflict of interest and tone, achieve levels of inter-rater reliability about equal to or higher than those of more objective items. Results from the reliability analysis suggest that the QUEST criteria may serve as an effective framework for current as well as future iterations of quality evaluation resources.
The QUEST offers three main advantages over existing tools. Foremost, the QUEST condenses a wide range of quality evaluation criteria into a brief, seven-item questionnaire that evaluates quality with comparable efficacy to established tools. This concise design in conjunction with a weighted criteria approach facilitates the rapid evaluation of health information for a diverse group of users. For example, health care professionals may use the QUEST to evaluate the quality of information brought to them by their patients or to find high-quality articles to recommend. The QUEST may also be of value to the scientific community as it can be used as a research tool to quickly and accurately evaluate quality, facilitating the characterization and comparison of large amounts of information. Additionally, the QUEST may help inform creators of online health content, including government, industry, university, and advocacy groups, during the content development process.
In terms of content, the QUEST tool is differentiated from the three other tools included in the present analysis in its weighted measurement of tone, conflict of interest, and complementarity (see Table 1). These criteria address factors such as potential bias linked to promotion of a product or intervention, whether support of the patient-physician relationship is referenced, and whether the information is presented in a balanced way.
Finally, the QUEST was designed for application to a variety of health topics including information on both treatment and prevention, as well as general health information. Altogether these characteristics, combined with evidence of the QUEST's reliability and validity, are reflective of a versatile tool suited to meet diverse user needs. It is important to note that each individual item provides information about only a single aspect of information quality, and thus the QUEST should be used as a gestalt to provide an overall assessment of quality.
It should be noted that while the QUEST is designed to be a concise and universally applicable tool, there is a range of other evaluation tools in the literature with different and potentially complementary aims to the QUEST (please see Appendix 2 for a comparison of currently available tools to the QUEST). For example, the QIMR tool released in 2017 may be more suited for evaluating health research reports in the lay media and the AGREE instrument may be best suited to evaluating the quality of clinical practice guidelines. While the versatility of the QUEST lies in its applicability to a range of online health information, it is not necessarily the only or most suitable tool for all types of health-related media.
The focused area of the samples used in this study addresses an important and growing issue relating to the quality of online health information targeted toward aging populations, who face unique challenges in cognition which can be exacerbated by low health literacy [41]. Additionally, older adults tend to have less experience conducting online searches and critically evaluating the credibility of online information [42,43]. Due to this combination of factors, this demographic of health consumers may be more susceptible to misinformation online. Beyond the focus on AD used for this validation study, the QUEST will benefit from further testing across a wider range of health conditions.
The study design has several strengths. The correlational method used does not rely on an assumption of normality of the data, and the magnitudes of the correlation coefficients indicate the strength of correlation between the tools being compared [39]. We conducted more than one analysis on the data, comparing the QUEST to three well-established and well-regarded evaluation tools. Careful selection of tools for comparison and use of multiple tools in the analysis both contribute to the rigour of the study.
However, we also recognize the limitations of the study. A sample of convenience of a relatively small number of articles was used, taken from existing collections of AD treatment and prevention articles. Due potentially to the small sample size of articles used, the Kendall's tau scores have substantially overlapping confidence intervals; this indicates a need for further validation studies that include a larger number of articles on other health conditions and on types of health information beyond treatment and prevention, such as descriptions of symptoms and management. Furthermore, our study included only three raters, whereas it may be useful to include more raters in the future when assessing inter-rater reliability. It may also be informative to assess the predictive validity of real-life application of the tool. This may be used to predict whether sustained use of the instrument is associated with higher levels of user knowledge, engagement with care providers on the health topic, or self-efficacy in management of the health condition researched.
Additionally, existing quality evaluation tools generally adopt the perspective of the health care professional in conceptualizing quality [27]. We recognize that the QUEST tool, currently aimed at health care professionals and researchers, falls into this category. Given the time constraints of clinical visits, health care professionals may not be able to assess the quality of online resources during the consultation. To address this issue, attempts have been made to automate tools such as the HONcode and the QUEST [44,45]. Further, research indicates that the methods used by health consumers to search and appraise online health information differ from the systematic methods used by investigators [46]. As a partially non-academic area of research, a number of health information evaluation tools are not detailed or evaluated in the peer-reviewed literature and may have been excluded from the scoping review presented here. Existing efforts to expand the user base for quality evaluation tools include the HONcode Health Website Evaluation Tool and Provost et al.'s 95-item WebMedQual assessment [47]. This body of work can be expanded upon in the academic space by standardising and ensuring validity of the broad range of heterogeneous tools that exist outside of this space. Future work should continue efforts to develop a more accessible and concise patient-friendly tool that incorporates the values of end-users when assessing online health information, such as privacy and usability factors. To address this need, we are currently in the process of developing a public-friendly adaptation of the existing QUEST criteria that can be easily understood and applied by non-expert users.
Finally, a novel tool aiming to address the issue of misinformation online, whether intended for use by expert or non-expert users, needs to be supplemented by a careful examination of the drivers of public attitudes toward key issues in health care. Studies have shown that social beliefs and attitudes related to a range of health issues (e.g., vaccination uptake [48,49], health and wellbeing in an ageing population [50], uptake of mental health care [51,52]) pose significant challenges in obtaining optimal public health outcomes. Tools such as the QUEST are designed as downstream interventions that can aid health consumers and providers in differentiating between high- and low-quality information online. It is unlikely that the wide availability of these tools will be effective as a standalone intervention; additional work is required to contextualize the public spaces in which these evaluation tools will be useful and to determine how these tools can best be used in complement to health communication strategies and more upstream, systemic interventions in order to change health behaviours and attitudes.

Conclusions
Developed to address gaps in available quality evaluation tools for online health information, the QUEST is composed of a short set of criteria that can be used by health care professionals and researchers alike. Our findings demonstrate the QUEST's reliability and validity in evaluating online articles on AD treatment and prevention. This study provides evidence that the QUEST builds on the strengths of existing instruments and evaluates quality with similar efficacy using a rapid seven-item questionnaire. By comparison, two similar tools used in this analysis, the DISCERN and HONcode Health Website Evaluation tools, are 12-16 questions in length. As a result, the QUEST may serve as a more accessible resource that effectively consolidates the quality criteria outlined in previous work. Additionally, due to its simplicity and unique weighting approach, the QUEST reduces the need for users' subjective judgment and indicates potential for future iterations of the tool to be easily tailored to the needs of different users. Based on the current evidence, the QUEST can be used to reliably assess online sources of information on treatment and prevention of AD. Following formal establishment of its reliability and validity across a wide range of health topics, the QUEST may serve as or inform a universal standard for the quality evaluation of online health information.
Endnotes
1 Please refer to Appendix 1 for characteristics of retrieved articles.
2 Please refer to Appendix 2 for a complete listing of currently available tools.

Table 6 Characteristics of articles (n = 36) retrieved between January 15, 2016 and February 5, 2016 using the following search terms on Google Scholar and PubMed: online, health information, evaluate, evaluation, tool, quality, validity, testing, validation, and assessment and meeting the following inclusion criteria: 1) the article is in the English language; 2) validation of an assessment tool related to quality of health information was the focus of the article