- Research article
- Open Access
A new adaptive testing algorithm for shortening health literacy assessments
© Kandula et al; licensee BioMed Central Ltd. 2011
Received: 24 January 2011
Accepted: 6 August 2011
Published: 6 August 2011
Low health literacy has a detrimental effect on health outcomes, as well as on the ability to use online health resources. Good health literacy assessment tools must be brief to be adopted in practice; test development from the perspective of item-response theory requires pretesting on large participant populations. Our objective was to develop a novel classification method for constructing brief assessment instruments that does not require pretesting on large numbers of research participants, and that would be suitable for computerized adaptive testing.
We present a new algorithm that uses principles of measurement decision theory (MDT) and Shannon's information theory. As a demonstration, we applied it to a secondary analysis of data sets from two assessment tests: a study that measured patients' familiarity with health terms (52 participants, 60 items) and a study that assessed health numeracy (165 participants, 8 items).
In the familiarity data set, the method correctly classified 88.5% of the subjects, and the average test length was reduced by about 50%. In the numeracy data set, for a two-class classification scheme, 96.9% of the subjects were correctly classified, with a more modest reduction in test length of 35.7%; a three-class scheme correctly classified 93.8% with a 17.7% reduction in test length.
MDT-based approaches are a promising alternative to approaches based on item-response theory, and are well-suited for computerized adaptive testing in the health domain.
More than half of the US adult population has limited health literacy skills. Low health literacy can limit a person's ability to communicate effectively with healthcare providers, comprehend physicians' instructions, and make informed health decisions. In addition, a strong correlation between low health literacy and poor health outcomes has been documented across a range of medical problems.
A goal of consumer health informatics initiatives is to improve patients' comprehension of health information, which requires tools that identify deficiencies in disadvantaged patient populations. Several health literacy measures are available, including the Rapid Estimate of Adult Literacy in Medicine (REALM), the Test of Functional Health Literacy in Adults (TOFHLA), and the Newest Vital Sign (NVS). In addition, several other assessments focus on numerical skills or numerical self-efficacy [5–7]. However, no single assessment tool covers text reading skills; numerical skills; the ability to use forms, tables, and graphs; oral communication skills; conceptual understanding of physiological health issues; and the ability to understand and navigate health systems such as hospitals and insurance companies. These are just some of the domains considered relevant to health literacy [8–10].
Even this partial list suggests that a truly comprehensive health literacy assessment would be prohibitively burdensome for both patient and clinician. To be useful in clinical practice or in patient-oriented informatics initiatives, testing would have to be made as simple and short as possible. Abbreviated versions of REALM [11–13] and TOFHLA have been proposed. However, whether the methods used to shorten these tests can be applied to new tests remains to be investigated.
Our current work is part of a research program whose overall objective is to develop a flexible health literacy instrument for older adults. A short but powerful assessment would be particularly useful for older adults, who (a) are more likely to have low health literacy [10, 15], (b) use health care services more often than younger people, and (c) are more likely to experience fatigue from taking an exhaustive literacy exam.
The task of health literacy assessment is similar to mastery testing in educational and professional domains, in which a subject must be classified as a master or a non-master. Huynh and van der Linden [18, 19] studied the problem of administering fixed-length mastery tests while minimizing the cost of misclassifying subjects. However, when subjects exhibit clear mastery (or non-mastery), early termination of the test would be beneficial. This requires variable-length tests, in which the decision to continue or terminate testing is made after each item. In one of the earliest attempts at variable-length testing, Ferguson proposed an approach based on Wald's sequential probability ratio test. However, this approach treated all items as being of equal difficulty. Kingsbury and Weiss, Reckase, and Lewis and Sheehan proposed methods that do not rely on this assumption and instead estimate each item's difficulty using item response theory (IRT).
In the three-parameter logistic (3PL) IRT model, the probability that an examinee of ability $\theta$ answers item $i$ correctly is

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-\alpha_i(\theta - b_i)}}$$

where $\alpha_i \in [0, \infty)$, $b_i \in (-\infty, \infty)$ and $c_i \in [0, 1]$ are item-specific parameters that represent the discriminating power, difficulty and guessing probability of item $i$, respectively [25, 26]. IRT requires the items to be pretested on a large population in order to estimate the probability that an item is answered correctly by examinees of different competency levels. IRT-based approaches provide excellent results when examinees need to be scored precisely on a continuous scale, but this calibration can be complex and resource-intensive. Multi-stage testing approaches can relax the calibration sample size requirements of IRT, but they still require calibration samples on the order of a few hundred subjects.
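For readers unfamiliar with IRT, the three-parameter logistic (3PL) item response function can be written as a short Python function. It uses the discrimination, difficulty and guessing parameters described above. This is a minimal sketch of the textbook form; some presentations also include a scaling constant of about 1.7 in the exponent. The function name is our own and is not taken from any IRT package.

```python
import math

def p_correct_3pl(theta: float, alpha: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL IRT model:
    c + (1 - c) / (1 + exp(-alpha * (theta - b))).

    theta: examinee ability; alpha: item discrimination (>= 0);
    b: item difficulty; c: guessing probability in [0, 1].
    """
    if alpha < 0 or not 0.0 <= c <= 1.0:
        raise ValueError("require alpha >= 0 and 0 <= c <= 1")
    return c + (1.0 - c) / (1.0 + math.exp(-alpha * (theta - b)))

# When ability equals difficulty (theta == b), the probability lies halfway
# between the guessing floor c and 1, i.e. c + (1 - c) / 2.
print(round(p_correct_3pl(theta=0.0, alpha=1.2, b=0.0, c=0.2), 3))  # 0.6
```

Every item needs its own estimates of alpha, b and c. Estimating these for each item is what makes large pretest samples necessary.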
The methods proposed by Kingsbury and Weiss and by Reckase used decision-theoretic principles to evaluate the stopping function. That is, they weighed the cost of continuing the test against the expected reduction in the cost of misclassification (both false positives and false negatives) from administering additional items. Vos, Welch and Frick, and Rudner [30, 31] have discussed the extension of decision theory to item selection.
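The trade-off described above can be illustrated with a myopic (one-step look-ahead) Bayesian stopping rule for a two-class mastery decision. This is a generic sketch, not the specific procedure of any of the cited authors. It assumes three inputs:

- a loss matrix of misclassification costs;
- a fixed cost per additional item;
- known probabilities that non-masters and masters answer the candidate item correctly.

All names are illustrative.

```python
from typing import Sequence

def bayes_risk(posterior: Sequence[float], loss: Sequence[Sequence[float]]) -> float:
    """Expected loss of the best decision; loss[d][i] = cost of deciding d when truth is i."""
    return min(sum(p * row[i] for i, p in enumerate(posterior)) for row in loss)

def should_continue(posterior: Sequence[float], p_correct: Sequence[float],
                    loss: Sequence[Sequence[float]], item_cost: float) -> bool:
    """Give one more item only if the expected drop in misclassification risk
    exceeds the cost of that item. p_correct[i] = P(correct | class i)."""
    risk_now = bayes_risk(posterior, loss)
    risk_after = 0.0
    for correct in (True, False):
        # Joint probabilities P(response, class i).
        joint = [p * (q if correct else 1.0 - q) for p, q in zip(posterior, p_correct)]
        # Minimizing over decisions with the unnormalized joint gives
        # P(response) * (Bayes risk after observing that response).
        risk_after += bayes_risk(joint, loss)
    return risk_now - risk_after > item_cost

# Class 0 = non-master, class 1 = master.
LOSS = [[0.0, 1.0],   # decide "non-master": costs 1 if the examinee is a master
        [1.0, 0.0]]   # decide "master": costs 1 if the examinee is a non-master
print(should_continue([0.5, 0.5], [0.2, 0.9], LOSS, item_cost=0.05))    # True
print(should_continue([0.99, 0.01], [0.2, 0.9], LOSS, item_cost=0.05))  # False
```

When the classification is still uncertain, one more informative item is worth its cost. Once the posterior is nearly certain, it is not.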
These testing methods adapt the item order and test length to the examinee's estimated ability, so they are well suited to administration by computer. Computerized adaptive testing (CAT) provides a way to make assessment more comprehensive while limiting the additional burden on the test-taker [26, 32]. CAT is routinely used in standardized educational testing and has recently been applied to patient-oriented assessments in health and medicine. Mead and Drasgow's review of 123 tests showed that computerized and paper-and-pencil versions of carefully constructed tests are equivalent. In health care, the Patient-Reported Outcomes Measurement Information System (PROMIS), an NIH Roadmap network project, uses CAT to improve the reliability, validity, and precision of patient-reported outcomes. CAT has also been used successfully to screen for developmental problems, assess tobacco beliefs among young people, and administer headache impact tests. It has consistently been shown to be an efficient testing approach with high precision. We believe that when examinees need to be classified into relatively few categories, measurement decision theory (MDT) can provide comparable performance with significantly fewer items and a much smaller calibration sample.
In this paper, we discuss the application of an MDT-based approach to administering variable-length tests in order to classify examinees into a limited number of discrete categories. Pilot testing with a relatively small calibration sample is used to estimate the conditional probabilities that subjects of particular literacy levels will answer a particular question correctly. Bayes' theorem is then used to compute the probability that a test-taker with a given pattern of responses is of a specified literacy level. We demonstrate the validity of the proposed method using data from two patient health literacy questionnaires.
In the MDT-based CAT process, the goal is to place the examinee in one of k literacy classes (e.g., low or adequate; or low, adequate, or high). Items are presented one at a time. On the basis of the test-taker's previous answers, the next 'best' item is the one expected to eliminate the most uncertainty about the classification.
The algorithm requires two sets of probabilities, estimated from the calibration sample:
- $P(L_i)$: the distribution of the competence levels $L_i$ in the participant population;
- $P(Q_j = 1 \mid L_i)$: the conditional probability that participants of competence level $L_i$ respond correctly to question $Q_j$.
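The Bayesian update and entropy-based item selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: responses are treated as conditionally independent given the class, and entropy is measured in bits. The equations referenced elsewhere in the text are not reproduced here, so the function names and exact form are ours rather than the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

def update(dist: Sequence[float], likelihood: Sequence[float]
           ) -> Tuple[float, Optional[List[float]]]:
    """Bayes' rule: return (P(observation), posterior), or (0.0, None) if impossible."""
    joint = [p * l for p, l in zip(dist, likelihood)]
    total = sum(joint)
    return total, ([x / total for x in joint] if total > 0.0 else None)

def posterior(prior: Sequence[float], p_correct: Sequence[Sequence[float]],
              responses: Dict[int, int]) -> List[float]:
    """P(L_i | Z) for responses Z = {question j: 1 (correct) or 0 (incorrect)}.
    p_correct[j][i] = P(Q_j = 1 | L_i)."""
    dist: Optional[List[float]] = list(prior)
    for j, r in responses.items():
        lik = [q if r == 1 else 1.0 - q for q in p_correct[j]]
        _, dist = update(dist, lik)
        if dist is None:
            raise ValueError("response pattern has zero probability under every class")
    return dist

def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in bits (0 * log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def expected_entropy(dist: Sequence[float], p_correct_j: Sequence[float]) -> float:
    """Expected entropy of the class distribution after asking one more question."""
    h = 0.0
    for lik in (list(p_correct_j), [1.0 - q for q in p_correct_j]):
        p_r, new = update(dist, lik)
        if new is not None:
            h += p_r * entropy(new)
    return h

def next_question(dist: Sequence[float], p_correct: Sequence[Sequence[float]],
                  asked: Set[int]) -> int:
    """Unasked question with the lowest expected entropy."""
    unasked = [j for j in range(len(p_correct)) if j not in asked]
    return min(unasked, key=lambda j: expected_entropy(dist, p_correct[j]))

# Two classes (low, adequate); question 1 separates them best.
P = [[0.5, 0.5], [0.1, 0.9], [0.3, 0.7]]
print(next_question([0.5, 0.5], P, asked=set()))  # 1
print(posterior([0.5, 0.5], P, {1: 1}))           # approximately [0.1, 0.9]
```

Passing the prior $P(L_i)$ in place of the posterior before any response is observed corresponds to start criterion SA, described later in the text.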
Termination: In principle, testing could continue until the participant has answered all available questions. To gain efficiency, however, we terminate testing when $H(S_{next})$ (Equation 4) is below a specified threshold for three successive questions. At termination, the participant's predicted class is the class $L_i$ with the maximum posterior probability $P(L_i \mid Z)$.
We validated our MDT-based CAT approach through a secondary analysis of data from two assessment tests developed for consumers in the health domain. The algorithm was applied retrospectively to the data sets to (a) determine an optimal question order and (b) categorize each participant into a literacy or numeracy category. Finally, we performed ROC analysis to characterize the sensitivity, specificity, and predictive power (area under the curve) of the algorithm.
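The termination rule can be expressed as a small check on the sequence of expected entropies recorded during testing. This sketch makes two assumptions:

- one $H(S_{next})$ value is recorded per administered question;
- "three successive questions" means the three most recent values.

The function names are hypothetical. Testing also ends when the item pool is exhausted.

```python
from typing import Sequence

def should_terminate(next_entropies: Sequence[float], threshold: float,
                     run_length: int = 3) -> bool:
    """True once H(S_next) has been below `threshold` for `run_length`
    successive questions. next_entropies holds one value per question asked."""
    recent = list(next_entropies)[-run_length:]
    return len(recent) == run_length and all(h < threshold for h in recent)

def predicted_class(post: Sequence[float]) -> int:
    """Index of the class L_i with maximum posterior P(L_i | Z)."""
    return max(range(len(post)), key=lambda i: post[i])

print(should_terminate([0.9, 0.2, 0.1, 0.05], threshold=0.3))  # True
print(should_terminate([0.9, 0.2, 0.4, 0.1], threshold=0.3))   # False
# Entropy is never negative, so a threshold of 0 never stops a test early.
print(should_terminate([0.0, 0.0, 0.0], threshold=0.0))        # False
print(predicted_class([0.2, 0.7, 0.1]))                        # 1
```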
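For a two-class scheme, the ROC quantities mentioned above can be computed directly from the true categories and the algorithm's output. The sketch below is a generic illustration, not the authors' analysis code. It treats low literacy as the positive class (coded 1). For the AUC, it assumes a continuous score per participant, such as the posterior probability of low literacy.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """y = 1 marks the positive class; requires at least one case of each class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than a
    random negative case (ties count 1/2), i.e. the normalized Mann-Whitney U."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))          # 0.75
```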
Data Set 1
Demographic characteristics of study samples
Columns (sample sizes): n = 52 (data set 1); n = 100 and n = 65 (the two data set 2 samples)
Rows:
- Age bracket, n (%)
- Number (%) women
- Educational level, n (%): no bachelor's degree; bachelor's or graduate degree
- Self-identity, n (%): African-American
- Poor health literacy (by S-TOFHLA), n (%)
Each question asked about a target term and offered four answer options:
1. a key term that was related to the target term in the question, either at the surface level ("biopsy" is a "test") or at the concept level ("biopsy" means "removing sample of a tissue");
2. a distractor term that had the same semantic relation to the target term as the key term and was of approximately the same difficulty as the key term;
3. a second distractor term satisfying the same criteria as option 2;
4. a "do not know" option.
Results of the study are reported elsewhere.
Cronbach's alpha was 0.93, indicating very high internal consistency. Unrotated factor analysis showed that all questions loaded heavily on a single factor, which accounted for 26% of the variance; the second and third factors accounted for 10% and 7%, respectively.
Different literacy measures use different cut-off points to assign participants to categories. For the current analysis, participants' scores were used to classify them into one of three categories: low literacy (a score of 44 or lower), moderate literacy (a score between 45 and 52), or high literacy (a score of 53 or higher). These thresholds were selected for demonstration purposes so as to obtain groups of approximately equal size: 34.6% of participants in the low literacy group and 32.7% in each of the moderate and high literacy groups. Different thresholds could be selected for other purposes.
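The score-to-category mapping above is a simple lookup against sorted cut-offs. This sketch uses Python's standard `bisect` module; the function name and the convention that each cut-off is the highest score still belonging to its category are our own choices.

```python
from bisect import bisect_left
from typing import Sequence

def score_category(score: int, cutoffs: Sequence[int], labels: Sequence[str]) -> str:
    """Map a total score to a category. cutoffs[k] is the highest score that
    still falls into labels[k]; scores above the last cutoff get labels[-1]."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_left(cutoffs, score)]

# Data set 1: <= 44 low, 45-52 moderate, >= 53 high.
LITERACY = ([44, 52], ["low", "moderate", "high"])
print([score_category(s, *LITERACY) for s in (44, 45, 52, 53)])
# ['low', 'moderate', 'moderate', 'high']
```

Other schemes only need different cut-offs and labels. For example, the two-class numeracy scheme described for data set 2 would use the cut-off [5] with the labels low and high.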
Data Set 2
Numeracy scale with percentages answering correctly in two studies
Columns: Lipkus et al (n = 463); (n = 100); (n = 62)ᵇ; (n = 162)
1. Imagine that we flip a fair coin 1,000 times. What is your best guess about how many times the coin would come up heads? (question not scored)ᵃ
2. Which of the following numbers represents the biggest risk of getting a disease? _ 1 in 100, _ 1 in 1000, _ 1 in 10
3. Which of the following numbers represents the biggest risk of getting a disease? _ 1%, _ 10%, _ 5%
4. If Person A's risk of getting a disease is 1% in ten years, and person B's risk is double that of A's, what is B's risk?
   If Person A's chance of getting a disease is 1 in 100 in ten years, and person B's risk is double that of A's, what is B's risk? (question not used)
5. If the chance of getting a disease is 10%, how many people would be expected to get the disease out of 100?
6. If the chance of getting a disease is 10%, how many people would be expected to get the disease out of 1000?
7. If the chance of getting a disease is 20 out of 100, this would be the same as having a ____% chance of getting the disease.
8. The chance of getting a viral infection is .0005. Out of 10,000 people, about how many of them are expected to get infected?
For the current analysis, two categorization schemes were developed for demonstration purposes. In the first (two-class) scheme, participants were categorized as low numeracy (a score of 5 or lower; n = 48, 29.1%) or high numeracy. In the second (three-class) scheme, they were categorized as low numeracy (a score of 5 or lower; n = 48, 29.1%), moderate numeracy (a score of 6 or 7; n = 74, 44.8%), or high numeracy. Classifying patients into three categories of competency is typical of other assessments, such as the TOFHLA.
A participant's gold-standard literacy or numeracy level (true category) was defined by his or her total score on the original questionnaire, as described above. The MDT-CAT algorithm was applied to predict each participant's true category using a leave-one-out approach. For example, for the first data set, data from 51 of the 52 participants served as calibration cases, i.e., they were used to estimate $P(L_i)$ and $P(Q_j = 1 \mid L_i)$. The remaining participant served as the test case. The process was repeated so that each participant served as the test case exactly once.
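Within each leave-one-out fold, the calibration step amounts to counting. The following sketch estimates $P(L_i)$ and $P(Q_j = 1 \mid L_i)$ by relative frequencies. We assume this is how the distributions were estimated, since the text does not mention any smoothing. Without smoothing, an estimate of exactly 0 can arise once a test participant is removed from a small calibration set. Names are illustrative.

```python
from typing import Iterator, List, Sequence, Tuple

def calibrate(classes: Sequence[int], responses: Sequence[Sequence[int]], k: int
              ) -> Tuple[List[float], List[List[float]]]:
    """Relative-frequency estimates of P(L_i) and P(Q_j = 1 | L_i).

    classes[n] in 0..k-1 is participant n's true category; responses[n][j] is 1 or 0.
    Returns (prior, p_correct) with p_correct[j][i] = P(Q_j = 1 | L_i).
    """
    n_items = len(responses[0])
    counts = [0] * k
    n_correct = [[0] * k for _ in range(n_items)]
    for c, resp in zip(classes, responses):
        counts[c] += 1
        for j, r in enumerate(resp):
            n_correct[j][c] += r
    prior = [cnt / len(classes) for cnt in counts]
    # No smoothing: estimates can be exactly 0 or 1. A class absent from the
    # calibration set gets prior 0, so its placeholder 0.0 estimates never matter.
    p_correct = [[n_correct[j][i] / counts[i] if counts[i] else 0.0 for i in range(k)]
                 for j in range(n_items)]
    return prior, p_correct

def leave_one_out(n: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (calibration indices, test index); each participant is tested once."""
    for test in range(n):
        yield [m for m in range(n) if m != test], test

classes = [0, 0, 1]
responses = [[1, 0], [1, 1], [0, 1]]
for calib, test in leave_one_out(len(classes)):
    prior, p_correct = calibrate([classes[m] for m in calib],
                                 [responses[m] for m in calib], k=2)
    print(test, prior, p_correct)
```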
To initialize the testing process, i.e. to determine the first question to be presented, the following two alternatives were explored:
SA: apply the question selection criterion (entropy minimization) described by Equations (4), (5), and (6), substituting the prior $P(L_i)$ for the posterior $P(L_i \mid Z)$, since no responses Z have yet been observed.
SB: present any question that participants known to be of moderate literacy have a 40–60% chance of answering correctly.
The average number of questions presented before termination and the number of misclassifications were calculated and reported.
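Start criterion SB is a simple filter on the calibrated probabilities, followed by a random choice among the qualifying questions. The sketch assumes that the 40–60% band is inclusive and that the moderate class is identified by a known index. The names are hypothetical.

```python
import random
from typing import List, Optional, Sequence

def sb_candidates(p_correct: Sequence[Sequence[float]], moderate: int,
                  low: float = 0.4, high: float = 0.6) -> List[int]:
    """Questions that moderate-literacy participants answer correctly with
    probability in [low, high]; p_correct[j][i] = P(Q_j = 1 | L_i)."""
    return [j for j, row in enumerate(p_correct) if low <= row[moderate] <= high]

def sb_start(p_correct: Sequence[Sequence[float]], moderate: int,
             rng: random.Random) -> Optional[int]:
    """Pick one SB candidate at random, or None if no question qualifies."""
    candidates = sb_candidates(p_correct, moderate)
    return rng.choice(candidates) if candidates else None

P = [[0.2, 0.5, 0.9], [0.1, 0.3, 0.8], [0.3, 0.6, 0.95]]  # classes: low, moderate, high
print(sb_candidates(P, moderate=1))                   # [0, 2]
print(sb_start(P, moderate=1, rng=random.Random(0)))  # 0 or 2
```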
For data set 1, using the start criterion SA and a termination threshold of 0 (i.e. no termination threshold), the algorithm classified 88.5% (n = 46) of the participants correctly. Of the misclassified participants (n = 6), half were of moderate literacy incorrectly classified as low literacy while the others were of high literacy classified as moderate.
All the misclassified cases appear to result from the same type of outlier question/response pair. For instance, one misclassification occurred when a participant with a score of 45 (true category: moderate literacy) was classified as having low literacy. Analysis of the response vector and state entropies showed that this can be attributed to the participant's response to one particular question. The participant answered that question incorrectly, whereas all other participants of moderate literacy answered it correctly. Because the algorithm uses a leave-one-out approach, this participant is excluded from the calibration cases. The estimated $P(Q_j = 0 \mid \text{moderate literacy})$ for this question is therefore 0, so the participant's incorrect response drives the posterior probability of moderate literacy to zero and produces a misclassification.
The use of a random moderate-difficulty start question (start criterion SB) did not result in any discernible difference in the number or type of errors produced by the algorithm.
Figure 2 also shows the result of using start criterion SB in combination with different entropy thresholds. As can be seen in the figure, the reduction in questions is comparable to our earlier results. However, the method tends to misclassify a slightly larger number of participants.
Actual $P(Q_j = 1 \mid L_i)$ for various subsets of data set 2: complete sample, a random 50% split of the sample, online sample (n = 100), and clinic sample (n = 62)
Actual $P(L_i)$ of the calibration sets for data set 2 under three calibration schemes: leave-one-out, a random 50% split of the sample, and online sample (n = 100)
Using the leave-one-out approach, participants answered 6.6 questions on average, with a classification accuracy of 93.8% (the same accuracy obtained when no threshold is used). For calibration scheme (a), 6.8 questions needed to be answered on average, and a higher accuracy of 97.5% was observed. Calibration scheme (b) resulted in a classification accuracy of 91.9%, with subjects answering 5.3 items on average. As Table 4 shows for scheme (b), the calibration sample has a very different distribution from the testing sample. In the online sample, 16% of the subjects were of low numeracy and 37% of high numeracy, whereas in the clinic population 52% were of low numeracy and only 5% of high numeracy. The decrease in performance under scheme (b) can probably be attributed to this difference.
$P(L_i)$ predicted for the testing sets using the three calibration schemes
Comprehensive, robust and parsimonious diagnostic health literacy measures are needed in consumer health initiatives, as well as clinical practice, to identify low literacy individuals. The need is particularly acute in older adults who are more likely to require frequent health care services and are also more likely to experience fatigue and cognitive burden from a lengthy testing procedure.
Our results show that the method discussed in this paper can reduce the number of questions a participant needs to answer without seriously compromising classification accuracy. For the data described in this study, we found that by selecting an appropriate threshold, the number of questions could be reduced by up to half without any additional misclassifications. In addition, this high degree of accuracy was achieved with only a few calibration cases. By contrast, the number of cases needed to calibrate IRT-based algorithms is estimated to be quite large [25, 30]. For demonstration purposes, we applied the method to data sets from relatively brief assessments. Even with these short assessments, the algorithm markedly reduced the number of test items. The impact on our target population would be even more substantial if the algorithm were applied to lengthier competency assessments. MDT-based adaptive testing methods have been proposed and used in other domains, but to the best of our knowledge this is their first application in the health domain.
The reported thresholds are the values that gave the greatest reduction in test length while achieving the same classification accuracy as with no threshold. In a real testing scenario, these values can be estimated by simulating the leave-one-out approach on the calibration sample. Such simulations can also inform test administrators about the trade-off between accuracy and test length. They can then choose higher thresholds that further reduce test length.
Results observed with calibration scheme (b) of the second data set also suggest that users of the method should use a calibration set that is representative of the target population. If this is not possible, classification accuracy can be expected to decrease. Additional analysis, preferably with larger data sets, would be necessary to determine how performance responds to differences between the calibration and testing samples.
Although the results are promising, this study has some limitations. First, it relies on secondary analysis of data from assessments that measure vocabulary familiarity and numeracy. These constructs are related to or contribute to health literacy but do not cover its full domain. Second, the MDT algorithm assumes that the response to each question is independent of the responses to other questions, which may not be valid. The effect of violating this assumption on the algorithm's performance remains to be investigated. Future work could compare the algorithm's performance on assessments with highly correlated questions and on assessments with largely independent questions, with independence inferred from Cronbach's alpha or factor analysis. Note, however, that IRT makes a similar assumption of question independence.
A final limitation is that the assessment must be administered on a computer, and many individuals with low literacy also have low computer literacy. We are currently developing a computer interface for this assessment that is expressly designed to be easy to use for our target population of older adults, many of whom are novice computer users. The processing and memory requirements for administering this kind of test are minimal, so a fully equipped computer is not necessary. Touch-screen devices such as tablet computers, which are becoming more affordable, hold promise as a more user-friendly medium for administering these tests in healthcare settings.
In future work, we also intend to study how to apply this method to scenario-based tests, in which each scenario description is followed by several related questions. In such cases, probability and entropy calculations may need to be performed at the module level rather than at the level of individual questions. We also intend to validate the method by administering it directly to participants, rather than retrospectively as in this study.
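Selecting a threshold from leave-one-out simulations on the calibration sample, as suggested above, can be sketched as follows. The function assumes the simulations have already been run and summarized as (threshold, accuracy, mean test length) triples, including a threshold of 0. The optional tolerance expresses the accuracy/test-length trade-off. The function name is hypothetical, and the numbers in the example are invented.

```python
from typing import Sequence, Tuple

def pick_threshold(results: Sequence[Tuple[float, float, float]],
                   tolerance: float = 0.0) -> float:
    """Choose an entropy threshold from simulated (threshold, accuracy, mean length)
    results. The entry with threshold 0 (no early stopping) sets the baseline
    accuracy; among thresholds whose accuracy is within `tolerance` of it, return
    the one with the shortest mean test length (ties go to the lower threshold)."""
    baseline = next(acc for thr, acc, _ in results if thr == 0)
    admissible = [(length, thr) for thr, acc, length in results
                  if acc >= baseline - tolerance]
    return min(admissible)[1]

# Made-up simulation results for illustration only (not data from this study).
sims = [(0.0, 0.90, 20.0), (0.1, 0.90, 14.0), (0.2, 0.90, 11.0), (0.3, 0.88, 9.0)]
print(pick_threshold(sims))                  # 0.2
print(pick_threshold(sims, tolerance=0.05))  # 0.3
```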
The computer-adaptive testing method presented in this paper, which is based on measurement decision theory, can significantly reduce the number of items needed to classify subjects into categories corresponding to levels of competency without significantly compromising the accuracy of the classification. In addition, the measure can be validated with few subjects and test items. This method creates the potential for the development of sensitive diagnostic instruments for measuring health literacy efficiently and without undue burden on subjects.
This work is supported by a grant from the National Institute of Nursing Research (1R21NR010710) awarded to DRK and QZT. JSA was supported by NLM training grant LM-007079. The familiarity data were collected under NIH grant R01 LM007222-05 (PI: Qing Zeng-Treitler) and the numeracy data were collected as part of AHRQ R03-HS016333 (PI: Rita Kukafka).
1. Nielsen-Bohlman L, Panzer AM, Kindig DA, Committee on Health Literacy: Health Literacy: A Prescription to End Confusion. 2004, Institute of Medicine, The National Academies Press.
2. Davis TC, Long SW, Jackson RH, Mayeaux EJ, George RB, Murphy PW, Crouch MA: Rapid estimate of adult literacy in medicine: a shortened screening instrument. Family Medicine. 1993, 25: 391-395.
3. Nurss J, Parker R, Williams M, Baker D: TOFHLA test of functional health literacy in adults. 2001, Snow Camp, NC: Peppercorn Books and Press.
4. Weiss BD, Mays MZ, Martz W, Castro KM, DeWalt DA, Pignone MP, Mockbee J, Hale FA: Quick assessment of literacy in primary care: The Newest Vital Sign. Annals of Family Medicine. 2005, 3: 514-522. 10.1370/afm.405.
5. Lipkus IM, Samsa G, Rimer BK: General performance on a numeracy scale among highly educated samples. Medical Decision Making. 2001, 21: 37-44.
6. Schwartz L, Woloshin S, Black W, Welch H: The role of numeracy in understanding the benefit of screening mammography. Annals of Internal Medicine. 1997, 127: 966-972.
7. Zikmund-Fisher BJ, Smith DM, Ubel PA, Fagerlin A: Validation of the subjective numeracy scale: effects of low numeracy on comprehension of risk communications and utility elicitations. Medical Decision Making. 2007, 27: 663-671. 10.1177/0272989X07303824.
8. Zarcadoolas C, Pleasant A, Greer DS: Understanding health literacy: an expanded model. Health Promotion International. 2005, 20: 195-203. 10.1093/heapro/dah609.
9. McCray AT: Promoting health literacy. JAMIA. 2005, 12: 152-163.
10. Rudd R, Kirsch I, Yamamoto K: Literacy and Health in America. 2004, Center for Global Assessment Policy Information Center Research and Development Educational Testing Service.
11. Bass PF, Wilson JF, Griffith CH: A shortened instrument for literacy screening. Journal of General Internal Medicine. 2003, 18: 1036-1038. 10.1111/j.1525-1497.2003.10651.x.
12. Shea JA, Beers BB, McDonald VJ, Quistberg DA, Ravenell KL, Asch DA: Assessing health literacy in African American and Caucasian adults: disparities in rapid estimate of adult literacy in medicine (REALM) scores. Fam Med. 2004, 36: 575-581.
13. Arozullah AM, Yarnold PR, Bennett CL, Soltysik RC, Wolf MS, Ferreira RM, Lee SYD, Costello S, Shakir A, Denwood C: Development and validation of a short-form, rapid estimate of adult literacy in medicine. Medical Care. 2007, 45: 1026-1033. 10.1097/MLR.0b013e3180616c1b.
14. Baker DW, Williams MV, Parker RM, Gazmararian JA, Nurss J: Development of a brief test to measure functional health literacy. Patient Education and Counseling. 1999, 38: 33-42. 10.1016/S0738-3991(98)00116-5.
15. Ancker JS, Kaufman DR: Rethinking health numeracy: A multidisciplinary literature review. Journal of the American Medical Informatics Association. 2007, 14: 713-721. 10.1197/jamia.M2464.
16. National Academy on an Aging Society: Chronic Conditions: A challenge for the 21st century. Washington, DC. 1999.
17. Huynh H: A nonrandomized minimax solution for passing scores in the binomial error model. Psychometrika. 1980, 45: 167-182. 10.1007/BF02294075.
18. van der Linden WJ: Applications of decision theory to test-based decision making. New developments in testing: Theory and applications. Edited by: Hambleton RK, Zaal JN. 1990, Boston: Kluwer, 129-155.
19. van der Linden WJ, Vos HJ: A compensatory approach to optimal selection with mastery scores. Psychometrika. 1996, 61: 155-172. 10.1007/BF02296964.
20. Ferguson RL: The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Doctoral Thesis. 1969, University of Pittsburgh, Pittsburgh, PA.
21. Wald A: Sequential analysis. 1947, New York: Wiley.
22. Kingsbury GG, Weiss DJ: A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. New horizons in testing: Latent trait test theory and computerized adaptive testing. Edited by: Weiss DJ. 1983, New York: Academic Press, 257-283.
23. Reckase MD: A procedure for decision making using tailored testing. New horizons in testing: Latent trait test theory and computerized adaptive testing. Edited by: Weiss DJ. 1983, New York: Academic Press, 237-255.
24. Lewis C, Sheehan K: Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement. 1990, 14: 367-386. 10.1177/014662169001400404.
25. Hambleton RK, Jones R: An NCME Instructional Module on Comparison of Classical Test Theory and Item Response Theory and Their Applications to Test Development. Educational Measurement: Issues and Practice. 12: 38-47.
26. van der Linden WJ, Glas CAW: Computerized adaptive testing: Theory and practice. 2000, Springer Netherlands.
27. Chuah SC, Drasgow F, Luecht R: How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation. Applied Measurement in Education. 2006, 19: 241-255. 10.1207/s15324818ame1903_5.
28. Vos HJ: Applications of Bayesian decision theory to sequential mastery testing. Journal of Educational and Behavioral Statistics. 1999, 24: 271-292.
29. Welch RE, Frick T: Computerized adaptive testing in instructional settings. Educational Technology Research & Development. 1993, 41: 47-62.
30. Rudner LM: An Examination of Decision-Theory Adaptive Testing Procedures. Annual Meeting of the American Educational Research Association; New Orleans, LA. 2002.
31. Rudner LM: Measurement Decision Theory. Final Report to the National Institute on Student Achievement, Curriculum and Assessment. 2002.
32. Wainer H, Dorans NJ, Eignor D, Flaugher R, Green BF, Mislevy RJ, Steinberg L: Computerized adaptive testing: A primer. 2001, Springer.
33. Mead AD, Drasgow F: Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin. 1993, 114: 449-
34. Fries J, Bruce B, Cella D: The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clinical and Experimental Rheumatology. 2005, 23: S53-57.
35. Jacobusse G, Buuren S: Computerized adaptive testing for measuring development of young children. Statistics in Medicine. 2007, 15: 2629-2638.
36. Panter A, Reeve B: Assessing tobacco beliefs among youth using item response theory models. Drug and Alcohol Dependence. 2002, 68: S21-39.
37. Ware JE, Kosinski M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlöf CGH, Tepper S, Dowson A: Applications of computerized adaptive testing (CAT) to the assessment of headache impact. 2003, 12: 935-952.
38. Shannon CE: A mathematical theory of communication. The Bell System Technical Journal. 1948, 27: 379-423, 623-656.
39. Keselman A, Tse T, Crowell J, Browne A, Ngo L, Zeng Q: Assessing consumer health vocabulary familiarity: An exploratory study. Journal of Medical Internet Research. 2007, 9: e5. 10.2196/jmir.9.1.e5.
40. Ancker JS, Weber EU, Kukafka R: Effects of Game-Like Interactive Graphics on Risk Perceptions and Decisions. Medical Decision Making. 2010.
41. Gurmankin AD, Baron J, Armstrong K: Intended message versus message received in hypothetical physician risk communications: Exploring the gap. Risk Analysis. 2004, 24: 1337-1347. 10.1111/j.0272-4332.2004.00530.x.
42. Gurmankin AD, Baron J, Armstrong K: The effect of numerical statements of risk on trust and comfort with hypothetical physician risk communication. Medical Decision Making. 2004, 24: 265-271. 10.1177/0272989X04265482.
The pre-publication history for this paper can be accessed at: http://www.biomedcentral.com/1472-6947/11/52/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.