  • Research article
  • Open access

Health numeracy in Japan: measures of basic numeracy account for framing bias in a highly numerate population



Health numeracy is an important factor in how well people make decisions based on medical risk information. However, in many countries, including Japan, numeracy studies have been limited.


To fill this gap, we evaluated health numeracy levels in a sample of Japanese adults by translating two well-known scales that objectively measure basic understanding of math and probability: the 3-item numeracy scale developed by Schwartz and colleagues (the Schwartz scale) and its expanded version, the 11-item numeracy scale developed by Lipkus and colleagues (the Lipkus scale).


Participants’ performances (n = 300) on the scales were much higher than in original studies conducted in the United States (80% average item-wise correct response rate for Schwartz-J, and 87% for Lipkus-J). This high performance resulted in a ceiling effect on the distributions of both scores, which made it difficult to apply parametric statistical analysis, and limited the interpretation of statistical results. Nevertheless, the data provided some evidence for the reliability and validity of these scales: the reliability of the Japanese versions (Schwartz-J and Lipkus-J) was comparable to the original in terms of their internal consistency (Cronbach’s α = 0.53 for Schwartz-J and 0.72 for Lipkus-J). Convergent validity was suggested by positive correlations with an existing Japanese health literacy measure (the Test for Ability to Interpret Medical Information developed by Takahashi and colleagues) that contains some items relevant to numeracy. Furthermore, as shown in previous studies, health numeracy was still associated with framing bias: individuals whose Lipkus-J performance was below the median were significantly influenced by how probability was framed when they rated surgical risks. A significant association was also found using Schwartz-J, which consisted of only three items.


Despite relatively high levels of health numeracy according to these scales, numeracy measures are still important determinants underlying susceptibility to framing bias. This suggests that it is important in Japan to identify individuals with low numeracy skills so that risk information can be presented in a way that enables them to correctly understand it. Further investigation is required on effective numeracy measures for such an intervention in Japan.



Active involvement of patients in decision making for their own medical care, such as deciding whether or not to accept a particular treatment (informed consent), or choosing among medical-care options (informed choice), is becoming a worldwide practice [1]. In Japan, informed consent was codified in the “Medical Service Act” in 1997, and is now common practice [2]. Recent surveys show that moving beyond informed consent, Japanese patients are involved in informed decision making in clinical practice [3] and that they prefer this involvement [4]. Consequently, it is very important to ensure that patients correctly understand medical information, so that their decisions, possibly made in life-threatening situations, reflect their true will.

One way to address this issue would be to assess patients’ health numeracy, the ability to understand probabilistic and mathematical concepts [5]. Health numeracy has gained increasing attention, as quantitative information, such as the probability of survival outcomes for different medical treatments, is increasingly present in medical risk information [6, 7]. Previous studies assessing health numeracy have shown that individuals with low numeracy are more likely to misunderstand risk information, and their risk evaluations tend to be more influenced by context, such as how the related numbers are framed (reviewed in [5, 8-11]). While many of those studies used healthy respondents, studies with actual patients have shown the influence of numeracy on their disease-related decision making (e.g. [12-15]). Therefore, to ensure accurate medical-risk communication, it is important to know whether patients have a sufficient level of health numeracy.

Despite its importance, however, health numeracy and assessment scales have been largely understudied in many countries, including Japan. In fact, previous research is mostly from the United States and Europe, with research in other countries just beginning (e.g. [16-18]). These pioneering studies on cross-cultural comparisons have shown considerable differences in numeracy levels [16, 19], as well as the association of numeracy with decision making [20, 21] across different countries. Therefore, it is important to determine how previous findings apply to different countries, and develop strategies suitable for each case.

In Japan, the limited attention paid to health numeracy might be a reflection of the common belief that the majority of Japanese people enjoy basic numeracy. For example, since its inception in 2000, the Programme for International Student Assessment has rated the mathematical ability of Japanese students (aged 15 to 16 years) as higher than the international average, whereas that in the United States has been below average. While this might imply that Japanese patients are able to correctly use numerical medical data, recent pioneering work in Japan suggests this might not be the case [22]. In their study, Takahashi et al. developed a 7-item scale to test the ability of Japanese patients to interpret medical information (TAIMI), with 3 of these items especially relevant to numeracy. Unexpectedly, more than 50 percent of respondents made mistakes in 2 of these items. Those 2 items evaluated the effect of a medicine, one presenting information as a fraction and the other as a natural frequency. This result suggests that Japanese health numeracy may not be as high as expected, and that there is a need for further investigation specifically focusing on health numeracy.

The aim of this study is twofold: 1) to evaluate Japanese versions of numeracy scales, and 2) to assess the health numeracy of Japanese adults.

We chose two well-known health numeracy scales that focus on the basic understanding of math and probability, the 3-item Schwartz scale [23], and its expanded version, the 11-item Lipkus scale [24]. To date, these scales are among the most frequently used instruments in numeracy studies. Since the focus of these tests is different from that of TAIMI, we were interested in how Japanese people performed in these health numeracy scales. Japanese versions of these scales (Schwartz-J and Lipkus-J) were prepared using forward and backward translation procedures [25].

The reliability of the scales was assessed for internal consistency. The original scales were shown to be unidimensional; the original Schwartz study did not conduct a factor analysis, but Lipkus and colleagues evaluated the factor structure of their scale which included the Schwartz items [24]. However, there are also studies showing multi-factor structure on these scales [18]. Since factor structure could be different depending on the nature of a target population [26], an exploratory factor analysis was conducted to explore factor structure for the Japanese version. Convergent validity of numeracy scales was evaluated by their correlation with existing measures of health numeracy [8]. As mentioned above, TAIMI has a two-factor structure where three items are specifically relevant to numeracy [22]. Thus, we examined correlations between TAIMI scores and the Schwartz-J and Lipkus-J scores. We expected a positive correlation between performance on TAIMI and the scales translated in this study, specifically for the numeracy items of TAIMI (TAIMI-num).

To determine whether numeracy levels measured by these scales have any influence on medical risk communication in Japan, we examined the association between framing bias and performance on numeracy scales. Previous studies found that those with low scores on the Lipkus scale are more susceptible to framing effects [12, 19, 27, 28], an effect whereby different phrasing influences participant decisions based on mathematically identical data. We tested whether this can be seen with Japanese samples using the Schwartz-J and Lipkus-J scales. To approximate the Japanese population, we used quota sampling (n = 300) according to age, gender, and education level. In so doing, this study explores a method for assessing health numeracy in the Japanese population, with the aim of improving medical-risk communication.


Study population

Participant demographics are summarized in Table 1. Overall, the demographics matched those of the Japanese population: education level, gender, and age group were controlled for during recruitment (Additional file 1: Table S1). While household income was not a controlled factor, it also turned out to be similar to that of the general population (Additional file 2: Table S2).

Table 1 Characteristics of the study sample (n = 300)

Scale characteristics


Tetrachoric correlations among the items indicated that items 8 and 9 of Lipkus-J were highly correlated (ρ = .99). Therefore, item 8 (the one with less variance) was removed from the analysis to avoid redundancy. A one-factor structure was indicated for both Schwartz-J and Lipkus-J (Table 2), as was also reported in the original study by Lipkus et al. However, while all the items were accounted for by the first factor in the original study, in our study, one item (item 11) did not produce a sufficient factor loading score for Lipkus-J. Hereafter, we refer to the scale consisting of the 9 items that formed the first factor as Lipkus-J9, and mainly focus on this scale when dealing with total scores. The scale including all 11 items is referred to as Lipkus-Jall. Cronbach’s α values for Schwartz-J (0.53) and Lipkus-J9 (0.72) were comparable to those reported by the original developers (Schwartz scale, 0.56-0.80 [8]; Lipkus scale, 0.70-0.75 [24]).

Table 2 Questionnaire items, correct response rate, and factor loading for each item for Schwartz-J and Lipkus-J

Convergent validity

Convergent validity of the scales was indicated by significant positive correlations between the scores of each scale and TAIMI. As expected, the associations were stronger with TAIMI-num, the numeracy component of the TAIMI scale (Spearman’s ρ, with Schwartz-J, 0.42; Lipkus-J9, 0.46), than with TAIMI-all (Spearman’s ρ, with Schwartz-J, 0.29; Lipkus-J9, 0.33).

Performance on the numeracy scales

Respondents spent 4.9 min ± 0.3 (mean ± SD) completing the Lipkus-Jall, which included Schwartz-J. On average, the item-wise correct response rate was nearly 90 percent, which is much higher than that of the American sample reported in the original study [24] (Table 2). This high level of performance indicates a high level of basic numeracy among the Japanese public. In particular, in the comparison of risk section that was presented in the same format (items 4 and 5), more than 95% of responses were correct. However, there were two other items where nearly 30 percent of participants made mistakes. These were conversion of fractions to percentages (item 3), and conversion of percentages to frequencies (item 11), both of which included decimal numbers. This suggests that a significant proportion of Japanese people become confused when dealing with such data.

A summary of the total scores for our numeracy measures is listed in Table 3. As in previous studies using the original scales (e.g. [28-33]), correct responses to Schwartz-J and Lipkus-J9 had skewed distributions. Among the demographic characteristics, education level and gender had significant effects on performance on most of the scales, as found in previous American and German studies [33]. In our study, males with high educational attainment performed significantly better than others (see details in Additional file 3: Table S3). The effect of age was significant only in Schwartz-J, and household income did not have any significant effect (Additional file 3: Table S3).

Table 3 Total score for each numeracy measure

Numeracy and framing bias

To examine whether numeracy is an important consideration for medical risk communication in a Japanese sample, we examined the degree of framing bias for two respondent groups - those scoring low, and those scoring high in the Schwartz/Lipkus scales. A median split of the measures (grouping based on Schwartz-J, cut-off score ≤ 2; grouping based on Lipkus-J9, cut-off score ≤ 9) was made because, as in previous studies, the distribution of scores for both of these scales was skewed [12, 28].

To investigate framing bias, we compared the risk rating scores for a surgical risk for two different framing conditions (survival rate vs. death rate) for each respondent group. In the high-numeracy group, risk ratings did not differ significantly between the two frames (Table 4, Wilcoxon signed rank test, grouping based on Schwartz-J, Z = −0.1, p > 0.9; Lipkus-J9, Z = −0.3, p > 0.7). In contrast, participants in the low-numeracy group rated the risk framed as a survival rate as being significantly more risky than the risk framed as a death rate (Table 4, Wilcoxon signed rank test, grouping based on Schwartz-J, Z = −2.7, p < 0.01; Lipkus-J9, Z = −3.0, p < 0.01). The framing effect, defined as the difference in riskiness scores between the two frames, was also different between the low- and high-numeracy groups. The degree of framing bias was significantly larger in the low-numeracy group (Mann–Whitney test, grouping based on Schwartz-J, Z = −2.6, p < 0.01; Lipkus-J9, Z = −2.8, p < 0.01). These results indicate that those performing less well on the numeracy scales are more likely to be influenced by how the numerical data are framed, rather than the numbers themselves.

Table 4 Difference in framing bias between low and high performance groups for each numeracy measure

Because numeracy scores were influenced by education and gender, we examined whether these factors also influenced risk perception bias. We did not find significant effects for education (p > 0.5) or gender (p > 0.2) on the framing effect. This suggests that numeracy is a more important determinant of risk perception bias than the demographic characteristics that were examined in the current study.


In this study, we evaluated Japanese numeracy by translating and applying the Schwartz and Lipkus scales, widely used health numeracy scales that focus on the understanding of basic math and probability. Translated versions of both scales showed a certain degree of reliability and validity; however, the Japanese sample’s high performance caused the score distributions to be negatively skewed, imposing limitations on the psychometric evaluations of the scales. In this section, we first discuss Japanese numeracy in light of our results. Then we address the validity and limits of Lipkus-J and Schwartz-J, and future directions for the application of health numeracy measures.

The current study suggests that basic understanding of math and probability is quite high among Japanese: correct response rates for Lipkus-J items were much higher than those found in the original US samples [24], and in more recent studies on probabilistic national German and US samples [33]. This is consistent with the results of the Programme for International Student Assessment (PISA), where the national average math score for Japanese students has surpassed those of both the US and Germany since its inception [34-37]. The relatively high attainment of math skills during school education, as assessed by PISA, might partly account for the generally high numeracy of the Japanese. The current result is also in line with the recent study assessing the numeracy skills of students at top universities in 15 countries [16]. Although linking top-university level performance with that of the general population is not straightforward, Japan was second best in having the smallest proportion of respondents falling into the lowest quartile.

However, in spite of generally high numeracy, the performance of the Japanese sample on the Schwartz-J and Lipkus-J tests still accounted for susceptibility to the framing effect, which can influence patients’ decisions regarding their medical options, such as acceptance of surgery (e.g. [12, 19, 38]; however, empirical results on framing effects in a clinical setting are mixed, reviewed in [39]). A number of previous studies using the original Schwartz and Lipkus scales have shown a numeracy effect on understanding and decision making based on medical information (reviewed in [5, 8-11]). Moreover, research has been advancing on ways to communicate quantitative risk information with consideration of patients’ numeracy, such as supplementing numerical data with visual or verbal aids, using natural frequencies rather than probabilities, or presenting risks with both negative and positive frames (reviewed in [10, 11, 40-43]). Considering our results and these earlier findings, such care would be called for when communicating medical information to those with low numeracy in Japan, and possibly in other countries where general math performance is deemed to be high.

Regarding instruments to identify those with low numeracy, both the Schwartz-J and Lipkus-J scales demonstrated a certain degree of reliability and validity, with Cronbach’s α being comparable with those of the original scales, convergent validity being supported by their positive correlation with another health literacy and numeracy measure (TAIMI, [22]), criterion validity being suggested by their association with the susceptibility to framing bias, and content validity being ensured in the original scales. However, we also found a pronounced ceiling effect, which confounded the analyses we applied and limited the psychometric qualities of the scales.

Ceiling effects pose multiple psychometric limitations [16]. First, they suggest that scales are less able to differentiate among those with high numeracy. Second, statistical methods applicable for data analysis become limited, as many popular methods assume a normal distribution and may give erroneous results when this assumption is violated [44-46]. Non-parametric alternatives are not always sufficient. For example, in the current study, we had to use a median split, making it difficult to examine the relationship between numeracy scores and framing effects in depth.

A third limitation concerns the means of evaluating the validity of the scale. For example, respondents’ performances can be confounded with other factors such as motivation [47]. However, ensuring discriminant validity is not straightforward with data having a ceiling effect, as, for example, a weak correlation between motivation and numeracy scores might be due to the ceiling effect, rather than the variables being truly unrelated. Similarly, examining the relationships between measured ability and other closely related abilities such as working memory [48] would be confounded by the ceiling effect. Thus, use of Lipkus-J and Schwartz-J with high-numeracy samples requires careful consideration of those limitations.

In fact, negative skew for the original Schwartz and Lipkus scales has been noted in a number of earlier studies [28-33], and the limitations mentioned above have been pointed out [16, 27]. In response to those concerns, new numeracy scales have recently been developed: the Berlin Numeracy Test (BNT, [16]) and the Abbreviated Numeracy Scale (ANS, [27]). While both scales were built on the works of Schwartz and Lipkus, they have a wider range of difficulty. As a result, they have better psychometric characteristics, especially when used with high-performance samples. Considering the generally high numeracy of Japanese, those new scales might be more suitable for assessing numeracy in Japan, and this should be explored.

Meanwhile, Lipkus-J and Schwartz-J could be useful for assessing those having low numeracy. In the above-mentioned studies that developed new numeracy scales, the effectiveness of the original Lipkus and Schwartz scales is indicated for assessing groups with low numeracy [16, 27]. In fact, positive skew was observed in some of the samples studied using BNT [16], where easier tests would work better. This is an important point to consider when clinical applications are in scope, because some patients are likely to be under physical and psychological stress, which might result in lower numeracy. For instance, a recent clinical study using the Lipkus scale found the numeracy of epilepsy patients to be significantly lower than that of healthy controls even though educational attainments were lower in the control group [12]. This issue also bears on the test’s validity, because the psychometric characteristics of scales could differ across population groups or settings [49]. Considering possible differences between patients and healthy groups, the use of the numeracy scales translated here, as well as the above-mentioned new scales, should be explored using patient samples so that more effective numeracy measures for the patient population can be discovered.

Finally, the possible influence of volunteer bias [50] should be noted when interpreting the current results. Although demographics of the sample matched those of the Japanese adult population, the test respondents were those who voluntarily agreed to participate in a survey concerning numbers. Therefore, the results could be biased towards those who are more interested in solving numerical problems, and not actually representative of the population. In fact, the average total score of TAIMI in the current study was 4.7, which is higher than that of 3.9 in the original report (Internet survey, n = 6047, [22]). This disparity might be due to differences in sample composition between Takahashi et al.’s work and ours (there were more females and elderly in their study, and no education levels were reported). However, it is also possible that the numeracy reported here is higher than average. This issue should be addressed in future random-selection population-based surveys.


The current study highlights the importance of considering health numeracy in Japan. As assessed by the Japanese versions of internationally used health numeracy scales, the basic understanding of math and probability by Japanese people was shown to be high, but still not sufficient for many to avoid framing bias. Thus, to improve numerical medical risk communication in Japan, it would be necessary to assess health numeracy, screening those with low numeracy to provide them with appropriate care. Although efforts have been made [16, 22], numeracy has not gained much attention in Japan. By evaluating the health numeracy of Japanese, and its measurement instruments, our study is a step towards improving medical risk communication.


Translation of the Schwartz and Lipkus scales

The Japanese version of the Lipkus scale (Lipkus-J), which includes the Schwartz scale (Schwartz-J), was prepared using forward and backward translation [25]. Each translation process was conducted by independent professional translators. Two raters (a bilingual Japanese-English individual and a native English speaker) evaluated the concordance between the back-translated items and the originals. Forward-backward translation was repeated until both raters rated all the back-translated items as semantically concordant with the original. After the translation, some expressions (for example, a lottery prize in dollars) were changed to suit the Japanese context. Finally, the understandability of the resultant wording was checked by students, office workers, and researchers recruited at Jichi Medical University, and Obihiro University of Agriculture & Veterinary Medicine (n = 22), and minor changes in wording were made.

Other measures

Framing bias

A set of questions in which subjects were asked to rate the risk level of a surgical procedure when risk information was presented in two different frames (survival rate, “991 in 1000 people survive this surgery”, and death rate, “9 in 1000 people die from this surgery”), were adopted from the Medical Data Interpretation test [51]. Framing was manipulated within subjects, separating the two differently framed questions with 12 irrelevant ones. A four-point scale was used to rate risk level (1 = not risky, 2 = slightly risky, 3 = risky, 4 = very risky). The framing effect was evaluated by examining the difference between risk rating scores obtained for the two frames.
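The framing-effect score described above can be illustrated with a minimal Python sketch. The ratings below are invented for illustration only, and the sign convention (survival-frame rating minus death-frame rating) is an assumption, since the text specifies only that the difference between the two ratings was examined.

```python
from statistics import mean

# Hypothetical ratings on the 4-point scale
# (1 = not risky, 2 = slightly risky, 3 = risky, 4 = very risky).
# Each tuple is (rating under survival frame, rating under death frame).
ratings = [(2, 2), (3, 2), (1, 1), (4, 2), (2, 3)]


def framing_effect(survival_rating, death_rating):
    """Within-subject difference between the two frames.

    The direction of subtraction is an assumption made for this sketch.
    """
    return survival_rating - death_rating


# One framing-effect score per respondent; 0 means no framing bias.
effects = [framing_effect(s, d) for s, d in ratings]
mean_effect = mean(effects)
```

A score of zero means the respondent rated the mathematically identical risk the same way under both frames.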

TAIMI measure

As mentioned in the Introduction, TAIMI [22] contains health numeracy items for Japanese adults. To examine how performance on TAIMI relates to that on the numeracy scales used in the current study, TAIMI was included in the survey.

Survey and participants

An online survey company (Cross Marketing, Tokyo, Japan) was contracted to collect responses (n = 300), and recruitment e-mails were sent to a participant pool maintained by a different online survey company (Research Panel, Tokyo, Japan, n > 1.4 million). Participants voluntarily agreed to complete the online survey. We created 20 blocks of subjects, each defined by gender, age group (20-29, 30-39, 40-49, 50-59, 60-69 years old), and education level (low attainment [high school or lower], high attainment [beyond high school]). The quota was set so that the sample composition roughly matched the Japanese adult population (Additional file 1: Table S1). Participants were recruited until the quota was filled. Students and medical professionals were excluded.
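As a rough illustration of the quota design, the 20 blocks arise from crossing 2 genders, 5 age groups, and 2 education levels. The sketch below enumerates them and enrolls respondents until each block is full. The per-block quota of 15 is a placeholder chosen so the total is 300; the actual quotas followed population composition (Additional file 1: Table S1), and the function name `try_enroll` is hypothetical.

```python
from itertools import product

GENDERS = ("male", "female")
AGE_GROUPS = ("20-29", "30-39", "40-49", "50-59", "60-69")
EDUCATION = ("low", "high")

# 2 x 5 x 2 = 20 blocks, each a (gender, age group, education) tuple.
blocks = list(product(GENDERS, AGE_GROUPS, EDUCATION))

# Placeholder quota: equal block sizes summing to 300.
# The study's real quotas mirrored the population and are not reproduced here.
quota = {block: 15 for block in blocks}


def try_enroll(block, counts, quota):
    """Enroll a respondent only while the block's quota is not yet filled."""
    if counts.get(block, 0) >= quota[block]:
        return False
    counts[block] = counts.get(block, 0) + 1
    return True
```

Recruitment stops for a block once `try_enroll` starts returning `False` for it, which mirrors recruiting "until the quota was filled".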

The survey included the Lipkus-J scale (which incorporates the Schwartz-J), TAIMI, measures for framing bias, and some other measures of health and risk attitudes (not reported here). The web page was designed in such a way that respondents could not proceed to the next question without completing the current one, so there were no non-response items in the survey. The survey was conducted in March 2011.

Statistical analysis

Item responses for Schwartz-J, Lipkus-J, and TAIMI were first dichotomized to be either correct or incorrect, and the percentage of individuals with the correct response was determined for each item. As in original scales, the total score was calculated as the number of correct items for each respondent.
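The scoring step can be sketched as follows. The item names and answer key are placeholders, not the actual Schwartz-J or Lipkus-J items, and the helper names are hypothetical.

```python
# Placeholder answer key; the real items and keys are in Table 2 of the paper.
ANSWER_KEY = {"q1": "a", "q2": "b", "q3": "c"}


def dichotomize(responses, key):
    """Map each item to 1 (correct) or 0 (incorrect or missing)."""
    return {item: int(responses.get(item) == correct)
            for item, correct in key.items()}


def total_score(responses, key):
    """Total score = number of correctly answered items."""
    return sum(dichotomize(responses, key).values())


def item_correct_rates(all_responses, key):
    """Percentage of respondents answering each item correctly."""
    n = len(all_responses)
    scored = [dichotomize(r, key) for r in all_responses]
    return {item: 100.0 * sum(s[item] for s in scored) / n for item in key}


# Two hypothetical respondents.
sample = [
    {"q1": "a", "q2": "b", "q3": "x"},
    {"q1": "a", "q2": "x", "q3": "x"},
]
rates = item_correct_rates(sample, ANSWER_KEY)
```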

An exploratory factor analysis with binary variables was conducted using Mplus version 6.12 [52]. The method employs tetrachoric correlations with the weighted least squares means and variance adjusted (WLSMV) estimator. It accommodates dichotomous observations, and has been indicated to be robust to ceiling effects [53-56].

The number of factors for each new scale was determined by parallel analysis. In the analysis, random datasets with the same number of items and participants as the actual observations were generated. Eigenvalues were extracted for each random dataset and for the actual observations. Only those factors with eigenvalues greater than the average eigenvalues obtained from the random datasets were deemed to be meaningful [57]. Subsequently, exploratory factor analysis with the number of factors determined by parallel analysis was performed to examine the factor loadings. The criterion for factor loadings was set at .35 and above [58]. The consistencies of the scales were evaluated according to classical test theory, including Cronbach’s alpha [59], item-total correlation, and descriptive statistics of items.
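For readers unfamiliar with Cronbach's alpha, a minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), is given below. It uses sample variances and invented binary data. It is not the SPSS computation used in the study, and it does not cover the parallel analysis or factor analysis steps.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    rows: list of equal-length lists, one list of item scores per respondent.
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented dichotomized responses: 4 respondents x 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(data)
```

With highly skewed binary items, item variances shrink, which is one reason alpha can be modest for short scales such as a 3-item measure.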

Convergent validity of numeracy scales has been evaluated through correlation with existing measures for health numeracy [8]. We examined correlations between TAIMI scores and Schwartz-J, and Lipkus-J scores. TAIMI has a two-factor structure where three items are especially relevant to numeracy [22]. Therefore, we expected a positive correlation between performance on TAIMI and the scales translated in this study, especially for the numeracy items of TAIMI (TAIMI-num).

Because the assumption of a normal distribution was not satisfied for test statistics, we used non-parametric tests for examining the effects of demographic characteristics on test performance, and of numeracy levels on framing bias. The Wilcoxon signed rank test was used for pair-wise comparison between two conditions. The Mann-Whitney test was used to compare between two groups, and the Kruskal–Wallis test followed by a pair-wise Mann-Whitney test with Bonferroni correction was used to compare between three or more groups. Significance levels were set at p < 0.05. IBM SPSS Statistics 19 was used for most statistical analyses, and Mplus version 6.12 [52] was used for factor analysis.
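To make the group comparison concrete, the sketch below computes the Mann-Whitney U statistic by direct pair counting and a Bonferroni-adjusted significance threshold. The data are invented, and the sketch stops at the U statistic: the Z values and p-values reported in the paper came from SPSS and are not reproduced here.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Invented framing-effect scores for a low- and a high-numeracy group.
low_group = [2, 1, 1]
high_group = [0, 1, 0]

u_low = mann_whitney_u(low_group, high_group)
u_high = mann_whitney_u(high_group, low_group)

# Threshold for three pair-wise comparisons at an overall level of 0.05.
threshold = bonferroni_alpha(0.05, 3)
```

The two U values always sum to the number of pairs (here 3 × 3 = 9), which is a quick sanity check on the counting.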

Ethical considerations

The institutional ethics committee of the National Food Research Institute granted approval for the study, and permission was also obtained from the management section of the Obihiro University of Agriculture & Veterinary Medicine. The methodology used in this study followed the principles of the Helsinki Declaration. Collection of on-line data complied with requirements specified in Japanese Industrial Standards “Personal information protection management systems - Requirements” (JIS Q 15001). Written (electronic) consent was obtained from all the participants.


  1. Hellenthal N, Ellison L: How patients make treatment choices. Nat Clin Pract Urol. 2008, 5: 426-433.

  2. Japanese Cabinet Office: Surveys for the Measures for the Aging Society.

  3. Partridge JC, Martinez AM, Nishida H, Boo NY, Tan KW, Yeung CY, Lu JH, Yu VY: International comparison of care for very low birth weight infants: parents' perceptions of counseling and decision-making. Pediatrics. 2005, 116: e263-e271. 10.1542/peds.2004-2274.

  4. Alden DL, Merz MY, Akashi J: Young adult preferences for physician decision-making style in Japan and the United States. Asia Pac J Public Health. 2012, 24: 173-184. 10.1177/1010539510365098.

  5. Peters E: Beyond Comprehension: The role of numeracy in judgments and decisions. Curr Dir Psychol Sci. 2012, 21: 31-35. 10.1177/0963721411429960.

  6. Nelson W, Reyna VF, Fagerlin A, Lipkus I, Peters E: Clinical implications of numeracy: theory and practice. Ann Behav Med. 2008, 35: 261-274. 10.1007/s12160-008-9037-8.

  7. Gaissmaier W, Gigerenzer G: Statistical illiteracy undermines informed shared decision making. Z Evid Fortbild Qual Gesundhwes. 2008, 102: 411-413. 10.1016/j.zefq.2008.08.013.

  8. Reyna VF, Nelson WL, Han PK, Dieckmann NF: How numeracy influences risk comprehension and medical decision making. Psychol Bull. 2009, 135: 943-973.

  9. Lipkus IM, Peters E: Understanding the role of numeracy in health: proposed theoretical framework and practical insights. Health Educ Behav. 2009, 36: 1065-1081. 10.1177/1090198109341533.

  10. Fagerlin A, Ubel PA, Smith DM, Zikmund-Fisher BJ: Making numbers matter: present and future research in risk communication. Am J Health Behav. 2007, 31: S47-S56. 10.5993/AJHB.31.s1.7.

  11. Garcia-Retamero R, Okan Y, Cokely ET: Using visual aids to improve communication of risks about health: a review. ScientificWorldJournal. in press

  12. Choi H, Wong JB, Mendiratta A, Heiman GA, Hamberger MJ: Numeracy and framing bias in epilepsy. Epilepsy Behav. 2011, 20: 29-33. 10.1016/j.yebeh.2010.10.005.

  13. Lipkus IM, Peters E, Kimmick G, Liotcheva V, Marcom P: Breast cancer patients' treatment expectations after exposure to the decision aid program adjuvant online: the influence of numeracy. Med Decis Making. 2010, 30: 464-473. 10.1177/0272989X09360371.

  14. Gardner PH, McMillan B, Raynor DK, Woolf E, Knapp P: The effect of numeracy on the comprehension of information about medicines in users of a patient information website. Patient Educ Couns. 2011, 83: 398-403. 10.1016/j.pec.2011.05.006.

  15. Estrada CA, Martin-Hryniewicz M, Peek BT, Collins C, Byrd JC: Literacy and numeracy skills and anticoagulation control. Am J Med Sci. 2004, 328: 88-93. 10.1097/00000441-200408000-00004.

  16. Cokely ET, Galesic M, Schulz E, Ghazal S, Garcia-Retamero R: Measuring risk literacy: the Berlin numeracy test. Judgm Decis Mak. 2012, 7: 25-47.

  17. Peters E, Baker DP, Dieckmann NF, Leon J, Collins J: Explaining the effect of education on health: a field study in Ghana. Psychol Sci. 2010, 21: 1369-1376. 10.1177/0956797610381506.

  18. Liberali JM, Reyna VF, Furlan S, Stein LM, Pardo ST: Individual differences in numeracy and cognitive reflection, with implications for biases and fallacies in probability judgment. J Behav Decis Mak. 2012, 25: 361-381. 10.1002/bdm.752.

  19. Garcia-Retamero R, Galesic M: How to reduce the effect of framing on messages about health. J Gen Intern Med. 2010, 25: 1323-1329. 10.1007/s11606-010-1484-9.

  20. Pachur T, Galesic M: Strategy selection in risky choice: The impact of numeracy, affect, and cross-cultural differences. J Behav Decis Mak. in press

  21. Garcia-Retamero R, Galesic M: Communicating treatment risk reduction to people with low numeracy skills: a cross-cultural comparison. Am J Public Health. 2009, 99: 2196-2202. 10.2105/AJPH.2009.160234.

  22. Takahashi Y, Sakai M, Fukui T, Shimbo T: Measuring the ability to interpret medical information among the Japanese public and the relationship with inappropriate purchasing attitudes of health-related goods. Asia Pac J Public Health. 2011, 23: 386-398. 10.1177/1010539509344882.

  23. Schwartz LM, Woloshin S, Black WC, Welch HG: The role of numeracy in understanding the benefit of screening mammography. Ann Intern Med. 1997, 127: 966-972.

  24. Lipkus IM, Samsa G, Rimer BK: General performance on a numeracy scale among highly educated samples. Med Decis Making. 2001, 21: 37-44.

  25. Streiner DL, Norman GR: Health measurement scales. A practical guide to their development and use. 2003, Oxford University Press, New York, 3rd edition.

  26. Nunnally JC, Bernstein IH: Psychometric theory. 1994, McGraw-Hill, New York, 3rd edition.

  27. Weller JA, Dieckmann NF, Tusler M, Mertz CK, Burns WJ, Peters E: Development and testing of an abbreviated numeracy scale: A rasch analysis approach. J Behav Decis Mak. in press

  28. Peters E, Vastfjall D, Slovic P, Mertz CK, Mazzocco K, Dickert S: Numeracy and decision making. Psychol Sci. 2006, 17: 407-413. 10.1111/j.1467-9280.2006.01720.x.

  29. Peters E, Dieckmann N, Dixon A, Hibbard JH, Mertz CK: Less is more in presenting quality information to consumers. Med Care Res Rev. 2007, 64: 169-190. 10.1177/10775587070640020301.

  30. Peters E, Slovic P, Västfjäll D, Mertz CK: Intuitive numbers guide decisions. Judgm Decis Mak. 2008, 3: 619-635.

  31. Schapira MM, Walker CM, Sedivy SK: Evaluating existing measures of health numeracy using item response theory. Patient Educ Couns. 2009, 75: 308-314. 10.1016/j.pec.2009.03.035.

  32. Hanoch Y, Miron-Shatz T, Cole H, Himmelstein M, Federman AD: Choice, numeracy, and physicians-in-training performance: the case of Medicare Part D. Health Psychol. 2010, 29: 454-459.

  33. Galesic M, Garcia-Retamero R: Statistical numeracy for health: a cross-cultural comparison with probabilistic national samples. Arch Intern Med. 2010, 170: 462-468. 10.1001/archinternmed.2009.481.

  34. OECD: PISA 2000 Technical Report.

  35. OECD: PISA 2003 Technical Report.

  36. OECD: PISA 2006 Technical Report.

  37. OECD: PISA 2009 Technical Report.

  38. Okan Y, Garcia-Retamero R, Cokely ET, Maldonado A: Individual differences in graph literacy: Overcoming denominator neglect in risk comprehension. J Behav Decis Mak. 2011, 25: 390-401.

  39. O'Keefe DJ, Jensen JD: The relative persuasiveness of gain-framed and loss-framed messages for encouraging disease prevention behaviors: a meta-analytic review. J Health Commun. 2007, 12: 623-644. 10.1080/10810730701615198.

  40. Ancker JS, Senathirajah Y, Kukafka R, Starren JB: Design features of graphs in health risk communication: a systematic review. J Am Med Inform Assoc. 2006, 13: 608-618. 10.1197/jamia.M2115.

  41. Lipkus IM: Numeric, verbal, and visual formats of conveying health risks: suggested best practices and future recommendations. Med Decis Making. 2007, 27: 696-713. 10.1177/0272989X07307271.

  42. Kurz-Milcke E, Gigerenzer G, Martignon L: Transparency in risk communication: graphical and analog tools. Ann N Y Acad Sci. 2008, 1128: 18-28. 10.1196/annals.1399.004.

  43. Hanoch Y, Pachur T: Nurses as information providers: facilitating understanding and communication of statistical information. Nurse Educ Today. 2004, 24: 236-243. 10.1016/j.nedt.2004.01.004.

  44. Uttl B: Measurement of individual differences: lessons from memory assessment in research and clinical practice. Psychol Sci. 2005, 16: 460-467.

  45. Muthén B: Moments of the censored and truncated bivariate normal distribution. Br J Math Stat Psychol. 1990, 43: 131-143. 10.1111/j.2044-8317.1990.tb00930.x.

  46. Sheng Y, Sheng Z: Is coefficient alpha robust to non-normal data? Front Psychol. 2012, 3: 1-13.

  47. Duckworth AL, Quinn PD, Lynam DR, Loeber R, Stouthamer-Loeber M: Role of test motivation in intelligence testing. Proc Natl Acad Sci U S A. 2011, 108: 7716-7720. 10.1073/pnas.1018601108.

  48. Cokely ET, Kelley CM: Cognitive abilities and superior decision making under risk: A protocol analysis and process model evaluation. Judgm Decis Mak. 2009, 4: 20-33.

  49. Messick S: Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. Am Psychol. 1995, 50: 741-749.

  50. Heiman GW: Research methods in psychology. 2002, Houghton Mifflin, Boston & New York, 3rd edition.

  51. Schwartz LM, Woloshin S, Welch HG: Can patients interpret health information? An assessment of the medical data interpretation test. Med Decis Making. 2005, 25: 290-300. 10.1177/0272989X05276860.

  52. Mplus user's guide. 1998–2010, 6th edition.

  53. Beauducel A, Herzberg PY: On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling: A Multidisciplinary Journal. 2006, 13: 186-203. 10.1207/s15328007sem1302_2.

  54. Flora DB, Curran PJ: An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004, 9: 466-491.

  55. Muthén B: Dichotomous factor analysis of symptom data. Latent Variable Models for Dichotomous Outcomes: Analysis of Data from the Epidemiological Catchment Area Program. Edited by: Eaton WW, Bohrnstedt GW. 1989, Sage Periodicals Press, Newbury Park, CA, 19-65. Sociological methods & research

  56. Muthén BO, du Toit SHC, Spisic D: Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes.

  57. Bernstein IH, Rush AJ, Carmody TJ, Woo A, Trivedi MH: Clinical vs. self-report versions of the quick inventory of depressive symptomatology in a public sector sample. J Psychiatr Res. 2007, 41: 239-246. 10.1016/j.jpsychires.2006.04.001.

  58. Comrey AL, Lee HB: A First Course in Factor Analysis. 1992, Lawrence Erlbaum Associates, Hillsdale, NJ, 2nd edition.

  59. Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika. 1951, 16: 297-334. 10.1007/BF02310555.


Acknowledgements

We thank Drs. Y. Takahashi and T. Shinbo, the developers of TAIMI, for their help in using TAIMI. We also thank Profs. H. Kanuka and S. Kawazu for their support in conducting the research; Mr. Kitazawa and Messrs. Sugimoto, Noguchi, and Yamauchi for their assistance in data preparation; and ELCS for proofreading the manuscript. This work was supported in part by a grant from the Global COE Program of the Japanese Ministry of Education, Science, Sports, Culture and Technology (MEXT); the Programme for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry (MO); Grants-in-Aid for Young Scientists (B) 20700779 (MO) and 23700921 (YK) from MEXT; Grant-in-Aid (B) 23300247 from MEXT; grants from the Japan Science and Technology Agency under the Strategic Promotion of Innovative Research and Development Program; and Comprehensive Research on Disability, Health and Welfare from Health and Labour Sciences Research Grants (ID). None of the funding bodies had any role in the study design; the collection, analysis, and interpretation of the data; the writing of the paper; or the decision to submit the manuscript for publication.

Author information

Corresponding author

Correspondence to Masako Okamoto.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MO designed the study, collected, analyzed, and interpreted the data, and wrote the manuscript. YK designed the statistical analysis procedures, contributed to the study design, supported the collection, analysis, and interpretation of the data, and helped with manuscript development. MS and EW contributed to questionnaire development and data collection. LC contributed to questionnaire development, data analysis, and manuscript development. ID contributed to the study design, to the collection, analysis, and interpretation of the data, and to manuscript development. KK contributed to questionnaire development, data collection, and study coordination. All authors have read and approved the final manuscript.

Masako Okamoto and Yasushi Kyutoku contributed equally to this work.

Electronic supplementary material


Additional file 1: Table S1. Education levels of the Japanese population. Education histories for different generations and genders. The percentage of people with a high school education or lower in the Japanese adult population is shown for each category, based on the latest national survey (Ministry of Internal Affairs and Communications, as of October 1, 2007). Sampling quotas in the current study were allocated according to this proportion. (DOC 33 KB)


Additional file 2: Table S2. Household income of the Japanese population. Percentage of people in each household income category in the Japanese adult population (Population column) and in the current sample (Sample column). Data are based on the latest national survey (Ministry of Internal Affairs and Communications, as of October 1, 2007). (DOC 32 KB)


Additional file 3: Table S3. Numeracy scores for each demographic sub-group. Mean ± standard deviation is shown for each sub-group. Scores were compared between sub-groups using non-parametric methods, but means are presented because median scores did not show differences between sub-groups. The effects of gender and educational attainment were significant for both scales (Mann–Whitney's test; effect of gender: Schwartz-J, Z = 2.6, p < 0.01; Lipkus-J9, Z = 2.6, p < 0.01; effect of education: Schwartz-J, Z = 2.0, p < 0.05; Lipkus-J9, Z = 2.3, p < 0.05). The effect of age was significant only for Schwartz-J, where post hoc analysis revealed that the 40–49-year-old group performed significantly better than the 60–69-year-old group (Mann–Whitney's test, Z = 2.9, p < 0.05, Bonferroni corrected). (DOC 40 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Okamoto, M., Kyutoku, Y., Sawada, M. et al. Health numeracy in Japan: measures of basic numeracy account for framing bias in a highly numerate population. BMC Med Inform Decis Mak 12, 104 (2012).
