In this study, we evaluated Japanese numeracy by translating and applying the Schwartz and Lipkus scales, two widely used health numeracy scales that focus on the understanding of basic math and probability. The translated versions of both scales showed a certain degree of reliability and validity; however, the Japanese sample’s high performance caused the score distributions to be negatively skewed, which limited the psychometric evaluation of the scales. In this section, we first discuss Japanese numeracy in light of our results. We then address the validity and limits of Lipkus-J and Schwartz-J, and future directions for the application of health numeracy measures.
The current study suggests that basic understanding of math and probability is quite high among the Japanese: correct response rates for Lipkus-J items were much higher than those found in the original US samples , and in more recent studies of probabilistic national samples in Germany and the US . This is consistent with the results of the Programme for International Student Assessment (PISA), in which the national average math score for Japanese students has surpassed those of both the US and Germany since the programme’s inception [34–37]. The relatively high attainment of math skills during school education, as assessed by PISA, might partly account for the generally high numeracy of the Japanese. The current result is also in line with a recent study assessing the numeracy skills of students at top universities in 15 countries . Although linking top-university performance with that of the general population is not straightforward, Japan had the second-smallest proportion of respondents falling into the lowest quartile.
However, despite generally high numeracy, the performance of the Japanese sample on the Schwartz-J and Lipkus-J tests was still associated with susceptibility to the framing effect, which can influence patients’ decisions regarding their medical options, such as acceptance of surgery (e.g. [12, 19, 38]; however, empirical results on framing effects in clinical settings are mixed, reviewed in ). A number of previous studies using the original Schwartz and Lipkus scales have shown that numeracy affects understanding and decision making based on medical information (reviewed in [5, 8–11]). Moreover, research has advanced on how to communicate quantitative risk information with patients’ numeracy in mind, for example by supplementing numerical data with visual or verbal aids, using natural frequencies rather than probabilities, or presenting risks in both negative and positive frames (reviewed in [10, 11, 40–43]). Considering our results and these earlier findings, such care is called for when communicating medical information to those with low numeracy in Japan, and possibly in other countries where general math performance is deemed to be high.
Regarding instruments to identify those with low numeracy, both the Schwartz-J and Lipkus-J scales demonstrated a certain degree of reliability and validity: Cronbach’s α was comparable with those of the original scales, convergent validity was supported by positive correlations with another health literacy and numeracy measure (TAIMI, ), criterion validity was suggested by the association with susceptibility to framing bias, and content validity was ensured in the original scales. However, we also found a pronounced ceiling effect, which confounded the analyses we applied and limited the psychometric qualities of the scales.
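For readers less familiar with the reliability coefficient mentioned above, the following is a minimal sketch of the standard Cronbach’s α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is an illustration only, not the analysis code used in this study; the function name `cronbach_alpha` and the toy response matrix are assumptions introduced here.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `responses` is a list of rows, one per respondent; each row holds that
    respondent's item scores (e.g. 1 = correct, 0 = incorrect).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Variance of each item (column) across respondents.
    item_variances = [statistics.variance(col) for col in zip(*responses)]

    # Variance of the total (summed) score across respondents.
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        # Happens when every respondent has the same total, e.g. all at ceiling.
        raise ValueError("total score has zero variance; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 5 respondents, 3 items (not data from this study).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(toy), 3))
```

The guard for zero total variance is relevant to ceiling effects: if every respondent reaches the maximum score, the coefficient cannot be computed at all.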
Ceiling effects pose multiple psychometric limitations . First, they suggest that the scales are less able to differentiate among those with high numeracy. Second, the statistical methods applicable for data analysis become limited, as many popular methods assume a normal distribution and can give erroneous results when this assumption is violated [44–46]. Non-parametric alternatives are not always sufficient. For example, in the current study, we had to use a median split, which made it difficult to examine the relationship between numeracy scores and framing effects in depth.
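The first two limitations can be illustrated with a small simulation. The sketch below is hypothetical and does not use data from this study: it assumes an 11-item test on which every item is answered correctly with probability 0.9, computes the skewness of the resulting total scores, and applies one possible median-split rule (scores at or above the median are labelled "high"). All names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_scores(n_respondents, n_items, p_correct, seed=0):
    """Total scores on an easy test: each item is correct with prob. p_correct."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_correct for _ in range(n_items))
        for _ in range(n_respondents)
    ]


def sample_skewness(values):
    """Moment-based skewness; negative values indicate a long left tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def median_split(values):
    """Dichotomize scores: 'high' if at or above the median, else 'low'."""
    cutoff = statistics.median(values)
    return ["high" if v >= cutoff else "low" for v in values]


scores = simulate_scores(n_respondents=2000, n_items=11, p_correct=0.9)
groups = median_split(scores)

print("skewness:", round(sample_skewness(scores), 2))
print("distinct scores:", len(set(scores)), "-> groups after split:", len(set(groups)))
```

Scores pile up near the maximum, so the skewness is negative, and the median split reduces the already narrow range of observed scores to two categories, discarding any gradation within each group.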
A third limitation concerns the means of evaluating the validity of the scale. For example, respondents’ performances can be confounded with other factors such as motivation . However, ensuring discriminant validity is not straightforward with data showing a ceiling effect: a weak correlation between motivation and numeracy scores, for example, might be due to the ceiling effect rather than to the variables being truly unrelated. Similarly, examining the relationships between the measured ability and other closely related abilities such as working memory  would be confounded by the ceiling effect. Thus, use of Lipkus-J and Schwartz-J with high-numeracy samples requires careful consideration of these limitations.
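The attenuation of correlations under a ceiling can also be shown by simulation. The sketch below is hypothetical: it assumes a latent ability that correlates about 0.6 with some other variable, then censors the ability at a ceiling so that most respondents receive the maximum observable score. The function names, the ceiling value of −0.5, and the true correlation are all assumptions for illustration, not estimates from this study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_ceiling(n, ceiling, true_r=0.6, seed=1):
    """Return latent ability, ceiling-censored ability, and a related variable."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # `other` is built to correlate about `true_r` with the latent ability.
    noise_sd = math.sqrt(1 - true_r ** 2)
    other = [true_r * a + noise_sd * rng.gauss(0, 1) for a in ability]
    # The test cannot register ability above `ceiling`.
    observed = [min(a, ceiling) for a in ability]
    return ability, observed, other


ability, observed, other = simulate_ceiling(n=5000, ceiling=-0.5)
print("r with latent ability: ", round(pearson_r(ability, other), 2))
print("r with observed scores:", round(pearson_r(observed, other), 2))
```

Under these assumptions the correlation computed from the censored scores is noticeably weaker than the one computed from the latent ability, even though the underlying relationship is unchanged. A weak observed correlation therefore cannot by itself establish that two constructs are distinct.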
In fact, negative skew in the original Schwartz and Lipkus scales has been noted in a number of earlier studies [28–33], and the limitations mentioned above have been pointed out [16, 27]. In response to those concerns, new numeracy scales have recently been developed: the Berlin Numeracy Test (BNT, ) and the Abbreviated Numeracy Scale (ANS, ). While both scales build on the work of Schwartz and Lipkus, they cover a wider range of difficulty. As a result, they have better psychometric characteristics, especially when used with high-performance samples. Considering the generally high numeracy of the Japanese, these new scales might be more suitable for assessing numeracy in Japan, and this should be explored.
Meanwhile, Lipkus-J and Schwartz-J could be useful for assessing those with low numeracy. The above-mentioned studies that developed the new numeracy scales indicate that the original Lipkus and Schwartz scales are effective for assessing groups with low numeracy [16, 27]. In fact, positive skew was observed in some of the samples studied using the BNT , for which easier tests would work better. This is an important point to consider when clinical applications are in scope, because some patients are likely to be under physical and psychological stress, which might result in lower numeracy. For instance, a recent clinical study using the Lipkus scale found the numeracy of epilepsy patients to be significantly lower than that of healthy controls, even though educational attainment was lower in the control group . This issue also bears on the tests’ validity, since the psychometric characteristics of scales could differ across population groups or settings . Considering possible differences between patients and healthy groups, the use of the numeracy scales translated here, as well as the above-mentioned new scales, should be explored using patient samples so that more effective numeracy measures for the patient population can be identified.
Finally, the possible influence of volunteer bias  should be noted when interpreting the current results. Although the demographics of the sample matched those of the Japanese adult population, the respondents were those who voluntarily agreed to participate in a survey concerning numbers. Therefore, the results could be biased towards those who are more interested in solving numerical problems, and might not be representative of the population. In fact, the average total TAIMI score in the current study was 4.7, which is higher than the 3.9 reported in the original study (Internet survey, n = 6047, ). This disparity might be due to differences in sample composition between Takahashi et al.’s work and ours (there were more female and elderly respondents in their study, and no education levels were reported). However, it is also possible that the numeracy reported here is higher than the population average. This issue should be addressed in future population-based surveys using random selection.