Measurement properties of the Inventory of Cognitive Bias in Medicine (ICBM)
© Sladek et al; licensee BioMed Central Ltd. 2008
Received: 05 November 2007
Accepted: 28 May 2008
Published: 28 May 2008
Understanding how doctors think may inform both undergraduate and postgraduate medical education. Developing such an understanding requires valid and reliable measurement tools. We examined the measurement properties of the Inventory of Cognitive Bias in Medicine (ICBM), designed to tap this domain with specific reference to medicine, but with previously questionable measurement properties.
First year postgraduate entry medical students at Flinders University, and trainees (postgraduate doctors in any specialty) and consultants (N = 348) based at two teaching hospitals in Adelaide, Australia, completed the ICBM and a questionnaire measuring thinking styles (Rational Experiential Inventory).
Questions with the lowest item-total correlation were deleted from the original 22 item ICBM, although the resultant 17 item scale only marginally improved internal consistency (Cronbach's α = 0.61 compared with 0.57). A factor analysis identified two scales, both achieving only α = 0.58. Construct validity was assessed by correlating Rational Experiential Inventory scores with the ICBM, with some positive correlations noted for students only, suggesting that those who are naïve to the knowledge base required to "successfully" respond to the ICBM may profit by a thinking style in tune with logical reasoning.
The ICBM failed to demonstrate adequate content validity, internal consistency and construct validity. It is unlikely that improvements can be achieved without considered attention to both the audience for which it is designed and its item content. The latter may need to involve both removal of some items deemed to measure multiple biases and the addition of new items in the attempt to survey the range of biases that may compromise medical decision making.
The context of decision making in modern healthcare is complex, often involving multiple decision makers from varied professions. While their decisions may be embedded in such broad organisational contexts, doctors still retain a primary role in diagnostic and treatment decisions for patients, ultimately determining which protocols to follow [3, 4]. Optimising doctors' decision making is therefore a worthy objective. For example, if cognitive processes influence doctors' decision making, an improved understanding of these processes may contribute to maximising patient outcomes and avoiding common errors [5–8]. Our specific interest lies in understanding the cognitive processes that may inform strategies designed to change existing clinician practices so that gaps between the known best research evidence and existing clinical practices are reduced. In sum, understanding how doctors think may contribute to both undergraduate and postgraduate medical education.
Such an understanding requires valid and reliable measurement tools. Currently available instruments include the Rational Experiential Inventory (REI) and the Thinking Dispositions Composite, both of which measure general thinking styles. Even the Sensing-Intuiting and Thinking-Feeling subscales derived from the Myers-Briggs Type Indicator have been proposed to measure cognitive style. To date, relatively few scales have been designed to tap this domain with specific reference to medicine. The Inventory of Cognitive Bias in Medicine (ICBM) is one such potential instrument.
The ICBM was designed to measure the extent to which cognitive biases detract from logical and statistical thinking. Items were constructed without reference to any specific theoretical model, although the authors drew on the substantial body of 'heuristics and biases' research from psychology, demonstrating that even experts fall victim to common biases in reasoning [14, 15]. The ICBM comprises 22 items, each of which presents a clinical scenario to which responses represent either a 'correct' answer based on a statistical rationale or a 'bias prone' response. That is, reasoning is assumed to be either rational (a statistically correct answer) or biased (a statistically incorrect answer).
The ICBM was originally administered to medical students, residents and physicians, with the students responding with a higher level of bias than the physicians. Later it was noted that students who attended a seminar on cognitive bias demonstrated significantly less biased responses than non-attendees. Unfortunately, only modest internal reliability coefficients (α) of 0.62 for physicians and 0.42 for students/residents were obtained. Our own research using the ICBM, as yet unpublished, also found a modest overall α (0.56), obtained from a similar sample of medical students (0.68), postgraduate trainee doctors (0.51) and medical consultants (0.41). The total sample figure was improved marginally by iteratively deleting items based on low item-total correlations until only 10 items remained (α = 0.61).
Clearly, despite apparent face validity and the critical importance of being able to assess medical decision making, the ICBM currently lacks both a theoretical basis and construct validity. Recently, over 30 years of heuristics and biases research was reconsidered in relation to emergent support for dual processing models of reasoning. Such models propose two modes of cognitive processing: one referred to as experiential, heuristic, intuitive, unconscious and fast; the other as deliberate, reflective, rational, conscious and slow [18–21]. Within this framework a biased response to an ICBM item could be considered prima facie evidence for a heuristic mode of reasoning. While this potentially offers a theoretical basis for the ICBM, its construct validity can only be demonstrated by comparing responses with other test instruments that measure the same or similar constructs.
The current study had two objectives. First, we wished to examine the measurement properties of the ICBM, with the specific goal of improving internal consistency. Second, we sought to investigate the construct validity of the ICBM by comparing it with scores from the Rational Experiential Inventory (REI). Consistent with a dual processing model of reasoning, this instrument measures rational (need for cognition) and experiential (faith in intuition) thinking styles. It was hypothesised that if higher ICBM scores reflect rational reasoning and lower scores reflect experiential reasoning (cognitive bias), there would be positive associations with need for cognition and/or negative associations with faith in intuition.
This study was undertaken in Adelaide, South Australia at Flinders Medical Centre (500 beds) and Repatriation General Hospital (280 beds), both metropolitan teaching hospitals affiliated with Flinders University. Ethics approval was obtained from the Research and Ethics Committees of both institutions.
Participants and procedure
A study of thinking dispositions among doctors and medical students provided 147 participants. This was augmented by the addition of 201 participants. In total, 77 first year postgraduate entry medical students, 88 trainees (postgraduate doctors in any specialty) and 183 consultants were recruited to the study (N = 348). Each tranche of participants was recruited using a similar procedure. Questionnaires were mailed to doctors (emailed to students) with two follow-up reminders at two-week intervals. Due to initial low numbers of students, an additional invitation to participate was made personally during a teaching session.
The data reported are age, gender, and scores for the ICBM and the REI.
Inventory of Cognitive Bias in Medicine
Table 1. Representation of cognitive biases in the ICBM
No. of Items
Insensitivity to prior probability of outcomes
Use of irrelevant information in probability estimates
Insensitivity to sample size
Easily retrievable instances are judged to be more frequent
Framing or anchoring bias
Misperceptions of chance
Insensitivity to the principle of regression
Insensitivity to superior reliability of objective over intuitive data
Rational Experiential Inventory
The REI is a reliable and valid instrument containing scales that measure rational (need for cognition) and experiential (faith in intuition) thinking dispositions. Need for cognition reflects the tendency to actively engage in, and enjoy, thinking. Faith in intuition measures the preference for experiential processing. Within each scale an ability subscale assesses how well a person believes they use each disposition, while a favourability subscale assesses reliance on and enjoyment of each disposition. There are 40 questions with 5-point response scales (20 each for need for cognition and faith in intuition, with 10 items each for the subscales of ability and favourability). Scores are averaged to provide variables ranging from 1 to 5, with a higher score reflecting a greater tendency to endorse the construct measured. The current sample provided internal reliabilities (α) of 0.90 (total need for cognition), 0.81 (need for cognition: ability), 0.82 (need for cognition: favourability), 0.79 (total faith in intuition), 0.74 (faith in intuition: ability) and 0.63 (faith in intuition: favourability).
Table 2. Correct responses to ICBM items.
1. Training program
2. Atrial fibrillation
5. Single mother
6. Leg pain
10. Breast nodule
11. Lung cancer
14. Female CAD
16. Angina CHD
18. Information source
19. Hollywood death
20. Birth rates
21. Blood pressure
22. Twin weights
Table 3. Summary statistics for key study variables.
Gender (n, % male)
Age in years (mean, SD)
REI (mean, SD)
Need for cognition (total)
Need for cognition (ability)
Need for cognition (favourability)
Faith in intuition (total)
Faith in intuition (ability)
Faith in intuition (favourability)
ICBM22 (mean, SD)
ICBM10 (mean, SD)
ICBM17 (mean, SD)
ICBMs1 (mean, SD)
ICBMs2 (mean, SD)
The internal reliability (α) of the ICBM22 was 0.57 (0.55 students, 0.52 trainees, 0.56 consultants), indicating a relatively poor level of internal consistency. The ICBM10 demonstrated similarly poor reliability of 0.57 (0.55 students, 0.54 trainees, 0.52 consultants). In light of these internal consistency figures, two strategies were used in the attempt to improve this property of the ICBM.
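For reference, the internal consistency coefficient reported throughout is conventionally defined as below. The symbols are introduced here for illustration and are not taken from the original analysis: k is the number of items, σ_i² the variance of item i, and σ_X² the variance of the total score.

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\sigma_i^{2} = p_i\,(1 - p_i) \ \text{ for a dichotomous item answered correctly with proportion } p_i .
```

With dichotomously scored items such as those of the ICBM, this expression is equivalent to the Kuder-Richardson formula 20 when population (divide-by-n) variances are used.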
Item-total Correlation Analysis
First, iterative removal of those items with the lowest item-total correlation was undertaken, identical to the procedure previously used to derive the ICBM10. With the current data, α was maximised with a 17 item scale (ICBM17). Items 8, 11, 18, 21 and 22 were omitted. These items assessed misconceptions of chance (8), framing or anchoring bias (11 and 22), insensitivity to superior reliability of objective over subjective data (18), and insensitivity to the principle of regression (21). The resultant 17 item scale achieved an α of 0.61 (0.56 students, 0.57 trainees, 0.61 consultants), with item-total correlations ranging from 0.11 to 0.37.
Removal of a further item resulted in the total sample α being maintained at 0.61 but a reduction in consistency for two of the three subsamples (0.58 students, 0.55 trainees, 0.59 consultants). This result suggested that further gains in internal consistency could only be attained through differential item removal for each subsample; that is, by constructing different scales for different target groups. Summary statistics for the ICBM17 are included in Table 3. As with the ICBM22, correct responses increased with experience.
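The iterative item-removal strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the analysis actually run: the function names are hypothetical, the data are simulated, and the use of the corrected item-total correlation (each item against the sum of the remaining items) is an assumption, since the text does not say which variant was used.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total score
    return (k / (k - 1)) * (1.0 - item_var / total_var)

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

def prune_items(items, min_items=3):
    """Repeatedly drop the weakest item; return the subset with the highest alpha."""
    items = np.asarray(items, dtype=float)
    kept = list(range(items.shape[1]))
    best_alpha, best_kept = cronbach_alpha(items), list(kept)
    while len(kept) > min_items:
        r = corrected_item_total(items[:, kept])
        kept.pop(int(np.argmin(r)))              # remove lowest item-total correlation
        alpha = cronbach_alpha(items[:, kept])
        if alpha > best_alpha:
            best_alpha, best_kept = alpha, list(kept)
    return best_kept, best_alpha

# Simulated 0/1 responses (an assumption, not study data):
# 8 items driven by a common trait plus 4 items of pure noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=300)
related = (trait[:, None] + rng.normal(size=(300, 8)) > 0).astype(int)
noise = rng.integers(0, 2, size=(300, 4))
scores = np.hstack([related, noise])

kept, alpha = prune_items(scores)
print(f"alpha, all items: {cronbach_alpha(scores):.2f}")
print(f"alpha, pruned:    {alpha:.2f} using items {kept}")
```

Running the same function separately on each subsample would give the group-specific scales alluded to in the text, at the cost of scores that are no longer comparable across groups.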
Second, factor analysis was used as a pragmatic guide to potential subscale membership among the ICBM items. Decisions regarding the nature of the factor analysis were based on this strategy. Maximum likelihood extraction was used to allow generalisation from a sample to a population and for correlations with more unique variance and less error variance to be given more weight. Most importantly, adjustment is made for the constraints imposed on the data based on the increased potential for non-random measurement error associated with dichotomous variables.
This procedure resulted in an initial solution comprising 10 factors, each with an eigenvalue greater than one, although only 60% of the variance among items was accounted for by these factors. This result reflects the tendency for dichotomous variables to cluster together due to similar response distributions rather than actual item content, producing additional factors that are essentially statistical artefacts. Therefore, a conservative decision rule for retaining factors for rotation was employed. Parallel analysis criteria take account of both sample size and the number of items being analysed and are more reliable than the misunderstood 'eigenvalues greater than one' rule, particularly when there are many coefficients of modest size within the available correlation matrix (communality range 0.025 to 0.197). Further, an oblique rotation of the retained factors was undertaken using the oblimin criterion (delta = 0), to acknowledge the expected intercorrelations among any dimensions of the ICBM.
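The logic of parallel analysis can be illustrated with a short simulation. This is a hedged sketch, not the procedure used in the study: it substitutes a Monte Carlo comparison for the published parallel analysis criteria cited in the text, works on eigenvalues of the unreduced correlation matrix rather than a maximum likelihood solution, and draws its comparison data from a normal distribution even though the items are dichotomous. The function name, parameters and simulated data are all hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return (n_factors_to_retain, observed_eigenvalues, chance_thresholds)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    # Eigenvalues of the observed correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from random data with the same sample size and item count
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_sims, k))
    for s in range(n_sims):
        noise = rng.normal(size=(n, k))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[s] = np.sort(eigs)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Retain factors up to the first observed eigenvalue that fails to beat chance
    exceeds = observed > threshold
    n_retain = k if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold

# Simulated 0/1 items (an assumption, not study data): two latent factors,
# five items each, with a sample size matching the study (N = 348).
rng = np.random.default_rng(1)
factors = rng.normal(size=(348, 2))
latent = np.repeat(factors, 5, axis=1) + rng.normal(size=(348, 10))
items = (latent > 0).astype(int)

n_retain, observed, threshold = parallel_analysis(items)
print("factors retained:", n_retain)
print("observed eigenvalues:", np.round(observed[:4], 2))
print("chance thresholds:   ", np.round(threshold[:4], 2))
```

Because the thresholds depend on both the number of respondents and the number of items, the same observed eigenvalue can be retained in a large sample and rejected in a small one, which is the property that makes this rule more conservative than a fixed cut-off of one.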
Using this strategy only two factors were retained for rotation, accounting for a mere 19% of variance. Scales were nevertheless computed from these factors. Scale 1 (ICBMs1) comprised items predominantly concerned with insensitivity to sample size and representativeness (items 1, 4, 5, 6, 9, 10, 12, 14, 15, 16, 17, 20). Scale 2 (ICBMs2) tended to tap availability and representativeness (items 2, 3, 4, 6, 7, 10, 12, 13, 14, 15, 19). The substantial overlap in the content of these two scales should be noted (r = .68, p < .001). Interestingly, the five items that did not load on either or both of these scales were identical to those excluded by the item-total correlation analysis. Summary statistics for these scales are included in Table 3. For both scales, correct responses increased with experience. Internal consistency was 0.58 (0.52 students, 0.56 trainees, 0.61 consultants) for ICBMs1 and 0.58 (0.60 students, 0.47 trainees, 0.51 consultants) for ICBMs2.
Table 4. Correlations between ICBM scores and the Rational Experiential Inventory.
Rational Experiential Inventory
Need for Cognition
Faith in Intuition
To gain acceptance as a useful measurement instrument, any test should demonstrate a number of characteristics. Three such characteristics have been reported in the current study. First, the test should possess content validity. That is, it should contain items that appropriately sample the construct to be measured. Second, there should be internal consistency among the chosen items. That is, items purporting to measure the same construct are expected to demonstrate reasonable item-total correlations before they are used to create a summative scale. This is commonly examined using α. Third, the instrument should demonstrate a reasonable level of construct validity. That is, different measurement tools measuring constructs from similar domains should exhibit relationships that are in accord with underlying theory.
The data presented suggest that the ICBM potentially fails all of these examinations. Considering the last of these first, only modest evidence was available of theoretically-supported links between the ICBM and scores from the REI. Such evidence was best illustrated by presenting separate coefficients of association for each of the subsamples. These relationships could generously be described as encouraging for the student group, tenuous among trainees, and non-existent for consultants. Admittedly, construct validity for the ICBM could only be examined using the REI. It is possible that other instruments from the thinking dispositions domain may offer more encouraging results. Nevertheless, we have found in a series of studies [27, 28], some as yet unpublished, that the REI is reliable, valid and predictive, and as such is an entirely sensible standard with which to compare the ICBM. More probable is that the poor associations between the ICBM and REI are artefacts, stemming from the similarly poor internal reliability coefficients that we have presented. No coefficient achieved the level of internal consistency normally considered acceptable for mature scales (.70), although most reached the level deemed respectable for scales under development (.50–.60).
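The suggestion that low reliability depresses the observed associations can be made concrete with the classical attenuation relation. The worked figures below are illustrative only: they assume classical test theory, treat α as the reliability estimate, and use a hypothetical true correlation of 0.30; the notation is introduced here and does not appear in the original analysis.

```latex
r_{xy}^{\mathrm{obs}} \;=\; \rho_{xy}\,\sqrt{r_{xx}\,r_{yy}}

% ICBM22 (alpha = 0.57) against total need for cognition (alpha = 0.90):
\sqrt{0.57 \times 0.90} \;=\; \sqrt{0.513} \;\approx\; 0.72

% A hypothetical true correlation of 0.30 would therefore be observed as roughly
0.30 \times 0.72 \;\approx\; 0.21
```

On these assumptions, unreliability alone would shrink any true association between the two instruments by a little over a quarter before sampling error is considered.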
Failure to attain more appropriate internal consistency draws attention to the very content of the ICBM. Interestingly, the original evidence supporting the validity of the ICBM merely comprised a content review by experts. Yet there remains inherent uncertainty regarding exactly which heuristics or biases are assessed by each ICBM item. The authors identified 35 biases that were addressed by the 22 items (Table 1), without specific direction as to which item tapped which bias(es). Clearly some items tap more than one bias, there is selective bias (sic) among the biases chosen for measurement, and also significant overlap across some biases concerning the logical error committed by choosing the incorrect response. Unfortunately these observations mean that it remains unknown whether items address "heuristic reasoning", "rational reasoning", or indeed both.
The above comments help to explain the pattern of results obtained from both the item-total correlation analysis and the factor analysis. The five items removed during the former procedure were extrapolated by us as tapping relatively unique biases assessed by only one or two items (misconceptions of chance, framing/anchoring, insensitivity to superior reliability of objective over subjective data, insensitivity to the principle of regression). These five items were equally ineffective in demonstrating shared variance in the factor analysis. The two factors identified accounted for little variance themselves, and there was significant overlap of item content (insensitivity to sample size/representativeness, availability/representativeness). On the one hand this result may reflect the lack of one-to-one measurement of biases among some items, as noted above, while on the other hand perhaps it simply mirrors the biases that are most frequently assessed among the ICBM items.
Rather than measure a thinking style or styles per se, our results also suggest that it is more likely that the ICBM in fact taps a relevant knowledge base that increases with professional experience. While all other results are equivocal, there is clear evidence (Table 3) that correct responses to all versions of the ICBM presented increase significantly with medical experience. Interestingly, and perhaps most importantly, correlations between the ICBM and REI for students (Table 4) further suggest that those who are naïve to the knowledge base required to "successfully" respond to the ICBM may nevertheless profit by a thinking style in tune with logical reasoning (i.e., relatively high need for cognition and/or relatively low faith in intuition). This appears particularly true for those naïve participants who expressed favouring such a thinking style. These observations are in accord with one of the original propositions underlying the development of the ICBM, which suggested its use as a potential teaching tool.
It is unlikely that improvement of the ICBM can be achieved without considered attention to both the audience for which it is designed and a careful analysis and revision of the items themselves. The latter may need to involve both removal of some scenarios deemed to measure multiple biases and the addition of new items in the attempt to more appropriately and fully survey the range of biases now understood to compromise decision making in the medical domain. Such efforts, while substantial, represent the logical prerequisite to the establishment of content validity for the ICBM. Nevertheless, such efforts may yet prove fruitful given the contemporary interest in the role of cognition in medical decision making.
ICBM: Inventory of Cognitive Bias in Medicine
ICBM22: Original 22 question version
ICBM10: 10 question version found by the authors in an earlier study
ICBM17: 17 question version based on iterative removal of those items with the lowest item-total correlation
ICBMs1: Scale 1 identified through factor analysis
ICBMs2: Scale 2 identified through factor analysis
REI: Rational Experiential Inventory
Ruth Sladek is a National Institute of Clinical Studies (NICS) Scholar. NICS is an institute of the National Health and Medical Research Council (NHMRC), Australia's peak body for supporting health and medical research.
1. Rimer MK, Glantz K: Theory at a glance: a guide for health promotion practice. 2005, United States: Department of Health and Human Services, 2nd edition.
2. Denis JL, Hebert Y, Langley A, Lozeau D, Trottier LH: Explaining diffusion patterns for complex health care innovations. Health Care Manage Rev. 2002, 27: 60-73.
3. Gurmankin AD, Baron J, Hershey JC, Ubel PA: The role of physicians' recommendations in medical treatment decisions. Med Decis Making. 2002, 22: 262-271. 10.1177/02789X02022003008.
4. Bonetti D, Pitts NB, Eccles M, Grimshaw J, Johnston M, Steen N, Glidewell L, Thomas R, Maclennan G, Clarkson JE, Walker A: Applying psychological theory to evidence-based clinical practice: identifying factors predictive of taking intra-oral radiographs. Soc Sci Med. 2006, 63: 1889-1999. 10.1016/j.socscimed.2006.04.005.
5. Walker AE, Grimshaw J, Johnston M, Pitts N, Steen N, Eccles M: PRIME: Process modelling in ImpleMEntation research. BMC Health Serv Res. 2003, 3: 22. 10.1186/1472-6963-3-22.
6. Croskerry P: The cognitive imperative: thinking about how we think. Acad Emerg Med. 2000, 7: 1223-1231. 10.1111/j.1553-2712.2000.tb00467.x.
7. Croskerry P: The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003, 78: 775-780. 10.1097/00001888-200308000-00003.
8. Brehaut JC, Poses R, Shojania KG, Lott A, Man-Son-Hing M, Bassin E, Grimshaw J: Do physician outcome judgments and judgment biases contribute to inappropriate use of treatments? Study protocol. Implement Sci. 2007, 2: 18. 10.1186/1748-5908-2-18.
9. Sladek RM, Phillips PA, Bond MJ: Parallel dual processing models of reasoning: a role for implementation science?. Implement Sci. 2006, 1: 12. 10.1186/1748-5908-1-12.
10. Pacini R, Epstein S: The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. J Pers Soc Psychol. 1999, 76: 972-987. 10.1037/0022-3514.76.6.972.
11. Stanovich KE: Who is rational? Studies in individual differences in reasoning. 1999, Mahwah, New Jersey: Lawrence Erlbaum.
12. Myers IB, McCaulley MH, Quenk NL, Hammer AL: MBTI manual: a guide to the development and use of the Myers-Briggs Type Indicator. 1998, Palo Alto: CPP, 3rd edition.
13. Hershberger PJ, Part HM, Markert RJ, Cohen SM, Finger WW: Development of a test of cognitive bias in medical decision making. Acad Med. 1994, 69: 839-842. 10.1097/00001888-199410000-00014.
14. Kahneman D, Slovic P, Tversky A: Judgement under uncertainty: heuristics and biases. 1982, Cambridge: Cambridge University Press.
15. Gilovich T, Griffin D, Kahneman D, eds: Heuristics and biases: the psychology of intuitive judgment. 2002, Cambridge: Cambridge University Press.
16. Hershberger PJ, Markert RJ, Part HM, Cohen SM, Finger WW: Understanding and addressing cognitive bias in medical education. Advances in Health Science Education. 1997, 1: 221-226. 10.1023/A:1018372327745.
17. Kahneman D: A perspective on judgment and choice: mapping bounded rationality. Am Psychol. 2003, 58: 697-720. 10.1037/0003-066X.58.9.697.
18. Sloman S: The empirical case for two systems of reasoning. Psychol Bull. 1996, 119: 3-22. 10.1037/0033-2909.119.1.3.
19. Evans J, Over D: Rationality and reasoning. 1996, UK: Psychology Press.
20. Stanovich KE, West RF: Individual differences in reasoning: implications for the rationality debate?. Behav Brain Sci. 2000, 23: 645-726. 10.1017/S0140525X00003435.
21. Shafir E, LeBoeuf RA: Rationality. Annu Rev Psychol. 2002, 53: 491-517. 10.1146/annurev.psych.53.100901.135213.
22. Kim J, Mueller CW: Factor analysis: Statistical methods and practical issues. 1985, Newbury Park, CA: Sage.
23. Gorsuch RL: Factor analysis. 1983, Hillsdale, NJ: Lawrence Erlbaum, 2nd edition.
24. Bernstein I, Garbin C, Teng G: Applied multivariate analysis. 1988, NY: Plenum.
25. Lautenschlager GJ: A comparison of alternatives to conducting Monte Carlo analyses for determining parallel analysis criteria. Multivariate Behav Res. 1989, 24: 365-395. 10.1207/s15327906mbr2403_6.
26. Cliff N: The eigenvalues-greater-than-one rule and the reliability of components. Psychol Bull. 1988, 103: 276-279. 10.1037/0033-2909.103.2.276.
27. Sladek RM, Bond MJ, Phillips PA: Thinking styles and doctors' knowledge and behaviours relating to acute coronary syndrome guidelines. Implement Sci. 2008, 3: 23. 10.1186/1748-5908-3-23.
28. Sladek RM, Bond MJ, Phillips PA: Why don't doctors wash their hands? A correlational study of thinking dispositions and hand hygiene. Am J Infect Control.
29. Nunnally JC: Psychometric theory. 1978, NY: McGraw-Hill, 2nd edition.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/8/20/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.