Assessment of the potential impact of a reminder system on the reduction of diagnostic errors: a quasi-experimental study
© Ramnarayan et al; licensee BioMed Central Ltd. 2006
Received: 25 September 2005
Accepted: 28 April 2006
Published: 28 April 2006
Computerized decision support systems (DSS) have mainly focused on improving clinicians' diagnostic accuracy in unusual and challenging cases. However, since diagnostic omission errors may predominantly result from incomplete workup in routine clinical practice, the provision of appropriate patient- and context-specific reminders may have a greater impact on patient safety. In this quasi-experimental study, a mix of easy and difficult simulated cases was used to assess the impact of a novel diagnostic reminder system (ISABEL) on the quality of clinical decisions made by various grades of clinicians during acute assessment.
Subjects of different grades (consultants, registrars, senior house officers and medical students) assessed a balanced set of 24 simulated cases on a trial website. Subjects recorded their clinical decisions for the cases (differential diagnosis, test-ordering and treatment) before and after system consultation. A panel of two pediatric consultants independently provided gold standard responses for each case, against which the quality of subjects' decisions was measured. The primary outcome measure was the change in the count of diagnostic errors of omission (DEO). A more sensitive assessment of the system's impact was achieved using specific quality scores; the additional consultation time resulting from DSS use was also calculated.
76 subjects (18 consultants, 24 registrars, 19 senior house officers and 15 students) completed a total of 751 case episodes. The mean count of DEO fell from 5.5 to 5.0 across all subjects (repeated measures ANOVA, p < 0.001); no significant interaction was seen with subject grade. Mean diagnostic quality score increased after system consultation (0.044; 95% confidence interval 0.032, 0.054). ISABEL reminded subjects to consider at least one clinically important diagnosis in 1 in 8 case episodes, and prompted them to order an important test in 1 in 10 case episodes. Median extra time taken for DSS consultation was 1 min (IQR: 30 sec to 2 min).
The provision of patient- and context-specific reminders has the potential to reduce diagnostic omissions across all subject grades for a range of cases. This study suggests a promising role for the use of future reminder-based DSS in the reduction of diagnostic error.
A recent Institute of Medicine report has brought the problem of medical error under intense scrutiny. While the use of computerized prescription software has been shown to substantially reduce the incidence of medication-related error [2, 3], few solutions have demonstrated a similar impact on diagnostic error. Diagnostic errors impose a significant burden on modern healthcare: they account for a large proportion of medical adverse events in general [4–6], and are the second leading cause of malpractice suits against hospitals. In particular, diagnostic errors of omission (DEO) during acute medical assessment, resulting from cognitive biases such as 'premature closure' and 'confirmation bias', lead to incomplete diagnostic workup and 'missed diagnoses'. This is especially relevant in settings such as family practice, as well as in hospital areas such as the emergency room and critical care [11, 12]; in a recent survey, 20% of patients discharged from emergency rooms reported concerns that their clinical assessment had been complicated by diagnostic error. The use of clinical decision-support systems (DSS) has been one of many strategies proposed for the reduction of diagnostic errors in practice. Consequently, a number of DSS have been developed over the past few years to assist clinicians during the process of medical diagnosis [14–16].
Even though studies of several diagnostic DSS have demonstrated improved physician performance in simulated (and rarely real) patient encounters [17, 18], two specific characteristics, their intended purpose and their design, may have contributed to their infrequent use in routine practice. Many general diagnostic DSS were built as 'expert systems' to solve diagnostic conundrums and provide the correct diagnosis during a 'clinical dead-end'. Since true diagnostic dilemmas are rare in practice, and the initiative for DSS use had to originate from the physician, diagnostic advice was not sought routinely, particularly since clinicians prefer to store the patterns needed to solve medical problems in their heads. There is, however, evidence that clinicians frequently underestimate their need for diagnostic assistance, and that the perception of diagnostic difficulty does not correlate with their clinical performance. In addition, given the demands of the information era, diagnostic errors may not be restricted to cases perceived as difficult, and might occur even when dealing with common problems in a stressful environment under time pressure. Further, most 'expert systems' used a design in which clinical data were entered through a controlled vocabulary specific to each DSS. This process frequently took more than 15 minutes, contributing to infrequent use in a busy clinical environment. These 'expert systems' also provided between 20 and 30 diagnostic possibilities, with detailed explanations, leading to a lengthy DSS consultation process.
To significantly affect the occurrence of diagnostic error, it seems reasonable to conclude that DSS advice must be readily available, and sought, during most clinical encounters, even if the perceived need for diagnostic assistance is minor. Ideally, real-time diagnostic advice can be provided actively by integrating a diagnostic DSS into an existing electronic medical record (EMR), as has been attempted in the past [27, 28]. However, the currently limited uptake in clinical practice of EMRs capable of recording sufficient narrative clinical detail indicates that a stand-alone system may prove much more practical in the medium term. The key characteristic of a successful system would be the ability to deliver reliable diagnostic reminders rapidly, following a brief data entry process, in most clinical situations. ISABEL (ISABEL Healthcare, UK) is a novel Web-based pediatric diagnostic reminder system that suggests important diagnoses during clinical assessment [30, 31]. The development of the system and its underlying structure have been described in detail previously [32, 33]. The main hypotheses underlying the development of ISABEL were that the provision of diagnostic reminders generated after a brief free-text data entry session would promote user uptake, and would improve the quality of diagnostic decision making in acute medical settings. The reminders provided (a set of 10 in the first instance) aimed to remind clinicians of important diagnoses that they might have missed in the workup. Data entry is by means of natural language descriptions of the patient's clinical features, including any combination of symptoms, signs and test results. The system's knowledge base consists of natural language text descriptions of more than 5000 diseases, in contrast to most 'expert systems', which use complex disease databases [34–36]. The advantages and trade-offs of these differences in system design have been discussed in detail elsewhere.
In summary, although the ability to rapidly enter patient features in natural language to derive a short-list of diagnostic suggestions may allow frequent use by clinicians during most patient encounters, variability resulting from the use of natural language for data entry, and the absence of probability ranking, may compromise the accuracy and usefulness of the diagnostic suggestions.
The overall evaluation of the ISABEL system was planned systematically as a series of consecutive studies:
a) An initial clinical performance evaluation: This would evaluate the feasibility of providing relevant diagnostic suggestions for a range of cases when data are entered in natural language. System accuracy, speed and relevance of suggestions were studied.
b) An assessment of the impact of the system in a quasi-experimental setting: This would examine the effects of diagnostic decision support on subjects using simulated cases.
c) An assessment of the impact of the system in a real life setting: This would examine the effects of diagnostic advice on clinicians in real patients in their natural environment.
In the initial performance evaluation, the ISABEL system formed the unit of intervention, and the quality of its diagnostic suggestions was validated against data drawn from 99 hypothetical cases and 100 real patients. Key findings from the cases were entered into the system in free text by one of the developers. The system included the final diagnosis in 95% of the cases. This design was similar to early evaluations of a number of other individual diagnostic DSS [40–42], as well as to a large study assessing the performance characteristics of four expert diagnostic systems. Since this step studied ISABEL in isolation, and did not include users uninvolved in the development of the system, it was vital in the subsequent step to examine the DSS's impact on decision making, by demonstrating that the clinician-DSS combination functioned better than either the clinician or the system working alone [44, 45]. Evaluation of impact is especially relevant to ISABEL: despite good system performance when tested in isolation, clinicians may not benefit from its advice, either because variability in user data entry leads to poor results, or because the lack of ranking makes it difficult to distinguish between diagnostic suggestions. A previous evaluation of Quick Medical Reference (QMR) assessed a group of clinicians working through a set of difficult cases, and suggested that the extent of benefit gained by different users varied with their level of experience.
In this study, we aimed to perform an impact evaluation of ISABEL in a quasi-experimental setting in order to quantify the effects of diagnostic advice on the quality of clinical decisions made by various grades of clinicians during acute assessment, using a mix of easy and difficult simulated cases drawn from all pediatric sub-specialties. The study design was based on an earlier evaluation of the impact of ILIAD and QMR on diagnostic reasoning in a simulated environment. Our key outcome measure focused on the appropriateness of decisions during diagnostic workup rather than on accuracy in identifying the correct diagnosis. The validity of textual case simulations has previously been demonstrated in medical education exercises, and during the assessment of mock clinical decision making [50, 51].
The simulated field study involved recording subjects' clinical decisions regarding diagnoses, test-ordering and treatment for a set of simulated cases, both before and immediately after DSS consultation. The impact of diagnostic reminders was determined by measuring changes in the quality of decisions made by subjects. In this study, the quality of ISABEL's diagnostic suggestion list per se was not examined. The study was coordinated at Imperial College School of Medicine, St Mary's Hospital, London, UK between February and August 2002. The study was approved by the Local Research Ethics Committee.
A convenience sample consisting of pediatricians of different grades (senior house officers [interns], registrars [residents] and consultants [attending physicians] from different geographical locations across the UK), and final year medical students, was enrolled for the study. All students were drawn from one medical school (Imperial College School of Medicine, London, UK). Clinicians were recruited by invitation from the ISABEL registered user database, which consisted of a mixture of regular users as well as pediatricians who had never used the system after registration. After a short explanation of the study procedure, all subjects who consented to take part were included in the sample.
Cases were drawn from a pool of 72 textual case simulations, constructed by one investigator, based on case histories of real children presenting to emergency departments (data collected during earlier evaluation). Each case was limited to between 150 and 200 words, and only described the initial presenting symptoms, clinical signs and basic laboratory test results in separate sections. Since the clinical data were collected from pediatric emergency rooms, the amount of clinical information available at assessment was limited but typical for this setting. Ample negative features were included in order to prevent the reader from picking up positive cues from the text. These cases were then classified into one of 12 different pediatric sub-specialties (e.g. cardiology, respiratory) and to one of 3 case difficulty levels within each specialty (1-unusual, 2-not unusual, and 3-common clinical presentation, with reference to UK general hospital pediatric practice) by the author. This allocation process was duplicated by a pediatric consultant working independently. Both investigators assigned 57 cases to the same sub-specialty and 42 cases to both the same sub-specialty and the same level of difficulty (raw agreement 0.79 and 0.58 respectively). From the 42 cases in which both investigators agreed regarding the allocation of both specialty and level of difficulty, 24 cases were drawn such that a pair of cases per sub-specialty representing two different levels of difficulty (level 1 & 2, 1 & 3 or 2 & 3) was chosen for the final case mix. This process ensured a balanced set of cases representing all sub-specialties and comprising easy as well as difficult cases.
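The agreement figures above are simple proportions of the 72 cases, and the final case mix is one pair per sub-specialty at two different difficulty levels. The sketch below reproduces the agreement arithmetic and shows one possible way to draw such pairs; `pick_balanced_pairs` and its tuple layout are hypothetical, since the text does not say how the pairs were drawn from the 42 agreed cases.

```python
import random
from collections import defaultdict


def raw_agreement(n_agree: int, n_total: int) -> float:
    """Proportion of items to which two raters assigned the same category."""
    return n_agree / n_total


# Figures reported in the text: 72 cases classified by two investigators.
specialty_agreement = raw_agreement(57, 72)  # same sub-specialty
full_agreement = raw_agreement(42, 72)       # same sub-specialty and difficulty


def pick_balanced_pairs(cases, seed=None):
    """Choose one pair of cases per sub-specialty at two different difficulty levels.

    `cases` is a list of (case_id, specialty, level) tuples with level in {1, 2, 3}.
    Hypothetical helper: the random draw is an assumption, not the study's procedure.
    """
    rng = random.Random(seed)
    by_specialty = defaultdict(lambda: defaultdict(list))
    for case_id, specialty, level in cases:
        by_specialty[specialty][level].append(case_id)

    selection = {}
    for specialty, levels in by_specialty.items():
        available = [lvl for lvl in (1, 2, 3) if levels[lvl]]
        if len(available) < 2:
            raise ValueError(f"{specialty}: fewer than two difficulty levels available")
        # Pick two distinct levels (1 & 2, 1 & 3 or 2 & 3), then one case at each.
        lvl_a, lvl_b = sorted(rng.sample(available, 2))
        selection[specialty] = (rng.choice(levels[lvl_a]), rng.choice(levels[lvl_b]))
    return selection
```

Rounded to two decimals, the two proportions give the 0.79 and 0.58 quoted above.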
Data collection website
A customized, password protected version of ISABEL was used to collect data during the study. This differed from the main website in that it automatically displayed the study cases to each subject in sequence, assigned each case episode a unique study number, and recorded time data in addition to displaying ten diagnostic suggestions. Three separate text boxes were provided to record subjects' clinical decisions (diagnoses, tests and treatment) pre- and post-DSS consultation. The use of the customized trial website ensured that subjects proceeded from one step to the next without being able to skip steps or revise clinical decisions already submitted.
Training was intended only to familiarize subjects with the trial website. During training, all subjects were assigned a unique log-in name and password, and one sample case as practice material. Practice sessions involving medical students were supervised by one investigator in group sessions of 2–3 subjects each. Pediatricians (being from geographically disparate locations) were not supervised during training, but received detailed instructions regarding the use of the trial website by email. Context-specific help was provided at each step on the website for assistance during the practice session. All subjects completed their assigned practice case and were recruited for the study.
Each subject was presented with 12 cases, such that one of the pair drawn from each sub-specialty was displayed. Cases were presented in random order, irrespective of sub-specialty. Subjects could terminate their session at any time and return to complete the remaining cases. If a session was terminated midway through a case, that case was presented again on the subject's return. If the website detected no activity for more than 2 hours, the subject was automatically logged off, and the session was continued on their return. All subjects had 3 weeks to complete their assigned 12 cases. Since each case was used more than once, by different subjects, we termed each attempt by a subject at a case a 'case episode'.
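A minimal sketch of the presentation rules described above, assuming each subject sees one randomly chosen member of each sub-specialty pair (the text does not state how that member was chosen). `assign_cases` and `next_case` are illustrative names, not part of the trial website.

```python
import random


def assign_cases(pairs, seed=None):
    """Return one case per sub-specialty pair, in random presentation order.

    `pairs` maps sub-specialty -> (case_a, case_b). Choosing the pair member at
    random is an assumption, not the study's documented allocation rule.
    """
    rng = random.Random(seed)
    chosen = [rng.choice(pair) for pair in pairs.values()]
    rng.shuffle(chosen)  # presentation order ignores sub-specialty
    return chosen


def next_case(assigned, completed):
    """First assigned case not yet submitted; an interrupted case is shown again."""
    for case_id in assigned:
        if case_id not in completed:
            return case_id
    return None  # all assigned cases have been completed
```

Because only submitted cases enter `completed`, a case abandoned midway is simply returned again when the subject resumes.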
We aimed to assess whether the provision of key diagnostic reminders would reduce errors of omission in the simulated environment. For the purposes of this study, a subject was defined as having committed a DEO in a case episode if they omitted one or more of the 'clinically important diagnoses' from the diagnostic workup (rather than failing to include the 'correct diagnosis'). A diagnosis was judged 'clinically important' if an expert panel working the case independently decided that it had to be included in the workup to ensure safe and appropriate clinical decision making, i.e. it would significantly affect patient management and/or course, and failure to consider it would be construed as clinically inadequate. The expert panel comprised two general pediatricians with more than 3 years of consultant-level experience. The 'clinically important' diagnoses suggested by the panel thus included the 'most likely diagnosis/es' and other key diagnoses; they did not constitute a full differential containing all plausible diagnoses. This outcome variable was quantified by a binary measure (for each case episode, a subject either committed a DEO or did not). Errors of omission were defined for tests and treatments in similar fashion.
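The binary DEO measure can be expressed as a subset test: a case episode counts as an omission error when any panel-defined 'clinically important' item is missing from the workup. This sketch assumes free-text entries have already been mapped onto the panel's labels, a step the study performed by judgment rather than code; `omission_error` is a hypothetical name, and the same test applies to tests and treatments.

```python
def omission_error(workup, important):
    """True if the workup omits at least one panel-defined 'clinically important' item.

    Both arguments are collections of normalized labels (an assumption: matching
    free-text entries to panel diagnoses is done beforehand).
    """
    return not set(important) <= set(workup)


panel = {"meningitis", "urinary tract infection"}
pre_dss = ["viral illness", "meningitis"]
post_dss = pre_dss + ["urinary tract infection"]
```

Here the pre-DSS workup counts as a DEO, while the post-DSS workup does not; extra diagnoses beyond the panel's set never create an omission.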
Primary outcome measure
1. Change in the number of diagnostic errors of omission among subjects.
Secondary outcome measures
1. Mean change in subjects' diagnostic, test-ordering and treatment plan quality scores.
2. Change in the number of irrelevant diagnoses contained within the diagnostic workup.
3. Proportion of case episodes in which at least one additional 'important' diagnosis, test or treatment decision was considered by the subject after DSS consultation.
4. Additional time taken for DSS consultation.
Subjects were used as the unit of analysis for the primary outcome measure. For each subject, the total number of DEOs was counted separately for the pre- and post-DSS diagnostic workup plans; only subjects who had completed all assigned cases were included in this calculation. Statistically significant changes in DEO count following DDSS consultation, and their interaction with grade, were assessed by two-way mixed-model analysis of variance (grade being the between-subjects factor and time the within-subjects factor). The mean number of DEOs was calculated for each subject grade, and DEOs were additionally analyzed according to level of case difficulty. Statistical significance was set at a p value of 0.05.
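The counting and analysis steps above might look like the following sketch. `deo_totals` and `mixed_anova_two_occasions` are hypothetical helpers, and the episode field names are assumptions. Because time has only two levels (pre and post), the within-subject tests of a two-way mixed ANOVA reduce to analyses of each subject's difference score: the time x grade F equals a one-way ANOVA F on the differences, and the time F below is the subject-weighted (Type I) form, which may differ from the Type III result some statistics packages report when grade groups are unequal in size. p-values would require an F distribution (for example `scipy.stats.f.sf`), which this standard-library sketch omits.

```python
from statistics import mean


def deo_totals(episodes, n_assigned=12):
    """Sum pre/post DEO indicators per subject, keeping only completers.

    `episodes` holds dicts with hypothetical keys: subject, grade, deo_pre, deo_post.
    """
    totals, counts = {}, {}
    for ep in episodes:
        key = (ep["subject"], ep["grade"])
        pre, post = totals.get(key, (0, 0))
        totals[key] = (pre + int(ep["deo_pre"]), post + int(ep["deo_post"]))
        counts[key] = counts.get(key, 0) + 1
    return {key: tot for key, tot in totals.items() if counts[key] == n_assigned}


def mixed_anova_two_occasions(by_grade):
    """F statistics for time and time x grade in a two-occasion mixed design.

    `by_grade` maps grade -> list of (pre, post) totals, one pair per subject.
    Returns {effect: (F, df_effect, df_error)}.
    """
    diffs = {g: [post - pre for pre, post in pairs] for g, pairs in by_grade.items()}
    group_means = {g: mean(ds) for g, ds in diffs.items()}
    all_d = [d for ds in diffs.values() for d in ds]
    n, k = len(all_d), len(diffs)
    grand = mean(all_d)  # subject-weighted mean change

    ss_between = sum(len(ds) * (group_means[g] - grand) ** 2 for g, ds in diffs.items())
    ss_within = sum((d - group_means[g]) ** 2 for g, ds in diffs.items() for d in ds)
    df_error = n - k
    ms_error = ss_within / df_error

    f_time = n * grand ** 2 / ms_error
    f_interaction = (ss_between / (k - 1)) / ms_error
    return {"time": (f_time, 1, df_error), "time_x_grade": (f_interaction, k - 1, df_error)}
```

Grouping the output of `deo_totals` by grade gives the `by_grade` input; the same function applies unchanged to per-subject mean quality scores.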
Subjects were used as the unit of analysis for the change in mean quality scores (the development and validation of the quality scores have been described previously; however, the scores had not been used as an outcome measure before this evaluation). In the first step, each subject's quality score (pre- and post-DSS) was calculated for each case episode. For each subject, a mean quality score across all 12 cases was then computed. Only case episodes from subjects who completed all 12 assigned cases were used in this calculation. A two-way mixed model ANOVA (grade as between-subjects factor; time as within-subjects factor) was used to examine statistically significant differences in quality scores. This analysis was performed for diagnostic quality scores as well as for test-ordering and treatment plan scores. Data from a pilot study suggested that 64 subjects were needed to demonstrate a mean diagnostic quality score change of 0.03 (standard deviation 0.06, power 80%, level of significance 5%).
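The pilot-based sample size can be checked with normal-approximation formulas. The text does not say which formula was used, so this is only a sketch: the standardized effect is 0.03/0.06 = 0.5, and the figure of 64 coincides with the two-independent-group formula plus a common small-sample correction term (z^2/4), whereas a paired pre/post formula gives a smaller number. Function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_paired(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a paired (pre/post) mean difference."""
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


def n_two_groups(delta, sd, alpha=0.05, power=0.80, small_sample_correction=True):
    """Per-group size for comparing two independent means.

    The optional z_{alpha/2}^2 / 4 term is a common adjustment that brings the
    normal approximation closer to the t-test answer.
    """
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    n = 2 * ((za + zb) * sd / delta) ** 2
    if small_sample_correction:
        n += za ** 2 / 4
    return ceil(n)
```

With the pilot figures, `n_paired(0.03, 0.06)` gives 32 and `n_two_groups(0.03, 0.06)` gives 64.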
Using subjects as the unit of analysis, the mean count of diagnoses (and irrelevant diagnoses) included in the workup was calculated pre- and post-DSS consultation for each subject as an average across all case attempts. Only subjects who attempted all assigned cases were included in this analysis. Using this data, a mean count for diagnoses (and irrelevant diagnoses) was calculated for each subject grade. A two-way mixed model ANOVA was used to assess statistically significant differences in this outcome with respect to grade as well as occasion. Using case episodes as the unit of analysis, the proportion of case episodes in which at least one additional 'important' diagnosis, test or treatment was prompted by ISABEL was determined. The proportion of case episodes in which at least one clinically significant decision was deleted, and at least one inappropriate decision was added, after system consultation, was also computed. All data were analyzed separately for the subjects' grades.
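The episode-level proportions described above amount to set differences between the pre- and post-consultation decision lists. In this sketch each episode holds hypothetical `pre`, `post`, `important` and optional `inappropriate` sets of normalized labels; which decisions counted as important or inappropriate came from the expert panel, not from code, and `consultation_effects` is an illustrative name.

```python
def consultation_effects(episodes):
    """Proportion of case episodes with each kind of change after DSS consultation."""
    n = len(episodes)
    added_important = deleted_important = added_inappropriate = 0
    for ep in episodes:
        added = ep["post"] - ep["pre"]    # decisions that appeared after consultation
        deleted = ep["pre"] - ep["post"]  # decisions that were dropped
        if added & ep["important"]:
            added_important += 1
        if deleted & ep["important"]:
            deleted_important += 1
        if added & ep.get("inappropriate", set()):
            added_inappropriate += 1
    return {
        "added_important": added_important / n,
        "deleted_important": deleted_important / n,
        "added_inappropriate": added_inappropriate / n,
    }
```

Running it separately on each grade's episodes gives the per-grade breakdown.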
Two further analyses were conducted to aid the interpretation of our results. First, to allow direct comparison of our results with other studies, we used case episodes as the unit of analysis and examined the presence of the 'most likely diagnosis' in the diagnostic workup. The 'most likely diagnosis' was part of the set of 'clinically important' diagnoses provided by the panel, and represented the closest match to a 'correct' diagnosis in our study design. This analysis was conducted separately for each grade. Second, since it was important to verify whether any reduction in omission errors was directly prompted by ISABEL, or simply by subjects re-thinking the assigned cases, all case episodes in which the user added at least one additional significant diagnosis were examined. If the diagnosis added by the user had been displayed in the DSS list of suggestions, this strongly suggested that the system, rather than the subject's re-thinking, prompted the addition.
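The attribution check reduces to asking whether each newly added important diagnosis appeared among the suggestions the DSS displayed. A minimal sketch with a hypothetical function name, assuming diagnoses have been normalized to matching labels:

```python
def attribute_additions(pre, post, important, suggestions):
    """Split important diagnoses added after consultation by whether the DSS displayed them.

    Returns (shown, not_shown): additions present in the DSS suggestion list, and
    additions the subject must have reached some other way (e.g. re-thinking).
    """
    added_important = (set(post) - set(pre)) & set(important)
    shown = added_important & set(suggestions)
    return shown, added_important - shown
```

An empty `not_shown` set across all episodes corresponds to the finding that every addition had been suggested by the system.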
Study participants, cases and case episodes, by grade of subject: subjects invited to participate; subjects who attempted at least one case (attempters); subjects who attempted at least six cases; subjects who completed all 12 cases (completers); regular DSS users (usage > once/week).
Diagnostic errors of omission
Mean count of diagnostic errors of omission (DEO) pre-ISABEL and post-ISABEL consultation, by grade of subject: DEO pre-ISABEL (SD), DEO post-ISABEL (SD), and mean DEO across all subjects (n = 52).
Mean DEO count analyzed by level of case and subject grade
Mean quality score changes
Mean quality scores for diagnoses, by grade of subject: mean pre-ISABEL score, mean post-ISABEL score, mean score change, and weighted average across all subjects.
Number of irrelevant diagnoses
Increase in the average number of diagnoses and of irrelevant diagnoses before and after DSS advice, by grade of subject.
Additional diagnoses, tests and treatment decisions
Number of case episodes in which clinically 'important' decisions were prompted by ISABEL consultation: number of case episodes in which at least one 'significant' decision was prompted by ISABEL, and total number of individual 'significant' decisions prompted by ISABEL.
All 751 case episodes were used to examine the presence of the 'most likely diagnosis'. Overall, the 'most likely diagnosis/es' were included in the pre-DSS diagnostic workup in 507/751 (67.5%) case episodes. This increased to 561/751 (74.7%) case episodes after DSS advice. The improvement was fully attributable to positive consultation effects (where the 'most likely diagnosis' was absent pre-DSS but present post-DSS); no negative consultations were observed. Diagnostic accuracy pre-ISABEL was greatest for consultants (73%) and lowest for medical students (57%). Medical students gained the most from DSS advice (an absolute increase of 10 percentage points). The analysis of whether ISABEL was responsible for the observed changes in diagnostic error showed that every additional diagnosis had indeed been present in the system's list of diagnostic suggestions.
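The percentages above follow from episode-level counts. The sketch below tallies positive consultations (absent pre-DSS, present post-DSS) and negative ones (the reverse). The per-episode list in the usage lines is rebuilt only from the reported totals (507 present before, 561 after, no negative consultations, hence 54 positive and 190 absent throughout), not from study data; `consultation_summary` is an illustrative name.

```python
def consultation_summary(episodes):
    """Summarize presence of the 'most likely diagnosis' before and after DSS advice.

    `episodes` is a list of (present_pre, present_post) booleans, one per case episode.
    """
    n = len(episodes)
    pre = sum(p for p, _ in episodes)
    post = sum(q for _, q in episodes)
    positive = sum(1 for p, q in episodes if not p and q)  # gained after advice
    negative = sum(1 for p, q in episodes if p and not q)  # lost after advice
    return {
        "pre_pct": round(100 * pre / n, 1),
        "post_pct": round(100 * post / n, 1),
        "positive": positive,
        "negative": negative,
    }


# Episode list rebuilt from the reported totals only (751 episodes).
reported = [(True, True)] * 507 + [(False, True)] * 54 + [(False, False)] * 190
summary = consultation_summary(reported)
```

With no negative consultations, the net gain (561 − 507 = 54 episodes) equals the number of positive consultations.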
Time taken to process case simulations broken down by grade of subject
Median time pre-ISABEL
Median time post-ISABEL
5 min 5 sec
5 min 45 sec
5 min 54 sec
8 min 36 sec
3 min 42 sec
6 min 2 sec (IQR: 4:03 – 9:47)
1 min (IQR: 30 sec – 2:04)
We have shown in this study that errors of omission occur frequently during diagnostic workup in an experimental setting, including in cases perceived as being common in routine practice. Such errors seem to occur in most subjects, irrespective of their level of experience. We have also demonstrated that it is possible to influence clinicians' diagnostic workup and reduce errors of omission using a stand-alone diagnostic reminder system. Following DSS consultation, the quality of diagnostic, test-ordering and treatment decisions made by various grades of clinicians improved for a range of cases, such that a clinically important alteration in diagnostic decision-making resulted in 12.5% of all consultations (1 in 8 episodes of system use).
In a previous study assessing the impact of ILIAD and QMR, in which only diagnostically challenging cases were used in an experimental setting, Friedman et al showed that the 'correct diagnosis' was prompted by DSS use in approximately 1 in 16 consultations. Although we used a similar experimental design, we used a mix of easy as well as difficult cases to test the hypothesis that incomplete workup occurs in diagnostic conundrums as well as in routine clinical problems. Previous evaluations of expert systems used the presence of the 'correct' diagnosis as the main outcome. We focused on clinical safety as the key outcome, preferring to use the inclusion of all 'clinically important' diagnoses in the workup as the main variable of interest. In acute settings such as emergency rooms and primary care, where an incomplete and evolving clinical picture results in considerable diagnostic uncertainty at assessment, the ability to generate a focused and 'safe' workup is a more clinically relevant outcome, and one that accurately reflects the nature of decision making in this environment. Consequently, we defined diagnostic errors of omission at assessment as the 'failure to consider all clinically important diagnoses (as judged by an expert panel working the same cases)'. Under this definition, the 'minimum' workup included the 'correct' diagnosis as well as other significant diagnoses. Further, this study was unique in measuring changes in test-ordering and treatment decisions, as a more concrete marker of the impact of diagnostic decision support on the patient's clinical management; we were able to demonstrate an improvement in test-ordering in 1 in 10 system consultations, indicating that a diagnostic DSS may strongly influence patient management despite offering only diagnosis-related advice. Finally, the time expended during DSS consultation is an important aspect that has not been fully explored in previous studies.
In our study, subjects spent a median of 6 minutes for clinical data entry (including typing in their unaided decisions), and a median of 1 minute to process the advice provided and make changes to their clinical decisions.
The research design employed in this study allowed us to confirm a number of previously reported observations, as well as to make numerous new ones. These findings relate to the operational consequences of providing diagnostic assistance in practice. In keeping with other DSS evaluations, different subject grades processed system advice in different ways, depending on their prior knowledge and clinical experience, leading to variable benefit. Since ISABEL merely offered diagnostic suggestions, and left the final decisions to the clinician (acting as the 'learned intermediary'), subjects in some cases ignored even important advice. In other cases, they added irrelevant decisions or deleted important ones after DSS consultation, reducing the net positive effect of the DDSS on decision making. For some subjects whose pre-DSS performance was high, a ceiling effect prevailed, and no further improvement could be demonstrated. These findings complement the results of our earlier system performance evaluation, which focused solely on system accuracy and not on user interaction with the DDSS. One of the main findings of this study was that consultants tended to generate shorter diagnostic workup lists containing the 'most likely' diagnoses, with a predilection to omit other 'important' diagnoses that might account for the patient's clinical features, resulting in a high incidence of DEO. Medical students generated long diagnostic workup lists, but missed many key diagnoses, also leading to a high DEO rate. Interestingly, all subject grades gained from the use of ISABEL in terms of a reduction in the number of DEO, although to varying degrees. Although more DEOs occurred in cases considered routine in practice than in rare and difficult ones in the pre-DSS consultation phase, ISABEL advice seemed mainly to improve decision making for difficult cases, with a smaller effect on easy cases.
The beneficial effect of DSS advice diminished from diagnostic to test-ordering to treatment decisions. Finally, although the time taken to process cases without DSS advice in this study compared favorably with the Friedman evaluation of QMR and ILIAD (6 min vs. 8 min), the time taken to generate a revised workup with DSS assistance was dramatically shorter (1 min vs. 22 min).
We propose a number of explanations for our findings. There is sufficient evidence to suggest that clinicians with more clinical experience resort to established pattern-recognition techniques and the use of heuristics while making diagnostic decisions. While these shortcuts enable quick decision making in practice, and work successfully on most occasions, they involve a number of cognitive biases, such as 'premature closure' and 'confirmation bias', that may lead to incomplete assessment on some occasions. On the other hand, medical students may not have developed adequate pattern-recognition techniques or acquired sufficient knowledge of heuristics to make sound diagnostic decisions. It may well be that grades at an intermediate level are able to process cases in an acute setting with a greater emphasis on clinical safety. This explanation may also account for the finding that subjects failed to include 'important' diagnoses during the assessment of easy cases. Recognition that a case was unusual may trigger a departure from established pattern-recognition techniques and clinical shortcuts towards a more considered cognitive assessment, leading to fewer DEO in such cases. We have shown that it is possible to reduce DEOs by the use of diagnostic reminders, including in easy cases, although subjects appeared more willing to revise their decisions for difficult cases on the basis of ISABEL suggestions. It is also possible that some subjects ignored relevant advice because the system's explanatory capacity was inadequate and did not allow them to discriminate sufficiently between the suggestions offered. User variability in summarizing cases may also explain why ISABEL usage yielded variable benefits: subjects may have obtained different results depending on how they abstracted and entered clinical features. Such user variability during clinical data entry has been demonstrated even with the controlled vocabulary of QMR.
We observed marked differences between users' search terms for the same textual case; however, diagnostic suggestions did not seem to vary noticeably. This observation could be partially explained by the enormous diversity associated with various natural language disease descriptions contained within the ISABEL database, as well as by the system's use of a thesaurus that converts medical slang into recognized medical terms.
The diminishing level of impact from diagnostic to test-ordering to treatment decisions may be a result of system design: ISABEL does not explicitly state which tests and treatments to perform for each of its diagnostic suggestions. This advice is usually embedded within the textual description of the disease provided to the user. Study and system design may both account for the differences in time taken to process the cases. In previous evaluations, subjects first processed cases without using the DSS; in a subsequent step, they used the DSS to enter clinical data, record their clinical decisions and process system advice to generate a second diagnostic hypothesis list. In our study, subjects processed the case and recorded their own clinical decisions while using the DSS for clinical data entry. The second stage of the procedure only involved processing ISABEL advice and modifying previous clinical decisions. As such, the studies can be compared directly only by the total time involved per case (30 min vs. 7 min). This difference could be explained by features of the system's design that shortened clinical data entry and made the advice easier to process.
The findings from this study have implications for the design, evaluation and implementation of ISABEL specifically and of diagnostic DSS in general. It is well recognized that the dynamic interaction between user and DSS plays a major role in physicians' acceptance of these systems. We believe that adopting the ISABEL system during real-time clinical assessment is possible even with current computer infrastructure, providing an opportunity to reduce DEO. Integrating it into an EMR would allow further control over the quality of the clinical input data, as well as provision of active decision support with minimal extra effort. Such an ISABEL interface has now been developed and tested with four commercial EMRs; this integration also facilitates iterative use of the system as a patient's condition evolves, leading to increasingly specific diagnostic advice. The reminder system model aims to enable clinicians to generate 'safe' diagnostic workups in busy environments at high risk of diagnostic error. This model has been used successfully to change physician behavior by reducing errors of omission in preventive care. Recent studies show that diagnostic errors occur in the emergency room for a number of reasons, with cognitive biases, of which 'premature closure' and faulty context generation are key examples, contributing significantly. Use of a reminder system may minimize the impact of some of these cognitive biases. When combined with cognitive forcing strategies during decision making, DDSS may act as 'safety nets' that reduce the incidence of omission errors in practice. Reminders to perform important tests and treatment steps may also increase the impact on patient outcome.
The Web-based system model used in our study allowed users from different parts of the country to participate without the need for additional infrastructure or financial resources; this implementation model would also minimize the cost of deployment in practice. Finally, the role of DDSS in medical education and training needs formal evaluation. In our study, medical students benefited significantly from the advice provided, suggesting that use of DDSS during specific diagnostic tasks (e.g. problem-based case exercises) might be a valuable adjunct to current educational strategies. Familiarity with DDSS is also likely to encourage greater adoption of computerized aids in future clinical decision making.
The limitations of this study stem mainly from its experimental design. The repeated measures design raises the possibility that some of the benefit seen in the study resulted from subjects 'rethinking' the case, that is, from a reflective process. If so, ISABEL's effects in practice could be partly related to the extra time users spend processing cases. We believe any such effect is likely to be minimal, since subjects did not actually process the cases twice during the study: they generated a summary of the clinical features when the case was first displayed, and could not review the case while processing ISABEL suggestions in the next step. Subjects also spent negligible time between their first assessment of a case and processing the diagnostic suggestions from the DSS. The repeated measures design provided the power to detect differences between users with minimal resources; a randomized design with separate study and control groups would have required over 200 subjects. The cases used in our study contained only the basic clinical data available at the time of acute assessment, and may have been too concise or too easy to process. However, this seems unlikely, since when 'expert systems' were tested, subjects took an average of only 8 min to process even diagnostic conundrums before DSS use. Our cases pertained to emergency assessments, making it difficult to generalize the results to other ambulatory settings. Extracting clinical features from textual cases may not accurately simulate a real patient encounter, where missing data and 'red herrings' are common. The inherent complexity of assessing patients and summarizing clinical findings in words may lead to poorer performance of the ISABEL system in real life, since its diagnostic output depends on the quality of user input.
As a corollary, some of our encouraging results may be explained by our choice of subjects, a few of whom were already familiar with summarizing clinical features for entry into the DSS. Subjects were not supervised during their case exercises, since supervision might have altered their performance through a Hawthorne effect. The use of a structured website to explicitly record clinical decisions may have invoked the checklist effect, as illustrated in the Leeds abdominal pain system study. The checklist effect might also have operated while subjects summarized clinical features for ISABEL input, possibly combining with 'rethinking' to improve decision making before ISABEL was consulted. We also measured decision making at a single point in time, making it difficult to assess the effects of iterative use of the DSS on the same patient. Finally, our definition of diagnostic error aimed to identify inadequate diagnostic workup at initial assessment that might result in a poor patient outcome. We recognize the absence of an evidence-based link between omission errors and diagnostic adverse events in practice, although, according to the Schiff model, it seems logical to assume that avoiding process errors will prevent actual errors in at least some instances. In the simulated setting, it was not possible to test whether inadequate diagnostic workup would directly lead to a diagnostic error and cause patient harm. Our planned real-life clinical impact assessment should help clarify many of the questions raised by this experimental study.
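The limitations above note that a randomized design with separate study and control groups would have required over 200 subjects, whereas the repeated measures design achieved adequate power with far fewer. A minimal Python sketch of the standard normal-approximation sample size formulas shows why: when each subject acts as their own control, the variance of the within-subject differences is 2σ²(1 − ρ), so a high correlation ρ between a subject's pre- and post-consultation scores shrinks the required sample. The function names and the illustrative inputs (a difference of 0.5 errors, SD 1.5, correlation 0.8, two-sided alpha 0.05, 80% power) are assumptions, not the parameters behind the authors' estimate, and the sketch simplifies the repeated measures ANOVA actually used to a single paired comparison.

```python
import math
from statistics import NormalDist


def _z(p: float) -> float:
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def n_parallel_total(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Total subjects for two independent groups (control vs DSS), normal approximation."""
    k = (_z(1 - alpha / 2) + _z(power)) ** 2
    per_group = math.ceil(2 * k * sd**2 / delta**2)
    return 2 * per_group


def n_paired(delta: float, sd: float, rho: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Subjects for a before/after comparison where each subject is their own control."""
    k = (_z(1 - alpha / 2) + _z(power)) ** 2
    # Variance of within-subject differences shrinks as the correlation rho grows.
    var_diff = 2 * sd**2 * (1 - rho)
    return math.ceil(k * var_diff / delta**2)


if __name__ == "__main__":
    # Illustrative, hypothetical inputs only; not the study's parameters.
    print("parallel groups:", n_parallel_total(delta=0.5, sd=1.5))
    print("paired design:  ", n_paired(delta=0.5, sd=1.5, rho=0.8))
```

With these assumed inputs the paired design needs 29 subjects against 284 for two parallel groups; the exact figures depend entirely on the assumed SD and within-subject correlation.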
This experimental study demonstrates that diagnostic omission errors are common during the assessment of both easy and difficult cases. The provision of patient- and context-specific diagnostic reminders has the potential to reduce these errors across all subject grades. Our study suggests a promising role for future reminder-based DSS in reducing diagnostic error. An impact evaluation using a naturalistic design in real-life clinical practice is underway to verify the conclusions drawn from this simulation.
The authors would like to thank Tina Sajjanhar for her help in allocating cases to specialties and levels of difficulty; Helen Fisher for data analysis; and Jason Maude of the ISABEL Medical Charity for help rendered during the design of the study.
Financial support: This study was supported by a research grant from the National Health Service (NHS) Research & Development Unit, London. The sponsor did not influence the study design; the collection, analysis, and interpretation of data; the writing of the manuscript; and the decision to submit the manuscript for publication.
- Institute of Medicine: To Err is Human: Building a Safer Health System. 1999, Washington DC: National Academy Press
- Fortescue EB, Kaushal R, Landrigan CP, McKenna KJ, Clapp MD, Federico F, Goldmann DA, Bates DW: Prioritizing strategies for preventing medication errors and adverse drug events in pediatric inpatients. Pediatrics. 2003, 111 (4 Pt 1): 722-9. 10.1542/peds.111.4.722.
- Kaushal R, Bates DW: Information technology and medication safety: what is the benefit?. Qual Saf Health Care. 2002, 11 (3): 261-5. 10.1136/qhc.11.3.261.
- Leape L, Brennan TA, Laird N, Lawthers AG, Localio AR, Barnes BA, Hebert L, Newhouse JP, Weiler PC, Hiatt H: The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med. 1991, 324: 377-84.
- Wilson RM, Harrison BT, Gibberd RW, Harrison JD: An analysis of the causes of adverse events from the Quality in Australian Health Care Study. Med J Aust. 1999, 170: 411-5.
- Davis P, Lay-Yee R, Briant R, Ali W, Scott A, Schug S: Adverse events in New Zealand public hospitals II: preventability and clinical context. N Z Med J. 2003, 116 (1183): U624.
- Bartlett EE: Physicians' cognitive errors and their liability consequences. J Healthcare Risk Manage. 1998, Fall: 62-9.
- Croskerry P: The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003, 78 (8): 775-80. 10.1097/00001888-200308000-00003.
- Green S, Lee R, Moss J: Problems in general practice. Delays in diagnosis. 1998, Manchester, UK: Medical Defence Union
- Thomas EJ, Studdert DM, Burstin HR, Orav EJ, Zeena T, Williams EJ, Howard KM, Weiler PC, Brennan TA: Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care. 2000, 38: 261-71. 10.1097/00005650-200003000-00003.
- Rothschild JM, Landrigan CP, Cronin JW, Kaushal R, Lockley SW, Burdick E, Stone PH, Lilly CM, Katz JT, Czeisler CA, Bates DW: The Critical Care Safety Study: The incidence and nature of adverse events and serious medical errors in intensive care. Crit Care Med. 2005, 33 (8): 1694-700. 10.1097/01.CCM.0000171609.91035.BD.
- Burroughs TE, Waterman AD, Gallagher TH, Waterman B, Adams D, Jeffe DB, Dunagan WC, Garbutt J, Cohen MM, Cira J, Inguanzo J, Fraser VJ: Patient concerns about medical errors in emergency departments. Acad Emerg Med. 2005, 12 (1): 57-64. 10.1197/j.aem.2004.08.052.
- Graber M, Gordon R, Franklin N: Reducing diagnostic errors in medicine: what's the goal?. Acad Med. 2002, 77 (10): 981-92. 10.1097/00001888-200210000-00009.
- Barnett GO, Cimino JJ, Hupp JA, Hoffer EP: DXplain: an evolving diagnostic decision-support system. JAMA. 1987, 258: 67-74. 10.1001/jama.258.1.67.
- Miller R, Masarie FE, Myers JD: Quick medical reference (QMR) for diagnostic assistance. MD Comput. 1986, 3 (5): 34-48.
- Warner HR: Iliad: moving medical decision-making into new frontiers. Methods Inf Med. 1989, 28 (4): 370-2.
- Barness LA, Tunnessen WW, Worley WE, Simmons TL, Ringe TB: Computer-assisted diagnosis in pediatrics. Am J Dis Child. 1974, 127 (6): 852-8.
- Wilson DH, Wilson PD, Walmsley RG, Horrocks JC, De Dombal FT: Diagnosis of acute abdominal pain in the accident and emergency department. Br J Surg. 1977, 64 (4): 250-4.
- Miller RA: Medical diagnostic decision support systems – past, present, and future: a threaded bibliography and brief commentary. J Am Med Inform Assoc. 1994, 1 (1): 8-27.
- Bankowitz RA, McNeil MA, Challinor SM, Miller RA: Effect of a computer-assisted general medicine diagnostic consultation service on housestaff diagnostic strategy. Methods Inf Med. 1989, 28 (4): 352-6.
- Tannenbaum SJ: Knowing and acting in medical practice: the epistemological politics of outcomes research. J Health Polit Policy Law. 1994, 19: 27-44.
- Friedman CP, Gatti GG, Franz TM, Murphy GC, Wolf FM, Heckerling PS, Fine PL, Miller TM, Elstein AS: Do physicians know when their diagnoses are correct? Implications for decision support and error reduction. J Gen Intern Med. 2005, 20 (4): 334-9. 10.1111/j.1525-1497.2005.30145.x.
- Wyatt JC: Clinical Knowledge and Practice in the Information Age: a handbook for health professionals. 2001, London: The Royal Society of Medicine Press
- Smith R: What clinical information do doctors need?. BMJ. 1996, 313: 1062-68.
- Graber MA, VanScoy D: How well does decision support software perform in the emergency department?. Emerg Med J. 2003, 20 (5): 426-8. 10.1136/emj.20.5.426.
- Lemaire JB, Schaefer JP, Martin LA, Faris P, Ainslie MD, Hull RD: Effectiveness of the Quick Medical Reference as a diagnostic tool. CMAJ. 1999, 161 (6): 725-8.
- Welford CR: A comprehensive computerized patient record with automated linkage to QMR. Proc Annu Symp Comput Appl Med Care. 1994, 814-8.
- Elhanan G, Socratous SA, Cimino JJ: Integrating DXplain into a clinical information system using the World Wide Web. Proc AMIA Annu Fall Symp. 1996, 348-52.
- Berner ES, Detmer DE, Simborg D: Will the wave finally break? A brief view of the adoption of electronic medical records in the United States. J Am Med Inform Assoc. 2005, 12 (1): 3-7. 10.1197/jamia.M1664. Epub 2004 Oct 18
- Greenough A: Help from ISABEL for pediatric diagnoses. Lancet. 2002, 360 (9341): 1259. 10.1016/S0140-6736(02)11269-4.
- Thomas NJ: ISABEL. Critical Care. 2002, 7 (1): 99-100.
- Ramnarayan P, Britto J: Pediatric clinical decision support systems. Arch Dis Child. 2002, 87 (5): 361-2. 10.1136/adc.87.5.361.
- Fisher H, Tomlinson A, Ramnarayan P, Britto J: ISABEL: support with clinical decision making. Paediatr Nurs. 2003, 15 (7): 34-5.
- Giuse DA, Giuse NB, Miller RA: A tool for the computer-assisted creation of QMR medical knowledge base disease profiles. Proc Annu Symp Comput Appl Med Care. 1991, 978-9.
- Johnson KB, Feldman MJ: Medical informatics and pediatrics. Decision-support systems. Arch Pediatr Adolesc Med. 1995, 149 (12): 1371-80.
- Giuse DA, Giuse NB, Miller RA: Evaluation of long-term maintenance of a large medical knowledge base. J Am Med Inform Assoc. 1995, 2 (5): 297-306.
- Ramnarayan P, Tomlinson A, Kulkarni G, Rao A, Britto J: A Novel Diagnostic Aid (ISABEL): Development and Preliminary Evaluation of Clinical Performance. Medinfo. 2004, 1091-5.
- Wyatt J: Quantitative evaluation of clinical software, exemplified by decision support systems. Int J Med Inform. 1997, 47 (3): 165-73. 10.1016/S1386-5056(97)00100-7.
- Ramnarayan P, Tomlinson A, Rao A, Coren M, Winrow A, Britto J: ISABEL: a web-based differential diagnostic aid for pediatrics: results from an initial performance evaluation. Arch Dis Child. 2003, 88: 408-13. 10.1136/adc.88.5.408.
- Feldman MJ, Barnett GO: An approach to evaluating the accuracy of DXplain. Comput Methods Programs Biomed. 1991, 35 (4): 261-6. 10.1016/0169-2607(91)90004-D.
- Middleton B, Shwe MA, Heckerman DE, Henrion M, Horvitz EJ, Lehmann HP, Cooper GF: Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. II. Evaluation of diagnostic performance. Methods Inf Med. 1991, 30 (4): 256-67.
- Nelson SJ, Blois MS, Tuttle MS, Erlbaum M, Harrison P, Kim H, Winkelmann B, Yamashita D: Evaluating Reconsider: a computer program for diagnostic prompting. J Med Syst. 1985, 9: 379-388. 10.1007/BF00992575.
- Berner ES, Webster GD, Shugerman AA, Jackson JR, Algina J, Baker AL, Ball EV, Cobbs CG, Dennis VW, Frenkel EP: Performance of four computer-based diagnostic systems. N Engl J Med. 1994, 330 (25): 1792-6. 10.1056/NEJM199406233302506.
- Miller RA: Evaluating evaluations of medical diagnostic systems. J Am Med Inform Assoc. 1996, 3 (6): 429-31.
- Miller RA, Masarie FE: The demise of the "Greek Oracle" model for medical diagnostic systems. Methods Inf Med. 1990, 29: 1-2.
- Maisiak RS, Berner ES: Comparison of measures to assess change in diagnostic performance due to a decision support system. Proc AMIA Symp. 2000, 532-6.
- Berner ES, Maisiak RS: Influence of case and physician characteristics on perceptions of decision support systems. J Am Med Inform Assoc. 1999, 6 (5): 428-34.
- Friedman CP, Elstein AS, Wolf FM, Murphy GC, Franz TM, Heckerling PS, Fine PL, Miller TM, Abraham V: Enhancement of clinicians' diagnostic reasoning by computer-based consultation: a multisite study of 2 systems. JAMA. 1999, 282 (19): 1851-6. 10.1001/jama.282.19.1851.
- Issenberg SB, McGaghie WC, Hart IR, Mayer JW, Felner JM, Petrusa ER, Waugh RA, Brown DD, Safford RR, Gessner IH, Gordon DL, Ewy GA: Simulation technology for health care professional skills training and assessment. JAMA. 1999, 282 (9): 861-6. 10.1001/jama.282.9.861.
- Clauser BE, Margolis MJ, Swanson DB: An examination of the contribution of computer-based case simulations to the USMLE step 3 examination. Acad Med. 2002, 77 (10 Suppl): S80-2.
- Dillon GF, Clyman SG, Clauser BE, Margolis MJ: The introduction of computer-based simulations into the United States medical licensing examination. Acad Med. 2002, 77 (10 Suppl): S94-6.
- Ramnarayan P, Kapoor RR, Coren M, Nanduri V, Tomlinson AL, Taylor PM, Wyatt JC, Britto JF: Measuring the impact of diagnostic decision support on the quality of clinical decision-making: development of a reliable and valid composite score. J Am Med Inform Assoc. 2003, 10 (6): 563-72. 10.1197/jamia.M1338.
- Schiff GD, Kim S, Abrams R, Cosby K, Lambert B, Elstein AS, Hasler S, Krosnjar N, Odwazny R, Wisniewski MF, McNutt RA: Diagnosing diagnosis errors: lessons from a multi-institutional collaborative project. Accessed 10 May 2005, [http://www.ahrq.gov/downloads/pub/advances/vol2/Schiff.pdf]
- Brahams D, Wyatt J: Decision-aids and the law. Lancet. 1989, 2: 632-4. 10.1016/S0140-6736(89)90765-4.
- Croskerry P: Achieving quality in clinical decision making: cognitive strategies and detection of bias. Acad Emerg Med. 2002, 9 (11): 1184-204. 10.1197/aemj.9.11.1184.
- Bankowitz RA, Blumenfeld BH, Guis Bettinsoli N: User variability in abstracting and entering printed case histories with Quick Medical Reference (QMR). Proceedings of the Eleventh Annual Symposium on Computer Applications in Medical Care. 1987, New York: IEEE Computer Society Press, 68-73.
- Kassirer JP: A report card on computer-assisted diagnosis – the grade: C. N Engl J Med. 1994, 330 (25): 1824-5. 10.1056/NEJM199406233302512.
- Dexter PR, Perkins S, Overhage JM, Maharry K, Kohler RB, McDonald CJ: A computerized reminder to increase the use of preventive care for hospitalized patients. N Engl J Med. 2001, 345: 965-970. 10.1056/NEJMsa010181.
- Graber ML, Franklin N, Gordon R: Diagnostic error in internal medicine. Arch Intern Med. 2005, 165 (13): 1493-9. 10.1001/archinte.165.13.1493.
- Croskerry P: Cognitive forcing strategies in clinical decision making. Ann Emerg Med. 2003, 41 (1): 110-20. 10.1067/mem.2003.22.
- Berner ES: Diagnostic decision support systems: how to determine the gold standard?. J Am Med Inform Assoc. 2003, 10 (6): 608-10. 10.1197/jamia.M1416.
- Mamede S, Schmidt HG: The structure of reflective practice in medicine. Med Educ. 2004, 38 (12): 1302-8. 10.1111/j.1365-2929.2004.01917.x.
- Randolph AG, Haynes RB, Wyatt JC, Cook DJ, Guyatt GH: Users' Guides to the Medical Literature: XVIII. How to use an article evaluating the clinical impact of a computer-based clinical decision support system. JAMA. 1999, 282 (1): 67-74. 10.1001/jama.282.1.67.
- Adams ID, Chan M, Clifford PC, Cooke WM, Dallos V, de Dombal FT, Edwards MH, Hancock DM, Hewett DJ, McIntyre N: Computer aided diagnosis of acute abdominal pain: a multicentre study. BMJ. 1986, 293: 800-4.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/6/22/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.