The results suggest that this questionnaire may provide valid and reliable information about how an implemented EMR system is utilized on an overall level in clinical practice, and how well the system supports clinical tasks.
The task-oriented questions are relevant for clinical work, but some are difficult to answer
During development, the tasks were based on observations of clinical activity and further refined to suit their purpose as a common denominator for assessments of various EMR systems. In the interviews, the tasks were recognized and correctly interpreted (figure 2) by a wide range of physicians. However, some of the task-oriented questions about EMR use were found difficult to answer, particularly for the higher-level tasks. Four themes appearing in the interviews provided reasons for these problems. First, the respondents were confused when asked about use of EMR for tasks for which no explicit functionality was offered (table 3; theme 3), despite the presence of relevant 'escape' response choices. This confusion may partly explain the contradictory responses in the national survey, where a minor proportion of respondents reported use of the EMR system for tasks it did not explicitly support (tasks 6 and 7), and the low reliability of three questions about EMR use in the test-retest study (tasks 7, 9 and 13). It may also explain the few missing responses in the local study, where unsupported tasks were omitted. Second, the respondents had difficulty distinguishing the EMR from other software or media (theme 4). This problem may explain the many missing responses in parts of the national study (table 4). The reduction of missing responses in the local study suggests that considering EMR use alone (and not use of other software) is easier for the respondent. However, the problem will remain for respondents who use software other than the EMR during clinical work, making reviews of all software available to the physicians necessary. Third, questions about tasks that were not completely supported by the EMR system were found hard to answer, despite the fact that the wording of the questions only implied a supportive role. This problem was attributed in particular to general tasks. However, the test-retest reliability was relatively high for these questions, suggesting a limited negative effect. Fourth and finally, distinguishing other employees' use of the system from one's own appeared as a problem in the interviews (theme 7) for tasks 5 and 15. Regarding task 5 [Enter daily notes], the explanation was confusion about whose use of the EMR should be stated, the physician's or the transcriptionist's. This problem can probably be remedied by revising the instructions to the respondent in the questionnaire.
In addition to providing explanations for the findings of the closed questions, the results from the open-ended questions addressed a number of themes on their own. First, wording problems (table 3, theme 2) were expressed particularly for tasks 16, 4 and 21. However, the respondents' interpretations of these tasks (figure 1) were all concordant with, and covered essential parts of, the task definition. Another important theme involved functionality missed by the respondent (table 3, theme 6), i.e. that the questionnaire did not allow them to express what functionality they were missing in the EMR system. This in particular made it difficult to answer the questions about user satisfaction, as the respondent had problems deciding whether to provide answers based on the functionality actually available in the EMR system, or on the functionality that should have been in the system. The problem is closely related to the problems regarding EMR only supporting parts of a given defined task (table 3, theme 8).
The tasks are relevant for EMR systems
Moderately high correlations were consistently found between a majority of task-oriented questions and overall questions on EMR use, task performance and user satisfaction. The correlations to self-reported overall EMR use suggest that the tasks are regarded as essential to EMR systems as such, and the correlations to work performance suggest that the tasks are regarded as important to clinical work. The correlations to user satisfaction agree with the results of both Sittig et al. and Lee et al., who found significant correlations between user satisfaction and questions about how easily the work was done. In combination, this means that high reported EMR use for individual tasks is associated with high reported use of the EMR on the whole, and that improved performance of individual tasks is associated with improved overall work performance and high satisfaction with the system as a whole. Although this does not prove the validity of each task, it is highly suggestive. Furthermore, the correlations were limited to tasks for which clear functionality existed in the EMR systems. For the uncorrelated tasks, further clarification must await completion of the functionality of current EMR systems.
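As an illustration of how a task-oriented question may be correlated with an overall question, the following is a minimal sketch that assumes a rank-based (Spearman) coefficient, which suits ordinal response scales. The text above does not state which coefficient was used, and the function names and response data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two response lists of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all responses are identical")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical responses (1 = never ... 5 = always) from eight physicians.
task_use = [1, 2, 2, 3, 4, 4, 5, 5]      # use of the EMR for one task
overall_use = [1, 1, 2, 3, 3, 4, 4, 5]   # self-reported overall EMR use
rho = spearman_rho(task_use, overall_use)
```

A coefficient near 1 would indicate that physicians who report frequent use for the task also tend to report frequent overall use. It says nothing about causation or about the validity of the task on its own.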
This way of correlating a set of lower-level task-oriented questions to higher-level questions is commonly used as criterion validation. However, higher-level questions regarding EMR use are difficult to answer, as physicians' work consists of a complex mix of tasks that are suited for computer support and tasks that are not. A more direct form of criterion validation could have been achieved by studying system audit trails. Such trails are readily available, but they must be validated themselves, and they cannot be more detailed than the structure of the EMR system itself. In Norway, the EMR systems are document-based in structure. This limits the interpretation of such trails, particularly when considering information-seeking behavior.
The questionnaire produces interpretable results
The demonstration studies provided readily interpretable results. In the national study, the physicians generally reported a much lower frequency of EMR use than would be expected from the functionality implemented in each hospital. In the local study, the physicians reported a very high frequency of EMR use, mainly for tasks related to retrieval of patient data. In this study, the physicians generally had little choice of information sources, as the paper-based medical records had been eliminated in this hospital. The use of the EMR system for other tasks was, however, much lower. The results from both the national and the local study indicate that the physicians are able to report overall patterns in their use of EMR that are not in line with the implicit expectations signalled by this questionnaire. These results should not be too surprising. The physicians' traditional autonomous position may allow them to withstand instructions from the hospital administration, e.g. regarding ordering of clinical biochemical investigations. Also, in most hospitals having EMR systems, the physicians may freely choose their source of patient data. This is because the paper-based and electronic medical records are generally updated concurrently, and they are only two of many information sources available in clinical practice (e.g. asking the patient, calling the primary care physician, etc.).
Compared to the 400–600 tasks commonly found in full task inventories, the number of tasks in the questionnaire is moderate (24). The high response rates suggest that the number of questions is manageable for the respondents. Compared to that of similar questionnaires [4, 21], the task list provides the evaluator with more details about areas for improvement, and it is not designed with one particular EMR system in mind. In addition, more emphasis is placed on clinical use of the EMR system, since the tasks are limited to information-related instead of both practical and information-related tasks, and to clinical instead of both clinical and academic work. On the other hand, questionnaires describing self-reported usage patterns have previously been criticized for lack of precision and accountability [25, 26]. However, the critics often seem to actually consider poorly validated questionnaires or too optimistic interpretations of them, rather than the very principle of self-reporting. When interpreting the results from a survey describing self-reported work patterns, the inherent limitations of self-reporting must be taken into account. Respondents remember recent and extraordinary events much more easily than distant or everyday events, suggesting in our case an over-estimation by those who use the EMR infrequently. Also, even in a systematically validated questionnaire, a considerable degree of bias should be expected towards answers that the respondents believe are expected from them. However, when the responses both fit with the structural premises (i.e. the marked EMR use in the local study, where the paper-based medical record was missing), and defy the implicit expectations (i.e. the lack of EMR use in the national study), the degree of bias seems to be manageable.
Reliability and scaling
The test-retest reliability study generally showed high kappa values both in the section about EMR use and in that of task performance, in spite of some tasks performing poorly in either section. The poorly performing tasks in the EMR use section addressed functionality that was available to few respondents, while those performing excellently addressed functionality supported by all EMR systems. This means that changes demonstrated for well supported tasks are more likely to reflect real changes in the underlying processes than they are likely to happen by chance. On the one hand, small differences should be interpreted with caution when using the questionnaire, e.g. when significant differences are found in rank values but not in median response values. On the other hand, the evaluator should be careful not to disregard non-significant differences in small samples in the tasks having reliability less than 0.6, as the most likely effect of reliability issues is attenuation of real differences.
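To make the kappa statistic concrete, the sketch below computes an unweighted Cohen's kappa for one question answered twice by the same respondents. Whether the study used unweighted or weighted kappa is not stated here, so the unweighted form is an assumption, and the function name and data are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for paired categorical responses.

    first, second: responses to the same question at test and retest,
    listed in the same respondent order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    n = len(first)
    # Proportion of respondents giving the same answer on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    count_first, count_second = Counter(first), Counter(second)
    expected = sum(count_first[c] * count_second[c] for c in count_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical test and retest answers from ten physicians for one task.
test = ["never", "seldom", "often", "often", "always",
        "never", "seldom", "often", "always", "always"]
retest = ["never", "seldom", "often", "seldom", "always",
          "never", "often", "often", "always", "always"]
kappa = cohen_kappa(test, retest)
```

Because kappa corrects for chance agreement, a question for which nearly everyone picks the same label can show a low kappa despite high raw agreement. This is one possible reason why tasks with functionality available to few respondents performed poorly.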
In the study of the frequency scale (appearing in the questionnaire section about EMR use), the order of the response labels coincides with that of the respondents' visual analogue scale (VAS) markings. In addition, the confidence intervals of the means are clearly separated in this relatively small sample. This suggests that the response labels are considered separate steps on an ordinal scale by the respondents. However, the mean VAS values do not increment linearly, but follow a symmetric s-shaped curve, in which the largest increments appear at the middle part of the scale. This suggests that differences in frequency of EMR use might be considered slightly larger when involving or spanning the central label than when involving the labels at each end of the scale. In sum, the scale is ordinal but not linear, making non-parametric methods the best choice for statistical analysis.
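As one example of a non-parametric method suited to such an ordinal scale, the sketch below computes the Mann-Whitney U statistic for two groups of respondents. The choice of this particular test is an assumption made for illustration, and the function name and data are hypothetical. Only the order of the coded responses is used, never the distance between them.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by direct pairwise comparison.

    U_a counts the pairs in which the response from group_a is higher
    than the response from group_b; tied pairs count one half.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one response")
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical frequency-of-use responses coded 1 (never) to 5 (always).
hospital_a = [4, 5, 5, 3, 4, 5]
hospital_b = [2, 3, 1, 3, 2, 4]
u_a, u_b = mann_whitney_u(hospital_a, hospital_b)

# Share of pairs in which a respondent from hospital A reports more
# frequent use than one from hospital B (ties split evenly).
prob_a_higher = u_a / (len(hospital_a) * len(hospital_b))
```

The sketch stops at the statistic itself. A significance test would additionally require the reference distribution of U, as provided by standard statistical packages.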
Comparing development and evaluation of this questionnaire to those of other questionnaires
When developing questionnaires, existing literature [22, 29] and expert groups [30, 31] are commonly used to produce the initial items. For our questionnaire, the literature search was mostly unfruitful, and we had to rely on expert groups and observational work. A common way of structuring the initial collection of items is by identifying latent (and possibly unrelated) variables by performing exploratory factor analysis. For our questionnaire, no factor analysis has been performed. In the national demonstration study, this was due to the considerable differences in implemented functionality between the various EMR systems. In the local demonstration study, it was due to the low sample size relative to the number of questions, i.e. below 10:1. Although consistent patterns of use (e.g. "the notes reader", "the super-user", "the lab test aficionado", etc.) might be identified by factor analysis, it is unlikely that completely unrelated variables would be extracted from a set of work tasks all designed for the same profession. Work tasks found irrelevant by the physicians could have been identified by analyses of internal consistency among the task-oriented questions, e.g. Cronbach's alpha. However, such investigations should ask about the work tasks per se, not about tasks for which the EMR system is used, rendering our demonstration studies of little value in this respect. Instead of performing another survey, we chose to explore the tasks as well as the task-oriented questions in a structured interview study. This way, we had the opportunity to explain why some of the tasks were performing better than the others in the demonstration studies.
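For readers unfamiliar with the internal consistency analysis mentioned above, the following minimal sketch computes Cronbach's alpha from its standard definition. No such analysis was performed in this study; the function name and the response data are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of questions.

    items: one inner list per question, each holding one score per
    respondent, with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two questions")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every question needs the same >= 2 respondents")
    # Total score per respondent, summed across all questions.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical importance ratings (1-5) of three work tasks by six physicians.
ratings = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 1, 4],
]
alpha = cronbach_alpha(ratings)
```

A task whose removal raises alpha noticeably would be a candidate for an item that respondents do not regard as belonging with the rest, which is the kind of irrelevance check the paragraph above describes.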
When evaluating questionnaires, criterion and content validation are frequently used [29, 33]. As the list of tasks in our questionnaire is rather heterogeneous and covers a considerable field of clinical activity, a single global criterion is hard to find. Instead, we used either criteria explaining parts of the task list (e.g. the tasks regarding information retrieval) or indirect criteria based on well-documented relations (e.g. overall user satisfaction vs. task performance).
Limitations of this study
The questionnaire described in this study applies to physicians only, missing the contribution of other types of health personnel. Further, the list of tasks does not cover communication or planning, suggesting that the list could be augmented in future versions of the questionnaire. Finally, three different revisions of the questionnaire appear in this paper, which might appear confusing. The revisions are however incremental, and should be considered consequences of lessons learned during the demonstration studies.
Application of the questionnaire
The questionnaire described here may be used as an important part of an EMR system evaluation. Instead of a simple summed score, the questionnaire's task list provides a framework by which EMR systems may be described and compared in an informative way. Since the questionnaire does not provide reasons or hypotheses for the results it produces, surveys involving it should always be accompanied by a qualitative study. The combination of methods will, however, provide more than the sum of its parts. Qualitative studies such as in-depth interviews may probe deeper when the results of the preceding survey are presented to the informant, and observational studies may focus on phenomena explaining the survey results. Conversely, the interpretation of a qualitative study may be aided by the results of a subsequent quantitative study, as it provides a way of weighting the proposed hypotheses.