Evidence-informed decision-making is an essential ingredient of any strategy that aims to deal with the complex challenges posed by population changes, limited resources, advances in technology, and changing public expectations. High-quality information systems are a foundation on which to build evidence to inform decisions. The availability of standardized, representative, comprehensive, reliable and valid data is a precondition for generating evidence to respond to those challenges. Therefore, an essential step in introducing large-scale data systems aimed at providing such evidence is an evaluation of the quality of their data [1–3].
There are many potential threats to the quality of data in any large health information system. Unless the psychometric properties of the data have been evaluated systematically, one cannot be confident that the information on which decisions are based is reliable or valid, even if the system is implemented and maintained under optimal conditions. Such information systems should perform well not only under ideal conditions, but also under the rigors of day-to-day use. Examples of factors that can undermine the quality of data in “real-world” situations include: poor training and lack of ongoing education; lack of staff expertise; inadequate “buy-in” by staff; systematic biases in reporting due to financial incentives or avoidance of the negative consequences of unfavourable findings; temporary coding problems with the introduction of new systems; declining attention and underfunding; lack of feedback to users; and poor data collection or coding strategies.
The Resident Assessment Instrument 2.0 and Continuing Care Reporting System (CCRS)
The Resident Assessment Instrument 2.0 (RAI 2.0) is a comprehensive assessment system that is at different stages of implementation in two care settings in eight Canadian provinces/territories [4]: long term care facilities (i.e., nursing homes that tend to serve a long-stay, medically stable population with substantial impairments in cognition or physical function) and complex continuing care hospitals/units (post-acute hospital settings that serve medically unstable persons with complex health conditions and functional impairments with stays typically lasting less than 90 days).
The instrument was originally developed by United States (US) researchers [5] after passage of a US law aimed at improving the quality of nursing home care [6]. Since then, the system has been maintained and improved by interRAI (http://www.interrai.org), a 32-country network of clinicians and researchers focused on the implementation, application and continuing refinement of a suite of compatible instruments, including the RAI 2.0 [7, 8]. interRAI’s goal is to develop these assessment systems into an integrated health information system linking multiple sectors of health and social services for the elderly and other vulnerable populations [9, 10].
The RAI 2.0 has three main components. The first is the Minimum Data Set (MDS) data collection form, which includes about 440 items covering domains such as cognition, communication, mood and behaviour, psychosocial well-being, physical functioning, continence, health conditions, nutrition, activities, medication, treatments, procedures, and discharge potential. The second is a corresponding manual with item-by-item descriptions outlining the definitions, intent, assessment process, and coding rules. The third is a set of Clinical Assessment Protocols (CAPs), which support the development of care plans in 22 clinical areas [11]. The RAI 2.0 is designed as a comprehensive assessment to be used as part of normal clinical practice, and assessments are intended to be done by trained health professionals working as part of a multidisciplinary team. In addition to care planning [12], other applications include case-mix based funding [13], quality indicators [14, 15], and outcome measurement [16]. Because these applications serve multiple audiences, many stakeholders in continuing care have begun to look to these data as an important source of information for a variety of decisions. For example, the Resource Utilization Groups (RUG-III) case-mix algorithm [13] is now used in the funding formula for both complex continuing care (CCC) hospitals/units [17] and long term care (LTC) homes in Ontario [18]. In addition, Health Quality Ontario (http://www.hqontario.ca) reports publicly on CCC and LTC performance using quality indicators from the RAI 2.0 [15, 19]. Some of these indicators are part of formal accountability agreements between health care organizations and government agencies at the local and provincial levels that link performance expectations to funding.
The Continuing Care Reporting System (CCRS) is a national information system developed and managed by the Canadian Institute for Health Information (http://www.cihi.ca). Originally developed as the Ontario Chronic Care Patient System (OCCPS) to support the implementation of the RAI 2.0 in CCC hospitals/units, the CCRS now serves as a pan-Canadian data repository for eight participating provinces/territories implementing this instrument. It is a vehicle for national statistical reporting on the status of continuing care facilities (see, for example, CIHI report [20]), providing a national perspective on facility-based services for frail elderly people and persons with disabilities.
Besides making the reporting system available to other provinces, the launch of the CCRS in 2003 included minor updates to the RAI 2.0 form and coding instructions. Also, forms were added to track demographic changes and to obtain a profile of participating facilities. Currently, all CCC hospital patients and LTC residents are assessed within 14 days of admission. In addition, quarterly re-assessments are done using a shortened form and annual re-assessments are done using the full MDS form. A discharge tracking form is used to record the date and disposition of all discharges, but a full assessment is not done at discharge. Data are sent electronically to the CCRS and must adhere to reporting standards established by CIHI in consultation with the provinces and interRAI. This includes passing new logical checks for data submissions. Seven of the eight provinces/territories that are implementing the RAI 2.0 are now submitting data to CIHI. Data are available for Ontario CCC hospitals/units beginning in 1996 and for LTC in that province as of 2005. More detailed descriptions of the implementation of the RAI 2.0 in Canada, beginning with CCC hospital settings in Ontario, are provided elsewhere [4, 21–28].
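As a minimal sketch of how the 14-day admission-assessment requirement could be checked in submitted data, the Python below computes, for each facility, the share of admissions whose admission assessment is missing or late. The record layout (`AdmissionRecord` and its fields) is hypothetical and does not reproduce CCRS field names; counting days as a simple calendar-date difference is also an assumption, since the official day-counting convention may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout for illustration; not the CCRS submission format.
@dataclass
class AdmissionRecord:
    facility_id: str
    admission_date: date
    assessment_date: Optional[date]  # None when no admission assessment was submitted

# The text states that admission assessments are done within 14 days of admission.
ADMISSION_WINDOW_DAYS = 14

def is_late(rec: AdmissionRecord, window: int = ADMISSION_WINDOW_DAYS) -> bool:
    """True if the admission assessment is missing or falls after the window."""
    if rec.assessment_date is None:
        return True
    # Assumed convention: simple calendar-day difference from the admission date.
    return (rec.assessment_date - rec.admission_date).days > window

def late_rate_by_facility(records):
    """Share of admissions per facility with a late or missing admission assessment."""
    totals, late = {}, {}
    for rec in records:
        totals[rec.facility_id] = totals.get(rec.facility_id, 0) + 1
        late[rec.facility_id] = late.get(rec.facility_id, 0) + int(is_late(rec))
    return {fid: late[fid] / totals[fid] for fid in totals}

records = [
    AdmissionRecord("A", date(2011, 1, 1), date(2011, 1, 10)),  # 9 days: on time
    AdmissionRecord("A", date(2011, 1, 1), date(2011, 1, 20)),  # 19 days: late
    AdmissionRecord("B", date(2011, 2, 1), None),               # missing
]
print(late_rate_by_facility(records))  # {'A': 0.5, 'B': 1.0}
```

Facility-level rates of this kind are one way to make the organizational differences in late assessments, discussed under selection bias, visible.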
With the growing importance of the CCRS for decision-making in continuing care, questions about the system’s data quality have naturally arisen. The integrity of the data system is a precondition of its acceptance as a basis for these decisions. Therefore, there has been growing interest in the current quality of CCRS data and in how it has changed over time. There is also interest in identifying methods that might be used to monitor data quality and to detect and respond to changes in it in the future.
The changing role of facilities in the continuum of care
Apparent changes in the data over time may reflect measurement error introduced by data quality problems, but they may also reflect real changes in population composition and in the roles of CCC and LTC in the continuum of care. Therefore, any assessment of data quality must consider strategies for disentangling indications of error from actual changes in practice patterns or population characteristics in the sector of interest. Hirdes et al. [23] discuss the impact of the Health Services Restructuring Commission (HSRC) on the role of CCC and LTC. The HSRC provided a clear policy directive that CCC should reduce its emphasis on long-stay, medically stable patients in favour of an increased emphasis on post-acute care for persons needing physical rehabilitation or complex medical care. The RUG-III algorithm was used to specify cutoffs for the two care settings. Most of the RUG-III categories referencing Impaired Cognition, Behaviour Disturbance, and Reduced Physical Functions were designated for LTC. The remaining categories (Rehabilitation, Extensive Services, Special Care and Clinically Complex) were designated for CCC. One of the most profound changes that has occurred has been a substantial reduction in length of stay in CCC, from an average of 224 days in 1996 to less than 90 days by 2010 [4, 29]. In fact, CCC can best be thought of as containing two fairly distinct populations: a) short-stay post-acute patients who tend to be discharged within 90 days; and b) long-stay, medically complex patients who account for a larger share of all patient days while representing only about 20 percent of admissions. LTC homes, on the other hand, tend to have a more stable population with longer stays but substantial impairments in cognition or functional status.
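The HSRC designation can be illustrated with a simplified lookup. The sketch below maps the RUG-III major categories named above to their designated setting and computes the share of one setting's assessments that fall in categories designated for the other setting, a rough indicator of how closely a sector's case mix matches its intended role. It assumes assignment at the major-category level only, whereas the text notes that only most of the cognition, behaviour, and physical function categories went to LTC; the category labels are taken from the text, not from the RUG-III specification.

```python
from collections import Counter

# Simplified: RUG-III major categories as named in the text, mapped to the setting
# the HSRC designated. The real cutoffs were not set purely at this level.
DESIGNATED_SETTING = {
    "Rehabilitation": "CCC",
    "Extensive Services": "CCC",
    "Special Care": "CCC",
    "Clinically Complex": "CCC",
    "Impaired Cognition": "LTC",
    "Behaviour Disturbance": "LTC",
    "Reduced Physical Functions": "LTC",
}

def share_outside_designation(categories, setting):
    """Fraction of a setting's assessments whose category is designated for the other setting."""
    designated = Counter(DESIGNATED_SETTING[c] for c in categories)
    total = sum(designated.values())
    if total == 0:
        return 0.0
    return 1 - designated[setting] / total

ccc_categories = ["Rehabilitation", "Clinically Complex", "Impaired Cognition", "Special Care"]
print(share_outside_designation(ccc_categories, "CCC"))  # 0.25
```

Tracked over time, a shift in this share could reflect either a real change in the population served or a change in coding, which is exactly the ambiguity the paragraph above describes.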
Types of data quality issues
As noted previously, psychometric testing to establish the reliability and validity of an assessment instrument is a necessary condition for its use as a basis for decision-making. In response to early concerns about the suitability of RAI data for research and other health care decision-making (e.g., [30–32]), the RAI 2.0 and related interRAI instruments have been subject to extensive, ongoing testing to establish reliability [33–36] and validity [37–44]. A more detailed review of this evidence is available elsewhere [45]. However, an instrument that performs well in a research context may not achieve a similar level of reliability and validity when used as part of normal clinical practice [46]. For example, Crooks and colleagues [47] found low agreement between urinary continence ratings done as part of normal clinical practice and subsequent independent tests of wetness done by research staff. These discrepancies were explained, at least in part, by poor assessment practices that would be problematic for the implementation of any instrument, no matter how well it performed in research trials. Therefore, it is necessary to identify potential threats to data quality in normal use and to develop methods to appraise the extent to which such problems are evident in data obtained as part of regular clinical practice.
Random error
All assessment instruments have some random error, in which the “test score” obtained from the items disagrees to some degree with the “true score” for the person who has been assessed. For example, when measuring behaviour based on events that can occur at different times over the course of the day (e.g., angry outbursts), there will be chance variations in whether the events were witnessed by staff or other informants. Characteristics that are more stable over time (e.g., activities of daily living) will have lower levels of random error than those that are relatively more volatile or changeable over time (e.g., fever, delirium, depression, behaviour disturbance).
Random error is a source of concern with data quality, because it tends to attenuate the associations between variables of interest. Thus, this error makes it more difficult to detect true differences between populations or to identify relationships between variables. For example, higher levels of random error will weaken evidence of a relationship between a best practice intervention and a desired outcome and it may also create false regional differences in resource intensity or quality of care.
The conventional tests to evaluate random error are measures of reliability, such as inter-rater reliability (agreement between two independent raters assessing the same person) and internal consistency (the intercorrelation of items within a scale).
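Two common statistics correspond to these tests: Cohen's kappa for agreement between two raters on a categorical item, and Cronbach's alpha for internal consistency. The sketch below implements the textbook formulas using only the Python standard library. It is not necessarily the procedure used in the cited reliability studies, which may have used weighted kappa or other statistics, and the ratings shown are made-up illustrative data.

```python
from collections import Counter
import math
import statistics

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters coded independently with their own marginals.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return math.nan  # both raters used one identical category only: kappa is undefined
    return (observed - expected) / (1 - expected)

def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one person's scores on the k items of a scale."""
    items = list(zip(*rows))  # regroup scores by item
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

rater_1 = [0, 0, 1, 1, 2, 2, 0, 1]
rater_2 = [0, 0, 1, 2, 2, 2, 0, 1]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.814

scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]  # 4 persons x 3 items
print(round(cronbach_alpha(scores), 3))  # 0.933
```

Here the raters agree on 7 of 8 cases, but kappa (about 0.81) is lower than raw agreement (0.875) because some agreement would be expected by chance.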
Systematic error
A more troublesome form of error introduces a bias for either intentional or unintentional reasons. While random error may reduce the ability to detect associations by increasing “noise,” systematic error may alter associations such that they are not valid reflections of the true relationship between the variables being studied. For example, if an assessor believes that the specific responses on a given item may have negative organizational consequences, the assessor may have an incentive to record a response that would change the likelihood of those consequences occurring.
A specific example that is a worrisome problem for any health information system is the gaming of the case-mix algorithms that inform funding decisions. In this case, assessors may bias their coding in favour of more severe ratings on items used in the case-mix system with the aim of obtaining higher ratings of resource intensity. Gaming has many negative consequences, including undermining the credibility of the entire information system to the point that participating organizations feel they cannot trust the data from their peer organizations. Unchecked gaming can also lead to the unfair allocation of limited resources if higher severity ratings are the product of biased reporting rather than residents’ true needs.
A distinction should be made between efforts to game a case-mix system and changes in practice patterns resulting from incentives in a case-mix system (e.g., the actual provision of more rehabilitation care because it is reimbursed at or above its cost). While the latter may be of concern for other reasons, it is not the same as the deliberate misrepresentation of resident characteristics in pursuit of economic or other advantages.
Ascertainment bias is a different type of systematic error that results from differences in staff effort or expertise in detecting difficult-to-measure resident characteristics. For example, pain, depression, and delirium are subtle, complex problems to detect, particularly in populations with substantial impairments in cognition or communication. Staff members who are skilled, sensitive, or diligent in assessing these characteristics will detect higher rates of the problem than others who are less adept. Conversely, poor assessment practices may result in a systematic failure to detect a problem. Ascertainment bias can also be the product of preconceived notions that staff may have about the presence or absence of traits in certain types of patients (e.g., assumptions about the activity preferences of persons with cognitive impairment). This type of bias is of concern because it may lead to a failure to detect problems in subpopulations that are difficult to assess (e.g., persons with dementia), and organizational variations in it may lead to false conclusions about differences in quality of care.
A third type of systematic error involves social desirability bias. In this case, responses on assessment systems may be biased to avoid negative impressions of the resident, staff member or organization. For example, there may be a tendency to understate true quality problems if they are seen to reflect poorly on the assessor personally or on the managers of the organization.
Selection bias
Selection bias is typically considered a problem in sample data, where certain types of individuals may be systematically included in or excluded from a data set. However, it is also a concern in organizational comparisons of quality when facilities tend to admit patients with different histories of potential quality problems. These past differences may translate into different likelihoods of the quality problem in the future, even in the absence of differences in clinical practice. A different type of selection bias may occur when discretion is exercised over who does or does not receive an assessment (e.g., organizational differences in the rate of late RAI 2.0 assessments).
Autopopulation
Autopopulation refers to the practice of using data from another source to complete the fields of a current assessment, automatically and with no further scrutiny. Autopopulation would include the use of items from one instrument to fill out fields of a different instrument, or it may involve the use of old records from an assessment to complete the same fields in a new assessment. Hirdes et al. [48] explored the options for autopopulation of MDS items over time and concluded there were only a few fields where one might confidently carry forward prior values.
Autopopulation can cause many serious data quality problems for a health information system. For example, it can make the data set unresponsive to true change if values are carried forward without clinician confirmation that no change has occurred. This, in turn, can weaken the evidence of the effectiveness of interventions or therapies on outcomes of interest. Moreover, it would mask both good and poor performance on outcome-based quality indicators, because it would suppress evidence of changes in those measures.
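One simple way to screen for possible autopopulation is to measure how often item values are identical across consecutive assessments of the same person. The sketch below assumes each assessment is a dictionary of item codes, ordered chronologically, with hypothetical item names, and uses a hypothetical threshold of complete identity. A high share is only a signal for review, since a person whose condition is stable can legitimately have unchanged values on some items.

```python
def carry_forward_share(prev, curr, items):
    """Fraction of the listed items with identical values in two consecutive assessments."""
    compared = [i for i in items if i in prev and i in curr]
    if not compared:
        return 0.0
    return sum(prev[i] == curr[i] for i in compared) / len(compared)

def flag_possible_autopopulation(assessments, items, threshold=1.0):
    """Return (position, share) for each assessment at or above the threshold vs. its predecessor.

    Assumes `assessments` is one person's assessments in chronological order.
    """
    flags = []
    for pos in range(1, len(assessments)):
        share = carry_forward_share(assessments[pos - 1], assessments[pos], items)
        if share >= threshold:
            flags.append((pos, share))
    return flags

# Hypothetical item names, chosen from domains the text describes as changeable over time.
ITEMS = ["pain", "sad_mood", "verbal_behaviour", "fever"]
history = [
    {"pain": 2, "sad_mood": 1, "verbal_behaviour": 0, "fever": 0},
    {"pain": 2, "sad_mood": 1, "verbal_behaviour": 0, "fever": 0},  # identical to previous
    {"pain": 1, "sad_mood": 1, "verbal_behaviour": 1, "fever": 0},
]
print(flag_possible_autopopulation(history, ITEMS))  # [(1, 1.0)]
```

Aggregating such flags by facility, and restricting the item list to volatile items, would make systematic carry-forward easier to distinguish from genuine stability.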
Data completeness
Item non-response can result in data sets that are so incomplete as to render them worthless. Missing values (and items with an “Unknown” response) can make an observation unusable, thereby reducing the sample size in any analysis or report. While some solutions for dealing with missing values have been proposed (e.g., imputation based on other scores, substitution of median or mean scores) these approaches often are unacceptable (e.g., if funding or quality performance decisions are based on those data). Increases in the magnitude of item non-response will make a data set less representative over time, which is a serious problem for smaller organizations where the loss of data on a few cases could substantially alter the estimates of rates of a quality problem. Despite the serious threat posed by missing data, there are several fairly easy solutions to the problem. Examples include educational strategies to sensitize staff to the problem, instrument design to reduce the risk of ambiguity leading to non-response, and the use of electronic data checks to ensure that all fields are complete.
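Item non-response can be monitored with two simple rates: the proportion of records missing each item, and the proportion of records complete on every item that a given analysis needs. In the sketch below, the set of codes treated as missing or “Unknown” and the item names are hypothetical.

```python
# Hypothetical codes treated as item non-response (missing or "Unknown").
MISSING = {None, "", "UNKNOWN"}

def item_nonresponse(records, items):
    """Proportion of records with a missing or unknown value, per item."""
    if not records:
        raise ValueError("no records")
    n = len(records)
    return {i: sum(r.get(i) in MISSING for r in records) / n for i in items}

def complete_case_share(records, items):
    """Proportion of records that have a usable value for every listed item."""
    if not records:
        raise ValueError("no records")
    usable = sum(all(r.get(i) not in MISSING for i in items) for r in records)
    return usable / len(records)

records = [
    {"item_a": 3, "item_b": 2},
    {"item_a": None, "item_b": 2},
    {"item_a": 4, "item_b": "UNKNOWN"},
    {"item_a": 1},  # item_b absent entirely
]
print(item_nonresponse(records, ["item_a", "item_b"]))     # {'item_a': 0.25, 'item_b': 0.5}
print(complete_case_share(records, ["item_a", "item_b"]))  # 0.25
```

The example shows how modest per-item missingness compounds: no item is missing in more than half the records, yet only one record in four is usable for an analysis that needs both items.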
Logical errors
Logical errors include several types of coding inconsistencies that can result from poor assessment practices, poor instrument design, assessor fatigue, systematic biases, or random error. First, they include discrepancies between variables whose values are contingent on each other (e.g., items in a list are checked and “none of the above” is also checked). Second, there may be out-of-range values (e.g., heights or weights recorded in units not consistent with national standards, or response codes that have no defined meaning). Third, there may be improbable or impossible combinations of items (e.g., a comatose resident rated as engaging in group activities; incurable chronic diseases that disappear from a later assessment; readmission dates that fall before admission dates). One problem caused by logical errors is that they effectively turn observations into missing values: if two responses cannot both be true, it is unclear which of the two is correct. Hence, the problems with data completeness also apply to logical errors.
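The three types of logical error described above can be expressed as rule-based edit checks. The sketch below applies one example rule of each type to a single assessment stored as a dictionary. All field names and the set of valid codes are hypothetical, and these rules are illustrations rather than the actual CIHI submission checks.

```python
from datetime import date

VALID_MOOD_CODES = {0, 1, 2}  # hypothetical coding scheme for one item

def logical_errors(a):
    """Return the names of logical-check rules that one assessment (a dict) violates."""
    errors = []
    # 1. Contingent items: a listed condition checked together with "none of the above".
    if a.get("none_of_above") and any(a.get("checklist", {}).values()):
        errors.append("checklist_contradiction")
    # 2. Out-of-range value: a response code with no defined meaning.
    code = a.get("mood_code")
    if code is not None and code not in VALID_MOOD_CODES:
        errors.append("mood_code_out_of_range")
    # 3. Improbable or impossible combinations of items.
    if a.get("comatose") and a.get("group_activities"):
        errors.append("comatose_but_in_group_activities")
    admitted, readmitted = a.get("admission_date"), a.get("readmission_date")
    if admitted and readmitted and readmitted < admitted:
        errors.append("readmission_before_admission")
    return errors

bad = {
    "checklist": {"diabetes": True}, "none_of_above": True, "mood_code": 7,
    "comatose": True, "group_activities": True,
    "admission_date": date(2010, 5, 1), "readmission_date": date(2010, 4, 1),
}
print(logical_errors(bad))  # all four rules fire
```

Rules of this kind can run at submission time, so that contradictory records are returned for correction before they become, in effect, missing data.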
Study objectives
The present study aims to examine the quality of CCRS data based on the Ontario CCC and LTC data submitted to CIHI from 1996 to 2011 and from 2005 to 2011, respectively. It considers changes in population characteristics and practice patterns, and examines the rates of multiple potential quality problems. This study also aims to evaluate analytic strategies that may be used to assess data quality in other similar data sets (e.g., the RAI-Home Care and RAI-Mental Health, both of which have been implemented in other sectors or jurisdictions). The focus is on Ontario, even though CCRS is a national reporting system, because at the time of writing that province had the longest and most complete implementation of the RAI 2.0.