An evaluation of data quality in Canada’s Continuing Care Reporting System (CCRS): secondary analyses of Ontario data submitted between 1996 and 2011

Background: Evidence-informed decision making in health policy development and clinical practice depends on the availability of valid and reliable data. The introduction of interRAI assessment systems in many countries has provided valuable new information that can be used to support case mix based payment systems, quality monitoring, outcome measurement and care planning. The Continuing Care Reporting System (CCRS) managed by the Canadian Institute for Health Information has served as a data repository supporting national implementation of the Resident Assessment Instrument (RAI 2.0) in Canada for more than 15 years. The present paper aims to evaluate data quality for the CCRS using an approach that may be generalizable to comparable data holdings internationally.
Methods: Data from the RAI 2.0 implementation in Complex Continuing Care (CCC) hospitals/units and Long Term Care (LTC) homes in Ontario were analyzed using various statistical techniques that provide evidence for trends in validity, reliability, and population attributes. Time series comparisons included evaluations of scale reliability, patterns of associations between items and scales that provide evidence about convergent validity, and measures of changes in population characteristics over time.
Results: Data quality with respect to reliability, validity, completeness and freedom from logical coding errors was consistently high for the CCRS in both CCC and LTC settings. The addition of logic checks further improved data quality in both settings. The only notable change of concern was a substantial inflation in the percentage of long term care home residents qualifying for the Special Rehabilitation level of the Resource Utilization Groups (RUG-III) case mix system after the adoption of that system as part of the payment system for LTC.
Conclusions: The CCRS provides a robust, high quality data source that may be used to inform policy, clinical practice and service delivery in Ontario. Only one area of concern was noted, and the statistical techniques employed here may be readily used to target organizations with data quality problems in that (or any other) area. There was also evidence that data quality was good in both CCC and LTC settings from the outset of implementation, meaning data may be used from the entire time series. The methods employed here may continue to be used to monitor data quality in this province over time and they provide a benchmark for comparisons with other jurisdictions implementing the RAI 2.0 in similar populations.


Background
Evidence-informed decision-making is an essential ingredient of any strategy that aims to deal with the complex challenges posed by population changes, limited resources, advancements in technology, and changing public expectations. High quality information systems are a foundation on which to build evidence to inform decisions. The availability of standardized, representative, comprehensive, reliable and valid data is a precondition to formulating evidence to respond to those challenges. Therefore, an essential step in introducing large-scale data systems aimed at providing such evidence is an evaluation of the quality of those data [1][2][3].
There are many potential threats to the quality of data in any large health information system. Unless the psychometric properties of the data have been evaluated systematically, one cannot have confidence the information on which decisions are made is inherently reliable or valid, even if the system is implemented and maintained in optimal conditions. Such information systems should perform not only in ideal conditions, but also when faced with the rigors of day-to-day use. Examples of factors that can undermine the quality of data in "real-world" situations include: poor training and lack of on-going education; lack of staff expertise; inadequate "buy-in" by staff; systematic biases in reporting due to financial incentives or avoidance of negative consequences of unfavorable findings; temporary coding problems with the introduction of new systems; declining attention and underfunding; lack of feedback to users; and poor data collection or coding strategies.

The Resident Assessment Instrument 2.0 and Continuing Care Reporting System (CCRS)
The Resident Assessment Instrument 2.0 (RAI 2.0) is a comprehensive assessment system that is at different stages of implementation in two care settings in eight Canadian provinces/territories [4]: long term care facilities (i.e., nursing homes that tend to serve a long-stay, medically stable population with substantial impairments in cognition or physical function) and complex continuing care hospitals/units (post-acute hospital settings that serve medically unstable persons with complex health conditions and functional impairments with stays typically lasting less than 90 days).
The instrument was originally developed by United States (US) researchers [5] after passage of a US law aimed at improving the quality of nursing home care [6]. Since then the system has been maintained and improved by interRAI (www.interrai.org), a 32-country network of clinicians and researchers focused on the implementation, application and continuing refinement of a suite of compatible instruments including the RAI 2.0 [7,8]. interRAI's goal is to develop these assessment systems to comprise an integrated health information system linking multiple sectors of health and social services for the elderly and other vulnerable populations [9,10].
The RAI 2.0 is composed of three main components. First, the Minimum Data Set (MDS) data collection form, which includes about 440 items covering domains such as cognition, communication, mood and behaviour, psychosocial well-being, physical functioning, continence, health conditions, nutrition, activities, medication, treatments, procedures, and discharge potential. Second, a corresponding manual with item-by-item descriptions outlining the definitions, intent, assessment process, and coding rules. Third, Clinical Assessment Protocols (CAPs), which support development of care plans in 22 clinical areas [11]. The RAI 2.0 is designed as a comprehensive assessment to be used as part of normal clinical practice, and assessments are intended to be done by trained health professionals working as part of a multidisciplinary team. In addition to care planning [12], other applications include case-mix based funding [13], quality indicators [14,15], and outcome measurement [16]. These multiple applications for multiple audiences have meant that many stakeholders in continuing care have begun to look to these data as an important source of information for a variety of decisions. For example, the Resource Utilization Groups (RUG-III) case-mix algorithm [13] is now used in the funding formula for both complex continuing care (CCC) hospitals/units [17] and long term care (LTC) homes in Ontario [18]. In addition, Health Quality Ontario (www.hqontario.ca) reports publicly on CCC and LTC performance using quality indicators from the RAI 2.0 [15,19]. Some of these indicators are part of formal accountability agreements between health care organizations and government agencies at the local and provincial levels that link performance expectations to funding.
The Continuing Care Reporting System (CCRS) is a national information system developed and managed by the Canadian Institute for Health Information (www.cihi.ca). Originally developed as the Ontario Chronic Care Patient System (OCCPS) to support the implementation of the RAI 2.0 in CCC hospitals/units, the CCRS now serves as a pan-Canadian data repository for eight participating provinces/territories implementing this instrument. It is a vehicle for national statistical reporting on the status of continuing care facilities (see, for example, CIHI report [20]), providing a national perspective on facility-based services for frail elderly and persons with disabilities.
Besides making the reporting system available to other provinces, the launch of the CCRS in 2003 included minor updates to the RAI 2.0 form and coding instructions. Also, forms were added to track demographic changes and to obtain a profile of participating facilities. Currently, all CCC hospital patients and LTC residents are assessed within 14 days of admission. In addition, quarterly reassessments are done using a shortened form and annual reassessments are done using the full MDS form. A discharge tracking form is used to record the date and disposition of all discharges, but a full assessment is not done at discharge. Data are sent electronically to the CCRS and must adhere to reporting standards established by CIHI in consultation with the provinces and interRAI. This includes passing new logical checks for data submissions. Seven of the eight provinces/territories that are implementing the RAI 2.0 are now submitting data to CIHI. Data are available for Ontario CCC hospitals/units beginning in 1996 and for LTC in that province as of 2005. More detailed descriptions of the implementation of the RAI 2.0 in Canada, beginning with CCC hospital settings in Ontario, are provided elsewhere [4,21-28].
With the growing importance of the CCRS for decision-making in continuing care, questions about the system's data quality have naturally arisen. The integrity of the data system is a precondition of its acceptance as a basis for these decisions. Therefore, there has been growing interest in current data quality of the CCRS and how it has changed over time. Also, there is interest in identifying methods that might be used to monitor, identify, and respond to changes in data quality in the future.
The changing role of facilities in the continuum of care
A variety of factors related to data quality could have introduced measurement error into the data over time, including changes in the population composition and the roles of CCC and LTC in the continuum of care. Therefore, any assessment of data quality must consider strategies for disaggregating indications of error from measures of actual changes in practice patterns or population characteristics in the sector of interest. Hirdes et al. [23] discuss the impact of the Health Services Restructuring Commission (HSRC) on the role of CCC and LTC. The HSRC provided a clear policy directive that CCC should reduce its emphasis on long-stay medically stable patients in favour of an increased emphasis on post-acute care for persons needing physical rehabilitation or complex medical care. The RUG-III algorithm was used to specify cutoffs for the two care settings. The bulk of the RUG-III categories referencing Impaired Cognition, Behaviour Disturbance, and Reduced Physical Functions were designated for LTC. The remaining categories (Rehabilitation, Extensive Services, Special Care and Clinically Complex) were designated for CCC. One of the most profound changes that has occurred has been a substantial reduction of length of stay in CCC, dropping from an average of 224 days in 1996 to less than 90 days by 2010 [4,29]. In fact, CCC can be best thought of as containing two fairly distinct populations: a) short-stay post-acute patients who tend to be discharged within 90 days; and b) long-stay medically complex patients who comprise a larger portion of all patient days while nevertheless representing only about 20 percent of admissions. LTC, on the other hand, tends to have a more stable population with longer stays but substantial impairments in cognition or functional status.

Types of data quality issues
As noted previously, psychometric testing to show reliability and validity of an assessment instrument is a necessary condition for its use as a basis for decision-making. In response to early concerns about the suitability of RAI data for research use and other health care decision-making (e.g., [30][31][32]), the RAI 2.0 and related interRAI instruments have been subject to extensive, on-going testing to establish reliability [33][34][35][36] and validity [37][38][39][40][41][42][43][44]. A more detailed review of this evidence is available elsewhere [45]. However, the performance of an assessment instrument in a research context may not be matched by a similar level of reliability and validity when used as part of normal clinical practice [46]. For example, Crooks and colleagues [47] identified low agreement between urinary continence ratings done as part of normal clinical practice and subsequent independent tests of wetness done by research staff. These discrepancies were explained, at least in part, by poor assessment practices that would be problematic with the implementation of any instrument, no matter how it performed in research trials. Therefore, it is necessary to identify potential threats to data quality in normal use and to develop methods to appraise the extent to which such problems are evident in the data obtained as part of regular clinical practice.

Random error
All assessment instruments will have some random error in which the "test score" obtained by its items will have some level of disagreement with the "true score" for the person who has been assessed. For example, when measuring behaviour based on events that can occur at different times over the course of the day (e.g., angry outbursts), there will be chance variations in whether the events were witnessed by staff or other informants. Characteristics that are more stable over time (e.g., activities of daily living) will have lower levels of random error than those that are relatively more volatile or changeable over time (e.g., fever, delirium, depression, behavior disturbance).
Random error is a source of concern with data quality, because it tends to attenuate the associations between variables of interest. Thus, this error makes it more difficult to detect true differences between populations or to identify relationships between variables. For example, higher levels of random error will weaken evidence of a relationship between a best practice intervention and a desired outcome and it may also create false regional differences in resource intensity or quality of care.
The conventional tests to evaluate random error include measures of reliability such as inter-rater reliability (tests of agreement between two independent raters) and tests of internal consistency (tests of the item correlations within parallel form scales).

Systematic error
A more troublesome form of error introduces a bias for either intentional or unintentional reasons. While random error may reduce the ability to detect associations by increasing "noise," systematic error may alter associations such that they are not valid reflections of the true relationship between the variables being studied. For example, if an assessor believes that the specific responses on a given item may have negative organizational consequences, the assessor may have an incentive to record a response that would change the likelihood of those consequences occurring.
A specific example that is a worrisome problem for any health information system is the gaming of the case-mix algorithms that inform funding decisions. In this case, assessors may bias their coding practices in favour of more severe ratings on items used in the case-mix system with the aim of getting higher ratings of resource intensity. Gaming has many negative consequences, including the undermining of credibility of the entire information system to the point that participating organizations feel they cannot trust the data from their peer organizations. Unchecked gaming can also lead to the unfair allocation of limited resources if higher severity ratings are the product of biased reporting rather than the residents' true need.
A distinction should be made between efforts to game a case-mix system and changes in practice patterns resulting from incentives in a case-mix system (e.g., the actual provision of more rehabilitation care, because it is paid at its cost or more). While the latter issue might be of concern for other reasons, it is not the same as the deliberate misrepresentation of resident characteristics in the pursuit of economic or other advantages.
Ascertainment bias is a different type of systematic error that is the result of differences in staff efforts or expertise in detecting difficult to measure resident characteristics. For example, pain, depression, and delirium are subtle, rather complex problems to detect, particularly in populations with substantial impairments in cognition or communication. Staff members who are skilled, sensitive or diligent in assessing these characteristics will detect higher rates of the problem than others who are less adept. Conversely, poor assessment practices may result in a systematic failure to detect a problem. Ascertainment bias can also be the product of preconceived notions that staff may have about the presence or absence of traits in certain types of patients (e.g., activity preferences and cognitive impairment). This type of bias is of concern because it may lead to a failure to detect problems in subpopulations that are difficult to assess (e.g., persons with dementia), and organizational variations may lead to false conclusions about differences in quality of care.
A third type of systematic error involves social desirability bias. In this case, reporting on assessment systems may be biased to avoid negative impressions of the resident, staff member or organization. For example, there may be a tendency to understate true quality problems if it is seen to be a reflection of the individual personally or on the managers of the organization.

Selection bias
Selection bias is typically considered to be a problem in sample data where certain types of individuals may be systematically included or excluded from a data set. However, it will also be a concern in organizational comparisons of quality when facilities tend to admit patients with different histories of potential quality problems. These past differences may also translate into differential likelihoods of the quality problem in the future in the absence of differences in clinical practice. A different type of selection bias may occur when there is discretion exercised about who does or does not receive an assessment (e.g., organizational differences in the rate of late RAI 2.0 assessments).

Autopopulation
Autopopulation refers to the practice of using data from another source to complete the fields of a current assessment, automatically and with no further scrutiny. Autopopulation would include the use of items from one instrument to fill out fields of a different instrument, or it may involve the use of old records from an assessment to complete the same fields in a new assessment. Hirdes et al. [48] explored the options for autopopulation of MDS items over time and concluded there were only a few fields where one might confidently carry forward prior values.
Autopopulation can cause many serious data quality problems for a health information system. For example, it can make the data set unresponsive to true change if values are carried forward without clinician confirmation that no change has occurred. This, in turn, can reduce the evidence of the effectiveness of interventions or therapies on outcomes of interest. Moreover, it would mask good and poor performance on outcome-based quality indicators, because it would suppress evidence of changes in those measures.
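As a rough illustration of how carried-forward values might be screened for, the sketch below computes the share of consecutive assessment pairs in which a chosen set of items is exactly unchanged. The record layout, the item names and the function name are hypothetical and are not CCRS fields. A high no-change rate is only a flag for review, since some residents are genuinely stable.

```python
from collections import defaultdict

def no_change_rate(assessments, items):
    """Share of consecutive assessment pairs (within resident) with identical item values.

    assessments: list of dicts with hypothetical keys 'resident_id',
    'assessment_date' (ISO date string, which sorts chronologically)
    and one key per item in `items`.
    Returns None when there are no consecutive pairs to compare.
    """
    by_resident = defaultdict(list)
    for a in assessments:
        by_resident[a["resident_id"]].append(a)

    unchanged = 0
    pairs = 0
    for records in by_resident.values():
        records.sort(key=lambda a: a["assessment_date"])
        # Compare each assessment with the one immediately before it.
        for prev, curr in zip(records, records[1:]):
            pairs += 1
            if all(prev[i] == curr[i] for i in items):
                unchanged += 1
    return unchanged / pairs if pairs else None

# Made-up records for two residents (illustrative values only).
example = [
    {"resident_id": "r1", "assessment_date": "2010-01-05", "adl_dressing": 2, "mood_sad": 1},
    {"resident_id": "r1", "assessment_date": "2010-04-05", "adl_dressing": 2, "mood_sad": 1},
    {"resident_id": "r1", "assessment_date": "2010-07-05", "adl_dressing": 3, "mood_sad": 1},
    {"resident_id": "r2", "assessment_date": "2010-02-01", "adl_dressing": 0, "mood_sad": 0},
    {"resident_id": "r2", "assessment_date": "2010-05-01", "adl_dressing": 0, "mood_sad": 0},
]
rate = no_change_rate(example, ["adl_dressing", "mood_sad"])
```

Comparing this rate across facilities, or between clinical domains that are expected to differ in stability, is one way such a measure could be read.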

Data completeness
Item non-response can result in data sets that are so incomplete as to render them worthless. Missing values (and items with an "Unknown" response) can make an observation unusable, thereby reducing the sample size in any analysis or report. While some solutions for dealing with missing values have been proposed (e.g., imputation based on other scores, substitution of median or mean scores) these approaches often are unacceptable (e.g., if funding or quality performance decisions are based on those data). Increases in the magnitude of item non-response will make a data set less representative over time, which is a serious problem for smaller organizations where the loss of data on a few cases could substantially alter the estimates of rates of a quality problem. Despite the serious threat posed by missing data, there are several fairly easy solutions to the problem. Examples include educational strategies to sensitize staff to the problem, instrument design to reduce the risk of ambiguity leading to non-response, and the use of electronic data checks to ensure that all fields are complete.

Logical errors
Logical errors include several types of coding inconsistencies that can result from poor assessment practices, poor instrument design, assessor fatigue, systematic biases, or random error. First, they include discrepancies between variables whose values are contingent on each other (e.g., items in a list are checked and "none of the above" is checked). Second, there may be out of range values (e.g., heights or weights scored in units not consistent with national standards or coding responses not associated with any meaning). Third, there may be improbable or impossible combinations of items (e.g., comatose but rated as engaging in group activities; elimination of chronic diseases for which there is no known cure; readmission dates that fall before admission dates). Among the problems caused by logical errors is the distortion of observations to be the equivalent of missing values. For example, if two responses cannot logically both be true, it is unclear which of the two is true. Hence, the problems with data completeness may also apply to logical errors.

Study objectives
The present study aims to examine the quality of CCRS data based on the Ontario CCC and LTC data submitted to CIHI between 1996 and 2011 and between 2005 and 2011, respectively. It considers changes in population characteristics and practice patterns, and examines the rate of multiple potential quality problems. This study also aims to evaluate analytic strategies that may be used to evaluate data quality in other similar data sets (e.g., the RAI-Home Care and RAI-Mental Health, both of which have been implemented in other sectors or jurisdictions). The focus is on Ontario even though CCRS is a national reporting system because that province has the longest and most complete implementation of the RAI 2.0 at the time of writing.

Data source
The 2010-2011 CCRS data, received from CIHI with encrypted facility and resident identifiers, were used for analysis. This dataset contained assessments done from July 1, 1996 to March 31, 2011 in both Ontario CCC hospitals/units (n = 466,767) and LTC (n = 900,885) for a total of 1,367,652 assessments. Because of their unique nature, individuals assessed as comatose were excluded. The dataset was then sorted by assessment date, each assessment was assigned to a quarter, and analyses were stratified by sector. All assessments were used, including admission, quarterly and annual reassessments. There were 181,548 individuals assessed in CCC settings in that time period compared with 135,245 in LTC after implementation began in 2005 until 2011 when it was complete. With longer stays in the latter, the average number of assessments per individual was 2.6 in CCC and 6.7 in LTC.
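The quarter assignment and the assessments-per-individual figures above can be reproduced with a few lines of arithmetic. The sketch below assumes calendar quarters, which is consistent with July 1, 1996 falling in the third quarter and March 31, 2011 in the first quarter; the function names are illustrative and not part of any CIHI tooling.

```python
from datetime import date

def calendar_quarter(d):
    """Return (year, quarter) for a date, assuming calendar quarters."""
    return d.year, (d.month - 1) // 3 + 1

def quarters_between(start, end):
    """Number of calendar quarters from start to end, inclusive."""
    (y0, q0), (y1, q1) = calendar_quarter(start), calendar_quarter(end)
    return (y1 - y0) * 4 + (q1 - q0) + 1

study_start = date(1996, 7, 1)
study_end = date(2011, 3, 31)

print(calendar_quarter(study_start))             # (1996, 3)
print(quarters_between(study_start, study_end))  # 59

# Counts reported in the text above.
ccc_assessments, ccc_individuals = 466_767, 181_548
ltc_assessments, ltc_individuals = 900_885, 135_245

print(ccc_assessments + ltc_assessments)              # 1367652
print(round(ccc_assessments / ccc_individuals, 1))    # 2.6
print(round(ltc_assessments / ltc_individuals, 1))    # 6.7
```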

Analysis
The primary analytic approach was to examine the quarterly time series trends for various indicators. Stratified analyses were done where differences were evident between CCC hospitals/units and LTC. Linear regression models were fitted to the various time series to estimate the magnitude of change in the selected indicators over time. Various indicators of population change in clinical characteristics, service utilization, and resource intensity were considered. The data were examined for trends in measures of convergent validity using measures of association (e.g., Pearson's r, Cramer's V) for variables expected to be related to each other where the relationship is unlikely to change dramatically over time (e.g., cognitive impairment and ADL status). This is an extension of the approach used by Phillips and Morris [45] in their analyses of US MDS 2.0 data. To examine reliability in the CCRS, Cronbach's alpha was used as a measure of internal consistency for three parallel form scales embedded in the RAI 2.0: the Activities of Daily Living Scale - Long Form [49], the Depression Rating Scale (DRS) [50] and the Aggressive Behavior Scale (ABS) [51]. This subset of available interRAI scales was selected because they are likely to have different levels of reliability based on previous research (e.g., the ADL scale tends to have high Cronbach's alpha scores, whereas the DRS tends to be at the lower end of the acceptable range of reliability). These differences are useful for calibrating the degree of decline (or improvement) in internal consistency for scales known to have different baseline levels of reliability. Although the Cognitive Performance Scale [52] was used to examine associations with other variables, its reliability could not be evaluated with Cronbach's alpha, because it is not a parallel form scale. Instead it is a decision-tree algorithm that was derived using diverse correlates of cognition in predictive models.
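For readers less familiar with Cramer's V, the following is a minimal sketch of the statistic for two categorical variables, computed from the Pearson chi-square of their crosstabulation. It uses the standard uncorrected formula with no continuity or bias correction, and the tiny data vectors are invented for illustration. It is not the code used in the study.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramer's V for two categorical variables given as equal-length sequences.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), where chi2 is the Pearson
    chi-square statistic of the crosstabulation of xs by ys.
    """
    n = len(xs)
    row = Counter(xs)           # marginal counts for the first variable
    col = Counter(ys)           # marginal counts for the second variable
    cell = Counter(zip(xs, ys)) # observed cell counts
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = cell.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # V is undefined when either variable has a single category.
    return sqrt(chi2 / (n * k)) if k > 0 else float("nan")

# Invented example: perfectly associated versus unrelated categories.
impaired = [0, 0, 1, 1]
incontinent_same = ["no", "no", "yes", "yes"]
incontinent_unrelated = ["no", "yes", "no", "yes"]

v_perfect = cramers_v(impaired, incontinent_same)
v_none = cramers_v(impaired, incontinent_unrelated)
```

Computing the statistic separately within each quarter and sector would give the kind of time series of associations described above.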
Some of CIHI's rules for logical inconsistencies in coding practices were considered for both clinical and service utilization indicators. In addition, new longitudinal indicators of logical errors, based on the failure to code diagnoses at follow-up that were present at baseline and were unlikely to have been cured (e.g., multiple sclerosis), were developed by an expert panel of clinicians, interRAI researchers and CIHI staff. Three measures of potential autopopulation were constructed by examining the absence of change in sets of indicators over time. ADL function and mood indicators were chosen to represent two clinical domain areas with different patterns of stability and change over time. Finally, the time between the assessment reference date and the date the assessment was signed off as complete was considered as a measure of efficiency in completing the assessment and of the appropriateness of assessment practices.
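The longitudinal diagnosis check described above can be sketched as a set comparison between a baseline and a follow-up assessment. The diagnosis labels, the chronic-condition list and the function names below are hypothetical; the actual list of conditions was defined by the expert panel and is not reproduced here.

```python
def dropped_chronic_diagnoses(baseline, follow_up, chronic_codes):
    """Chronic diagnoses coded at baseline but not coded at follow-up."""
    return sorted((set(baseline) & set(chronic_codes)) - set(follow_up))

def dropped_rate(pairs, chronic_codes):
    """Share of baseline/follow-up pairs with a chronic diagnosis at baseline
    in which at least one such diagnosis is missing at follow-up.

    pairs: list of (baseline_diagnoses, follow_up_diagnoses) tuples.
    Returns None when no pair has a chronic diagnosis at baseline.
    """
    eligible = [(b, f) for b, f in pairs if set(b) & set(chronic_codes)]
    if not eligible:
        return None
    flagged = sum(1 for b, f in eligible if dropped_chronic_diagnoses(b, f, chronic_codes))
    return flagged / len(eligible)

# Illustrative list only; multiple sclerosis is the example named in the text.
CHRONIC = {"multiple_sclerosis"}

pairs = [
    (["multiple_sclerosis", "pneumonia"], ["multiple_sclerosis"]),  # consistent
    (["multiple_sclerosis"], ["hypertension"]),                      # flagged
    (["pneumonia"], []),                                             # not eligible
]
rate = dropped_rate(pairs, CHRONIC)
```

An acute diagnosis such as pneumonia disappearing at follow-up is not flagged, which reflects the distinction between logical errors and real clinical change.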

Ethics clearance
The study was reviewed and received ethics clearance through the Office of Research Ethics at the University of Waterloo (ORE #13848).

Results
Table 1 shows a number of key descriptive characteristics for the population studied based on all available assessments in the study period. They include percentage estimates for: admission assessments, Cognitive Performance Scale (CPS) scores of three or more (indicating moderate or worse cognitive impairment), Activities of Daily Living (RUG ADL) scores of 11 or more (indicating moderate or worse functional impairment based on the scale used in the RUG-III case-mix system), and Aggressive Behaviour Scale (ABS) scores of five or more (indicating a high level of behaviour disturbance), all for each quarter from the third quarter in 1996 to the first quarter in 2011, inclusive. Each of these indicators was stratified by type of facility, as there were notable differences evident. There was a consistently higher percentage of admission assessments in the CCC hospitals/units than in LTC over the entire 59-quarter study period with observations for both sectors. The percentage of admission assessments increased over time in CCC, reflecting a policy shift toward a greater emphasis on post-acute care during the 15 year study period for that sector. There was only modest change in the proportion of admission assessments in LTC, which serves a longer staying, more stable population. The higher proportion of admission assessments in Quarters 2 and 3 of 2006 reflects an administrative artifact of LTC beginning implementation of the RAI 2.0.
With respect to clinical attributes, the prevalence of moderate or worse cognitive impairment was notably higher in LTC compared with CCC. There was an absolute reduction of 17% in the proportion of patients with that level of cognitive impairment in CCC, whereas that subgroup remained relatively stable at about 60% of the LTC population. The percentage with moderate or worse ADL impairment remained stable in both settings over time and was consistently higher in the CCC hospitals/units. The differences in the ABS scores were modest between the two facility types over time, with a somewhat higher proportion with severe behavior disturbance in LTC. The percentage of LTC residents with high ABS scores was greater among early adopter homes, but was relatively stable at about 12% by the final quarter of 2007. The percentage with a DRS score of three or more was stable over time in both settings, but tended to be about 10% higher in LTC compared with CCC.
Table 2 reports on measures of service utilization and resource intensity by facility type, including mean rehabilitation therapy minutes (i.e., the total minutes of speech, occupational and physical therapy received in the last 7 days), percentage of patients receiving two or more nursing rehabilitation interventions (includes passive and active range of motion exercises; splint or brace assistance; training/skill practice in bed mobility, transfer, walking, dressing or grooming, eating or swallowing, amputation or prosthesis care, scheduled toileting/bladder retraining, and communication), and mean RUG-III Case-Mix Index (CMI). For both settings, there were notable increases in reported therapy minutes per week and the proportion reported to receive two or more nursing rehabilitation interventions. In CCC, the mean rehabilitation minutes rose from about 66 minutes when the RAI 2.0 was mandated to about 143 minutes in the first quarter of 2011. In LTC they rose from about 16 minutes in 2005 to 42 minutes in 2011. Similarly, the percentage of patients receiving nursing rehabilitation increased from about 27% at the start of the study period to about 64% in CCC, and from about 10% in 2006 to 26% in LTC. The mean CMIs were notably higher in all years for CCC compared with LTC, reflecting differences in the populations served and the intensity of interventions received.
Table 3 reports the distributions of facility-level percentages of persons qualifying for the RUG-III Special Rehabilitation level in the 2008 to 2011 time period. Although there are changes in both sectors, the changes in the percentage of LTC residents at this RUG-III level following its inclusion in the payment system for that sector are striking. Surprisingly, the RUG-III distributions for about 10 percent of LTC homes suggested levels of Special Rehabilitation similar to those evident in rehabilitation hospitals, which are funded at substantially higher rates.
Whereas the previous tables can be presumed, at least in part, to reflect changes in the CCC and LTC populations and the services they received over time, Table 4 considers the patterns of associations (an indicator of convergent validity) intended to yield insight into data quality of clinical elements in the RAI 2.0. The relationships of four variables with cognitive impairment are examined to identify the magnitude and direction of their associations; and to examine the stability of these associations over time. Both ADL impairment and aggressive behaviour are positively correlated with the CPS, but the relationship is strongest for ADL. Similarly, using the Cramer's V statistic for associations in crosstabulations, bowel incontinence is positively related to cognitive impairment. On the other hand, there is a negative correlation of the interRAI Pain Scale with the CPS. This is not surprising given the wealth of literature that point to this negative relationship, which is often explained in part by under-detection of pain in cognitively impaired patients Proctor and Hirdes [53]). To the extent that this latter association is a function of ascertainment bias, one might expect to see some change in this association as practice patterns improve. While there is almost no change in the associations of CPS with ADL, bowel continence, and aggressive behaviour over time, there was a slight weakening of the  correlation of CPS and pain in the study period. There were only modest between-sector differences, suggesting that the associations of these clinical variables were relatively stable between LTC and CCC. Table 5 examines patterns of scale reliability over time as, measured using Cronbach's alpha statistic for internal consistency. 
Using a cut-off of 0.70 for acceptable reliability and 0.80 for excellent reliability, all three scales (ADL Long Form, DRS, ABS) displayed acceptable or excellent reliability over the entire study period and between the two care settings. The alpha values were lowest for the DRS and highest for the ADL Long Form scale, which is consistent with previous reports in the literature.
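For readers wishing to reproduce this type of reliability check, the following is a minimal sketch of Cronbach's alpha together with the 0.70 and 0.80 cut-offs described above. It is illustrative only and is not the code used in the present analyses; the function names and the input layout (one list of item scores per assessment) are assumptions.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    rows: list of per-assessment item scores, e.g. [[0, 1, 2], [1, 1, 3], ...]
    (hypothetical layout; every row must have the same number of items).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across assessments.
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    # Variance of the summed scale score.
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("scale total has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def reliability_band(alpha):
    """Label an alpha value using the 0.70 / 0.80 cut-offs from the text."""
    if alpha >= 0.80:
        return "excellent"
    if alpha >= 0.70:
        return "acceptable"
    return "below acceptable"
```

Applied separately to each year and care setting, a function of this kind would yield a time series of alpha values comparable in form to Table 5.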
The following three tables consider cross-sectional logical inconsistency in coding of clinical and service utilization variables. Table 6 shows the rate of the following logical errors in coding: mood persistence (i.e., the mood persistence item records that mood indicators are present, but none of the individual items for mood indicators are coded as being present); ADL (i.e., one, but not both, of the ratings for support and performance of specific ADLs is coded as "8 - did not occur"); parenteral/enteral intake (i.e., a mismatch between being reported to have parenteral/IV or feeding tube present and the proportion of total calories and fluid intake from those sources per day); and pressure ulcer staging (i.e., a "highest stage" value is assigned for pressure or stasis ulcers but the number of ulcers at that stage is missing or equal to zero).
These logical errors occurred at very low rates when the RAI 2.0 was originally mandated in CCC. Rates were below 5% at the outset, except for the inconsistency for parenteral/enteral intake in CCC, which was below 10% in the initial years of implementation. However, they were completely eliminated in CCC after 2003 when CIHI began to use data quality checks for such errors in its acceptance procedures. After that date, data submissions with these errors present were rejected by CCRS and hospitals were required to remediate the problems prior to resubmission. These errors never appeared in the LTC data, which is unsurprising given that implementation in that sector occurred after the change was made to CCRS regarding those errors.
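The logic of such acceptance checks can be sketched as follows for three of the four errors described above. This is a hedged illustration and not CIHI's actual edit-check implementation; the record layout and all field names (`mood_persistence`, `mood_items`, `adl_pairs`, `highest_ulcer_stage`, `ulcer_counts`) are hypothetical, and a code of 0 is assumed to mean "not present".

```python
DID_NOT_OCCUR = 8  # ADL code "8 - did not occur", as described in the text


def mood_persistence_error(rec):
    # Persistence recorded, but no individual mood indicator coded as present.
    return rec["mood_persistence"] > 0 and all(v == 0 for v in rec["mood_items"])


def adl_error(rec):
    # One, but not both, of (self-performance, support) is coded 8.
    return any(
        (perf == DID_NOT_OCCUR) != (support == DID_NOT_OCCUR)
        for perf, support in rec["adl_pairs"]
    )


def ulcer_staging_error(rec):
    # Highest stage assigned, but the count at that stage is missing or zero.
    stage = rec["highest_ulcer_stage"]
    if stage == 0:
        return False
    count = rec["ulcer_counts"].get(stage)
    return count is None or count == 0


CHECKS = {
    "mood_persistence": mood_persistence_error,
    "adl_did_not_occur": adl_error,
    "pressure_ulcer_staging": ulcer_staging_error,
}


def edit_check(rec):
    """Return the names of all logical errors found in one assessment record."""
    return [name for name, check in CHECKS.items() if check(rec)]


def accept(rec):
    """A submission is accepted only if no logical error is present."""
    return not edit_check(rec)
```

Under this sketch, a record failing any check would be returned to the submitting facility for correction, mirroring the rejection procedure described above.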
As with the abovementioned clinical logical error checks, the rates of logical errors related to coding of therapy time were very low over the entire study period (see Table 7). By 2011, the only logical error that persisted was for therapy of less than 15 minutes being counted as a day with therapy, which was present for about 2% of cases. A similar pattern is evident in Table 8, which shows very low rates of probable error in coding height and age (both with rates below 5%), although CCC hospitals/units were more likely than LTC to have problems with coding height. The rates of logical errors in coding weight were considerably higher in CCC between 1996 and 2003, with rates as high as 21%. However, for both settings these rates fell to 5% or lower in the last year of the study period, and in LTC problems with coding of weight were evident less than 1% of the time. Figure 1 shows that the rate for any logical problem in coding of mood indicators, ADL, nutritional intake, pressure ulcers, and therapy was below 10% for the entire study period. There was also a clear trend toward improved data quality in these areas over time. Both the CCC and LTC settings had steady rates of approximately 2% for these indicators after 2005.
The next set of analyses uses longitudinal records to identify unlikely reversals in chronic diseases coded at the initial assessment (see Table 9). If multiple sclerosis, quadriplegia, cerebral palsy or schizophrenia is noted on an initial assessment, it would be unusual for the condition to have been reversed by the follow-up assessment. Although the symptoms of multiple sclerosis can be less pronounced at times, none of these conditions is considered curable. Either the initial assessment was inaccurate (for which a correction should have been submitted) or the diagnosis was incorrectly coded as absent at follow-up. The instances of not having the condition at follow-up among those who had it coded at the initial assessment became increasingly uncommon over time. However, in the initial phases of implementation about one fifth of those with quadriplegia or with cerebral palsy had this error at follow-up. While these problems appear to be declining over time, it would be fairly easy to eliminate them with appropriate longitudinal error checks.
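A longitudinal error check of this kind could take roughly the following form. The sketch is illustrative only; the dictionary-based record layout and the diagnosis keys are hypothetical names chosen for this example, not CCRS field names.

```python
# Diagnoses that should not disappear between assessments (from the text).
CHRONIC_DIAGNOSES = (
    "multiple_sclerosis",
    "quadriplegia",
    "cerebral_palsy",
    "schizophrenia",
)


def reversed_diagnoses(initial, follow_up):
    """List chronic diagnoses coded at the initial assessment but absent at follow-up."""
    return [
        dx for dx in CHRONIC_DIAGNOSES
        if initial.get(dx) and not follow_up.get(dx)
    ]


def reversal_rate(pairs, dx):
    """Share of persons with `dx` at the initial assessment who lack it at follow-up.

    pairs: iterable of (initial, follow_up) dicts for the same person.
    Returns None when nobody had the diagnosis initially.
    """
    at_risk = [(i, f) for i, f in pairs if i.get(dx)]
    if not at_risk:
        return None
    reversed_count = sum(1 for _, f in at_risk if not f.get(dx))
    return reversed_count / len(at_risk)
```

A non-empty result from `reversed_diagnoses` would indicate that either the initial or the follow-up record requires correction.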
The problem of autopopulation is the focus of analyses reported in Table 10. In an extreme case, autopopulation would be suggested by assessments for which all 231 clinical variables were unchanged between assessments. This is a rare event, although there are some quarters where it has occurred for about 2-3% of patients in CCC facilities. Similar analyses for the 16 mood items and for the 20 ADL self-performance and support items found rates of no change in these values from a previous fiscal quarter in about 35% of CCC patients at the start of the RAI 2.0 mandate, rising to about 47% by 2011. At least some of this stability is reflective of the true absence of clinical change; however, it is interesting to note a much lower rate of identical mood items on reassessment in LTC. When considering ADL performance, opposite trends in potential autopopulation are evident between the two sectors, with the proportion having identical scores on 20 ADL items increasing in CCC but decreasing in LTC. The final analyses examine assessment practice patterns by comparing the time between the assessment reference date (marking the day used as the clinical anchor point for the RAI 2.0 assessment) and the date the assessment is signed off as complete by the assessment coordinator. Table 11 shows the annual rates for dates where this difference is less than 0 days (indicating a coding error in the date variables), 0-6 days, 7-30 days, and more than 30 days. The preferred practice pattern is for the assessment to be signed off as complete as close to the assessment reference date as possible. However, given that some team members may be submitting their minutes of service delivery as a batch for a group of patients (e.g., rehabilitation therapy minutes), some time lag is acceptable between these dates. In all years, the majority of assessments were signed off as complete within 6 days of the assessment reference date in CCC; however, only about half of LTC assessments met this standard.
The gap was greater than 30 days in about 24% of the CCC assessments done in 1996, but this rate improved dramatically over time, with less than 10% of assessments having a gap this large in 2011. In LTC the performance in this regard was even better with less than 5% of homes having a gap of greater than 30 days in assessment completion.
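The classification of sign-off gaps into the four categories used in Table 11 can be sketched as below. This is a minimal illustration; the function names and category labels are assumptions made for this example.

```python
from collections import Counter
from datetime import date


def gap_category(reference_date, signoff_date):
    """Classify the days between assessment reference date and sign-off date."""
    days = (signoff_date - reference_date).days
    if days < 0:
        return "negative"  # sign-off before reference date: date coding error
    if days <= 6:
        return "0-6"
    if days <= 30:
        return "7-30"
    return ">30"


def gap_distribution(assessments):
    """Proportion of assessments in each gap category.

    assessments: list of (reference_date, signoff_date) tuples.
    """
    counts = Counter(gap_category(ref, signed) for ref, signed in assessments)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Computed by year and by facility, the share in the ">30" category provides a simple indicator of assessments that may have been completed retrospectively.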
Finally, Figure 2 expands on the analyses by Phillips and Morris [46] as well as those reported in Tables 4 and 5. Figure 2 shows that there is a strong correspondence between the patterns of these associations in LTC and CCC, with an R² of 0.94 for the indicators between the two sectors. This suggests that the clinical elements of the RAI 2.0 behave in fundamentally the same way between the two care settings.
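One way to obtain a between-sector agreement statistic of this kind is to pair each indicator's association value in LTC with its value in CCC and square the Pearson correlation of the pairs. The sketch below assumes that interpretation, which is not confirmed by the text, and uses hypothetical function and argument names.

```python
from math import sqrt


def r_squared(ltc_values, ccc_values):
    """Squared Pearson correlation between paired association values.

    ltc_values[i] and ccc_values[i] are the same indicator's association
    (e.g., a correlation or Cramer's V) measured in each sector.
    """
    if len(ltc_values) != len(ccc_values) or len(ltc_values) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(ltc_values)
    mean_x = sum(ltc_values) / n
    mean_y = sum(ccc_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ltc_values, ccc_values))
    sxx = sum((x - mean_x) ** 2 for x in ltc_values)
    syy = sum((y - mean_y) ** 2 for y in ccc_values)
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary within each sector")
    r = sxy / sqrt(sxx * syy)
    return r * r
```

A value close to 1 indicates that indicators with strong associations in one sector also show strong associations in the other.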

Discussion
Although there are a limited number of specific findings that raise some concern, the overall picture of data quality in the CCRS between 1996 and 2011 is very positive. There is good evidence of reliability in key clinical scales. Tests of convergent validity indicate that major variables like cognition, ADL, continence, and behaviour are related in the expected directions and the associations have been stable over time. The present results replicate and extend previous analyses of US data by Phillips and Morris [46] that showed good reliability and validity in RAI 2.0 data from that country's nursing homes. In addition, there is clear evidence that the clinical data from the RAI 2.0 behave in a consistent manner between CCC and LTC. Many other associations (e.g., cancer diagnosis and pain) were examined and yielded positive findings, but these were not reported here. Nonetheless, these relationships can be helpful as monitoring tools to examine data quality.
In addition, the rates of logical errors in clinical and service utilization indicators are low, and the efforts of CIHI to prevent such errors have proven to be effective in many areas. Although some of these errors were evident at the start of the study period, many of these rates were reduced or eliminated over time. The present findings strongly support the value of so-called "edit checks" in the CCRS to root out logical errors as noted in this report. In fact, it would be useful to extend these mechanisms to detect longitudinal coding errors using the methods described in this paper (e.g., changes in diagnosis over time).
There is also little evidence of widespread autopopulation in the data set. To the extent that it is present, the problem is of greater concern in CCC hospitals/units than it is in LTC. Certainly, any tendency to reuse previous ratings must only be occurring at the level of limited item sets, if at all, because there is no evidence that replication of entire records occurs at more than a negligible rate. That said, it would be useful to undertake further efforts to establish the expected rate for stability in mood and ADL indicators in different settings and subpopulations. This would permit this type of check to be used as a more subtle data quality indicator than examining replication of all records. From the perspective of the participating facilities, elimination of autopopulation where it occurs should be considered a priority because it threatens the ability of the home to use interRAI Quality Indicators for quality improvement initiatives. Indeed, autopopulation will magnify the risk of a failure to detect true improvements in quality if the coding practices of the facility hide evidence of that change.
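A stability check of the kind proposed here might be sketched as follows. The expected rate and tolerance are inputs that would have to be established empirically for each setting and subpopulation, as noted above; they, along with all function names and the record layout, are hypothetical.

```python
def unchanged(previous, current, items):
    """True if every listed item has the same value on both assessments."""
    return all(previous[item] == current[item] for item in items)


def no_change_rate(pairs, items):
    """Share of consecutive assessment pairs with identical values on `items`.

    pairs: list of (previous, current) dicts for the same person.
    Returns None if there are no pairs.
    """
    pairs = list(pairs)
    if not pairs:
        return None
    return sum(unchanged(prev, curr, items) for prev, curr in pairs) / len(pairs)


def flag_facilities(rates, expected_rate, tolerance):
    """Facilities whose no-change rate exceeds the expected rate plus a tolerance.

    rates: dict mapping facility id -> no-change rate for an item set.
    expected_rate and tolerance are assumed inputs, not values from this study.
    """
    return sorted(
        facility for facility, rate in rates.items()
        if rate > expected_rate + tolerance
    )
```

Because some stability reflects a true absence of clinical change, a flag produced in this way would be a prompt for closer review rather than proof of autopopulation.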
Given the notable changes in therapy minutes, nursing rehabilitation, and facilities qualifying for the Special Rehabilitation level of RUG-III, one must ask whether the change is real or the result of gaming or "upcoding" of variables to maximize case mix scores. To start, it is useful to consider what might be expected in an environment where pervasive gaming of all aspects of the RUG-III case-mix system has occurred. First, the relationship between clinical constructs (e.g., disability and cognition) would be expected to deteriorate. In fact, there is no evidence of such changes in these data. Second, the reliability of scales might be expected to decline as facilities selectively upcode only those items within scales that are used to derive case-mix scores; that does not appear to have occurred in the CCRS data. There were improvements in the logical error rates for coding of therapy minutes, but the changes were modest because the rates were low at the outset. On the other hand, for CCC hospitals/units the increased emphasis on rehabilitation and on post-acute care is entirely consistent with the policy directives of the Health Services Restructuring Commission. In addition, there was a policy initiative implemented in LTC in Ontario to expand access to rehabilitation services. This suggests that at least some of these changes were actual outcomes of policy changes in both sectors. However, it is also striking to note that a number of LTC homes report rehabilitation well in excess of norms in CCC hospitals/units, which requires further careful examination. A comparison with other regions that have
The present study should be extended in subsequent research to examine how the techniques reported here might be used to evaluate data quality at the facility (or even assessor) level. It is likely the case that data quality problems occur in a more pronounced way among a small number of facilities rather than in either sector as a whole.
For example, a troublesome data quality finding is the tendency for a few facilities to have large gaps between the assessment reference date and the date the assessment is signed off as complete. Anecdotal reports from the field have indicated that a limited number of facilities historically relied on chart reviews to complete backlogged RAI 2.0 assessments, and this would be consistent with the findings for a handful of facilities. This practice must be strongly discouraged because it detracts from the clinical applications of the RAI 2.0. It increases the risk of not detecting major clinical problems (e.g., delirium), it excludes the patient and family from the assessment and it compromises the quality of data from the facility in question. In fact, the problem may be sufficiently important to justify rejection of assessments with a gap of greater than 30 days.
Besides the overall finding that data quality appears to be good for the CCRS, it was noteworthy that this has generally been true from the outset. Although conventional wisdom has been that the initial years of data collection for a reporting system like CCRS would yield data with compromised quality, this study provides direct evidence contrary to that assertion. While there were modest problems with coding of weight and some logical errors occurred at very low rates, the reliability and validity of the CCRS data appear to have been good from the first quarter of the mandate in each sector. While a variety of anomalies will occur with any large-scale change of this type, there is no evidence that data from the introduction of the RAI 2.0 could not have been used to inform decision-making shortly after its introduction. In fact, the primary problem with the data set in the first year was probably the absence of data from several facilities that were unable to submit data for several quarters.
The present study employs a wide set of relationships that can provide multiple perspectives from which data can be examined. This may be useful to governments, regulatory bodies, accreditation bodies, and facility administrators who are interested in monitoring for and responding to problems in data quality. The use of statistical techniques as demonstrated here represents a lower cost option as the first line of data quality monitoring than using more expensive and burdensome techniques (e.g., widespread inter-rater reliability testing). There are at least two ways that these analyses may be used in regular practice. First, the techniques reported here, when applied at the facility level, may be used to identify individual facilities for more careful, in-person scrutiny by expert assessors or representatives of regulatory agencies. Second, the present results may be used as benchmarks to evaluate the quality of implementation of the RAI 2.0 in other jurisdictions. For example, the effectiveness of implementing the RAI 2.0 in long term care facilities in other provinces can be examined, at least in part, by replicating the present analyses for those homes and comparing the results with the Ontario experience.
The positive findings reported here should not be taken to mean the CCRS will yield high quality data without effort. In fact, these results point to the benefits of implementing systematic checks and balances to ensure data quality. In addition, while there are many positive findings, there remain some important areas of concern that must be addressed expeditiously. There continues to be a strong role for ongoing education and feedback to clinicians to ensure that good assessment practices are sustained over time. In that regard, the present findings provide an historical watermark of what has been achieved in Ontario. It behooves all stakeholders in the CCRS to ensure the quality of CCRS data will be sustained and improved as it becomes more widely used as a basis for decision making in clinical practice, service delivery and policy.
The present findings also point to a methodology that may be employed by other reporting systems based on newer interRAI instruments including CIHI's Home Care Reporting System based on the RAI-Home Care [9,54] and Mental Health Reporting System for the RAI-Mental Health and interRAI Community Mental Health [55][56][57]. The present study was based on the RAI 2.0, which has been updated with the newer interRAI Long Term Care Facility (LTCF) instrument [8]. It will be useful to examine the performance of the various statistical indicators reported here in jurisdictions that have begun to implement the newer instrument.
An interesting question for future research is whether, as one might expect, implementations of interRAI instruments that emphasize clinical applications over their administrative uses will yield higher quality data. The present results provide a baseline of data quality measures against which alternative training and implementation approaches may be evaluated. In addition, future work might also extend these analyses by examining the extent to which reported changes in the amount of therapies (e.g., nursing rehabilitation and restorative care) actually translate to positive outcomes in the areas in which those therapies are reported to have been provided.

Conclusions
The CCRS provides a robust, high quality data source that may be used to inform policy, clinical practice and service delivery in Ontario. The overall picture provided by these analyses offers strong evidence that the RAI 2.0 data from CCRS could appropriately be used for research, program planning, evaluation and quality monitoring. Only one area of concern was noted (coding related to the Special Rehabilitation RUG-III level in LTC after 2009), and the statistical techniques employed here may be readily used to target organizations with data quality problems in that (or any other) area. There was also evidence that data quality was good in both sectors from the outset of implementation, meaning data may be used from the entire time series. The methods employed here may continue to be used to monitor data quality in this province over time and they provide a benchmark for comparisons with other jurisdictions implementing the RAI 2.0 in similar populations.