Methods for identifying 30 chronic conditions: application to administrative data

Background: Multimorbidity is common and associated with poor clinical outcomes and high health care costs. Administrative data are a promising tool for studying the epidemiology of multimorbidity. Our goal was to derive and apply a new scheme for using administrative data to identify the presence of chronic conditions and multimorbidity.
Methods: We identified validated algorithms that use ICD-9 CM/ICD-10 data to ascertain the presence or absence of 40 morbidities. Algorithms with both positive predictive value and sensitivity ≥70% were graded as “high validity”; those with positive predictive value ≥70% and sensitivity <70% were graded as “moderate validity”. To show proof of concept, we applied identified algorithms with high to moderate validity to inpatient and outpatient claims and utilization data from 574,409 people residing in Edmonton, Canada during the 2008/2009 fiscal year.
Results: Of the 40 morbidities, we identified 30 that could be identified with high to moderate validity. Approximately one quarter of participants had identified multimorbidity (2 or more conditions), one quarter had a single identified morbidity and the remaining participants were not identified as having any of the 30 morbidities.
Conclusions: We identified a panel of 30 chronic conditions that can be identified from administrative data using validated algorithms, facilitating the study and surveillance of multimorbidity. We encourage other groups to use this scheme, to facilitate comparisons between settings and jurisdictions.
Electronic supplementary material: The online version of this article (doi:10.1186/s12911-015-0155-5) contains supplementary material, which is available to authorized users.


Background
Management of chronic disease is the major challenge facing health systems worldwide [1]. Many people with chronic disease have multiple chronic conditions, which is termed multimorbidity [2]. It is clear that multimorbidity is common and associated with worse clinical outcomes and higher health care costs, compared to good health or to the presence of a single chronic condition [3][4][5][6]. However, there are key knowledge gaps concerning the basic epidemiology of multimorbidity [7]; its clinical and economic consequences; and how it contributes to disparities in health [8]. This information is prerequisite to mitigating the impact of multimorbidity and chronic disease [9].
Multimorbidity has been identified as a key research priority by the Public Health Agency of Canada, and is crucial to inform programming and resource forecasting. Knowledge of secular changes in the incidence and prevalence of multimorbidity is required, and would be facilitated by methods for identifying the presence of multimorbidity using administrative data.
Identifying morbidity using administrative data can be simple (e.g., based on a single hospitalization and using only a small number of codes) or complex (e.g., including inpatient and outpatient encounters and long lists of codes). Once developed, algorithms may be validated against a suitable gold standard (e.g., chart reviews; other previously validated algorithms).
Previous work by Barnett et al [3] identified 40 morbidities and was informed by a systematic review of multimorbidity measures [10], the Quality and Outcomes Framework of the UK General Practice contract, and health service planning by NHS Scotland. However, these authors used administrative data sources that are unique to the United Kingdom to identify the presence or absence of these conditions. A corresponding scheme based on the more widely used ICD-9 CM/ICD-10 system is not available.
Therefore, we first identified previously validated algorithms that use the ICD-9 CM/ICD-10 system for ascertaining the presence of chronic conditions using inpatient and outpatient claims and utilization data. We then showed proof of concept for using administrative data to study multimorbidity, by applying these previously validated algorithms to a population of adults residing in Edmonton, Canada between April 2008 and March 2009.

Methods
The institutional review boards at the University of Alberta (Pro00038795) and the University of Calgary (E22590) approved the study.

Morbidities
We did a focused literature search for validated algorithms that use ICD-9 CM/ICD-10 codes in administrative data from inpatient and outpatient encounters to ascertain the presence or absence of the 40 morbidities identified by Barnett et al [3]. We searched MEDLINE using combinations of the following MeSH subject headings together with the specific morbidity of interest: 'International Classification of Diseases', 'Reproducibility of Results', and 'Sensitivity and Specificity'. Based on an a priori decision, we considered algorithms to be of high validity if they had both positive predictive value (PPV) and sensitivity ≥70% as compared to an acceptable gold standard such as chart review. We considered algorithms to be of moderate validity if they had PPV ≥70% but sensitivity <70%. The cut-off values for PPV and sensitivity were based on previous validation studies of administrative data [11]. We did not consider negative predictive value or specificity, because these parameters are generally >90% in studies of chronic diseases among the general population [11,12]. The definition of multimorbidity required the coexistence of two or more of the morbidities. In a secondary analysis we used a more restrictive definition that required three or more morbidities to be present.
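The grading rule described above can be written as a short function. This is only an illustrative sketch: the function name `grade_validity` and the choice to express PPV and sensitivity as proportions between 0 and 1 are assumptions made for the example, not part of the original analysis.

```python
from typing import Optional

# Cut-off applied to both PPV and sensitivity, as stated in the text (70%).
THRESHOLD = 0.70


def grade_validity(ppv: float, sensitivity: float) -> Optional[str]:
    """Grade one published algorithm from its reported PPV and sensitivity.

    Both inputs are proportions in [0, 1] (an assumption of this sketch).
    Returns "high", "moderate", or None when the PPV falls below the
    cut-off, in which case the algorithm receives neither grade.
    """
    if ppv >= THRESHOLD and sensitivity >= THRESHOLD:
        return "high"
    if ppv >= THRESHOLD:
        # PPV meets the cut-off but sensitivity does not.
        return "moderate"
    return None
```

For instance, an algorithm reporting a PPV of 0.80 and a sensitivity of 0.50 would be graded "moderate" under this rule, whereas one with a PPV of 0.60 would not be graded regardless of its sensitivity.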

ICD codes
Canadian hospital discharge abstract data are coded with ICD-10 CA, which essentially increases specificity compared to the ICD-10 system by adding more digits [11]. All of the ICD-10 codes from the included algorithms are consistent with ICD-10 CA codes, and thus we used ICD-10 and ICD-10 CA codes interchangeably throughout this manuscript. When ICD-10 codes were not given in the primary papers, we used the Canadian Institute for Health Information (CIHI; www.cihi.ca) conversion table to convert ICD-9 CM codes to ICD-10 codes. Many algorithms required multiple codes within a specified time period to determine incidence of morbidity (Table 1). In each case the index date for the disease was considered to be the date of the first code. For example, in order to determine presence of asthma, we searched for ICD-9 CM 493 and ICD-10 J45 codes in hospitalizations and outpatient encounters. We considered asthma to have developed at the first instance of a single hospitalization with either of these codes, or a single outpatient encounter followed by two further outpatient encounters with either of these codes within two years. In either case, we considered the participant to have asthma for the duration of follow-up.
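The asthma example above can be sketched in code as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the `Encounter` record, the prefix matching of codes, the approximation of "two years" as 730 days, and the choice of the earliest qualifying date when both routes are met are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

ASTHMA_PREFIXES = ("493", "J45")  # ICD-9 CM 493 and ICD-10 J45, from the text
WINDOW = timedelta(days=730)      # "within two years", approximated (assumption)


@dataclass(frozen=True)
class Encounter:
    """Hypothetical record layout: one coded health care contact."""
    when: date
    code: str        # diagnosis code as recorded, e.g. "493.9" or "J45.0"
    inpatient: bool  # True for a hospitalization, False for an outpatient encounter


def asthma_index_date(encounters: Iterable[Encounter]) -> Optional[date]:
    """Return the index date for asthma, or None if the definition is not met."""
    hits = sorted(
        (e for e in encounters if e.code.upper().startswith(ASTHMA_PREFIXES)),
        key=lambda e: e.when,
    )
    candidates = []

    # Route 1: a single hospitalization with a qualifying code.
    inpatient_dates = [e.when for e in hits if e.inpatient]
    if inpatient_dates:
        candidates.append(inpatient_dates[0])

    # Route 2: one outpatient encounter followed by two further outpatient
    # encounters within the window; the index date is the first of the three.
    outpatient_dates = [e.when for e in hits if not e.inpatient]
    for i in range(len(outpatient_dates) - 2):
        if outpatient_dates[i + 2] - outpatient_dates[i] <= WINDOW:
            candidates.append(outpatient_dates[i])
            break

    return min(candidates) if candidates else None
```

Once an index date is returned, the participant would be treated as having asthma for the remainder of follow-up, as described above. Other conditions in Table 1 would need their own code lists and counting rules.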

Proof of concept
We applied identified algorithms with high or moderate validity to a population-based administrative dataset from Alberta Health (AH; the provincial health ministry) and Alberta clinical laboratories. Details of this administrative dataset including claims, hospitalizations and Ambulatory Care Classification System (ACCS) utilization are given in Figure 1 and have been reported elsewhere [13]. We assembled a cohort of adults aged ≥18 years who resided in the city of Edmonton, Alberta between April 2008 and March 2009, and included all people registered with AH. All Alberta residents are eligible for insurance coverage by AH, and >99% participate in this coverage. The dataset included demographic information such as postal code of residence, laboratory data, and medication in those aged ≥65 years [13]. We identified Edmonton residents from the AH registry file using the community name variable from the Statistics Canada Postal Code 2008 Conversion file [14] (www.statcan.gc.ca).
To demonstrate proof of concept for applying these algorithms to a large administrative dataset, we presented a simple summary of the prevalence of morbidity and multimorbidity in the study population. Counts and percentages were presented along with a figure showing how the number of morbidities varies by age. In sensitivity analyses, we presented the prevalence of morbidity as assessed by different sources of administrative data.
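The counts and percentages described above could be tabulated along the following lines. This is a hedged sketch: the input layout (a mapping from a participant identifier to the set of conditions identified for that participant) and the function name `summarize_morbidity_counts` are assumptions for illustration, and the small cohort in the usage note is invented.

```python
from typing import Dict, Mapping, Set, Tuple


def summarize_morbidity_counts(
    conditions_by_person: Mapping[str, Set[str]],
) -> Dict[str, Tuple[int, float]]:
    """Return (count, percentage) of participants with 0, 1, 2+ and 3+ morbidities.

    The "2+" group corresponds to the primary definition of multimorbidity
    and the "3+" group to the more restrictive secondary definition; the
    "3+" group is a subset of the "2+" group.
    """
    n_people = len(conditions_by_person)
    counts = {"0": 0, "1": 0, "2+": 0, "3+": 0}
    for conditions in conditions_by_person.values():
        k = len(conditions)
        if k == 0:
            counts["0"] += 1
        elif k == 1:
            counts["1"] += 1
        else:
            counts["2+"] += 1
            if k >= 3:
                counts["3+"] += 1
    return {
        label: (c, 100.0 * c / n_people if n_people else 0.0)
        for label, c in counts.items()
    }
```

As a usage example, a hypothetical cohort of four people with 0, 1, 2 and 3 identified conditions would yield one person (25%) in each of the "0" and "1" groups, two people (50%) in the "2+" group, and one person (25%) in the "3+" group.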

Algorithms
We identified 16 morbidities for which the best identified algorithm was of high validity: asthma, atrial fibrillation, metastatic cancer, chronic heart failure, chronic kidney disease, chronic pain, cirrhosis, diabetes, hypertension, irritable bowel syndrome, multiple sclerosis, myocardial infarction, peripheral vascular disease, psoriasis, schizophrenia and severe constipation. We identified an additional 14 morbidities (including two algorithms for other types of cancer and one algorithm for another type of liver disease) for which the best identified algorithm was of moderate validity: alcohol misuse, lymphoma, non-metastatic cancer (breast, cervical, colorectal, lung, and prostate), chronic pulmonary disease, chronic viral hepatitis B, dementia, depression, epilepsy, hypothyroidism, inflammatory bowel disease, Parkinson's disease, peptic ulcer disease, rheumatoid arthritis, and stroke or transient ischemic attack. We excluded the remaining morbidities for which no suitable algorithm could be identified (Additional file 1 Table S1). Thus we identified 30 conditions using administrative algorithms (including ICD-9 CM and ICD-10 codes) that are summarized in Table 1. Of these 30 algorithms, half were validated for both ICD-9 CM and ICD-10 codes.
We identified all conditions exclusively using ICD-9 CM and ICD-10 data with the exception of chronic kidney disease, for which we used a validated algorithm applied to ICD-9 CM and ICD-10 data [15] and supplemented using serum creatinine and albuminuria data [16]. We considered chronic kidney disease to be present if a participant met either the administrative or laboratory criteria.
In some cases, we made minor changes to the published algorithms to improve anticipated diagnostic performance, to increase consistency between algorithms used for the different conditions, and to include application to the outpatient setting. First, the original publications by Quan et al [11,17] required one hospitalization to identify the presence of a chronic condition; based on input from the first author of those papers, we modified this algorithm to allow either one inpatient code or two outpatient codes within two years to define the presence of these conditions. Second, to improve mapping of ICD-9 codes from the original (published) algorithm into ICD-10, we combined the 'highly likely' and the 'likely' codes from the original algorithm for chronic pain [18]. Third, for consistency, we modified algorithms that defined conditions as present if participants had two codes within any duration of follow-up (no matter how long) to require that the two codes occur within a three-year period [19,20]. Fourth, we expanded the criteria for presence of atrial fibrillation, epilepsy, irritable bowel syndrome, and severe constipation to include two outpatient codes within two years for these conditions. However, to ensure that secondary (post-surgical) bowel complications were not incorrectly classified as chronic bowel conditions, we excluded any hospitalization for surgery when assessing the presence or absence of these conditions [21]. Fifth, we expanded the criteria for presence of stroke or TIA to include one outpatient code. Sixth, we reviewed all algorithms for overlapping codes (situations where the same code was used to identify more than one condition), and modified the algorithms to avoid double-counting of morbidities (see footnotes in Table 1 for specific details).

Application of the algorithms to the Edmonton cohort
The study cohort included 574,409 participants (Figure 1, Table 2). Almost two-thirds were less than 50 years of age. Ten percent were 70 years of age or older and the proportion of men and women was similar. Approximately half of all participants were not identified as having any of the 30 morbidities for which high or moderate validity algorithms existed. Approximately one quarter were identified as having one of these 30 morbidities. Another quarter were identified as having 2 or more of these 30 morbidities (meeting the primary criterion for multimorbidity), whereas 12% had three or more (meeting the secondary criterion for multimorbidity).
The apparent prevalence of most morbidities was greatly reduced (often by 50% or more) when we assessed their presence or absence using hospitalization data only (Table 2). The addition of ACCS data to hospitalization and claims data made little difference to prevalence estimates, with the possible exceptions of chronic pain, hepatitis B and cirrhosis (the prevalences of which all changed by >20%). In most cases, the algorithms as originally validated resulted in a prevalence that was intermediate between the most inclusive approach (using hospitalization, claims and ACCS data) and the most restrictive approach (using hospitalization data only). As expected, adding gold standard laboratory data for kidney function (eGFR and albuminuria) resulted in substantial increases in the apparent prevalence of chronic kidney disease as compared to administrative data alone, regardless of which administrative data sources were used. Figure 2 depicts the percentage of participants with multimorbidity by age group. After hypertension (prevalence of 23%), the most common morbidities were chronic kidney disease (21%), diabetes (9%), and depression (9%).
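Comparisons of this kind between data sources reduce to simple arithmetic, sketched below. The function names are hypothetical, and treating the reported changes (for example ">20%") as relative rather than absolute differences is an assumption of this sketch, not something the text states explicitly.

```python
def prevalence_pct(n_cases: int, n_population: int) -> float:
    """Prevalence as a percentage of the cohort."""
    if n_population <= 0:
        raise ValueError("population size must be positive")
    return 100.0 * n_cases / n_population


def relative_change_pct(reference_pct: float, comparison_pct: float) -> float:
    """Relative change (in percent) of a prevalence estimate versus a reference.

    A negative value means the comparison approach identifies fewer cases
    than the reference approach.
    """
    if reference_pct == 0:
        raise ValueError("reference prevalence must be non-zero")
    return 100.0 * (comparison_pct - reference_pct) / reference_pct
```

For example, if a condition had a prevalence of 5% using all data sources and 2% using hospitalization data only (invented figures), the relative change would be -60%, consistent in direction with the reductions described above.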

Discussion
From a published list of chronic conditions [3], we identified a total of 30 validated algorithms including 3 algorithms for different types of cancer and 2 algorithms for liver disease. We applied the algorithms to ICD codes from claims and utilization data, and identified the presence or absence of these conditions in a cohort of 574,409 adults residing in Edmonton, Alberta between April 2008 and March 2009 (Figure 1). The overall prevalence of multimorbidity in this cohort was 26%, which is similar to the prevalence as reported in the Barnett study [3]. Our findings demonstrate proof of concept for using administrative data as a surveillance tool for multimorbidity in settings with systems for reliably capturing population-based claims and utilization data.
Multiple prior studies have ascertained the presence of various chronic conditions in the context of assessing multimorbidity [4,7,22-34]. Although there is no universally accepted definition of multimorbidity (or a list of conditions that should be used to assess the presence of multimorbidity) there appears to be consensus on several issues. First, health conditions used to define multimorbidity should be chronic but not necessarily permanent. Second, two or more concomitant conditions should be required to identify a person as having multimorbidity. Third, an attempt should be made to standardize definitions across studies to facilitate comparisons between populations [7,9,34,35]. At the same time, it is important that algorithms selected for use with administrative data should be validated against a gold standard and demonstrate acceptable diagnostic properties so as to ensure reasonably accurate classification of individuals with respect to morbidity status. We focused on validated algorithms with positive predictive value and sensitivity ≥70%, compared to an acceptable gold standard such as chart review. Because we had access to laboratory data allowing a gold standard assessment of kidney function, we primarily assessed the presence of chronic kidney disease using estimated glomerular filtration rate (eGFR) and albuminuria rather than administrative data.
To our knowledge, this is the most comprehensive panel of validated algorithms yet applied to administrative data for the study of multimorbidity. Other studies have used reasonable but unvalidated algorithms, a more limited list of candidate chronic conditions or both. Although there are undoubtedly other chronic conditions that could be identified using administrative data, we focused on those for which available algorithms appear to have adequate sensitivity as well as positive predictive value. We will use the set of algorithms described herein as the foundation for a series of studies describing the epidemiology of multimorbidity in Alberta, Canada.
Besides the various definitions of multimorbidity that they have used, existing studies in this area have several other limitations [36]. First, population-based studies are rare (especially in Canadian settings); most studies have captured patients followed by a particular centre and are vulnerable to referral bias. Second, most studies have been unable to assess the link between multimorbidity and clinical outcomes in subgroups defined by age, sex, or low socioeconomic status. Third, little is known about the relative frequency of individual chronic conditions within the multimorbidity syndrome, or about which clusters of conditions are most common and/or clinically significant. Fourth, studies examining the economic consequences of multimorbidity have typically used relatively unsophisticated methods and/or studied only select populations. The scheme outlined in the current manuscript will allow our group to do future studies that close these knowledge gaps, informing policy and practice. We are optimistic that the scheme will also be used by other researchers from other jurisdictions with similar datasets, facilitating comparisons between studies. Future studies should test the relative importance of the morbidities identified in the current manuscript, as well as considering other potentially important morbidities for inclusion.
Limitations of the current approach include those common to all studies using administrative data. First, we do not have information on potential confounders related to lifestyle (e.g., diet, smoking, exercise) or on measured blood pressure, which may be confounders when examining the association between multimorbidity and outcomes or costs. However, this limitation would not be expected to affect feasibility of applying the algorithms or the prevalence estimates reported here. Second, identification of some of the chronic conditions we studied might have been enhanced by simultaneous consideration of medication data [3]. We decided against including publicly funded medication data to define these conditions because medication coverage in Alberta is limited to people aged ≥65 years; those of lower SES; or with high annual medication costs. Thus, using medication data to define conditions would have biased towards a higher incidence of multimorbidity in older, poorer and sicker participants. We decided against restricting the cohort to people aged ≥65 years, because multimorbidity is relatively common in younger participants, and there might be important differences in the nature and implications of multimorbidity by age. Therefore, we will include all adult Albertans in our forthcoming analyses. Third, since participants must use medical services to be diagnosed with chronic conditions, our findings underestimate the true population burden of multimorbidity, especially for conditions that are less likely to lead to hospitalization but which may still significantly impact quality of life and other important outcomes. Fourth, we did not identify appropriate algorithms for all of the 40 target conditions, possibly because our searches were not exhaustive, and other important conditions such as obesity were not considered in this study. Therefore, our results likely underestimate the true prevalence of multimorbidity. Finally, although we focused on validated algorithms, the diagnostic performance of algorithms may vary between settings, based on coding practices and the reliability of data capture, and we did not systematically evaluate the quality of the original studies. Therefore (despite the lack of an a priori reason to suspect worse performance in our dataset), it is possible that some algorithms that were of high or moderate validity in other jurisdictions may perform less well when applied to Alberta data, especially with the modifications as described herein.

Conclusions
In summary, we identified a panel of 30 chronic conditions that can be identified from administrative data using validated algorithms, facilitating the study and surveillance of multimorbidity. We encourage other groups to use this scheme, to facilitate comparisons of data on multimorbidity between settings and jurisdictions.

Additional file
Additional file 1: Table S1. Validated algorithms for the original 40 morbidities.

Competing interests
There are no competing interests. This study is based in part on data provided by Alberta Health and Alberta Health Services. The interpretation and conclusions are those of the researchers and do not represent the views of the Government of Alberta. None of the study sponsors had a role in the study design; collection, analysis, and interpretation of data; writing the report; or in the decision to submit the report for publication.

Authors' contributions
MT conceived the study. MT, NW and HQ designed and drafted the manuscript. NW performed the statistical analyses. All authors have made substantial contributions to the development of the manuscript, and have all been involved in revising it for important intellectual content and approved the final version.