- Research article
- Open Access
- Open Peer Review
Developing model-based algorithms to identify screening colonoscopies using administrative health databases
© Sewitch et al.; licensee BioMed Central Ltd. 2013
- Received: 10 August 2012
- Accepted: 5 April 2013
- Published: 10 April 2013
Algorithms to identify screening colonoscopies in administrative databases would be useful for monitoring colorectal cancer (CRC) screening uptake, tracking health resource utilization, and quality assurance. Previously developed algorithms based on expert opinion were insufficiently accurate. The purpose of this study was to develop and evaluate the accuracy of model-based algorithms to identify screening colonoscopies in health administrative databases.
Patients aged 50-75 were recruited from endoscopy units in Montreal, Quebec, and Calgary, Alberta. Physician billing records and hospitalization data were obtained for each patient from the provincial administrative health databases. Indication for colonoscopy was derived using Bayesian latent class analysis informed by endoscopist and patient questionnaire responses. Two modeling methods were used to fit the data, multivariate logistic regression and recursive partitioning. The accuracies of these models were assessed.
689 patients from Montreal and 541 from Calgary participated (January to March 2007). The latent class model identified 554 screening exams. Multivariate logistic regression predictions yielded an area under the curve of 0.786. Recursive partitioning using the latent outcome had sensitivity and specificity of 84.5% (95% CI: 81.5-87.5) and 63.3% (95% CI: 59.7-67.0), respectively.
Model-based algorithms using administrative data failed to identify screening colonoscopies with sufficient accuracy. Nevertheless, the approach of constructing a latent reference standard against which model-based algorithms were evaluated may be useful for validating administrative data in other contexts where a gold standard is lacking.
- Latent Class
- Fecal Occult Blood Test
- Latent Class Model
- Fecal Immunochemical Test
- Recursive Partitioning
Administrative records are frequently used in health research. While some diagnoses and procedures are recorded with reasonable accuracy [1, 2], others are prone to misclassification [3, 4]. Indications for medical procedures are particularly challenging to derive from administrative health data because of the lack of indication codes, and, therefore, require automated data algorithms [5–7]. Studies validating administrative data algorithms have typically used medical chart review as the gold standard [7–13]. However, information in medical charts may be inaccurate for reasons including the variable quality of record keeping and record extraction. Moreover, with Canadian average wait times of 155 days from referral to performance of any gastrointestinal procedure and of 201 days to screening colonoscopy, the medical chart may not reflect symptoms at the time the colonoscopy is performed. In this study, we undertook the challenge of evaluating model-based algorithms in the absence of a gold standard measure for the indication of colonoscopy.
Colorectal cancer (CRC) screening is recommended worldwide for asymptomatic persons aged 50 to 75 [15–18]. Many industrialized countries have committed to or already implemented population-based CRC screening programs using modalities such as fecal occult blood test, fecal immunochemical test, flexible sigmoidoscopy, and colonoscopy. Colonoscopy is the only exam that enables visualization and removal of precancerous and cancerous lesions throughout the entire colon, and recent guidelines promote its role as a first-line screening modality [15, 16]. Colonoscopy utilization has increased dramatically owing to its use in CRC screening [20, 21]. In addition to its use in screening, colonoscopy is also performed for surveillance of bowel diseases, diagnosis of large bowel symptoms, and follow-up of positive results from other CRC screening modalities. A screening colonoscopy is defined as one performed in asymptomatic individuals for the early detection of CRC or the detection and removal of precancerous lesions.
Population screening initiatives are accompanied by increasing interest in undertaking large-scale colonoscopic screening studies using existing administrative health databases. However, most such databases either do not have a screening colonoscopy procedure code or the code is underused since the primary purpose is remuneration [21, 23]. Methods to distinguish screening and non-screening colonoscopies would enable monitoring of CRC screening uptake, tracking of health resource utilization, and estimation of cost-effectiveness; they may also be used for quality assessment as a key quality indicator is the adenoma detection rate in screening colonoscopies [22, 24, 25]. Automated data screening colonoscopy algorithms developed in previous studies had sensitivities ranging between 29% and 84% and specificities ranging between 58% and 93%; none of the algorithms had both high sensitivity and specificity [7, 12, 13], which led to the conclusion that administrative data cannot reliably be used to distinguish between colonoscopy indications [7, 26]. However, these prior studies relied on expert opinion regarding which diagnostic and procedural codes to use and in what order. In contrast, model-based algorithms use the data to help determine which variables to include and their weights, and thus have the potential to be more accurate.
A validated database algorithm would be convenient and efficient for researchers and administrators. The objectives of this study were to develop model-based algorithms to identify screening colonoscopies in health administrative databases, to evaluate their accuracy against a latent reference standard, and to compare their accuracies to that from a previously developed algorithm based on expert opinion.
A retrospective cohort study was conducted of endoscopists and a convenience sample of their patients about to undergo colonoscopy between January and March 2007 in two Canadian cities (Montreal, Quebec and Calgary, Alberta). Participating institutions were those where colonoscopy was performed and billed to the provincial health insurance plans (Montreal: McGill University Health Centre, Sir Mortimer B. Davis Jewish General Hospital, St. Mary’s Hospital Centre, Centre hospitalier de l’Université de Montréal, Fleury Hospital, Maisonneuve-Rosemont Hospital; Calgary: Foothills Medical Centre, Peter Lougheed Centre). At the time of the study, the provincial CRC screening program in Alberta relied on family physicians to offer fecal occult blood tests to patients aged 50 to 75 but no such program existed in Quebec, where ‘screening’ occurred opportunistically at the individual physician’s discretion. In both provinces, patients and/or family physicians could opt for colonoscopy as the initial screening exam.
The research assistant assessed endoscopists and patients for eligibility. Eligible endoscopists received remuneration for colonoscopy from the provincial health insurance board. Immediately after each colonoscopy, the endoscopist completed a questionnaire on the colonoscopy indication; the screening indication was defined as ‘performed in asymptomatic people at average-risk for developing colorectal cancer, or in people with a family history of colorectal cancer’. It is unknown whether the endoscopist based the indication on the colonoscopy referral, communication with the patient, or something else. Eligible patients were aged 50 to 75 years; those without provincial health insurance plan coverage in the prior year or unable to give consent were excluded. The research assistant approached patients prior to colonoscopy, explained the study, obtained consent, and administered the patient questionnaire. Patient-perceived indication was defined in two ways: 1) non-screening if the patient reported that the reason for colonoscopy was to follow up on a previous test or problem, and screening otherwise; and 2) non-screening if the patient reported specific lower abdominal symptoms and a personal history of a gastrointestinal (GI) condition, and screening otherwise. These patient indications, and their agreement with endoscopist indication, have been described in detail elsewhere.
We obtained provincial administrative health data on participating patients for the five years prior to the index colonoscopy as follows: Physician billing records from the Régie de l’assurance maladie du Québec (RAMQ) and Alberta Health and Wellness provided data on patient age and sex, all medical acts (RAMQ billing codes in Quebec, and Canadian Classification of Diagnostic, Therapeutic, and Surgical Procedures (CCP) codes in Alberta). The Maintenance et Exploitation des Données pour l’Étude de la Clientèle Hospitalière (MED-Echo) and the Canadian Institute for Health Information (CIHI) provided information on hospitalizations (The International Classification of Diseases ICD-9 and ICD-10 codes) and surgeries (ICD-9, CCP, and Canadian Classification of Health Interventions (CCI) codes). Data were linked using unique patient health insurance numbers. Prior to study inception, ethics approval was obtained from the Institutional Review Board in the Faculty of Medicine at McGill University and the local research ethics boards.
Since there is no gold standard method for identifying screening colonoscopies, a Bayesian latent class model for diagnostic testing was used to provide the probability that any given colonoscopy was for screening purposes, based on endoscopist indication and the two patient indications. Flat or non-informative prior distributions were used for the two patient indications, which were assumed to be conditionally independent. To examine the robustness of this assumption, a model including a dependence between these two indications was also fitted to the same data. For the endoscopist screening indication, a beta(10.67, 1.06) density, 97% of which covers the range from 70% to 100%, was used for both sensitivity and specificity. A beta(6, 7.6) density, 95% of which covers the range from 20% to 70%, was used for the prevalence of screening. These priors were based on expert opinion, and covered the ranges of all plausible values with relatively flat density. Latent class modeling was carried out using WinBUGS software (MRC Biostatistics Unit, Cambridge).
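The stated coverage of the two informative priors can be checked numerically. The following is a minimal sketch in plain Python, assuming composite Simpson integration is accurate enough for this purpose; the helper names `beta_pdf` and `beta_mass` are illustrative and are not part of the original analysis, which was carried out in WinBUGS.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at a point 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_mass(lo, hi, a, b, steps=20000):
    """Approximate P(lo <= X <= hi) with composite Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b) + beta_pdf(hi, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(lo + i * h, a, b)
    return total * h / 3


# Prior for endoscopist sensitivity and specificity: mass on the range 70% to 100%.
# The lower tail is integrated instead, because the density is smooth there.
endoscopist_mass = 1.0 - beta_mass(1e-12, 0.70, 10.67, 1.06)

# Prior for the prevalence of screening: mass on the range 20% to 70%.
prevalence_mass = beta_mass(0.20, 0.70, 6.0, 7.6)

print(f"beta(10.67, 1.06) mass on [0.70, 1.00]: {endoscopist_mass:.3f}")
print(f"beta(6, 7.6) mass on [0.20, 0.70]: {prevalence_mass:.3f}")
```

Both printed values should land close to the 97% and 95% coverage figures quoted for these priors.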
The predicted probabilities for screening from the latent class model, based on posterior medians, were dichotomized into screening and non-screening using a cut-off of 50% probability. The dichotomized latent class indication was then used as the outcome variable for multivariate logistic regression and recursive partitioning. For comparison purposes, we also fitted models using endoscopist indication alone as the outcome. Predictor variables entered into the models were: age, sex, procedure codes for previous procedures (colonoscopy, polypectomy, sigmoidoscopy, and double contrast barium enema (DCBE)) in the past 4 years, diagnostic codes for risk factors (inflammatory bowel disease (IBD), colorectal polyp, and CRC) in the past 5 years, diagnostic codes for symptoms (rectal bleeding, anemia, diarrhea, vomiting, and weight loss) in the past year, diagnostic codes for hospitalization for large bowel diseases in the past 5 years, and procedure codes for large bowel surgeries in the past 5 years (Additional file 1). The variables selected were based on practice guidelines [15, 16, 18], published studies [7, 12, 13], and expert opinion.
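To illustrate how a latent class model turns three imperfect indications into a probability of screening, the sketch below applies Bayes' rule under conditional independence and then dichotomizes at 50%. It is a simplification under stated assumptions: the sensitivities, specificities, and prevalence are fixed hypothetical values, whereas the study estimated them jointly by MCMC and used posterior medians; the function names are illustrative, and treating a probability of exactly 0.5 as screening is an assumption.

```python
def posterior_screening(results, sens, spec, prevalence):
    """P(screening | indications), assuming the indications are conditionally independent.

    results: tuple of 0/1 flags (1 = the source said "screening")
    sens, spec: per-source sensitivity and specificity (hypothetical fixed values)
    """
    like_screen = prevalence
    like_other = 1.0 - prevalence
    for flag, se, sp in zip(results, sens, spec):
        like_screen *= se if flag else (1.0 - se)
        like_other *= (1.0 - sp) if flag else sp
    return like_screen / (like_screen + like_other)


def dichotomize(probability, cutoff=0.5):
    """Label an exam as screening when its probability reaches the cut-off."""
    return probability >= cutoff


# Hypothetical parameters: (endoscopist, patient indication 1, patient indication 2).
SENS = (0.90, 0.70, 0.70)
SPEC = (0.90, 0.70, 0.70)
PREVALENCE = 0.45

for pattern in [(1, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0)]:
    p = posterior_screening(pattern, SENS, SPEC, PREVALENCE)
    print(pattern, round(p, 3), "screening" if dichotomize(p) else "non-screening")
```

With these illustrative values, the endoscopist indication outweighs two disagreeing patient indications, because the more accurate source carries more weight in the posterior.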
The Bayesian information criterion (BIC) was used to select the multivariate logistic regression model that best predicts the screening indication. Model discrimination was assessed by the area under the curve (AUC) of the receiver operating characteristic curve. The accuracy of classification trees generated by the recursive partitioning model was assessed against the latent class predictions and endoscopist indication; sensitivities, specificities, and positive and negative predictive values (PPV, NPV) were computed. Multivariate logistic regression and recursive partitioning were performed using R.
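The four accuracy measures named above follow directly from a 2×2 table of algorithm classification against the reference standard. The sketch below shows the arithmetic with hypothetical counts; the normal-approximation (Wald) confidence interval is an assumption, since the text does not state which interval method was used.

```python
import math


def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # screening exams correctly flagged
        "specificity": tn / (tn + fp),  # non-screening exams correctly flagged
        "ppv": tp / (tp + fp),          # flagged screening that truly is screening
        "npv": tn / (tn + fn),          # flagged non-screening that truly is not
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p based on n exams."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=80, fp=30, fn=20, tn=70)
print(measures)
print(wald_ci(measures["sensitivity"], n=100))
```

As a consistency check, and assuming the 554 latent class screening exams form the denominator, `wald_ci(0.845, 554)` gives roughly 81.5% to 87.5%, which agrees with the interval reported for the sensitivity of recursive partitioning.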
We also applied an algorithm based on expert opinion previously developed by El-Serag et al., which defines a colonoscopy as screening when there are no ICD-9 codes for any of 28 symptoms or conditions and no colonoscopy in the past 4 years. Sensitivities, specificities, PPVs, and NPVs were estimated by comparing the algorithm classification to latent class predictions and to endoscopist indication.
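A rule of this kind can be sketched as a simple exclusion filter. The example below is only an illustration of the logic: the 28 ICD-9 codes are not listed in this article, so `EXCLUSION_CODES` holds a few placeholder codes chosen for demonstration, and the record layout, function names, and the assumption that diagnosis codes are already restricted to the relevant period are all hypothetical.

```python
from datetime import date

# Placeholder stand-in for the 28 symptom/condition codes (NOT the published list).
EXCLUSION_CODES = {"569.3", "787.91", "280.9"}


def years_before(d, years):
    """Return the date `years` before d, handling 29 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def is_screening(index_date, diagnosis_codes, prior_colonoscopy_dates, lookback_years=4):
    """Classify the index colonoscopy as screening under the exclusion rule."""
    if EXCLUSION_CODES & set(diagnosis_codes):
        return False  # a listed symptom or condition is on record
    window_start = years_before(index_date, lookback_years)
    recent = any(window_start <= d < index_date for d in prior_colonoscopy_dates)
    return not recent  # screening only if no colonoscopy in the look-back window


print(is_screening(date(2007, 2, 1), [], []))
print(is_screening(date(2007, 2, 1), ["569.3"], []))
print(is_screening(date(2007, 2, 1), [], [date(2005, 6, 15)]))
```

Because every exclusion must be absent, a rule of this shape tends to label fewer exams as screening than a model that weighs only the most discriminating variables.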
Patient characteristics (N = 1,230). Rows: age (mean, SD); patient-reported symptoms; patient-reported gastrointestinal conditions; patient-reported positive FOBT in the past 12 months; endoscopist indication = screening; patient indication 1 = screening; patient indication 2 = screening.
Frequency of occurrence of diagnostic and procedure codes in provincial administrative databases (N = 1,230). Column: diagnostic or procedure codes. Row labels: procedures in the past 4 years; double contrast barium enema; symptoms in the past year; gastrointestinal conditions in the past 5 years; inflammatory bowel disease; hospitalizations in the past 5 years; large bowel diseases; large bowel surgery.
The latent class model predicted 554 (45.0%) screening exams. The Kappa statistic for its agreement with endoscopist indication was 0.794 (95% CI: 0.760-0.828). Allowing conditional dependence between the two patient indications yielded virtually identical results (data not shown).
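For readers less familiar with the kappa statistic, the sketch below computes Cohen's kappa from a 2×2 agreement table. The counts are hypothetical, since the cross-tabulation of latent class and endoscopist indications is not given in the text, and the function name is illustrative.

```python
def cohens_kappa(both_yes, first_only, second_only, both_no):
    """Cohen's kappa for two binary classifications of the same exams.

    both_yes:    both sources say screening
    first_only:  only the first source says screening
    second_only: only the second source says screening
    both_no:     both sources say non-screening
    """
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from the marginal totals of each source.
    first_yes = both_yes + first_only
    second_yes = both_yes + second_only
    expected = (first_yes * second_yes + (n - first_yes) * (n - second_yes)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: 90% raw agreement with balanced margins.
print(cohens_kappa(both_yes=45, first_only=5, second_only=5, both_no=45))
```

Kappa rescales raw agreement by the agreement expected from the margins alone, so a value near 0.8 reflects agreement well beyond chance.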
Odds ratio estimates for the multivariate logistic regression models selected by BIC. Columns: latent class indication; OR (95% CI). Rows: colonoscopy in the past 4 years; sigmoidoscopy in the past 4 years; polypectomy in the past 4 years; DCBE in the past 4 years; rectal bleeding in the past year; diarrhea in the past year; anemia in the past year; IBD in the past 5 years.
Accuracy measures for recursive partitioning and expert opinion algorithms. Row labels: recursive partitioning with latent class outcome; latent class indication; recursive partitioning with endoscopist outcome; latent class indication; expert opinion algorithm.
The algorithm developed by El-Serag et al. was applied to our data. The algorithm identified 395 (32.1%) colonoscopies as screening. The sensitivity and specificity were 49.3% and 82.0%, respectively, compared to the latent class indication, and 49.3% and 83.0% compared to the endoscopist indication (Table 4).
We evaluated model-based algorithms in the absence of a gold standard measure of the outcome for determining the colonoscopy indication (screening vs. non-screening). We tackled this problem by constructing a latent reference standard and then using it to develop and evaluate logistic regression and recursive partitioning models of administrative data variables. Both modeling approaches have been used in previous studies to identify the indications for medical procedures [6, 10]. The latent class predictions were quite accurate when the various tests all agreed on the indication, i.e. when all tests together indicated either positive or negative for screening. However, when one or more tests disagreed with the others, there was higher variability and less certainty about the inputs. Overall, the stability of the logistic regression model was very good, as evidenced by the robustness of our analyses (using a second latent class model that gave very similar predictions).
From the multivariate logistic models, the AUC was 0.786 and 0.791 for the full models fitted with latent and endoscopist indications as the outcomes, respectively. The greater than 20% chance of mistakenly ranking a non-screening exam as more likely to be screening than a screening exam is not sufficiently accurate for most research purposes. The recursive models also did not achieve sufficient accuracy, despite their propensity to overfit data. Compared to the expert opinion algorithm, the recursive models had higher sensitivity but lower specificity; this occurred largely because the El-Serag algorithm defined screening as the absence of 28 diagnoses and of a colonoscopy procedure in the past 4 years, whereas recursive partitioning chose only the most discriminating variables. However, direct comparisons between the model-based and the El-Serag algorithms should be done with caution, as the models have the advantage of being validated on the same data upon which they were constructed.
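The misranking statement follows from the interpretation of the AUC as the probability that a randomly chosen screening exam receives a higher predicted score than a randomly chosen non-screening exam. The sketch below computes the AUC in exactly that pairwise way; the scores are hypothetical and the function name is illustrative.

```python
def auc_pairwise(scores_screening, scores_other):
    """AUC as the share of (screening, non-screening) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    correct = 0.0
    for s in scores_screening:
        for o in scores_other:
            if s > o:
                correct += 1.0
            elif s == o:
                correct += 0.5
    return correct / (len(scores_screening) * len(scores_other))


# Hypothetical predicted probabilities of screening.
screening_scores = [0.9, 0.8, 0.4]
other_scores = [0.7, 0.3, 0.2]

auc = auc_pairwise(screening_scores, other_scores)
print(f"AUC = {auc:.3f}, misranking probability = {1 - auc:.3f}")
```

Applying the same complement to the reported values gives 1 - 0.786 = 0.214 and 1 - 0.791 = 0.209, which is the source of the "greater than 20%" figure.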
Since we had two datasets available, the prudent approach would have been to construct the models using one dataset and validate them in another. However, our intention was to show that even under optimal circumstances – using all the data to generate the models and evaluating with the same dataset – model accuracies were less than satisfactory. Not pruning the classification tree, which tends to overfit the data in the recursive model [33, 34], also did not result in more accurate predictions.
Both the latent class and the endoscopist indications produced very similar results in most cases, since there was relatively high agreement between them. The latent class predictions were likely driven more by the endoscopist indication than patient indications, given the informative prior distribution used for the endoscopist indication, while uninformative priors were used for patient indications. The assumption that physicians know the true indication at least 70% of the time seems conservative.
The poor performance of algorithms, whether model-based or expert-opinion based, may be due to imperfect accuracies of administrative codes for the predictor variables we used or to the variability in physician clinical and billing practices. The polypectomy procedure code in Quebec, for example, underestimates the number of polypectomies by 15%. Since the primary purpose of health administrative data collection is remuneration, diagnostic codes may be poorly recorded, leading to misclassification. Model selection in the multivariate logistic regression analysis retained all 4 procedure variables (colonoscopy, sigmoidoscopy, polypectomy, DCBE in the past 4 years) as important predictors of indication, while it retained only 4 of 8 diagnostic variables from physician billing data. Procedure codes may have been better predictors than some diagnostic codes because they were recorded with greater accuracy due to the need for remuneration. CRC diagnosis and colorectal polyps were not identified as useful predictors by either logistic regression model selection procedures or recursive partitioning, possibly due to overlapping information with other variables or poor accuracy. Hospitalizations for large bowel disease and large bowel surgeries were also not selected as important predictors, possibly due to the small numbers of patients whose records contained these codes.
Algorithms are typically needed to accurately identify cases of IBD because of problems with misclassification in administrative data [9, 36]. We did not use such an algorithm for IBD as the purpose was not to correctly identify IBD but to evaluate the utility of administrative codes (including those for IBD) in predicting colonoscopy indication. Despite the relatively low frequency of occurrence of IBD codes, the variable emerged as an important predictor in all models.
In conclusion, model-based screening colonoscopy algorithms for administrative databases were insufficiently accurate to be used for most research purposes. However, the novel approach that we have employed, constructing a latent reference standard against which model-based algorithms were evaluated, may be useful for validating administrative data in other contexts where gold standards are unavailable.
This research was funded by the Canadian Cancer Society (grant no. 017054) through an operating grant awarded to Maida Sewitch, a Chercheur Boursier Junior 2 of the Fonds de recherche du Québec-Santé (FRQS).
- Wyse JM, Joseph L, Barkun AN, Sewitch MJ: Accuracy of administrative claims data for polypectomy. CMAJ. 2011, 183 (11): E743-E747.
- Lee DS, Donovan L, Austin PC, Gong Y, Liu PP, Rouleau JL, Tu JV: Comparison of coding of heart failure and comorbidities in administrative and clinical data for use in outcomes research. Med Care. 2005, 43 (2): 182-188. 10.1097/00005650-200502000-00012.
- Januel JM, Luthi JC, Quan H, Borst F, Taffe P, Ghali WA, Burnand B: Improved accuracy of co-morbidity coding over time after the introduction of ICD-10 administrative data. BMC Health Serv Res. 2011, 11: 194. 10.1186/1472-6963-11-194.
- Wilchesky M, Tamblyn RM, Huang A: Validation of diagnostic codes within medical services claims. J Clin Epidemiol. 2004, 57 (2): 131-141. 10.1016/S0895-4356(03)00246-4.
- Randolph WM, Mahnken JD, Goodwin JS, Freeman JL: Using Medicare data to estimate the prevalence of breast cancer screening in older women: comparison of different methods to identify screening mammograms. Health Serv Res. 2002, 37 (6): 1643-1657. 10.1111/1475-6773.10912.
- Gregory KD, Korst LM, Gornbein JA, Platt LD: Using administrative data to identify indications for elective primary cesarean delivery. Health Serv Res. 2002, 37 (5): 1387-1401. 10.1111/1475-6773.10762.
- Fisher DA, Grubber JM, Castor JM, Coffman CJ: Ascertainment of colonoscopy indication using administrative data. Dig Dis Sci. 2010, 55 (6): 1721-1725. 10.1007/s10620-010-1200-y.
- Myers RP, Shaheen AA, Fong A, Wan AF, Swain MG, Hilsden RJ, Sutherland L, Quan H: Validation of coding algorithms for the identification of patients with primary biliary cirrhosis using administrative data. Can J Gastroenterol. 2010, 24 (3): 175-182.
- Benchimol EI, Guttmann A, Griffiths AM, Rabeneck L, Mack DR, Brill H, Howard J, Guan J, To T: Increasing incidence of paediatric inflammatory bowel disease in Ontario, Canada: evidence from health administrative data. Gut. 2009, 58 (11): 1490-1497. 10.1136/gut.2009.188383.
- Richardson P, Henderson L, Davila JA, Kramer JR, Fitton CP, Chen GJ, El-Serag HB: Surveillance for hepatocellular carcinoma: development and validation of an algorithm to classify tests in administrative and laboratory data. Dig Dis Sci. 2010, 55 (11): 3241-3251. 10.1007/s10620-010-1387-y.
- Korst LM, Gregory KD, Gornbein JA: Elective primary caesarean delivery: accuracy of administrative data. Paediatric perinatal epidemiol. 2004, 18 (2): 112-119. 10.1111/j.1365-3016.2003.00540.x.
- El-Serag HB, Petersen L, Hampel H, Richardson P, Cooper G: The use of screening colonoscopy for patients cared for by the Department of Veterans Affairs. Arch Intern Med. 2006, 166 (20): 2202-2208. 10.1001/archinte.166.20.2202.
- Haque R, Chiu V, Mehta KR, Geiger AM: An automated data algorithm to distinguish screening and diagnostic colorectal cancer endoscopy exams. J Nat Cancer Inst Mono. 2005, 35: 116-118.
- Leddin D, Bridges RJ, Morgan DG, Fallone C, Render C, Plourde V, Gray J, Switzer C, McHattie J, Singh H: Survey of access to gastroenterology in Canada: the SAGE wait times program. Can J Gastroenterol. 2010, 24 (1): 20-25.
- Levin B, Lieberman DA, McFarland B, Smith RA, Brooks D, Andrews KS, Dash C, Giardiello FM, Glick S, Levin TR: Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. CA Cancer J Clin. 2008, 58 (3): 130-160. 10.3322/CA.2007.0018.
- Rex DK, Johnson DA, Anderson JC, Schoenfeld PS, Burke CA, Inadomi JM: American College of Gastroenterology guidelines for colorectal cancer screening 2009 [corrected]. Am J Gastroenterol. 2009, 104 (3): 739-750. 10.1038/ajg.2009.104.
- Segnan N, Patnick J, von Karsa L: European guidelines for quality assurance in colorectal cancer screening and diagnosis. 2010, Luxembourg: Publications Office of the European Union
- Leddin DJ, Enns R, Hilsden R, Plourde V, Rabeneck L, Sadowski DC, Singh H: Canadian Association of Gastroenterology position statement on screening individuals at average risk for developing colorectal cancer: 2010. Can J Gastroenterol. 2010, 24 (12): 705-714.
- Inventory of Colorectal Cancer Screening Activities in ICSN Countries. [http://appliedresearch.cancer.gov/icsn/colorectal/screening.html (Accessed 4 April 2013)]
- Harewood GC, Lieberman DA: Colonoscopy practice patterns since introduction of medicare coverage for average-risk screening. Clin Gastroenterol Hepatol. 2004, 2 (1): 72-77. 10.1016/S1542-3565(03)00294-5.
- White A, Vernon SW, Franzini L, Du XL: Racial and ethnic disparities in colorectal cancer screening persisted despite expansion of Medicare's screening reimbursement. Cancer Epidem Biomar. 2011, 20 (5): 811-817. 10.1158/1055-9965.EPI-09-0963.
- Rex DK, Petrini JL, Baron TH, Chak A, Cohen J, Deal SE, Hoffman B, Jacobson BC, Mergener K, Petersen BT: Quality indicators for colonoscopy. Am J Gastroenterol. 2006, 101 (4): 873-885.
- Baxter NN, Goldwasser MA, Paszat LF, Saskin R, Urbach DR, Rabeneck L: Association of colonoscopy and death from colorectal cancer. Ann Intern Med. 2009, 150 (1): 1-8. 10.7326/0003-4819-150-1-200901060-00306.
- Fletcher RH, Nadel MR, Allen JI, Dominitz JA, Faigel DO, Johnson DA, Lane DS, Lieberman D, Pope JB, Potter MB: The quality of colonoscopy services–responsibilities of referring clinicians: a consensus statement of the Quality Assurance Task Group, National Colorectal Cancer Roundtable. J Gen Intern Med. 2010, 25 (11): 1230-1234. 10.1007/s11606-010-1446-2.
- Rex DK, Bond JH, Winawer S, Levin TR, Burt RW, Johnson DA, Kirk LM, Litlin S, Lieberman DA, Waye JD: Quality in the technical performance of colonoscopy and the continuous quality improvement process for colonoscopy: recommendations of the U.S. Multi-Society Task Force on Colorectal Cancer. Am J Gastroenterol. 2002, 97 (6): 1296-1308. 10.1111/j.1572-0241.2002.05812.x.
- Lieberman D: Pitfalls of using administrative data for research. Dig Dis Sci. 2010, 55 (6): 1506-1508. 10.1007/s10620-010-1246-x.
- Sewitch MJ, Stein D, Joseph L, Bitton A, Hilsden RJ, Rabeneck L, Paszat L, Tinmouth J, Cooper MA: Comparing patient and endoscopist perceptions of the colonoscopy indication. Can J Gastroenterol. 2010, 24 (11): 656-660.
- Joseph L, Gyorkos TW, Coupal L: Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol. 1995, 141 (3): 263-272.
- Dendukuri N, Joseph L: Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001, 57 (1): 158-167. 10.1111/j.0006-341X.2001.00158.x.
- Kass RE, Raftery AE: Bayes Factors. J Am Stat Assoc. 1995, 90: 773-795. 10.1080/01621459.1995.10476572.
- Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982, 143 (1): 29-36.
- R Development Core Team: R: A language and environment for statistical computing. 2005, Vienna, Austria: R Foundation for Statistical Computing
- Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. 1984, Belmont CA: Wadsworth International Group
- Strobl C, Malley J, Tutz G: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 2009, 14 (4): 323-348.
- Sewitch MJ, Hilsden R, Joseph L, Rabeneck L, Paszat L, Bitton A, Cooper M-A: Qualitative study of physician perspectives on classifying screening and non-screening colonoscopy using administrative health data: adding practice does not make perfect. Can J Gastroenterol. 2012, 26: 889-893.
- Bernstein CN, Blanchard JF, Rawsthorne P, Wajda A: Epidemiology of Crohn's disease and ulcerative colitis in a central Canadian province: a population-based study. Am J Epidemiol. 1999, 149 (10): 916-924. 10.1093/oxfordjournals.aje.a009735.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/13/45/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.