Study subjects
Our data were derived from the Taiwanese Nationwide Colorectal Cancer Screening Program, which used FIT as the screening tool. Details on the planning and implementation of the screening program have been described in full elsewhere [2, 3]. In brief, the nationwide screening program, launched in 2004, provided a biennial FIT to all residents of Taiwan aged between 50 and 69 years. The target population consisted of 5,417,699 subjects with staggered entry into the program, with the goal of a 20% coverage rate during the initial 5 years. During the study period, the program included 1,160,895 participants and achieved a coverage rate of 21.4% and a repeated screening rate of 28.3%. The f-Hb of each participant was measured using one of two brands of commercial kits (discussed below). Participants with positive results were referred for confirmatory diagnosis, primarily by colonoscopy. Individual information, including age, sex, family history of CRC, and brand of FIT used, was obtained via questionnaire, and the outcomes regarding colorectal neoplasms were derived from the reports of the confirmatory diagnosis and the cancer registry. The histopathology of colorectal neoplasms was classified according to the criteria of the World Health Organization [18].
Colorectal adenomas were categorized as non-advanced or advanced on the basis of size and histological features (villous component or dysplasia). An adenoma larger than 10 mm in diameter, or one with a villous component or high-grade dysplasia, was classified as advanced adenoma; all others were classified as non-advanced adenoma.
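The classification rule above can be written as a short function. This is a minimal sketch only: the function and parameter names are hypothetical, and the strict inequality follows the wording "larger than 10 mm".

```python
def classify_adenoma(size_mm, has_villous_component, has_high_grade_dysplasia):
    """Classify an adenoma as advanced or non-advanced (hypothetical helper).

    An adenoma is advanced if it is larger than 10 mm in diameter, or has a
    villous component, or shows high-grade dysplasia; otherwise non-advanced.
    """
    if size_mm > 10 or has_villous_component or has_high_grade_dysplasia:
        return "advanced adenoma"
    return "non-advanced adenoma"


# Illustrative calls with made-up lesions
large_lesion = classify_adenoma(12, False, False)
small_villous = classify_adenoma(6, True, False)
small_tubular = classify_adenoma(6, False, False)
```

A lesion of exactly 10 mm with no villous component or high-grade dysplasia is treated as non-advanced under this reading.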
Participants with missing or unidentifiable FIT values or those for whom an unspecified method was used for the measurement of f-Hb were excluded from our analyses. Because we were interested in the f-Hb concentration just before disease diagnosis, we excluded CRC patients who had positive FIT results but were not compliant with orders for a colonoscopy or who did not participate in the repeated FIT screening after a negative colonoscopy.
Bayesian quantile-based f-Hb for predicting the risk of colorectal neoplasia
To take into account the ordinal nature of f-Hb measured by each FIT, as mentioned earlier, we propose a novel survival methodology that applies an accelerated failure time (AFT) model to quantile-based f-Hb rather than interval-scaled f-Hb. To relieve the concern that incident risk prediction should exclude disease present at baseline, we applied the Bayesian inversion method: we estimated the incident baseline risk for colorectal neoplasia (prior) and derived the posterior risk prediction by combining it with information on the percentiles of f-Hb by disease status (likelihood). In other words, the quantile-based f-Hb AFT model included colorectal neoplasia found at baseline, whereas these baseline cases were excluded when predicting future risk with the Bayesian inversion method.
In detail, the proposed method consists of two steps. First, using the concept of survival analysis, we ranked the values of f-Hb from the lowest to the highest and, treating the value of f-Hb as survival time, estimated the median and other percentiles of f-Hb corresponding to non-advanced adenoma, advanced adenoma, and invasive colorectal cancer. The cumulative curves (the complement of the survival curve) were plotted by reporting the f-Hb value at every 10th percentile for each disease status of colorectal neoplasm. The adjusted median and other percentiles of f-Hb were estimated using a parametric survival model (see below), making allowance for age, sex, family history, and brand of FIT. Second, we applied the Bayesian inversion method to derive the posterior risk prediction for colorectal neoplasia from two parts. The first part is the baseline risk for colorectal neoplasm, estimated without information on f-Hb but with allowance for other factors, using a Poisson regression model based only on incident cases (that is, excluding colorectal neoplasia at baseline). The second part is the likelihood function, which uses the percentiles of f-Hb given the disease status of colorectal neoplasia derived in the first step.
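As an illustration of the prior component in the second step, the sketch below computes an incidence rate from a log-linear (Poisson) model and converts it to a cumulative risk. All coefficient values and covariate names are hypothetical, and the conversion assumes a constant rate over the period, which is one common simplification and not necessarily the one used in this study.

```python
import math


def poisson_rate(intercept, coefs, covariates):
    """Incidence rate per person-year from a log-linear (Poisson) model."""
    linear = intercept + sum(coefs[name] * covariates[name] for name in coefs)
    return math.exp(linear)


def cumulative_risk(rate, years):
    """Cumulative risk over `years`, ASSUMING a constant rate."""
    return 1.0 - math.exp(-rate * years)


# Hypothetical coefficients, for illustration only (not estimates from the study)
INTERCEPT = math.log(0.001)  # baseline rate of 1 per 1,000 person-years
COEFS = {"age_60_69": 0.5, "male": 0.4, "family_history": 0.3}

reference = {"age_60_69": 0, "male": 0, "family_history": 0}
older_male = {"age_60_69": 1, "male": 1, "family_history": 0}

rate_reference = poisson_rate(INTERCEPT, COEFS, reference)
rate_older_male = poisson_rate(INTERCEPT, COEFS, older_male)
risk_reference_10y = cumulative_risk(rate_reference, 10)
risk_older_male_10y = cumulative_risk(rate_older_male, 10)
```

The resulting covariate-specific risk plays the role of the prior; it uses no f-Hb information.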
Data collection
Measurement of f-Hb
Patients’ f-Hb measurements were made using two brands of commercially available kits: the OC-Sensor (Eiken Chemical Co., Tokyo, Japan) and the HM-Jack (Kyowa Medex Co., Tokyo, Japan). With these 1-day methods, a single fecal sample was collected at home by each participant and then was sent to certified laboratories within 7 days. Quantitative FIT testing was performed at approximately 125 qualified laboratories nationwide. The cutoffs for the two kits were 100 ng/mL for the OC-Sensor and 8 ng/mL for the HM-Jack; the cutoff concentration in buffer for both tests could be transformed to a standardized reporting unit of 20 μg/g of feces [19]. The details have been previously reported [2].
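The equivalence of the two cutoffs implies a brand-specific scaling factor between the buffer concentration (ng/mL) and the standardized unit (μg/g of feces). The sketch below derives that factor solely from the cutoffs stated above (20/100 for the OC-Sensor and 20/8 for the HM-Jack) and assumes that the conversion is linear over the measurement range; the function name is hypothetical.

```python
# Cutoffs stated in the text: buffer concentration (ng/mL) per brand,
# both corresponding to 20 ug/g of feces.
CUTOFF_NG_PER_ML = {"OC-Sensor": 100.0, "HM-Jack": 8.0}
STANDARD_CUTOFF_UG_PER_G = 20.0


def to_ug_per_g(brand, ng_per_ml):
    """Convert a buffer concentration to ug/g of feces (hypothetical helper).

    ASSUMPTION: the conversion is a simple linear scaling whose factor is
    implied by the stated cutoffs.
    """
    factor = STANDARD_CUTOFF_UG_PER_G / CUTOFF_NG_PER_ML[brand]
    return ng_per_ml * factor


oc_at_cutoff = to_ug_per_g("OC-Sensor", 100.0)
hm_at_cutoff = to_ug_per_g("HM-Jack", 8.0)
oc_half_cutoff = to_ug_per_g("OC-Sensor", 50.0)
```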
Information obtained from questionnaires
Screening participants were asked to complete a questionnaire that was administered by face-to-face inquiry by the staff of the public health centers. The questionnaire solicited individual information about age, sex, and family history of CRC, which could be treated as confounding factors in the subsequent multivariable analysis.
Confirmatory diagnosis
Patients who were screened and had positive FIT results were referred to receive a confirmatory diagnosis, mainly on the basis of a total colonoscopy or a sigmoidoscopy plus barium enema. Detailed confirmatory results, including size, location, and histopathology of colonic neoplasms, were recorded. Subjects who had negative FIT results were invited to participate in the next screening round.
Statistical analysis
We first applied a conventional nonparametric method, the Kaplan-Meier method, to determine whether the median value of f-Hb differed between subjects with and without colorectal neoplasms, after which we derived the cumulative distribution curve of percentile-based f-Hb for each disease status.
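To make the idea of treating f-Hb as the "time" axis concrete, the following is a minimal sketch of a Kaplan-Meier estimator applied to f-Hb values. The data are made up, the function names are hypothetical, and all observations are marked as observed (uncensored), in which case the estimator reduces to the empirical distribution.

```python
def kaplan_meier(values, observed):
    """Kaplan-Meier curve with f-Hb playing the role of survival time.

    Returns a list of (value, survival) pairs at each distinct value that
    has at least one observed (uncensored) measurement.
    """
    data = sorted(zip(values, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        value = data[i][0]
        events = 0
        ties = 0
        # Group tied values together
        while i < len(data) and data[i][0] == value:
            events += int(data[i][1])
            ties += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / n_at_risk
            curve.append((value, survival))
        n_at_risk -= ties
    return curve


def percentile_value(curve, p):
    """Smallest f-Hb value at which the cumulative proportion (1 - S) reaches p."""
    for value, survival in curve:
        if 1.0 - survival >= p - 1e-12:
            return value
    return None


# Hypothetical f-Hb values (ng/mL), for illustration only
normal_fhb = [0, 2, 5, 8, 12]
crc_fhb = [30, 80, 150, 400, 900]

median_normal = percentile_value(kaplan_meier(normal_fhb, [1] * 5), 0.5)
median_crc = percentile_value(kaplan_meier(crc_fhb, [1] * 5), 0.5)
```

Reading `1 - survival` along the curve gives the cumulative distribution of f-Hb within a disease group.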
To adjust for the covariates of interest, we applied the accelerated failure time (AFT) regression model, treating f-Hb as the time to event and disease status as the independent variable, with adjustments for age, sex, family history of CRC, and the brand of FIT used. We chose the Weibull distribution to fit our data. The main reason for using the AFT model is that it allowed us to estimate f-Hb at every 10th percentile by disease status with adjustment for covariates, information that would be useful for clinical applications. It should be noted that, because this is a periodic screening program with biennial FIT, the f-Hb concentration used for analysis is screening-round (time) dependent; that is, f-Hb at the first screen may differ from that at the second screen in the same individual. All repeated screening records of f-Hb were included in the analysis. For example, if the first screen gave a negative FIT result, its f-Hb value was assigned to the normal group; if the second screen gave a positive FIT result that was confirmed as colorectal adenoma, the f-Hb at the second screen was assigned to the colorectal adenoma group. The correlation among such repeated measurements of f-Hb and the corresponding disease status was also accommodated in our AFT survival model.
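For a Weibull AFT model written as log T = linear predictor + scale × W, where W follows the standard minimum extreme-value distribution, the p-th quantile has the closed form exp(linear predictor) × (−log(1 − p))^scale. The sketch below uses this formula to produce every 10th percentile by disease status. The intercept, scale, and group coefficients are hypothetical, and the sign convention is the plain AFT one, not necessarily the reversed convention adopted in this study.

```python
import math


def weibull_aft_quantile(p, linear_predictor, scale):
    """p-th quantile of T under a Weibull AFT model.

    Model: log T = linear_predictor + scale * W, with W following the
    standard minimum extreme-value distribution.
    """
    return math.exp(linear_predictor) * (-math.log(1.0 - p)) ** scale


# Hypothetical parameter values, for illustration only
INTERCEPT = 2.0
SCALE = 1.2
GROUP_COEF = {
    "normal": 0.0,
    "non-advanced adenoma": 0.8,
    "advanced adenoma": 1.5,
    "CRC": 2.5,
}

# f-Hb at the 10th, 20th, ..., 90th percentile for each disease status
deciles = {
    group: [weibull_aft_quantile(k / 10, INTERCEPT + coef, SCALE) for k in range(1, 10)]
    for group, coef in GROUP_COEF.items()
}

# Ratio of medians between two groups equals exp(difference in coefficients)
median_ratio_crc_vs_normal = deciles["CRC"][4] / deciles["normal"][4]
```

Covariates such as age, sex, family history, and brand of FIT would enter the linear predictor in the same additive way as the group coefficient.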
Note that, when expressed as the time ratio (TR) for the value of f-Hb, the relationship between the occurrence of colorectal neoplasms and f-Hb concentration is inverse. That is, although the rank of f-Hb was analogous to the rank of survival time, the TR would, on average, be highest in the normal group, followed by the non-advanced adenoma group, then the advanced adenoma group, and lowest in the CRC group. Because our intuitive hypothesis based on survival analysis is that the higher the f-Hb, the lower the TR but the higher the risk of developing colorectal neoplasm (Fig. 1), we used the negative value of the coefficients estimated by the AFT model. In addition, to handle undetectable f-Hb values, we added 0.5 units to each observation.
To predict the life-time risk for colorectal neoplasia, we used the results of the AFT model, in which the ranking of f-Hb is treated as survival time, as the likelihood. We then applied the Bayesian inversion method to combine this likelihood information on the percentile of f-Hb with prior information on the incidence of colorectal neoplasia to obtain the posterior risk for colorectal neoplasia (a detailed elaboration is given in Additional file 1: Appendix).
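The Bayesian inversion step amounts to Bayes' theorem: the posterior probability of each disease status given an f-Hb percentile band is proportional to the prior risk multiplied by the probability of falling in that band given the disease status. The following is a minimal sketch with made-up numbers; the exact formulation used here is in the Appendix, and the function name, the priors, and the likelihoods below are all hypothetical.

```python
def posterior_risk(prior, likelihood):
    """Posterior probability of each disease status via Bayes' theorem.

    prior[s]      : baseline risk of status s (no f-Hb information)
    likelihood[s] : P(f-Hb in the observed percentile band | status s)
    """
    joint = {status: prior[status] * likelihood[status] for status in prior}
    total = sum(joint.values())
    return {status: joint[status] / total for status in joint}


# Hypothetical values, for illustration only
prior = {
    "normal": 0.90,
    "non-advanced adenoma": 0.06,
    "advanced adenoma": 0.03,
    "CRC": 0.01,
}
# Probability that f-Hb falls in some high percentile band, by status
likelihood_high_band = {
    "normal": 0.02,
    "non-advanced adenoma": 0.10,
    "advanced adenoma": 0.30,
    "CRC": 0.60,
}

posterior = posterior_risk(prior, likelihood_high_band)
```

With these illustrative inputs, a high f-Hb band shifts probability away from the normal group and toward the neoplasia groups relative to the prior.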
Note that patients with interval cancer (defined as invasive cancer diagnosed after a negative FIT and within 2 years, before the next screen) had no f-Hb measurement at the time of cancer diagnosis, so their f-Hb was treated as censored. To deal with these missing data, we imputed the f-Hb of interval cancer cases from random samples of the prevalent and subsequent screen-detected CRCs, matched on sex and age at first screen, using the cold-deck imputation method [20]. The use of imputation to estimate the f-Hb value for interval cancers rests on two premises. First, interval cancer is defined biologically here as arising through the adenoma-carcinoma pathway that leads to a bleeding phenotype. Such cancers may have been missed at the previous screen because bleeding was not yet detectable, and we assume that they grow during the inter-screening interval into a symptomatic bleeding phenotype similar to the asymptomatic bleeding phenotype detected at screening. Second, interval cancers have two components: cases missed at the prevalent screen (false-negative cases) and newly arising, rapidly progressing cases diagnosed after a negative screen. The f-Hb of the former may therefore be estimated from prevalent screen-detected cancers and that of the latter from subsequent screen-detected cancers, provided that age (representing the maturation of the tumour) and sex are matched, as was done here, because both are important factors related to the time of onset, subsequent progression, and the sensitivity of FIT.
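The matched random-donor imputation described above can be sketched as follows. This is an illustration only: the record layout, the field names, and the grouping of age into bands are assumptions, and the donor records are made up.

```python
import random


def cold_deck_impute(recipients, donors, seed=0):
    """Impute f-Hb for recipients from randomly chosen matched donors.

    Donors and recipients are matched on (sex, age_group). A recipient
    without any matching donor receives None.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    pools = {}
    for donor in donors:
        key = (donor["sex"], donor["age_group"])
        pools.setdefault(key, []).append(donor["f_hb"])
    imputed = []
    for recipient in recipients:
        pool = pools.get((recipient["sex"], recipient["age_group"]))
        value = rng.choice(pool) if pool else None
        imputed.append({**recipient, "f_hb": value})
    return imputed


# Hypothetical screen-detected CRCs (donors) and interval cancers (recipients)
donors = [
    {"sex": "M", "age_group": "50-59", "f_hb": 180.0},
    {"sex": "M", "age_group": "50-59", "f_hb": 420.0},
    {"sex": "F", "age_group": "60-69", "f_hb": 260.0},
]
recipients = [
    {"id": 1, "sex": "M", "age_group": "50-59"},
    {"id": 2, "sex": "F", "age_group": "60-69"},
    {"id": 3, "sex": "F", "age_group": "50-59"},  # no matching donor
]

imputed = cold_deck_impute(recipients, donors)
```

In practice the donor pool would be chosen according to the premise above: prevalent screen-detected cancers for presumed missed cases and subsequent screen-detected cancers for presumed newly arising cases.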