Study population and data collection
Prostate Cancer data Base Sweden (PCBaSe Sweden) consists of the National Prostate Cancer Register of Sweden (NPCR) linked to a number of different nationwide registers [6]. NPCR became nationwide in 1998 and covers 98 % of all newly diagnosed, biopsy-confirmed cases of prostate cancer, as compared to the Swedish Cancer Registry. Information in PCBaSe on age, serum PSA, primary treatment, tumour grade and stage, and cause and date of death was used. Prostate cancer risk category was defined according to a modification of the National Comprehensive Cancer Network Guideline [7]. The linkage of PCBaSe was approved by the Research Ethics Board at Umeå University.
For the current analysis, we selected men recorded in NPCR with low-risk prostate cancer, diagnosed between 2003 and 2012, who received active surveillance (AS), radical prostatectomy (RP), or curative radiotherapy (RT) as primary treatment. Comorbidity was measured with the Charlson Comorbidity Index (CCI), based on data retrieved from the National Patient Register and the National Cancer Register [4, 5]. We used 17 groups of diseases with specific weights (1, 2, 3, or 6) assigned to each disease category, as defined by Charlson et al. [8]. Information on these diseases was based on ICD (International Classification of Diseases) codes for discharge diagnoses. The actual date of each event was retrieved from the national healthcare registers. We then applied the specified weights to the 17 different types of events to calculate the CCI on a daily basis. Thus, CCI is a time-dependent covariate that could change multiple times during follow-up. An overview of all covariates used in our analyses is provided in Additional file 1: Tables S1 and S2.
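As a minimal sketch of how a daily, time-dependent CCI could be derived from dated register events, the Python snippet below sums the weights of all disease categories recorded on or before a given day. The category names, the three-entry weight table, and the rule that each category counts only once are illustrative assumptions, not the register coding or the code used in the study.

```python
from datetime import date

# Illustrative subset of categories and weights (assumption; the study
# used 17 categories with weights 1, 2, 3, or 6 as defined in [8]).
WEIGHTS = {
    "myocardial_infarction": 1,
    "diabetes_with_complications": 2,
    "metastatic_solid_tumour": 6,
}


def cci_on(day, events):
    """Return the CCI on `day` from a list of (event_date, category) pairs.

    Assumption: each disease category contributes its weight at most once,
    from the date of its first recorded event onwards.
    """
    seen = {category for event_date, category in events if event_date <= day}
    return sum(WEIGHTS[category] for category in seen)


# Hypothetical subject with three recorded events
events = [
    (date(2005, 3, 1), "myocardial_infarction"),
    (date(2007, 6, 15), "metastatic_solid_tumour"),
    (date(2008, 1, 1), "myocardial_infarction"),  # repeat event, no extra weight
]
```

Evaluating `cci_on` on successive days gives a step function that never decreases, which matches the irreversible CCI changes assumed in the analysis.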
Data are not freely available, in accordance with the Swedish Public Access to Information and Secrecy Act. However, data can be made available to researchers upon request. The steering groups of NPCR and PCBaSe welcome external collaborations. For more information, please see www.npcr.se/in-english, where registration forms, manuals, and annual reports from NPCR are found, as well as a full list of publications from PCBaSe.
Analysis
We propose a method based on a state transition model, with states and state transitions as illustrated in Fig. 1. CCI changes are considered irreversible, i.e., CCI accumulates over time and cannot decrease, as indicated by the arrows pointing only towards higher CCI states in Fig. 1. In each CCI state there is a possibility of death, indicated by the arrows pointing towards the death state. Due to the large number of states and transitions, the proposed model was simplified as described below. An overview of the R code used for these models is provided in Additional file 1: Table S3.
Firstly, we simplified follow-up time by discretising it into time steps, whereby at each time step an individual could experience death, experience a change in CCI of any size, or remain in the previous CCI state. In our prostate cancer example, we chose a time step of four weeks. The discretised data were arranged in long format, i.e., each study subject was represented by several rows of data, one for each time step in which the study subject was still alive. Age and CCI were updated at each time step.
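The long-format arrangement can be sketched as follows, assuming follow-up is measured in days from diagnosis and a 28-day time step. The function name, its arguments, and the row layout are hypothetical and serve only to illustrate the data structure.

```python
STEP_DAYS = 28  # four-week time step


def to_long_format(subject_id, age_at_entry, followup_days, died,
                   cci_events, cci_at_entry=0):
    """Expand one subject into one row per time step at risk.

    cci_events: list of (day_offset, increment) pairs within follow-up.
    Each row holds the age and CCI at the start of the step, plus the
    outcomes observed during the step (death, size of CCI change).
    """
    rows = []
    cci = cci_at_entry
    n_steps = -(-followup_days // STEP_DAYS)  # ceiling division
    for k in range(n_steps):
        start, end = k * STEP_DAYS, (k + 1) * STEP_DAYS
        change = sum(inc for day, inc in cci_events if start <= day < end)
        rows.append({
            "id": subject_id,
            "step": k,
            "age": age_at_entry + start / 365.25,
            "cci": cci,
            "death": int(died and k == n_steps - 1),
            "cci_change": change,
        })
        cci += change  # CCI only accumulates
    return rows


# Hypothetical subject: 70 days of follow-up, CCI +2 on day 30, then death
rows = to_long_format(1, 65.0, 70, True, [(30, 2)])
```

In this layout, `death` and `cci_change > 0` can serve directly as the binary outcomes of the regression models fitted to the long dataset.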
Next, we estimated the state transition probabilities in a three-step process:
- Step 1: We started by determining a person’s vital status at the end of each time step. The probability of death was modelled with a logistic regression analysis applied to the long format dataset. This first model used death (yes/no) as the outcome and age (linear), CCI (linear), and their statistical interaction as regressors.
- Step 2: When a man was alive at the end of a time step, we determined whether a change in CCI had occurred. This was modelled with a logistic regression model using CCI-change (yes/no) as the outcome. The regressors in this second model were treatment (AS/RP/RT), age (linear), CCI (0/1/2/3/4+), time since previous CCI change (1, 2–3, 4–6, or >6 months, with the latter also including no previous CCI change), time since RP (1, 2–3, 4–6, or >6 months, with the latter also including no RP), time since RT (1, 2–3, 4–6, or >6 months, with the latter also including no RT), and a statistical interaction between age and treatment.
- Step 3: When a change in CCI occurred, the final step defined the size of this CCI change. Here, we made a second simplification based on the observation that the sizes of CCI changes approximately follow a Poisson distribution, with the exception of changes ≥6 (Fig. 2). This exception reflects diseases that contribute to CCI with a weight of 6. Therefore, this third and final step of the transition model was split into two parts:
  - 3a) First, we applied Poisson regression to the subset of the long dataset where CCI changes occurred. Transformed CCI change was the outcome and calculated as follows: All changes were decreased by 1, except changes of size ≥6, which were decreased by 6. Thus, the smallest possible outcome was zero, corresponding to a CCI-change of 1 or 6. The same regressors as in the model of Step 2 were used.
  - 3b) To handle CCI changes ≥6, an additional logistic regression model was applied with CCI change ≥6 (yes/no) as the outcome. In this model, we used the following regressors: treatment (AS/RP/RT), CCI (0/1/2+ together with a linear term), time since previous CCI change (as in Step 2), and age (linear).
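The transformation of the CCI change used in Step 3 can be illustrated with a short Python sketch. The function names are hypothetical; the arithmetic follows the description above (subtract 1, or subtract 6 for changes ≥6), and the second function shows the corresponding back-transformation.

```python
def transform_change(change):
    """Split an observed CCI change into the Step 3a and Step 3b outcomes.

    Returns (poisson_outcome, is_large), where is_large flags changes >= 6.
    """
    if change < 1:
        raise ValueError("only rows with a CCI change (>= 1) are modelled")
    is_large = change >= 6
    poisson_outcome = change - 6 if is_large else change - 1
    return poisson_outcome, is_large


def recover_change(poisson_outcome, is_large):
    """Invert transform_change: add back 6 for large changes, else 1."""
    return poisson_outcome + (6 if is_large else 1)
```

For example, observed changes of 1 and 6 both map to a Poisson outcome of zero and are distinguished only by the Step 3b indicator.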
The above steps yielded a set of parameter estimates which were used to simulate CCI in a microsimulation [9], i.e., a simulation of CCI changes in individual study subjects. In this simulation, outcomes according to models 1, 2, 3a, and 3b were generated. The dichotomous outcome from Step 3b indicated whether the simulated Poisson outcome from Step 3a should be increased by 1 or 6 to recover the actual CCI-change. We performed this simulation of CCI development in 1,000,000 study subjects with pre-defined values for primary treatment, initial CCI, and age.
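The simulation loop for a single subject might look like the Python sketch below. All coefficient values are invented for illustration and are not estimates from this study; the regressors are reduced to age and CCI, so treatment and the time-since variables of Steps 2 and 3 are omitted. The helper names are hypothetical.

```python
import math
import random

# HYPOTHETICAL coefficients, for illustration only (not study estimates).
COEF = {
    "death":  {"b0": -10.0, "age": 0.07, "cci": 0.30, "age_cci": -0.002},
    "change": {"b0": -6.0, "age": 0.03, "cci": 0.10},
    "size":   {"b0": -1.0, "cci": 0.05},   # log-mean of the Poisson outcome
    "large":  {"b0": -3.0, "age": 0.01},   # logit of P(change >= 6)
}


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_subject(age, cci, n_steps, rng, step_years=28 / 365.25):
    """Return the CCI at the end of each time step the subject survives."""
    path = []
    for _ in range(n_steps):
        d = COEF["death"]
        lp = d["b0"] + d["age"] * age + d["cci"] * cci + d["age_cci"] * age * cci
        if rng.random() < inv_logit(lp):           # Step 1: death
            break
        c = COEF["change"]
        if rng.random() < inv_logit(c["b0"] + c["age"] * age + c["cci"] * cci):  # Step 2
            s, g = COEF["size"], COEF["large"]
            base = sample_poisson(math.exp(s["b0"] + s["cci"] * cci), rng)  # Step 3a
            large = rng.random() < inv_logit(g["b0"] + g["age"] * age)      # Step 3b
            cci += base + (6 if large else 1)
        age += step_years
        path.append(cci)
    return path


# Hypothetical usage: one 65-year-old with CCI 0, followed for 130 steps (~10 years)
path = simulate_subject(65.0, 0, 130, random.Random(1))
```

Repeating `simulate_subject` for many subjects with the same starting values and averaging the CCI at a given step would give the predicted CCI at that time point for the chosen covariate pattern.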
The modelling-simulation approach [9] made it possible to calculate confidence intervals for the predicted CCI at specific time points, and in particular differences in CCI between exposure groups. The confidence intervals were computed using the unscented transform [10]. This method resembles bootstrap techniques in that the final estimate varies as a result of repeated simulations using different values of the regression coefficients. The different estimates were then used to calculate the confidence intervals. However, the unscented transform is more efficient since the number of simulations needed is limited to about twice the number of regression coefficients, a much smaller number when compared to traditional bootstrap techniques. In accordance with the unscented transform, the combinations of regression coefficients were chosen deterministically, based on the estimated covariance matrices.
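To illustrate the idea, the following Python sketch generates 2n deterministic sets of coefficients (sigma points) from an estimated coefficient vector and its covariance matrix, evaluates a prediction function at each, and forms a normal-approximation interval. The equal-weight, 2n-point variant and the 1.96 multiplier are assumptions chosen for simplicity; the function `g` is a placeholder for a full simulation run with the given coefficients, and the exact variant used in the study may differ.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean, cov):
    """2n points: mean +/- sqrt(n) times each column of the Cholesky factor."""
    n = len(mean)
    low = cholesky(cov)
    scale = math.sqrt(n)
    points = []
    for j in range(n):
        col = [scale * low[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    return points


def unscented_interval(g, mean, cov, z=1.96):
    """Approximate interval for g(coefficients) from equally weighted sigma points."""
    ys = [g(p) for p in sigma_points(mean, cov)]
    centre = sum(ys) / len(ys)
    sd = math.sqrt(sum((y - centre) ** 2 for y in ys) / len(ys))
    return centre - z * sd, centre + z * sd


# Hypothetical check with a linear g, for which the transform is exact
lo, hi = unscented_interval(lambda b: 2 * b[0] + 3 * b[1],
                            [1.0, 2.0], [[4.0, 1.0], [1.0, 9.0]])
```

With n coefficients, only 2n evaluations of `g` are needed, which is the source of the efficiency gain over bootstrap resampling described above.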