Skip to main content

How to measure temporal changes in care pathways for chronic diseases using health care registry data



Disease trajectories for chronic diseases can span over several decades, with several time-dependent factors affecting treatment decisions. Thus, there is a need for long-term predictions of disease trajectories to inform patients and healthcare professionals on the long-term outcomes and provide information on the need of future health care. Here, we propose a state transition model to describe and predict disease trajectories up to 25 years after diagnosis in men with prostate cancer (PCa), as a proof of principle.


States, state transitions, and transition probabilities were identified and estimated in Prostate Cancer data Base of Sweden (PCBaSeTraject), using nationwide population-based data from 118,743 men diagnosed with PCa. A state transition model in discrete time steps (i.e., 4 weeks) was developed and applied to capture all possible transitions (PCBaSeSim). Transition probabilities were estimated for changes in both treatment and comorbidity. These models combined yielded parameter estimates to run an individual-level simulation based on the state-transition model to obtain prediction estimates. Predicted estimates were then compared to real world data in PCBaSeTraject.


PCBaSeSim estimates for the cumulative incidence of first and second transitions, death from PCa and death from other causes were compared to observed transitions in PCBaSeTraject. A good agreement was found between simulated and observed estimates.


We developed a reliable and accurate simulation tool, PCBaSeSim that provides information on disease trajectories for subjects with a chronic disease on an individual and population-based level.

Peer Review reports


Many previously fatal diseases have become chronic diseases due to improvements in diagnosis and treatment [1]. These disease pathways (referred to as disease trajectories below) can span over several decades and incorporate a variety of treatments and outcomes. Moreover, ageing and increasing comorbidity will affect treatment decisions. Thus, there is a need to predict disease trajectories for several decades to inform patients and healthcare professionals on the long-term outcomes of disease and provide information on the need of future health care [2].

There is a wide range in the clinical course of prostate cancer (PCa) [3]. Due to improvements in detection, diagnostics, and treatment, many men currently diagnosed with PCa have a low risk of PCa death during the first 15 years after diagnosis and hence for many men PCa has become a chronic disease, albeit with a small risk of progression many years after date of diagnosis [4].

This study aimed to create a model to describe and predict disease trajectories over a period of 25 years using PCa as a proof of principle for a chronic disease. More specifically, we used data on PCa from nationwide, population-based health care registers in Sweden to predict disease trajectories in men with different disease severity, comorbidities, and treatments using state transition models.


Study population

This study was based on 118,743 men diagnosed with PCa from 1992 until 2014 who were registered in The National Prostate Cancer Register (NPCR) Sweden [5]. Men with unknown disease severity at time of diagnosis (3%) were excluded [6]. By use of the unique Swedish personal identity number, NPCR has been linked to other population-based healthcare registers and demographic databases to form the Prostate Cancer data Base Sweden (PCBaSeTraject). This provided additional information on cause of death, comorbidities, drug use, and socio-economic status, and treatment changes [7, 8]. Comorbidities were specifically measured with the Charlson comorbidity index (CCI), which was based on discharge diagnoses retrieved from the National Patient Register and the National Cancer Register [9]. The linkage of PCBaSe was approved by the Research Ethics Board at Umeå University.

The men were divided into five groups based on the primary management strategy: active surveillance (AS), radical prostatectomy (RP), radiotherapy (RT) performed as either external beam radio therapy or brachy therapy, watchful waiting (WW), and androgen deprivation therapy (ADT). Different risk categories were defined for each management strategy. For AS risk categories, we applied a modification of the National Comprehensive Cancer Network guidelines [10]. The two lowest NCCN risk categories (low and intermediate-risk PCa) were divided into two subgroups each. The least favourable intermediate-risk category was not considered as suitable for AS, resulting in three AS risk categories (Additional file 2: Table S1). Curative treatment consisted of RP or RT, which could be either primary treatment or initiated after a period of AS, i.e. deferred RP or RT. Primary and deferred RPs were divided into six risk categories based on pathological tumour stage (pT stage) and pathological Gleason Grade Group (pGGG) as recorded in NPCR (Additional file 2: Table S2). Primary and deferred RT was divided into eight risk categories (Additional file 2: Table S3). WW as primary management strategy was classified into six risk categories, based on T stage, N stage, M stage, serum levels of prostate-specific antigen (PSA), and Gleason Grade Group (GGG) (Additional file 2: Table S4).

Men receiving ADT were divided into eight risk categories (Additional file 2: Table S5). ADT consisted of either antiandrogen monotherapy (AA) or gonadotropin-releasing hormone (GnRH) agonists (Additional file 2: Figure S1). For all treatment-specific risk categories, the lowest category indicated men with the most favourable risk.

Missing data

There is no specific follow-up data in NPCR. For example, men primarily managed with AS who eventually experience disease progression can be treated with curative intention with RP or RT. At this point, the updated TNM stage, GGG and PSA have not routinely been reported to NPCR. This yields a missing data problem. We solved this problem by applying multiple imputation by chained equations [11] using five imputation datasets. Similar problems with lack of detailed disease progression data also appeared for other non-primary treatments, where change was triggered by disease progression and these situations were also solved by multiple imputation. Further details regarding imputation models are presented in the Additional file 1.


We applied a method based on a state transition model (Fig. 1). Men diagnosed with PCa entered their primary state according to their primary treatment (AS, RP, RT, WW, ADT) and their treatment specific risk category (AS1-AS3, RP1-RP6, RT1-RT8, WW1-WW6, and ADT1-ADT8). The probability of a transition to another state was based on age, comorbidity, history of previous treatments, and treatment-specific risk category. All transitions were considered irreversible and state transitions were allowed until an absorbing state, i.e. PCa-death or death from other causes, was reached. Transitions between treatment-specific risk categories were not considered, i.e. each man stayed in his designated treatment risk category until a new treatment was introduced. Our model building consisted of several steps:

Fig. 1
figure 1

States and state transitions in Prostate Cancer data Base of SwedenSim (PCBaSeSim). State transition model of transitions (arrows) between states (circles) for men diagnosed with prostate cancer. The states are active surveillance (AS), watchful waiting (WW), curative treatment; radical prostatectomy (RP) or radiotherapy (RT), adjuvant or salvage radiotherapy following RP (RT-adj/salv), androgen deprivation therapy (ADT), death from other causes, and prostate cancer death. Dashed lines represent the choice of primary treatment following diagnosis (not part of the model), solid lines are transitions included in the models. Multi-colored circles represent transient states with colors indicating the proportion of men with increasing disease risk categories defined by data at date of diagnosis. Orange circles represent absorbing states. Dashed circles represent additional information gathered to facilitate estimates of transition probabilities, i.e. biopsy and Charlson Comorbidity Index (CCI). Risk categories are defined in detail in the Additional file 1

First, we randomly split our set of data in two equally large subsets, the first set was a training set that was used to estimate transition probabilities and the second set was used to assess internal validity of the simulation process.

Second, we simplified follow-up time by use of four-week time steps. At the end of each time step a man either remained in his current state or transited to a new state. The discretised follow-up data was arranged using long format, i.e., each man was represented by several rows of data, one for each time step in which he was alive. Age and CCI were updated in each time step.

Next, we estimated state transition probabilities, which were determined from both registered and non-registered data regarding date of treatment change. Specifically, for all treatment state transitions, except the transition from AS ➔ WW, the date of treatment change was retrieved from PCBaSeTraject using the training set of data. Left truncation on January 1, 2006, was used as previously described [8]. The non-registered transition date for the transition of AS ➔ WW was managed as previously described [12] (Additional file 2: Figures S2 and S3). We considered a change in CCI as a state transition. For such a CCI transition, we applied a CCI state transition model as previously described [9]. The probability of CCI changes was state-specific, i.e. different CCI model parameters were estimated for each state.

Finally, our prediction was carried out as a microsimulation, an individual-level simulation based on the state-transition model [13]. Simulations were performed for each specific combination of treatment, age and CCI at treatment start, and treatment-specific risk categories. The simulation process consisted of three different steps, which are described in brief below. Further details and specifications can be found in the Additional file 1.

  1. 1)

    A man’s vital status at the end of each time step was determined as alive, dead from PCa or dead from other causes. The probability of PCa death was modelled in a logistic regression, whereas the probability of death from other causes was modelled conditioned on no previous PCa death in a second logistic regression (see Additional file 1 for further details).

  2. 2)

    For a man who was alive at the end of a time step, we determined if a treatment change had occurred. Such change could be either direct, when the treatment risk category in the new state was known and remained unchanged, e.g. RP ➔ RTadj/salv, or preceded by an estimation of new treatment-specific risk category (Additional file 2: Table S6). In the latter case, an ordinal regression analysis determined the new risk category, at the same time step as the transition.

  3. 3)

    If a man was alive at the end of a time step, CCI was updated, according to a separate CCI transition model based on a combination of Poisson regression and logistic regression models, as described previously [9]. The CCI transition model was specific for each individual treatment.

We refer to this simulation method in the following text as PCBaSeSim. Further specifications of possible transitions and used models are provided in the Additional file 1. In order to assess internal validity of PCBaSeSim, we graphically compared cumulative incidence of subsequent transitions obtained from the simulation with those observed in validation set.


The study population consisted of 118,743 men diagnosed with PCa and included in NPCR. Most of these men (n = 32,537, 27%) were primarily managed with RP (Table 1). Detailed characteristics of men included in PCBaSeTraject were previously described [7, 8].

Table 1 Baseline characteristics for men in Prostate Cancer data Base Sweden (PCBaSe)Traject

Our proposed model starts at time of PCa diagnosis and ends in either of the following absorbing states: death from PCa or death from other causes. Figure 1 illustrates states and state transitions, with all the possible treatment changes included in the model.

To assess the internal validity of the simulation, the cumulative incidence of the first transitions observed in PCBaSeTraject was compared with the simulated first transitions for the same men up to 25 years after diagnosis. To decrease random error, we ran the simulation 100 times for each man. Observed and simulated first transitions almost perfectly overlapped for all groups (Fig. 2). At ten years after RP, 21% of the observed PCBaSeTraject cohort had transitioned to adjuvant/salvage radiotherapy vs. 20% as predicted by PCBaSeSim. Estimates at 20-years were still consistent for all the analysed primary management strategies (e.g., 5% of men treated with primary RP received GnRH agonists in both PCBaSeTraject and PCBaSeSim).

Fig. 2
figure 2

Cumulative incidence of first observed and simulated transition according to primary management strategy in men with prostate cancer in PCBaSe. Graphs show the cumulative incidence of first observed transition in PCBaSeTraject (continuous line) compared to the cumulative incidence of first simulated transition in PCBaSeSim (dashed lines). PCBaSeSim transitions are based on the simulation of men in PCBaSeTraject stratified by primary management strategy. AA: anti-androgens; GnRH: gonadotropin releasing hormone agonists; PC: prostate cancer; RP: radical prostatectomy; RT: radiotherapy; WW: watchful waiting

Similarly, we analysed second transitions. Men first treated with RP or RT, which are the most common treatment strategies overall for PCa, were followed until their second transition. Also in this model, the observed and simulated cumulative incidences were very similar, e.g. 10-year proportion of men treated with primary RT and later receiving GnRH agonists was 5% in both PCBaSeTraject and PCBaSeSim. At 15 years these proportions were 7% in PCBaSeTraject and 9% in PCBaSeSim, respectively (Fig. 3). Eventually, the agreement between predicted vs. observed cumulative incidence of death was acceptable for both PCa and other causes of death (Fig. 4).

Fig. 3
figure 3

Cumulative incidence of second observed and simulated transition according to primary management strategy in men diagnosed with prostate cancer and primarily treated with radical prostatectomy and radiotherapy. Graphs show the cumulative incidence of second observed transition in PCBaSeTraject (continuous line) compared to the cumulative incidence of second simulated transition in PCBaSeSim (dashed lines). PCBaSeSim transitions are based on the simulation of men in PCBaSeTraject primarily treated with radical prostatectomy and radiotherapy. AA: anti-androgens; GnRH: gonadotropin releasing hormone agonists; PC: prostate cancer; RP: radical prostatectomy; RT: radiotherapy

Fig. 4
figure 4

Prostate cancer death and death from other causes in PCBaSeTraject and PCBaSeSim. Graphs show the cumulative incidence of prostate cancer death and death from other causes (i.e. final absorbing states) in PCBaSeTraject (observed, continuous line) compared to PCBaSeSim (simulated, dashed lines). PCBaSeSim transitions are based on the simulation of men in PCBaSeTraject


We developed, applied, and tested a novel state transition model, based on longitudinal data in health care registers including men with PCa, to predict long term disease trajectories. More specifically, we tested the accuracy of first and second transitions (i.e. treatment changes), as well as the transitions to final absorbing states, by comparing observed data vs. data obtained by our newly developed tool, PCBaSeSim.

Treatment decisions are a challenging process in real life, as well as predictions of health-related outcomes. Due to the long life expectancy, e.g. > 20 years for healthy men with low-risk PCa [4], and the availability of many therapeutic options [3], PCa disease trajectories can thus be difficult to predict at time of diagnosis. These pathways include different treatment options and may be protracted over time. Although international guidelines recommend therapeutic options according to disease status, there are several combinations of disease states and life expectancy for which there is little evidence to support treatment decisions [3]. Moreover, recently introduced management strategies, e.g. active surveillance, lack long-termfollow-up, so there are no data on which to base outcome predictions. Previous attempts have been made to implement state-transition models in urology [14, 15] and other medical areas [16], mainly from the health-economy standpoint. However, these attempts were not based on a detailed and comprehensive, population-based source of data, such as PCBaSeTraject. Therefore, previous models have not been validated against real world data. In contrast, the state transition model we propose in this paper was compared with real world data, and provides as a user-friendly data output that makes both interpretation of results and replicability easy.

More specifically, these results provide a proof-of-principle for state transition models that can be applied to many other chronic diseases for which there is also a need for long term predictions of outcomes. Using complete nationwide, population-based data of ~ 120,000 Swedish men with PCa [8], we were able to develop a simulation programme, PCBaSeSim, which reliably models treatment changes up until 25 years after diagnosis. To assess the reliability of our state transition model, we compared real-worldfollow-up data in PCBaSeTraject with simulated data from PCBaSeSim. We found that observed and simulated transition profiles were overlapping and consistent, suggesting that PCBaSeSim estimates were accurate and reliable. Moreover, in PCBaSeSim, it is possible to set baseline parameter (e.g. number of subjects, initial state, subject- and disease-specific features, temporal duration of the simulation) prior to starting the simulation process. During the simulation, individual features are constantly updated (e.g. age increase, comorbidity changes [9]) and used for estimation of transition probabilities. Once a simulation is complete (i.e. end of virtual follow-up time), it is possible to retrieve every change in both treatment and individual characteristics (i.e. age, CCI, and risk group).

Our proposed state transition modelling approach consistently differs from conventional survival analyses and prediction tools, which focus on a single outcome over time, with little information regarding the actual path that leads to a specific outcome and the changes in the disease trajectory [17]. Conventional survival analyses inevitably result in a consistent loss of information and accuracy, especially when dealing with extended follow-up times. We have previously shown that it is possible to model temporal changes in comorbidity for cancer patients [9]. In PCBaSeSim, thanks to the continuous update of baseline characteristics at the end of every time step, not only are the predictions likely to be accurate, but also is it possible to keep track of individual and disease-specific features and trajectories. Although the difference in model fit between time-updated CCI and CCI only measured at baseline was small (data not shown), we used the time-updated CCI since this approach was crucial to solve the problem of non-registered AS ➔ WW transition [12]. Therefore, in order to be congruent with our previous publication [12] as well as with future studies on health economy, we support the use of time-updated CCI since increases in comorbidity are related to increased costs and therefore affect treatment decisions.

Moreover, PCBaSeSim makes it possible to model disease trajectories over a long time-horizon and can therefore make long-term predictions even in the absence of “observed” (i.e. registered) data. Eventually, the flexibility of the model allows to include novel modelling parameters and states.

The main limitation of our proposed model is the absence of an external validation. However, PCBaSeSim is ready to be tested in an external setting, and the authors welcome collaborators to validate our simulation program. It is worth noticing that in PCBaSeTraject, we had to use left truncation on January 1, 2006 since the Swedish Prescribed Drug Registry was initiated on July 1, 2005 [8]. The relatively low number of diagnosis during the early PCBaSeRapid period is caused by this left truncation. This could theoretically limit the predictive power of PCBaSeSim. On the other hand, we argue that the most recent data are the most valuable for prediction for current patients and that therefore it is advantageous that the impact of historically older diagnosis and treatments on predictions was limited by the use of this approach. Further limitations include the use of administrative data for the definition of comorbidities, though it has previously been shown that the accuracy of ICD codes for discharge diagnoses in the Patient Registry is high in the range of 85–95% [18]. Another limitation is represented by missing data related on some transitions/risk categories. However, we have previously [12] and in this current study provided reliable solutions to these issues.


We developed a reliable and accurate simulation tool, PCBaSeSim, to provide information to patients and healthcare professionals, on an individual as well as population-based level on treatment trajectories. Aside from clinical implications, our model provides information applicable to healthcare resources allocation. Since it is possible to keep track of specific disease trajectories including the mean time spent in each state, estimates for costs can be obtained by use of our model.

Availability of data and materials

Data used for the current study can be retrieved by contacting The steering groups of NPCR and PCBaSe welcome external collaborations. For more information please see where registration forms, manuals, and annual reports from NPCR are found as well as a full list of publications from PCBaSe.





Androgen deprivation therapy


Active surveillance


Charlson comorbidity index


Gleason grading group


Gonadotropin releasing hormone


National Prostate Cancer Register


Prostate cancer


Prostate Cancer database of Sweden


Radical prostatectomy




Watchful waiting


  1. Vaiserman A, Lushchak O. Implementation of longevity-promoting supplements and medications in public health practice: achievements, challenges and future perspectives. J Transl Med. 2017;15:160.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Lopreite M, Mauro M. The effects of population ageing on health care expenditure: a Bayesian VAR analysis using data from Italy. Health Policy. 2017;121:663–74.

    Article  PubMed  Google Scholar 

  3. Santis D, Henry A, Joniau S, Lam T, Matveev V, van der Poel H, et al. Prostate Cancer EAU ESTRO SIOG Guidelines. 2017.

    Google Scholar 

  4. Johansson J-E, Andren O, Andersson S-O, Dickman PW, Holmberg L, Magnuson A, et al. Natural history of early, localized Prostate Cancer. Jama. 2014;291:2713–9.

    Article  Google Scholar 

  5. NPCR. National Prostate Cancer Register of Sweden n.d. (Accessed 25 Aug 2018).

  6. Tomic K, Westerberg M, Robinson D, Garmo H, Stattin P. Proportion and characteristics of men with unknown risk category in the National Prostate Cancer Register of Sweden. Acta Oncol (Madr). 2016:1–6.

    Article  Google Scholar 

  7. Van Hemelrijck M, Wigertz A, Sandin F, Garmo H, Hellström K, Fransson P, et al. Cohort profile: the national prostate cancer register of Sweden and prostate cancer data base Sweden 2.0. Int J Epidemiol. 2013;42:956–67.

    Article  PubMed  Google Scholar 

  8. Van Hemelrijck M, Garmo H, Wigertz A, Nilsson P, Stattin P. Cohort profile update: the national prostate cancer register of Sweden and prostate cancer data base-a refined prostate cancer trajectory. Int J Epidemiol. 2016;45:73–82.

    Article  PubMed  Google Scholar 

  9. Lindhagen L, Van Hemelrijck M, Robinson D, Stattin P, Garmo H. How to model temporal changes in comorbidity for cancer patients using prospective cohort data. BMC Med Inform Decis Mak. 2015;15:96.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Mohler J, Bahnson RR, Boston B, Busby JE, D’Amico A, Eastham JA, et al. NCCN clinical practice guidelines in oncology: prostate cancer. J Natl Compr Cancer Netw. 2010;8:162–200.

    Article  CAS  Google Scholar 

  11. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30:377–99.

    Article  PubMed  Google Scholar 

  12. Van Hemelrijck M, Garmo H, Lindhagen L, Bratt O, Stattin P, Adolfsson J. Quantifying the transition from active surveillance to watchful waiting among men with very low-risk Prostate Cancer. Eur Urol. 2016:4–11.

    Article  Google Scholar 

  13. Siebert U, Alagoz O, Bayoumi AM, Jahn B, Owens DK, Cohen DJ, et al. State-transition modeling: a report of the ISPOR-SMDM modeling good research practices task force--3. Value Health J Int Soc Pharmacoeconomics Outcomes Res. 2012;15:812–20.

    Article  Google Scholar 

  14. Sanyal C, Aprikian AG, Cury FL, Chevalier S, Dragomir A. Management of localized and advanced prostate cancer in Canada: a lifetime cost and quality-adjusted life-year analysis. Cancer. 2016;122:1085–96.

    Article  PubMed  Google Scholar 

  15. Sanyal C, Aprikian A, Cury F, Chevalier S, Dragomir A. Clinical management and burden of prostate cancer: a Markov Monte Carlo model. PLoS One. 2014;9:e113432.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Zhang L, Lim CY, Maiti T, Li Y, Choi J, Bozoki A, et al. Analysis of conversion of Alzheimer’s disease using a multi-state Markov model. Stat Methods Med Res. 2018:962280218786525.

  17. Kim HT. Cumulative incidence in competing risks data and competing risks regression analysis. Clin Cancer Res. 2007;13:559–65.

    Article  PubMed  Google Scholar 

  18. Ludvigsson JF, Andersson E, Ekbom A, Feychting M, Kim J-L, Reuterwall C, et al. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11:450.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


This project was made possible by the continuous work of the National Prostate Cancer Register of Sweden (NPCR Swe) steering group: Pär Stattin chairman, Anders Widmark, Lars Egevad, Magnus Törnblom, Stefan Carlsson, Jan Adolfsson, Anna Bill-Axelson, Jan-Erik Johanssson, Ove Andren, Mats Lambe, Erik Holmberg, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Maria Nygren, Ola Bratt, and Göran Ahlgren. We would also like to thank Börje Ljungberg and Staffan Jahnson for providing us with data from the National Kidney and Bladder Cancer Register, respectively.


Funding came from the Swedish Research Council 825–2008-5910, Stockholm Cancer Society, Cancer Research-UK, the Swedish Council for Working Life and Social Research, and Västerbotten County Council. Funding contributed to data collection and maintenance of the data source.

Author information

Authors and Affiliations



Conception and design: HG, LL, EV. Acquisition of data: HG, PS. Analysis and interpretation of data: HG, LL, EV. Drafting the manuscript: EV, HG, MVH, PS. Revising critically for important intellectual content: HG, MVH, PS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hans Garmo.

Ethics declarations

Ethics approval and consent to participate

The Research Ethics Board at Umeå University approved this study. A written consent to participate in the registration of NPCR as well as other quality registries is not legally mandated, however an opt-out procedure is applied so a person can reject to be registered.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Detailed specifications of the model, including states, transitions, and transition probabilities. (DOCX 45 kb)

Additional file 2:

Supplementary figures and tables. (DOCX 212 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ventimiglia, E., Van Hemelrijck, M., Lindhagen, L. et al. How to measure temporal changes in care pathways for chronic diseases using health care registry data. BMC Med Inform Decis Mak 19, 103 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: