BMC Medical Informatics and Decision Making
Description and Validation of a Markov Model of Survival for Individuals Free of Cardiovascular Disease That Uses Framingham Risk Factors

Background: Estimation of cardiovascular disease risk is increasingly used to inform decisions on interventions, such as the use of antihypertensives and statins, or to communicate the risks of smoking. Crude 10-year cardiovascular disease risks may not give a realistic view of the likely impact of an intervention over a lifetime and will underestimate the risks of smoking. A validated model of survival to act as a decision aid in the consultation may help to address these problems. This study describes the development of such a model for use with people free of cardiovascular disease and evaluates its accuracy against data from a United Kingdom cohort.


Background
The evaluation of the risk of developing coronary heart disease (CHD) is increasingly used as a basis for making treatment decisions to prevent cardiovascular disease (CVD). Internationally, the most established measure is the Framingham risk [1] for developing CHD over a 10-year period [2][3][4]. Such measures also have some value as a means of communicating risk to individuals in consultations. This facilitates patient participation in treatment decisions and can help inform advice about the risks of smoking. A common approach is to use a 15% 10-year risk of CHD as a threshold for using antihypertensive drugs in people with a systolic BP between 140 and 160 mmHg [3,4]. However, there are weaknesses to using simple absolute CHD risk without consideration of other factors [5]. Cardiovascular risk increases with age and so the elderly more often cross this treatment threshold. The risk of death from most other causes also increases with age, and life expectancy falls. Given that the benefits of treatment accrue over time, younger patients with a lower risk of CHD but a greater life expectancy might have more to gain from treatment. Also, when communicating the risks of smoking, the CHD risk underestimates the true risk because of the wide variety of other pathologies that smoking causes.
The 10-year Framingham CHD risk for an individual only gives a crude idea of the likely impact of treatment or smoking cessation on an individual's life as it does not take into account the impact of competing causes of death, in particular other significant causes of mortality related to smoking. Other Markov models have been developed to assess the impact of cardiovascular disease mortality [6] or to evaluate the cost-effectiveness of treatment [5], but most do not take into account the relative risks of non-cardiovascular death for smokers compared to non-smokers [7][8][9][10], or model survival in populations rather than individuals [11]. Grover et al developed a model based on the Lipid Research Clinics program which makes some adjustment for the relative risks of smoking [12]. It has been validated against a number of intervention trials showing that its predictions of survival correlate highly with the observed survival. A Markov cycle tree evaluated using cohort simulation was developed to estimate survival over a lifetime. The model uses the Framingham equations for calculating CVD risks. The model also takes into account non-cardiovascular competing causes of death and models the changes in CHD risk factors with age. A Markov cycle tree structure was used because of the complex variety of pathways between the starting 'well' state and the absorptive 'dead' state [13].
The model presented in this paper takes into account competing causes of death, changes in risk factors with age and the relative risks of smoking on non-CVD mortality. It can estimate the probability of survival (between 0.0 and 1.0) at annual increments from the start age to the age of 85. Before such a model could be used in clinical practice to inform treatment decisions, it is important that some measure of its predictive accuracy is obtained.
This study uses data from the Whickham study [14,15] to assess the accuracy of the model at predicting survival in this cohort at twenty years. The original Whickham study was conducted between 1972 and 1974 in a mixed urban and rural area close to Newcastle upon Tyne. The cohort included 2779 adults over the age of 18 years, randomly identified to generate a sample that closely matched the United Kingdom (UK) population in terms of age, gender and social class. The original data set included blood pressure (BP), electrocardiogram (ECG) and serum total cholesterol (TC), thus allowing its use with the Framingham equations. A 20-year follow-up study collected further data on the incidence of thyroid disorders and on morbidity and mortality.

Methods
Absolute annual non-CVD risks of death were derived by linear interpolation from the UK national mortality statistics for 1998 [16]. The risks for those causes of death that are smoking-related were adjusted for smokers and non-smokers using the relative risks from the 4-year follow-up of the American Cancer Society's 50-state study (CPS-II) quoted in the US Surgeon General's report of 1989 (Table 1) [17].
These relative risks were used to adjust the death rates from smoking-related causes of death using a formula that draws on the proportion of smokers at each age for each sex, taken from the Health Survey for England 1998 [18]. A similar formula was used by Pharoah in his life table modelling of intervention with statins [10].
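The adjustment formula itself is not reproduced in this text. As an illustration only, one standard way to split a population death rate by smoking status treats the population rate as a prevalence-weighted mixture of the smoker and non-smoker rates. The sketch below assumes that identity; it is not necessarily the formula the authors used, and the function name, parameter names and input values are hypothetical.

```python
def split_rate_by_smoking(population_rate, smoker_prevalence, relative_risk):
    """Split a population death rate into (non-smoker, smoker) rates.

    Assumption (illustrative, not taken from the paper):
        population_rate = p * RR * r_ns + (1 - p) * r_ns
    where p is the proportion of smokers, RR the relative risk of death
    for smokers, and r_ns the death rate in non-smokers.
    """
    if not 0.0 <= smoker_prevalence <= 1.0:
        raise ValueError("smoker_prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    denominator = smoker_prevalence * relative_risk + (1.0 - smoker_prevalence)
    non_smoker_rate = population_rate / denominator
    smoker_rate = relative_risk * non_smoker_rate
    return non_smoker_rate, smoker_rate


# Made-up inputs: population rate 0.002 per year, 25% smokers, RR of 3.
non_smoker_rate, smoker_rate = split_rate_by_smoking(0.002, 0.25, 3.0)
print(non_smoker_rate, smoker_rate)
```

Under this assumption the two rates, weighted by prevalence, recombine to the original population rate, so the adjustment leaves overall mortality unchanged.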
The model used the Framingham risks for CHD and stroke rather than the Framingham risk for CVD death, as these accounted for the majority of CVD deaths and were easier to map to the UK mortality statistics. The annual risk of each CVD event included in the model was calculated by taking one quarter of the 4-year Framingham risk [1]. The 4-year Framingham risk was used as this is the shortest period calculable using the Framingham equation.
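The annualisation step can be sketched as follows. The first function is the one-quarter rule described above. The second is a constant-hazard conversion, included purely for comparison and not something the paper reports using. Function names and the example 4-year risk are hypothetical.

```python
def annual_risk_linear(four_year_risk):
    """One quarter of the 4-year risk, as described in the text."""
    return four_year_risk / 4.0


def annual_risk_constant_hazard(four_year_risk):
    """Alternative for comparison only (an assumption, not the paper's
    method): the annual risk that compounds to the 4-year risk."""
    return 1.0 - (1.0 - four_year_risk) ** 0.25


example_four_year_risk = 0.08  # made-up value for illustration
linear = annual_risk_linear(example_four_year_risk)
compounded = annual_risk_constant_hazard(example_four_year_risk)
print(linear, compounded)
```

For small risks the two conversions are close, which is one reason a simple division may be an acceptable approximation at this cycle length.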
For the states 'Survived a myocardial infarction (MI)', 'Survived other CHD', 'Survived a stroke' and 'Survived other CVD', the risk of death is estimated by taking the Framingham risk for that individual and multiplying it by the corresponding relative risk in Table 2.

Model structure
Six states are modelled: 'Alive and well', 'Survived an MI', 'Survived other CHD', 'Survived a stroke', 'Survived other CVD' and 'Dead'. The state 'Survived other CHD' would largely consist of those who develop angina without first suffering an MI. The state 'Survived other CVD' would consist of other CVD states such as intermittent claudication. Their relationships are shown in the state transition diagram in Figure 1. There is no transfer between the four states representing survival of a cardiovascular event, and there is a modelling assumption that the risk of death is not increased by further cardiovascular events.
The basic method used by the model is outlined in the Markov cycle-tree in Figure 2 [13]. The activity diagram is shown in Figure 3. The time horizon of the model is to the age of 85, and the cycle length is one year. Through the lifetime of the individual, the model adjusts the BP and TC using the change in mean BP for each year derived from the 'Health Survey for England: cardiovascular disease in 1998' [18]. It was implemented using the Microsoft Excel™ spreadsheet package.
The model estimates survival for an individual aged between 35 and 75. It starts at the current age of the individual, estimating mortality in each successive year, up to the age of 85.
Figure 1: State transition diagram for the six states in the model.
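The structure described in this section can be sketched as a cohort simulation over the six states with a one-year cycle. This is a minimal illustration, not the authors' spreadsheet: the `toy_risks` function stands in for the Framingham equations and mortality tables, and every number in it is invented. All names are hypothetical.

```python
EVENT_STATES = ["mi", "other_chd", "stroke", "other_cvd"]
STATES = ["well"] + EVENT_STATES + ["dead"]


def toy_risks(age):
    """Invented annual transition probabilities that rise with age.
    A real implementation would derive these from the Framingham
    equations and mortality statistics."""
    scale = 1.08 ** (age - 35)
    return {
        # probability of moving from 'well' to each survived-event state
        "event": {"mi": 0.001 * scale, "other_chd": 0.001 * scale,
                  "stroke": 0.0005 * scale, "other_cvd": 0.0005 * scale},
        # probability of dying (any cause) directly from 'well'
        "death_from_well": 0.002 * scale,
        # probability of dying within a year once in a survived-event state
        "death_after_event": {s: min(1.0, 0.01 * scale) for s in EVENT_STATES},
    }


def run_cohort(start_age, risks_at_age, end_age=85):
    """Return survival probabilities at annual increments from start_age."""
    occupancy = {state: 0.0 for state in STATES}
    occupancy["well"] = 1.0
    survival = [1.0]
    for age in range(start_age, end_age):
        risks = risks_at_age(age)
        leaving_well = risks["death_from_well"] + sum(risks["event"].values())
        if leaving_well > 1.0:
            raise ValueError("transition probabilities from 'well' exceed 1")
        nxt = {state: 0.0 for state in STATES}
        nxt["dead"] = occupancy["dead"]  # 'dead' is absorbing
        nxt["well"] = occupancy["well"] * (1.0 - leaving_well)
        nxt["dead"] += occupancy["well"] * risks["death_from_well"]
        for state in EVENT_STATES:
            died = risks["death_after_event"][state]
            # no transfer between survived-event states: stay or die
            nxt[state] = (occupancy[state] * (1.0 - died)
                          + occupancy["well"] * risks["event"][state])
            nxt["dead"] += occupancy[state] * died
        occupancy = nxt
        survival.append(1.0 - occupancy["dead"])
    return survival


survival_curve = run_cohort(35, toy_risks)
print(len(survival_curve), survival_curve[20])
```

Here `survival_curve[20]` is the estimated probability of surviving 20 years from the starting age, which is the quantity compared with observed survival in the validation.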

Validation
All cases between the ages of 35 and 65 within the Whickham data set were identified and the risk factors described in Table 3 were extracted. Absence of left ventricular hypertrophy (LVH) and an average high density lipoprotein (HDL) of 1.3 for males and 1.6 for females were assumed, as these are the approximate averages in the Health Survey for England: CVD in 1998 [18].
Cases with missing data or any kind of heart disease or cerebrovascular disease at baseline were excluded. The model estimate of the probability of survival for each case was identified at 20 years. Once the survival probabilities had been generated the average actual survival was calculated by finding the proportion of the cohort still alive at 20 years. The average probability of survival was calculated by finding the mean of all the estimated 20-year survival probabilities generated by the model.
Further analysis was conducted by sorting and grouping subjects in the order of their rank for each factor in Table 4. The mean actual survival in each group of these subjects was plotted against the mean estimated survival probability at 20 years.
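The grouped comparison described above can be sketched as follows: subjects are ranked by a factor (here the model's own estimate), cut into consecutive groups, and the mean estimated survival in each group is paired with the observed proportion surviving. The function names and the synthetic data are hypothetical, and the Pearson coefficient is written out by hand as one possible measure of correlation.

```python
from math import sqrt


def grouped_calibration(predicted, survived, group_size=35):
    """Rank subjects by predicted survival, cut into consecutive groups,
    and return (mean predicted, observed proportion alive) per group."""
    pairs = sorted(zip(predicted, survived), key=lambda pair: pair[0])
    rows = []
    for start in range(0, len(pairs), group_size):
        chunk = pairs[start:start + group_size]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(s for _, s in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic data for illustration: 1 = alive at 20 years, 0 = dead.
predicted = [0.1] * 35 + [0.9] * 35
survived = [0] * 35 + [1] * 35
rows = grouped_calibration(predicted, survived)
r = pearson([p for p, _ in rows], [o for _, o in rows])
print(rows, r)
```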
The biological and measurement variability of BP and TC are significant. A sensitivity analysis for these two factors was conducted. For systolic BP, a total coefficient of variation (CV_T) of 5.6% was taken [19], giving 95% confidence intervals (95% CI) of approximately +/-11%. For TC, a CV_T of 7.4% was used, giving approximate 95% CI of +/-15%. A grid of 12 hypothetical subjects of the ages 35, 50 and 65 with gender and smoking status was drawn up.
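The arithmetic behind these intervals, and the 12-subject grid, can be sketched as below. The sketch assumes the 95% interval is taken as 1.96 times the coefficient of variation around the measured value, which reproduces the approximate +/-11% and +/-15% quoted; the example systolic BP of 140 mmHg and all names are hypothetical.

```python
from itertools import product

Z_95 = 1.96  # normal quantile assumed for a two-sided 95% interval


def ci_bounds(value, cv_percent):
    """Lower and upper 95% bounds for a measurement with the given
    total coefficient of variation (in percent)."""
    half_width = Z_95 * cv_percent / 100.0 * value
    return value - half_width, value + half_width


# 3 ages x 2 genders x 2 smoking statuses = 12 hypothetical subjects
grid = list(product((35, 50, 65), ("male", "female"), (False, True)))

sbp_low, sbp_high = ci_bounds(140.0, 5.6)  # systolic BP, CV_T 5.6%
tc_low, tc_high = ci_bounds(1.0, 7.4)      # TC, expressed relative to 1.0
print(len(grid), sbp_low, sbp_high, tc_low, tc_high)
```

Running the model at the lower and upper bound for each grid subject then gives the range of estimated survival attributable to measurement variability.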

Ranking factor | Grouping
Model estimates of 20-year survival | 39 groups of 35 subjects and 1 of 29
Age (1) | One-year age bands
Age (2) | Five-year age bands
Systolic BP | 10 mmHg bands from 100 to 200 mmHg
TC | 0.5 mmol/l bands from 3.0 to 9.99 mmol/l

A sub-group analysis was performed to examine the performance of the model in the sub-groups in Table 5.
The model was used to estimate the potential gains in life expectancy (PGLE) from the elimination of CVD as a cause of death. Half cycle correction was used [13]. This was done first for a typical 35-year-old non-smoking man and a typical 35-year-old non-smoking woman, each with a systolic BP of 131 mmHg and a TC/HDL ratio of 4.08. Secondly, the model was used to estimate the PGLE of each 35-year-old in the Whickham sample to give the average PGLEs for these men and women at the age of 35 years.
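A PGLE calculation of this kind can be sketched from two annual survival curves, one from the baseline model and one with CVD removed as a cause of death. The sketch assumes the half cycle correction is applied as a trapezoidal sum, which is one common formulation and may differ in detail from the authors' spreadsheet. Life expectancy here is truncated at the model's time horizon. Function names and the toy curves are hypothetical.

```python
def life_years(survival):
    """Expected life-years up to the horizon from annual survival
    probabilities, with half cycle correction applied by averaging
    the survival at the start and end of each one-year cycle."""
    return sum((start + end) / 2.0
               for start, end in zip(survival, survival[1:]))


def pgle(baseline_survival, survival_without_cvd):
    """Potential gain in life expectancy from eliminating CVD deaths."""
    return life_years(survival_without_cvd) - life_years(baseline_survival)


# Toy three-point curves for illustration only.
baseline = [1.0, 0.5, 0.0]
without_cvd = [1.0, 1.0, 1.0]
gain = pgle(baseline, without_cvd)
print(gain)
```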

Results and discussion
Of the 2779 people in the Whickham cohort, 1541 were between the ages of 35 and 65 inclusive. Of these, 8 had missing data. Another 142 were excluded because of pre-existing CVD. The results of the subgroup analysis are shown in Table 5. The figures plot the model's estimates of survival probability against the average survival in the Whickham cohort for each of the groups analysed in Table 3. There is a high level of agreement between the predictions of the model and the actual survival in the Whickham cohort. However, as can be seen from Table 5, the model statistically significantly underestimates mortality at twenty years in those people with a systolic BP over 180 mmHg, even though there were only 95 cases in that group (p = 0.006). There were no other significant differences between estimated and actual survival.
The correlations between model estimates of the probability of survival and actual survival in the specified groupings are given in Table 6. The results of the sensitivity analysis are given in Table 7.
The PGLE was 2.7 years for our typical 35-year-old man and 1.8 years for our typical 35-year-old woman. For the twenty-four 35-year-old men in the sample used in the validation, the average PGLE was 4.0 years. For the thirty-two 35-year-old women the average PGLE was 1.8 years.


Discussion
The survival estimated by this model and the observed survival in the Whickham study correlate highly. The model significantly overestimates survival in those with a systolic BP of 180 mmHg or more (p = 0.006).
For 50-year-old men the sensitivity analysis shows up to a 9.5% range in the 95% confidence intervals (CI) for the estimated survival at 20 years based on 3 BP readings and a 6.5% range based on a single measurement of the TC. Otherwise the 95% CI do not exceed +/-2% of the mean. This underlines the need to take more than 3 BP readings and at least two serum TC measurements in middle-aged men.
This model is focused on individuals and, unlike the CHD Policy Model [6], is not a population simulation. Consequently it can be used to give individualised risk information in real time. Grover et al used data from the Lipid Research Clinics cohort and included a multivariate model for death from 'other causes' in addition to CHD and stroke, and so took account of competing causes of death [12]. In addition to modelling competing risks of death, the model described here adjusts the risks of non-cardiovascular death by smoking status and also models the change in BP and cholesterol through a lifetime.
This system may be of value in the development of public health programs where different intervention and prevention strategies could be modelled giving reliable estimates of the impact on survival. The inclusion of an adjustment for the non-CVD risks of death from smoking is particularly important here, as these competing causes of death will have a greater impact in smokers than in non-smokers.
Tools such as these will naturally be attractive to insurers for actuarial assessment but their use may be controversial. The model may reduce uncertainty in survival in certain groups and reduce risk in setting the levels of premiums. However, this could be regarded as undermining the very principles on which insurance is based: uncertainty and the sharing of risk.
Figure 4: Scatterplot of the estimated probability of survival against the actual survival in the subjects grouped into seven 5-year age bands between 35 and 65 years old.

The weaknesses of the model and further development

Historical variation and the effect on conclusions of validity
The Framingham equations used here were developed from cohort data collected in the 1960s and 1970s. We know that since then there has been a fall in CHD mortality and a change in the prevalence of smoking. Whilst it may be that the change in incidence is due to the change in the prevalence of risk factors included in the Framingham equation, we cannot be sure that there are not other extraneous factors that have varied over that time and may have affected the incidence of CHD. If so, the validity of the Framingham equation in modern populations may be undermined. For example, some other risk factor such as serum fibrinogen levels, homocysteine, soft water, chlamydia infection or an as yet undetected factor may have altered, which would distort the Framingham predictions. It is not just risk factors that are relevant here, but also protective factors such as a moderate intake of red wine [20] or exercise [21]. We have used the Whickham survey, which collected its 20-year follow-up data in the early 1990s. This model will be used in individuals at the beginning of a period of prediction rather than at the end, so the best we can say is that this model was valid when making predictions of survival in a UK population 30 years ago, when the Whickham data was first being collected.

Smoking cessation
This model makes a number of assumptions about non-smokers and smokers that do not quite fit the real world. These assumptions are that:
• Smokers remain smokers and do not quit.
• Non-smokers have never smoked at all.
If this model were to be used to assess the benefits of quitting smoking, it would be on the assumption that a quitter's risk falls instantly to the risk of non-smokers. These are clearly invalid assumptions. When evaluating the survival of smokers it is on the assumption that they will remain smokers all their life. This impacts upon our evaluation, because those individuals who were recorded as smokers at their baseline assessment at time 0 were assumed to remain smokers until death or their 20-year follow-up. Clearly many will have given up in the intervening period. The net result of this would be an overestimate of mortality in smokers. This is in keeping with our results, which show a small, non-significant overestimate of mortality in smokers.
Scatterplot of the estimated probability of survival against the actual survival in the subjects grouped into fifty one 1-year age bands between 35 and 65 years old.
The risks of ex-smokers probably never fall to those of non-smokers, and the reduction in risk varies with the age at which the smoker quits. Quitting before middle age reaps greater benefit than quitting later in life [22]. It would be feasible to use the different relative risks for ex-smokers given in the Surgeon General's report of 1989 to improve the model for ex-smokers. It would even be possible to indicate the impact of quitting at various times in the future.

Modelling intervention
This model can give an indication of life gained with intervention or elimination of CHD and stroke. For example, for our typical 35-year-old non-smoking man (systolic BP of 131 mmHg and a TC/HDL ratio of 4.08), eliminating CHD and stroke as a cause of death (relative risk reduction of 100%) would reduce his mortality to the age of 85 by about 12.1% (Figure 10). This amounts to an average PGLE of 2.71 years. The pale yellow coloured part of the graph in Figure 10 represents the life gained.
The Coronary Heart Disease Policy Model predicted the PGLE from eliminating CHD to be 3.1 years for men. This is in keeping with the predictions of our model. However, the Coronary Heart Disease Policy Model also predicted a PGLE of 3.3 years for women. This is actually higher than the quoted PGLE for men and nearly double the PGLE predicted by our model. CVD death rates are higher in men than women across all ages in the UK and so the model described here would seem more consistent with the observed epidemiological data [16].
Figure 10: The improvement in survival gained from eliminating CVD in a typical non-smoking 35-year-old man.
If the forty people aged 55 in this study reduced their risk of CHD or stroke by 88%, then the PGLE would be 2.2 years. Wald et al in their paper on the Polypill estimated that a third of 55 year-olds would gain about 11 years free of CHD events [23]. Their 'simple Markov model' did take account of CVD as well as of 'dying from another cause' but had markedly divergent results from this study. This may reflect the small sample size of 55 year-olds in this study (n = 40), or a failure of Wald's model to take into account all of the factors included in this model.
Mackenbach estimated that the PGLE from the elimination of CVD would be about 4.0 years from birth [24]. This would seem to be in keeping with the results of our model as the proportion of deaths from CVD prior to the age of 35 is very small [16]. On the whole it would seem that the predictions of this model are in keeping with the bulk of other model estimates of PGLE.

Conclusions
This model gives valid estimates of 20-year survival in Whickham cohort members between the ages of 35 and 60 who are free of CVD and have systolic BPs below 180 mmHg. It could form the basis of a decision aid in the primary prevention of CVD. It would be useful in the modelling of intervention and prevention strategies and could be a valuable tool for actuarial assessment.