Skip to main content

Prediction of long-term hospitalisation and all-cause mortality in patients with chronic heart failure on Dutch claims data: a machine learning approach



Accurately predicting which patients with chronic heart failure (CHF) are particularly vulnerable for adverse outcomes is of crucial importance to support clinical decision making. The goal of the current study was to examine the predictive value on long term heart failure (HF) hospitalisation and all-cause mortality in CHF patients, by exploring and exploiting machine learning (ML) and traditional statistical techniques on a Dutch health insurance claims database.


Our study population consisted of 25,776 patients with a CHF diagnosis code between 2012 and 2014 and one year and three years follow-up HF hospitalisation (1446 and 3220 patients respectively) and all-cause mortality (2434 and 7882 patients respectively) were measured from 2015 to 2018. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated after modelling the data using Logistic Regression, Random Forest, Elastic Net regression and Neural Networks.


AUC rates ranged from 0.710 to 0.732 for 1-year HF hospitalisation, 0.705–0.733 for 3-years HF hospitalisation, 0.765–0.787 for 1-year mortality and 0.764–0.791 for 3-years mortality. Elastic Net performed best for all endpoints. Differences between techniques were small and only statistically significant between Elastic Net and Logistic Regression compared with Random Forest for 3-years HF hospitalisation.


In this study based on a health insurance claims database we found clear predictive value for predicting long-term HF hospitalisation and mortality of CHF patients by using ML techniques compared to traditional statistics.

Peer Review reports


Chronic heart failure (CHF) is a severe condition that is characterized by high mortality and morbidity. Evidence exists that a substantial portion of CHF patients, in particular those with (multiple) comorbidities, do not currently receive optimal medical therapy, leading to potentially avoidable specialist-visits and frequent HF hospitalisations, impaired quality of life or even life-threatening complications [1,2,3,4,5]. Patients admitted for CHF are at a considerably higher risk of (long-term) adverse outcomes after a hospital discharge than the general elderly population, even higher than after other common serious events such as pneumonia and myocardial infarction [6]. Accurately predicting which CHF patients are particularly vulnerable for adverse outcomes, such as renewed HF hospitalisation or even death, is of crucial importance to support clinical decision-making. Advances in statistical approaches and computational power, including fully utilizing machine learning techniques on Big Data, potentially provide better knowledge extraction and evidence-based clinical decision support [7,8,9,10,11]. In addition to traditional statistical analysis, novel machine learning (ML) algorithms can identify patterns in large datasets and build both linear and non-linear models in order to make effective data-driven predictions [12]. All residents of the Netherlands are entitled to a comprehensive basic health insurance package and this includes the bulk of essential medical care, medications and medical aids. Health insurance claims (HIC) databases are attractive for research because of their large size, their longitudinal perspective, and their practice-based information. As they are based on financial reimbursement, the information is generally reliable, moreover databases are audited every year to ensure that they meet the required quality level for the Dutch risk equalization model [13, 14]. ML techniques could potentially better utilize the richness of these databases [7, 8, 15, 16]. The goal of the current study was to examine the predictive value of Dutch HIC data on long term HF hospitalisation and all-cause mortality in CHF patients, by exploring and exploiting ML and traditional statistical techniques.



A HIC database containing anonymous data that can be considered a representative sample of ~ 30% of the Dutch population from Zilveren Kruis, the largest insurance company in the Netherlands, was analysed retrospectively. Patients aged 18–85 years with a diagnosis code for CHF between 2012–2014 were included and follow-up HF hospitalisation and all-cause mortality in 2015–2018 was measured [17]. Patients had to have a CHF-related claim according to the national diagnosis-treatment classification system called ‘Diagnose Behandeling Combinatie’ (DBC), which is based on a combination of the International Classification of Diseases, 10th revision (ICD-10) and applied treatment [18]. Additionally, they had to have used at least one medication within the cardiovascular system (“C”) based on the World Health Organization Anatomical Therapeutic Chemical Classification index and Defined Daily Dose (WHO ATC/DDD) in the same period [19]. According to the European Society of Cardiology (ESC) heart failure guidelines [20], CHF patients should visit their treating physician at least once per year. Therefore, patients were excluded who lacked any HF insurance claim after January 2015, because they are most likely wrongly diagnosed or labelled HF patients. Patients who switched insurance company between 2012–2017 were also excluded. A total of 25,776 patients were included in the final analysis (Fig. 1).

Fig. 1

Flowchart of study population. See also: Gürgöze MT, van der Galiën OP, Limpens MAM, Roest S, Hoekstra RC, IJpma AS, Brugts JJ, Manintveld OC, Boersma E. Impact of sex differences in co-morbidities and medication adherence on outcome in 25 776 heart failure patients. ESC Heart Fail. 2020


The study endpoints were HF hospitalisation and all-cause mortality. The risk of HF hospitalisation and all-cause mortality were predicted on a one (2015)- and three-years (2015–2017) perspective. This resulted in the following study endpoints (i.e. dependent features): (1) 1-year HF hospitalisation (1446 patients), (2) 3-years HF hospitalisation (3220 patients), (3) 1-year all-cause mortality (2434 patients), (4) 3-years all-cause mortality (7882 patients). HF hospitalisation was defined as at least one night of stay in inpatient care for acute or chronic HF based on the DBC system. All-cause mortality was defined as death due to any cause. No clinical adjudication committee reviewed the HF hospitalisation endpoint.


The process of feature selection is graphically displayed in Fig. 2. Claim-based input features between 2012–2014 were divided into three categories: hospital claims, pharmaceutical claims and claims of other caregivers. Hospital claims are all DBC’s, a combination of diagnosis and treatment, for instance a DBC for hospital admission for CHF with more than five nursing days. Within the hospital claims we also included the diagnosis related groups (DRG) based on the ICD-10 code of each DBC. The pharmaceutical claims were divided in seven categories:

  1. 1.

    Use of an individual prescription on a full anatomical therapeutic chemical (ATC) level

  2. 2.

    An ATC3 therapeutic subgroup level [21]

  3. 3.

    Medical adherence by defining the medication possession ratio (MPR) [22, 23]

  4. 4.

    Use of automatic pill dispenser

  5. 5.

    Sum of prescribed daily doses (PDD) which takes into account dosage schemes as prescribed by the treating physician

  6. 6.

    Number of times medicines were collected

  7. 7.

    Number of different medication within the same ATC3 subgroup.

Fig. 2

Flow diagram of the process of feature selection

An example of claims of other caregivers are number of visits to a GP or physiotherapist or use of a medical device.

Most features are categorical and are binary coded to represent whether the corresponding medical service was provided to the patient or not. Due to the large number of claim-related features of the dataset (> 6000 features), feature selection plays an important role in reducing noise and computational costs, while simultaneously improving accuracy [24]. Feature selection was done in two stages; first, prevalence prioritization and, second, a Lasso Regression [25]. Prevalence prioritization was performed in each of the three categories and their relevant subcategories for two-time episodes, by first selecting all features, with a threshold of > 250 patients in each category. In this way we included for two-time episodes 96 and 160 out of 2290 hospital claim features, 73 and 85 out of 141 DRG features, 192 and 232 out of 901 pharmaceutical claims features, 55 and 60 out of 299 pharmaceutical claims on a ATC3 level and 377 and 508 out of 3232 other claims features. Input features between 2012–2014 were divided in two-time episodes; (1) year 2014 and (2) combined years 2012–2013. Patient characteristics on postal code level, such as income (high, medium, low) and distance to nearest facilities such as GP and hospital, were also included as input features. Two time-related features were included in the model; days between last hospital visit in 2012–2014 and January 1, 2015, and duration since first hospital visit, by determining the period between the first occurrence of DBC for CHF in this baseline period up to January 1, 2015. The total number of input features in this first stage was > 2000. In the second stage of feature selection we ran a Lasso regression to obtain the (maximum) 150 most significant features on the partition of the dataset that was subsequently used for model training (49%) and validation (21%), on all the input features of stage 1, related to each of the four endpoints separately. The LASSO method puts a constraint on the sum of the absolute values of the model parameters, the sum has to be less than a fixed value (upper bound). In order to do so the method applies L1 regularization, where some of the coefficients become exactly zero. The variables corresponding to the non-zero coefficients remain in the dataset. The goal of this process is to reduce computation cost and was to minimize the prediction error [25]. We used SAS Enterprise Guide 7.1 for Lasso regression (proc GLMSELECT for binary outcome with Schwarz Bayes selection Criterion) [26]. Baseline characteristics age, sex and marital status were added to the final set. The features and the total number of features in the final set is described in Additional file 1: Table I. The definition of demographics, socio-economic status, selection of medication and all other input features are described in Additional file 1: Table II.

Statistical analysis

We compared four computational techniques to determine which method yields the best prediction for the study endpoints: backward logistic regression (LR), regularized logistic regression (Elastic Net, EN), random forests (RF) and neural networks (NN). We used the area under the receiver operating characteristic (ROC) curve (AUC) as primary and sensitivity and specificity as the secondary performance metric for comparing the models. Sensitivity and specificity have the advantage that they express equal importance to the correct prediction of hospitalisation/mortality and the prediction of no hospitalisation or mortality [27] Additional performance metrics, such as true negatives and precision are calculated and shown in a confusion matrix in Additional file 1: Table III.

The dataset has been split randomly into two partitions, to learn (training and validation) and evaluate (test) the models. The first partition of 70% of the complete dataset (patients) to learn is used for model training (49%) and validation (21%). Various combinations of hyperparameter values (Table 1) were explored to optimize the AUC between training set and validation set to obtain the best trained model. For the hyperparameters not mentioned, the respective default values of the software packages R Statistical Software and SAS Enterprise Guide were used. The second partition of 30% of the complete dataset is then used to evaluate the final prediction performance. The same learning and evaluation partitions were used for all techniques.

Table 1 Hyperparameters used in the several models

Because the dichotomous endpoints (hospitalisation and mortality) of the models are imbalanced (> 90% has class value ‘no mortality’, or ‘no hospitalisation), by default the predictions could be biased towards the class that has the highest prevalence [28]. Therefore, the loss is separated per class and the class loss is weighted proportionally to the inverse of the proportion of the corresponding class (formula described in Additional file 1: Table II).

The backward logistic regression (LR) starts with all coefficients in the model and deletes them consecutively. In each step the coefficient that does not (significantly) improve the prediction on the dependent variable is removed until all features have a significance greater than 0.10.

The regularized Logistic Regression was estimated with the elastic net regularization (EN). This is a combination of the LASSO regularization (L1 penalty) and ridge regularization (L2 penalty). Therefore, there are two hyperparameters that need to be tuned: alpha (L1 penalty) and lambda (L2 penalty). The optimal combination of alpha and lambda is searched for with tenfold cross validation on the validation set. For alpha all values between 0 and 1 with an interval of 0.1 are used. For lambda a range between 0.001 and 10.000 is used in 80 exponentially increasing steps. This resulted in 880 combinations of different alpha and lambda values.

The Random Forest model (RF) is an ensemble of multiple decision trees. Each step in the decision tree construction uses a selection of the input features (\(\sqrt {total\;number\;of\;variables}\)) and per tree a subset of the training data. The splitting criteria is the Gini-index. We let the algorithm infer the optimal number of trees itself such that the misclassification rate on the out-of-bag samples is minimized.

For the neural network (NN) a fully connected feed forward network was used. We explored multiple architectures with a varying number of hidden layers and nodes per layer. Networks with 1–3 hidden layers are optimized with each layer having 10 up to 100 hidden nodes. Only the results of the best architecture optimized on the validation set are presented in the results section.

Cut-off values for all four computational techniques, for the additional performance metrics such as sensitivity and misclassification were derived from the Youden index, which is the sum of sensitivity and specificity minus one [29]. The total number of input features was used for all statistical techniques.

For NN variable importance (VI) is hard to establish because of its “black box” nature [30, 31]. We therefore computed VI only for RF, EN and LR. VI in the RF was calculated based on Random Branch Assignments Variable Importance (RBA). The RBA is evidently much less influenced by correlations [32]. For EN we used the absolute values of the coefficients rank the features in the order of variable importance and for LR we used Multivariate Coefficients Score. For this score we simply calculated the magnitude of the marginal effect of each (non-standardized) predictor, by fitting multivariate LR models and filtering out insignificant coefficients according to the P-value “stay threshold”. We report the top 10 features, and their number of patients and univariate odds ratio and discussed the outcome of VI with clinicians.

Output data were analysed using R Statistical Software version 3.4.2 (Vienna, Austria), Caret and GLMNET were used to conduct the EN analysis and SAS Enterprise Guide 7.1 for LR, RF and NN, see Table 2 for software and model information.

Table 2 Model information


Baseline characteristics

Our study population consists of 25,776 CHF patients (median age 74 years (Interquartile Range [IQR] 66–80 years) and 43.7% women) including 1446 patients with HF hospitalisation in 2015 and 3220 in 2015–2017 and all-cause mortality 2434 and 7882, respectively. Baseline characteristics of the overall study sample are described in Table 3.

Table 3 Baseline characteristics

Performance metrics and relevant features

AUC rates ranged from 0.710 to 0.732 for 1-year HF hospitalisation, 0.705–0.733 for 3-years HF hospitalisation, 0.765–0.787 for 1-year mortality and 0.764–0.791 for 3-years mortality. Elastic Net performed best for all endpoints (Table 4). Differences between techniques were small and only statistically significant between EN and LR compared with RF for 3-years HF hospitalisation, based on the confidence intervals.

Table 4 Confusion matrix

Sensitivity ranged from 61.4 to 71.9 for 1-year HF hospitalisation, 67.3–71.5 for 3-years HF hospitalisation, 76.5–78.7 for 1-year mortality and 64.4–79.2 for 3-years mortality. Specificity ranged from 66.3 to 73.6 for 1-year HF hospitalisation, 57.8–68.9 for 3-years HF hospitalisation, 62.6–80.1 for 1-year mortality and 63.1–78.1 for 3-years mortality. For sensitivity rates NN outperformed for 1-year HF hospitalisation and 1-year mortality, while EN outperformed for 3-years HF hospitalisation and LR for 3-years mortality. For specificity LR outperformed for 1-year HF hospitalisation, NN for 3-years HF hospitalisation, RF for 1-year mortality and EN for 3-years mortality.

Top-10 features of importance in our study are shown in Tables 5, 6, 7 and 8. For HF hospitalisation, previous HF hospitalisation for CHF or acute HF, comorbidities as COPD, diabetes or oncology and visit to the GP are the most common among the trained models. For mortality, age, sex and marital status are also often used in the models. Features from 2014 were more common compared to 2012–2013.

Table 5 Variable importance 1 year HF hospitalisation
Table 6 Variable importance 3 years HF hospitalisation
Table 7 Variable importance 1 year all-cause mortality
Table 8 Variable importance 3 years all-cause mortality


In this analysis, based on a HIC database of > 25,000 patients with CHF, we have shown that the use of traditional and novel techniques indeed have clear predictive value for predicting long-term HF hospitalisation and all-cause mortality for CHF patients, with AUC’s between 0.7 and 0.8 [33, 34].

For our main performance metric, the AUC, EN outperformed other statistical methods in predicting 1 and 3 years HF hospitalisation and 1 and 3 years mortality, although with only minor differences compared to traditional LR and only statistically significant between EN and LR compared with RF for 3-years HF hospitalisation.

Our results are comparable with earlier reported findings. Angraal et al. reported in a recent study for 3-years mortality and HF hospitalisation that RF was the best performing model with a mean AUC of 0.72 (95% confidence interval [CI] 0.69–0.75) for predicting 3-years mortality, and 0.76 (95% CI 0.71–0.81) for 3-years HF hospitalisation [35]. This study was based on a cohort with 1,767 patients in HF with preserved ejection fraction. Chicco et al. [36] analysed 9 months mortality in 299 patients with HF. RF outperformed all the other methods, by obtaining the top ROC AUC (0.800) The Artificial Neural Network perceptron, instead, obtained the top value on the Precision-Recall AUC (0.750). The AUC outcomes on our HIC database are in line with these 2 recent studies, both based on clinical data, although RF did not outperform in our study. Mortzavi et al. found more pronounced differences (10–25%) for 30-days HF hospitalisation and 180-days HF hospitalisation outcome between ML compared to traditional LR, but they only used the 5 most important features as identified previously (blood urea nitrogen, glomerular filtration rate, sex, waist-to-hip ratio, and history of ischemic cardiomyopathy) in LR and the remaining techniques were created using the full raw data of 472 inputs [37]. It should be noted that these studies had small datasets, which may limit the generalizability of their conclusions. A meta-analysis and meta-regression study of 117 prognostic models revealed only a moderate accuracy of models predicting mortality, whereas models designed to predict the combined endpoint of death or HF hospitalisation, or only HF hospitalisation, had an even poorer discriminative ability. The highest AUC-statistic values were achieved in a clinical setting, predicting short-term mortality with the use of models derived from prospective cohort/registry studies with many predictor variables. The mean AUC-statistic was 0.66 ± 0.0005 for the models reporting a standard error. Models using data from medical records had significantly better AUC-statistic values than models using claims data. Also, models using more predictor variables had better predictive values; AUC-statistic increased 0.0036 (SE = 0.0005) with each added predictor variable. There was no significant difference in AUC-statistic values between patients diagnosed with either CHF or acute HF [38].

Most currently existing prognostic models in patients with CHF are based on data from randomized controlled trials or extracted from administrative datasets, such as medical insurance claims [39]. To our knowledge, this was the first study that applied machine learning techniques to a (Dutch) HIC database for CHF outcome prediction. A great advantage of a HIC database in the Netherlands is that it covers the entire healthcare utilization since over 99% of the population has basic health insurance as mandated by law [40]. Most studies are based on data of 1 or a limited number of hospitals, but in a HIC database we have data of all the hospitals and patient visits. Moreover, a HIC database also includes General Practitioners (GP), pharmaceutical and other healthcare-related data. The relevance of covering a patient’s full healthcare usage was demonstrated by our feature importance analysis (Tables 5, 6, 7, 8) that shows that GP and pharmaceutical data are related to the endpoints in our study. Primary care plays a central role in many countries, such as the Netherlands and the UK in the diagnosis, long-term management, and end-of-life care for these patients. While there is specialist support available from nurses and cardiologists especially after admission and later on at a regular basis once or twice a year, GPs remains responsible for overseeing most patient care once a diagnosis is made including management to delay progression, recognition of HF decompensations, and patient follow-up in the vulnerable period following an HF hospitalisation [41, 42].

Most machine learning techniques for adverse outcomes in CHF focus on a short time period, mainly on 30-days HF hospitalisation [6]. Although the risk for HF hospitalisation declines over time, patients with a CHF hospitalisation have a significantly elevated risk of HF hospitalisation for at least 1 year [4]. From a public health and patient perspective long-term adverse outcomes are as important and a HIC database has adequate information to examine this long-term perspective.

The most important features in our study are well known and reported in earlier studies, for instance age, sex, comorbidities as diabetes or COPD, and living in buildings built between the years 1965 and 1974 as a proxy for socio-economic status. HF Hospitalisation in the baseline period was unsurprisingly an import predictor of HF hospitalisation in the future, as well as Acute HF. Visit of the GP was an import predictor, but we do not know the reason of this visit. Most likely it was related to comorbidities. Importantly, most data-driven machine learning techniques, including the ones we used, are correlational in nature and not causal, so caution while interpreting these results is advised [43].

As the current study was only utilizing HIC data, potentially important clinical features are not included in the current models. For instance, Parenica et al. [44] found higher age, LV dysfunction, comorbidities and high levels of natriuretic peptides as the most powerful predictors of worse prognosis in long-term survival. Reduced ejection fraction is a powerful predictor of long-term mortality, especially after the 6th year [45]. Ouwerkerk et al. [38] found 3 variables with a high predictive value: sodium; blood urea nitrogen; and systolic blood pressure. Clinical features such as ejection fraction and natriuretic peptides are not available in a HIC database, but age and comorbidities based on pharmacy-based cost groups or DRGs are.

We have given an overview of model performance of several machine learning algorithms and traditional statistics in predicting risk for HF hospitalisation and all-cause mortality in a representative sample of the Dutch population from a HIC database. Our findings are therefore generalisable to all CHF patients in the Netherlands. However, several limitations should also be acknowledged. First, models based on administrative claims data lack certain clinically relevant features, such as New York Health Association Class [46], left ventricular ejection faction, intoxications such as smoking, blood pressure and physical activity [47]. Enriching HIC data through clinical data coupling would be preferred. Strict general data protection regulation (GDPR) rules limit these possibilities. Using a trusted third party or novel techniques like Multi Party Computation could provide a good solution to overcome the legal burdens to clinical and HIC data coupling [16]. Second, due to GDPR regulations, patient data history is limited to 7 years in HIC databases, hence, relevant historical data may be missing. Third, the ability to explain and interpret ML models is limited, especially NN. Hence, it is difficult to embrace these models and apply them in a clinically relevant way. More research is needed to explore the causal relationship of features that could be of importance in medical practice. In general, we found poor or moderate overlap between methods in their assessment of feature importance for the top10 features, even when their performance is comparable and relatively good. Most overlap was between EN and LR and least with RF, because RF was heavily nonlinear. The ability to explain and interpret RF is most elaborate, because RF has an integrated procedure of producing variable importance’s [48]. However, for LR we used the multivariate method. This is the simplest feature importance measure tested, and unsurprisingly has strong assumptions, namely that a predictor’s importance is independent of all other factors. It is also important to note that significant predictors in LR may not make useful predictions [49]. Finally, we reduced the input claims features using prevalence prioritization and Lasso Regression, due to performance reasons. By doing so we perchance excluded features which could have been relevant.

In this study based on a Health Insurance Claims database we have shown clear predictive value for predicting long-term HF hospitalisation and mortality of CHF patients. Novel machine learning techniques like RF and NN can obviate more redundant HF hospitalisation or mortality, because they allow for non-linear relations or in the case of EN can reduce irrelevant features. In the long run, we hope that applying state-of-the-art machine learning on clinical data combined with HIC data can improve risk stratification and prognosis by offering high-risk patients timely intervention through for example cardiac rehabilitation and by optimizing medical therapy and stimulating medical adherence.

Availability of data and materials

The data generated and analysed during the current study are not publicly available due to the privacy of individuals that participated in the study, but are available from the corresponding author on reasonable request.



Anatomical therapeutical chemical classification


The area under the receiver operating characteristic curve


Chronic heart failure


Diagnose behandeling combinatie or diagnosis-treatment combination


Defined daily dose


Diagnosis related groups


Elastic net regression


European society of cardiology


General data protection regulation


General practitioner


Heart failure


Health insurance claims databases


International classification of diseases


Interquartile range


Logistic regression


Machine learning


Medication possession ratio


Neural networks


Prescribed daily doses


Random branch assignments variable importance


Random forest


The area under the receiver operating characteristic


Variable importance


  1. 1.

    BigMedilytics, pilot heart failure. 2020.

  2. 2.

    Tavazzi L, Maggioni AP, Borer JS. Should we revise our approach to ‘optimal medical therapy’? The case of chronic heart failure. Eur Heart. 2013;34(36):2792–4.

    Article  Google Scholar 

  3. 3.

    Mamas MA, Sperrin M, Watson MC, Coutts A, Wilde K, Burton C, Kadam UT, Kwok CS, Clark AB, Murchie P, et al. Do patients have worse outcomes in heart failure than in cancer? A primary care-based cohort study with 10-year follow-up in Scotland. Eur J Heart Fail. 2017;19(9):1095–104.

    Article  PubMed  Google Scholar 

  4. 4.

    Ziaeian B, Fonarow GC. The prevention of hospital readmissions in heart failure. Prog Cardiovasc Dis. 2016;58(4):379–85.

    Article  PubMed  Google Scholar 

  5. 5.

    Ambrosy AP, Fonarow GC, Butler J, Chioncel O, Greene SJ, Vaduganathan M, Nodari S, Lam CSP, Sato N, Shah AN, et al. The global health and economic burden of hospitalisations for heart failure: lessons learned from hospitalized heart failure registries. J Am Coll Cardiol. 2014;63(12):1123–33.

    Article  PubMed  Google Scholar 

  6. 6.

    Dharmarajan K, Hsieh AF, Kulkarni VT, Lin Z, Ross JS, Horwitz LI, Kim N, Suter LG, Lin H, Normand ST, et al. Trajectories of risk after hospitalisation for heart failure, acute myocardial infarction, or pneumonia: retrospective cohort study. BMJ. 2015;350:h411.

    Article  PubMed  PubMed Central  Google Scholar 

  7. 7.

    He D, Mathews SC, Kalloo AN, Hutfless S. Mining high-dimensional administrative claims data to predict early hospital hospitalisations. J Am Med Inform Assoc. 2014;21(2):272–9.

    Article  PubMed  Google Scholar 

  8. 8.

    Yang T, Yang Y, Jia Y, Li X. Dynamic prediction of hospital admission with medical claim data. BMC Med Inform Decis Mak. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Banu NKS, Swamy S. Prediction of heart disease at early stage using data mining and big data analytics: a survey. In: International conference on electrical, electronics, communication, computer and optimization techniques (ICEECCOT), Mysuru; 2016. pp. 256–261.

  10. 10.

    Zolfaghar K, Meadem N, Teredesai A, Roy SB, Chin S, Muckian B. Big data solutions for predicting risk-of-readmission for congestive heart failure patients. In: IEEE international conference on big data, Silicon Valley; 2013. pp. 64–71.

  11. 11.

    Maragatham G, Devi S. LSTM model for prediction of heart failure in big data. J Med Syst. 2019;43:111.

    CAS  Article  PubMed  Google Scholar 

  12. 12.

    Meng F, Zhang Z, Hou X, Qian Z. Machine learning for prediction of sudden cardiac death in heart failure patients with low left ventricular ejection fraction: study protocol for a retroprospective multicentre registry in China. BMJ Open. 2019;9:e023724.

    Article  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Smeets H, de Wit N, Hoes A. Routine health insurance data for scientific research: potential and limitations of the Agis Health Database. J Clin Epidemiol. 2010;64:424–30.

    Article  Google Scholar 

  14. 14.

    Eijkenaar F, van Vliet RCJA. Improving risk equalization using information on physiotherapy diagnoses. Eur J Health Econ. 2018;19:203–11.

    Article  PubMed  Google Scholar 

  15. 15.

    B. Goldman. King of the mountain: digging data for a healthier world, Stanford Medicine Summer. 2012.

  16. 16.

    Veeningen M, Chatterjea S, Horvath AZ, Spindler G, Boersma E, van der Spek P, van der Galien O, Gutteling J, Kraaij W, Veugen T. Enabling analytics on sensitive medical data with secure multi-party computation. European Federation for Medical Informatics and IOS Press. 2018. PMID: 29677926.

  17. 17.

    Gürgöze MT, van der Galiën OP, Limpens MAM, Roest S, Hoekstra RC, Ijpma AS, Brugts JJ, Manintveld OC, Boersma E. Impact of sex differences in co-morbidities and medication adherence on outcome in 25,776 heart failure patients. ESC Heart Fail. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Prevention CfDCa. International classification of diseases, Tenth Revision. Clinical Modification (ICD-10-CM). 2016.

  19. 19.

    WHO Collaborating Centre for Drug Statistics Methodology. Guidelines for anatomical therapeutic chemical (ATC) classification index and defined daily doses (DDDs) assignment 2019. Oslo, Norway. 2018.

  20. 20.

    Ponikowski P, Voors AA, Anker SD, Bueno H, Cleland JGF, Coats AJS, Falk V, Gonzalez-Juanatey JR, Harjola VP, Jankowska EA, et al. Group ESCSD. ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: the Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) Developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur Heart J. 2016;37(27):2129–200y.

  21. 21.

    WHO Collaborating Centre for Drug Statistics Methodology. 2020.

  22. 22.

    Krueger K, Griese-Mammen N, Schubert I, Kieble M, Botermann L, Laufs U, Kloft C, Schulz M. In search of a standard when analyzing medication adherence in patients with heart failure using claims data: a systematic review. Heart Fail Rev. 2018;23:63–71.

    Article  PubMed  Google Scholar 

  23. 23.

    Fuller RH, Perel P, Navarro-Ruan T, Nieuwlaat R, Haynes RB, Huffman MD. Improving medication adherence in patients with cardiovascular disease: a systematic review. Heart. 2018;104:1238–43.

    Article  PubMed  Google Scholar 

  24. 24.

    Ooi CH, Chetty M, Teng SW. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data. BMC Bioinform. 2006;7:320.

    CAS  Article  Google Scholar 

  25. 25.

    Fonti V, Belitser E. Paper in business analytics feature selection using LASSO. 2017.

  26. 26.

    Cohen R. Applications of the GLMSELECT Procedure for Megamodel selection, proceedings of the SAS global forum. 2009; Conference, Paper 259–2009.

  27. 27.

    Awan SE, Bennamoun M, Sohel F, Sanfilippo FM, Dwivedi G. Machine learning-based prediction of heart failure readmission or death: implications of choosing the right model and the right metrics. ESC Heart Fail. 2019;6:428–35.

    Article  PubMed  PubMed Central  Google Scholar 

  28. 28.

    Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. 2019.

    Article  Google Scholar 

  29. 29.

    Fluss R, Faraggi D, Reiser B. Estimation of the Youden index and its associated cutoff point. Biometrical J. 2005.

    Article  Google Scholar 

  30. 30.

    Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1:206–15.

    Article  Google Scholar 

  31. 31.

    Zhang Z, Beck MW, Winkler DA, Huang B, Sibanda W, Goyal H. Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Ann Transl Med. 2018;6(11):216.

    Article  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Neville PG, Tan PY. A Forest measure of variable importance resistant to correlations. In: Proceedings of the 2014 joint statistical meetings. 2014; Alexandria, VA: American Statistical Association.

  33. 33.

    Hosmer DW, Lemeshow S. Applied logistic regression; 2013. p. 177.

  34. 34.

    Safari S, Baratloo A, Elfil M, Negida A. Evidence based emergency medicine; part 5 receiver operating curve and area under the curve. Emerg (Tehran). 2016;4(2):111–113.

  35. 35.

    Angraal S, Mortazavi BJ, Gupta A, Khera R, Ahmad T, Desai NR, Jacoby DL, Masoudi FA, Spertus JA, Krumholz HM. Machine learning prediction of mortality and hospitalisation in heart failure with preserved ejection fraction. JACC Heart Fail. 2020;8(1):12–21.

    Article  PubMed  Google Scholar 

  36. 36.

    Chicco D, Jurman G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med Inform Decis Mak. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Mortazavi BJ, Downing NS, Bucholz EM, Dharmarajan K, Manhapra A, Li S, Negahban SN, Krumholz HM. Analysis of machine learning techniques for heart failure readmissions. Circ Cardiovasc Qual Outcomes. 2016;9(6):629–40.

    Article  PubMed  PubMed Central  Google Scholar 

  38. 38.

    Ouwerkerk W, Voors AA, Zwinderman AH. Factors influencing the predictive power of models for predicting mortality and/or heart-failure hospitalisation in patients with heart failure. JACC Heart Fail. 2014;2:429–36.

    Article  PubMed  Google Scholar 

  39. 39.

    Voors AA, Ouwerkerk W, Zannad F, van Veldhuisen DJ, Samani NJ, Ponikowski P, Ng LL, Metra M, ter Maaten JM, Lang CC, et al. Development and validation of multivariable models to predict mortality and hospitalisation in patients with heart failure. Eur J Heart Fail. 2017;19:627–34.

    CAS  Article  PubMed  Google Scholar 

  40. 40.

    Nederlandse Zorgautoriteit. Kerncijfers zorgverzekeraars.‐zorgverzekeraars. Accessed 24 Sept 2021.

  41. 41.

    Jones NR, Hobbs FR, Taylor CJ. Prognosis following a diagnosis of heart failure and the role of primary care: a review of the literature. BJGP Open. 2017;1(3):bjgpopen40101013.

    Article  Google Scholar 

  42. 42.

    Rutten FH, Gallagher J. What the general practitioner needs to know about their chronic heart failure patient. Card Fail Rev. 2016;2(2):79–84.

    Article  PubMed  PubMed Central  Google Scholar 

  43. 43.

    Alber M, Buganza Tepole A, Cannon WR, De S, Dura-Bernal S, Garikipati K, Karniadakis G, Lytton WW, Perdikaris P, Petzold L, Kuhl E. Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. npj Digit Med. 2019;2:115.

    Article  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Parenica J, Spinar J, Vitovec J, Widimsky P, Linhart A, Fedorco M, Vaclavik J, Miklik R, Felsoci M, Horakova K, et al. Long-term survival following acute heart failure: the acute heart failure database main registry (AHEAD Main). Eur J Intern Med. 2013;24(2):151–60.

    Article  PubMed  Google Scholar 

  45. 45.

    Mouriño López VM, Cid Conde L, Alves Pérez MT, López CJ. Long-term survival of a cohort of patients with heart failure: perspective from the real world. Eur Geriatr Med. 2017;8(4):304–9.

    Article  Google Scholar 

  46. 46.

    Russell SD, Saval MA, Robbins JL, Ellestad H, Gottlieb SS, Handberg EM, Zhou Y, Chandler B. New York Heart Association functional class predicts exercise parameters in the current era. Am Heart J. 2009;158(4 Suppl):S24–30.

    Article  PubMed  PubMed Central  Google Scholar 

  47. 47.

    Rahimi K, Bennett D, Conrad N, Williams TM, Basu J, Dwight J, Woodward M, Patel A, McMurray J, MacMahon S. Risk prediction in patients with heart failure: a systematic review and analysis. JACC Heart Fail. 2014;2(5):440–6.

    Article  PubMed  Google Scholar 

  48. 48.

    Grömping U. Variable importance assessment in regression: linear regression versus random forest. Am Stat. 2009;63(4):308–19.

    Article  Google Scholar 

  49. 49.

    Cava W, Bauer C, Moore JH, Pendergrass SA. Interpretation of machine learning predictions for patient outcomes in electronic health records. In: AMIA annu symp proc; 2020, pp. 572–581.

  50. 50.

    Zilveren Kruis, Policy conditions. 2021."

  51. 51.

    AVG, privacy regulation. 2021."

Download references


The authors would like to thank all the members of the BigMedilytics consortium. All authors have read and approved the final version of the manuscript.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme [780495]. Any dissemination of results here presented reflects only the author’s view. The Commission is not responsible for any use that may be made of the information it contains.

Author information




OPG, RCH, MTG, OCM, EB and CJV conceived the presented idea. OPG, RCH, MRB and CJV developed the theory and performed the analysis. OPG, MTG and EB led the drafting of the manuscript. EB, MTG and CJV supervised the findings of this work. OPG, RCH and MRB contributed to data collection. All authors have read, critically revised and approved the final manuscript.

Corresponding author

Correspondence to Eric Boersma.

Ethics declarations

Ethics approval and consent to participate

Given the use of anonymized data, medical ethical approval was not required for this study according to the applicable rules and regulations. Patients whose data are included in the claims database used in this study gave consent through the policy conditions [50] for their data to be used for research purposes and in accordance with Dutch National regulations [51] (article 9 paragraph 2 sub 2 AVG).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Overview of the number of features in the final set, definitions used and additional performance metrics.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

van der Galiën, O.P., Hoekstra, R.C., Gürgöze, M.T. et al. Prediction of long-term hospitalisation and all-cause mortality in patients with chronic heart failure on Dutch claims data: a machine learning approach. BMC Med Inform Decis Mak 21, 303 (2021).

Download citation


  • Heart failure
  • Health insurance claims
  • Prognosis
  • Outcomes
  • Machine learning