Prediction of acute kidney injury risk after cardiac surgery: using a hybrid machine learning algorithm

Background Acute kidney injury (AKI) is a serious complication after cardiac surgery. We derived and internally validated a preoperative machine learning (ML) model to predict cardiac surgery-associated AKI of any severity and compared its performance with parametric statistical models. Methods We conducted a retrospective study of adult patients who underwent major cardiac surgery requiring cardiopulmonary bypass between November 1st, 2009 and March 31st, 2015. AKI was defined according to the KDIGO criteria as stage 1 or greater, within 7 days of surgery. We randomly split the cohort into derivation and validation datasets. We developed three AKI risk models: (1) a hybrid ML algorithm, using Random Forests for variable selection, followed by high-performance logistic regression; (2) a traditional logistic regression model; and (3) an enhanced logistic regression model with 500 bootstraps, with backward variable selection. For each model, we assigned risk scores to each retained covariate and assessed model discrimination (C statistic) and calibration (Hosmer–Lemeshow goodness-of-fit test) in the validation datasets. Results Of 6522 included patients, 1760 (27.0%) developed AKI. The hybrid ML algorithm achieved the best performance in predicting AKI of any severity. The ML and enhanced statistical models remained robust after internal validation (C statistic = 0.75, Hosmer–Lemeshow p = 0.804; and C statistic = 0.74, Hosmer–Lemeshow p = 0.347, respectively). Conclusions We demonstrated that a hybrid ML model provides higher accuracy without sacrificing parsimony, computational efficiency, or interpretability, when compared with parametric statistical models. This score-based model can easily be used at the bedside to identify high-risk patients who may benefit from intensive perioperative monitoring and personalized management strategies.
Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-01859-w.


Background
Acute kidney injury (AKI) is a serious complication after cardiac surgery, with an incidence of 5-30% depending upon procedure type and the definitions used [1][2][3][4][5]. It is associated with increased mortality, longer hospital stays, and higher healthcare costs [6,7]. As the incidence of AKI is higher after cardiac surgery than in medical and noncardiac surgical populations [8], much research has been dedicated to the identification of modifiable risk factors and/or the derivation of AKI risk prediction models in this group [9][10][11][12].

*Correspondence: lsun@ottawaheart.ca
Recent research demonstrates that there is no standard approach to AKI prediction for patients undergoing cardiac surgery. Existing predictive models are based on different combinations of risk factors and rely heavily on intra- and post-operative events to achieve predictive accuracy [12,13], whereas preoperative risk stratification is most important and remains challenging. In addition, most existing predictive models were developed to identify patients at risk of severe AKI requiring renal replacement therapy [5,12], despite mild AKI being associated with up to a threefold increase in the risk of short- and long-term mortality after cardiac surgery [3,14].
Renal function has long been held as a surrogate for systemic perfusion, and accurate preoperative prediction can help to identify patients who may benefit most from intensive monitoring and personalized management strategies throughout the perioperative period. With the advent of artificial intelligence (AI) in medicine, machine learning (ML) methods such as Random Forests have successfully been applied to create accurate and reliable predictive models in several fields of study [15,16]. Moreover, hybrid ML algorithms offer improved performance [17], interpretability and ease of use, making the AI "explainable" to clinicians.
We performed a case study to: (1) derive and internally validate a preoperative model to predict AKI of any severity after cardiac surgery, using a hybrid ML approach consisting of Random Forests followed by high-performance logistic regression, and (2) compare the performance of this ML model with traditional and enhanced regression models. We hypothesized that the ML model would outperform traditional models in terms of both performance and parsimony.

Design and selection criteria
The study protocol was approved by the University of Ottawa Heart Institute Research Ethics Board, which waived the requirement for individual patient consent. We conducted a retrospective study of adult patients (age ≥ 18 years) who underwent major cardiac surgery requiring cardiopulmonary bypass between November 1st, 2009 and March 31st, 2015 at the University of Ottawa Heart Institute. Patients who underwent off-pump or thoracic aortic procedures, cardiac transplantation, or insertion of ventricular assist devices, as well as those who were dialysis-dependent at baseline, were excluded from the study.

Data sources
We performed a retrospective analysis of prospectively collected data from Cardiocore. Cardiocore is a multimodular data reservoir that captures detailed demographics, comorbidities, physiologic and procedural details, and perioperative outcomes for all patients who undergo cardiac procedures at the University of Ottawa Heart Institute, a university-affiliated tertiary cardiac care referral center that performs the full scope of cardiac procedures. It is formally managed by a multidisciplinary committee and undergoes regularly scheduled quality assurance audits [18].

Study outcome
Postoperative AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria as a serum creatinine increase ≥ 26 μmol/l within 48 h following surgery or an increase of ≥ 50% from baseline within 7 postoperative days [19].
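As an illustration only, the two creatinine criteria quoted above can be written as a small check. This is a minimal sketch and not the study's code: the function name `has_aki`, its arguments, and the example values are hypothetical, and it assumes that creatinine measurements have already been grouped into the 48-hour and 7-day windows.

```python
def has_aki(baseline_umol_l, creat_48h_umol_l, creat_7d_umol_l):
    """Return True if either creatinine criterion described in the text is met.

    baseline_umol_l:   preoperative serum creatinine (umol/L)
    creat_48h_umol_l:  values measured within 48 h after surgery (umol/L)
    creat_7d_umol_l:   values measured within 7 postoperative days (umol/L)
    """
    # Criterion 1: absolute increase of at least 26 umol/L within 48 h.
    absolute_rise = any(c - baseline_umol_l >= 26 for c in creat_48h_umol_l)
    # Criterion 2: increase of at least 50% from baseline within 7 days.
    relative_rise = any(c >= 1.5 * baseline_umol_l for c in creat_7d_umol_l)
    return absolute_rise or relative_rise


# Hypothetical patients, all with a baseline creatinine of 80 umol/L.
early_rise = has_aki(80, [100, 107], [100, 107, 95])   # +27 umol/L at 48 h
late_rise = has_aki(80, [90], [90, 121])               # more than 1.5 x baseline by day 7
no_aki = has_aki(80, [90], [90, 110])                  # neither criterion met
```

Urine output is deliberately not modeled here, since only the creatinine-based criteria are described in the text.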
We created three AKI risk prediction models in the derivation samples: (1) a hybrid ML algorithm, consisting of Random Forests, followed by high-performance logistic regression, (2) a traditional statistical model that employed backward variable selection, and (3) an enhanced statistical model that used 500 bootstrap samples for backward variable selection [30]. A data analysis and statistical plan was written and filed with a private entity (institutional review board) before data were accessed.

Derivation using a hybrid ML algorithm
Details of the Random Forests method have been described elsewhere [31][32][33]. In short, we used a bootstrap sample of the data to build each of the classification trees. A random subset of variables was selected at each split, thereby constructing a large collection of decision trees with controlled variation. The Random Forests trees are not pruned, so as to obtain low-bias trees (Additional file 2: Figure S1). Every tree in the forest casts a "vote" for the best classification for a given observation, and the class receiving the most votes becomes the prediction for that observation. The derivation dataset was first sampled to create an in-bag partition (2/3 of the derivation sample) used to construct each decision tree, and a smaller out-of-bag partition (1/3 of the derivation sample) used to test the constructed tree and evaluate its performance (Additional file 3: Figure S2). Then, we performed tenfold cross-validation to evaluate the model. The optimal number of trees and the number of variables available at each node were selected using the "tuneRF" function of the randomForest package in R (version 3.2.3) to minimize the misclassification error. Random Forests calculates estimates of variable importance for classification using the permutation variable importance measure (VIM) [31], which is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly. In our cohort, the lowest misclassification rate was achieved by using 700 classification trees and 10 variables available for splitting at each tree node.
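The in-bag and out-of-bag partitions and the voting step described above can be illustrated with a short standard-library sketch. It is not the study's R code: the helper names `bootstrap_partition` and `majority_vote`, the sample size of 3000, and the seed are assumptions chosen for illustration. When rows are drawn with replacement, roughly 63% of distinct rows end up in-bag and roughly 37% out-of-bag, which corresponds to the approximate 2/3 and 1/3 split mentioned in the text.

```python
import random
from collections import Counter


def bootstrap_partition(n_rows, rng):
    """Draw n_rows indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


def majority_vote(tree_votes):
    """Return the class that received the most votes across the trees."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
n_rows = 3000                     # hypothetical derivation sample size
in_bag, out_of_bag = bootstrap_partition(n_rows, rng)
oob_fraction = len(out_of_bag) / n_rows   # expected to be close to 1/e

# Three hypothetical trees voting on one patient.
prediction = majority_vote(["AKI", "no AKI", "AKI"])
```

In a full forest, each tree would be grown on its own in-bag rows and scored on its own out-of-bag rows, so the out-of-bag error acts as a built-in validation estimate.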
In this analysis, we converted all categorical variables into a set of binary variables indicating the absence or presence of a given categorical effect, to reduce the computational complexity of tree creation and to mitigate the inherent bias of Random Forests that favors categorical variables with multiple degrees of freedom [34]. We identified a subset of the top 30 predictor variables out of the 43 candidate variables and incorporated them into a high-performance logistic model (SAS 9.4, SAS Institute, USA) to identify the best parsimonious model [35]. We used the Schwarz Bayesian Criterion (SBC) as a penalized measure of fit for the logistic regression model to avoid over-fitting [36]. A model with a smaller SBC value is preferred over a model with a larger SBC value.
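To make the two ideas in this paragraph concrete, the sketch below shows binary (indicator) coding of a categorical variable and the usual form of the Schwarz Bayesian Criterion, SBC = -2 log L + k log n, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations. The function names, the category labels, and the log-likelihood values are hypothetical and are not taken from the study.

```python
import math


def one_hot(value, levels):
    """Code one categorical value as a set of 0/1 indicator variables."""
    return {f"is_{level}": int(value == level) for level in levels}


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical surgery-type variable with three levels.
coded = one_hot("valve", ["CABG", "valve", "combined"])

# Two hypothetical models fitted to the same 4000 patients: the larger model
# fits slightly better but pays a heavier penalty for its extra parameters.
sbc_small = sbc(log_likelihood=-1500.0, n_params=11, n_obs=4000)
sbc_large = sbc(log_likelihood=-1495.0, n_params=30, n_obs=4000)
preferred = "small" if sbc_small < sbc_large else "large"
```

With these made-up numbers the 11-parameter model has the smaller SBC, which illustrates how the penalty term favors parsimony.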

Derivation using traditional and enhanced statistical approaches
The traditional model was a logistic regression (a generalized linear model) fitted with an automated backward variable selection algorithm. To prevent overfitting, a covariate had to be associated with postoperative AKI at a significance level ≤ 0.001 to remain in the model [37].
The enhanced statistical approach employed backward variable selection for logistic regression models within 500 random bootstrap samples drawn with replacement from the original cohort [30], using a significance level ≤ 0.001 for backward stepwise selection to prevent overfitting [37]. We selected variables that were significant in predicting AKI in 50% or more of the bootstrap samples. We then averaged the regression coefficients for each variable across the 500 bootstrap samples.
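The selection-frequency rule and coefficient averaging described above can be sketched as follows. This is an illustrative outline rather than the SAS procedure used in the study: the function `summarize_bootstrap` and the toy coefficients are hypothetical, and the sketch assumes that each variable's coefficient is averaged over the bootstrap samples in which it was retained, a detail the text does not specify.

```python
def summarize_bootstrap(selected_models, min_fraction=0.5):
    """Keep variables retained in at least min_fraction of bootstrap samples.

    selected_models: one dict per bootstrap sample, mapping each variable that
    survived backward selection in that sample to its fitted coefficient.
    Returns {variable: mean coefficient over the samples that retained it}.
    """
    n_samples = len(selected_models)
    counts, sums = {}, {}
    for model in selected_models:
        for variable, beta in model.items():
            counts[variable] = counts.get(variable, 0) + 1
            sums[variable] = sums.get(variable, 0.0) + beta
    return {
        variable: sums[variable] / counts[variable]
        for variable in counts
        if counts[variable] / n_samples >= min_fraction
    }


# Four hypothetical bootstrap fits (the study used 500).
fits = [
    {"age": 0.2, "bmi": 0.1},
    {"age": 0.4},
    {"age": 0.3, "smoker": 0.5},
    {"bmi": 0.3},
]
final_model = summarize_bootstrap(fits)
```

Here "age" (retained in 3 of 4 samples) and "bmi" (2 of 4) are kept, while "smoker" (1 of 4) falls below the 50% threshold and is dropped.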

Point score assignment and internal validation
For each of the three models, we assigned integer scores to retained covariates using the method described by Sullivan et al. [38] (Additional file 4). We then assessed the discrimination (C statistic, or AUC) and calibration (Hosmer-Lemeshow (H-L) goodness-of-fit test and a decile-decile calibration plot of the observed and predicted outcomes) of each model using the validation datasets.
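For readers unfamiliar with the C statistic, the sketch below computes it directly from its definition: the proportion of (event, non-event) patient pairs in which the patient with the event has the higher score, with ties counted as one half. It is a minimal illustration in plain Python, not the SAS routine used in the study; the function name and the example scores are hypothetical.

```python
def c_statistic(y_true, y_score):
    """Concordance (C) statistic, equal to the AUC for a binary outcome."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_score in events:
        for non_event_score in non_events:
            if event_score > non_event_score:
                concordant += 1.0      # pair ranked correctly
            elif event_score == non_event_score:
                concordant += 0.5      # tied pair counts as one half
    return concordant / (len(events) * len(non_events))


# Hypothetical integer risk scores for four patients (1 = developed AKI).
outcomes = [1, 0, 1, 0]
scores = [8, 3, 5, 5]
c_value = c_statistic(outcomes, scores)
```

The pairwise loop is quadratic in the number of patients, which is adequate for illustration; rank-based formulas give the same value more efficiently on large cohorts.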
The Random Forests analyses were performed in R statistical software (version 3.2.3) using the "randomForest" package [32]. All methods were performed in accordance with the international guidelines for developing and reporting predictive models in biomedical research. The traditional and enhanced statistical models, as well as point score assignment and internal validation, were performed using SAS 9.4 (SAS Institute, USA).

Results
Of 6522 patients who met the selection criteria, 1760 (27.0%) developed AKI within 7 days of surgery. The baseline characteristics of patients with and without postoperative AKI are reported in Additional file 5: Table S2. These baseline characteristics were similarly distributed across the derivation and validation datasets (Additional file 6: Table S3). Compared to those without AKI, patients who developed AKI were more likely to have undergone complex, emergent surgery, to have higher overall preoperative risk (CARE score ≥ 3), and to have a history of atrial fibrillation, cerebrovascular disease, anemia, and endocarditis. The crude and adjusted odds ratios representing the relationship between candidate risk factors and AKI are presented in Additional file 7: Table S4.

Hybrid ML algorithm
The accuracy of the Random Forests model was 92.8% in the derivation sample, and 75.5% after tenfold cross-validation. The resulting top 30 predictor variables are summarized in Fig. 1.
The model performance in the derivation sample is presented in Table 2.
The mean of the total risk score was 10.16 (SD = 5.54) across retained covariates. The total risk score was strongly associated with postoperative AKI (OR = 1.20, 95% CI 1.18-1.22) in univariate logistic regression. The predicted probability threshold with the optimal operating characteristics (e.g., the square of the distance between the point (0, 1) in the upper left-hand corner of ROC space and any point on the ROC curve) [ Figure S3).
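The distance criterion mentioned above can be sketched in a few lines: for each candidate threshold, compute the false-positive rate (FPR) and true-positive rate (TPR), and keep the threshold whose ROC point minimizes FPR² + (1 - TPR)², the squared distance to the corner (0, 1). The function name and the toy predictions below are hypothetical and are not study data.

```python
def closest_to_top_left(y_true, y_prob):
    """Return (threshold, squared distance) for the ROC point nearest (0, 1).

    A patient is classified as positive when the predicted probability is
    greater than or equal to the threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for threshold in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
        tpr = tp / n_pos
        fpr = fp / n_neg
        squared_distance = fpr ** 2 + (1.0 - tpr) ** 2
        if best is None or squared_distance < best[1]:
            best = (threshold, squared_distance)
    return best


# Hypothetical predicted probabilities for eight patients (1 = developed AKI).
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]
probabilities = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.9]
threshold, squared_distance = closest_to_top_left(outcomes, probabilities)
```

In this toy example the selected threshold is 0.6, where no patient without AKI is flagged and two of the three patients with AKI are detected.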

Enhanced statistical model using bootstrapping methods
The final enhanced model consisted of 10 predictor variables, including: CARE score, hypertension, atrial fibrillation, HF, smoking status, BMI, surgery type, redo sternotomy, and preoperative intra-aortic balloon pump use (Table 4).

Discussion
To our knowledge, this study is the first to use a hybrid ML approach to derive and validate a model to predict cardiac surgery-associated AKI of any severity, using only preoperative variables. Our findings suggest that a hybrid ML algorithm predicts better, and is computationally more efficient, than traditional and enhanced techniques for risk modeling. Previous research has shown that the use of automated variable selection methods could result in the selection of non-reproducible sets of independent variables, thus biasing the estimated regression coefficients [40]. Because of this, the use of backward variable selection in repeated bootstrap samples would likely result in improved estimation of regression coefficients with narrower confidence intervals [30]. Our hybrid ML approach benefits from its ability to accommodate inter-correlation between multiple explanatory variables and to protect against over-fitting the data [15], and thus outperforms both traditional and enhanced regression models.
Several cardiac surgery-associated AKI risk models have been proposed to date, with the models predicting renal replacement therapy being most robust [9][10][11]. Despite the clinical importance of renal replacement therapy, its low incidence rate (2-3%), late occurrence [41], and end-stage physiology limit the practical benefit of these risk models. In contrast, mild AKI is very common (pooled incidence rate of 22.3%) [42] and contributes to considerable perioperative and long-term morbidity and mortality [14]. The kidneys are sensitive to unfavorable physiologic processes in the setting of cardiac surgery, which include hypotension, low cardiac output syndrome, systemic inflammation resulting from the mechanical trauma of extracorporeal red blood cells in contact with artificial surfaces [43,44], as well as the catecholamine surge, decreased vasomotor reactivity and the mismatch of medullary blood flow and renal oxygen consumption that occur during the post-bypass period. Taken together, accurate preoperative prediction of AKI of any severity, prior to exposure to intra- and post-operative stresses, affords clinicians the greatest window of opportunity to proactively intensify physiologic monitoring and to personalize fluid management and hemodynamic goals in order to optimize systemic and renal perfusion in at-risk patients [18].
Table 2 Performance of the risk models in the derivation dataset
We used the KDIGO criteria to define AKI [19], which enables standardization of reporting and compatibility with similar studies. Our high-quality, comprehensive clinical databases provided a large number of standardized candidate variables for ML and statistical modeling. Our ML risk model contains 11 variables that are etiologically associated with AKI after cardiac surgery [12]. We found that our ML model was more accurate than the traditional and enhanced statistical models (AUC = 0.75 vs. 0.70 and 0.73, respectively).
In addition, the ML and enhanced statistical models were well calibrated, while the traditional statistical model was not. From a practical perspective, the ML model was more computationally efficient than the enhanced backward selection algorithm using 500 bootstrap samples. Our findings are consistent with the literature, where recent medical applications of ML have shown a high degree of accuracy in predicting various outcomes across a spectrum of clinical settings and diseases [45,46].
Few published studies to date have predicted cardiac surgery-associated AKI of any severity. Our ML risk model had a higher predictive ability and was more parsimonious (AUC = 0.75, H-L p = 0.804) than a recent preoperative model for cardiac surgery-associated AKI of any severity (AUC = 0.73, H-L p = 0.490) [20], which was derived using a traditional statistical approach and consisted of 15 risk factors. That model was developed using prospectively collected data from over 30,000 subjects undergoing cardiac surgery at three hospitals in the UK and was externally validated. Our ML model also had similar predictive accuracy and better calibration compared to another contemporary preoperative risk score for any-stage AKI [22], consisting of 10 risk factors (AUC = 0.77, H-L p = 0.06), that was derived using bootstrapping methods and was validated internally. It should be noted that in the latter model, AKI was defined as that occurring within 30 days of cardiac surgery. This definition likely captures events occurring during surgical readmissions or during complicated and prolonged postoperative stays. These events may be unrelated to the index surgery, and the definition may thus be impractical for informing preventative therapy in the intraoperative setting.
Two other published risk models for predicting AKI of any severity after cardiac surgery combined various pre-, intra- and postoperative factors [13,47]. These studies demonstrate that the addition of perioperative factors could improve model performance (AUC = 0.84, and AUC = 0.81, respectively). Further research could investigate the additive predictive value of key perioperative variables such as hypotension and low cardiac output, to produce "staged models". Such models would inform preoperative AKI risk stratification for the planning and personalization of pre- and intraoperative management, and would enhance prognostication based on intra- and post-operative events.
Clinical prediction models and associated risk-scoring systems are popular statistical methods as they permit a rapid assessment of patient risk without the use of computers or other electronic devices [48]. The additive point score assigned to each predictor in the developed models to predict AKI of any severity was derived from well-fit logistic regression models, and can readily be applied at the bedside. These validated scores to predict AKI of any severity following cardiac surgery will aid in clinical decision-making, patient counseling, resource utilization, and preoperative medical optimization [12]. Future research is recommended to prospectively assess the efficacy of these models in enhancing personalized fluid and hemodynamic management and in minimizing exposure to nephrotoxins, with the aim of preventing perioperative AKI.
Our findings should be interpreted in light of several limitations. First, our study was conducted in the setting of a single tertiary care hospital. Therefore, our ML model needs to be externally validated before it can confidently be used at other institutions and in other geographic regions. Second, a relatively small number of covariates was included in this study. The performance of the Random Forests approach may be improved in the presence of a larger set of covariates [49]. Third, our risk model is tailored to patients undergoing procedures involving cardiopulmonary bypass and may not be applicable in the setting of off-pump CABG [50]. Fourth, we did not incorporate urine output criteria in identifying patients with AKI, because this information was not available in our databases. Finally, unmeasured confounding characteristics are an important consideration in any retrospective analysis.

Conclusions
In summary, we derived and internally validated an accurate and well-calibrated preoperative risk model for cardiac surgery-associated AKI of any severity. We found that risk modeling using a hybrid ML approach led to better model performance than parametric statistical approaches, without sacrificing computational efficiency. Further studies are needed to externally validate this model, as well as to derive and validate staged models to better inform management and prognostication.