
Prediction of postoperative infectious complications in elderly patients with colorectal cancer: a study based on improved machine learning



Infectious complications after colorectal cancer (CRC) surgery increase perioperative mortality and are significantly associated with poor prognosis. We aimed to develop a model for predicting infectious complications after colorectal cancer surgery in elderly patients based on improved machine learning (ML) using inflammatory and nutritional indicators.


The data of 512 elderly patients with colorectal cancer treated at the Third Affiliated Hospital of Anhui Medical University from March 2018 to April 2022 were retrospectively collected and randomly divided into a training set and a validation set. The optimal cutoff values of NLR (3.80), PLR (238.50), PNI (48.48), LCR (0.52), and LMR (2.46) were determined by receiver operating characteristic (ROC) curve analysis. Six conventional machine learning models (Linear Regression, Random Forest, Support Vector Machine (SVM), BP Neural Network (BP), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGBoost)) and an improved moderately greedy XGBoost (MGA-XGBoost) model were constructed using patient data in the training set. The performance of the seven models was evaluated on the validation set by the area under the receiver operating characteristic curve (AUC), accuracy (ACC), precision, recall, and F1-score.


Five hundred twelve cases were included in this study; 125 cases (24%) had postoperative infectious complications. Postoperative infectious complications were significantly associated with 10 features: American Society of Anesthesiologists (ASA) score, operation time, diabetes, presence of stoma, tumor location, NLR, PLR, PNI, LCR, and LMR. MGA-XGBoost reached the highest AUC (0.862) on the validation set, making it the best of the seven models for predicting postoperative infectious complications in elderly patients with colorectal cancer. Among the internal features of the model, LCR had the highest importance. Conclusions: This study demonstrates for the first time that the MGA-XGBoost model with 10 risk factors might predict postoperative infectious complications in elderly CRC patients.



Because of population aging and the susceptibility of intestinal cells in the elderly [1], as many as 70% of colorectal cancer patients are aged 65 or over [2]. Surgery is currently the cornerstone of colorectal cancer treatment, and the age of patients undergoing intestinal surgery is gradually increasing [3]. However, with aging, organ function and immune function decline in people over 65 years old, and underlying diseases become more common. Moreover, elderly patients often have poor nutritional absorption after surgery, recover poorly from invasive treatment, and have weak resistance to pathogens, so they are prone to postoperative infection. Therefore, this study focuses on the elderly population to improve prediction accuracy in this group.

Postoperative infectious complications increase patient costs and length of hospital stay, and delay the start of postoperative adjuvant therapy [4]. More importantly, considerable evidence shows that postoperative infectious complications are significantly associated with poor prognosis of CRC [5, 6]. If the postoperative infectious complications of elderly patients can be predicted early, quality of life and prognosis can be improved by the timely use of prophylactic antibiotics and early goal-directed therapy. At present, most studies focus only on the effect of individual markers on the prediction of postoperative infection. In this paper, we jointly consider several predictors of infectious complications, all measured in peripheral blood: the platelet-to-lymphocyte ratio (PLR) [7], the lymphocyte-to-monocyte ratio (LMR) [8], the neutrophil-to-lymphocyte ratio (NLR) [9], the lymphocyte-to-C-reactive protein ratio (LCR) [10], and the prognostic nutritional index (PNI) [11]. These factors have been reported to predict the incidence of infectious complications in different types of cancer.

Many researchers have attempted to predict infectious complications following colorectal surgery using prediction models that include various clinicopathological factors. These models rely on traditional statistical analysis, such as logistic regression, Cox regression, and nomograms. Compared with traditional statistical analysis, machine learning can capture complex nonlinear relationships in complex medical data sets and adapt to the data to improve the accuracy, sensitivity, and specificity of the prediction model [12, 13]. However, some medical personnel may not realize that conventional ML models are prone to overfitting. Therefore, this study improves the XGBoost algorithm with a moderate greedy algorithm (MGA), yielding MGA-XGBoost, to improve the accuracy of the prediction model.


Data sources

The medical record system of the Third Affiliated Hospital of Anhui Medical University was searched using ‘colon cancer’ and ‘rectal cancer’ as keywords. The clinical data of patients who underwent radical operations in the gastrointestinal surgery department from March 2018 to April 2022 and had colorectal cancer confirmed by postoperative pathology were retrospectively collected. Inclusion criteria: (1) age ≥ 65 years; (2) diagnosis of colorectal cancer and radical resection of colorectal cancer; (3) no history of radiotherapy or chemotherapy before the operation, no distant organ metastasis, and postoperative pathological stage 0, I, II, or III; (4) no other malignant tumors; (5) complete clinicopathological data. Exclusion criteria: (1) age < 65 years; (2) incomplete clinical data; (3) acute or chronic infectious diseases, or long-term use of immunosuppressive agents, before the operation; (4) preoperative radiotherapy or chemotherapy, or distant metastasis; (5) new postoperative diseases unrelated to surgery; (6) emergency surgery for colorectal cancer with intestinal obstruction; (7) discharge against medical advice, so that postoperative complications could not be accurately assessed. Preoperative and intraoperative variables were collected for screening of risk factors.
Information on the following 25 variables was obtained: age, sex, body mass index (BMI), ASA score, smoking status, previous comorbidities (chronic lung disease, diabetes), surgical method, intraoperative blood transfusion, presence of stoma; laboratory data obtained within 7 days before surgery (lymphocytes, C-reactive protein, soterocyte, albumin, monocytes, white blood cells, hematocrit, international normalized ratio, fibrinogen, total bilirubin, direct bilirubin, aspartate aminotransferase (AST), blood urea nitrogen (BUN), creatinine, uric acid, Na+, Ca2+); tumor information (pathological T-stage, pathological N-stage, pathological stage, tumor location, tumor size); and operation information (intraoperative bleeding, operation time).
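As an illustration of how the five composite indicators can be derived from the laboratory values, the sketch below computes them for a single made-up patient. The field names, units, and example values are hypothetical. The PNI formula (albumin in g/L plus 5 times the lymphocyte count in 10^9/L) is the commonly used Onodera definition and is an assumption here, since the text does not state which formula was applied.

```python
from dataclasses import dataclass


@dataclass
class BloodPanel:
    """Preoperative laboratory values (hypothetical field names and units)."""
    neutrophils: float  # 10^9/L
    lymphocytes: float  # 10^9/L
    monocytes: float    # 10^9/L
    platelets: float    # 10^9/L
    crp: float          # mg/L (assumed unit)
    albumin: float      # g/L


def composite_indices(p: BloodPanel) -> dict:
    """Return the five composite indicators used as candidate predictors."""
    return {
        "NLR": p.neutrophils / p.lymphocytes,
        "PLR": p.platelets / p.lymphocytes,
        "LMR": p.lymphocytes / p.monocytes,
        "LCR": p.lymphocytes / p.crp,
        # Assumption: Onodera's PNI = albumin (g/L) + 5 x lymphocytes (10^9/L)
        "PNI": p.albumin + 5.0 * p.lymphocytes,
    }


# Made-up example values, not taken from the study data.
panel = BloodPanel(neutrophils=4.0, lymphocytes=1.6, monocytes=0.4,
                   platelets=240.0, crp=8.0, albumin=40.0)
indices = composite_indices(panel)
```

Each indicator can then be dichotomized at its cut-off value before being passed to the models.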

Postoperative infection

The common postoperative infectious complications observed included respiratory and pulmonary infection, incision infection, anastomotic leakage, abdominal abscess, and urinary tract infection. The diagnostic criteria for infection follow the corresponding guidelines and standard references [5, 14, 15]. Briefly: (1) Incision infection: infection of the skin and subcutaneous tissue within 30 days after surgery, with wound redness, swelling, heat, and pain, and pus draining from the incision. (2) Anastomotic leakage: clinical signs of peritonitis such as tenderness, rebound pain, and muscle tension; color Doppler ultrasound showing gas and liquid around the anastomosis, or CT showing anastomotic disconnection. (3) Abdominal abscess: infection of the abdominal space within 30 days after the operation, manifesting as abdominal pain, persistent fever, and other symptoms, confirmed by puncture or B-ultrasound and improved after surgical drainage or anti-infective treatment. (4) Urinary tract infection: cystitis or urethritis within 30 days after the operation, with bladder irritation symptoms such as frequent urination, urgency, and dysuria; routine urine examination showing pyuria and hematuria; pathogenic bacteria cultured from urine. (5) Pulmonary infection: body temperature > 38.0 °C, elevated white blood cell count, cough, expectoration, and other clinical symptoms; dry and wet rales heard in the lungs; and new infiltrative lesions on chest X-ray.

Conventional statistical analysis

SPSS 24.0 software was used to process and analyze the data. The optimal cutoff values of NLR, PLR, PNI, LCR, and LMR were determined by the receiver operating characteristic curve, as shown in Table 1. There is no uniform standard for the optimal cut-off values of these five indicators. In previous studies, the cut-off values of PNI ranged from 40.1 [16] to 51.26 [17], those of LCR from 0.34 [18] to 0.84 [19], those of NLR from 1.93 [17] to 4.8 [20], those of PLR from 190.83 [21] to 645.22 [17], and those of LMR from 2 [22] to 3.6 [23]; the cut-off values obtained in this study fall within these ranges. In the univariate analysis, continuous variables (such as body mass index) were reported as mean ± standard deviation and compared between the infected and non-infected groups using the U test. Count data were expressed as rates or numbers of cases and compared between groups using the χ2 test. P < 0.05 was considered statistically significant, and factors that were statistically significant in the univariate analysis were included in the machine learning models. Continuous variables were normalized based on the mean and SD of the training set. Categorical variables were encoded as binary variables, where 1 indicates that the event is present and 0 that it is absent. Gender was also encoded, with 1 representing male and 0 representing female. Overfitting may occur during model training and degrade model performance. Therefore, we first performed univariate analysis to filter out features that were not statistically significant, and then applied recursive feature elimination (RFE) based on random forest. This method first trains on all features, then recursively removes the least important features, and selects the feature set with the highest recall score [24].
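The normalization and encoding described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the example values are hypothetical, and the sample standard deviation is an assumption (the text only says "mean and SD of the training set"). The key point it shows is that the scaling parameters are estimated on the training set and then reused unchanged on the validation set.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Estimate mean and SD from the training set only."""
    return mean(train_values), stdev(train_values)


def standardize(values, mu, sd):
    """Apply z-score scaling with previously fitted parameters."""
    return [(v - mu) / sd for v in values]


def encode_binary(flags):
    """Map truthy/falsy flags to 1/0 (1 = present, 0 = absent)."""
    return [1 if f else 0 for f in flags]


# Made-up operation times in minutes, not taken from the study data.
train_op_time = [120.0, 150.0, 180.0, 210.0, 240.0]
valid_op_time = [180.0, 300.0]

mu, sd = fit_scaler(train_op_time)
train_z = standardize(train_op_time, mu, sd)
valid_z = standardize(valid_op_time, mu, sd)  # reuses training statistics
diabetes = encode_binary([True, False, True])
```

Fitting the scaler on the training set alone keeps information from the validation set out of model development.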

Table 1 The best cut-off values of the five indicators
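One common way to obtain an optimal cut-off from a ROC curve is to maximize the Youden index (sensitivity + specificity - 1). The text does not state which criterion was used, so the sketch below is an assumption, and the marker values and labels are made up for illustration. The `high_is_risk` flag is a hypothetical parameter that handles markers for which low values indicate risk (such as PNI, LCR, and LMR in this study).

```python
def youden_cutoff(values, labels, high_is_risk=True):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    values: marker measurements; labels: 1 = infection, 0 = no infection.
    high_is_risk: True if values >= cutoff are predicted positive,
    False if values <= cutoff are predicted positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        if high_is_risk:
            pred = [v >= cut for v in values]
        else:
            pred = [v <= cut for v in values]
        tp = sum(1 for p, y in zip(pred, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(pred, labels) if not p and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:  # on ties, the lowest candidate cut-off is kept
            best_cut, best_j = cut, j
    return best_cut, best_j


# Made-up NLR values and infection labels.
nlr = [1.8, 2.1, 2.9, 3.2, 3.9, 4.4, 5.0, 6.1]
infected = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_cutoff(nlr, infected)
```

In this toy example the selected cut-off is 3.2, where sensitivity is 1.0 and specificity is 0.75.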

Machine learning

In this paper, Python 3.9 was used to construct six conventional machine learning models (Linear Regression, Random Forest, SVM, BP, LGBM, and XGBoost) to predict postoperative infectious complications in elderly patients with colorectal cancer. Except for XGBoost, the other five models were built with the scikit-learn package in Python 3.9. The data of 512 elderly patients were randomly divided into a training set (70%) and a validation set (30%). The training set was used to develop the prediction models, and the validation set was used to verify their performance. Model performance was evaluated by AUC, ACC, recall, F1-score, and precision.
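The split and the five evaluation metrics can be sketched in plain Python as below. The function names, the seed, the 0.5 decision threshold, and the example labels and scores are all assumptions made for illustration; in practice the equivalent scikit-learn utilities would normally be used. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def split_indices(n, train_frac=0.7, seed=0):
    """Randomly split range(n) into training and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    k = int(n * train_frac)
    return idx[:k], idx[k:]


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute ACC, precision, recall, F1 and AUC for binary labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return {"ACC": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "F1": f1, "AUC": auc}


train_idx, valid_idx = split_indices(512)

# Made-up labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.55, 0.1, 0.8, 0.3]
metrics = classification_metrics(y_true, y_score)
```

With 512 patients and a 70% training fraction, this split yields 358 training and 154 validation cases.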

Development of optimization algorithm

Using the XGBoost model often raises two major problems: (1) XGBoost has many parameters to adjust, the adjustment process is tedious, and it is difficult to select the best parameters for the problem at hand; (2) as a gradient boosting method, XGBoost carries a risk of overfitting. Therefore, this paper uses a greedy algorithm (GA) to adjust the parameters. However, GA also has shortcomings in this context. Because the result of each iteration directly affects the next, an error made in an early step propagates, and the greedy algorithm can end with a large error in the final result. To correct this, this paper proposes a Moderate Greedy Algorithm (MGA). Like GA, MGA makes the better choice in the current state and gradually constructs a solution. MGA adds a principle of moderation to the GA idea: it restrains the greedy range to avoid excessive greed and the resulting accumulation of errors. Good results can be obtained by selecting an appropriate moderation principle, and weighted ensemble learning is used to increase robustness.

In this paper, MGA is used to adjust the max_depth, min_child_weight, gamma, subsample, colsample_bytree, reg_alpha, and reg_lambda parameters of XGBoost. Optimizing these seven parameters by grid search would require a large amount of computation and would also limit the range of each parameter. Therefore, we group the parameters in a greedy manner and optimize them step by step. At each step, we do not depend only on the single optimal parameter subset but retain several of the best parameter subsets (hence the name ‘MGA’). The main operation details are shown in Table 2.

Table 2 The value process of MGA

The value range of parameter adjustment is shown in Table 3:

Table 3 Range of XGBoost parameters

Following the greedy approach, we divide the parameter adjustment process of XGBoost into six steps. Each step optimizes further parameters while holding fixed the locally optimal parameters obtained in the previous steps, and so on until all parameters are adjusted.

The main idea of boosting is to combine multiple weak learners with high bias to reduce the overall bias and form a strong learner. We were concerned that a single XGBoost model would perform poorly. To reduce the risk of overfitting caused by inconsistent data distribution and small sample size, we use an ensemble of XGBoost models to increase robustness. During parameter adjustment, we therefore keep not only the optimal set of parameters but several of the better parameter sets. The steps of parameter adjustment are:

  1. First, adjust max_depth and min_child_weight together, and keep the two parameter sets with the best scores.

  2. Second, adjust gamma, and keep the best two parameter sets.

  3. Then adjust subsample and colsample_bytree together, and keep the best two parameter sets.

  4. Then adjust the regularization coefficients reg_alpha and reg_lambda, and keep the single best parameter set.

  5. This yields 2*2*2*1 = 8 parameter sets. Finally, adjust learning_rate and num_boost_round, and select the optimal set of parameters.
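The first four steps above can be sketched as a staged search that keeps several candidates per step instead of one. This is an illustrative reading of the procedure, not the authors' implementation. The function name, the candidate value grids, and `toy_score` are hypothetical; in a real run the scoring function would train an XGBoost model with the given parameters and return a validation score. The sketch also assumes that each surviving branch is extended independently.

```python
from itertools import product


def moderately_greedy_search(stages, score, keep):
    """Tune parameter groups stage by stage, keeping several candidates.

    stages: list of dicts mapping parameter name -> candidate values.
    score:  callable(params_dict) -> float, higher is better.
    keep:   number of best candidates kept per branch at each stage.
    """
    branches = [{}]
    for grid, k in zip(stages, keep):
        names = list(grid)
        next_branches = []
        for base in branches:
            # Extend this branch with every combination of the stage's values.
            candidates = [dict(base, **dict(zip(names, combo)))
                          for combo in product(*(grid[n] for n in names))]
            candidates.sort(key=score, reverse=True)
            next_branches.extend(candidates[:k])
        branches = next_branches
    return branches


def toy_score(params):
    """Hypothetical stand-in for a validation score (closer to target is better)."""
    target = {"max_depth": 4, "min_child_weight": 2, "gamma": 0.1,
              "subsample": 0.8, "colsample_bytree": 0.8,
              "reg_alpha": 0.1, "reg_lambda": 1.0}
    return -sum(abs(params[k] - target[k]) for k in params)


# Made-up grids; the study's actual ranges are given in its Table 3.
stages = [
    {"max_depth": [3, 4, 5, 6], "min_child_weight": [1, 2, 3]},
    {"gamma": [0.0, 0.1, 0.2]},
    {"subsample": [0.6, 0.8, 1.0], "colsample_bytree": [0.6, 0.8, 1.0]},
    {"reg_alpha": [0.0, 0.1, 1.0], "reg_lambda": [0.5, 1.0, 2.0]},
]
finalists = moderately_greedy_search(stages, toy_score, keep=[2, 2, 2, 1])
```

Keeping two candidates in each of the first three stages and one in the fourth gives 2*2*2*1 = 8 finalists, ordered from "best at every stage" to "second best at every stage".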

The tuning procedure can also be divided into different ‘step groups’. A step group is a partition of the nine parameters listed above into several steps, with one or two parameters adjusted in each step. For example, the five-step adjustment above is one step group. Another step group would be: step 1, max_depth; step 2, min_child_weight; step 3, gamma; step 4, subsample and colsample_bytree; step 5, reg_alpha and reg_lambda; step 6, learning_rate and num_boost_round. In summary, we obtained a total of 8 sets of optimal XGBoost parameters (Table 4):

Table 4 8 groups of XGBoost model parameter results

The eight XGBoost models obtained above are ordered according to whether the optimal or the suboptimal parameter set was chosen at each step, and weighted ensemble learning is then performed. The optimal choice is assigned a weight of 2/3 and the suboptimal choice a weight of 1/3. The number of iterations is set to 500. The resulting proportions of the eight models are 0.296, 0.148, 0.148, 0.074, 0.148, 0.074, 0.074, and 0.038 (see the source code for details). Therefore, the final MGA-XGBoost model is Model = 0.296*model1 + 0.148*model2 + 0.148*model3 + 0.074*model4 + 0.148*model5 + 0.074*model6 + 0.074*model7 + 0.038*model8.
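One way to arrive at these proportions is to multiply the per-step weights (2/3 for the best choice, 1/3 for the second best) over the three steps in which two parameter sets were kept. This derivation is an inference from the reported numbers rather than something stated in the text, and the function names below are hypothetical. It reproduces the reported values except for the last weight, which comes out as 1/27 ≈ 0.037 instead of the reported 0.038 (presumably adjusted so that the rounded weights sum to 1).

```python
from itertools import product


def branch_weights(n_stages=3, w_best=2 / 3, w_second=1 / 3):
    """Weight of each finalist = product of its per-stage rank weights."""
    weights = []
    for ranks in product((w_best, w_second), repeat=n_stages):
        w = 1.0
        for r in ranks:
            w *= r
        weights.append(w)
    return weights


def ensemble_probability(probs, weights):
    """Weighted average of the probabilities predicted by each model."""
    return sum(w * p for w, p in zip(weights, probs))


weights = branch_weights()
rounded = [round(w, 3) for w in weights]

# Made-up predicted infection probabilities from eight models for one patient.
patient_probs = [0.80, 0.70, 0.75, 0.60, 0.72, 0.65, 0.68, 0.55]
risk = ensemble_probability(patient_probs, weights)
```

Because the weights sum to 1, the ensemble output stays on the same probability scale as the individual models.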


Patient characteristics

From March 2018 to April 2022, 563 elderly patients underwent radical resection of colorectal cancer in the gastrointestinal surgery department of our hospital. After screening by the inclusion and exclusion criteria, 512 patients were included in the study. In the final data set, no variable had more than 1% missing values. Missing values were filled by mean imputation, that is, replaced with the mean of the corresponding feature. Patients with postoperative infectious complications accounted for 24% (n = 125) of the cohort. The training set contained 70% of the patients (n = 358) and the validation set 30% (n = 154). There were 295 male patients (57.62%) and 217 female patients (42.38%). The characteristics of the data set are shown in Table 5.

Table 5 Characteristics of the study patients (n = 512)

Feature selection using univariate and recursive feature elimination methods

To better understand the data characteristics of the model, the patients in the training and validation sets were divided into an infected group and a non-infected group, and the data were then analyzed by univariate analysis. Since less relevant features may harm the performance of machine learning models, we further used the recursive feature elimination (RFE) method to select features and rank their importance. Together, the univariate and RFE methods reduced 36 features to 10 features. These 10 features were ASA, operation time, diabetes, presence of stoma, tumor location, NLR, PLR, PNI, LCR, and LMR (P < 0.05). The results of the univariate analysis are shown in Table 6, and the feature ranking of the RFE method is shown in Fig. 1.
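The RFE loop described in the methods (train on all features, repeatedly drop the least important one, keep the subset with the highest recall) can be sketched generically as below. The function names, the `evaluate` callback, and the toy importances are hypothetical; in the study the evaluation step would fit a random forest and measure recall, and a library routine such as scikit-learn's RFECV would usually be used instead of a hand-written loop.

```python
def recursive_feature_elimination(features, evaluate, min_features=1):
    """Drop the least important feature repeatedly; keep the best subset.

    evaluate: callable(list_of_features) -> (recall, {feature: importance}).
    Returns (best_subset, best_recall).
    """
    current = list(features)
    best_subset, best_recall = list(current), -1.0
    while len(current) >= min_features:
        recall, importance = evaluate(current)
        if recall >= best_recall:  # on ties, prefer the smaller subset
            best_subset, best_recall = list(current), recall
        if len(current) == min_features:
            break
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    return best_subset, best_recall


# Hypothetical importances: three informative features, the rest noise.
TRUE_SIGNAL = {"LCR": 0.30, "diabetes": 0.25, "operation_time": 0.20}


def toy_evaluate(subset):
    """Stand-in for model fitting: recall grows with signal, shrinks with noise."""
    importance = {f: TRUE_SIGNAL.get(f, 0.01) for f in subset}
    signal = sum(TRUE_SIGNAL.get(f, 0.0) for f in subset)
    noise = sum(1 for f in subset if f not in TRUE_SIGNAL)
    return signal - 0.02 * noise, importance


selected, recall = recursive_feature_elimination(
    ["LCR", "diabetes", "operation_time", "sex", "BMI"], toy_evaluate)
```

In this toy setting the two noise features are removed first, and the three informative features form the subset with the highest recall.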

Table 6 Data characteristics analysis of the infected group and non-infected group (n = 512)

Fig. 1 Feature importance ranking of the selected 10 features illustrated by random forest

Correlation analysis between risk factors

To examine whether the risk factors are correlated with one another, we analyzed the correlations among the statistically significant indicators selected by the RFE method. The strongest correlations were between PNI and LMR (0.71) and between NLR and PLR (0.35). The detailed results are shown in Fig. 2.
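A pairwise correlation matrix of this kind can be computed as in the sketch below. The text does not say which correlation coefficient was used, so Pearson's r is an assumption, and the function names and example values are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {a: {b: pearson_r(columns[a], columns[b]) for b in columns}
            for a in columns}


# Made-up values for three indicators in five patients.
columns = {
    "PNI": [42.0, 45.0, 48.0, 51.0, 54.0],
    "LMR": [2.0, 2.5, 3.0, 3.5, 4.0],
    "NLR": [5.0, 4.0, 3.5, 3.0, 2.0],
}
matrix = correlation_matrix(columns)
```

Highly correlated pairs identified this way carry partly redundant information, which is worth keeping in mind when interpreting feature importance.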

Fig. 2 Correlation analysis between risk factors

Performance evaluation of machine learning models for predicting postoperative infectious complications

We evaluated the predictive performance of the seven machine learning models for postoperative infectious complications in elderly patients. The MGA-XGBoost prediction model had the highest AUC (0.862), whereas Linear Regression, SVM, and BP showed only moderate predictive ability (AUC range 0.6 to 0.73). The AUC value of each model is shown in Fig. 3.

Fig. 3 ROC curve for predicting postoperative infectious complications on the validation set

In addition to AUC, this paper also introduces ACC, Recall, F1-score, and Precision to evaluate the performance of various prediction models. It can be seen from Table 7 that MGA-XGBoost, LGBM and XGBoost all show good accuracy and precision.

Table 7 The performance of 7 ML models in the validation set

Feature importance analysis of MGA-XGBoost model

In this paper, the importance of the internal features of the MGA-XGBoost prediction model, which had the highest accuracy, is visualized on the validation data set using three measures: cover, weight, and gain. The combined score is computed as:

$$S=\left(\frac{i_{cover}}{\sum_i cover}+\frac{i_{weight}}{\sum_i weight}+\frac{i_{gain}}{\sum_i gain}\right)\times 100$$

where S is the total score of a feature across the three measures, and $i_{cover}$, $i_{weight}$, and $i_{gain}$ are the cover, weight, and gain scores of that feature, each divided by the corresponding sum over all features. As shown in Fig. 4, LCR, diabetes, and operation time ranked first, second, and third, respectively.
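The combined score S can be computed directly from the three per-feature importance tables, as in the sketch below. The function name and the input numbers are made up for illustration; with XGBoost, such tables are typically obtained from the booster's `get_score` method with `importance_type` set to `"cover"`, `"weight"`, or `"gain"`.

```python
def combined_importance(cover, weight, gain):
    """Sum each feature's normalized cover, weight and gain, times 100."""
    def share(scores):
        total = sum(scores.values())
        return {name: value / total for name, value in scores.items()}

    c, w, g = share(cover), share(weight), share(gain)
    return {name: (c[name] + w[name] + g[name]) * 100 for name in cover}


# Made-up importance values for three features.
cover = {"LCR": 50.0, "diabetes": 30.0, "operation_time": 20.0}
weight = {"LCR": 40.0, "diabetes": 35.0, "operation_time": 25.0}
gain = {"LCR": 6.0, "diabetes": 3.0, "operation_time": 1.0}

scores = combined_importance(cover, weight, gain)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Since each of the three normalized measures sums to 1 across features, the S values of all features always add up to 300.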

Fig. 4 Feature importance analysis of MGA-XGBoost model in the validation set


This research, based on clinical data and machine learning methods, makes the following main contributions: (1) We found that 10 factors were significantly associated with infectious complications after colorectal cancer surgery: ASA, operation time, diabetes, tumor location, presence of stoma, NLR, PLR, PNI, LCR, and LMR. (2) We constructed conventional predictive models for postoperative infectious complications in elderly patients with colorectal cancer. The LGBM model performed best in predicting postoperative infection compared with the other five conventional machine learning models, with an AUC of 0.833, an accuracy of 0.844, and a precision of 0.708. (3) Building on the second contribution, we improved the XGBoost model to increase its accuracy. The MGA-XGBoost prediction model had the highest AUC (0.862), with an accuracy of 0.877 and a precision of 0.731, showing great potential for future application in intelligent medical care. (4) We visualized the importance of the internal features of the MGA-XGBoost prediction model, which helps address the opacity of machine learning models and improves their prospects for clinical application. (5) Finally, when the model identifies postoperative infection early, antibiotics can be used in time for treatment, rather than basing treatment on late symptoms and clinical deterioration, and unnecessary or excessive use of antibiotics in low-risk patients can be avoided. At the same time, postoperative care should be strengthened for high-risk patients, for example by actively encouraging patient activity and promoting sputum coughing, to improve clinical outcomes in elderly patients.

The role of systemic inflammatory response and nutritional status in cancer patients is increasingly recognized [25]. For example, systemic inflammatory response indicators and nutritional indicators can be used to predict infectious complications after malignant tumor surgery [11, 26]. Okugawa [10] found that low preoperative LCR was an independent risk factor for surgical site infection in patients with colorectal cancer. Cancer usually activates systemic inflammatory responses, and invasive surgery triggers abnormally enhanced inflammatory responses that reduce patient immunity [27]. Consistent with this, in our study increased preoperative NLR and PLR levels and decreased LCR and LMR levels indicated a higher risk of postoperative infectious complications. It is worth noting that LCR ranks first in the importance ranking of the internal features of the MGA-XGBoost model. Okita [28] pointed out that low PNI may be a significant predictor of postoperative infectious complications in patients with ulcerative colitis undergoing proctectomy with ileal pouch-anal anastomosis. Cancer patients occasionally have impaired nutritional intake during the perioperative period [29]. Malnutrition can also lead to a decline of immune function in cancer patients [30]. In particular, hypoproteinemia has a significant effect on humoral immunity, which can cause pathogen translocation, conditional pathogen transformation, and fungal reproduction [31]. Studies have shown that immune nutrition and special enteral formulas can reduce the incidence of postoperative infectious complications in patients undergoing colorectal cancer surgery [32]. In this study, the incidence of postoperative infectious complications was higher when preoperative PNI was < 48.48. Therefore, this study showed that inflammatory response and nutritional indicators were significantly associated with postoperative infection.
At the same time, this study determined five comprehensive inflammatory indicators related to postoperative infection of colorectal cancer by single factor analysis and RFE method. According to different literature reports, these risk factors were significantly associated with postoperative infection [33, 34]. In the era of rapid rehabilitation surgery, it is important to use these markers for early prediction of infection, and early diagnosis to avoid readmission and reduce medical costs.

ML refers to the iterative and automatic optimization of mathematical models to gradually and accurately fit available data [35]. There are thousands of machine learning algorithms, but each model has its limitations and the best algorithm differs between situations [36]. The best model usually depends on the sample data set and the purpose of the analysis in a specific scenario [37]. For example, the BP model in this study has the lowest accuracy, which may be because BP transforms the characteristics of all problems into numbers and all reasoning into numerical calculations, resulting in a loss of information in its results [38]. Therefore, in this study, we calculated the prediction accuracy of six conventional machine learning models and compared their performance; among them, the LGBM model showed the best predictive ability. LGBM uses pruning to keep the model from falling into a local optimum, and uses second-derivative information and sampling in each iteration to prevent overfitting [39]. LGBM thus has the best overall performance among the conventional machine learning models for predicting postoperative infection of colorectal cancer, with an AUC of 0.833, an accuracy of 0.844, and a precision of 0.708.

Most clinicians use standard statistical software packages (such as R) to develop machine learning models, but standard packages cannot make up for the shortcomings of the algorithms themselves. For example, XGBoost performs well in various ML competitions, but it has many parameters and tuning them is cumbersome. Some scholars have therefore studied improvements to XGBoost. In 2021, Peng [40] constructed a new model for predicting hypertension based on hybrid feature selection and standard XGBoost; the AUC and accuracy of the new model were about 7% higher than those of the model without improvement. Zhang [41] proposed a GA-XGBoost model for diabetes risk prediction. The experimental results show that the prediction accuracy of the GA-XGBoost model is better than that of linear regression, decision tree, support vector machine, and neural network, and that its parameter adjustment time is shorter than that of grid search and random walk. In this study, we used Python 3.9 to group the parameters in a greedy manner and tune them step by step. At each step several parameter subsets are retained, and the final model is obtained by weighting. After training on 70% of the full data set, MGA-XGBoost increased AUC by 7.4% on the remaining 30%. The improved XGBoost model established in this study may therefore help clinicians make better predictions. This study suggests that advances in artificial intelligence and machine learning can improve the performance of clinical predictive models.

Although complex algorithms such as XGBoost, support vector machine, and artificial neural network are increasingly popular and widely used in predictive modeling, they are based on a ‘black box’ design and are difficult to explain and apply in clinical practice [42]. Clinicians should require transparency and interpretability of the algorithm so that artificial intelligence can be held accountable for its predictions and recommendations. However, model interpretability should not be improved at the expense of accuracy. Our main goal is to construct a more accurate, interpretable, and robust ML model for postoperative infection in elderly patients with colorectal cancer. Therefore, in this study, the importance of the internal features of the MGA-XGBoost prediction model, which had the highest accuracy, is visualized on the validation data set using three measures: cover, weight, and gain. By opening up the internal structure of the MGA-XGBoost model, the priority of these features is made explicit, which is an advantage over previously published opaque machine learning models. This provides an important basis for the clinical perioperative management of elderly patients. For controllable risk factors, clinicians can consider interventions to resolve them or keep them within a certain range before surgery to optimize the patient’s condition.

In this study, the feature importance analysis of the MGA-XGBoost model highlights the importance of blood glucose indicators. In the comparative correlation analysis, the correlation between blood glucose and postoperative infectious complications was only 0.23, so it may be missed in routine analysis. Postoperative hyperglycemia is a common perioperative stress response [43]. Marks [44] held that perioperative blood glucose in diabetic patients should be kept stable at 6.67–10.0 mmol/L, and that blood glucose above 13.9 mmol/L or below 4.8 mmol/L is unfavorable to patients. At the same time, Nakamura et al. [45] pointed out that even under strict perioperative blood glucose control, diabetes is directly related to an increased risk of surgical site infection. This suggests an intrinsic relationship between diabetes and surgical site infection that goes beyond diabetes-related hyperglycemia. It may be due to disorders of sugar and protein metabolism in diabetic patients, which reduce the bactericidal capacity of white blood cells and the production of immunoglobulins and antibodies, resulting in low immunity. In addition, elderly patients with a longer duration of diabetes are prone to vascular neuropathy, resulting in slow blood flow and reduced tissue oxygen supply, which favors the growth of fungi and anaerobic bacteria, so they are more prone to postoperative infection than non-diabetic patients. Therefore, medical staff should strictly control the blood glucose level of diabetic patients and continue to use insulin during the perioperative period to avoid excessive blood glucose fluctuations.

In the study of inflammatory response indicators, Okugawa [10] compared the predictive ability of LCR, CAR, NLR, PLR, and other inflammatory indicators. The results showed that LCR had the highest correlation with colorectal cancer recurrence and was a more reliable biomarker. It may be because preoperative CRP is associated with lymphopenia and T lymphocyte reaction cell damage in patients with colorectal cancer [46], and lymphocytes play a key role in the host’s cytotoxic immune response to tumors, which impairs cell-mediated immunity in patients with colorectal cancer. In the MGA-XGBoost model, LCR is the best predictor of postoperative infection in colorectal cancer compared with other inflammatory indicators.

A longer operation increases the exposure time of the tissue at the surgical site and therefore the chance of contamination. Mik et al. [47] found that a total operation time of more than 180 minutes increases the risk of surgical site infection in deep incisions and organ spaces. At the same time, the longer the operation, the greater the possible trauma and blood loss, which further reduces the patient’s resistance and makes the patient more prone to infection. For patients with a long expected operation time, we suggest that a detailed surgical plan be formulated before the operation to shorten the operation time as much as possible while ensuring the quality of the operation, and that second-generation antibiotics be given appropriately for prevention and control. This model can not only explain the relationship between features and risk factors but also indicate the importance of features for individuals. If the model is prospectively validated, it can help clinicians determine which part of the intervention is most important, thus providing an interpretable and powerful tool for preventing postoperative infection.

This study has several notable limitations. First, the training sample size is limited because the cohort comes from a single center, which may lead to over-fitting of the model; multi-center studies are needed for external validation. Second, this study is retrospective, so data collection and entry bias and unavoidable selection bias may be present. For example, the incidence of postoperative anastomotic leakage was extremely low in our study. To improve the performance of the models, those established in this study will eventually be applied at other medical sites to verify their generalizability.
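The external validation called for above can be sketched in a few lines of Python: compute the AUC on the internal validation set and on an external cohort, then compare the two. The functions `roc_auc` and `auc_drop` and the toy cohorts are hypothetical, and the AUC is computed with the generic rank-based (Mann-Whitney) formulation rather than with the code used in this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random infected case (label 1)
    receives a higher predicted risk than a random non-infected case (label 0).
    Ties between a positive and a negative score count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_drop(internal, external):
    """Internal AUC minus external AUC; a large positive value hints at over-fitting."""
    return roc_auc(*internal) - roc_auc(*external)


if __name__ == "__main__":
    # Hypothetical toy cohorts: (true labels, predicted risks).
    internal = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
    external = ([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
    print(roc_auc(*internal), roc_auc(*external), auc_drop(internal, external))
```

A marked drop in AUC on the external cohort would support the concern that a single-center model is over-fitted; a small drop would not by itself prove generalizability.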

In summary, our study demonstrates for the first time that the MGA-XGBoost model with 10 risk factors can predict postoperative infectious complications in elderly CRC patients. At the same time, combining risk prediction with feature importance analysis allows clinicians to assess postoperative risks and potentially modifiable drivers.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Risques RA, Lai LA, Brentnall TA, Li L, Feng Z, Gallaher J, Rabinovitch PS. Ulcerative colitis is a disease of accelerated colon aging: evidence from telomere attrition and DNA damage. Gastroenterology. 2008;135(2):410–8.

  2. Vallribera Valls F, Landi F, Espín Basany E, Sánchez García JL, Jiménez Gómez LM, Martí Gallostra M, Armengol CM. Laparoscopy-assisted versus open colectomy for treatment of colon cancer in the elderly: morbidity and mortality outcomes in 545 patients. Surg Endosc. 2014;28:3373–8.

  3. Siegel RL, Miller KD, Goding Sauer A, Fedewa SA, Butterly LF, Anderson JC, Jemal A. Colorectal cancer statistics, 2020. CA Cancer J Clin. 2020;70(3):145–64.

  4. Ramanathan ML, MacKay G, Platt J, Horgan PG, McMillan DC. The impact of open versus laparoscopic resection for colon cancer on C-reactive protein concentrations as a predictor of postoperative infective complications. Ann Surg Oncol. 2015;22:938–43.

  5. Lawler J, Choynowski M, Bailey K, Bucholc M, Johnston A, Sugrue M. Meta-analysis of the impact of postoperative infective complications on oncological outcomes in colorectal cancer surgery. BJS open. 2020;4(5):737–47.

  6. Watt DG, McSorley ST, Park JH, Horgan PG, McMillan DC. A postoperative systemic inflammation score predicts short-and long-term outcomes in patients undergoing surgery for colorectal cancer. Ann Surg Oncol. 2017;24:1100–9.

  7. Gl Z, Chen J, Wang J, Wang S, Xia J, Wei Y, Huang X. Predictive value of postoperative NLR, PLR and LMR for early periprosthetic joint infection after total joint arthroplasty: a pilot study; 2020.

  8. Kamonvarapitak T, Matsuda A, Matsumoto S, Jamjittrong S, Sakurazawa N, Kawano Y, Yoshida H. Preoperative lymphocyte-to-monocyte ratio predicts postoperative infectious complications after laparoscopic colorectal cancer surgery. Int J Clin Oncol. 2020;25:633–40.

  9. Wang C, Huang HZ, He Y, Yu YJ, Zhou QM, Wang RJ, Han SL. A new nomogram based on early postoperative NLR for predicting infectious complications after gastrectomy. Cancer Manag Res. 2020;12:881.

  10. Okugawa Y, Toiyama Y, Yamamoto A, Shigemori T, Ide S, Kitajima T, Kusunoki M. Lymphocyte-C-reactive protein ratio as promising new marker for predicting surgical and oncological outcomes in colorectal cancer. Ann Surg. 2020;272(2):342–51.

  11. Matsuda T, Umeda Y, Matsuda T, Endo Y, Sato D, Kojima T, Fujiwara T. Preoperative prognostic nutritional index predicts postoperative infectious complications and oncological outcomes after hepatectomy in intrahepatic cholangiocarcinoma. BMC Cancer. 2021;21(1):1–12.

  12. Xue B, Li D, Lu C, King CR, Wildes T, Avidan MS, Abraham J. Use of machine learning to develop and evaluate models using preoperative and intraoperative data to identify risks of postoperative complications. JAMA Netw Open. 2021;4(3):e212240.

  13. Chen C, Yang D, Gao S, Zhang Y, Chen L, Wang B, Zhou S. Development and performance assessment of novel machine learning models to predict pneumonia after liver transplantation. Respir Res. 2021;22(1):1–12.

  14. Okano K, Hirao T, Unno M, Fujii T, Yoshitomi H, Suzuki S, Suzuki Y. Postoperative infectious complications after pancreatic resection. Br J Surg. 2015;102(12):1551–60.

  15. Berríos-Torres SI, Umscheid CA, Bratzler DW, Leas B, Stone EC, Kelz RR. Healthcare infection control practices advisory committee. Centers for disease control and prevention guideline for the prevention of surgical site infection, 2017. JAMA Surg. 2017;152(8):784–91.

  16. Mirili C, Yılmaz A, Demirkan S, et al. Clinical significance of prognostic nutritional index (PNI) in malignant melanoma. Int J Clin Oncol. 2019;24:1301–10.

  17. Hua X, Long ZQ, Huang X, et al. The value of prognostic nutritional index (PNI) in predicting survival and guiding radiotherapy of patients with T1-2N1 breast cancer. Front Oncol. 2020;9:1562.

  18. Yildirim M, Koca B. Lymphocyte C-reactive protein ratio: a new biomarker to predict early complications after gastrointestinal oncologic surgery. Cancer Biomarkers. 2021;31(4):409–17.

  19. Iseda N, Itoh S, Yoshizumi T, et al. Lymphocyte-to-C-reactive protein ratio as a prognostic factor for hepatocellular carcinoma. Int J Clin Oncol. 2021;26:1890–900.

  20. Suppiah A, Malde D, Arab T, et al. The prognostic value of the neutrophil–lymphocyte ratio (NLR) in acute pancreatitis: identification of an optimal NLR. J Gastrointest Surg. 2013;17:675–81.

  21. Eren T. Prognostic significance of the preoperative lymphocyte to C-reactive protein ratio in patients with stage III colorectal cancer. ANZ J Surg. 2022;92(10):2585–94.

  22. Romano A, Parrinello NL, Vetro C, et al. Prognostic meaning of neutrophil to lymphocyte ratio (NLR) and lymphocyte to monocyte ration (LMR) in newly diagnosed Hodgkin lymphoma patients treated upfront with a PET-2 based strategy. Ann Hematol. 2018;97:1009–18.

  23. Jiang P, Li X, Wang S, et al. Prognostic significance of PNI in patients with pancreatic head cancer undergoing laparoscopic pancreaticoduodenectomy. Front Surg. 2022;9:897033.

  24. Darst BF, Malecki KC, Engelman CD. Using recursive feature elimination in random forest to account for correlated variables in high dimensional data. BMC Genet. 2018;19(1):1–6.

  25. Geng Y, Qi Q, Sun M, Chen H, Wang P, Chen Z. Prognostic nutritional index predicts survival and correlates with systemic inflammatory response in advanced pancreatic cancer. Eur J Surg Oncol. 2015;41(11):1508–14.

  26. Duran H, Alpdemir M, Çeken N, Alpdemir MF, Kula AT. Neutrophil/lymphocyte and platelet/lymphocyte ratios as a biomarker in postoperative wound infections. Turk J Biochem. 2022;47(6):756–62.

  27. Moyes LH, Leitch EF, McKee RF, Anderson JH, Horgan PG, McMillan DC. Preoperative systemic inflammation predicts postoperative infectious complications in patients undergoing curative resection for colorectal cancer. Br J Cancer. 2009;100(8):1236–9.

  28. Okita Y, Araki T, Okugawa Y, Kondo S, Fujikawa H, Hiro J, Kusunoki M. The prognostic nutritional index for postoperative infectious complication in patients with ulcerative colitis undergoing proctectomy with ileal pouch-anal anastomosis following subtotal colectomy. J Anus Rectum Colon. 2019;3(2):91–7.

  29. Zhang X, Chen X, Yang J, Hu Y, Li K. Effects of nutritional support on the clinical outcomes of well-nourished patients with cancer: a meta-analysis. Eur J Clin Nutr. 2020;74(10):1389–400.

  30. Hayashi H, Shimizu A, Kubota K, Notake T, Masuo H, Yoshizawa T, Soejima Y. Combination of sarcopenia and prognostic nutritional index to predict long-term outcomes in patients undergoing initial hepatectomy for hepatocellular carcinoma. Asian J Surg. 2023;46(2):816–23.

  31. Li F, Yuan MZ, Wang L, Wang XF, Liu GW. Characteristics and prognosis of pulmonary infection in patients with neurologic disease and hypoproteinemia. Expert Rev Anti-Infect Ther. 2015;13(4):521–6.

  32. Martos-Benítez FD, Gutiérrez-Noyola A, Soto-García A, González-Martínez I, Betancourt-Plaza I. Program of gastrointestinal rehabilitation and early postoperative enteral nutrition: a prospective study. Updat Surg. 2018;70(1):105–12.

  33. Wen J, Pan T, Yuan YC, Huang QS, Shen J. Nomogram to predict postoperative infectious complications after surgery for colorectal cancer: a retrospective cohort study in China. World J Surg Oncol. 2021;19(1):1–9.

  34. Xu Z, Qu H, Kanani G, Guo Z, Ren Y, Chen X. Update on risk factors of surgical site infection in colorectal cancer: a systematic review and meta-analysis. Int J Color Dis. 2020;35:2147–56.

  35. Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, Eastwood CA. Performance of machine learning algorithms for surgical site infection case detection and prediction: a systematic review and meta-analysis. Ann Med Surg. 2022:104956.

  36. Scardoni A, Balzarini F, Signorelli C, Cabitza F, Odone A. Artificial intelligence-based tools to control healthcare associated infections: a systematic review of the literature. J Infect Public Health. 2020;13(8):1061–77.

  37. Roth JA, Battegay M, Juchler F, Vogt JE, Widmer AF. Introduction to machine learning in digital healthcare epidemiology. Infect Control Hospital Epidemiol. 2018;39(12):1457–62.

  38. Wang J. Analysis of sports performance prediction model based on GA-BP neural network algorithm. Comput Intell Neurosci. 2021;2021.

  39. Lee KH, Chu YC, Tsai MT, Tseng WC, Lin YP, Ou SM, Tarng DC. Artificial intelligence for risk prediction of end-stage renal disease in sepsis survivors with chronic kidney disease. Biomedicines. 2022;10(3):546.

  40. Peng Y, Xu J, Ma L, Wang J. Prediction of hypertension risks with feature selection and XGBoost. J Mechan Med Biol. 2021;21(05):2140028.

  41. Zhang CF, Wang S, Wu YD, Wang Y, Zhang HY. Diabetes risk prediction based on GA-Xgboost model. Chin Comput Eng. 2020;46(03):315–20.

  42. Nudel J, Bishara AM, de Geus SW, Patil P, Srinivasan J, Hess DT, Woodson J. Development and validation of machine learning models to predict gastrointestinal leak and venous thromboembolism after weight loss surgery: an analysis of the MBSAQIP database. Surg Endosc. 2021;35(1):182–91.

  43. Yuan J, Liu T, Zhang X, Si Y, Ye Y, Zhao C, Shen X. Intensive versus conventional glycemic control in patients with diabetes during enteral nutrition after gastrectomy. J Gastrointest Surg. 2015;19:1553–8.

  44. Marks JB. Perioperative management of diabetes. Am Fam Physician. 2003;67(1):93–100.

  45. Nakamura T, Sato T, Takayama Y, Naito M, Yamanashi T, Miura H, Watanabe M. Risk factors for surgical site infection after laparoscopic surgery for colon cancer. Surg Infect. 2016;17(4):454–8.

  46. Kwon KA, Kim SH, Oh SY, Lee S, Han JY, Kim KH, Lee JH. Clinical significance of preoperative serum vascular endothelial growth factor, interleukin-6, and C-reactive protein level in colorectal cancer. BMC Cancer. 2010;10(1):1–8.

  47. Mik M, Berut M, Trzcinski R, et al. Preoperative oral antibiotics reduce infections after colorectal cancer surgery. Langenbeck's Arch Surg. 2016;401:1153–62.

Acknowledgements

The authors thank all of the patients who kindly participated in this study.


Funding

Major Project of Science and Technology Department of Anhui Province-Life and Health (No. 2022e07020060).

Author information

Authors and Affiliations



Contributions

Yuan Tian, Lei He and Hongxia Li participated in the research design. Yuan Tian and Lei He participated in the writing of the paper. Yuan Tian, Rui Li, Guanlong Wang and Kai Xu analyzed and interpreted the data. All authors collected and helped analyze the research data. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lei He.

Ethics declarations

Ethical approval and consent to participate

This study protocol was approved by the Ethics Committee of the Third Affiliated Hospital of Anhui Medical University (The First People’s Hospital of Hefei) (No. PJ2023–03–01, approved on 19/03/2023). The study took place at the Third Affiliated Hospital of Anhui Medical University (The First People’s Hospital of Hefei). Each patient provided written informed consent before participation in the study. All methods were carried out in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Tian, Y., Li, R., Wang, G. et al. Prediction of postoperative infectious complications in elderly patients with colorectal cancer: a study based on improved machine learning. BMC Med Inform Decis Mak 24, 11 (2024).
