Patients
Data on consecutive patients with AP were retrospectively collected from the Renmin Hospital of Wuhan University (RM) between January 6, 2016, and October 22, 2020, and from the Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology (TJ) between 2018 and 2019. The study was approved by the Institutional Ethics Committees of the Renmin Hospital of Wuhan University (2021-RM-02106) and the Central Hospital of Wuhan (2021ks06109). The requirement for informed consent was waived for the use of patient data in research. The methods and reporting of results adhere to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement and its Explanation and Elaboration document [18, 19]. Inclusion criteria were: (1) admission to the hospital with a diagnosis of AP according to international consensus [2]; (2) admission for the first occurrence of AP; (3) complete clinical, radiological and laboratory findings within 48 h after admission; and (4) a complete clinical course record. Exclusion criteria were: (1) chronic pancreatitis with recurrent acute attacks; (2) loss to follow-up; (3) incomplete clinical data within the first 48 h after admission; (4) pancreatic cancer; and (5) AP caused by endoscopic surgery, or organ failure, infected pancreatic necrosis, or both that developed before hospital admission. Organ failure was defined as a score of two or more for any one of three organ systems (respiratory, cardiovascular, or renal) using the modified Marshall scoring system [20]. Patients were stratified into high-risk and low-risk groups according to their likelihood of developing critical illness. The workflow of patient selection is illustrated in Fig. 1.
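The organ-failure rule above can be expressed compactly. The following is a minimal Python sketch for illustration only (the study itself used R); the function name, the dictionary keys, and the treatment of an unscored system as 0 are assumptions, and the per-system modified Marshall scores are assumed to have been computed already.

```python
# Organ systems considered by the organ-failure definition in the text.
ORGAN_SYSTEMS = ("respiratory", "cardiovascular", "renal")


def has_organ_failure(marshall_scores):
    """Return True if any of the three organ systems scores 2 or more.

    `marshall_scores` maps an organ-system name to its modified Marshall
    score. A system absent from the mapping is treated as 0 (assumption).
    """
    return any(marshall_scores.get(system, 0) >= 2 for system in ORGAN_SYSTEMS)
```

A patient with scores of 1 in all three systems would not meet the definition, whereas a single system scoring 2 is sufficient.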
Endpoints
We defined ICU admission as the endpoint of follow-up. Admission to the ICU was at the discretion of the medical or surgical team, based on physiologic variables and laboratory criteria, according to the guidelines for ICU admission, discharge, and triage issued by the American College of Critical Care Medicine [21] and the Revised Atlanta Classification of Acute Pancreatitis [22]. The ICU admission criteria included: (1) moderate AP with transient organ failure or local or systemic complications; (2) systemic complications without persistent organ failure (< 48 h); (3) pancreatic and peripancreatic abscesses; (4) digestive tract fistula; (5) systemic infection; (6) intra-abdominal hypertension; (7) abdominal compartment syndrome; (8) pancreatic encephalopathy; (9) sepsis; (10) moderate SAP; (11) SAP or critical AP, including persistent single or multiple organ failure, infected pancreatic necrosis, or both.
Potential predictive variables
Clinical variables associated with the risk of ICU admission were selected a priori based on clinical importance, scientific knowledge, and predictors identified in the previous literature [23,24,25,26]. Variables with more than 20% missing values were excluded. A total of 59 variables were collected as potential predictive factors, including sex, age, temperature, heart rate, systolic blood pressure, diastolic blood pressure, mental status, BMI, pathogenesis, alcohol, comorbid diseases, pleural effusions, pulmonary infiltration, estimated Glomerular Filtration Rate (eGFR), Urea/Serum Creatinine (Ur/Cr), Total Protein (TP), Total Bilirubin (TBIL), Serum Total Cholesterol (TC), Direct Bilirubin (DBIL), Anion Gap (AG), Aspartate Aminotransferase (AST), Triglyceride (TG), Globulin (GLB), Prealbumin (PA), Glucose (Glu), Uric Acid (UA), Urea, Serum Sodium (Na), Serum Magnesium (Mg), Serum Chloride (Cl), Serum Phosphate (IP), Alkaline Phosphatase (ALP), Serum Potassium (K), Serum Creatinine (Cr), Serum Calcium (Ca), Total Carbon Dioxide (TCO2), Cholinesterase (CHE), Alanine Aminotransferase (ALT), Albumin (ALB), Albumin/Globulin ratio (A/G), γ-glutamyl transpeptidase (GGT), ALT/AST, Neutrophil (Neu), Percentage of Neutrophilic Granulocyte (Neu%), Mean Platelet Volume (MPV), Platelet (PLT), Platelet Volume (PLV), Hemoglobin (Hb), Lymphocyte (LYM), Percentage of Lymphocyte (LYM%), Procalcitonin (PCT), Mean Corpuscular Volume (MCV), Hematocrit (HCT), Red Blood Cell (RBC), Percentage of Monocytes (Mono%), White Blood Cell (WBC), CRP, Serum Lipase (LIPA) and Serum Amylase (AMY). The laboratory indicators, abbreviations, and normal ranges are summarized in Additional file 1: Table S2. All data used for analysis were the first examination results obtained within the first 48 h after admission. Variables with less than 20% missing values were considered for imputation, and the missing data were imputed using the R package ‘DMwR’.
Feature selection
The least absolute shrinkage and selection operator (LASSO), applied to a logistic regression model with the bootstrap method, was used to select the most important variables for constructing the prediction models [27], and was compared with the minimum redundancy maximum relevance (MRMR) and Boruta feature selection methods. L1-penalized shrinkage with 20-fold cross-validation was used for the LASSO variable selection process. The most predictive variables with the minimum λ were reported using the R package ‘glmnet’. Notably, glmnet accepts an optional user-supplied λ sequence but by default chooses its own sequence, which is intended to improve convergence. The AP risk score was constructed from the statistically significant variables, weighted by their coefficients in the multivariable logistic regression model fitted in the training cohort. Backward stepwise selection with Akaike’s information criterion was applied to select factors for the multivariable logistic regression model; variables with P < 0.05 were considered statistically significant and retained.
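A coefficient-weighted risk score of the kind described above can be sketched as the linear predictor of a logistic model. The Python below is a minimal illustration rather than the study's R implementation; the variable names (`var_a`, `var_b`), the coefficients, and the intercept are hypothetical placeholders, not the fitted values.

```python
import math


def risk_score(coefs, intercept, features):
    """Linear predictor: intercept plus the sum of coefficient * feature value."""
    return intercept + sum(coefs[name] * features[name] for name in coefs)


def predicted_probability(score):
    """Logistic transform mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients and patient values, for illustration only.
example_coefs = {"var_a": 0.8, "var_b": -0.5}
example_patient = {"var_a": 1.0, "var_b": 2.0}
score = risk_score(example_coefs, intercept=-1.0, features=example_patient)
probability = predicted_probability(score)
```

In this sketch a higher score corresponds to a higher predicted probability of the outcome, so the score can be thresholded directly to define risk groups.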
Model construction
The whole cohort was split into a 70% training set and a 30% validation set, to optimize the tradeoff between the robustness of the training sample and the number of events in the validation set. The training cohort was used to build the prediction models with fivefold cross-validation, whereas the validation cohort was used to validate model performance. In the training cohort, five machine learning models were constructed using the variables identified by LASSO regression analysis: support vector machines with linear kernel (SVM-linear), support vector machines with sigmoid kernel (SVM-sigmoid), support vector machines with radial basis kernel (SVM-radial), logistic regression, and xgboost [28]. We followed the TRIPOD guidelines [18, 19] when constructing the prediction models. The R package ‘e1071’ was used to build the SVM-linear, SVM-sigmoid and SVM-radial models, ‘glmnet’ the logistic regression model, and ‘xgboost’ the xgboost model. The Hosmer–Lemeshow test was used to assess the goodness of fit of the constructed models.
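The 70/30 split and the fivefold partition of the training cohort can be sketched as follows. This is an illustrative Python sketch under stated assumptions: a simple random (unstratified) split with a fixed seed is assumed because the text does not specify the procedure, and the function names are hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def k_folds(ids, k=5):
    """Partition identifiers into k nearly equal folds for cross-validation."""
    return [ids[i::k] for i in range(k)]


train_ids, validation_ids = split_cohort(range(100))
folds = k_folds(train_ids, k=5)
```

Each fold serves once as the held-out part while the remaining four are used for fitting; the validation set is left untouched until the final evaluation.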
Xgboost algorithm
eXtreme Gradient Boosting (Xgboost) is a machine learning technique that applies gradient boosting to an ensemble of regression trees [28]. Xgboost has been widely recognized in the machine learning literature [29,30,31], in data mining challenges, and in disease outcome prediction. With tuned hyper-parameters, xgboost combines weak prediction models into an accurate classifier built on the most predictive features. Additionally, xgboost handles missing clinical values effectively, which are common in routine clinical work [14].
Model assessment
Model performance was evaluated by predictive accuracy (ACC) for individual outcomes (discriminating ability), sensitivity (SEN), specificity (SPC), and the area under the receiver operating characteristic curve (AUC). The Youden index (i.e., sensitivity + specificity − 1) was used to identify the optimal cutoff value in the training and validation cohorts, because sensitivity and specificity were considered equally important for AP. Patients were stratified into a high-risk group and a low-risk group based on the optimal cutoff value. We also used the AUC, sensitivity and specificity to compare the accuracy of the different models and risk scores (i.e., RANSON, SIRS). The DeLong test was used to compare the AUCs of different models.
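The cutoff selection by the Youden index can be sketched as a search over candidate thresholds. The Python below is a minimal illustration, not the study's code; it assumes outcomes coded 0/1, classifies a patient as high risk when the predicted probability is at or above the cutoff, and returns the lowest cutoff in case of ties.

```python
def youden_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")

    best_cutoff, best_j = None, -1.0
    # Every distinct predicted probability is a candidate cutoff.
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

Patients with a predicted probability at or above the returned cutoff would then form the high-risk group and the remainder the low-risk group.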
Decision curve analysis was used to evaluate the standardized net benefit across the probability thresholds used to categorize observations as 'high risk'. Decision curve analysis incorporates consequences and therefore informs the decision of whether to use a model at all, or which of several models is optimal [32]. In the decision curve, the x-axis represents the threshold probability and the y-axis the net benefit. The net benefit was calculated by summing the benefits (true-positive results) and subtracting the harms (false-positive results), with the harms weighted by the relative harm of a false-positive versus a false-negative result. The R package ‘rmda’ was used to conduct the decision curve analysis.
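The net benefit calculation described above can be written out explicitly. The sketch below uses the standard decision-curve formula, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the threshold probability, and assumes that the standardized net benefit is the net benefit divided by the outcome prevalence. It is an illustrative Python sketch with hypothetical function names and does not reproduce the ‘rmda’ implementation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose probability is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true, y_prob, threshold):
    """Net benefit divided by outcome prevalence (assumed standardization)."""
    prevalence = sum(1 for y in y_true if y == 1) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence
```

Evaluating these functions over a grid of thresholds and plotting the result against the threshold yields the decision curve for a model.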
Statistical analysis
Continuous variables are reported as mean (SD), or as median with interquartile range (IQR) for variables with skewed distributions, and were compared using an unpaired, 2-tailed t-test or the Mann–Whitney U test. Categorical variables are reported as whole numbers and proportions (n [%]) and were compared using the χ2 test or Fisher exact test. The Shapiro–Wilk test was used to assess normality. Variables with less than 20% missing values were considered for imputation. Unknown (NA) values were filled in using the k-nearest neighbors method: for each NA value, the k most similar cases were identified and their values were used to fill in the unknown. The NA values were filled using the R package ‘DMwR’. Continuous predictors (i.e., age [33], obesity [26]) were categorized according to previous research before analysis, and APACHE II [33], RANSON [34], SIRS [35] and NEWS [36] were used as categorical variables. The different risk scores were compared using multivariate analysis and visualized with a forest plot, using the R package ‘forestplot’.
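The k-nearest neighbors imputation described above can be illustrated with a simplified Python sketch. It is not the ‘DMwR’ implementation: it assumes numeric features that are already on comparable scales, measures Euclidean distance over the features observed in both cases, and fills each missing value with the unweighted mean of the k nearest donors. Missing values are represented by `None`, and the function name is hypothetical.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of `rows` with None entries filled from the k nearest rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # A donor must be a different case with this column observed.
                if j == i or other[col] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[col]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = [v for _, v in candidates[:k]]
                filled[i][col] = sum(nearest) / len(nearest)
    return filled
```

Distances are always computed from the original rows, so a value imputed for one case never influences the imputation of another.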
In all data analyses, P < 0.05 was considered statistically significant. Odds ratios (ORs) were reported with their 95% confidence intervals (95% CIs) to evaluate the effect size of important clinical factors. All analyses were performed using R software (version 4.0.4, http://www.r-project.org).
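An odds ratio and its 95% confidence interval can be derived from a logistic regression coefficient and its standard error. The Python sketch below assumes a Wald-type interval, exp(β ± 1.96 × SE); the text does not state which interval method was used, so this is an illustration rather than the study's procedure, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic coefficient and its standard error.

    Uses a Wald interval on the log-odds scale (assumption), then exponentiates.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the matching significance level.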