Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2021): medical informatics and decision making
Comparison between machine learning methods for mortality prediction for sepsis patients with different social determinants
BMC Medical Informatics and Decision Making volume 22, Article number: 156 (2022)
Sepsis is one of the most life-threatening conditions for critically ill patients in the United States, yet its diagnosis remains challenging because standardized criteria for sepsis identification are still under development. Disparities in the social determinants of sepsis patients can interfere with the performance of machine learning risk prediction.
We analyzed a cohort of critical care patients from the Medical Information Mart for Intensive Care (MIMIC)-III database. Disparities in social determinants, including race, sex, marital status, insurance type, and language, among patients identified by six available sepsis criteria were revealed by forest plots with 95% confidence intervals. Sepsis patients were then identified by the Sepsis-3 criteria. Sixteen machine learning classifiers were trained to predict in-hospital mortality for sepsis patients on a training set constructed by random selection. Performance was measured by the area under the receiver operating characteristic curve (AUC). The trained models were tested on the entire randomly constructed test set and on each sub-population defined by each of the following social determinants: race, sex, marital status, insurance type, and language. The fluctuations in performance were further examined by permutation tests.
We analyzed a total of 11,791 critical care patients from the MIMIC-III database. Within the population identified by each sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. On the 5783 sepsis patients identified by the Sepsis-3 criteria, statistically significant performance decreases for mortality prediction were observed when applying the trained machine learning models to Asian and Hispanic patients, as well as to Spanish-speaking patients. With pairwise comparison, we detected performance discrepancies in mortality prediction between Asian and White patients, between Asian patients and patients of other races, and between English-speaking and Spanish-speaking patients.
Disparities in the proportions of patients identified by various sepsis criteria were detected among the different social determinant groups. The performance of mortality prediction for sepsis patients can be compromised when a universally trained model is applied to each sub-population. To achieve accurate diagnosis, a versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.
Sepsis, one of the most life-threatening conditions for critically ill patients in the United States, is the culmination of complex interactions between the infecting microorganism and the host immune, inflammatory, and coagulation responses [1, 2]. Each year, more than 1.7 million adults in the United States develop sepsis, and approximately 270,000 die because of it. The prevalence of sepsis is around one-third among hospitalized patients. Although a few identification methods are currently available, standardized criteria are still under development.
Disparities in critical care can arise from multifactorial causes [5,6,7,8]. Biases are observed in healthcare for patients from different social status groups [9, 10]. With more data-driven methods and artificial intelligence (AI) involved in healthcare, disparities among sub-populations are observed more frequently and have attracted more attention [11,12,13,14,15]. Machine learning applications for risk prediction in healthcare are becoming more powerful with the development of electronic health records (EHRs) [16,17,18,19,20]. Risk predictions for sepsis patients using machine learning techniques have been studied [21,22,23,24]. However, how disparities and biases interact with risk prediction models for sepsis patients remains unclear. In this study, we revealed the disparities in the proportions of sepsis in sub-populations of social determinant groups from a cohort of patients admitted for critical care services, and we examined the fluctuations in the performance of mortality prediction for sub-populations of sepsis patients when using machine learning classifiers.
Medical Information Mart for Intensive Care (MIMIC)-III v1.4 is an open-source, large-scale database of critical care patients with rich features. From a total of 23,620 intensive care unit (ICU) admission records, 11,791 patients with their initial admission records were identified and used in this study. Selection criteria were applied to filter out non-adults, patients with suspected infection more than 24 h before or more than 24 h after ICU admission, patients with missing data, and patients admitted for cardiothoracic surgery services. The data selection algorithms were elaborated in a previous study.
Five social determinants were studied: race, sex, insurance type, marital status, and language. Race was re-leveled into five categories: Asian, Black or African American, Hispanic or Latino, White, and other, where the “other” category covers American Indian and Alaska Native, Native Hawaiian or other Pacific Islander, multi-race, unspecified race, and other races not mentioned above. Sex was treated as dichotomous (female and male). Insurance types were taken directly from the MIMIC-III database and include government, Medicaid, Medicare, private, and self-pay. Marital status was re-factorized into the following categories: significant other, single, separated, widowed, and unknown. The “significant other” category covers records in which life partner or married was indicated in the MIMIC-III database; the “separated” category covers records in which divorced or separated was indicated; and the “unknown” category covers records in which unknown (default) was indicated, which was coded for patients who did not specify their marital status. Languages were re-grouped into English, Spanish, and other, where the “other” category covers any language documented in the database other than those two.
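The re-grouping described above amounts to a lookup from raw labels to study categories. The following is a minimal sketch for marital status and language, assuming pandas; the raw label strings in `MARITAL_MAP` and `LANGUAGE_MAP` are hypothetical placeholders and may not match the strings stored in MIMIC-III, and the fallback behavior for unrecognized labels is likewise an assumption rather than the authors' documented procedure.

```python
import pandas as pd

# Hypothetical raw labels; the exact strings stored in MIMIC-III may differ.
MARITAL_MAP = {
    "MARRIED": "significant other",
    "LIFE PARTNER": "significant other",
    "DIVORCED": "separated",
    "SEPARATED": "separated",
    "SINGLE": "single",
    "WIDOWED": "widowed",
    "UNKNOWN (DEFAULT)": "unknown",
}

# Hypothetical language codes for the two languages kept as separate groups.
LANGUAGE_MAP = {"ENGL": "English", "SPAN": "Spanish"}


def recode_marital_status(raw: pd.Series) -> pd.Series:
    """Collapse raw marital-status labels into the five study categories.

    Labels that are missing or not in MARITAL_MAP fall back to "unknown"
    (an assumption made for this sketch).
    """
    return raw.str.upper().map(MARITAL_MAP).fillna("unknown")


def recode_language(raw: pd.Series) -> pd.Series:
    """Keep English and Spanish; every other documented language is "other"."""
    return raw.str.upper().map(LANGUAGE_MAP).fillna("other")
```

Keeping the mapping in a dictionary makes the grouping decisions explicit and easy to audit, which matters when the groups themselves are the object of a disparity analysis.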
Disparities in social determinants across various sepsis criteria
We compared the disparities among the sub-categories of each social determinant in the sepsis populations detected by six identification methods for sepsis: (1) explicit criteria: two codes explicitly mentioning sepsis (995.92 and 785.52 for severe sepsis and septic shock, respectively) defined by the International Classification of Diseases, 9th version (ICD-9); (2) the Angus methodology; (3) the Martin methodology; (4) the criteria presented by the Centers for Medicare & Medicaid Services (CMS); (5) the complete surveillance criteria presented by the Centers for Disease Control and Prevention (CDC); (6) Sepsis-3. Forest plots were generated for the proportion of each sub-population that was identified as sepsis by each method. For example, a proportion of 0.274 for Asian and “Angus” means that 27.4% of the Asian patients in the dataset were identified as sepsis by the Angus criteria. A 95% confidence interval was constructed by bootstrapping (1000 simulations) and shown in the forest plots for each proportion.
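The bootstrapped interval for a single proportion can be sketched as follows. This is an illustration rather than the authors' code: it assumes a percentile bootstrap (the text does not state which bootstrap variant was used), and the function name `bootstrap_proportion_ci` and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_proportion_ci(flags, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a proportion.

    flags  : 0/1 indicators, one per patient in the sub-population
             (1 = identified as sepsis by the criterion of interest).
    n_boot : number of bootstrap resamples (1000 in the text).
    Returns (observed proportion, lower bound, upper bound).
    """
    flags = np.asarray(flags, dtype=float)
    rng = np.random.default_rng(seed)
    n = flags.size
    # Resample patients with replacement and recompute the proportion each time.
    boot_props = np.array(
        [rng.choice(flags, size=n, replace=True).mean() for _ in range(n_boot)]
    )
    lower, upper = np.quantile(boot_props, [alpha / 2, 1 - alpha / 2])
    return flags.mean(), lower, upper
```

For instance, passing 1000 indicators of which 274 are ones returns an observed proportion of 0.274 together with its interval; each such triple corresponds to one row of a forest plot.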
Mortality prediction for sepsis patients using machine learning
We built machine learning classifiers to predict mortality for sepsis patients. The sepsis patient population was constructed using the Sepsis-3 identification method, since it is the latest and most conservative among the six methods discussed. The entire cohort of patients was split into training and testing sets in a 7:3 ratio. Sixteen machine learning configurations were built and trained to predict in-hospital mortality for the sepsis patients, including Ridge classifier, perceptron, passive-aggressive classifier, k-nearest neighbors (kNN), random forest, support vector machine with linear kernel (linearSVC) and L1 or L2 regularization, support vector machine with linear kernel and L2 regularization, stochastic gradient descent (SGD) classifier with L1, L2, or elastic net regularization, multinomial naïve Bayes, Bernoulli naïve Bayes, logistic regression, and support vector machine (SVM) with rbf, polynomial, or sigmoid kernel. The sequential organ failure assessment (SOFA) score during the first 24 h of admission, the systemic inflammatory response syndrome (SIRS) score during the first 24 h of admission, and age were employed as features. Before training, each feature was scaled to the range 0 to 1 to avoid the impact of different magnitudes. Five-fold cross-validation was employed to find the optimal hyper-parameters for each machine learning configuration. The best-suited threshold for each classifier was set according to Youden’s J statistic. The performance of the machine learning configurations was measured by the area under the receiver operating characteristic curve (AUC).
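The training procedure above can be sketched with scikit-learn for one of the sixteen configurations (logistic regression). This is a minimal illustration under stated assumptions, not the authors' implementation: the hyper-parameter grid, the choice to pick the Youden threshold on the training set, and the helper names `make_synthetic_cohort`, `youden_threshold`, and `fit_and_evaluate` are all hypothetical, and the data are synthetic stand-ins for SOFA, SIRS, and age.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_cohort(n=600, seed=0):
    """Synthetic stand-in for the three features (SOFA, SIRS, age) and outcome."""
    rng = np.random.default_rng(seed)
    sofa = rng.integers(0, 25, n)
    sirs = rng.integers(0, 5, n)
    age = rng.integers(18, 90, n)
    X = np.column_stack([sofa, sirs, age]).astype(float)
    logit = 0.25 * sofa + 0.02 * age - 4.0
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y


def youden_threshold(y_true, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return thresholds[np.argmax(tpr - fpr)]


def fit_and_evaluate(X, y, seed=0):
    # 7:3 random split, as described in the text.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # Scaling lives inside the pipeline so each CV fold is scaled on its own
    # training portion only.
    pipe = Pipeline(
        [("scale", MinMaxScaler()), ("clf", LogisticRegression(max_iter=1000))]
    )
    # Hypothetical grid; the grids actually searched are not given in the text.
    search = GridSearchCV(
        pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="roc_auc"
    )
    search.fit(X_tr, y_tr)
    # Assumption: the threshold is chosen on training-set scores.
    threshold = youden_threshold(y_tr, search.predict_proba(X_tr)[:, 1])
    test_scores = search.predict_proba(X_te)[:, 1]
    return {
        "model": search.best_estimator_,
        "threshold": threshold,
        "auc": roc_auc_score(y_te, test_scores),
        "y_test": y_te,
        "test_scores": test_scores,
    }
```

Calling `fit_and_evaluate(*make_synthetic_cohort())` returns the fitted pipeline, the selected threshold, and the test AUC. Classifiers without `predict_proba`, such as the Ridge classifier or linearSVC, would use `decision_function` scores instead.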
Statistical analysis for disparities in performances on sub-populations of social determinants
The training procedures were carried out on the entire training set, after which the trained configurations and the evaluation metrics on the entire cohort were saved. In the next step, we tested the performance on every sub-population of each of the five social determinants. To detect disparities in performance, we compared the AUCs on the entire cohort with those on the sub-populations by permutation tests (1000 times). A one-tailed permutation test was employed to determine whether the decrease or increase in performance was statistically significant when testing on sub-groups of patients. To further illustrate the disparities, we conducted pairwise permutation tests (1000 times) between each pair of sub-populations. A two-tailed permutation test was used to show whether there were significant disparities in performance within each pair. The entire workflow can be found in Fig. 1.
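One way to realize the pairwise comparison is a label-shuffling permutation test on the AUC difference, sketched below. The exact permutation scheme is not specified in the text, so this is an assumption: patients from the two groups are pooled, group membership is shuffled while group sizes are kept fixed, and the AUC difference is recomputed each time. The function name `auc_permutation_test` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_permutation_test(y_a, s_a, y_b, s_b, n_perm=1000,
                         alternative="two-sided", seed=0):
    """Permutation test for AUC(group a) - AUC(group b).

    y_*, s_* : true labels and classifier scores for each group.
    alternative : "two-sided", "less" (a performs worse), or "greater".
    Returns (observed AUC difference, p-value).
    """
    y = np.concatenate([y_a, y_b])
    s = np.concatenate([s_a, s_b])
    n_a = len(y_a)
    observed = roc_auc_score(y_a, s_a) - roc_auc_score(y_b, s_b)

    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        ia, ib = perm[:n_a], perm[n_a:]
        # AUC is undefined when a pseudo-group contains a single class; skip it.
        if len(np.unique(y[ia])) < 2 or len(np.unique(y[ib])) < 2:
            continue
        null.append(roc_auc_score(y[ia], s[ia]) - roc_auc_score(y[ib], s[ib]))
    null = np.asarray(null)

    if alternative == "two-sided":
        extreme = np.abs(null) >= abs(observed)
    elif alternative == "less":
        extreme = null <= observed
    elif alternative == "greater":
        extreme = null >= observed
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme.sum() + 1) / (null.size + 1)
    return observed, p_value
```

With `alternative="two-sided"` this corresponds to the pairwise comparison between two sub-populations. A one-tailed comparison of a sub-population against the entire test set could reuse the function with `alternative="less"` or `"greater"`, with the caveat that the sub-population is contained in the full set, so treating the two as separate samples is only an approximation.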
The analysis was conducted using Python 3.6.8. Machine learning classifiers, cross-validation, and evaluation metrics were implemented with scikit-learn 0.23.2.
Disparities in social determinants across various sepsis criteria
Forest plots for the disparities in social determinants across various sepsis criteria are shown in Fig. 2. The proportions of sepsis patients identified by different methods showed significant discrepancies, with the Sepsis-3 criteria being the most conservative. Within the population identified by the same sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. Numeric values of the proportions and 95% confidence intervals can be found in Table S1 in Additional file 1.
Mortality prediction for sepsis patients using machine learning
In total, 5783 patients were identified as sepsis by the Sepsis-3 criteria. Statistics of this cohort of sepsis patients are shown in Table 1. The detailed testing performances on the entire testing set are shown in Table 2.
We compared the performance (AUC) of each of the sixteen classifiers on the entire testing set and on every sub-population by permutation tests. Significant results at a significance level of 0.05 were found for race and language. The observed differences and the corresponding p-values from the permutation tests are shown in Tables 3 and 4. Among all the racial groups, we observed significant decreases in the performance of most of the classifiers for Asian and Hispanic patients (Table 3). Interestingly, significant performance drops were also observed when applying the classifiers to the group of patients who speak Spanish (Table 4). Results for the social determinants associated with very few to no significant findings are given in Tables S2-S4 in Additional file 1. To further illustrate the disparities, we show the pairwise comparison results in Tables 5 and 6. Among all the pairs of racial groups, discrepancies were observed between Asian and White patients, as well as between Asian patients and patients of other races, in most of the classifiers. Significant differences were also observed between Asian and Black sepsis patients in a few classifiers. The disparities between patients speaking various languages were mainly detected between English-speaking and Spanish-speaking patients. The pairwise comparison results with very few to no significant findings are given in Tables S5-S7 in Additional file 1.
Currently, a “gold standard” for sepsis diagnosis is still absent. Among the available criteria, we observed different sensitivities in identifying patients. Meanwhile, we observed disparities in the proportions of the population identified by each criterion across various social determinant groups. This raises the concern that a universal diagnostic system might not work equally well on each sub-population. By systematically examining the discrepancies, we hope to provide evidence for a more versatile detection system that takes the disparities in social determinants into consideration. In this study, we excluded patients with missing data and performed a complete case analysis. In future studies, we plan to apply advanced missing data imputation techniques [33,34,35] to relax this exclusion criterion and investigate the potential links between missing data and social determinants of health.
The discrepancies among sub-populations of social determinant groups hinder the performance of a machine learning model trained on the entire population. Previous retrospective studies revealed racial disparities and regional disparities in sepsis-related mortality. Prediction of mortality using machine learning has been well discussed in recent years. However, most effort has been devoted to improving overall performance on the entire given population, whereas the fairness of applying trained machine learning algorithms to various groups of patients has been discussed less. Patients naturally differ in social status, and it is essential not to underestimate such discrepancies. In this study, we tested the performance fluctuations when applying the same trained model to patients from each social determinant group and revealed statistically significant shifts in performance. Even when the overall performance of a given classifier is decent, it should be kept in mind that some sub-populations do not benefit from the model as much as others. We hope such evidence provides a perspective on the impact of social determinants not only for the medical community working diligently towards fairer diagnostic methods but also for artificial intelligence researchers trying to bring predictive algorithms one step closer to clinical readiness. Additionally, in future studies, we will take the interactions between features into consideration for a more thorough perspective.
Disparities in social determinants were observed in the groups of sepsis patients identified by various currently available diagnostic criteria. The performance of risk prediction tasks for sepsis patients can be compromised when a universally trained model is applied to each sub-population. To achieve more accurate identification, a more versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the freely accessible database MIMIC-III through PhysioNet (https://physionet.org/content/mimiciii-demo/1.4/).
Abbreviations
MIMIC-III: Medical Information Mart for Intensive Care-III
AUC: Area under the receiver operating characteristic curve
EHR: Electronic health record
ICU: Intensive care unit
ICD-9: International Classification of Diseases, 9th version
CMS: Centers for Medicare & Medicaid Services
CDC: Centers for Disease Control and Prevention
linearSVC: Support vector machine with linear kernel
SGD: Stochastic gradient descent
SVM: Support vector machine
SOFA: Sequential organ failure assessment
SIRS: Systemic inflammatory response syndrome
Hotchkiss RS, Karl IE. The pathophysiology and treatment of sepsis. N Engl J Med. 2003;348(2):138–50.
Russell JA. Management of sepsis. N Engl J Med. 2006;355(16):1699–713.
Novosad SA, Sapiano MR, Grigg C, Lake J, Robyn M, Dumyati G, Felsen C, Blog D, Dufort E, Zansky S. Vital signs: epidemiology of sepsis: prevalence of health care factors and opportunities for prevention. Morb Mortal Wkly Rep. 2016;65(33):864–9.
Johnson AE, Aboab J, Raffa JD, Pollard TJ, Deliberato RO, Celi LA, Stone DJ. A comparative analysis of sepsis identification methods in an electronic database. Crit Care Med. 2018;46(4):494.
Kent JA, Patel V, Varela NA. Gender disparities in health care. Mount Sinai J Med: J Transl Personal Med. 2012;79(5):555–9.
Orlovic M, Smith K, Mossialos E. Racial and ethnic differences in end-of-life care in the United States: Evidence from the Health and Retirement Study (HRS). SSM-Popul Health. 2019;7: 100331.
Quindemil K, Nagl-Cupal M, Anderson KH, Mayer H. Migrant and minority family members in the intensive care unit. A review of the literature. HeilberufeSCIENCE. 2013;4(4):128–35.
Soto GJ, Martin GS, Gong MN. Healthcare disparities in critical illness. Crit Care Med. 2013;41(12):2784.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53.
Wiens J, Price WN, Sjoding MW. Diagnosing bias in data-driven algorithms for healthcare. Nat Med. 2020;26(1):25–6.
Ahmad MA, Patel A, Eckert C, Kumar V, Teredesai A. Fairness in machine learning for healthcare. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2020. p. 3529–30.
Chen IY, Szolovits P, Ghassemi M. Can AI help reduce disparities in general medical and mental health care? AMA J Ethics. 2019;21(2):167–79.
Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics. 2020;46(3):205–11.
Wang H, Li Y, Ning H, Wilkins J, Lloyd-Jones D, Luo Y. Using machine learning to integrate sociobehavioral factors in predicting cardiovascular-related mortality risk. Stud Health Technol Inform. 2019;264:433–7.
Bhavani SV, Luo Y, Miller WD, Sanchez-Pinto LN, Han X, Mao C, Sandıkçı B, Peek ME, Coopersmith CM, Michelson KN. Simulation of ventilator allocation in critically ill patients with COVID-19. Am J Respir Crit Care Med. 2021;204(10):1224–7.
Ahmad MA, Eckert C, Teredesai A. Interpretable machine learning in healthcare. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018. p. 559–60.
Callahan A, Shah NH. Machine learning in healthcare. In: Key advances in clinical informatics. Elsevier; 2017: 279–291.
Chen M, Hao Y, Hwang K, Wang L, Wang L. Disease prediction by machine learning over big data from healthcare communities. IEEE Access. 2017;5:8869–79.
Luo Y, Xin Y, Joshi R, Celi L, Szolovits P. Predicting ICU mortality risk by grouping temporal trends from a multivariate panel of physiologic measurements. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence; 2016. p. 42–50.
Sanchez-Pinto N, Stroup E, Pendergrast T, Pinto N, Luo Y. Derivation and validation of novel phenotypes of multiple organ dysfunction syndrome in critically ill children. JAMA Netw Open. 2020;3(8):e209271–e209271.
Scott H, Colborn K. Machine learning for predicting sepsis in-hospital mortality: an important start. Acad Emerg Med. 2016;23(11):1307–1307.
Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, Hall MK. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data–driven, machine learning approach. Acad Emerg Med. 2016;23(3):269–78.
Kong G, Lin K, Hu Y. Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU. BMC Med Inform Decis Mak. 2020;20(1):1–10.
Ding M, Luo Y. Unsupervised phenotyping of sepsis using nonnegative matrix factorization of temporal trends from a multivariate panel of physiological measurements. BMC Med Inform Decis Mak. 2021;21(5):1–15.
Johnson AE, Pollard TJ, Shen L, Li-Wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1–9.
Angus DC, Linde-Zwirble WT, Lidicker J, Clermont G, Carcillo J, Pinsky MR. Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med. 2001;29(7):1303–10.
Martin GS, Mannino DM, Eaton S, Moss M. The epidemiology of sepsis in the United States from 1979 through 2000. N Engl J Med. 2003;348(16):1546–54.
Centers for Medicare & Medicaid Services. Implementation of severe sepsis and septic shock: management bundle measure (NQF# 0500). In: National Quality Forum; 2012.
Seymour CW, Coopersmith CM, Deutschman CS, Gesten F, Klompas M, Levy M, Martin GS, Osborn TM, Rhee C, Warren D. Application of a framework to assess the usefulness of alternative sepsis criteria. Crit Care Med. 2016;44(3): e122.
Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, Rubenfeld G, Kahn JM, Shankar-Hari M, Singer M. Assessment of clinical criteria for sepsis: for the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):762–74.
Vincent J-L, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, Reinhart C, Suter P, Thijs LG. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. New York: Springer-Verlag; 1996.
Rangel-Frausto MS, Pittet D, Costigan M, Hwang T, Davis CS, Wenzel RP. The natural history of the systemic inflammatory response syndrome (SIRS): a prospective study. JAMA. 1995;273(2):117–23.
Luo Y. Evaluating the state of the art in missing data imputation for clinical data. Brief Bioinform. 2022;23(1):bbab489.
Luo Y, Szolovits P, Dighe AS, Baron JM. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data. J Am Med Inform Assoc (JAMIA). 2017;25(6):645–53.
Cao W, Wang D, Li J, Zhou H, Li L, Li Y. BRITS: bidirectional recurrent imputation for time series. arXiv preprint arXiv:1805.10572. 2018.
Jones JM, Fingar KR, Miller MA, Coffey R, Barrett M, Flottemesch T, Heslin KC, Gray DT, Moy E. Racial disparities in sepsis-related in-hospital mortality: using a broad case capture method and multivariate controls for clinical and hospital variables, 2004–2013. Crit Care Med. 2017;45(12):e1209–17.
Ogundipe F, Kodadhala V, Ogundipe T, Mehari A, Gillum R. Disparities in sepsis mortality by region, urbanization, and race in the USA: a multiple cause of death analysis. J Racial Ethn Health Disparities. 2019;6(3):546–51.
All those who meaningfully contributed to the manuscript are listed as authors.
About this Supplement
This article has been published as part of BMC Medical Informatics and Decision Making Volume 22 Supplement 2, 2022: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2021): medical informatics and decision making. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-22-supplement-2.
Dr. Luo reports funding from R01 LM013337. Hanyin Wang reports funding from UL1 TR001422. Dr. Naidech reports funding from R01 NS110779 and U01 NS110772. The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. Publication costs are funded by Northwestern University.
Ethics approval and consent to participate
Consent for publication
Authors have no competing interests except for receiving funding from NIH.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1. Supplementary tables for proportions of subpopulations, observed differences for sex, marital status, and insurance type groups, and pairwise comparisons among sex, marital status, and insurance type groups.
Cite this article
Wang, H., Li, Y., Naidech, A. et al. Comparison between machine learning methods for mortality prediction for sepsis patients with different social determinants. BMC Med Inform Decis Mak 22 (Suppl 2), 156 (2022). https://doi.org/10.1186/s12911-022-01871-0