Machine learning prediction for COVID-19 disease severity at hospital admission
BMC Medical Informatics and Decision Making volume 23, Article number: 46 (2023)
Early prognostication of patients hospitalized with COVID-19 who may require mechanical ventilation and have worse outcomes within 30 days of admission is useful for delivering appropriate clinical care and optimizing resource allocation.
To develop machine learning models to predict COVID-19 severity at the time of hospital admission based on data from a single institution.
Design, setting, and participants
We established a retrospective cohort of patients with COVID-19 from the University of Texas Southwestern Medical Center from May 2020 to March 2022. Easily accessible objective markers, including basic laboratory variables and initial respiratory status, were assessed using Random Forest’s feature importance score to create a predictive risk score. Twenty-five significant variables were identified for use in classification models. The best predictive models were selected with repeated tenfold cross-validation.
Main outcomes and measures
Among patients with COVID-19 admitted to the hospital, severity was defined by 30-day mortality (30DM) rates and need for mechanical ventilation.
This was a large, single-institution COVID-19 cohort including a total of 1795 patients. The average age was 59.7 years, and the cohort was demographically diverse. A total of 236 patients (13%) required mechanical ventilation and 156 patients (8.6%) died within 30 days of hospitalization. The predictive accuracy of each model was validated with tenfold cross-validation (10-CV). The Random Forest classifier for the 30DM model had 192 sub-trees and obtained 0.72 sensitivity, 0.78 specificity, and 0.82 AUC. The model used to predict mechanical ventilation (MV) had 64 sub-trees and obtained 0.75 sensitivity, 0.75 specificity, and 0.81 AUC. Our scoring tool can be accessed at https://faculty.tamuc.edu/mmete/covid-risk.html.
Conclusions and relevance
In this study, we developed a risk score based on objective variables of COVID-19 patients within six hours of admission to the hospital, therefore helping predict a patient's risk of developing critical illness secondary to COVID-19.
The COVID-19 pandemic began as an outbreak of the SARS-CoV-2 virus in Wuhan, China, in December 2019. As of July 2022, there have been over 547,000,000 confirmed cases of COVID-19 worldwide. The illness manifests itself variably, ranging from mild viral symptoms, including fever, cough, congestion, and sore throat, to life-threatening illness defined by sepsis, respiratory failure, venous thromboembolism (VTE), shock, and death. As COVID-19 is likely to continue affecting populations across the world, with novel virulent strains regularly emerging, it is critical to determine which patients are at risk for the more severe manifestations. While developed nations are more prepared for larger spikes associated with these emerging new variants, developing countries may still be at risk for shortages of resources, such as hospital and ICU beds. These countries would uniquely benefit from newer, more robust COVID-19 scoring tools to best allocate their limited resources.
Artificial intelligence (AI) models have been developed to assist with COVID-19 risk stratification using electronic health records (EHR) and laboratory results. These models use various types of data, such as demographic information, disease history, laboratory results, and clinical symptoms, to predict the likelihood of a patient developing severe COVID-19. In a retrospective study with 3988 patients, Grasselli et al. reported that the survival rate of critically ill patients with COVID-19, particularly older men who require noninvasive mechanical ventilation and have preexisting comorbidities, is low. Hypertension was the most common comorbidity among patients, and those with hypertension had a significantly lower survival rate. Another study investigated 4997 patients and performed a retrospective review of medical records of demographics, comorbidities, and laboratory tests at the initial presentation of patients to develop a prediction model and risk scores of ICU admission and mortality in COVID-19. Similar to our research, they set ICU admission and death as the primary outcomes. The top five predictors reported were lactate dehydrogenase (LDH), procalcitonin, smoking history, oxygen saturation (SpO2), and lymphocyte count. Initially, lab values such as C-reactive protein (CRP) or D-dimer aided clinical decision-making in conjunction with clinical findings. The novel marker of immature platelet fraction % (IPF%) has been shown to be a predictor of clinical outcomes in COVID-19 [5, 6]. In our prior study, IPF% was predictive of hospital length of stay and intensive care unit (ICU) admission, the two crucial outcomes needed to help determine resource allocation. Since then, a variety of COVID-19 scoring tools have supplanted the original, more limited methods. These earlier scoring tools have a couple of limitations. First, many are biased toward high-risk patients. Second, different virus variants, such as delta and omicron, were not integrated in the models.
The development of newer COVID-19 scoring tools should consider novel evidence and objective biomarkers of disease. Furthermore, an ideal scoring system should use standard admission laboratory values to be feasible in any clinical setting. This study aims to utilize an artificial intelligence (AI) method to predict severity of illness of COVID-19 patients by using initially obtained laboratory values, helping clinicians to identify patients at risk for disease progression, morbidity, and mortality.
This study describes SARS-CoV-2-infected patients evaluated at the University of Texas Southwestern Medical Center (UTSW) between May 2020 and March 2022. Patients were enrolled from the institutional COVID-19 patient registry, a registry of COVID-19 patients approved by the local institutional review board and designed to study the natural history of the disease. The current study included 1,795 adult patients with SARS-CoV-2 infection (capturing all major variants seen in the state of Texas) who also had IPF% measurements (Table 1).
Two predictive models were designed: one for 30-day mortality (30DM) (n = 156) and one for mechanical ventilation (n = 236). For these models, 120 demographic and pretreatment variables were collected. From this group, 25 variables (Table 2) were selected using Random Forest’s feature importance score. Selecting the optimal subset of 25 variables for the best-performing prediction model would require an exhaustive search, which is computationally prohibitive. Instead, Random Forest (RF) algorithms were run repeatedly with different random settings and parameter sets so that significant variables rank higher on average over this iterative process. The domain experts approved the list of variables for the prediction task based on (1) clinical relevance when deciding mortality and mechanical ventilation risks, and (2) availability at emergent admission (hence, practical use in an early decision-making tool). Using weighted averages from feature importance rankings, we used a unified list of variables in both predictive models. Note that experimental accuracies using separate lists of variables for each model do not improve significantly.
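The repeated ranking described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and a synthetic dataset in place of the registry; the helper name `rank_features_by_mean_importance`, the number of runs, and the randomized settings are hypothetical choices, not the study's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_features_by_mean_importance(X, y, n_runs=5, n_keep=25):
    """Average RF feature importances over several randomized fits."""
    rng = np.random.RandomState(0)
    importances = np.zeros(X.shape[1])
    for _ in range(n_runs):
        forest = RandomForestClassifier(
            n_estimators=int(rng.choice([64, 96, 128])),  # settings vary per run
            max_depth=int(rng.randint(3, 10)),            # depth drawn from 3..9
            class_weight="balanced",
            random_state=int(rng.randint(0, 10_000)),
        )
        forest.fit(X, y)
        importances += forest.feature_importances_
    importances /= n_runs
    order = np.argsort(importances)[::-1]  # highest mean importance first
    return order[:n_keep], importances


# Synthetic stand-in (assumption): 120 candidate variables, about 9% positives.
X, y = make_classification(
    n_samples=600, n_features=120, n_informative=10,
    weights=[0.91], random_state=0,
)
selected, mean_importance = rank_features_by_mean_importance(X, y)
print(selected)
```

Averaging over several fits reduces the chance that a variable ranks highly only because of one particular random seed or parameter set.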
The prediction task entails correct identification of patients at risk of mortality within 30 days of admission. The significant advantages of an RF model include measurability of variable importance for prediction, handling of a mixture of numerical and categorical variables, and accuracy that is comparable to other prominent methods [9, 10]. Random Forest is an ensemble method that crowdsources predictions from multiple trained decision trees for a more accurate prediction. A decision tree, the constituent machine learning algorithm in an RF framework, produces the probability of a class by hierarchically splitting nodes based on independent variables into buckets of values (e.g., a split could be “Platelet Count” < 145 × 10^9/L) until a leaf node with a class label prediction is reached. The collection of splits used in reaching the leaf node constitutes the rule for a final probability assessment of the outcome variable, the prediction. The choice and order of nodes and splits used in a decision tree lead to variation in the collection of rules, the associated prediction, and the overall performance.
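To make the idea of split rules and ensemble averaging concrete, the toy example below hand-codes two tiny "trees" and averages their outputs. Only the platelet threshold of 145 comes from the text; the other thresholds, the leaf probabilities, and the variable names are hypothetical and carry no clinical meaning. Averaging per-tree probabilities is one common way to combine trees, assumed here for illustration.

```python
def tree_a(patient):
    # Split taken from the text's example: platelet count below 145 x 10^9/L.
    if patient["platelet_count"] < 145:
        # Hypothetical second split on age.
        return 0.70 if patient["age"] >= 65 else 0.40
    return 0.10


def tree_b(patient):
    # Hypothetical split on oxygen saturation.
    if patient["spo2"] < 92:
        return 0.60
    return 0.20 if patient["age"] >= 65 else 0.05


def forest_probability(patient, trees):
    """Average the probabilities returned by each tree's leaf."""
    return sum(tree(patient) for tree in trees) / len(trees)


patient = {"platelet_count": 130, "age": 70, "spo2": 95}
score = forest_probability(patient, [tree_a, tree_b])
print(round(score, 2))  # 0.45: tree_a gives 0.70, tree_b gives 0.20
```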
The RF model was trained with an open-source software kit, Scikit-learn, to identify patients at risk of 30DM and mechanical ventilation. Because training an RF requires first determining the number of iterations (i.e., the number of embedded decision trees), the number of randomly selected variables in each tree, and the depth of the decision trees (e.g., the number of splits), an optimization approach with performance validation is required to produce the best-performing final model. Our final RF model was optimized over a search space of multiple parameters. The search aimed to meet lower bounds on sensitivity (≥ 0.75) and specificity (≥ 0.70). The looser bound on specificity may permit more false positive predictions. However, the opposite error, failing to identify patients at risk, would mean higher mortality or misallocation of ventilation rooms. After an initial parameter tuning, the search was performed by selecting the best-performing model by training RFs with 64 to 256 decision trees (in increments of 32), 10 to 30 predictor variables, 1 to 5 features to consider when looking for the best split, and a tree height of 3 to 9 (in increments of 1), using the Gini index as the splitting criterion. Because the data have a low positive rate (8.6% for 30DM and 13% for ventilated patients), a setting known as unbalanced classification, more weight was assigned to the positive class. In the ensemble step, models with high accuracy were given more weight in deciding the final prediction. The feature importance score was calculated and reported for the final RF model variables.
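A parameter search of this kind might look like the sketch below. It assumes scikit-learn's `GridSearchCV`, maps "tree height" to `max_depth`, uses `class_weight="balanced"` as one way to up-weight the positive class, and scores with balanced accuracy (the mean of sensitivity and specificity). These are illustrative assumptions; the study's actual search procedure and scoring rule are not specified at this level of detail. The data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Ranges quoted in the text. The number of predictor variables (10 to 30)
# belongs to the feature-selection step and is not searched here.
full_grid = {
    "n_estimators": list(range(64, 257, 32)),  # 64, 96, ..., 256
    "max_features": list(range(1, 6)),         # 1 to 5 features per split
    "max_depth": list(range(3, 10)),           # tree height 3 to 9
}

# Reduced grid (assumption) so that the sketch runs in seconds.
small_grid = {"n_estimators": [64], "max_features": [1, 3], "max_depth": [3, 5]}

# Synthetic stand-in: 25 variables, about 13% positives.
X, y = make_classification(
    n_samples=400, n_features=25, n_informative=8,
    weights=[0.87], random_state=0,
)

search = GridSearchCV(
    RandomForestClassifier(criterion="gini", class_weight="balanced", random_state=0),
    param_grid=small_grid,
    scoring="balanced_accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping `small_grid` for `full_grid` covers the full range described in the text at a correspondingly higher computational cost.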
Cross-validation was used to measure the performance of the models created when searching for the parameters of an optimal model. In this study, standard ten-fold cross-validation was employed. Under ten-fold cross-validation, the dataset was divided into ten non-overlapping cohorts, with each cohort having a similar proportion of positive subjects. Based on cross-validated test subjects, the area under the curve (AUC) for the Receiver Operating Characteristic (ROC), or c-statistic, was calculated. Our prediction model assumes that all variables are present for each patient. One exception is made for IPF since it is not a routinely measured lab value. If an IPF value is missing, it is predicted based on the patient's other lab results and demographic data using the K-nearest neighbors method (K = 50) in the Scikit-learn library.
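The validation and imputation steps can be illustrated together. The sketch below assumes scikit-learn's `KNNImputer` as the K-nearest neighbors method (the text does not name the specific class) and places it in a pipeline so that imputation is refit inside each fold. The data are synthetic, and the column treated as IPF is an arbitrary stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 10 variables, about 13% positives.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    weights=[0.87], random_state=0,
)

# Blank out roughly 20% of one column to mimic missing IPF values.
rng = np.random.RandomState(0)
ipf_col = 0  # hypothetical position of the IPF column
X[rng.rand(X.shape[0]) < 0.2, ipf_col] = np.nan

# Fill each gap from the 50 most similar patients (K = 50), then classify.
model = make_pipeline(
    KNNImputer(n_neighbors=50),
    RandomForestClassifier(
        n_estimators=64, max_depth=5, class_weight="balanced", random_state=0
    ),
)

# Stratified folds keep a similar share of positives in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
print(round(auc, 3))
```

Each subject's score comes from a model that never saw that subject during training, so the AUC is computed on held-out predictions only.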
Of a total of 1795 patients, 52.6% were male. The average age was 59.7 years. Of the study cohort, 38.3% were white, 30.7% were black, and 24.0% were Hispanic. Existing hypertension was present in 58.6% of the patients, while 38.2% had diabetes mellitus type 2. Of the 1795 hospitalized patients, 236 (13%) required mechanical ventilation and 156 (8.6%) died within 30 days of hospitalization.
The predictive accuracy of each model was validated with the 10-CV method. The Random Forest classifier for the 30DM model has 192 sub-trees and obtained 0.72 sensitivity, 0.78 specificity, and 0.82 AUC. The model used to predict MV has 64 sub-trees and obtained 0.75 sensitivity, 0.75 specificity, and 0.81 AUC. Tables 1 and 2 summarize importance scores and correlation coefficients for each variable used in predictions. Figures 1 and 2 display AUCs for 30DM and mechanical ventilation predictions, respectively.
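For readers less familiar with these metrics, the short example below shows how sensitivity and specificity follow from classifier scores at a fixed threshold. The labels and scores are made-up toy values, not study data, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels and scores."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1  # at-risk patient correctly flagged
        elif label == 1:
            fn += 1  # at-risk patient missed
        elif predicted_positive:
            fp += 1  # low-risk patient flagged
        else:
            tn += 1  # low-risk patient correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Toy values (assumption): four positive and six negative subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1, 0.45, 0.05]

sens, spec = sensitivity_specificity(y_true, y_score)
print(sens, round(spec, 3))  # 0.75 0.833
```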
We were then able to assign categories to score ranges. For each subject, the classifiers assign a score between 0 and 1 with a positive/negative threshold of 0.5. The higher the score, the more severe the risk. For example, a negative subject with a score of 0.35 is at lower predicted risk than a negative subject with a score of 0.47. Likewise, a subject with 0.75 poses higher risks than another positive subject who received a score of 0.67. We categorized each negative subject (0.0–0.49) as “insignificant risk.” Scores in the positive range of 0.50 to 1.00 were categorized as (a) low risk for 0.50–0.65, (b) moderate risk for 0.66–0.80, and (c) high risk for 0.81–1.00. In cross-validation experiments, the 30-day mortality predictor identified 352 low risk, 122 moderate risk, and 15 high risk subjects. The risk predictor for mechanical ventilation returned 456 low risk, 153 moderate risk, and 24 high risk subjects in the dataset. Our scoring tool can be accessed at https://faculty.tamuc.edu/mmete/covid-risk.html.
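The score-to-category mapping can be written as a small function. This sketch assumes the upper bound of each band is inclusive and that scores falling between the two-decimal band edges (for example 0.655) belong to the next band up; the published ranges do not state how such values are handled, and the function name is hypothetical.

```python
def risk_category(score):
    """Map a classifier score in [0, 1] to the risk bands given in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < 0.50:
        return "insignificant risk"  # negative prediction
    if score <= 0.65:
        return "low risk"
    if score <= 0.80:
        return "moderate risk"
    return "high risk"


for s in (0.35, 0.47, 0.67, 0.75, 0.90):
    print(s, risk_category(s))
```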
Resource allocation has been a major challenge for all healthcare facilities. Laboratory values shown to be predictive of COVID-19 severity have been identified, and the feasibility of using them varies. Existing scoring systems that rely on more subjective markers to predict severity and outcomes in COVID‐19 patients need to be improved to mitigate clinician subjectivity and provide quick severity assessment within six hours of admission.
In our current study of a large, hospitalized cohort of COVID‐19 patients, we have found 25 markers to be effective in predicting COVID-19 severity of illness via 30DM and need for mechanical ventilation by using AI technology.
The advantages of our model include classification of scores into easy-to-understand risk categories. This allows easy application for allocation decisions in situations of resource shortages, when a patient’s severity of illness or predicted outcome determines where said resources must be deployed to prevent poor outcomes. Categories can also be used to determine the nature of post-discharge care and follow-up. There are also potential research applications, as these risk stratification categories can be used to identify patients for focused clinical trials. Further, our prediction models can easily categorize patients from readily available objective variables that can be obtained early in the admission. Our scoring tool does use the novel IPF marker, and although this is not a commonly obtained test, it can be added on to a regular complete blood count in peripheral blood samples. We have also incorporated vaccination status, which earlier scoring tools have not used. Although the AUC score (predictive value) is higher than that of many other COVID-19 scoring models, the advantage of our score is in the inclusion of a larger number of prognosticating markers compared to other tools as well as a larger and more diverse patient database with all major COVID-19 variants. These strengths allow for a greater degree of generalizability to populations within the United States [14, 16].
Our model could be strengthened further with the addition of imaging data, such as chest x-rays or computed tomography scans. Additionally, with vaccination status there are differences in rates of breakthrough infections between commercially available vaccines, and between patients at various stages of their vaccination series (partially vaccinated versus fully vaccinated versus boosted).
By using AI, we identified 25 prognostic markers that were significantly associated with 30DM and mechanical ventilation in hospitalized COVID-19 patients. Our scoring system using these markers showed 0.72 sensitivity, 0.78 specificity, and 0.82 AUC in predicting 30DM, and 0.75 sensitivity, 0.75 specificity, and 0.81 AUC in predicting mechanical ventilation. This score offers a novel way of prognosticating hospitalized COVID-19 patients by risk category in the United States and is therefore helpful for resource allocation and anticipation of the level of care these patients will need.
Availability of data and materials
The data can be made available upon reasonable request from the corresponding author.
Centers for Disease Control and Prevention. COVID Data Tracker. Published March 28, 2020. [Online]. Available: https://covid.cdc.gov/covid-data-tracker/#datatracker-home
Tsai PH, Lai WY, Lin YY. Clinical manifestation and disease progression in COVID-19 infection. J Chin Med Assoc. 2020;84(1):3–8. https://doi.org/10.1097/jcma.0000000000000463.
Grasselli G, et al. Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern Med. 2020;180(10):1345–55. https://doi.org/10.1001/jamainternmed.2020.3539.
Zhao Z, et al. Prediction model and risk scores of ICU admission and mortality in COVID-19. PLoS ONE. 2020. https://doi.org/10.1371/journal.pone.0236618.
Welder D, Jeon-Slaughter H, Ashraf B. Immature platelets as a biomarker for disease severity and mortality in COVID-19 patients. Br J Haematol. 2021;194(3):530–6. https://doi.org/10.1111/bjh.17656.
Lee NCJ, Demir YK, Ashraf B, Ibrahim I, Bat T, Dickerson KE. Immature platelet fraction as a biomarker for disease severity in pediatric respiratory coronavirus disease. J Pediatr. 2019. https://doi.org/10.1016/j.jpeds.2022.07.035.
Petersen E, Ntoumi F, Hui DS. Emergence of new SARS-CoV-2 Variant of Concern Omicron (B.1.1.529)—highlights Africa’s research capabilities, but exposes major knowledge gaps, inequities of vaccine distribution, inadequacies in global COVID-19 response and control efforts. Int J Infect Dis. 2022;114:268–72. https://doi.org/10.1016/j.ijid.2021.11.040.
Ouyang SM, Zhu HQ, Xie YN. Temporal changes in laboratory markers of survivors and non-survivors of adult inpatients with COVID-19. BMC Infect Dis. 2020. https://doi.org/10.1186/s12879-020-05678-0.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324.
Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30. https://doi.org/10.1161/CIRCULATIONAHA.115.001593.
Pedregosa F. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinform Oxf Engl. 2005;21(15):3301–7. https://doi.org/10.1093/bioinformatics/bti499.
Wolff D, Nee S, Hickey NS, Marschollek M. Risk factors for Covid-19 severity and fatality: a structured literature review. Infection. 2021;49(1):15–28. https://doi.org/10.1007/s15010-020-01509-1.
Liang W, Liang H, Ou L. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. 2020;180(8):1081–9. https://doi.org/10.1001/jamainternmed.2020.2033.
Knight SR, Ho A, Pius R. Risk stratification of patients admitted to hospital with COVID-19 using the ISARIC WHO clinical characterisation protocol: development and validation of the 4C Mortality Score. BMJ. 2020. https://doi.org/10.1136/bmj.m3339.
Garibaldi BT, Fiksel J, Muschelli J. Patient trajectories among persons hospitalized for COVID-19: a cohort study. Ann Intern Med. 2021;174(1):33–41. https://doi.org/10.7326/M20-3905.
Ali SA, Bhattacharyya S, Ahmad FN. COVID infections breakthrough post-vaccination: systematic review. J Pharm Bioallied Sci. 2022. https://doi.org/10.4103/jpbs.jpbs_132_22.
Ethical approval and consent to participate
The study was approved by the institution’s institutional review board (IRB) without requirement for consent. Data is available on request from the authors. The study protocol was reviewed and an informed consent waiver was obtained from the Institutional Review Board of the University of Texas Southwestern Medical Center on August 18, 2020 (IRB No: STU-2020-0832). The authors confirm that the study was conducted in accordance with the Declaration of Helsinki. We also confirm that all experimental protocols were approved by the Institutional Review Board of the University of Texas Southwestern Medical Center.
Consent to Publication
The authors declare that they have no competing interests.
Cite this article
Raman, G., Ashraf, B., Demir, Y.K. et al. Machine learning prediction for COVID-19 disease severity at hospital admission. BMC Med Inform Decis Mak 23, 46 (2023). https://doi.org/10.1186/s12911-023-02132-4