
Predicting clinical outcomes among hospitalized COVID-19 patients using both local and published models



Many models that predict outcomes in hospitalized COVID-19 patients have been published, but the generalizability of many of them is unknown. We evaluated how well selected published models, and our own models, predicted outcomes in patients at our institution.


We searched the literature for models predicting outcomes in inpatients with COVID-19. We developed models of mortality and of criticality (mortality or ICU admission) in a development cohort. We then tested our models, and the external models that provided sufficient information, on a test cohort of our most recent patients. Model performance was compared using the area under the receiver operating characteristic curve (AUC).


Our literature review yielded 41 papers. Of these, 8 had sufficient documentation and concordance with the features available in our cohort to be implemented in our test cohort. All 8 models were derived from Chinese patients. One model predicted criticality and seven predicted mortality. In the test cohort, the internal models had an AUC of 0.84 (0.74–0.94) for mortality and 0.83 (0.76–0.90) for criticality. The best external model had an AUC of 0.89 (0.82–0.96) using three variables; another had an AUC of 0.84 (0.78–0.91) using ten variables. AUCs ranged from 0.68 to 0.89. On average, the models tested could not produce predictions for 27% of patients because of missing laboratory data.


Despite differences in pandemic timeline, race, and socio-cultural healthcare context, some models derived in China performed well. For healthcare organizations considering implementing an external model, concordance between the features used in the model and the features available for their own patients may be important. Both local and external models should be analyzed to help decide which prediction method should provide clinical decision support to clinicians treating COVID-19 patients, and which lab tests should be included in order sets.



Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, has been devastating compared with other viral diseases (seasonal, avian and swine influenza) in terms of morbidity, mortality and economic impact, despite advances in medical care since the Spanish flu of 1918 [1]. COVID-19 has had a dramatic impact on health systems globally and on the US economy, despite assistance from the US federal government via the CARES Act [2] and other funding programs.

The COVID-19 pandemic emerged quickly and was rapidly followed by a massive volume of academic output, including prediction models for a variety of clinical outcomes. The initial models of hospital outcomes came from the city of Wuhan in China's Hubei province, where the first cases were discovered. Models from around the globe followed and were likely integrated into many hospital guidelines. However, it is unclear whether those models can be applied to local cohorts. A rapidly available and accurate prediction model for COVID-19 patients admitted from the emergency department (ED) would support accurate triage and prognostic assessments, informing decisions about treatment and resource allocation. Knowing the likelihood of death among patients sent home from the ED would also be of interest, but this requires longitudinal data, which are often less readily available. Appropriate triage decisions are especially valuable in times when resources are stretched.

The growing volume of readily available healthcare data has facilitated the development of artificial intelligence-based models; however, a significant factor limiting the utility and dissemination of such models is generalizability. For example, the earliest computer-aided decision models for evaluating abdominal pain could not be replicated at other institutions [3]. Similarly, a mortality prediction tool for acute alcoholic pancreatitis (Ranson’s criteria) [4], developed in a small cohort, has gained wider acceptance than superior scoring tools [5].

One of the most popular prediction tools in clinical use today is the 2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk [6]. This risk tool uniformly overestimated risk in non-diabetic patients in a large, multi-ethnic, socioeconomically diverse group of patients in California [7].

We analyzed how well published and self-developed models predicted clinical outcomes after admission in a cohort of diverse urban patients in Chicago. Our self-developed models were trained using data from our local patient cohort; the published, external models were not retrained with our cohort’s data. We aim to close a gap in understanding whether COVID-19 prediction models of mortality and criticality can be used in local cohorts despite ethnic, geographic and timeline differences. We hypothesized that, because of our incomplete understanding of the pathophysiology, ethnic, racial and socioeconomic differences between locations, and improving treatment over time, models may not predict well in a cohort different from their development and validation cohorts.


University of Illinois Hospital (UIH) Cohort

UIH is a tertiary academic teaching hospital in Chicago. The UIC Institutional Review Board approved this study. All admissions to UIH of COVID-19-positive patients were reviewed for the time of the first positive COVID-19 test and the date of admission. Patients were excluded if their first positive COVID-19 test was performed more than 14 days before admission or more than 48 h after admission. Patients transferred from another institution were reviewed for prior COVID-19 testing and were excluded if the COVID-19 test was performed more than 14 days before transfer or if the transfer was not related to any possible COVID-19 symptoms. If a patient was discharged and then readmitted less than 14 days after the first positive COVID-19 test, the encounter was included. All patients had been discharged or had died before 8/18/20. Pregnant patients were included.

Because our goal was to assess the predictive power of our own prediction models as well as some of those in the literature, we partitioned our data chronologically into a training cohort consisting of the first 60% of patients, admitted before 5/9/20, and a test cohort consisting of patients admitted and discharged from 5/9/20 through 8/18/20.
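A chronological split like the one described above can be expressed in a few lines. The sketch below is an illustration under stated assumptions, not the code used in this study: the `Admission` record and its field names are hypothetical, and the end date is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Admission:
    # Hypothetical record; field names are illustrative, not taken from the study.
    patient_id: str
    admit_date: date
    discharge_date: Optional[date]  # None while still hospitalized


def temporal_split(
    admissions: List[Admission], cutoff: date, end: date
) -> Tuple[List[Admission], List[Admission]]:
    """Split encounters by admission date rather than at random.

    Only encounters that ended (discharge or death) on or before `end` are kept,
    so every patient has a known outcome. Admissions before `cutoff` form the
    training cohort; admissions on or after `cutoff` form the test cohort.
    """
    completed = [
        a for a in admissions
        if a.discharge_date is not None and a.discharge_date <= end
    ]
    train = [a for a in completed if a.admit_date < cutoff]
    test = [a for a in completed if a.admit_date >= cutoff]
    return train, test
```

For this cohort the call would be `temporal_split(admissions, cutoff=date(2020, 5, 9), end=date(2020, 8, 18))`. Splitting by time rather than at random means the test cohort reflects later changes in patient mix and treatment, which is the situation a deployed model actually faces.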

Variables were selected based on a review of the existing literature and expert opinion; they are shown in Table 1. Admission vital signs, laboratory values, and clinical and radiological features were assessed. For each variable, we used the first result available up to 24 h after admission. Two outcomes were evaluated: mortality (death during hospitalization) and “criticality”, defined as mortality or admission to an ICU.
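The variable extraction rule and the two outcome definitions can be sketched as follows. This is a minimal illustration assuming each laboratory value or vital sign is available as a list of (timestamp, value) pairs; the function names and the optional `lookback_hours` parameter (for including emergency department results drawn shortly before the admission order) are hypothetical and are not described in the study.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Measurement = Tuple[datetime, float]  # (time recorded, value)


def first_value(
    admit_time: datetime,
    measurements: List[Measurement],
    hours_after: float = 24.0,
    lookback_hours: float = 0.0,  # assumption: 0 means only post-admission values count
) -> Optional[float]:
    """Earliest value recorded in [admit - lookback, admit + hours_after], or None if missing."""
    start = admit_time - timedelta(hours=lookback_hours)
    end = admit_time + timedelta(hours=hours_after)
    in_window = [(t, v) for t, v in measurements if start <= t <= end]
    if not in_window:
        return None  # left missing; no imputation
    return min(in_window, key=lambda tv: tv[0])[1]


def outcome_labels(died_in_hospital: bool, admitted_to_icu: bool) -> Dict[str, int]:
    """Mortality = in-hospital death; criticality = death or ICU admission."""
    return {
        "mortality": int(died_in_hospital),
        "criticality": int(died_in_hospital or admitted_to_icu),
    }
```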

Table 1 Characteristics of the development and test cohorts

Literature search

We searched PubMed, Embase, arXiv and medRxiv for articles published before 8/27/2020 using the search string: [Prediction] AND [Human] AND [COVID-19] OR [SARS-COV2] AND [Clinical Trial] OR [Observational Trial]. Articles were reviewed to determine whether the models described predicted our outcomes of interest and whether they provided sufficient detail, and sufficient concordance with our data, to implement the model using our cohort’s data.

Model development

The objective of our model development was to accurately predict patient outcomes using a small number of key input features. Several popular machine learning algorithms were evaluated for classifying mortality and criticality: logistic regression [8], decision tree [9], random forest [10], XGBoost [11], LightGBM [12], and CatBoost [13]. Training combined step forward feature selection with a parametric grid search. Step forward feature selection starts with a single feature and iteratively adds one feature at a time until model performance no longer improves. At each step of the feature selection, a parametric grid search is performed to determine the optimal parameter set for each model. The area under the receiver operating characteristic curve (AUC) was used as the evaluation metric.
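The step forward selection with a nested grid search described above can be sketched as below. This is an illustrative, library-agnostic version, not the study's code: `score` is a hypothetical callback that, for a feature subset and a parameter setting, fits a model on the training data and returns a validation AUC. With scikit-learn it could, for example, wrap `cross_val_score(..., scoring="roc_auc")` around a `RandomForestClassifier`; how the study estimated the validation AUC is not stated here, so that choice is an assumption.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Params = Dict[str, Any]


def expand_grid(param_grid: Dict[str, Sequence[Any]]) -> List[Params]:
    """All combinations of the parameter values (a full factorial grid)."""
    keys = list(param_grid)
    return [dict(zip(keys, combo)) for combo in product(*(param_grid[k] for k in keys))]


def forward_select(
    features: Sequence[str],
    param_grid: Dict[str, Sequence[Any]],
    score: Callable[[List[str], Params], float],
) -> Tuple[List[str], Optional[Params], float]:
    """Greedy step forward selection with a grid search at every step."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    best_params: Optional[Params] = None
    grid = expand_grid(param_grid)
    while remaining:
        # Try adding each remaining feature, tuning parameters for each candidate set.
        step_score, step_feature, step_params = max(
            ((score(selected + [f], p), f, p) for f in remaining for p in grid),
            key=lambda c: c[0],
        )
        if step_score <= best_score:
            break  # no remaining feature improves the AUC: stop
        selected.append(step_feature)
        remaining.remove(step_feature)
        best_score, best_params = step_score, step_params
    return selected, best_params, best_score
```

Because every candidate feature is paired with a full grid search, each step costs (remaining features) × (grid size) model fits, which is why a small parameter grid is practical here.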

Statistical analysis of models

No missing data were imputed in the test cohort. External models were included in our analyses only if, given this missingness, predictions could be generated for more than 60% of patients. Where a model provided odds or a point scale, a receiver operating characteristic (ROC) curve was constructed and its area under the curve (AUC) was calculated.
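The inclusion rule and the AUC calculation for a published score can be illustrated with a short, dependency-free sketch. It assumes each test patient is a dictionary of admission values (with `None` for a missing result) plus a 0/1 outcome, that a patient receives a prediction only when every input the model needs is present (no imputation), and that AUC is computed with the Mann-Whitney formulation, which equals the area under the empirical ROC curve. The names are hypothetical; the study itself used the pROC package in R.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Patient = Tuple[Dict[str, Optional[float]], int]  # (admission values, outcome 0/1)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive scores higher than a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient with and one without the outcome")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_published_score(
    patients: List[Patient],
    required: Sequence[str],
    score_fn: Callable[[Dict[str, Optional[float]]], float],
    min_coverage: float = 0.6,
) -> Tuple[float, Optional[float]]:
    """Return (fraction of patients scorable, AUC on those patients, or None if excluded)."""
    complete = [(x, y) for x, y in patients if all(x.get(f) is not None for f in required)]
    coverage = len(complete) / len(patients)
    if coverage <= min_coverage:
        return coverage, None  # excluded: predictions possible for 60% of patients or fewer
    scores = [score_fn(x) for x, _ in complete]
    labels = [y for _, y in complete]
    return coverage, auc(scores, labels)
```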

Confidence intervals for AUCs and comparisons between ROC curves were computed using DeLong’s method [14]. The training and test cohorts were compared using chi-square tests for categorical variables and two-sided t-tests for continuous variables, with a significance level of P < 0.05. The fraction of missing values for each variable was compared between the cohorts using the Bonferroni correction to control the family-wise error rate.
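DeLong's method estimates the variance of an AUC from per-patient "placement values" and, for two models scored on the same patients, the covariance between their AUCs. The dependency-free sketch below shows the idea; it is not pROC's implementation (the study used pROC, whose `ci.auc()` and `roc.test()` functions provide these calculations), and here the 95% interval is simply clipped to [0, 1]. It assumes at least two patients with and two without the outcome.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 1/2 for ties.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DeLong placement values: V10 per positive patient, V01 per negative patient."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    # Patient order is preserved, so two models' placements stay paired.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two patients in each outcome class")
    return pos, neg


def delong_ci(scores, labels, z: float = 1.959963984540054):
    """AUC with a DeLong 95% confidence interval (clipped to [0, 1])."""
    v10, v01 = _placements(*_split(scores, labels))
    auc = sum(v10) / len(v10)
    se = math.sqrt(_cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01))
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def delong_test(scores_a, scores_b, labels):
    """Two-sided p-value for equal AUCs of two models scored on the same patients."""
    a10, a01 = _placements(*_split(scores_a, labels))
    b10, b01 = _placements(*_split(scores_b, labels))
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    var = (
        (_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(a10)
        + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(a01)
    )
    if var <= 0:
        # Degenerate case (e.g. identical or perfectly separated scores).
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    zstat = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(zstat) / math.sqrt(2))
```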

Descriptive statistics were computed using Stata SE version 12 (StataCorp, College Station, TX). Model development used the Python libraries scikit-learn (v0.23.1), TensorFlow (v2.2.0), XGBoost (v0.90), LightGBM (v2.3.1) and CatBoost (v0.23.1). Statistical analysis was performed using the pROC package in R. This study was approved by the UIC Institutional Review Board.


UIH cohort characteristics

Table 1 describes the UIH cohorts. There were 516 patients in total: the training cohort included the first 309 patients (60%), and the test cohort the subsequent 207 patients (40%). The test cohort was slightly younger (53.3 vs 56.5 years, P = 0.008). Although the overall racial distribution did not differ significantly between the cohorts, the proportion of self-declared Black patients was 49% in the training cohort and 42% in the test cohort. Lymphocyte, white blood cell and neutrophil counts were significantly higher in the test cohort.

Although some lab tests were performed on almost all patients, many were ordered at clinicians' discretion. Several of these discretionary tests were missing more often in the test cohort than in the training cohort: ferritin (7.1% in training vs 17.4% in test), lactate dehydrogenase (LDH) (16.5% vs 27.5%), procalcitonin (16.8% vs 32.9%) and interleukin 6 (IL-6) (75.1% vs 87.9%). D-dimer was missing less often in the test cohort (40.8% in training vs 27.5% in test).
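The between-cohort comparison of missingness with a Bonferroni correction can be sketched as follows. This assumes a Pearson chi-square test without continuity correction on a 2×2 table (missing vs. recorded, training vs. test) for each variable; the exact test used for missingness is not specified in the study, and any counts passed in are hypothetical. The 1-degree-of-freedom p-value uses the identity P(χ² > x) = erfc(√(x/2)).

```python
import math
from typing import Dict, Tuple


def missingness_pvalue(miss_a: int, n_a: int, miss_b: int, n_b: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for missing vs. recorded in two cohorts."""
    observed = [[miss_a, n_a - miss_a], [miss_b, n_b - miss_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [miss_a + miss_b, total - miss_a - miss_b]
    if 0 in col_totals:
        return 1.0  # missing for everyone or for no one: no evidence of a difference
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2))  # survival function of chi-square with 1 df


def compare_missingness(
    counts: Dict[str, Tuple[int, int, int, int]], alpha: float = 0.05
) -> Dict[str, Tuple[float, bool]]:
    """counts[var] = (missing_train, n_train, missing_test, n_test).

    Returns (p-value, significant after Bonferroni) per variable; the per-test
    threshold is alpha divided by the number of variables compared.
    """
    threshold = alpha / len(counts)
    result = {}
    for var, (ma, na, mb, nb) in counts.items():
        p = missingness_pvalue(ma, na, mb, nb)
        result[var] = (p, p < threshold)
    return result
```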

Model compilation summary

Ninety-one abstracts were reviewed. After applying our inclusion criteria, 41 articles remained. The models and references are shown in Table 2.

Table 2 Prediction models of inpatients matching outcomes of interest

More than 60% of the models (n = 26) were derived in China, 11 in Europe, 3 in the US and 2 were multinational. The most common methods were logistic regression (n = 25) and Cox regression (n = 12); a small number of models used neural networks and decision trees. Among models that reported an AUC, AUCs ranged from 0.74 to 0.98.

UI Health internal model development

Multiple machine learning methods were assessed to develop the best prediction model on the training (60%) cohort. Based on AUC, random forest models were best for both mortality and criticality. Table 3 lists the key modeling parameters and covariates of the mortality and criticality models. The covariates are listed in order of importance as determined by step forward selection. The key random forest parameters were determined by the grid search on the development data set. In the training cohort, the AUC was 0.98 for the mortality model and 0.97 for the criticality model.

Table 3 Internal Model Fit on first 60% of admissions for mortality and criticality

If model coefficients in the papers in Table 2 were sufficiently described and the model variables were available for more than 60% of admissions, the model was used to predict outcomes in the UIH test cohort. Results are shown in Table 4.
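Applying a published logistic regression model "out of the box" amounts to plugging a patient's admission values into the published intercept and coefficients. The sketch below assumes the model is reported as a logistic regression on the log-odds scale; the coefficient values and variable names in the example are invented for illustration and do not correspond to any model in Table 2. As in the study, no values are imputed: a patient missing any required input receives no prediction.

```python
import math
from typing import Mapping, Optional


def published_logistic_risk(
    values: Mapping[str, Optional[float]],
    intercept: float,
    coefficients: Mapping[str, float],
) -> Optional[float]:
    """Predicted probability from a published intercept and coefficients (no retraining).

    Returns None when any variable used by the model is missing.
    """
    if any(values.get(name) is None for name in coefficients):
        return None
    logit = intercept + sum(beta * values[name] for name, beta in coefficients.items())
    # Numerically stable logistic function.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical coefficients, for illustration only.
EXAMPLE_INTERCEPT = -5.0
EXAMPLE_COEFS = {"age_years": 0.05, "ldh_u_per_l": 0.005}
```

One practical caveat: the inputs must be in the same units as the derivation cohort (for example CRP in mg/L rather than mg/dL); a unit mismatch shifts every prediction without producing any error.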

Table 4 Prediction models fit to UIH test cohort

In total, 10 models were assessed on the test cohort: 8 from the literature and 2 internal. Seven of the external models used logistic regression and one used a decision tree. One external model predicted criticality; the remainder predicted mortality. The variables most commonly used in the models were age (7 models), lymphocyte count or lymphocyte/WBC ratio (6 models), C-reactive protein (CRP) and LDH (4 models each), D-dimer (3 models) and BUN (2 models). The number of features per model ranged from 2 to 11, with a median of 3.5. These models assessed clinical features and laboratory testing on admission. Regarding pregnancy, 1 model explicitly included pregnant patients [19], 2 excluded them [28, 42], and for 5 it could not be determined [20, 26, 44, 46, 51].

Three of the models, B, G and H [19, 46, 51], had open-access web-based calculators for predicting outcomes in individual patients. One model (A) [42] used a decision tree of only three variables, which is easy for a clinician to use. Two models (D and F) [26, 44] used a nomogram to simplify use.

All external models were trained using cohorts of Chinese patients. Though there were non-Chinese cohort models in Table 2, none of them provided sufficient description of their models to be implemented on our test cohort without retraining.

Common reasons why models could not be used were that the coefficients needed to calculate a prediction score were not available, that the features used in the model did not match the features available in our test cohort, and that the outcome was not available in our test cohort (e.g., timed mortality such as 30-day mortality).

Figure 1 shows the confidence intervals of the AUCs obtained on the test cohort. As Table 4 and Fig. 1 show, the point estimates of AUC ranged from 0.68 for model G to 0.89 for model C. The internal models had AUCs of 0.84 for mortality and 0.83 for criticality. The mortality model with the highest AUC, model C, did not differ significantly from the UIH mortality model (AUC 0.89 (0.82–0.96) vs 0.84 (0.74–0.94), P > 0.5).

Fig. 1 Area under the curve (AUC) confidence intervals for Table 4 models

The widths of the confidence intervals ranged from 0.13 to 0.30. The difference between each model's published AUC and its AUC on our test cohort varied considerably: for model B the difference was only 0.04, whereas for model E it was 0.26. The UI Health models fell in the middle, with an AUC difference of 0.14.

Table 5 shows the mean laboratory values in the derivation cohorts of all 8 external models and in the UI Health test cohort. The variables shown were used in at least one model and were reported for five or more of the model cohorts. Age and CRP were reported in all papers, and creatinine in seven. Although rigorous statistical testing is not possible without the raw data, some variables differ in clinically significant ways between the cohorts from China and UIH: the mean CRP at UIH is more than three times the external model average, creatinine is two-fold higher, and LDH is roughly one third higher.

Table 5 Values of the most common variables in the 8 external models and the test cohort


Most of the models in Table 2 could not be used to make predictions on our test cohort, for multiple reasons. Without chart review, symptoms and their duration are difficult to obtain, which excluded some models. Unusual imaging grading schemes or mandatory CT scans were not available in our cohort. Some studies used labs that were not ordered frequently at our hospital. Lack of longitudinal follow-up precluded the use of timed mortality outcomes (e.g., 30-day mortality). These issues, together with the lack of well-described model coefficients, left only the 8 external models in Table 4 usable.

The features used in the models were surprisingly diverse. The number of variables per model ranged from 2 to 11, with 19 different variables across the studies. The most common variables were age, lymphocyte count, CRP and LDH. Notably, only 7 of the 10 models used age as a predictor, and the 3 models that did not use it did not perform well. In large multi-site cohorts from Britain [56], the US [57] and internationally [58], age was a strong predictor of mortality.

Three of the external models performed very well, with AUCs of 0.84–0.89. This demonstrates that reasonable prediction was possible even though the patients were geographically distant, ethnically different, in different health systems and cultures, and at different times in the pandemic. Our initial hypothesis was that these models would not work well, but this did not hold for all of the models.

Some of the models would likely have performed better if retrained on our local cohort, but this was not done because our purpose was to see how they worked “out of the box”. This appeared to be the intent of many authors of the published models, as evidenced by their publication of web calculators, nomograms and decision trees. One issue that may worsen or improve a model's performance is that outcomes have been found to depend on the time during the pandemic, not just on patient factors, with outcomes improving more recently [59, 60].

Models A, F, G and H were also evaluated in a review and cohort prediction comparison by Gupta et al. [61], using their cohort of 440 patients from London with a mortality rate of around 28%. For models F, G and H, the AUCs in our cohort differed somewhat from those in the London cohort [61]: 0.84 vs. 0.76 for model F, 0.68 vs. 0.74 for model G, and 0.72 vs. 0.69 for model H.

Reviewing the cohort characteristics in Table 5 helps explain why some of the models did not perform well. Model A is a decision tree based on only 3 features: CRP, LDH and the percentage of lymphocytes. Its first decision node predicts mortality if LDH is greater than 365 U/L. In its derivation cohort, the average LDH was 274 U/L. In our test cohort, however, the average LDH was 386 U/L, so a large portion of patients were predicted to die at the first node, producing a poor positive predictive value (PPV). In the London cohort, the average LDH was about the same as ours (395 U/L), and this model also performed poorly in that cohort [61].
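The effect of an upward shift in LDH on a single threshold node can be made concrete by computing the fraction of patients flagged and the PPV among them. The sketch below reproduces only the first node described above (LDH > 365 U/L); the remaining nodes of model A are not detailed in this text, so this is not an implementation of model A, and any example values are hypothetical.

```python
from typing import Sequence, Tuple


def threshold_node_stats(
    values: Sequence[float], died: Sequence[int], threshold: float = 365.0
) -> Tuple[float, float]:
    """Flag a patient as predicted to die when value > threshold.

    Returns (fraction of patients flagged, positive predictive value among flagged).
    PPV is NaN when no patient is flagged.
    """
    flagged_outcomes = [y for v, y in zip(values, died) if v > threshold]
    fraction_flagged = len(flagged_outcomes) / len(values)
    if not flagged_outcomes:
        return fraction_flagged, float("nan")
    ppv = sum(flagged_outcomes) / len(flagged_outcomes)
    return fraction_flagged, ppv
```

If the LDH distribution shifts upward while mortality does not rise in proportion, more survivors cross the fixed threshold, so the fraction flagged grows and the PPV falls, which is the pattern described for model A.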

The average LDH was roughly one third higher in our test cohort than the average of the cohorts from China. The reason for this is unclear. In a healthy multiethnic cohort from Hawaii [62], there were at most minor differences in LDH between Black, Hispanic, White and Asian patients, suggesting that the difference in LDH is unlikely to be due to racial factors. A difference in the time from infection to presentation might explain it. The other models that used LDH predicted well, possibly in part because they used logistic regression rather than a decision tree.

The average CRP in our cohort is roughly 3.7 times the average in the external models (99 mg/L vs. 27 mg/L). Four models used CRP, and only one of them, model C, performed well. Creatinine was also substantially higher in our cohort than in any of the derivation cohorts and than their average of 0.84 mg/dL. Only one model, model H, used creatinine; its derivation cohort had an average creatinine of 0.72 mg/dL. Model H therefore relied on both CRP and creatinine, two variables whose distributions differed markedly between cohorts, which may help explain its poor performance. Regarding creatinine, studies have shown socioeconomic and ethnic variation in chronic kidney disease (CKD) [63], and one systematic review found that the prevalence of CKD in China was less than a fourth of that in the US [64]. The higher creatinine in the test cohort may therefore reflect not only differences in illness at presentation but also differences in the prevalence of CKD.

It is not fully clear why the models developed at UIH on our training cohort did not perform better on our test cohort, though some factors are likely. The AUC decreased from 0.98 to 0.84 for mortality and from 0.97 to 0.83 for criticality. In an analysis of the entire cohort, mortality and criticality were associated with the admission date, consistent with publications showing improving mortality over time [59, 60]. WBC, lymphocyte and neutrophil counts, which were not all used in every model, all rose in the test cohort. It is therefore possible that variables not included in the models changed over time, producing a worse fit than in the first 60% of patients.

The number of cases for which a model cannot generate a prediction because of missing data is an important practical consideration for implementation. The fraction of the test cohort for which predictions could not be generated ranged from 17% to 31% for the external models, and the UI Health models could not generate predictions for 27% of patients. Although missing data can be imputed retrospectively, this is not easily done by clinicians in real time during patient care, so we did not impute. This reflects non-standardized test ordering, which is not surprising given that our understanding of which tests are useful and necessary in suspected COVID-19 patients has evolved.

Interestingly, many of the tests commonly used in these and other models were missing more often in the test cohort than in the earlier development cohort: missingness rose from 7.1% to 17.4% for ferritin and from 16.5% to 27.5% for LDH. It is not clear why these tests were ordered less often over time, particularly LDH, given the many publications demonstrating its prognostic power [15, 16, 21, 22, 25, 27, 42,43,44,45, 50, 52, 53]. Ordering of these inflammatory prognostic markers [65] may have decreased as clinicians became more confident in their clinical assessment of prognosis.

D-dimer, on the other hand, was missing less often in the test cohort (27.5%, down from 40.8%). This may reflect concern about venous thromboembolism in COVID-19 infections [66], which grew over time.

An important question is which model to use to provide prognostic information to clinicians. Using an institution's own data to inform future care is consistent with a learning health system [67]. Ideally, clinical decision support (CDS) would supply the best prediction for a patient based on the most recent trends at the time. Another reason to use local data, especially with COVID-19, is that the disease, its treatment and outcomes are likely to change over time [59, 60], whereas the models in the literature are static. An additional benefit of using local data and predictive models is the ability to see which diagnostic tests are most useful prognostically but are not ordered often enough, leading to more evidence-based order sets.

Our literature search has limitations: we could not ensure that all possible synonyms were used, and the search strategy may have missed articles for other reasons. In addition, COVID-19 research is being published so rapidly that many models were likely published between the completion of this study and its publication.

Limitations related to our cohort and analysis are as follows. First, this is a single-site study, and these models may perform differently at other sites. Second, the size of the test cohort contributed to the relatively wide confidence intervals of the AUCs, making statistical significance difficult to demonstrate. Third, we were unable to follow patients consistently after discharge and thus could not measure timed outcomes such as 30-day mortality. Lastly, we could not control for changes in treatment over time.


Both the internal models and some external models predicted mortality well in our test cohort. The 3 best external models used at least age, LDH and lymphocytes. Inconsistent ordering of lab tests meant that predictions could not be generated for 27–31% of our cohort using the 3 best external models and the 2 UIH models.

Because not all of the external models worked well, it would be difficult to know which model to use for future admissions at a given point in the pandemic, as treatment and patient mix can change. Because an institution’s own prior patients are most similar to its next patients, models built on local data should be considered.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to patient privacy concerns but are available from the corresponding author on reasonable request.



Absolute lymphocyte count


Absolute neutrophil count


Aspartate transaminase


Blood urea nitrogen


Coronary heart disease


Creatine kinase


Chronic kidney disease




C-reactive protein


Chronic lung disease


Chest X-ray


Diabetes mellitus


Estimated glomerular filtration rate



IL-6:

Interleukin 6


Lactate dehydrogenase


National Early Warning Score 2


Neutrophil to lymphocyte ratio




Prothrombin time


Partial thromboplastin time


Red cell distribution width


Oxygen saturation

Stan Dev:

Standard deviation


University of Illinois Hospital


White blood cell count


  1. Taubenberger JK, Morens DM. 1918 Influenza: the mother of all pandemics. Rev Biomed. 2006;17(1):69–79.


  2. Bhutta N, Blair J, Dettling LJ, Moore KB. COVID-19, the CARES Act, and families' financial security. SSRN 2020.

  3. De Dombal F, Leaper D, Staniland JR, McCann A, Horrocks JC. Computer-aided diagnosis of acute abdominal pain. Br Med J. 1972;2(5804):9–13.


  4. Ranson J, Rifkind K, Roses D, Fink S, Spencer F. Prognostic signs and the role of operative management in acute pancreatitis. Surg Gynecol Obstet. 1974;139(1):69–81.


  5. Kuo DC, Rider AC, Estrada P, Kim D, Pillow MT. Acute pancreatitis: what’s the score? J Emerg Med. 2015;48(6):762–70.


  6. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’donnell CJ. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2014;63(25 Pt B):2935–59.


  7. Rana JS, Tabada GH, Solomon MD, Lo JC, Jaffe MG, Sung SH, Ballantyne CM, Go AS. Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J Am Coll Cardiol. 2016;67(18):2118–30.


  8. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression, vol. 398. Hoboken: Wiley; 2013.


  9. Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern Syst. 1991;21(3):660–74.


  10. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.


  11. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; San Francisco, CA. 2016. p. 785–94.

  12. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y. LightGBM: a highly efficient gradient boosting decision tree. In: Presented at Advances in Neural Information Processing Systems, NIPS, Long Beach, CA, vol. 30. 2017. p. 3146–54.

  13. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems; 2018. p. 6638–48.

  14. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.


  15. Bousquet G, Falgarone G, Deutsch D, Derolez S, Lopez-Sublet M, Goudot F-X, Amari K, Uzunhan Y, Bouchaud O, Pamoukdjian F. ADL-dependency, D-Dimers, LDH and absence of anticoagulation are independently associated with one-month mortality in older inpatients with Covid-19. Aging (Albany NY). 2020;12(12):11306–13.


  16. Toraih EA, Elshazli RM, Hussein MH, Elgaml A, Amin MN, El-Mowafy M, El-Mesery M, Ellythy A, Duchesne J, Killackey MT. Association of cardiac biomarkers and comorbidities with increased mortality, severity, and cardiac injury in COVID-19 patients: a meta-regression and decision tree analysis. J Med Virol. 2020;92(11):2473–88.


  17. Ji D, Zhang D, Xu J, Chen Z, Yang T, Zhao P, Chen G, Cheng G, Wang Y, Bi J. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score. Clin Infect Dis. 2020;71(6):1393–9.


  18. Wang K, Zuo P, Liu Y, Zhang M, Zhao X, Xie S, Zhang H, Chen X, Liu C. Clinical and laboratory predictors of in-hospital mortality in patients with COVID-19: a cohort study in Wuhan, China. Clin Infect Dis. 2020;71(16):2079–88.


  19. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, Li Y, Guan W, Sang L, Lu J. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. 2020;180(8):1–9.


  20. Ma X, Ng M, Xu S, Xu Z, Qiu H, Liu Y, Lyu J, You J, Zhao P, Wang S. Development and validation of prognosis model of mortality risk in patients with COVID-19. Epidemiol Infect. 2020;148:e168.


  21. Bonetti G, Manelli F, Patroni A, Bettinardi A, Borrelli G, Fiordalisi G, Marino A, Menolfi A, Saggini S, Volpi R, et al. Laboratory predictors of death from coronavirus disease 2019 (COVID-19) in the area of Valcamonica, Italy. Clin Chem Lab Med. 2020;58(7):1100–5.


  22. Zhao Z, Chen A, Hou W, Graham JM, Li H, Richman PS, Thode HC, Singer AJ, Duong TQ. Prediction model and risk scores of ICU admission and mortality in COVID-19. PLoS ONE. 2020;15(7):e0236618.


  23. Hu L, Chen S, Fu Y, Gao Z, Long H, Ren H, Zuo Y, Li H, Wang J, Xv Q. Risk factors associated with clinical outcomes in 323 coronavirus disease 2019 (COVID-19) Hospitalized Patients in Wuhan, China. Clin Infect Dis. 2020;71(16):2089–98.


  24. Chen R, Liang W, Jiang M, Guan W, Zhan C, Wang T, Tang C, Sang L, Liu J, Ni Z. Risk factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in China. Chest. 2020;158(1):97–105.


  25. Wu G, Yang P, Xie Y, Woodruff HC, Rao X, Guiot J, Frix A-N, Louis R, Moutschen M, Li J. Development of a clinical decision support system for severity risk prediction and triage of COVID-19 patients at hospital admission: an international multicentre study. Eur Respir J. 2020;56(2):2001104.


  26. Cheng A, Hu L, Wang Y, Huang L, Zhao L, Zhang C, Liu X, Xu R, Liu F, Li J. Diagnostic performance of initial blood urea nitrogen combined with D-Dimer levels for predicting in-hospital mortality in COVID-19 patients. Int J Antimicrob Agents. 2020;56(3):106110.


  27. Laguna-Goya R, Utrero-Rico A, Talayero P, Lasa-Lazaro M, Ramirez-Fernandez A, Naranjo L, Segura-Tudela A, Cabrera-Marante O, de Frias ER, Garcia-Garcia R. IL-6-based mortality risk model for hospitalized patients with COVID-19. J Allergy Clin Immunol. 2020;146(4):799–807.

  28. Liu Q, Song N, Zheng Z, Li J, Li S. Laboratory findings and a combined multifactorial approach to predict death in critically ill patients with COVID-19: a retrospective study. Epidemiol Infect. 2020;148:e129.

  29. Ok F, Erdogan O, Durmus E, Carkci S, Canik A. Predictive values of blood urea nitrogen/creatinine ratio and other routine blood parameters on disease severity and survival of COVID-19 patients. J Med Virol. 2021;93(2):786–93.

  30. Abdulaal A, Patel A, Charani E, Denny S, Mughal N, Moore L. Prognostic modeling of COVID-19 using artificial intelligence in the United Kingdom: model development and validation. J Med Internet Res. 2020;22(8):e20259.

  31. Qin J-J, Cheng X, Zhou F, Lei F, Akolkar G, Cai J, Zhang X-J, Blet A, Xie J, Zhang P. Redefining cardiac biomarkers in predicting mortality of inpatients with COVID-19. Hypertension. 2020;76(4):1104–12.

  32. Turcotte JJ, Meisenberg BR, MacDonald JH, Menon N, Fowler MB, West M, Rhule J, Qureshi SS, MacDonald EB. Risk factors for severe illness in hospitalized Covid-19 patients at a regional hospital. PLoS ONE. 2020;15(8):e0237558.

  33. Shang Y, Liu T, Wei Y, Li J, Shao L, Liu M, Zhang Y, Zhao Z, Xu H, Peng Z. Scoring systems for predicting mortality for severe patients with COVID-19. EClinicalMedicine. 2020;24:100426.

  34. Zeng Z, Ma Y, Zeng H, Huang P, Liu W, Jiang M, Xiang X, Deng D, Liao X, Chen P. Simple nomogram based on initial laboratory data for predicting the probability of ICU transfer of COVID-19 patients: multicenter retrospective study. J Med Virol. 2020.

  35. Yu C, Lei Q, Li W, Wang X, Liu W, Fan X, Li W. Clinical characteristics, associated factors, and predicting COVID-19 mortality risk: a retrospective study in Wuhan, China. Am J Prev Med. 2020;59(2):168–75.

  36. Sun L, Liu G, Song F, Shi N, Liu F, Li S, Li P, Zhang W, Jiang X, Zhang Y. Combination of four clinical indicators predicts the severe/critical symptom of patients infected COVID-19. J Clin Virol. 2020;128:104431.

  37. Wu S, Du Z, Shen S, Zhang B, Yang H, Li X, Cui W, Chen F, Huang J. Identification and validation of a novel clinical signature to predict the prognosis in confirmed COVID-19 patients. Clin Infect Dis. 2020;71:3154–62.

  38. Lorente-Ros A, Ruiz JMM, Rincón LM, Pérez RO, Rivas S, Martínez-Moya R, Sanromán MA, Manzano L, Alonso GL, Ibáñez B. Myocardial injury determination improves risk stratification and predicts mortality in COVID-19 patients. Cardiol J. 2020;27(5):489–96.

  39. Tatum D, Taghavi S, Houghton A, Stover J, Toraih E, Duchesne J. Neutrophil-to-lymphocyte ratio and outcomes in Louisiana COVID-19 patients. Shock. 2020;54(5):652–8.

  40. Liu X, Shi S, Xiao J, Wang H, Chen L, Li J, Han K. Prediction of the severity of Corona Virus Disease 2019 and its adverse clinical outcomes. Jpn J Infect Dis. 2020;73(6):404–10.

  41. Galloway JB, Norton S, Barker RD, Brookes A, Carey I, Clarke BD, Jina R, Reid C, Russell MD, Sneep R. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J Infect. 2020;81(2):282–8.

  42. Yan L, Zhang H-T, Goncalves J, Xiao Y, Wang M, Guo Y, Sun C, Tang X, Jing L, Zhang M. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020;2(5):283–8.

  43. Su M, Yuan J, Peng J, Wu M, Yang Y, Peng YG. Clinical prediction model for mortality of adult diabetes inpatients with COVID-19 in Wuhan, China: a retrospective pilot study. J Clin Anesth. 2020;66:109927.

  44. Xie J, Hungerford D, Chen H, Abrams ST, Li S, Wang G, Wang Y, Kang H, Bonnett L, Zheng R. Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19.

  45. Zhang S, Guo M, Duan L, Wu F, Hu G, Wang Z, Huang Q, Liao T, Xu J, Ma Y. Development and validation of a risk factor-based system to predict short-term survival in adult hospitalized patients with COVID-19: a multicenter, retrospective, cohort study. Crit Care. 2020;24(1):438–51.

  46. Hun C, Liu Z, Jiang Y, Shi O, Zhang X, Xu K, Suo C, Wang Q, Song Y, Yu K, et al. Early prediction of mortality risk among severe COVID-19 patients using machine learning. Int J Epidemiol. 2020;69:343.

  47. Caramelo F, Ferreira N, Oliveiros B. Estimation of risk factors for COVID-19 mortality-preliminary results. medRxiv. 2020;382:727.

  48. Carr E, Bendayan R, Bean D, Stammers M, Wang W, Zhang H, Searle T, Kraljevic Z, Shek A, Phan HT. Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study. medRxiv. 2020;46:357.

  49. Shi Y, Yu X, Zhao H, Wang H, Zhao R, Sheng J. Host susceptibility to severe COVID-19 and establishment of a host risk score: findings of 487 cases outside Wuhan. Crit Care. 2020;24(1):1–4.

  50. Huang H, Cai S, Li Y, Li Y, Fan Y, Li L, Lei C, Tang X, Hu F, Li F, et al. Prognostic factors for COVID-19 pneumonia progression to severe symptom based on the earlier clinical features: a retrospective analysis. medRxiv. 2020;2:113.

  51. Zhang H, Shi T, Wu X, Zhang X, Wang K, Bean D, Dobson R, Teo JT, Sun J, Zhao P. Risk prediction for poor outcome and death in hospital in-patients with COVID-19: derivation in Wuhan, China and external validation in London, UK. 2020.

  52. Jans M, Kuijper T, den Hollander J, Bisoendial R, Pogany K, van den Dorpel M, Zirkzee E, Kok M, Waverijn G, Ruiter R. Predicting severe COVID-19 at presentation, introducing the COVID severity score. SSRN. 2020.

  53. Pérez FM, Del Pino JL, García NJ, Ruiz EM, Méndez CA, Jiménez JG, Romero FN, Rodríguez MN. Comorbidity and prognostic factors on admission in a COVID-19 cohort of a general hospital. Rev Clin Esp. 2020.

  54. Regina J, Papadimitriou-Olivgeris M, Burger R, Le Pogam M-A, Niemi T, Filippidis P, Tschopp J, Desgranges F, Viala B, Kampouri E. Epidemiology, risk factors and clinical course of SARS-CoV-2 infected patients in a Swiss university hospital: an observational retrospective study. PLoS ONE. 2020;15(11):e0240781.

  55. Abdulaal A, Patel A, Charani E, Denny S, Alqahtani SA, Davies GW, Mughal N, Moore LS. Comparison of deep learning with regression analysis in creating predictive models for SARS-CoV-2 outcomes. BMC Med Inform Decis Mak. 2020;20(1):1–11.

  56. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, Curtis HJ, Mehrkar A, Evans D, Inglesby P. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–6.

  57. Ioannou GN, Locke E, Green P, Berry K, O’Hare AM, Shah JA, Crothers K, Eastment MC, Dominitz JA, Fan VS. Risk factors for hospitalization, mechanical ventilation, or death among 10 131 US veterans with SARS-CoV-2 infection. JAMA Netw Open. 2020;3(9):e2022310.

  58. O’Driscoll M, Dos Santos GR, Wang L, Cummings DA, Azman AS, Paireau J, Fontanet A, Cauchemez S, Salje H. Age-specific mortality and immunity patterns of SARS-CoV-2. Nature. 2020;SS2:1708.

  59. Dennis JM, McGovern AP, Vollmer SJ, Mateen BA. Improving survival of critical care patients with coronavirus disease 2019 in England: a national cohort study, March to June 2020. Crit Care Med. 2020;49(2):209–14.

  60. Horwitz L, Jones SA, Cerfolio RJ, Francois F, Greco J, Rudy B, Petrilli CM. Trends in Covid-19 risk-adjusted mortality rates in a single health system. J Hosp Med. 2020;16(2):90–2.

  61. Gupta RK, Marks M, Samuels TH, Luintel A, Rampling T, Chowdhury H, Quartagno M, Nair A, Lipman M, Abubakar I. Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: an observational cohort study. Eur Respir J. 2020;56(6):2003498.

  62. Lim E, Miyamura J, Chen JJ. Racial/ethnic-specific reference intervals for common laboratory tests: a comparison among Asians, Blacks, Hispanics, and White. Hawaii J Med Public Health. 2015;74(9):302–10.

  63. Brück K, Stel VS, Gambaro G, Hallan S, Völzke H, Ärnlöv J, Kastarinen M, Guessous I, Vinhas J, Stengel B. CKD prevalence varies across the European general population. J Am Soc Nephrol. 2016;27(7):2135–47.

  64. McCullough K, Sharma P, Ali T, Khan I, Smith WC, MacLeod A, Black C. Measuring the population burden of chronic kidney disease: a systematic literature review of the estimated prevalence of impaired kidney function. Nephrol Dial Transplant. 2012;27(5):1812–21.

  65. Zeng F, Huang Y, Guo Y, Yin M, Chen X, Xiao L, Deng G. Association of inflammatory markers with the severity of COVID-19: a meta-analysis. Int J Infect Dis. 2020;96:467–74.

  66. Schulman S, Hu Y, Konstantinides S. Venous Thromboembolism in COVID-19. Thromb Haemost. 2020;120(12):1642–53.

  67. Borsky AE, Savitz LA, Bindman AB, Mossburg S, Thompson L. AHRQ series on improving translation of evidence: perceived value of translational products by the AHRQ EPC Learning Health Systems Panel. Jt Comm J Qual Patient Saf. 2019;45(11):772–8.

Acknowledgements

Amulya Boppana and Yufu Zhang assisted with data extraction and data validation.


Funding

This research was funded by the University of Illinois at Chicago Center for Clinical and Translational Science (CCTS) Award UL1TR002003. The funding body played no role in the design of the study; in the collection, analysis, or interpretation of data; or in writing the manuscript.

Author information

Contributions

WG: Involved in all aspects of this study. JMRF: Data acquisition, data interpretation, drafting and revision of the manuscript. KC: Data acquisition, data interpretation and drafting of the initial manuscript. SH: Data interpretation and drafting of the initial manuscript. KMK: Drafting of the initial manuscript and revision of the manuscript. MP: Data interpretation and drafting of the initial manuscript. JT: Data interpretation and drafting of the initial manuscript. JZ: Data interpretation and revision of the manuscript. DH: Design, data interpretation and drafting of the initial manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to William Galanter.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the University of Illinois at Chicago Institutional Review Board. Permission from the University of Illinois at Chicago Privacy Board and Institutional Review Board was required to access the data used in this study. All experimental protocols involving human data were in accordance with the University of Illinois at Chicago Privacy Board and Institutional Review Board guidelines.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Galanter, W., Rodríguez-Fernández, J.M., Chow, K. et al. Predicting clinical outcomes among hospitalized COVID-19 patients using both local and published models. BMC Med Inform Decis Mak 21, 224 (2021).

Keywords

  • Mortality
  • Hospitalization
  • COVID-19
  • Statistical model
  • Prediction
  • Model generalizability