Skip to main content

Development of a data-driven COVID-19 prognostication tool to inform triage and step-down care for hospitalised patients in Hong Kong: a population-based cohort study



This is the first study on prognostication in an entire cohort of laboratory-confirmed COVID-19 patients in the city of Hong Kong. Prognostic tool is essential in the contingency response for the next wave of outbreak. This study aims to develop prognostic models to predict COVID-19 patients’ clinical outcome on day 1 and day 5 of hospital admission.


We did a retrospective analysis of a complete cohort of 1037 COVID-19 laboratory-confirmed patients in Hong Kong as of 30 April 2020, who were admitted to 16 public hospitals with their data sourced from an integrated electronic health records system. It covered demographic information, chronic disease(s) history, presenting symptoms as well as the worst clinical condition status, biomarkers’ readings and Ct value of PCR tests on Day-1 and Day-5 of admission. The study subjects were randomly split into training and testing datasets in a 8:2 ratio. Extreme Gradient Boosting (XGBoost) model was used to classify the training data into three disease severity groups on Day-1 and Day-5.


The 1037 patients had a mean age of 37.8 (SD ± 17.8), 53.8% of them were male. They were grouped under three disease outcome: 4.8% critical/serious, 46.8% stable and 48.4% satisfactory. Under the full models, 30 indicators on Day-1 and Day-5 were used to predict the patients’ disease outcome and achieved an accuracy rate of 92.3% and 99.5%. With a trade-off between practical application and predictive accuracy, the full models were reduced into simpler models with seven common specific predictors, including the worst clinical condition status (4-level), age group, and five biomarkers, namely, CRP, LDH, platelet, neutrophil/lymphocyte ratio and albumin/globulin ratio. Day-1 model’s accuracy rate, macro-/micro-averaged sensitivity and specificity were 91.3%, 84.9%/91.3% and 96.0%/95.7% respectively, as compared to 94.2%, 95.9%/94.2% and 97.8%/97.1% under Day-5 model.


Both Day-1 and Day-5 models can accurately predict the disease severity. Relevant clinical management could be planned according to the predicted patients’ outcome. The model is transformed into a simple online calculator to provide convenient clinical reference tools at the point of care, with an aim to inform clinical decision on triage and step-down care.

Peer Review reports


The World Health Organization (WHO) had declared a pandemic outbreak of a new coronavirus, named COVID-19, on 12 March 2020. As of 30 April 2020, Hong Kong (HK) had a total of 1037 confirmed cases as compared to a global caseload of over three million and mortality of over 210,000 at the same snapshot [1]. Four deaths [2] were reported locally. All patients were admitted to the Hospital Authority (HA) hospitals for management. HK has adopted the “early identification, early isolation and early treatment” infection control approach. She is ranked top to effectively curb COVID-19 transmission in the study comparing containment measures among different nations [3], given that HK has a population of 7.5 million residing in one of the most densely populated cities in the world [4].

HA is statutory, publicly funded organization to provide public medical care to all the citizens in HK. The services are organized via seven hospital clusters comprising of 43 hospitals, 49 specialist outpatient and 73 primary care clinics. It provides over 90% of hospital bed days in the territory through a total of 29,435 beds [5]. HA provides a single electronic health record system via the integrated Clinical Management System (CMS) [6], so that data of patients utilizing the service are automatically captured. Since the SARS epidemic in 2003, HA has made available over 1200 airborne infection isolation (AII) beds across 16 public hospitals to support outbreak of infectious diseases. In view of the unprecedented rapidly growing number of infections in the neighbouring Mainland China in mid-February 2020, the HA management implemented a proactive response plan to convert some acute general wards into “retrofit wards” [7] of negative pressures and directed airflow.

In pursuit of the early identification strategy, Hong Kong has been incrementally widening the surveillance coverage to identify and contact trace suspected cases through various channels. All suspected cases are referred to the HA for management. Samples will be sent to the public laboratory in the Department of Health for confirmation: two positive consecutive reverse-transcription polymerase chain reaction (RT-PCR) tests [8, 9] are required, as suggested by the WHO [10]. The patients can only be discharged from the hospital care when the isolation order is lift, given that their clinical conditions improve and they are afebrile with two negative RT-PCR results taken at least 24 h apart [8, 9].

Starting from mid-March, there was second wave with a surge in imported cases from overseas via airport arrivals. In early April, the occupancy rate of our AII rooms and beds peaked at 78% and 70% respectively, which was approaching saturation. The demand pressure on first-tier isolation beds could be partly relieved by the timely conversion and operation of some 400 second-tier isolation beds in 10 hospitals. Patients with stable clinical features but yet to fulfill the discharge criteria are transferred to the second-tier ward for further management. A prognostic model was developed with an aim to early identify those cases with satisfactory or stable clinical outcome for step-down care.

The outbreak in HK slows down since the end of April, with 1047 confirmed cases as of 10 May, which is the cut-off date of data analytics for this study. However, learning the experience from nearby countries like Singapore, HK remains vigilant and is preparing for the possible third wave of infection once the international travels resume normal. The research team repeated the prognostic model based on the entire COVID-19 cohort, aiming to predict their clinical outcome as early as day 1 of admission.


Study population, data collection and definition

All 1037 confirmed cases as of 30 April 2020 were included in this study, and their observational data were traced up to 10 May. Data sources come from internal systems, CMS and eNID of NDORS. “NDORS” is an electronic platform for both HA and the Department of Health, to digitally report all suspected and confirmed statutory notifiable diseases and other infectious diseases of public health concern. A designated “eNID” module for reporting COVID-19 cases and clinical management was specifically built and interfaced to NDORS [8, 9]. This study was a retrospective data analysis on de-identified patient-based electronic medical records from CMS and eNID. Person-based information on history of chronic diseases was retrieved from an established HA’s chronic disease virtual registry that contains 25 pre-defined chronic diseases. The registry was electronically built and based on all past CMS’ medical records using some operational counting and classification rules specific to each non-cancer disease, together with cancer cases sourced from Hong Kong’s Cancer Registry [11].

All 16 HA hospitals that treated COIVD-19 cases adopted a unified classification scheme on clinical conditions. The in-charge physicians would continuously update the condition status whenever the patient deteriorated or improved. The four clinical conditions are (1) critical: require intubation, or extracorporeal membrane oxygenation (ECMO) or in shock; (2) serious: require oxygen supplement of 3 L or more per minute; (3) stable: with mild influenza-like illness (ILI) symptoms; (4) satisfactory: progressing well and likely to be discharged soon. Based on their clinical condition(s) along the entire clinical course, all cases were further amalgamated into three distinct outcome groups to delineate a grading for disease severity. The groups are “critical/serious”, “stable” and “satisfactory”. The patients must have ever been assessed as either “critical” or “serious” clinical condition for one or more days in the “critical/serious” group; otherwise being classified into the second group if ever assessed as “stable”. The remaining third group must be entirely assessed as “satisfactory” along the clinical course.

Statistical analysis

Descriptive statistical analyses were performed for the entire cohort with respect to epidemiological, clinical and laboratory data. In addition, chi-square test for categorical variables and Kruskal–Wallis test for continuous variables were performed to evaluate if there were any differences in a host of prognostic factors on day 1 and day 5 of hospital admission among the three outcome groups. For development as well as evaluation of the model, the entire 1037 study subjects were, proportional to outcome distribution, randomly split into a training dataset comprising of 829 subjects and a testing dataset of the remaining 208 subjects. Having explored two alternative classification algorithms together with median imputation to handle missing data, the Extreme Gradient Boosting (XGBoost) model, which is a boosting decision tree machine learning framework allowing missing values for individual predictor variables, was finally selected to classify the training data into one of the three outcome groups, after taking into account a host of 30 predictors which included age, gender, chronic disease(s) history, 11 presenting symptoms as well as the worst clinical condition status, 15 biomarkers’ readings and Ct value of RT-PCR tests (based on E-Gene of the TIB MIBIOL kit) on day 1 and day 5 of admission. These predictors were chosen with reference to studies on COVID-19 [12,13,14,15,16,17,18,19,20,21,22,23] and SARS [24,25,26].

The XGBoost classifiers were trained and tuned using a fivefold cross-validation approach with the training data to obtain the optimal hyperparameters [27]. All 30 features were ranked according to their relative importance using F-score, which guided a variable selection process to reduce the full model into a simpler one for practical application. The model output for each subject a probability across each of the three outcome groups, summing up to one, and with the highest probability group as the predicted outcome class. After applying the trained model to the testing dataset, it was then analysed in a 3X3 confusion matrix, based on which the model’s overall accuracy rate was computed. In view of the imbalanced outcome group distribution, by outcome class and macro- and micro-averaged sensitivity and specificity of the three outcome groups were derived to evaluate the model performance. Decision tree of the simplified classifiers was also output for each outcome group. Partial dependency plots were output to depict the marginal effect of each model feature on the predicted outcome (Appendix Fig. 3). The XGBoost models were carried out by using Python’s XGBoost version 1.10 whereas other statistical analyses by SAS version 9.4 software.


Epidemiological and clinical profile

Table 1 provides a complete epidemiological and clinical profile of all 1037 laboratory confirmed COVID-19 cases, 95.2% had been discharged (including 4 deceased cases) and the remaining 50 (4.8%) cases have stayed for 19–70 days as at data analytics cut-off date. All the subjects were post-stratified into three pre-defined outcome groups: 50 (4.8%) in the critical/serious group, 485 (46.8%) stable group and 502 (48.4%) satisfactory group, which can then be compared against their respective clinical condition (4-level) on day 1 and day 5 of admission.

Table 1 Patient profile of 1037 COVID-19 confirmed cases as of 30 April 2020 (with data during hospitalisation updated till 10 May 2020)

The study subjects were aged from 1 month to 96 years. Critical/serious group was significantly older (mean 60.6 years, SD ± 14.0) than stable (mean 37.6 years, SD ± 17.5) and satisfactory (mean 35.6 years, SD ± 16.8) group (p < 0.0001). Over half (53.8%) of total caseload was male, comparatively higher in critical/serious group (64.0%) than stable (47.8%) and satisfactory (58.6%) groups (p = 0.001).

Chronic disease(s) were present in 11.8% of this study cohort, with most common ones being hypertension, hyperlipidemia and diabetes. A significant difference (p < 0.0001) in chronic disease(s) prevalence was found: 42.0%, 12.4% and 8.2% in critical/serious, stable and satisfactory group respectively.

81.4% of all the cases were symptomatic upon COVID-19 confirmation. Cough and fever were the top two symptoms prevalent in almost half of total confirmed cases, whereas sore throat in around one-quarter. Among the symptomatic cases, the mean duration between symptom onset and admission was 5.8 days (SD ± 5.4). Comparatively, the satisfactory outcome group had a significantly (p < 0.0001) longer duration (6.4 days, SD ± 6.2) than the stable and critical/serious groups’ (5.2 days, SD ± 4.7; 5.5 days, SD ± 3.7).

Table 1 shows the proportion of patients on anti-viral and corticosteroid drugs. ICU care and intubation treatments were respectively provided to 78.0% and 46.0% in critical/serious outcome groups.

Laboratory readings on day 1 and day 5 of admission

This study defines day 0 according to the time of admission recorded at admission offices from 00:00 to 23:59. A higher proportion of missing values on day 0 is expected because three-quarters of the study subjects had admitted for less than 12 h. Therefore, Table 2 shows the latest laboratory readings as of day 1 and day 5. For those tests whose normal reference ranges vary across HA laboratories, they are expressed as multiples of the upper normal reference.

Table 2 Laboratory readings on Day 1 and Day 5 of admission among 1037 COVID-19 confirmed cases as of 30 April 2020

Across the 15 biomarkers and two derived ratios (albumin-globulin ratio, neutrophil–lymphocyte ratio), no statistically significant between-group differences were found in mean platelet volume (MPV) and alkaline phosphatase (ALP) on both days 1 and 5, white blood count cell (WBC) on day 1 whereas platelet on day 5. Otherwise, statistical significance was stronger on day 5 than day 1 for all other tests except lactate dehydrogenase (LDH) which had a reverse pattern. Five biomarkers had strongest statistical significance (p < 0.0001) in terms of between-group differences on both days, namely C-reactive protein (CRP), lymphocyte counts, albumin (A), globulin (G) and total protein tests. Their respective readings in terms of mean and standard deviation across the three outcome groups are shown in Table 2. Based on the same blood specimen, N/L and A/G ratios can amalgamate and amplify the effects of each pair of biomarkers whose scale in opposite direction, contributed to a much stronger discriminatory effect on both days (p < 0.0001).

Ct value of RT-PCR tests for E-Gene of the TIB MIBIOL kit indicates the viral load, with low value representing high viral load and a theoretical maximum of 40. Significant between-group differences in Ct value were found, on day 1 with lowest mean reading in critical/serious group (24.3 ± 6.1), followed by stable group (25.1 ± 7.1) and then satisfactory group (27.2 ± 7.8) (p = 0.0001); and their corresponding values on day 5 were 25.3 ± 6.4, 27.7 ± 7.8 and 29.5 ± 7.7 (p < 0.0001).

Prediction model performance and application

The full model

For the XGBoost decision tree model on day 1 and day 5, all 30 predictors were ranked according to their relative importance based on F-score in Fig. 1. Based on the testing dataset (n = 208), an extremely strong concordance between the predicted and actual outcome classification in the confusion matrix was found for both models. The overall accuracy rate of the Day-1 model was 92.3%, with the macro-/micro-averaged sensitivity at 82.6%/92.3% and specificity at 96.0%/96.1%. As for the Day-5 model, the overall accuracy rate was 99.5%, with the corresponding macro-/micro-averaged sensitivity at 99.7%/99.5% and specificity at 99.5%/99.5% (Table 3). The XGBoost model consistently edged over the alternative Decision Tree and Random Forest classification models on these predictive performance measures (Appendix Table 4).

Fig. 1

Top 20 Features* ranked according to importance in the XGBoost model. *Feature importance of total protein, gender, pneumonia, chronic disease, sore throat, fatigue, myalgia, chills, headache, and vomiting under Day-1 model and that of total protein, myalgia, pneumonia, fever, vomiting, chronic disease, headache, fatigue, sore throat, chills under Day-5 model were excluded from this figure

Table 3 Predictive performance of the full model and the simplified model

The simplified model associated with a calculator tool

With reference to their relative importance as indicated by F-scores, the next step was to optimize the selection of a smaller set of variables for training alternative simpler models at an opportunity cost of a reduction in accuracy rate. As a result, of the top 10 important features under Day-1 and Day-5 models (Fig. 1), the same seven model features were selected and incorporated into two simplified models. They included the worst clinical condition status (4-level), age group, and five biomarkers, namely, CRP, LDH, platelet, N/L ratio and A/G ratio. Their discriminating effect across the three outcome groups is graphically shown in Appendix Fig. 3. Onset-to-admission duration and Ct value of RT-PCR test were not selected due to concern of recall bias and limited applicability at other settings respectively. The alanine aminotransferase (ALT) and ALP biomarkers were also dropped as they were relatively less important and only appeared in either one model. The overall accuracy rate of this simplified Day-1 model was 91.3%, with the macro-/micro-averaged sensitivity and specificity respectively at 84.9%/91.3% and 96.0%/95.7%. For the Day-5 model, the overall accuracy rate was 94.2%, with the corresponding macro-/micro-averaged sensitivity at 95.9%/94.2% and specificity at 97.8%/97.1% (Table 3) The decision trees for each outcome group of the Day-1 simplified model are visualized in Fig. 2.

Fig. 2

Decision rules using the key features under the Day-1 simplified model and their thresholds

From Table 1, among the 32 patients who deteriorated from satisfactory or stable to critical/serious condition after day 1, the simplified Day-1 model has correctly predicted outcome group for 22 of them (4 out of 7 in the testing data and 18 out of 25 in the training data). For another 10 patients who deteriorated after day 5, all of them can be correctly classified by the Day-5 model.


The aim of the study is to provide additional information for early clinical decision-making in transferring COVID-19 patients to step-down care. This model utilizes the demographic, presenting symptoms, clinical and laboratory findings of the entire Hong Kong cohort to provide the best available analysis for developing the prognostic models.

It is suggested that HK was facing a different cohort of COVID-19 patients as compared to China. As compared to Zhang’s study [13] on disease severity of 663 patients in Wuhan (14% critical, 48% severe, 38% moderate, 0.5% mild), HK had a totally reverse pattern with only 5% critical/serious cases (Table 1) with regard to similar clinical criteria defining “critical” and “serious” between China [13, 28] and HK. This huge difference can be partly explained by a younger age profile in Hong Kong’s entire cohort relative to Zhang’s study cohort (mean age of 38 years versus 56 years).

Most studies predicted fatal outcome of mortality [12,13,14, 17, 19, 21] and critical illness [17, 19, 20] of COVID-19 cases, except two published studies in China [13, 18] and 9 pre-printed studies reviewed by Wynants et al. [29] on risk factors for disease severity spectrum. Since the mortality of HK COVID-19 cases is low (four deaths among 1037 patients), the above predicted fatal outcome is not applicable to stratify the needs of our patients nor to inform subsequent management decision for step-down care. Local study is necessary to provide additional insight to the disease management.

Our study finds significant correlation of patients’ predisposing risk factors like older age, male sex and presence of chronic diseases, in particular those related to cardiovascular diseases, with adverse outcome in univariate analysis. Similar findings were reported in other studies [12, 13, 16, 17, 20, 22]. The presenting symptoms of dyspnoea and fatigue were significant indicator for poorer outcome in other COVID-19 studies [12, 13, 17, 20]; whereas only dyspnoea was found significant in univariate analysis but relatively less important as compared to biomarkers in this study.

For the biomarkers (individually or in ratios) in the top 20 important model features list (Fig. 1), they could predict disease severity; and most of them are also reported as independent prognostic factors of COVID-19 in other studies (13.17–22). We did not include aspartate aminotransferase (AST), procalicitonin (PCT) and D-dimer as in other COVID-19 studies [12,13,14, 17, 22] because only 16%-23% of our study subjects had such biomarkers tested. A few important prognosticators of this study are also in common with three previous local studies on prognostication in SARS patients, in which elevated CRP, LDH, neutrophil count, ALT, creatinine and platelet counts predictive of mortality, ICU care or oxygenation failure [24,25,26].

The study showed that clinical conditions on day 1 and day 5 could predict the subsequent clinical outcome, particularly in the stable and satisfactory patient groups in addition to their biomarkers’ readings. Patients having ‘satisfactory clinical condition’ in Day-1 and 5 are more likely to have a stable or satisfactory outcome. However, this feature is less prominent in critical/serious group’s decision rules (Fig. 2).

Based on the testing dataset, the current XGboost classifier (“the full model”) achieved a very high predictive accuracy rate of 92% and 99% to classify study subjects into three outcome groups on day 1 and day 5 of admission, which is on average equivalent to day 6 and day 10 of symptom onset given the median/mean onset-to-admission duration being 4/5.8 days. In this big data era, it’s technically feasible to automate predicted probabilities of three outcome classes as predicted from this methodology which simultaneously considers a multitude of factors readily available in HA’s CMS. This study’s XGBoost algorithm by default can support missing data values in study subjects, commonly encountered when using observational data at service settings.

In order to provide quick and accurate prediction at the point of care, simplification of the predictive modelling is suggested. Through a robust variable selection process of the two full models on day 1 and day 5 of admission, a simpler model with seven common specific predictors were identified. They included the worst clinical condition status (4-level), age group, and five biomarkers, namely, CRP, LDH, platelet, N/L ratio and A/G ratio. By using these modified models at the point of care, physicians will have additional and convenient information on the prognostic analysis of the COVID-19 patients at early presentation to hospitals. This will increase the turnaround time of our precious inpatient facilities and thus the efficiency of infection control measures.

Experts worldwide warn that COVID-19 may persist into this winter. For Hong Kong to continue with the current “early identification, early isolation and early treatment” strategy, this prognostic modelling is part of the corporate strategy in the preparedness for the potential third wave of outbreak in Hong Kong. Learning from the examples in nearby countries, like Singapore, Hong Kong is preparing the emergency response in case over hundreds or thousands of new cases presented per day, which will overwhelm our existing isolation bed capacity in the public hospital system. Planning for community isolation and treatment facilities is underway. In order to ensure patient safety and quality care, a tool to identify patients with lesser severity and better outcome is necessary. This group of patients probably will be safely managed at the community settings. This study provides a robust analysis on prognostication for COVID-19 cases. The transformation into a simple calculator tool to predict clinical outcome on day 1 and day 5 of admission makes it more convenient for clinicians to apply in their daily settings. It allows early identification of newly confirmed cases upon presentation and triage to appropriate places for isolation and treatment at appropriate timing.

When this manuscript is being revised in response to reviewers’ comments in mid-August 2020, Hong Kong is under the third wave of epidemic. The Day-1 and Day-5 simplified models were tested against an addition of 2984 cases newly confirmed within the following three months after this study’s data cut-off date, reconfirming that its predictive performance among this new cohort (Appendix Table 5) remained similar as the earlier cohort’s (Table 3). HA has started applying the tool at Hong Kong’s newly set up Community Treatment Facility which isolates and treats clinically stable cases aged below 60 upon confirmation. An automated daily patient list with model predictions together with latest predictors’ values is sent to on-site clinicians as a clinical reference to early identify potential deterioration cases for transfer back to acute hospitals for treatment.

Limitation of the study includes the lack of inclusion of data from radiological imaging. Patients with COVID-19 are found to have lung infection with ground glass and consolidative opacities with peripheral and lower lung distribution and bilateral involvement [30]. We are going to include all chest X-ray images up to day 5 in the next study through AI approach of image analytics.


This study provides comprehensive analysis on epidemiological, clinical and laboratory data within the first five days of admission among the entire Hong Kong cohort of 1037 laboratory-confirmed COVID-19 patients. Having considered a multitude of prognostic factors, the model could accurately predict the clinical outcome of a COVID-19 case on day 1 and day 5 of admission, namely, critical/serious, stable and satisfactory. It aims to serve as a management tool as well as a clinical reference tool in order to plan ahead for response measures on triage and step-down care to cope with the next unprecedented wave of COVID-19 epidemic. With a trade-off between practical application and predictive accuracy, the full model consisting of 30 features were reduced into a simpler model with seven specific features in tandem with a simple, easy-to-understand and transparent calculator tool for applications locally and open access globally.

Availability of data and materials

The de-identified datasets generated and analysed during the current study are not publicly available for patient privacy protection as their disclosure at granular level may entail the risk of subject re-identification. Aggregate data are available from the corresponding author on reasonable request. The calculator tool which is developed based on the prognostic models’ results of this study will be available for online public access (




A/G ratio:

Albumin-globulin ratio


Artificial Intelligence


Airborne infection isolation


Alkaline phosphatase


Alanine aminotransferase


Aspartate aminotransferase


Clinical Management System


Coronavirus Disease 2019


C-reactive protein


Extracorporeal membrane oxygenation


Electronic Notification of Infectious Disease




Hospital Authority


Hong Kong


Intensive care unit


Influenza-like illness


Lactate dehydrogenase


Mean platelet volume

N/L ratio:

Neutrophil–lymphocyte ratio


Notifiable Diseases and Outbreak Reporting System




Reverse-transcription polymerase chain reaction


Severe Acute Respiratory Syndrome


Standard deviation


White blood cell count


World Health Organization


Extreme Gradient Boosting


  1. 1.

    WHO Coronavirus Disease (COVID-19) Dashboard. World Health Organization. 2020. Accessed 20 May 2020.

  2. 2.

    Latest situation of cases of COVID-19 (as of 11 May 2020). Centre for Health Protection, HKSAR. 2020. Accessed 20 May 2020.

  3. 3.

    Gibney E. Whose coronavirus strategy worked best? Scientists hunt most effective policies. Nature. 2020;581:15–6.

    CAS  Article  Google Scholar 

  4. 4.

    Countries by density 2020. World population review. 2020. Accessed 20 May 2020.

  5. 5.

    Introduction. Hospital Authority, HKSAR. 2020. Accessed 20 May 2020.

  6. 6.

    Cheung N-T, Fung V, Kong JHB. The Hong Kong Hospital Authority’s information architecture. Stud Health Technol Inform. 2004;107(2):1183–6.

    PubMed  Google Scholar 

  7. 7.

    SCMP Editorial. We can all help ease burden on hospitals. South China Morning Post. 2020. Accessed 20 May 2020.

  8. 8.

    HA Task Force on Clinical Management on Infection. Interim recommendation on clinical management of adult cases with Coronavirus Disease 2019 (COVID-19). Hospital Authority, HKSAR. 2020. https://ha.home/ho/cico/Interim_Recommendation_on_Clinical_Management_of_Adult_Cases_with_COVID-19.pdf. Accessed 20 May 2020.

  9. 9.

    HA Task Force on Clinical Management on Infection. Interim recommendation on clinical management of paediatric patients of Coronavirus Disease 2019 (COVID 19) infection. Hospital Authority, HKSAR. 2020. https://ha.home/ho/cico/Interim_Recommendation_on_Clinical_Management_of_Paediatric_Patients_of_COVID-19infection.pdf. Accessed 20 May 2020.

  10. 10.

    World Health Organization. Global surveillance for COVID-19 caused by human infection with COVID-19 virus: interim guidance. 2020. Accessed 20 May 2020.

  11. 11.

    Hong Kong Cancer Registry. Hospital Authority, HKSAR. Accessed 20 May 2020.

  12. 12.

    Chen R, Liang W, Jiang M, Guan W, Zhan C, Wang T, et al. Risk Factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in China. Chest. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Zhang J, Wang X, Jia X, Li J, Hu K, Chen G, et al. Risk factors for disease severity, unimprovement, and mortality in COVID-19 patients in Wuhan, China. Clin Microbiol Infect. 2020;26(6):767–72.

    CAS  Article  Google Scholar 

  14. 14.

    Zhang L, Yan X, Fan Q, Liu H, Liu X, Liu Z, et al. D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19. J Thromb Haemost. 2020;18(6):1324–9.

    CAS  Article  Google Scholar 

  15. 15.

    McIntosh K. Coronavirus disease 2019 (COVID-19). Epidemiology, virology, clinical features, diagnosis, and prevention. UpToDate. Accessed 19 May 2020.

  16. 16.

    Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054–62.

    CAS  Article  Google Scholar 

  17. 17.

    Zheng Z, Peng F, Xu B, Zhao J, Liu H, Peng J, et al. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J Infect. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Zhu Z, Cai T, Fan L, Lou K, Hua X, Huang Z, et al. Clinical value of immune-inflammatory parameters to assess the severity of coronavirus disease 2019. Int J Infect Dis. 2020;95:332–9.

    CAS  Article  Google Scholar 

  19. 19.

    Yao Q, Wang P, Wang X, Qie G, Meng M, Tong X, et al. Retrospective study of risk factors for severe SARS-Cov-2 infections in hospitalized adult patients. Pol Arch Intern Med. 2020;130(5):390–9.

    PubMed  Google Scholar 

  20. 20.

    Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Yan L, Zhang H-T, Goncalves J, Xiao Y, Wang M, Guo Y, et al. An interpretable mortality prediction model for COVID-19 patients. Nature Mach Intell. 2020;2:283–8.

    Article  Google Scholar 

  22. 22.

    Cummings MJ, Baldwin MR, Abrams D, Jacobson SD, Meyer BJ, Balough EM, et al. Epidemiology, clinical course, and outcomes of critically ill adults with COVID-19 in New York City: a prospective cohort study. Lancet. 2020;395:1763–70.

    CAS  Article  Google Scholar 

  23. 23.

    Qu R, Ling Y, Zhang Y, Wei L, Chen X, Li X, et al. Platelet-to-lymphocyte ratio is associated with prognosis in patients with coronavirus disease-19. J Med Virol. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  24. 24.

    Chan JCK, Tsui ELH, Wong VCW, Hospital Authority SARS Collaborative Group. Prognostication in severe acute respiratory syndrome: a retrospective time-course analysis of 1312 laboratory-confirmed patients in Hong Kong. Respirology. 2007;12:531–42.

  25. 25.

    Tsui PT, Kwok ML, Yuen H, Lai ST. Severe acute respiratory syndrome: clinical outcome and prognostic correlates1. Emerg Infect Dis. 2003;9:1064–9.

    Article  Google Scholar 

  26. 26.

    Choi KW, Chau TN, Tsang O, Tso E, Chiu MC, Tong WL, et al. Outcomes and prognostic factors in 267 patients with severe acute respiratory syndrome in Hong Kong. Ann Intern Med. 2003;139:715–23.

    Article  Google Scholar 

  27. 27.

    Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat Comput. 2011;21:137–46.

    Article  Google Scholar 

  28. 28.

    National Health Commission of the People’s Republic of China. Diagnosis and treatment protocol for COVID-19 (Trial Version 7). 2020. Accessed 20 May 2020.

  29. 29.

    Wynants L, Van Calster B, Bonten MMJ, Collins GS, Debray TPA, De Vos M, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020.

    Article  PubMed  PubMed Central  Google Scholar 

  30. 30.

    Wong HYF, Lam HYS, Fong AH-T, Leung ST, Chin TW-Y, Lo CSY, et al. Frequency and distribution of chest radiographic findings in COVID-19 positive patients. Radiology. 2019;295:201160.

    Google Scholar 

Download references


This research was supported by the Hong Kong Hospital Authority (HA). We thank our colleagues from HA who provided insight and expertise that greatly assisted the research. The authors would like to acknowledge the valuable clinical advice towards the prognostic model development from our clinical expert group, including Dr Kin Wing Choi of AHN, Dr TL Que of TMH, Dr Owen Tsang of PMH, Dr TC Wu of QEH, Dr KC Lung of PYNEH, Dr Albert Lie of QMH and Dr Vivien Chuang of Head Office Q&SD within the HA and Professor CS Lau of the University of Hong Kong.


This study received no external funding.

Author information




ELHT: Project organizer, study design, result interpretation and manuscript writing. CSML: Study design, data analysis and interpretation, manuscript revising and approval. PPSW: Literature search, study design, manuscript revising and approval. ATLC: Data collection and compilation, manuscript revising and approval. PKWL: Data collection and compilation, manuscript revising and approval. VTWT: Data analysis and compilation. CFY: Data collection and compilation. CHW: Data collection and organisation. LHYL: Project organizer and manuscript revising and approval. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Eva L. H. Tsui or Carrie S. M. Lui.

Ethics declarations

Ethics approval and consent to participate

The research involving de-identified patient data described in this paper were approved by the Research Ethics Committee (Kowloon Central / Kowloon East) (IRB number: KC/KE-20-0202/ER-3).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



See Fig. 3 and Tables 4, 5.

Fig. 3

Relationship between each key feature under Day 1 and Day 5 of admission and the SHAP value for each outcome group (Red: Critical / Serious, Yellow: Stable, Green: Satisfactory)

Table 4 Comparison on predictive performance of three alternative machine learning classification algorithms using the same features under the full model
Table 5 Predictive performance of the simplified model for an extended testing data (COVID-19 confirmed cases during 1 May–9 Aug 2020), which was supplemented to this study when the manuscript was revised in mid-August 2020

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Tsui, E.L.H., Lui, C.S.M., Woo, P.P.S. et al. Development of a data-driven COVID-19 prognostication tool to inform triage and step-down care for hospitalised patients in Hong Kong: a population-based cohort study. BMC Med Inform Decis Mak 20, 323 (2020).

Download citation


  • COVID-19
  • Prognostic
  • Prediction
  • Clinical outcome
  • Disease severity
  • Triage
  • Step-down care