
Predicting COVID-19 disease progression and patient outcomes based on temporal deep learning



The coronavirus disease 2019 (COVID-19) pandemic has caused health concerns worldwide since December 2019. From the beginning of infection, patients will progress through different symptom stages, such as fever, dyspnea or even death. Identifying disease progression and predicting patient outcome at an early stage helps target treatment and resource allocation. However, there is no clear COVID-19 stage definition, and few studies have addressed characterizing COVID-19 progression, making the need for this study evident.


We proposed a temporal deep learning method based on a time-aware long short-term memory (T-LSTM) neural network and trained the model on an online open dataset that includes blood samples of 485 patients from Wuhan, China. Our method can grasp the dynamic relations in irregularly sampled time series, which are ignored by existing works. Specifically, our method predicted the outcome of COVID-19 patients by considering both the biomarkers and the irregular time intervals. Then, we used the patient representations extracted from the T-LSTM units to subtype the patient stages and describe the disease progression of COVID-19.


Using our method, the accuracy of outcome prediction was more than 90% at 12 days before the outcome and 98, 95 and 93% at 3, 6, and 9 days, respectively. Most importantly, we found 4 stages of COVID-19 progression with different patient statuses and mortality risks. We ranked 40 biomarkers related to the disease and gave their reference values for each stage. The top 5 are Lymph, LDH, hs-CRP, Indirect Bilirubin and Creatinine. In addition, we found 3 complications - myocardial injury, liver function injury and renal function injury. Predicting which of the 4 stages a patient is currently in can help doctors better assess and treat the patient.


To combat the COVID-19 epidemic, this paper aims to help clinicians better assess and treat infected patients, provide relevant researchers with potential disease progression patterns, and enable more effective use of medical resources. Our method predicted patient outcomes with high accuracy and identified a four-stage disease progression. We hope that the obtained results and patterns will aid in fighting the disease.



Coronavirus disease 2019 (COVID-19) outbreaks have caused health concerns worldwide since December 2019; the disease was declared a pandemic by the World Health Organization (WHO) on 11 March 2020 [1]. Over seven million cases of COVID-19 have been reported worldwide, including more than 400,000 deaths (as of 15 June 2020) [2]. Even though the disease has been controlled in certain countries, the WHO director warns the pandemic is still ‘Speeding Up’ [3]. Because of its sudden onset, many hospitals are still facing medical resource shortages. For example, news in [4] reported a lack of medical resources in New Delhi. In [5], Arizona has experienced record-high hospital capacity as coronavirus cases climb. A reasonable allocation of resources according to patient condition is needed.

The solution to this problem involves determining the stages of disease progression by subtyping and predicting the outcome of COVID-19 patients. Then, targeted treatment and medical resource allocation can be carried out for patients in different stages. Recent studies [6,7,8,9,10,11] have used statistical methods to analyze COVID-19 progress by inpatient symptoms. However, different statistical results were obtained by considering different patient groups and different symptoms. At present, there is no clear division of the stages of COVID-19 progression.

Longitudinal disease analysis is the key to understanding disease progression, designing prognoses and developing early diagnostic tools. The time dynamics of disease can provide more information than static symptom observation [12]. Considering the complex patient states, the number of interventions and the real-time requirement, data-driven machine learning approaches that learn from electronic health records are well suited to help clinicians [13].

Many existing works have used machine learning methods for COVID-19 prediction tasks; we summarize them in Table 1. For example, most methods in [27] and those in [1, 14,15,16,17,18,19] are non-deep learning methods, such as k-NN, LR, Cox, SVM and DT, used to classify CT/X-ray images and predict the outcomes of COVID-19 patients. However, in terms of prediction accuracy, non-deep learning methods are not as good as deep learning methods. Deep learning methods can train parameters with complex nonlinearity to learn the data structures and have achieved state-of-the-art results in many medical prediction tasks [28,29,30]. Thus, many current works apply deep learning methods to COVID-19 prediction tasks [17, 19,20,21,22,23,24,25,26]. However, these methods either use a simple multi-layer perceptron for prediction or use convolutional structures for image classification. Both kinds of methods ignore the temporal development of the patient’s status. In real-world patient records, apart from the basic information, the vital signs, test values and diagnoses are all time series; this is especially true of the blood samples of COVID-19 patients, the data we use in this paper.

Table 1 Summary of machine learning methods used in COVID-19 prediction tasks

The recurrent neural network (RNN) [31] is a deep learning method that can efficiently model temporal sequences. It uses recursion in the direction of sequence evolution to learn the relations among past, present and future. However, the basic RNN suffers from the long-term dependency problem [32]. Moreover, RNNs only process uniformly distributed longitudinal data, while COVID-19 patient blood samples are distributed nonuniformly, with irregular time intervals between observations. Thus, a method that can model the irregular time series of COVID-19 patients is needed.

In this paper, we retrospectively analyzed the blood samples of 485 patients from the region of Wuhan, China. The medical records were collected with standard case report forms, including epidemiological, demographic, clinical, laboratory and mortality outcome information, and come from an online open dataset under an MIT license. We applied a temporal deep learning method, Time-aware Long Short-Term Memory (T-LSTM), to model the irregular time series of COVID-19 patients. T-LSTM can predict mortality with more than 98% accuracy 3 days before the outcome. Meanwhile, we discovered four stages of COVID-19 progression. For each stage, we analyzed the patient’s state and identified the related biomarkers and complications.


In this section, we first introduce the COVID-19 dataset and the data preprocessing process. Then, we describe the methods for mortality prediction and disease progression in detail.

Dataset description

Blood index values can reflect a COVID-19 patient’s physical condition [10]. COVID-19 patients’ blood samples were collected between 10 January and 18 February 2020 at Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China [33]. The dataset contains 80 characteristics from 375 patients with 6120 records as a training set and 110 patients with 757 records as a test set. A sample case is shown in Fig. 1, which plots the time series of LDH, lymph and hs-CRP of a 70-year-old female patient during hospitalization. The time intervals between two observations are irregular and range from a few minutes to several days.

Fig. 1
figure 1

Examples and statistics of the COVID-19 dataset. The first block is a line chart of one example in the dataset, a 70-year-old female patient, showing the time series of LDH, lymph and hs-CRP during hospitalization; the second block shows the distributions of age, gender, LDH, lymph and hs-CRP for the survival class (0) and the death class (1); the third block shows statistics of the dataset: the counts of time series lengths, the overall missing rate and each feature’s missing rate under different sampling rates

The detailed statistical information of the demographic features and 74 clinical laboratory test features is listed in Table 2. For example, in the dataset, the average age of patients is 58.83, the survival rate is 53.6% and the ratio of male to female is about 1.5:1. We also list the range and mean value of each feature. In Fig. 1, we display the distributions of some features (age, gender, LDH, lymph and hs-CRP) for the survival class (0) and the death class (1).

Table 2 Demographic, laboratory and outcome information of 375 samples in training dataset

This COVID-19 blood test data is publicly available at

Dataset preprocessing

First, we attempted to find a suitable time measurement granularity. In the raw dataset, the lengths of sequences are unequal and different sampling times result in missing data, with an 85% missing rate on average. The missing rate mr is expressed in Eq. 1, where \(N_{missing}\) is the number of time points with missing data in one time series and \(N_{all}\) is the number of time points in that time series. The presence of vacancies has a large impact on data quality, resulting in unstable predictions and other unpredictable effects [34]. We used 3 days as the basic sampling interval, reducing the average mr below 30%. The time series length of raw data, the average missing rate and the missing rate for each feature are shown in Fig. 1.

$$mr=\frac{N_{missing}}{N_{all}}$$
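The missing-rate definition and the 3-day resampling described above can be sketched as follows. This is only an illustration: the function names `missing_rate` and `resample`, the rule of keeping the last observed value in each bin, and the toy LDH values are assumptions, not the authors' implementation.

```python
def missing_rate(series):
    """Eq. 1: fraction of time points whose value is missing (None)."""
    if not series:
        return 0.0
    n_missing = sum(1 for v in series if v is None)
    return n_missing / len(series)


def resample(times, values, interval_days=3.0):
    """Bin irregular records into fixed-width intervals.

    times  : observation times in days since admission (sorted)
    values : measured value, or None, at each time
    Keeps the last observed value in each bin (an assumed aggregation
    rule); a bin stays None only if no measurement fell inside it.
    """
    n_bins = int(times[-1] // interval_days) + 1
    binned = [None] * n_bins
    for t, v in zip(times, values):
        if v is not None:
            binned[int(t // interval_days)] = v
    return binned


# Hypothetical LDH records: most raw time points carry no LDH value.
times = [0.0, 0.2, 1.5, 3.1, 4.0, 6.5, 7.0, 8.9]
ldh = [None, 300.0, None, None, 410.0, None, None, 520.0]

raw_mr = missing_rate(ldh)          # missing rate on the raw time points
binned = resample(times, ldh)       # one value per 3-day interval
binned_mr = missing_rate(binned)    # missing rate after resampling
```

A coarser interval lowers the missing rate at the cost of temporal resolution, which is the trade-off behind choosing a single basic sampling interval.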


Meanwhile, for feature selection, using all 74 laboratory test features is unrealistic. To address the high missing rate, repeated features and collection difficulties, we considered three key features: lactic dehydrogenase (LDH), lymphocytes (lymph) and high-sensitivity C-reactive protein (hs-CRP). These features contain specific research biomarkers of COVID-19 patients [33] and can be easily collected in any hospital. Because only three features may not achieve high prediction accuracy, we also selected 40 features (listed in Table 7) with a missing rate of less than 30% for a comparative experiment.


Recurrent neural networks (RNNs) [31] (the first structure in Fig. 2) are deep network architectures designed to model temporal sequences. They take sequence data as input, recursion occurs in the direction of sequence evolution, and all units are chained together. In the basic RNN (the second structure in Fig. 2), the current state \(h_t\) is affected by the previous state \(h_{t-1}\) and the current input \(x_t\) and is described as \(h_t=\sigma \left(W{x}_t+U{h}_{t-1}+b\right)\), where σ is an activation function, and W, U and b are learnable parameters. Long Short-Term Memory (LSTM) [32] (the third structure in Fig. 2) is a variant of RNN that is adept at solving long-term dependency problems. A standard LSTM unit consists of a forget gate \(f_t\), an input gate \(i_t\), memory cells \(C_t\) and \(\tilde{C}_t\), and an output gate \(o_t\).

Fig. 2
figure 2

Structures of the methods. The first block shows the structure of RNNs, including basic RNN, LSTM and our T-LSTM; The second block shows the structure that how to use T-LSTM to complete the outcome prediction task (lower grey area) and disease progressing task (upper grey area)

However, RNNs only process uniformly distributed longitudinal data by assuming that the sequences have an equal distribution of time differences. COVID-19 patient blood samples are distributed nonuniformly. For example, the time gap between two sequential records could be hours or days. Time-aware Long Short-Term Memory (T-LSTM) [35] (the fourth structure in Fig. 2) incorporates the elapsed time information into LSTM. It applies a memory discount to capture the irregular temporal dynamics. T-LSTM can be formulated as:

$$\begin{array}{ll} C_{t-1}^{S} = \tanh\left(W_{d} C_{t-1} + b_{d}\right) & \text{Short-term memory} \\ \hat{C}_{t-1}^{S} = C_{t-1}^{S} * g\left(\Delta_{t}\right) & \text{Discounted short-term memory} \\ C_{t-1}^{T} = C_{t-1} - C_{t-1}^{S} & \text{Long-term memory} \\ C_{t-1}^{*} = C_{t-1}^{T} + \hat{C}_{t-1}^{S} & \text{Adjusted previous memory} \\ f_{t} = \sigma\left(W_{f} x_{t} + U_{f} h_{t-1} + b_{f}\right) & \text{Forget gate} \\ i_{t} = \sigma\left(W_{i} x_{t} + U_{i} h_{t-1} + b_{i}\right) & \text{Input gate} \\ \tilde{C}_{t} = \tanh\left(W_{c} x_{t} + U_{c} h_{t-1} + b_{c}\right) & \text{Candidate memory} \\ C_{t} = f_{t} * C_{t-1}^{*} + i_{t} * \tilde{C}_{t} & \text{Current memory} \\ o_{t} = \sigma\left(W_{o} x_{t} + U_{o} h_{t-1} + b_{o}\right) & \text{Output gate} \\ h_{t} = o_{t} * \tanh\left(C_{t}\right) & \text{Current hidden state} \end{array}$$

In Eq. 2, T-LSTM adds several new components to the basic LSTM. \({C}_{t-1}^S\) is the short-term memory of the sequence, learned through trainable network parameters. \({C}_{t-1}^T\) is the long-term memory, obtained by removing \({C}_{t-1}^S\) from the previous memory cell \(C_{t-1}\). \({C}_{t-1}^S\) is adjusted to the discounted short-term memory \({\hat{C}}_{t-1}^S\) by the elapsed time function \(g\left({\Delta}_t\right)\). The adjusted previous memory \({C}_{t-1}^{\ast }\) is the long-term memory \({C}_{t-1}^T\) combined with \({\hat{C}}_{t-1}^S\).

We use a log calculation for the elapsed time function (Eq. 3). \({\Delta}_t\) describes the time gap between two records at two sequential time steps t and t − 1. \(T_t\) is the actual time at time step t.

$$g\left( {\Delta _{t} } \right) = \frac{1}{{\log \left( {e + \Delta _{t} } \right)}}, \quad\Delta _{t} = T_{t} - T_{{t - 1}}$$
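To make the memory discount concrete, the following minimal sketch runs one T-LSTM step per record in plain Python. It follows the original T-LSTM formulation [35], in which the adjusted previous memory is the long-term memory plus the discounted short-term memory. The parameter dictionary `p`, the helper names, the random initialization and the toy normalized records are all assumptions for illustration, not the authors' code.

```python
import math
import random


def g(delta_t):
    """Elapsed-time discount: 1 / log(e + delta_t)."""
    return 1.0 / math.log(math.e + delta_t)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xj for w, xj in zip(W[k], x))
        + sum(u * hj for u, hj in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def tlstm_step(x, h_prev, c_prev, delta_t, p):
    """One T-LSTM step; p is a dict of (hypothetical) parameters."""
    n = len(h_prev)
    # Short-term memory and its time-discounted version.
    c_s = [math.tanh(sum(p["Wd"][k][j] * c_prev[j] for j in range(n)) + p["bd"][k])
           for k in range(n)]
    c_s_hat = [v * g(delta_t) for v in c_s]
    # Long-term memory, then adjusted previous memory (long + discounted short).
    c_long = [c - s for c, s in zip(c_prev, c_s)]
    c_star = [lt + st for lt, st in zip(c_long, c_s_hat)]
    # Standard LSTM gates and candidate memory.
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]
    i = [sigmoid(z) for z in affine(p["Wi"], x, p["Ui"], h_prev, p["bi"])]
    o = [sigmoid(z) for z in affine(p["Wo"], x, p["Uo"], h_prev, p["bo"])]
    c_tilde = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]
    c = [fk * cs + ik * ct for fk, cs, ik, ct in zip(f, c_star, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {"Wd": mat(n_hidden, n_hidden), "bd": [0.0] * n_hidden}
    for gate in "fioc":
        p["W" + gate] = mat(n_hidden, n_in)
        p["U" + gate] = mat(n_hidden, n_hidden)
        p["b" + gate] = [0.0] * n_hidden
    return p


# Hypothetical normalized [LDH, lymph, hs-CRP] records with irregular gaps in days.
records = [([0.2, 0.5, 0.1], 0.0), ([0.4, 0.3, 0.6], 0.5), ([0.9, 0.1, 0.8], 4.0)]
params = init_params(n_in=3, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for x, delta_t in records:
    h, c = tlstm_step(x, h, c, delta_t, params)
```

With a zero gap the discount equals 1 and the step reduces to a standard LSTM update; the longer the gap, the more the short-term part of the previous memory is attenuated while the long-term part is kept.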

Analysis strategy

We first describe the two tasks in this study and then introduce the specific methods. The whole method process is shown in Fig. 3.

Fig. 3
figure 3

The results of outcome prediction. The first line’s charts are the AUC-ROC of mortality prediction results using baselines; The second line’s chart is the changes of accuracy and loss during training T-LSTM; The third line’s charts are the dimension experiments. They show the accuracy of mortality prediction by using different representation dimensions and the effect of representation dimension reduction; The fourth line’s charts are the effect when using DBSCAN

Task 1 (Outcome prediction) A set of labeled patient data is represented as \(\mathcal{D}=\left\{\left({x}_i,{c}_i\right)\in \left(X,C\right)|i=1,\dots, n\right\}\). X is a time series set of patients, where \({x}_i=\left\{{x}_i^t|t=1,\dots, {t}_{onset}\right\}\) represents a patient’s records over \({t}_{onset}\) time steps; specifically, \({x}_i^t\) is multivariate data, and each dimension is a clinical record represented by a numeric vector. \(C\in \left\{0,1\right\}\) is the outcome, where class 0 means survival and class 1 means death, matching the class labels in Fig. 1. The outcome prediction task aims to predict patient outcomes by the prediction function \(f:X\to C\).

Task 2 (Temporal patient subtyping / Disease progression mining) The goal is to find patient groups G = {gi| i = 0, …, m} with similar feature representation \(R=\left\{{r}_i^t|i=0,\dots, n;t=0,\dots, {t}_{onset}\right\}\). \({r}_i^t\) is the representation of clinical record \({x}_i^t\) at time t. Then, the patient groups G distributed over time are used to analyze the stages of disease progression.

In the COVID-19 patient outcome prediction task, T-LSTM is used to process patient record sequences and then make the prediction. The process is displayed in the lower gray area of the proposed method in Fig. 2.

For a patient i, the input of T-LSTM at time step t is a three-dimensional feature vector \({x}_i^t=\left[{v}_{LDH},{v}_{lymphocytes},{v}_{hs-CRP}\right]\) with time gap \({\Delta}_t\). The output is the state representation \(s_i\) at the last time step. We treat outcome prediction as a binary classification task with two classes: death and survival.

Cross-entropy [36] measures the difference between two probability distributions. We want our predicted distribution of patient outcomes to be close to the true distribution, so we use the cross-entropy loss function in Eq. 4. In addition, with a sigmoid activation function, this loss avoids the slow learning that the traditional mean squared error loss suffers when the gradient of the activation becomes small.

$$L={L}_{CE}\left(C,\hat{C}\right)=-{\sum}_xp(x)\log q(x)=-{\sum}_{i=1}^n\left[\hat{c_i}\log {c}_i+\left(1-\hat{c_i}\right)\log \left(1-{c}_i\right)\right]$$

p(x) is the prior probability (true label vector) and q(x) is the prediction probability (predicted results vector). Correspondingly, \(\hat{C}\) is the real class of input data, and C represents the prediction class.
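As a small numerical illustration of the loss, the sketch below evaluates the binary cross-entropy with both terms inside the negated sum, where the true labels play the role of \(\hat{c_i}\) and the predicted probabilities the role of \(c_i\). The function name, the clipping constant `eps` and the toy predictions are assumptions for illustration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy; y_true in {0, 1}, y_pred in (0, 1)."""
    total = 0.0
    for c_hat, c in zip(y_true, y_pred):
        c = min(max(c, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= c_hat * math.log(c) + (1 - c_hat) * math.log(1.0 - c)
    return total


# Hypothetical true outcomes and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.6, 0.4, 0.5, 0.5]

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)
```

Predictions that are both correct and confident give the smaller loss, which is the behavior the training objective rewards.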

In the COVID-19 patient disease progression task, temporal patient subtyping can uncover the dynamic characteristics of the disease through fine-grained subtyping, which reveals the potential stages of disease progression. We addressed the issue by building a time stage reference and providing a low-dimensional representation of each subject, encoding his or her position with respect to this reference.

The method structure is displayed in the upper gray area of proposed method in Fig. 2. It has 4 steps: 1) Acquisition of patient representation rt. We used the hidden state ht, extracted from every T-LSTM unit, as the patient’s representation rt at time step t. 2) Dimension reduction of rt. For better demonstration, we used the t-distributed Stochastic Neighbor Embedding (t-SNE) [37] method to reduce these high-dimensional vectors rt into two dimensions. 3) Obtaining the patient group set G. As prior information about the patient groups was not available, we acquired patient groups by applying an unsupervised clustering method, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [38], on rt. 4) Analysis of G and stages of disease progression. The mortality rate MR, and the average time distance TD were calculated as the analysis criteria.
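The clustering in step 3 can be illustrated with a compact, self-contained version of DBSCAN. This sketch is a textbook implementation rather than the authors' code: the function name `dbscan`, the parameter names `eps` (the cluster radius ε) and `min_pts`, and the toy 2-D points standing in for reduced patient representations are assumptions. The t-SNE reduction of step 2 is not reproduced here.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # core point: keep expanding
                queue.extend(nb_j)
    return labels


# Hypothetical 2-D representations: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Because DBSCAN does not need the number of groups in advance, it fits the setting where no prior information about patient groups is available; the radius `eps` is the parameter that controls how many groups emerge.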

$$MR=\frac{N_{death}}{N_{patient}}$$
$$TD=\frac{1}{\mid {g}_k\mid }{\sum}_{x_i^t\in {g}_k}\left({T}_{t_{onset}}-{T}_t\right)$$

Equation 5 expresses the mortality rate: \(N_{death}\) is the number of patients with the death outcome and \(N_{patient}\) is the total number of patients. Equation 6 expresses the average time distance: \(T_t\) is the current prediction time and \({T}_{t_{onset}}\) is the time of outcome. \(\mid {g}_k\mid\) is the number of patients in group \(g_k\).
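The two analysis criteria are simple group-level averages, as the following sketch shows. The function names and the toy group (outcome flags and times in days) are hypothetical.

```python
def mortality_rate(died):
    """Share of patients in a group whose outcome is death.

    died : list of booleans, True if the patient died.
    """
    return sum(died) / len(died)


def average_time_distance(onset_times, record_times):
    """Mean of (time of outcome - current record time) over a group."""
    gaps = [t_on - t for t_on, t in zip(onset_times, record_times)]
    return sum(gaps) / len(gaps)


# Hypothetical group of four records (times in days since admission).
died = [True, True, False, True]
onset_times = [10.0, 12.0, 9.0, 15.0]
record_times = [7.0, 8.0, 6.0, 11.0]

mr = mortality_rate(died)
td = average_time_distance(onset_times, record_times)
```

A group with a high mortality rate and a small time distance corresponds to a late, high-risk stage; a group with a large time distance lies early in the course of the disease.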

Evaluation metrics

The prediction results were evaluated by the area under the Receiver Operating Characteristic curve (AUC-ROC). The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), defined in Eqs. 7 and 8. TP, TN, FP and FN represent true positives, true negatives, false positives and false negatives, respectively.

$$TPR=\frac{TP}{TP+ FN}$$
$$FPR=\frac{FP}{TN+ FP}$$
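For illustration, the ROC curve and its area can be computed directly from these two rates by sweeping the decision threshold. In this sketch the function names, the convention that label 1 is the positive class, and the toy scores are assumptions.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # FPR = FP/(TN+FP), TPR = TP/(TP+FN)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = positive class) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9; a perfect ranking gives 1 and a random one gives about 0.5.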

The patient groups obtained by unsupervised clustering were evaluated by the Calinski-Harabasz Index (CH), which compares the dispersion of data between groups with the dispersion within groups. A larger CH value indicates better clustering performance. In Eq. 9, m is the number of data points and k is the number of groups. \(B_k\) and \(W_k\) are the between-group and within-group covariance matrices, respectively.

$$CH=\frac{\mathrm{tr}\left({B}_k\right)}{\mathrm{tr}\left({W}_k\right)}\times \frac{m-k}{k-1}$$
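The CH index can be computed from cluster centers alone, as in the sketch below, which uses the standard definition: between-group dispersion divided by within-group dispersion, scaled by (m − k)/(k − 1). The function name and the toy points are assumptions for illustration.

```python
def ch_index(points, labels):
    """CH index: [tr(B) / (k - 1)] / [tr(W) / (m - k)]."""
    m = len(points)
    dim = len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    overall = mean(points)
    between = 0.0  # tr(B): size-weighted squared distance of centers to overall mean
    within = 0.0   # tr(W): squared distance of points to their own center
    for rows in groups.values():
        center = mean(rows)
        between += len(rows) * sqdist(center, overall)
        within += sum(sqdist(r, center) for r in rows)
    return (between / (k - 1)) / (within / (m - k))


# Hypothetical 2-D points forming two compact, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
good = ch_index(points, [0, 0, 0, 0, 1, 1, 1, 1])
bad = ch_index(points, [0, 1, 0, 1, 0, 1, 0, 1])
```

The labelling that matches the two compact groups scores far higher than one that mixes them, which is why the index can be used to compare clusterings obtained with different parameter settings.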


Once we obtained the stages of COVID-19 patients, we used the Kullback-Leibler divergence (KL divergence) to analyze patient characteristics through each laboratory test feature. KL divergence measures the degree of difference between two probability distributions. For each feature, we first fit a Gaussian distribution \(\mathcal{N}\left(\mu, {\sigma}^2\right)\) with expected value μ and variance \({\sigma}^2\) at each stage. Then, we calculate the average KL divergence between the distributions of adjacent stages. If the average KL divergence of a feature is large, the feature is more likely to be a biomarker that distinguishes different stages. The basic KL divergence of distributions p(X) and q(X) and the KL divergence of two univariate Gaussian distributions are given in Eqs. 10 and 11.

$$KL\left(p(X)\parallel q(X)\right)=\sum_{x_i\in X}p\left({x}_i\right)\log \frac{p\left({x}_i\right)}{q\left({x}_i\right)}$$
$$KL\left(\mathcal{N}\left({\mu}_1,{\sigma}_1^2\right)\Big\Vert \mathcal{N}\left({\mu}_2,{\sigma}_2^2\right)\right)=\log \frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+{\left({\mu}_1-{\mu}_2\right)}^2}{2{\sigma}_2^2}-\frac{1}{2}$$

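To measure and evaluate each feature, we use the average KL divergence (Average KL) between neighboring stages \(g_i\) and \(g_{i+1}\), given in Eq. 12. The groups are indexed \(g_0,\dots ,g_m\) as in Task 2, so the sum runs over the m pairs of adjacent stages.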

$$Average\ KL=\frac{1}{m}\sum_{i=0}^{m-1}{KL}_{g_i,{g}_{i+1}}$$
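The feature ranking criterion can be sketched as follows: fit a mean and standard deviation per stage, apply the closed-form KL divergence between two univariate Gaussians, and average over adjacent stages. The function names and the per-stage statistics of the two toy features are hypothetical.

```python
import math


def gaussian_kl(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) in closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


def average_kl(stage_stats):
    """Mean KL divergence between adjacent stages.

    stage_stats : list of (mu, sigma) tuples, one per stage, in stage order.
    """
    pairs = list(zip(stage_stats, stage_stats[1:]))
    return sum(gaussian_kl(m1, s1, m2, s2)
               for (m1, s1), (m2, s2) in pairs) / len(pairs)


# Hypothetical per-stage (mean, std) of two features over four stages.
shifting_feature = [(20.0, 5.0), (15.0, 5.0), (10.0, 5.0), (5.0, 5.0)]
stable_feature = [(140.0, 3.0), (140.0, 3.0), (140.0, 3.0), (140.0, 3.0)]

score_shifting = average_kl(shifting_feature)
score_stable = average_kl(stable_feature)
```

A feature whose distribution shifts from stage to stage receives a higher score than one that stays put, so sorting features by this score yields the ranking used to identify stage-related biomarkers.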


We used the records of 375 patients as the training set, split into training and validation subsets at a ratio of 0.8:0.2. The records of 110 patients made up the test set. The experiment was conducted with 5-fold cross-validation. The code implementation is publicly available at


We use the related works summarized in Table 1 as comparison methods. Related works are divided into non-deep learning methods and deep learning methods. We use Cox [19], k-NN [16], SVM [17], DT [1], BPNN [20], PNN [21], RNN, LSTM and T-LSTM for COVID-19 mortality prediction. T-LSTM is our method.

Outcome prediction results

Table 3 shows the results of COVID-19 mortality prediction for T-LSTM and the baselines. The AUC-ROC is evaluated at 0, 3, 6, 9, 12, 15, and 18 days before the outcome. The results are obtained with 64-dimensional patient representations. They indicate that our method, T-LSTM, performed better than all baselines no matter how early before the onset times of patients the prediction was made. More precisely, using T-LSTM, the outcome prediction accuracy is above 90% at 12 days early and approximately 97% when predicting 3 days before the disease outcome. More detailed results on the training, validation and test sets using T-LSTM are listed in Table 4.

Table 3 AUC-ROC of COVID-19 mortality prediction results by using baselines
Table 4 AUC-ROC of COVID-19 mortality prediction results by using T-LSTM on different sets at different timestamps

The first four figures in Fig. 3 visualize the prediction results. The first two figures show the AUC-ROC of the baselines and T-LSTM at different degrees of earliness. The third figure shows the changes in prediction accuracy and cross-entropy loss during model training. The fourth figure shows the relation between the patient representation dimension and the AUC-ROC of prediction using T-LSTM. Too few dimensions lead to incomplete feature learning, while too many dimensions lead to redundant calculations and easy over-fitting. Considering result accuracy, computational complexity and ease of representation use in the following task, we decided to use 64-dimensional vectors to represent patients.

Based on the prediction results, we found: 1) Deep learning approaches (T-LSTM, RNN, PNN and BPNN) have higher COVID-19 outcome prediction accuracy than non-deep learning approaches (Cox, k-NN, SVM and DT), as they perform highly nonlinear feature transformations through their neural network structures. 2) RNN-based models (T-LSTM and RNN) perform better on time series data, as they contain state connections for reproducing time delays and output feedback connections for forming a loop. 3) The time-aware model (T-LSTM) has the best performance, as it can model time series with irregular time intervals, which is a prominent feature of the COVID-19 blood sample dataset.

Further, we also selected 40 features (listed in Table 7) as the input of T-LSTM for a comparative experiment. The results in Table 5 indicate that learning a large number of patient characteristics does not necessarily help the COVID-19 patient mortality prediction task. Using only the three biomarkers LDH, lymph and hs-CRP gives better results: the AUC-ROC with 3 features is 3% higher on average than with 40 features. This is because too many features introduce redundant and irrelevant dependencies.

Table 5 AUC-ROC of COVID-19 mortality prediction results by using T-LSTM with 40 or 3 laboratory tests

Disease progression results

By implementing the four steps of disease progression mining, we obtained the 4 stages in both the death class (critical) and the survival class (general), shown in Fig. 4.

Fig. 4
figure 4

The result of COVID-19 progression. This figure shows the four stages of COVID-19 patients by using T-LSTM. The upper clusters are the original clustering of data. The lower are the patient subtyping by using T-LSTM. We can find there are four clusters with distinct boundaries both in death/critical class (red) and survival/general class (blue)

For better visualization, we reduced the dimension of the patient’s representation vector; the fifth figure in Fig. 3 shows the effect of dimension reduction. We chose 2 dimensions due to low representation loss and clear observation. In addition, the DBSCAN clustering effect evaluated by the CH index is shown in the sixth and seventh figures in Fig. 3. Different clustering effects can be obtained by changing the cluster radius parameter ε. The best CH index values for the death class and the survival class are 680.07 and 44.24, respectively.

In this case, both classes have four groups. Four stages of COVID-19 patients are shown in Fig. 4. For each stage, we calculate the mortality rate MR and the average time distance TD. For the death class, MR increases over stages and is 100% at stage 4. For the survival class, MR decreases over stages and is 0% in stage 4. TD in both classes decreases, meaning that the 4 stages are distributed over time. Meanwhile, as the CH index of the survival class is higher than that of the death class, the survival class stages are relatively loosely distributed.

In Fig. 4, the first clustering is obtained by using biomarkers directly and shows that reasonable stages could not be found. In the first clustering, no stage is clustered in the death class and the 2 stages in the survival class have similar mortality rates and no time difference, as the shade of blue indicates. However, using our method, different stages have obvious differences, such as the data point color deepening with the stages. Meanwhile, as shown in the two insets, the class boundary is clearer based on our method.

The division of stages contains the potential characteristics of COVID-19. Here, we present three findings. First, at the time of initial diagnosis, the COVID-19 infected patients’ physical conditions are similar, regardless of final survival or death. In Fig. 4, the distance between stage 1 for the death class and the survival class is small, and the two even overlap. This indicates that outcome prediction is likely not accurate at the time of infection. Second, the physical condition of patients who eventually die changes less than that of those who eventually survive. We conclude this from CH index values, where the CH value of the survival class is larger than that for the death class. Third, mortality rate varies by stage. For example, if the patient is classified into the death class and is at stage 1, there is still hope of survival, as shown by the green triangle sample in Fig. 4. However, if the patient is in stage 3 or 4, he or she is very likely to die. Based on estimating the current stage of a patient, doctors will be given a reference, which can help them assess a patient’s current situation. Based on that, doctors can carry out targeted treatment and reasonable resource allocation more easily. Thus, the ultimate goal of this study, helping improve the quality of medical care, can be achieved.

Meanwhile, we calculated the mean values of 40 laboratory test features in each stage; the feature values vary with stages. Table 6 lists 10 of these features - Lymph, LDH, hs-CRP, Indirect Bilirubin, Creatinine, INR, Serum Sodium, eGFR, Serum Chlorine and Albumin. The changes of values through the 4 stages are visualized in Fig. 5. The trends of the features differ between the two classes.

Table 6 Feature statistics of patients in different stages of COVID-19 disease progression
Fig. 5
figure 5

Changes of features in different stages. This figure shows the changes of features (Mortality rate, Lymph, LDH, hs-CRP, Indirect Bilirubin, Creatinine, INR, Serum Sodium, eGFR, Serum Chlorine and Albumin) through 4 stages. Under different classes, the trends of features are different

Further, we calculated the average KL divergence between adjacent stages for each feature in the 40 clinical laboratory tests and ranked the features by their average KL values. The higher the ranking, the better the biomarker can be used to distinguish different stages. By ranking the 40 biomarkers according to their degree of correlation with COVID-19 (Table 7), we found the biomarkers that are most relevant to COVID-19. The top 10 are: Lymph, LDH, hs-CRP, Indirect Bilirubin, Creatinine, INR, Serum Sodium, eGFR, Serum Chlorine and Albumin. For each marker, we gave its reference value in each stage, shown in Table 6. Different markers have unique trends in different stages.

Table 7 Ranking of average KL divergence values of top 40 features

Combining the correlation analysis with the reference value analysis, we found that critical COVID-19 patients usually have low values of lymph, eGFR, albumin and serum sodium and high values of LDH, hs-CRP, indirect bilirubin, creatinine and INR. For example, in the critical stage 4, the average lymph (%) is just 4 and the average LDH (U/l) is up to 499. In addition, there are three major complications of COVID-19 patients - myocardial injury, liver function injury and renal function injury. We inferred these, respectively, from the values of 1) LDH, 2) albumin and indirect bilirubin, and 3) serum sodium, serum chlorine and creatinine in different stages.


In recent years, deep learning (DL) technology has been widely used because of its superior performance in various medical applications [28, 29], such as medical image recognition [39] and medication recommendations [40]. In the past year, the spread of COVID-19 has had a peripheral effect on the global economy and health. Therefore, we expect to combine DL methods to study and fight COVID-19.

The states of COVID-19 patients in hospital are dynamic time sequence processes. In addition to the basic information of patients, the vital signs, diagnoses and other lab tests are all time series. Many existing works [14,15,16,17,18,19,20,21,22,23,24,25,26,27, 41, 42] have achieved good results for COVID-19 prediction tasks, but they paid little attention to analyzing and modeling the characteristics of COVID-19 patients’ time series. Dynamic time series modeling can grasp the relationship between historical observations and current observations and learn the potential development mode of the sequence, which is conducive to more accurate prediction and representation. Moreover, we found that the time series of COVID-19 patients are irregularly sampled: the time intervals between adjacent observations differ. Not every test is measured regularly during an admission. When a certain symptom worsens, the corresponding variables are examined more frequently; when the symptom disappears, the corresponding variables are no longer examined. These time intervals add a time sparsity factor when the intervals between observations are large [13]. Therefore, it is necessary not only to handle time series but also to handle irregular time series, according to the characteristics of COVID-19 patients. In this paper, we use a time-aware LSTM model to solve this problem.

Deep learning methods have outstanding performance in prediction tasks. If a doctor predicts survival or death only by observing the biomarkers and using a threshold, the accuracy is at or below 80% for early predictions. However, the clinical reference value of inaccurate results is very low [43, 44]. The DL method has better performance, and the time-aware aspect enables higher accuracy, as shown in Table 3.

However, there are some concerns about the use of DL methods in the high-risk tasks of healthcare.

First, it may be risky to apply predictive methods directly in clinical practice [45]. DL methods should serve as assistive tools for doctors rather than make decisions directly. Because it is challenging for doctors to make optimal decisions, a data-driven, high-accuracy prediction method could help. In this paper, we predict patient outcomes with higher accuracy than the baselines. The method can predict whether an infected patient will die or survive 12 days prior to the disease outcome with over 90% accuracy. The prediction accuracies at 3, 6, and 9 days prior are 98, 95 and 93%, respectively.
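
One way to read "k days prior to the outcome" is that the model sees only the observations recorded at least k days before the outcome, and its predictions are then scored against the true outcomes. The sketch below illustrates that protocol under this assumption; `observations_before`, `accuracy` and the toy records are hypothetical names and values, and the actual evaluation setup may differ in its details.

```python
def observations_before(records, outcome_day, days_prior):
    """Keep (day, values) records taken at least `days_prior` days before the outcome."""
    cutoff = outcome_day - days_prior
    return [(day, values) for day, values in records if day <= cutoff]


def accuracy(predicted, actual):
    """Fraction of patients whose predicted outcome matches the true outcome."""
    if not actual:
        raise ValueError("no patients to evaluate")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy patient: (day since admission, measured values); all numbers are made up.
records = [
    (0, {"LDH": 300}),
    (4, {"LDH": 450}),
    (9, {"LDH": 700}),
    (14, {"LDH": 900}),
]
# Six days before an outcome on day 15, only days 0, 4 and 9 are visible.
visible = observations_before(records, outcome_day=15, days_prior=6)

# Toy labels for four patients: 1 = death, 0 = survival.
acc = accuracy([1, 0, 0, 1], [1, 0, 1, 1])
```

Increasing `days_prior` shortens the visible history, which is why accuracy is expected to fall as the prediction is made earlier.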

Second, DL methods are black-box models with poor interpretability [46, 47], whereas clinical settings prefer interpretable models. For example, finding appropriate prediction-related biomarkers is important. Certain studies have already identified suitable predictive biomarkers, such as the 3 biomarkers in [33], which are regarded as having a significant impact on patient mortality. For interpretability, our method identified four disease stages distributed over time. These stages cannot be distinguished simply by biomarker values, as shown by the comparison of the two clustering results in Fig. 4. The discovered stages are closely related to mortality and time of illness and can help analyze the status of infected patients. This shows that the DL method can uncover patterns in multidimensional space that cannot be revealed by the value of a single variable [48]. We also ranked 40 biomarkers according to their degree of correlation with COVID-19 progression, which provides interpretable results to help doctors better understand the model.
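
As an illustration of ranking biomarkers by their degree of correlation, the sketch below orders features by the absolute Pearson correlation between each biomarker and a numeric label. Pearson correlation, the function names and the toy values are assumptions made for this example; the measure actually used for the 40-biomarker ranking may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def rank_biomarkers(table, labels):
    """Sort biomarker names by |correlation with labels|, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values for five patients; labels: 1 = death, 0 = survival.
table = {
    "LDH": [250.0, 300.0, 800.0, 900.0, 280.0],
    "Lymph": [25.0, 22.0, 5.0, 4.0, 24.0],
    "Serum Sodium": [140.0, 138.0, 139.0, 141.0, 140.0],
}
labels = [0, 0, 1, 1, 0]
ranking = rank_biomarkers(table, labels)
```

Using the absolute value places strongly negative associations (here, the toy lymphocyte values fall as the label rises) alongside strongly positive ones.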

This study makes three main contributions. 1) We predict patient outcomes with higher accuracy than all baselines. 2) We identified four stages of COVID-19 progression. The stages are closely related to mortality and time of illness and can help analyze the status of infected patients. 3) We ranked 40 biomarkers according to their degree of correlation with COVID-19. Based on this ranking, we found three major complications in COVID-19 patients: myocardial injury, liver function injury and renal function injury.

There is still room for improvement. First, because of data limitations, our method may be at risk of bias: data-driven methods are easily influenced by the data source, so the results may vary across datasets [45]. Second, our current interpretation is based on results, such as the degree of association between biomarkers and disease. We hope to provide further explanations of the complex black-box DL model, such as the specific effect of each part of the model on the result. We also hope to encourage other researchers to study these 4 stages further and provide more clinical explanations. In particular, we expect to be able to suggest specific treatments for different stages. Targeted treatment is significant for both patient rehabilitation and the reasonable allocation of medical resources.


The sudden outbreak and spread of COVID-19 has led to worldwide suffering and shortages of medical resources. In this paper, we propose using T-LSTM to predict patient outcomes with high accuracy (98, 95 and 93% at 3, 6, and 9 days prior to the outcome, respectively), which will support the reasonable allocation of medical resources. T-LSTM can effectively model the irregularly sampled time series in blood test samples of COVID-19 patients and predicts more accurately than existing baselines. We also identified four COVID-19 stages. We ranked 40 biomarkers according to their correlation with patient outcomes and gave reference values of the top 10 biomarkers for each stage. The top 10 biomarkers are Lymph, LDH, hs-CRP, Indirect Bilirubin, Creatinine, INR, Serum Sodium, eGFR, Serum Chlorine and Albumin. We also found 3 complications of COVID-19: myocardial injury, liver function injury and renal function injury. By analyzing patients’ conditions at different stages, doctors can choose specific, targeted treatments. Future work will focus on the pathological characteristics of the different stages, with the aim of designing targeted treatments for each of the four stages. We also expect more real clinical data to become available for model validation, and we plan to use the model to mine the hidden features of other diseases.

Availability of data and materials

The code implementation is publicly available at The data is from an online open dataset under an MIT license (



Corona Virus Disease 2019
World Health Organization
Lactic Dehydrogenase
High-sensitivity C-reactive Protein
Deep Learning
Recurrent Neural Network
Long Short-Term Unit
Time-aware Long Short-Term Memory
Probabilistic Neural Network
Radial Basis Function Neural Network
Generalized Regression Neural Network
Back Propagation Neuron Network
Decision Tree
Random Forest
eXtreme Gradient Boosting
Support Vector Machines
Cox’s Proportional Hazards Regression
Linear Regression
Naive Bayes
Linear Discriminant Analysis
t-distributed Stochastic Neighbor Embedding
Density-Based Spatial Clustering of Applications with Noise
Mortality Rate
Average Time Distance
The Area Under the Curve of the Receiver Operating Characteristic
Calinski-Harabaz Index
KL divergence: Kullback-Leibler Divergence


  1. World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report 68, 28 March 2020.

  2. World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report 147, 15 June 2020.

  3. Emily Czachor. WHO Director Warns COVID-19 Pandemic is ‘Speeding Up,’ Here for ‘Long Haul’. Newsweek, News. 6/29/2020.

  4. Sébastien Farcis. Coronavirus: worries and worries about bed shortage in New Delhi. Liberation, Reportage. 6/15/2020.

  5. Katherine Fung. Arizona Hits Record-High Hospital Capacity as Coronavirus Cases Climb. Newsweek, News. 6/29/2020.

  6. Huang C, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506.

  7. Chen N, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395:507–13.

  8. Yang X, et al. Clinical course and outcomes of critically ill patients with SARS CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Resp Med. 2020;8:475–81.

  9. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:10229.

  10. Wang D, Hu B, Hu C, Zhu F, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. JAMA. 2020;323(11):1061.

  11. Yang X, Yu Y, Xu J, Shu H, Xia J, Liu H, et al. Clinical course and outcomes of critically Ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med. 2020;8(5):1–7.

  12. Spadon G, Hong S, Brandoli B, Matwin S, Rodrigues-Jr JF, Sun J. Pay Attention to Evolution: Time Series Forecasting with Deep Graph-Evolution Learning. arXiv preprint arXiv. 2020;2008:12833.

  13. Sun C, Hong S, Song M, Li H. A review of deep learning methods for irregularly sampled medical time series data. arXiv. 2020;2010:12493.

  14. Wang C, Deng R, Gou L, Fu Z, Zhang X, Shao F, et al. Preliminary study to identify severe from moderate cases of COVID-19 using NLR&RDW-SD combination parameter. 2020. medRxiv 2020.04.09.20058594.

  15. Farid AA, Selim GI, Khater HAA. A novel approach of CT images feature analysis and prediction to screen for corona virus disease (COVID-19). Int J Sci Eng Res. 2020;11(3):1–9.

  16. Kumar R, Arora R, Bansal V, Sahayasheela VJ, Buckchash H, et al. Accurate prediction of COVID-19 using chest X-ray images through deep feature learning model with SMOTE and machine learning classifiers. 2020. medRxiv 2020.04.13.20063461.

  17. Batista AFM, Miraglia JL, Donato THR, Chiavegatto Filho ADP. COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. 2020. medRxiv 2020.04.04.20052092.

  18. Li K, et al. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia. Investig Radiol. 2020;55(6):327.

  19. Liang W, Yao J, Chen A, et al. Early triage of critically ill COVID-19 patients using deep learning. Nat Commun. 2020;11:3543.

  20. Sujath R, Chatterjee JM, Hassanien AE. A machine learning forecasting model for COVID-19 pandemic in India. Stoch Env Res Risk Assess. 2020;34(7):959–72.

  21. Dhamodharavadhani S, Rathipriya R, Chatterjee JM. COVID-19 mortality rate prediction for India using statistical neural network models. Front Public Health. 2020;8:441.

  22. Panwar H, Gupta PK, Siddiqui MK, et al. Application of deep learning for fast detection of COVID-19 in X-rays using nCOVnet. Chaos, Solitons Fractals. 2020;138:109944.

  23. Harsh P, Gupta PK, Siddiqui MK, Morales-Menendez R, Bhardwaj P, Singh V. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos, Solitons Fractals. 2020;140:110190.

  24. Babukarthik RG, Adiga VAK, Sambasivam G, Chandramohan D, Amudhavel J. Prediction of COVID-19 Using Genetic Deep Learning Convolutional Neural Network (GDCNN). IEEE Access. 2020;8:177647–66.

  25. Wang L, Wong A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images. arXiv. 2020;2003:09871.

  26. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, et al. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296(2):E65–71.

  27. Radanliev P, Roure DD, Walton R. Data mining and analysis of scientific research data records on covid 19 mortality, immunity, and vaccine development - in the first wave of the Covid-19 pandemic. Diabetes Metab Syndr. 2020;14(5):1121.

  28. Adam G, Rampášek L, Safikhani Z, et al. Machine learning approaches to drug response prediction: challenges and recent progress. Precis Onc. 2020;4:19.

  29. Jalali A, Lonsdale H, Do N, et al. Deep learning for improved risk prediction in surgical outcomes. Sci Rep. 2020;10:9289.

  30. Siuly S, Zhang Y. Medical big data: neurological diseases diagnosis through medical data analysis. Data Sci Eng. 2016;1:54–64.

  31. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1989;1(2):270–80.

  32. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  33. Yan L, Zhang HT, Goncalves J, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020;2:283–8.

  34. Chu X, Ilyas IF, Krishnan S, Wang J. Data cleaning: Overview and emerging challenges, in Proc. Int. Conf. Manage. Data SIGMOD Conf., San Francisco, CA, USA, Jun./Jul; 2016. p. 2201–6.

  35. Baytas I M, Xiao C, Zhang X, et al. Patient Subtyping via Time-Aware LSTM Networks. the 23rd ACM SIGKDD International Conference. ACM, 2017.

  36. Corduneanu C. Integral Equations and Applications; 1991.

  37. Laurens VDM, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res. 2008;9(2605):2579–605.

  38. Ester M, Kriegel H-P, Sander J, Xiaowei X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD. 1996:226–31.

  39. Wang S, Wang S, Zhang S, Fan F, He G. Research on recognition of medical image detection based on neural network. IEEE Access. 2020;8:94947–55.

  40. Shang J, Xiao C, Ma T, Li H, Sun J. GAMENet: graph augmented memory networks for recommending medication combination. AAAI. 2019:1126–33.

  41. Tang Z, et al. Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images. 2020. arXiv:2003:11988.

  42. Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of COVID-19. Appl Intell. 2020;50(11):3913–25.

  43. Wang Z, He Z, Shah M, Zhang T, Fan D, Zhang W. Network-based multi-task learning models for biomarker selection and cancer outcome prediction. Bioinform. 2020;36(6):1814–22.

  44. Liu L, Li H, Hu Z, Shi H, Wang Z, Tang J, Zhang M. Learning hierarchical representations of electronic health records for clinical outcome prediction. AMIA Annu Symp Proc. 2020;2019:597–606.

  45. Wynants L, Calster BV, Bonten MMJ, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ (online). 2020;369:m1328.

  46. Molnar C. Interpretable machine learning: a guide for making black box models explainable; 2019.

  47. Ito T, Tsubouchi K, Sakaji H, et al. Contextual sentiment neural network for document sentiment analysis. Data Sci. Eng. 2020;5:180–92.

  48. Gibney HMJ. Analysis of meal patterns with the use of supervised data mining techniques—artificial neural networks and decision trees. Am J Clin Nutr. 2008;88(6):1632–42.


This paper is dedicated to those who want to fight COVID-19.


This work was supported by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry and UKRI’s Global Challenge Research Fund (ES/P011055/1). This work was also supported by the National Key Research and Development Program of China (No. 2020YFB2103402). The funders had no role in study design, data collection, analysis, the writing of the manuscript, or the decision to submit this article for publication.

Author information

Authors and Affiliations



C.S. and S.H. conceptualized the idea. S.H., Z.W. and H.L. initialized, conceived and supervised the project. C.S., S.H. and M.S. collected data and implemented the experiments. C.S., S.H., M.S. and Z.W. drafted the manuscript. All authors provided a critical review of the manuscript and read and approved the final manuscript for publication.

Corresponding authors

Correspondence to Hongyan Li or Zhenjie Wang.

Ethics declarations

Ethics approval and consent to participate

The original study was approved by the Ethics Committee of Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology (Yan Li, et al. “An interpretable mortality prediction model for COVID-19 patients.” Nature Machine Intelligence). In the current study, the data used is from that study as an online open dataset under an MIT license (

Consent for publication

Not applicable.

Competing interests

No financial competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, C., Hong, S., Song, M. et al. Predicting COVID-19 disease progression and patient outcomes based on temporal deep learning. BMC Med Inform Decis Mak 21, 45 (2021).


  • COVID-19
  • Disease progression
  • Outcome early prediction
  • Irregularly sampled time series
  • Time-aware long short-term memory