Skip to main content
  • Research article
  • Open access
  • Published:

Stacked LSTM based deep recurrent neural network with kalman smoothing for blood glucose prediction



Blood glucose (BG) management is crucial for type-1 diabetes patients resulting in the necessity of reliable artificial pancreas or insulin infusion systems. In recent years, deep learning techniques have been utilized for a more accurate BG level prediction system. However, continuous glucose monitoring (CGM) readings are susceptible to sensor errors. As a result, inaccurate CGM readings would affect BG prediction and make it unreliable, even if the most optimal machine learning model is used.


In this work, we propose a novel approach to predicting blood glucose level with a stacked Long short-term memory (LSTM) based deep recurrent neural network (RNN) model considering sensor fault. We use the Kalman smoothing technique for the correction of the inaccurate CGM readings due to sensor error.


For the OhioT1DM (2018) dataset, containing eight weeks’ data from six different patients, we achieve an average RMSE of 6.45 and 17.24 mg/dl for 30 min and 60 min of prediction horizon (PH), respectively.


To the best of our knowledge, this is the leading average prediction accuracy for the ohioT1DM dataset. Different physiological information, e.g., Kalman smoothed CGM data, carbohydrates from the meal, bolus insulin, and cumulative step counts in a fixed time interval, are crafted to represent meaningful features used as input to the model. The goal of our approach is to lower the difference between the predicted CGM values and the fingerstick blood glucose readings—the ground truth. Our results indicate that the proposed approach is feasible for more reliable BG forecasting that might improve the performance of the artificial pancreas and insulin infusion system for T1D diabetes management.

Peer Review reports


Diabetes Mellitus is a chronic disorder associated with abnormally high levels of blood glucose because the body is unable to produce enough insulin to meet its needs. \(\alpha\)-cell and \(\beta\)-cell in the pancreas are responsible for maintaining the glucose level in blood by secreting insulin and glucagon hormones [1]. Diabetes can be classified primarily into two categories. Type-1 diabetes is due to \(\beta\)-cell destruction and would cause absolute insulin deficiency. Type-2 diabetes is due to a progressive insulin secretory defect on the background of insulin resistance [2]. In type-1 diabetes, hypoglycemia sets in when blood sugar levels are too low (blood glucose concentration \(< 70\,\hbox{mg/dl}\)) [3] and hyperglycemia occurs when blood sugar levels are too high for a prolonged period (blood glucose concentration \(> 180\,\hbox{mg/dl}\)) [4,5,6]. In the long term, hyperglycemia causes severe complications of heart, blood vessel, eyes, kidneys, and other organs [7]. Therefore, proper diabetes management is vital for human health.

External insulin treatments are indispensable for T1D diabetes patients to maintain the blood glucose level in a healthy range [8]. With the naive approach for diabetes management, a patient needs to measure BG concentration several times throughout the day and night using the finger-stick test. Currently, it is the most common self-monitoring approach. Improved techniques such as a combination of an insulin pump for Continuous Subcutaneous Insulin Infusion (CSII) and a device for continuous glucose monitoring (CGM), are required for an effective blood glucose management system, known as sensor-augmented-pump (SAP) therapy [9]. CGM device takes glucose measurements with an interval of a particular time-frame. For example, most of the CGM devices take 288 measurements per day with 5 min interval. SAP therapy has been further improved by utilizing control algorithms for dynamic insulin delivery. In terminology, that is known as “Artificial Pancreas” (AP), closed-loop control of blood glucose in diabetes [10]. Statistical and machine learning techniques with the availability of previous continuous BG records make BG prediction more convenient. This prediction mechanism allows a patient or control algorithm to take the initiative to lower the adverse effect of unintended glycemic events.

Various statistical and machine learning methodologies have been proposed for blood glucose forecasting. The autoregressive integrated moving average (ARIMA) model-based algorithm is an example of a classical statistical method for blood glucose prediction [11]. The primary limitation of naive machine learning approaches is that it fundamentally depends on the representation of the input data (e.g., support vector regression [12] or k-nearest neighbor classifier [13]). Daskalaki et al. [14] proposed a generic physiological model of blood glucose dynamics to generate essential features of a Support Vector Regression (SVR) model that took daily events such as insulin boluses and meals into consideration. The meal model, insulin model, and exercise model were used with the SVR model in work [15].

Several works have been done for blood glucose prediction using artificial neural networks (ANNs) [16,17,18,19]. Classically, an ANN has a single hidden layer. However, deep learning models have many hidden layers. These deep models with higher complexity outperform traditional shallow neural networks. These deep learning models with higher complexity can learn the pattern automatically from data [20, 21]. Especially for sequential data, recurrent neural networks outperform feed-forward ANNs. However, the drawbacks of using classical RNNs are limitations in its ability to discover, that arises from the vanishing/ exploding gradient problem. This issue has been addressed by Long short-term memory (LSTM) networks [22] with the addition of memory cell and forget gate to classical RNN. In very recent works, [23,24,25,26], convolutional RNN and LSTM are investigated for BG prediction more accurately. Fox et al. [27] used RNN with Gated Recurrent Unit (GRU) cell and Sun et al. [28] utilized Bi-LSTM based RNN for BG forecasting. However, the accuracy achieved with state-of-the-art models for real patient data is not high enough that these approaches might not be applicable in the health domain. For example, the best average RMSE value achieved for the OhioT1DM dataset is 19.04mg/dl so far in the most recent works. Moreover, in these recently proposed methodologies with the RNN model, sensor fault is not taken into consideration. Comparatively, the less accuracy in the BG prediction schemes motivates us for this work to improve the reliability and accuracy of the BG forecasting mechanism.

Early detection and prevention of potential glycemic events such as hypoglycemia and hyperglycemia are one of the primary purposes of the artificial pancreas system. Comparatively, more extended works have been done associated with hypoglycemia detection than the detection of hyperglycemic events. Several statistical methods were studied, such as linear prediction models [29], recursive auto-regressive partial least squares models [30] and multi-variable model [31], for modeling the CGM sensor data for a reliable hypoglycemia early alarm system. In work [32], an RNN model and two different autoregressive models were proposed to design a hypoglycemia/hyperglycemia early warning system (EWS). The author developed a time-sensitive ANN-based hypoglycemia prediction system that can predict future hypoglycemic events within a prediction horizon (PH) of 30 min [33]. Additional physiological models, along with ANN, are studied in work [34] to predict nocturnal hypoglycemia. However, we see very limited works explicitly for the detection of hyperglycemia [35]. The authors have shown in this work that the electrocardiographic (ECG) signals can be employed with the ANN model to detect hyperglycemic events practically and non-invasively.

CGM sensor reading is a crucial factor in BG prediction as a slight error in CGM sensor reading might result in the wrong prediction. However, sensor fault is very common in the CGM system. That is why most of the clinical BG data sets are prone to have errors. Several factors are responsible for that kind of fault, such as the decay of sensor sensitivity, pressure-induced sensor attenuation (PISA) [36], and interruption in signal transmission, etc. Furthermore, bias and latency might be present in a CGM reading [37]. Another issue is that the reading of sensors from the same manufacturer might be different due to manufacturing variability. As a consequence, the predicted BG value and the estimation for insulin might be erroneous due to such faulty BG reading. Consequently, it might decrease the efficacy of the diabetes management system. So, to propose more reliable BG forecasting methodologies, sensor fault should be taken into consideration. However, synthesized data sets are exceptions of this case as this is out of the scope of the sensor fault.

Moreover, there might be another issue responsible for inaccurate BG prediction. The CGM sensor glucose is measured from interstitial fluid samples instead of the blood sample [38]. However, there is a discrepancy in time and magnitude between BG and interstitial glucose (IG) [39]. In the BG forecasting system, CGM sensor readings are used as one of the primary inputs to predict the future BG level. As IG and BG values are different, the BG prediction with IG as input is challenging. We find in our experiment that the BG prediction made with the Kalman smoothed CGM reading is closer to actual BG reading measured with fingerstick than the prediction made with unprocessed CGM reading. In this study, we assume the fingerstick BG reading as the ground truth.

In this work, we propose a deep learning approach for blood glucose prediction using a multi-layered LSTM based recurrent neural network model. We use OhioT1DM (2018 version) dataset [40], containing eight weeks’ data for each of six people with type-1 diabetes, for training and evaluation of our proposed model. We utilize several features instead of only CGM measurement. This is because the dynamics of the glucoregulatory system depend on several factors such as carbohydrate intake from meals, the amount of bolus dose or infused insulin, exercise, and physical activeness. Using only the CGM measurement as input might not be enough for learning those complex dynamics. Therefore, we investigate several combinations of different physiological information from the OhioT1DM dataset and choose the most optimal combination of those for our proposed model. We find that step count information from a fitness band can be a useful feature. It turns out that the combination of CGM reading, step count, carbohydrate intake from the meal, and bolus dose, is the most optimal feature set as the input of the model. Note that we use preprocessed CGM data with Kalman smoothing (KS) [41] instead of using raw CGM data to mitigate the effect of sensor fault on BG prediction. The overall architecture of our proposed BG prediction system has been illustrated in Fig. 1.

Our main contribution can be summarized as follows:

  • We propose a novel approach using a stacked LSTM based deep recurrent neural network architecture for BG prediction.

  • We introduce a more reliable BG prediction system by utilizing Kalman smoothing for mitigating the sensor fault in CGM reading. We demonstrated that the prediction made with this approach is closer to the actual BG level (fingerstick BG reading) than the conventional prediction method.

  • Our study reveals that a person’s step count information from the fitness band can be effectively utilized for improving BG prediction accuracy.

Overall, our proposed methodologies provide more accurate and reliable forecasting accuracy in terms of RMSE.



We use the OhioT1DM (2018 version) [40] dataset to train and test our model. This dataset includes eight weeks of data for each of six type-1 diabetes patients. The number of male and female patients was two and four, respectively. For data collection, Medtronic 530G insulin pumps and Medtronic Enlite CGM sensors were used throughout the data collection period. Each of the patients reported daily events data via a smartphone app and a fitness band. The dataset includes CGM blood glucose level every 5 min—288 samples per day, blood glucose levels with fingerstick (self-monitoring), insulin doses, in the form of bolus and basal, self-reported meal time with estimated carbohydrate intake, exercise time, sleep, work, stress, and illness. The data set also includes 5-min aggregations of step count, heart rate, galvanic skin response (GSR), skin, and air temperature. In this study, we experiment with each of these attributes to find out the optimal attribute set for the BG prediction model. Table 1 shows the number of training and test examples for each patient.

Table 1 Gender, number of training and test examples per patient

At first, we train our model with input data containing the basic and single feature—that is CGM values. Then we calculate the accuracy of this model. We assume this accuracy as base accuracy. In the next phase of the study, we use incremental and brute force strategy for adding new attributes to the input of the model. After including a new attribute to the input of the model, we determine the model’s performance/accuracy. If the accuracy gets better than the previous accuracy where the model is trained without that particular attribute, only then we consider that attribute as a potential attribute for further experiment. For example, we notice that the accuracy of the model trained with CGM values and bolus information is better than the model trained with only CGM values (base accuracy). So we consider bolus information as a potential attribute. whereas, adding heart-rate information with CGM values, does not provide better accuracy than base accuracy. So we do not include this attribute in the input of our model. From our experiment, it turns out that CGM values, carbohydrate intake from the meal, insulin dose as a bolus, and 5-min aggregation of step count from the fitness band, have a positive effect on the accuracy of the model. It is worth mentioning that some attributes such as insulin from basal, sleep, heart rate, GSR, and skin temperature have no effect or sometimes adverse effect on the accuracy improvement of our proposed model. Table 2 shows the prediction accuracy for different feature sets as input.

Table 2 BG prediction experimental performance in terms of RMSE, with Stacked LSTM model for different feature set as input

Feature extraction from physiological information

We select CGM values, meal info, insulin dose, and step count info, as the final feature set. These four features constitute the 4-channel inputs for the proposed model.

Carbohydrate information

In the case of meal information, we consider the fact that blood sugar starts to rise after 15 min of having the meal and reaches its peak after 1 h [42]. We name this phase as the Increasing Phase. Then the carbohydrate level in blood starts to decrease. That is the Decreasing Phase. At any time-index \(t_s\), we calculate the amount of carbohydrate that is effective at that moment in the blood after having a meal. The calculated amount of carbohydrates is treated as the input for the time-index \(t_s\) to the model. Every time the subject encounters a meal, we need to keep track and update the time-index \(t_{meal}\) and the amount of carbohydrate \(C_{meal}\) from the meal. We ignore the first 15 min (3 samples of data-point with 5 min of an interval) right after having a meal since it takes 15 min to have the effect of the meal on the blood glucose level. Thus, the equation for the effective carbohydrates \(C_\textit{eff}\) in blood for any time-index \(t_s\) within the first 60 min of time interval after having the meal, is as follows:

$$\begin{aligned} \begin{array}{l} C_{{eff}}\left( t_s\right) = \left\{ \left( t_s-t_{{meal}}\right) \beta _{{inc}}\right\} C_{{meal}} \end{array} \end{aligned}$$

Here \(\beta _{inc}\) is carbs increasing factor. We use \(\beta _{inc} = 0.111\), which implies that the carbohydrate level in the blood reaches its maximum (100% of the amount of carbohydrate taken from the meal) by increasing with the rate of 11.1% for every increment of time-index (5 min interval). Note that, \(t_{meal} >= t_s\). At the 60th minute after having the meal, \(C_\textit{eff}\) attains the maximum value by increasing with a rate of 11.1% per time-index. At that moment, the value of \((t_s-t_{meal})\beta _{inc}\) has been approximately less than or equal to 1.00. Note that we do not consider the first 15 min (3 time-indexes) right after having the meal in the above equation; instead, we consider nine time-indexes out of twelve time-indexes withing 60-min interval so that the value of \(C_\textit{eff}\) doesn’t exceed 100% of \(C_\textit{meal}\). When \(C_\textit{eff}\) reaches its maximum, then the decreasing phase begins. We calculate the amount of carbohydrate effective or appeared in the blood as glucose at any particular time-index \(t_s\) within the decreasing phase is as follows.

$$\begin{aligned} \begin{array}{l} C_\textit{eff}\left( t_s\right) = C_{\textit{meal}}\left\{ 1 - \left( t_s-t_{\textit{meal}}\right) \beta _{\textit{dec}} \right\} \end{array} \end{aligned}$$

Here, \(\beta _{dec}\) is the carbs decreasing factor. we set the value of \(\beta _{dec}\) as 0.028 according to the assumption that the duration of the decreasing phase is around 3 h. This means that after 3 h (the number of time-index is 36), \(C_\textit{eff}\) would be approximately near to 0.00. We update \(C_\textit{eff}\) in every steps where, \(C_\textit{eff} = max(0, C_\textit{eff})\). Thus, we ignore any negative values for \(C_\textit{eff}\).

Insulin information

We use crafted insulin information as one of the four inputs to the proposed RNN model. We consider only bolus information for further crafting. In this study, we find that basal insulin has no considerable effect on blood glucose prediction accuracy. There are two pathways of insulin absorption [43], slow and fast. Approximately 67% of delivered insulin passed through the slow channel with an average absorption rate of 0.011 \(min^{-1}\), whereas 33% of insulin passed through the fast one with an absorption rate of 0.021 \(min^{-1}\). Therefore, in this experiment, we use the weighted average of these two absorption rates, which is 0.014 \(min^{-1}\). For every 5 min interval, we denote it with \(R_\textit{insulin} = 5*0.014\). We calculate the effective insulin on the body \(I_\textit{eff}\) at any particular time-index \(t_s\) with the equation as follows:

$$\begin{aligned} I_\textit{eff}\left( t_s\right) = I_\textit{bolus}\ - \left( t_s-t_\textit{bolus}\right) R_\textit{insulin} \end{aligned}$$

Here, \(I_\textit{bolus}\) is the amount of insulin delivered in the form of bolus to the patient, and \(t_\textit{bolus}\) is the time-index when the most recent insulin delivered to the patient. We update \(I_\textit{eff}\) in every steps, where \(I_\textit{eff} = max(0, I_\textit{eff})\). This calculated effective insulin in the body \(I_\textit{eff}\) is used as one of the inputs for the RNN model. In the above equation, we do not consider insulin absorption delay.

Step count information

We compute a weighted average of the number of steps taken by the patient at time \(t_s\). To calculate \(S_{avg}\), we consider the previous 50 min (10 readings with 5 min of the interval) where the steps count for most recent time-index has a more significant weight. Note that the weight decreases gradually with time-index.

$$\begin{aligned} S_{avg}\left( t_s\right) = \frac{1}{n} \sum _{i=0}^{n-1} (n-i) \times steps\left( t_s - i\right) \end{aligned}$$

In the above equation, \(n=10\) as we only consider the previous 50 min or 10 data points. The computed \(S_{avg}\) is treated as one of the inputs. Theoretically, RNN can learn from the previous sequential data by its nature. However, we have observed from our extensive experiments that using additional weight for calculating the step information improves the prediction accuracy to a small extent but consistently for each of the six patients. This explicitly implied weight in the recent 50 min out of the 120 min step count history (24 data points), emphasize the effect of the recent walking pattern. Eventually, this emphasis facilitates the RNN model to predict BG level with marginally better accuracy.

Glucose level information from CGM readings

Either unprocessed or Kalman smoothed CGM readings are considered as an input channel out of the four inputs.

Kalman smoothing for preprocessing

In this section, we discuss Kalman filtering and Kalman smoothing briefly. The KS method outputs an interpolated time series of glucose estimates with mean and variance. It can automatically correct errors in the CGM readings where the estimated variance can be utilized for determining at which times the data are reliable. In our study, KS has been used as a pre-processing technique for sensor fault correction in the CGM reading. We use a modified implementation of KS for the OhioT1DM dataset, from the work [41].

The Kalman filter is a technique of estimating the current state of a dynamical system from the previous observations. In Kalman filtering, records of data are used for the calculation of the estimates. Thus, the Kalman filter is appropriate for real-time data processing. It is a forward algorithm where each step is computed analytically. The model and observation can be written as:

$$\begin{aligned} x_{k+1}= & {} \phi _{k}x_{k} + B_{k}u_{k} + w_k \end{aligned}$$
$$\begin{aligned} y_{k}= & {} H_{k}x_{k} + v_k \end{aligned}$$

Here, x, u, and y are the system internal state, input to the system, and measured output respectively. Whereas v is the process noise, and w is the measurement noise. These noise processes are assumed to be zero-mean Gaussian. \(\phi\) is the transition matrix, and H is the measurement matrix.

In the phase of time update, the Kalman filter computes the priori estimates, a state estimate \({\bar{x}}\) and state covariance matrix \({\bar{P}}\) [44].

$$\begin{aligned} {\bar{x}}_k= & {} \phi _{k-1}{\hat{x}}_{k-1} + B_{k-1}u_{k-1} \end{aligned}$$
$$\begin{aligned} {\bar{P}}_k= & {} \phi _{k-1}{\hat{P}}_{k-1}\phi ^T_{k-1} + Q_{k-1} \end{aligned}$$

Then the phase measure update is performed, where the posteriori estimate \({\hat{x}}\) and \({\hat{P}}\) are calculated. The equations are as follows [44]:

$$\begin{aligned} K_k= & {} {\bar{P}}_k H^T_k\left( H_k{\bar{P}}_k H^T_k + R_k\right) ^{-1} \end{aligned}$$
$$\begin{aligned} \hat{x}_k= & {} K_k\left( y_k - H_k \bar{x}_k\right) \end{aligned}$$
$$\begin{aligned} \hat{P}_k= & {} \left( I - K_k H_k\right) \bar{P}_k \end{aligned}$$

Kalman smoothing can be applied to get better estimates than Kalman filtering. However, it is required to have the whole dataset available at the time of performing Kalman smoothing. In our experiment, that is true. The Rauch–Tung–Striebel (RTS) algorithm [44] utilizes previous as well as the following data at the time k to generate the estimate. In RTS, there is one forward pass through the available data applying the Kalman filter to generate the priori, posteriori, and covariance matrices. These generated estimates and covariance are then treated as input to a subsequent backward pass. In this phase, RTS calculates the smoothed estimate \({\hat{x}}^s_k\) and \({\hat{P}}^s_k\).

$$\begin{aligned} C_k= & {} {\hat{P}}_k \phi _k {\bar{P}}^{-1}_{k+1} \end{aligned}$$
$$\begin{aligned} {\hat{x}}^s_k= & {} {\hat{x}}_k + C_k\left( {\hat{x}}^s_{k+1} - {\bar{x}}_{k+1}\right) \end{aligned}$$
$$\begin{aligned} {\hat{P}}^s_k= & {} {\hat{P}}_k + C_k\left( {\hat{P}}^s_{k+1} - {\bar{P}}_{k+1}\right) C^T_k \end{aligned}$$

We apply Kalman smoothing on CGM readings from the OhioT1DM dataset to get smoothed CGM values. In the OhioT1DM dataset, there are two separate individual files for each patient (Total 12 data files in the dataset). These two files contain training and test data, respectively. These training and test data sets for each patient are non-overlapping as they comprise patient information from different time windows (days). In the preprocessing phase, we apply Kalman smoothing on training and test data files independently for a particular patient to ensure no data/label leakage between the training and test data set. Consequently, we can effectively avoid situations where the preprocessing parameters are fitted with the knowledge of the test set. Original CGM readings (represented by red dots) and smoothed CGM values (represented by the blue line) are shown in Fig. 2. It is noticeable that, after applying Kalman smoothing, there are fewer abrupt changes or fluctuations in smoothed CGM values.

Fig. 1
figure 1

Architecture of the proposed BG prediction system

Fig. 2
figure 2

Kalman smoothed CGM values vs raw CGM readings for 12 h time window from the patient #563. Here red dots are raw CGM readings, where the blue line is the smoothed CGM values

Fig. 3
figure 3

Two layered stacked LSTM network

Fig. 4
figure 4

12-h of prediction results over PH = 30, for patient #570 (left) and #575 (right), are illustrated. Here red dots are CGM readings (unprocessed); those are ground truths. Where the blue line is the prediction curve, and the light blue region is the standard deviation

Fig. 5
figure 5

12-h period prediction over PH=30, for patient #563 with the model trained with unprocessed CGM readings (Left) and Kalman-smoothed CGM readings for sensor fault correction (Right) respectively


Our model uses a neural network to learn the prediction. A recurrent neural network is a feed-forward neural network that can model sequential data. It utilizes weight sharing between each element in the sequence over time. There are diverse variants of RNN. In the basic RNN variant termed as Vanilla RNN, the transition function is a linear transformation of the hidden state vector h and the input vector x, followed by an activation function for non-linearity.

$$\begin{aligned} h_t = \tanh {\left( W\left[ h_{t-1} , x_t \right] + b\right) } \end{aligned}$$

Where W is weight matrix, b is a bias vector, and \(\tanh\) is the activation function. Classical RNN has the form of a chain of repeating modules of neural networks with straightforward architecture. Theoretically, RNN can learn long term dependency. However, practically it suffers from vanishing gradient problem and exploding gradient problem as a result of long term dependency. This dependency makes RNN less useful and more challenging to train [45]. Hochreiter and Schmidhube [22] proposed Long short-term memory (LSTM) for addressing this issue. The LSTM network mitigates this long-term dependency problem to some extent by utilizing the concept of memory, the gate structure, and constant error carousel. In the case of our study, LSTM is more suitable to model blood glucose levels as there are dependencies upon immediate previous entries in sequential diabetes patient data. Consequently, we prefer the LSTM based RNN model over the other model architectures to make the BG prediction more rigorous.

In our work, we build an LSTM network containing 128 element hidden vector.

We add a dropout layer after the first LSTM layer. Dropout is intended to reduce overfitting and improve the generalization of the model [46]. The last layer of the LSTM outputs a vector \(h_i\), fed as the input of a fully connected multi-layer network. This network consists of three layers, including two dense layers and one output layer. These dense layers contain 512 neurons, 128 neurons, and the output layer contains a single neuron, respectively. There is an activation function for each layer. We choose the rectified linear unit (ReLU) activation function for the first two dense layers and the exponential activation function for the output layer.

The input of the model is a multi-dimensional sequence of preprocessed BG level from CGM reading, carbohydrates amount from the meal, carbohydrates amount from the bolus, and step count related data. The output of the model is a prediction regarding BG level withing a prediction horizon. We experiment with the prediction horizon of 30 and 60 min.

Our proposed model adopts the negative log-likelihood (NLL) loss function. We use Adaptive Moment Estimation (Adam) as the optimizer.

We study the correlation between BG prediction accuracy and depth of the model architecture. We see in previous work that the deep recurrent neural networks provide empirical superiority over shallow networks [47]. The shallow network cannot precisely model the information with a temporal hierarchy [48]. However, the concept of depth in an RNN is not the same as it is in feedforward neural networks [49]. To make the RNN model deeper, we employ the stacking technique. Our proposed model consists of two LSTM layers. The first LSTM layer provides a sequence output that is fed as one input to the LSTM layer above. Both LSTM layers have the same internal architecture described earlier. Figure 3 illustrates the architecture of two Layered Stacked LSTM.

We also experimented with the GRU cell instead of the LSTM cell. However, in this application, the network with the LSTM cell outperforms the network with GRU cells. A comparison between the performance of these two types of RNN is shown in “Results” section. Thus, we choose the LSTM cell for our final RNN model.

We also consider the practicality of the self-attention based model for BG prediction. In recent years, models with attention mechanisms such as transformer [50] outperform naive RNN models, especially in machine translation and natural language processing tasks [51]. However, Mirshekarian et al. [52] show that a neural attention module can improve the BG prediction performance improvement for only synthetic data, whereas the authors do not find any prediction performance improvement for real data. As we focus on improving BG prediction accuracy for only real data, we do not consider using the self-attention mechanism in this study further.

Experimental setup

We conduct extensive experiments on data and our proposed model to tune hyperparameters and determine the optimal setup. We measure the efficacy of our proposed method to predict BG values for two different setups. In the first setup, we use raw CGM readings, whereas, in the second setup, we use Kalman smoothed CGM values for sensor error correction.


We experiment with two different RNN cell types—LSTM and GRU cells, respectively. The dynamics of the glucoregulatory system are considered as nonlinear. Hence, learning blood glucose dynamics is a complicated task where consideration of previous data is very crucial for effective prediction mechanisms. As a consequence, we skip experimenting with Vanilla RNN cells because of its long term dependency problem [53]. Our experimental result demonstrates that the LSTM network achieves better prediction accuracy than a network with GRU cells. We train our model with the six patients’ training data from the OhioT1DM dataset. We experiment with 30, 60, 120, and 240 min of data history as the input for the proposed model. Those contain 6, 12, 24, and 48 training examples respectively as CGM readings are taken with every 5 min of interval. We also investigate the effect of the number of LSTM state hidden units and fully connected layers in the network. Different combinations of LSTM state size-of 64, 128, and 248 with two fully connected layers are tested. We find that the network model with 128 LSTM states with two fully connected layers outperforms other configurations. A learning rate of \(10^{-3}\) was used for training with a maximum number of 6000 epochs. However, an early-stopping patience threshold of 128 epochs is employed for better convergence.

Training, testing, and validation

We use the T1DM dataset for training, validation, and testing purposes. This dataset contains 12 files for six data contributors. For each person, there are two files for training and testing data, respectively. We split the data in the training file into the training and validation dataset. The entire dataset in the testing file is used for testing purposes for each subject. We partition the training data file such that 80% of the data is used for training, and the rest of the data is used for validation. The purpose of the validation data is to provide an evaluation of the model after hyperparameter tuning.

In the training phase, the prediction from the model is used to determine the subsequent prediction curve at each epoch. It is possible to evaluate the model at a certain point in time if there are at least 24 prior data points are available (prior data points for 2 h or 120 min). This threshold is to assure that there are enough data points available for the model to make a feasible prediction. For the training process, we set the batch size to 128. However, we try with larger batch sizes than 128 and found that it makes the prediction accuracy worse in terms of RMSE. Finally, we train the two instances of the model separately with raw CGM data, and Kalman smoothed CGM data, respectively, along with other features including carbohydrates from the meal, bolus insulin, and cumulative step counts in a fixed time interval. In this study, we consider the previous 50 min of history for calculating the cumulative step numbers.

Development environment

We have implemented our proposed model with Python 2.7, whereas Tensorflow-gpu v1.8.0 has been used as a machine learning framework. For utilizing the GPU’s computing capability, we use CUDA 9.0 and cuDNN v7.3.0. The other libraries we include in this project are Keras (built-in with TensorFlow), SciPy, Pandas, NumPy, Matplotlib, and Seaborn. Moreover, We implement the preprocessing module (Kalman smoothing) in Matlab.

Evaluation criteria and results

In this section, we discuss the BG prediction result of our proposed deep RNN model for the T1DM-testing dataset. The performance of the model is compared based on the accuracy over the 30 and 60 min prediction horizon. The hyperparameters of the model are tuned extensively for optimal results.

Evaluation criteria

Initially, we investigate the usefulness of preprocessing the CGM values. We use Kalman smoothing for processing the CGM readings. We compare the deviation of raw CGM readings and preprocessed CGM values from the ground truth reference value. In our analysis, we assume the BG level measured by fingerstick as ground-truth. The glucose readings from the sensor sampling in interstitial fluid are substantially different from blood glucose values measured at the same time [39]. As a result, CGM manufacturers suggested that patients should use capillary blood glucose measurements before any treatment decisions. Moreover, the self-measurement of blood glucose (SMBG), has been used as a reference for different CGM systems accuracy comparison [54]. Fingerstick testing is one of the most convenient SMBG methods. These are the principal reasons behind our assumption of choosing the fingerstick reading as the ground truth for comparison between raw CGM value and preprocessed CGM values. We used root-mean-square error (RMSE) to calculate the difference between CGM values (raw CGM, preprocessed CGM) and fingerstick reference BG values at a particular time.

$$\begin{aligned} \textit{RMSE} = \sqrt{ \frac{1}{N} \sum _{i=1}^{N}\left( y_i - x_i\right) ^2} \end{aligned}$$

Here, y denotes the CGM values (either raw CGM values or smoothed CGM values), and x denotes the fingerstick reference BG values.

We find that preprocessed CGM values are much closer to fingerstick BG reading than raw CGM values. Table 3 illustrates the accuracy comparison, in terms of RMSE, between raw CGM and Kalman smoothed CGM values with respect to fingerstick BG readings. Table 3 shows that both of the CGM readings (measured from interstitial fluid) and pre-processed CGM values differ from finger stick reading (measure from blood glucose). However, most importantly, processed CGM values have a lower error than raw CGM values. Thus, we conclude that the model trained with Kalman smoothed CGM values along with other features, is more effective in forecasting the BG level.

Table 3 Comparisons of the accuracy (\(\textit{RMSE}\)) of raw CGM values and Kalman Smoothed CGM values with respect to self-measured fingerstick BG readings (ground truth) for each patient

However, the performance of a model can be evaluated with different criteria. Among those, the root-mean-square error (RMSE) between the reference CGM values and predicted BG level, is one of the most widely adopted methods to assess the BG prediction accuracy. Thus, we evaluate the performance of our model in terms of RMSE in this paper.

$$\begin{aligned} \textit{RMSE} = \sqrt{ \frac{1}{N} \sum _{i=1}^{N}\left( \hat{y}(i|i-PH)^2 - y(i)\right) ^2} \end{aligned}$$

where \({\hat{y}(i|i - PH)}\) denotes the model’s prediction results provided the previous data and y denotes the reference CGM reading, N is the number of data points.

Results for the dataset with unprocessed/raw CGM readings

In this section, we discuss the BG forecasting accuracy for the proposed stacked RNN model for the OhioT1DM dataset, where CGM readings are raw/unprocessed. Initially, we investigate the effect of the depth of LSTM layers of the network on the prediction accuracy of the model. In our experiment, we consider the prediction horizon of 30 and 60 min. The experimental results for models with single LSTM layers and stacked LSTM layers are summarized in Table 4. Results from the table demonstrate that the RNN model with stacked LSTM layered architecture performs better than RNN with a single LSTM layer for all of the cases.

Table 4 The table represents the prediction comparison of proposed models over 30 and 60 min of the prediction horizon

The proposed model’s (Stacked LSTM) predictive results for patient #570 and #575 over PH=30 min are illustrated in Fig. 4. The proposed model provides the lowest and the highest RMSE result for patient #570 and patient #575 respectively.

We also experiment with single GRU cell-based RNN for BG level prediction. However, both of the models with single and stacked LSTM cells provide better RMSE than the model with the GRU cell. The mean RMSE for the six patients’ BG prediction are 20.07 and 31.12 for the PH=30 and 60 min respectively.

Results for the dataset with Kalman smoothed CGM readings

In this section, the proposed model’s predictive accuracy with stacked LSTM layers is evaluated for the preprocessed testing dataset with Kalman smoothing technique described in “Kalman smoothing for preprocessing” section. Note that, In our study, only CGM values are preprocessed, and the rest of the features remain the same. Then we estimate the RMSE of the model’s prediction individually with preprocessed CGM values and raw CGM values, respectively, from the testing dataset as our goal is to lower the difference between the predicted CGM values and the real fingerstick blood glucose readings. The preprocessing techniques are described in “Feature Extraction from Physiological Information” section. Table 5 presents the forecasting accuracy (RMSE) of the model trained with the preprocessed (CGM values) dataset using the Kalman smoothing technique and original (raw CGM values) dataset.

Table 5 Accuracy (in terms of RMSE) of the final RNN model with stacked LSTM layers

Figure 5 demonstrates that preprocessing the CGM reading with Kalman smoothing, improves the prediction accuracy to a substantial extent.

Analysis and result comparison

Initially, we compare our work with different machine learning models for the OhiT1DM dataset. However, Xie et al. provide the BG prediction performance comparison of several traditional machine learning approaches for the OhiT1DM dataset in their work [55]. We take their experiment result for comparing our work. The results in Table 6 show that LSTM model outperforms the conventional machine learning model. Our proposed approach is more generalized as the prediction RMSE for all six patients is uniformly improved. As a consequence, we do not further experiment with traditional machine learning approaches.

Table 6 Performance comparisons of our work with traditional machine learning approaches, in terms of RMSE

From Table 4, it is evident that, with a wider prediction horizon, the forecasting model becomes more complicated. However, we observe that deep approaches with stacked LSTM layers provide advantages over the shallow model with a single LSTM layer in forecasting BG level, particularly for a higher prediction horizon. As a result, we choose the deeper model for the final experiment.

From Table 5, we can observe that prediction RMSE for each patient ranges from 15.94 to 20.94 and 4.73 to 8.54 for the model trained with raw CGM readings and processed CGM readings, respectively. The predictions for patients #575 and #591 are comparatively more inaccurate than other patients, whereas we achieved better RMSE for patient #570 and #588. There might be several reasons for the RMSE difference. The first reason is that #575 and #591 have a larger number of missing data. On the other hand, #570 has the least missing data among all of the patients’ test dataset. Another reason is that the fluctuation of CGM readings in the test dataset. It is noticeable that in the test dataset, #575 and #591 have substantially more fluctuation than #570 and #588. More specifically, the last portion of #575 and the first portion of #591 have more abrupt swings. Shown in Fig. 4, 12-h period BG level of #570 contains less fluctuation than #575. It is noteworthy that the prediction error is highest around the spikes and turning regions of the CGM trajectory. Furthermore, there is a marginal time delay in the prediction curve. This delay is responsible for the prediction error.

Such abrupt fluctuations in training and testing dataset make it difficult for the model to learn and predict the BG level accurately. One of the possible reasons behind such fluctuation is the complicated dynamic of the glucoregulatory system. Another possible reason is the sensor fault of the CGM system. We mitigate such undesirable errors by applying the Kalman smoothing technique on the dataset that makes CGM values less likely to have abrupt fluctuations. Subsequently, it boosts the learning capability of the model resulting in significantly better prediction accuracy.

Figure 5 illustrates how error correction of CGM reading, enhances the precision of the prediction curve for a particular 12-h time window for patient #563. Here we use patient #563 as the dataset for this patient has average fluctuation. Note that, the SD (light blue region) is remarkably less for the prediction made with the model trained with processed data. Hence, the model trained with the error corrected dataset (Fig. 5—Right), is capable of forecasting BG with more confidence than the model trained with the unprocessed dataset (Fig. 5—Left). Moreover, we notice an improvement in the time delay in the proposed model.

We provide a performance comparison of our work with related works for the OhioT1DM dataset in Table 7. For comparison, we consider previous works [24, 25, 56] those uses the OhioT1DM dataset for result evaluation. Table 7 demonstrates that our proposed model with KS provides the best accuracy for every patient among all other related works. Even without utilizing KS, we achieved the topmost accuracy for the four patients out of six. For patient #563 and #570, the work [24] has slightly better accuracy than ours. [25] and [24] use LSTM and dilated RNN for BG prediction respectively whereas [56] uses a convolutional neural network (CNN) trained with the OhioT1DM dataset for the BG level prediction. To the best of our knowledge, we achieved the leading average prediction accuracy for the OhioT1DM dataset.

Table 7 Prediction accuracy comparison, in terms of RMSE, between the proposed model and models from related works for the OhioT1DM dataset (PH = 30 min)


Our proposed method provides a notably precise prediction mechanism for the typical blood glucose range (\(<70\,\hbox{mg/dl}\) and \(> 180\,\hbox{mg/dl}\)). However, our elaborated experimental results suggest that prediction accuracy decreases slightly when blood glucose level values (Ground Truth) lie outside the ideal range. The primary reason for this issue might be the smoothing process. With the KS preprocessing technique, the spikes and extreme CGM readings get smoothed and become less sharp. The smoothed values are imperceptibly biased to the average BG value. Those smoothed values tend to lie in the normal range. After preprocessing, these smoothed values are used in training steps. It reduces the learning capability of the model marginally, particularly for extremely high or low BG values. Further study might be done to improve the prediction accuracy for extremely high or low blood glucose range.

Another point is that there are some missing data in the OhioT1Dm dataset. For the patient, 575 and 591 have 1309 and 782 missing data points, each. Continuous missing data has been observed between Dec 26 to Dec 28 and Dec 26 to Jan 5, respectively. Whereas patient 570 has only 649 missing data. However, these missing data might have an adverse effect on prediction. Moreover, the device type used for acquiring the patient’s data is fixed; that is the Medtronic Enlite CGM sensor. The slow sampling rate also limits the size of the dataset. In our future work, we would also consider a faster sampling rate for patient data acquisition. Also, the patient number and racial diversity are limited in the dataset. We have the plan to collect more patient-data with several devices from the broad and racially diverse patient-base and use those data to explore more generalized approaches for BG forecasting. Additionally, we plan to work on glycemic event detection, especially on hypoglycemic event detection. Besides, there might be scope for further experiments in the future to assess the potentiality of using the self-attention mechanism for BG prediction. Furthermore, the precise glucose level prediction might help to address different types of attacks on wireless insulin pumps [57,58,59,60].


This work investigated methods for more accurate blood glucose prediction. We demonstrated that preprocessing the CGM readings with Kalman smoothing for sensor error correction could be useful for improving the robustness of the BG prediction. We utilized different physiological information such as meal, insulin, aggregations of step count, and preprocessed CGM data in our method. We proposed a novel approach leveraging the stacked LSTM based deep RNN model to improve the BG prediction accuracy in this paper. It is evident from our study that preprocessing the CGM values with Kaman smoothing makes the BG prediction curve less uncertain and less fluctuating. Our proposed approach provides more reliable predictions than traditional methods while we assumed fingerstick BG readings as the ground truth in our experiment. We want to lower the difference between the predicted CGM values and the real fingerstick blood glucose readings. The BG prediction with Kalman smoothed CGM data is closer to the actual BG level (The Fingerstick BG reading) than without the Kalman filter. This more accurate prediction can aid diabetes patients to avoid adverse glycemic events. Our proposed methodologies could be employed to get insight into T1D patient’s future BG level trends that might result in a more dependable diabetes management system.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author [40] on request at the link



Adaptive moment estimation


Artificial neural networks


Autoregressive integrated moving average


Artificial pancreas


Blood glucose


Continuous glucose monitoring


Continuous subcutaneous insulin infusion


Convolutional neural network




Early warning system


Gated recurrent unit


Galvanic skin response


Interstitial glucose


Kalman smoothing


Long short-term memory


Negative log-likelihood


Pressure-induced sensor attenuation


Rectified linear unit


Root-mean-square error


Recurrent neural networks




prediction horizon


Self-measurement of blood glucose


Support vector regression


  1. Group N.D.D, et al. Classification and diagnosis of diabetes mellitus and other categories of glucose intolerance. Diabetes. 1979;28(12):1039–57.

    Article  Google Scholar 

  2. Association AD, et al. 2. Classification and diagnosis of diabetes. Diabetes Care. 2017;40(Supplement 1):11–24.

    Article  Google Scholar 

  3. Cryer PE, Davis SN, Shamoon H. Hypoglycemia in diabetes. Diabetes care. 2003;26(6):1902–12.

    Article  CAS  Google Scholar 

  4. Kotagal M, Symons RG, Hirsch IB, Umpierrez GE, Dellinger EP, Farrokhi ET, Flum DR. Perioperative hyperglycemia and risk of adverse events among patients with and without diabetes. Ann Surg. 2015;261(1):97.

    Article  Google Scholar 

  5. Facchinetti A, Favero S, Sparacino G, Cobelli C. An online failure detection method of the glucose sensor-insulin pump system: improved overnight safety of type-1 diabetic subjects. IEEE Trans Biomed Eng. 2012;60(2):406–16.

    Article  Google Scholar 

  6. Zavitsanou S, Mantalaris A, Georgiadis MC, Pistikopoulos EN. In silicoclosed-loop control validation studies for optimal insulin delivery in type 1 diabetes. IEEE Trans Biomed Eng. 2015;62(10):2369–78.

    Article  Google Scholar 

  7. Association AD, et al. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2010;33(Supplement 1):62–9.

    Article  Google Scholar 

  8. Vettoretti M, Facchinetti A, Sparacino G, Cobelli C. Type-1 diabetes patient decision simulator for in silico testing safety and effectiveness of insulin treatments. IEEE Trans Biomed Eng. 2017;65(6):1281–90.

    Article  Google Scholar 

  9. Bergenstal RM, Tamborlane WV, Ahmann A, Buse JB, Dailey G, Davis SN, Joyce C, Peoples T, Perkins BA, Welsh JB, et al. Effectiveness of sensor-augmented insulin-pump therapy in type 1 diabetes. N Engl J Med. 2010;363(4):311–20.

    Article  CAS  Google Scholar 

  10. Cobelli C, Renard E, Kovatchev B. Artificial pancreas: past, present, future. Diabetes. 2011;60(11):2672–82.

    Article  CAS  Google Scholar 

  11. Yang J, Li L, Shi Y, Xie X. An arima model with adaptive orders for predicting blood glucose concentrations and hypoglycemia. IEEE J Biomed Health Inform. 2018;23(3):1251–60.

    Article  Google Scholar 

  12. Georga EI, Protopappas VC, Ardigò D, Marina M, Zavaroni I, Polyzos D, Fotiadis DI. Multivariate prediction of subcutaneous glucose concentration in type 1 diabetes patients based on support vector regression. IEEE J Biomed Health Inform. 2012;17(1):71–81.

    Article  Google Scholar 

  13. Karegowda AG, Jayaram M, Manjunath A. Cascading k-means clustering and k-nearest neighbor classifier for categorization of diabetic patients. Int J Eng Adv Technol. 2012;1(3):147–51.

    Google Scholar 

  14. Bunescu R, Struble N, Marling C, Shubrook J, Schwartz F. Blood glucose level prediction using physiological models and support vector regression. In: 2013 12th International conference on machine learning and applications. IEEE. 2013;1:135–40.

  15. Georga EI, Protopappas VC, Fotiadis DI. Glucose prediction in type 1 and type 2 diabetic patients using data driven techniques. In: Knowledge-oriented applications in data mining; 2011. p. 277–96.

  16. Pérez-Gandía C, Facchinetti A, Sparacino G, Cobelli C, Gómez E, Rigla M, de Leiva A, Hernando M. Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technol Therap. 2010;12(1):81–8.

    Article  Google Scholar 

  17. Zecchin C, Facchinetti A, Sparacino G, De Nicolao G, Cobelli C. Neural network incorporating meal information improves accuracy of short-time prediction of glucose concentration. IEEE Trans Biomed Eng. 2012;59(6):1550–60.

    Article  Google Scholar 

  18. Pappada SM, Cameron BD, Rosman PM, Bourey RE, Papadimos TJ, Olorunto W, Borst MJ. Neural network-based real-time prediction of glucose in patients with insulin-dependent diabetes. Diabetes Technol Therap. 2011;13(2):135–41.

    Article  CAS  Google Scholar 

  19. Tresp V, Briegel T, Moody J. Neural-network models for the blood glucose metabolism of a diabetic. IEEE Trans Neural Netw. 1999;10(5):1204–13.

    Article  CAS  Google Scholar 

  20. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.

    Article  Google Scholar 

  21. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics; 2010. p. 249–56.

  22. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

    Article  CAS  Google Scholar 

  23. Li K, Daniels J, Liu C, Herrero-Vinas P, Georgiou P. Convolutional recurrent neural networks for glucose prediction. IEEE J Biomed Health Inform. 2019

  24. Chen J, Li K, Herrero P, Zhu T, Georgiou P. Dilated recurrent neural network for short-time prediction of glucose concentration. In: KHD@ IJCAI; 2018. p. 69–73.

  25. Martinsson J, Schliep A, Eliasson B, Meijner C, Persson S, Mogren O. Automatic blood glucose prediction with confidence using recurrent neural networks. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ IJCAI-ECAI 2018, 13 July 2018; 2018. p. 64–8.

  26. Allam F, Nossai Z, Gomma H, Ibrahim I, Abdelsalam M. A recurrent neural network approach for predicting glucose concentration in type-1 diabetic patients. In: Lopez-Franco C, Alanis AY, Arana-Daniel N, editors. Engineering applications of neural networks. Berlin: Springer; 2011. p. 254–9.

    Chapter  Google Scholar 

  27. Fox I, Ang L, Jaiswal M, Pop-Busui R, Wiens J. Deep multi-output forecasting: learning to accurately predict blood glucose trajectories. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. ACM; 2018. p. 1387–95

  28. Sun Q, Jankovic MV, Bally L, Mougiakakou SG. Predicting blood glucose with an lstm and bi-lstm based deep neural network. In: 2018 14th Symposium on neural networks and applications (NEUREL). IEEE; 2018. p. 1–5.

  29. Cameron F, Niemeyer G, Gundy-Burlet K, Buckingham B. Statistical hypoglycemia prediction. J Diabetes Sci Technol. 2008;2(4):612–21.

    Article  Google Scholar 

  30. Bayrak ES, Turksoy K, Cinar A, Quinn L, Littlejohn E, Rollins D. Hypoglycemia early alarm systems based on recursive autoregressive partial least squares models. J Diabetes Sci Technol. 2013;7(1):206–14.

    Article  Google Scholar 

  31. Turksoy K, Bayrak ES, Quinn L, Littlejohn E, Rollins D, Cinar A. Hypoglycemia early alarm systems based on multivariable models. Ind Eng Chem Res. 2013;52(35):12329–36.

    Article  CAS  Google Scholar 

  32. Daskalaki E, Nørgaard K, Züger T, Prountzou A, Diem P, Mougiakakou S. An early warning system for hypoglycemic/hyperglycemic events based on fusion of adaptive prediction models. J Diabetes Sci Technol. 2013;7(3):689–98.

    Article  Google Scholar 

  33. Eljil KS, Qadah G, Pasquier M. Predicting hypoglycemia in diabetic patients using time-sensitive artificial neural networks. Int J Healthc Inf Syst Inform (IJHISI). 2016;11(4):70–88.

    Article  Google Scholar 

  34. Bertachi A, Biagi L, Contreras I, Luo N, Vehí J. Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks. In: KHD@ IJCAI; 2018. p. 85–90.

  35. Nguyen LL, Su S, Nguyen HT. Neural network approach for non-invasive detection of hyperglycemia using electrocardiographic signals. In: 2014 36th Annual international conference of the IEEE engineering in medicine and biology society. IEEE; 2014. p. 4475–8.

  36. Facchinetti A, Del Favero S, Sparacino G, Cobelli C. Modeling transient disconnections and compression artifacts of continuous glucose sensors. Diabetes Technol Therap. 2016;18(4):264–72.

    Article  CAS  Google Scholar 

  37. Mahmoudi Z, Boiroux D, Hagdrup M, Nørgaard K, Poulsen NK, Madsen H, Jørgensen JB. Application of the continuous-discrete extended kalman filter for fault detection in continuous glucose monitors for type 1 diabetes. In: 2016 European control conference (ECC). IEEE; 2016. p. 714–9.

  38. medtronic diabetes: why sensor glucose does not equal blood glucose. Accessed 28 April 2014.

  39. Kulcu E, Tamada JA, Reach G, Potts RO, Lesho MJ. Physiological differences between interstitial glucose and blood glucose measured in human subjects. Diabetes Care. 2003;26(8):2405–9.

    Article  CAS  Google Scholar 

  40. Marling C, Bunescu RC. The ohiot1dm dataset for blood glucose level prediction. In: KHD@ IJCAI; 2018. p. 60–3.

  41. Staal OM, Sælid S, Fougner A, Stavdahl Ø. Kalman smoothing for objective and automatic preprocessing of glucose data. IEEE J Biomed Health Inform. 2018;23(1):218–26.

    Article  Google Scholar 

  42. Kraegen E, Chisholm D, McNamara ME. Timing of insulin delivery with meals. Horm Metab Res. 1981;13(07):365–7.

    Article  CAS  Google Scholar 

  43. Wilinska ME, Chassin LJ, Schaller HC, Schaupp L, Pieber TR, Hovorka R. Insulin kinetics in type-1 diabetes: continuous and bolus delivery of rapid acting insulin. IEEE Trans Biomed Eng. 2004;52(1):3–12.

    Article  Google Scholar 

  44. Rauch HE, Tung F, Striebel CT. Maximum likelihood estimates of linear dynamic systems. AIAA J. 1965;3(8):1445–50.

    Article  Google Scholar 

  45. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertainty Fuzziness Knowl Syst. 1998;6(02):107–16.

    Article  Google Scholar 

  46. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.

    Google Scholar 

  47. Graves A, Mohamed A-r, Hinton G. Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE; 2013. p. 6645–9.

  48. Hermans M, Schrauwen B. Training and analysing deep recurrent neural networks. In: Advances in neural information processing systems; 2013. p. 190–8.

  49. Pascanu R, Gulcehre C, Cho K, Bengio Y. How to construct deep recurrent neural networks 2013; arXiv preprint arXiv:1312.6026

  50. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Advances in neural information processing systems; 2017. p. 5998–6008.

  51. Zobaed S, Haque ME, Rabby MF, Salehi MA. Senspick: sense picking for word sense disambiguation; 2021. arXiv preprint arXiv:2102.05260

  52. Mirshekarian S, Shen H, Bunescu R, Marling C. Lstms and neural attention models for blood glucose prediction: comparative experiments on real and synthetic data. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE; 2019. p. 706–12.

  53. Bengio Y, Simard P, Frasconi P, et al. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw. 1994;5(2):157–66.

    Article  CAS  Google Scholar 

  54. Luijf YM, Mader JK, Doll W, Pieber T, Farret A, Place J, Renard E, Bruttomesso D, Filippi A, Avogaro A, et al. Accuracy and reliability of continuous glucose monitoring systems: a head-to-head comparison. Diabetes Technol Therap. 2013;15(8):721–6.

    Article  Google Scholar 

  55. Xie J, Wang Q. Benchmark machine learning approaches with classical time series approaches on the blood glucose level prediction challenge. In: KHD@ IJCAI; 2018. p. 97–102.

  56. Zhu T, Li K, Herrero P, Chen J, Georgiou P. A deep learning algorithm for personalized blood glucose prediction. In: KHD@ IJCAI; 2018. p. 64–78.

  57. Hei X, Du X, Lin S, Lee I, Sokolsky O. Patient infusion pattern based access control schemes for wireless insulin pump system. IEEE Trans Parallel Distrib Syst. 2014;26(11):3108–21.

    Article  Google Scholar 

  58. Hei X, Du X, Lin S, Lee I. PIPAC: Patient infusion pattern based access control scheme for wireless insulin pump system. In: 2013 Proceedings IEEE INFOCOM 2013 Apr 14. IEEE. p. 3030–8.

  59. Hao B, Hei X, Tu Y, Du X, Wu J. Voiceprint-based access control for wireless insulin pump systems. In: 2018 IEEE 15th international conference on mobile ad hoc and sensor systems (MASS) 2018 Oct 9. IEEE. p. 245–53.

  60. Zhao J, Kong K, Hei X, Tu Y, Du X. A visible light channel based access control scheme for wireless insulin pump systems. In: 2018 IEEE international conference on communications (ICC) 2018 May 20. IEEE. p. 1–6.

Download references


This work is supported in part by US NSF under grants CNS-1812553, OIA-1946231. We uploaded the initial and unrevised version of this work on arXiv at the link:


Publication cost would be funded in part by the US NSF under Grants CNS-1812553, OIA-1946231.

Author information

Authors and Affiliations



MFR, IL, ASM, XH developed the study concept and designed model. YT collected OhioT1DM dataset. YT, MFR and XH conceptualize and implemented the feature extraction methodologies. MIH implemented the Kalman-smoother for the dataset. MFR and ASM designed and developed the RNN model for BG forecasting. YT and MIH carried out the experiments and analyzed results. MFR, YT and XH drafted the manuscript. All authors were involved in the revision of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Md Fazle Rabby.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rabby, M.F., Tu, Y., Hossen, M.I. et al. Stacked LSTM based deep recurrent neural network with kalman smoothing for blood glucose prediction. BMC Med Inform Decis Mak 21, 101 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: