Time series model for forecasting the number of new admission inpatients

Background Hospital crowding is a growing problem, and effective prediction and detection management can help to reduce it. Our team previously proposed a hybrid model combining the autoregressive integrated moving average (ARIMA) and the nonlinear autoregressive neural network (NARNN) models in studies forecasting schistosomiasis and hand, foot, and mouth disease. In this paper, we explore the application of the hybrid ARIMA-NARNN model to tracking trends in new admission inpatients, which provides a methodological basis for reducing crowding. Methods We used the single seasonal ARIMA (SARIMA) model, the NARNN model and the hybrid SARIMA-NARNN model to fit and forecast the monthly and daily numbers of new admission inpatients. The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to compare the forecasting performance of the three models. The monthly modeling data set covered January 2010 to June 2016, with July to October 2016 as the corresponding testing data set. The daily modeling data set covered January 4 to September 4, 2016, and the testing data set covered September 5 to October 2, 2016. Results For the monthly data, the modeling RMSE and the testing RMSE, MAE and MAPE of the SARIMA-NARNN model were lower than those of the single SARIMA or NARNN model, but the modeling MAE and MAPE of the SARIMA-NARNN model did not improve. For the daily data, the RMSE, MAE and MAPE of the NARNN model were the lowest in both the modeling and testing stages. Conclusions A hybrid model does not necessarily outperform its constituent models. It is worth exploring which model is reliable for forecasting the number of new admission inpatients from different data. Electronic supplementary material The online version of this article (10.1186/s12911-018-0616-8) contains supplementary material, which is available to authorized users.


Background
With an increasing global population and economy, the demand for healthcare continues to rise. Hospital crowding has become a major problem for large hospitals. Adverse events in hospitals increase with crowding, which further affects patient satisfaction, quality of nursing, treatment, wait time, and length of stay [1][2][3][4]. A vast literature on overcrowding focuses on outpatient wards [1,5] and emergency departments [4,6]. Overcrowding in inpatient wards also deserves attention. Overcrowding occurs when no inpatient beds are available to admit new inpatients. Often, inpatient beds are scarce because too many patients with non-urgent medical conditions seek healthcare.
The prediction of admissions is one piece of a larger equation that uses hospital census, patient acuity, disease burden, allocation of resources and general management to improve hospital performance and patient outcomes. Much of the research on hospital management focuses on predicting emergency demand [7][8][9][10], forecasting outpatient visits [11,12], inpatient discharges [13], and patient volume [14]. However, little published research is available on predicting the number of new admission inpatients. Monitoring and forecasting new admission inpatients are important for making feasible decisions in hospital resource management, reducing crowding, and improving the quality of medical care delivered.
Time series forecasting approaches have been adopted in other research fields, such as infectious disease [15][16][17][18], power and energy [19], finance and economy [20,21], traffic [22], environment [23], and hydrology [24]. Among these approaches, the autoregressive integrated moving average (ARIMA) model is suited to linear time series forecasting, because its predictions of future values are constrained to be linear functions of past observations. However, the prediction accuracy of the ARIMA model is limited by its inability to capture the nonlinear relationships found in real-world time series. For nonlinear problems, the artificial neural network (ANN) offers enhanced forecasting accuracy because it can approximate arbitrary nonlinear functions [25]. More recently, hybrid forecasting models that combine the ARIMA and ANN models to handle both the linear and nonlinear relationships in time series data have been applied extensively in many fields with high predictive performance [16,17,19,21,26,27,28]. These previous studies suggest that the number of new admission inpatients, as a time series, could also be predicted by hybrid models.
Our team has successfully applied the hybrid model combining ARIMA and the nonlinear autoregressive neural network (NARNN) to the field of infectious diseases, for example in forecasting the prevalence of schistosomiasis in humans in Qianjiang City and Yangxin City, China [17,28], and the incident cases of hand, foot, and mouth disease in Shenzhen, China [29]. Wu [16] also verified the feasibility of a hybrid ARIMA-NARNN model in forecasting the incidence of hemorrhagic fever with renal syndrome in Jiangsu Province, China. These studies indicate that combining the ARIMA and NARNN models could improve forecasting performance by incorporating both the linear and nonlinear patterns found in the real world.
In this paper, we will explore whether the ARIMA-NARNN hybrid model is reliable for forecasting the number of new admission inpatients to a large hospital. Our aim is to forecast the monthly and daily new admission inpatients using time series models. This will enable hospitals to provide more efficient and better quality care to their patients.

Data sources
Our hospital, one of the first batch of public tertiary hospitals in Chongqing, China, is a large-scale comprehensive medical institution engaged in medical care, education and scientific research. The hospital currently has 2628 inpatient beds and handles almost 2,000,000 outpatients, 100,000 emergency admissions and 100,000 discharges a year. Like most other tertiary hospitals in China, we face the growing challenge of overcrowding. Between 2010 and 2015, the numbers of outpatient-emergency patients, new admissions and surgeries increased by 96.75%, 37.59% and 37.13%, respectively. Although the largest increase was observed in the number of outpatient-emergency patients, the allocation of hospital resources is also greatly affected by admitted patients. Therefore, we chose to focus on new admissions in this study.
To analyze the "day of the week" effect and the "month of the year" effect of new admission inpatients, we included data from two different time series: monthly data from January 2010 to October 2016 (82 months) and daily data from January 4 to October 2, 2016 (273 days). The data were obtained from the Hospital Information System (Additional file 1). The study was approved by the ethics committee of Daping Hospital of Third Military Medical University.

The SARIMA model construction
Taking into account the seasonal fluctuation of new admission inpatients, we constructed a seasonal ARIMA (SARIMA) model. The SARIMA (p, d, q)(P, D, Q)s model is developed from the ARIMA model [15]. The SARIMA model has seven main parameters: the orders of the autoregressive (p) and seasonal autoregressive (P) terms, the orders of regular differencing (d) and seasonal differencing (D), the orders of the moving average (q) and seasonal moving average (Q) terms, and the length of the seasonal period (s). Stationarity is a necessary condition for building a SARIMA model, and differencing is often used to make time series data stationary. The main methods for checking the stationarity of a time series include the sequence trend diagram, the autocorrelation function (ACF), the partial autocorrelation function (PACF), the augmented Dickey-Fuller (ADF) unit root test, the Phillips-Perron (PP) test, and nonparametric tests. In this study, the ACF and PACF plots and the ADF test were used to assess the stationarity of the time series and to identify the possible orders of autoregression and moving average. The most suitable model was selected according to the Akaike information criterion (AIC), the Schwarz Bayesian criterion (SBC) and the Ljung-Box Q-test. Both monthly and daily seasonal periodicities were taken into account in this analysis. The two time series were nonstationary, so regular differencing and seasonal differencing were used to make them stationary. The stationary series obtained after differencing served as the target sequences of the SARIMA model.
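The regular and seasonal differencing described above can be sketched in Python as follows. This is a minimal illustration on a synthetic series, not the SAS procedure used in the study; the helper name `difference` and the synthetic data are assumptions introduced here.

```python
import numpy as np

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    x = np.asarray(series, dtype=float)
    return x[lag:] - x[:-lag]

# Hypothetical monthly series: linear trend plus a 12-month seasonal pattern.
t = np.arange(48)
monthly = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12)

# Regular difference of order 1 (d = 1), followed by a seasonal difference
# of order 1 with period s = 12 (D = 1), as described for the monthly data.
stationary = difference(difference(monthly, lag=1), lag=12)
```

On this synthetic input the trend and the seasonal pattern both cancel, so the differenced series is essentially constant at zero. Real admission data would leave a noisy but stationary remainder to be modeled.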
Before modeling, each time series was split into two sets: a modeling data set used to develop the models and a testing data set used to test them. The monthly modeling set included data from January 2010 to June 2016 (1/2010-6/2016), while data from July to October 2016 (7/2016-10/2016) were used as the corresponding testing data set. The daily modeling data set covered January 4 to September 4, 2016 (1/4/2016-9/4/2016), while the testing data covered one week from September 5 to September 11, 2016 (9/5/2016-9/11/2016) and four weeks from September 5 to October 2, 2016 (9/5/2016-10/2/2016). The SARIMA model was developed with SAS software version 9.4.
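The split described above can be reproduced on the calendar alone, which also confirms the series lengths stated in the text. The sketch below uses only the dates given in the paper; the variable names are hypothetical and no admission counts are involved.

```python
from datetime import date, timedelta

# Daily series: January 4 to October 2, 2016 (dates taken from the text).
start, end = date(2016, 1, 4), date(2016, 10, 2)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

split = date(2016, 9, 4)  # last day of the daily modeling set
modeling_days = [d for d in days if d <= split]
testing_4w = [d for d in days if d > split]   # September 5 to October 2
testing_1w = testing_4w[:7]                   # September 5 to September 11

# Monthly series: January 2010 to October 2016, as (year, month) pairs.
months = [(y, m) for y in range(2010, 2017) for m in range(1, 13)
          if (y, m) <= (2016, 10)]
modeling_months = [ym for ym in months if ym <= (2016, 6)]
testing_months = [ym for ym in months if ym > (2016, 6)]
```

Splitting by time rather than at random keeps the testing period strictly after the modeling period, which is what a forecasting evaluation requires.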

The NARNN model construction
The NARNN model predicts a time series from past values of the same series. By default, the NARNN is a two-layer feed-forward back-propagation (FFBP) network with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The output of the NARNN, y(t), is fed back to the input of the network through delays. The configuration is shown in Fig. 1. The NARNN model was built with the Neural Network Toolbox in MATLAB version 7.11 (R2010b). The following steps describe how the NARNN model was built.
Step 1: Input the target series to generate a command-line script.
Step 2: Use the default data division function to divide the data randomly into three parts: the training subset (for training the network), the validation subset (for stopping training before over-fitting) and the testing subset (for testing network generalization). Set the ratios to 80% for training, 10% for validation and 10% for testing.
Step 3: Adjust the feedback delays and hidden units by trial and error. We set the hidden units (10 to 18) and feedback delays (4 to 10) based on our experience with the amount of data. In total, 63 architectures were tested to obtain the optimal model according to the error autocorrelation plot, the time series response plot, the MSE and the correlation coefficient (R).
Step 4: According to the feedback delays, input the targets to the closed-loop network for multi-step-ahead prediction.
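The data flow behind Steps 3 and 4 can be sketched in Python as follows. It assumes the usual nonlinear autoregressive form y(t) = f(y(t-1), ..., y(t-d)) with d feedback delays, which the text implies but does not write out. The helper names `make_lagged` and `closed_loop_forecast` are hypothetical, and the one-step model is a simple stand-in rather than a trained neural network or the MATLAB toolbox used in the study.

```python
import numpy as np
from itertools import product

def make_lagged(series, delays):
    """Build inputs [y(t-delays), ..., y(t-1)] and targets y(t)."""
    y = np.asarray(series, dtype=float)
    X = np.array([y[i:i + delays] for i in range(len(y) - delays)])
    return X, y[delays:]

def closed_loop_forecast(one_step_model, history, delays, steps):
    """Multi-step-ahead prediction: each output is fed back as an input."""
    window = list(np.asarray(history, dtype=float)[-delays:])
    forecasts = []
    for _ in range(steps):
        y_next = float(one_step_model(np.array(window)))
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # slide the delay window forward
    return np.array(forecasts)

# Candidate architectures from Step 3: hidden units 10-18, delays 4-10.
architectures = list(product(range(10, 19), range(4, 11)))

# Stand-in one-step model (NOT a neural network): mean of the delay window.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, targets = make_lagged(series, delays=2)
preds = closed_loop_forecast(lambda w: w.mean(), series, delays=2, steps=3)
```

The grid of 9 hidden-unit settings by 7 delay settings gives the 63 architectures mentioned in Step 3. In closed-loop mode, forecast errors feed into later inputs, which is one reason accuracy tends to fall as the horizon grows.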

The hybrid SARIMA-NARNN model construction
The hybrid SARIMA-NARNN model was developed in two stages. In the SARIMA stage, the main goal was to extract the linear relationships in the original data. The SARIMA model was then used to generate the residuals. In the NARNN stage, the chief aim was to model the nonlinear relationships in the residuals. The final combined forecasts of the time series were the sum of the predictions from the SARIMA model and the adjusted residuals from the NARNN model: $\hat{y}_t = \hat{L}_t + \hat{N}_t$, where $\hat{y}_t$ is the value predicted by the SARIMA-NARNN model at time t, $\hat{L}_t$ denotes the value predicted by the SARIMA model at time t, and $\hat{N}_t$ denotes the residual predicted by the NARNN model at time t.
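The two-stage combination can be illustrated with a short Python sketch. All numbers below are made up for illustration and are not data from the study; the function name `hybrid_forecast` is hypothetical, and the forecasts of both stages are simply assumed rather than produced by fitted SARIMA or NARNN models.

```python
import numpy as np

def hybrid_forecast(linear_forecast, residual_forecast):
    """Combine the two stages: y_hat_t = L_hat_t + N_hat_t."""
    linear = np.asarray(linear_forecast, dtype=float)
    residual = np.asarray(residual_forecast, dtype=float)
    return linear + residual

# Hypothetical numbers for illustration only (not data from the study).
observed = np.array([100.0, 110.0, 105.0, 120.0])
linear_fit = np.array([98.0, 112.0, 104.0, 117.0])  # stage 1: linear model fit

# Stage 1 output: residuals left unexplained by the linear model.
residuals = observed - linear_fit

# Stage 2 would train a nonlinear model on `residuals`; here its forecasts
# for the next two periods are simply assumed.
linear_next = np.array([118.0, 121.0])
residual_next = np.array([1.5, -0.5])
combined = hybrid_forecast(linear_next, residual_next)
```

The hybrid can only improve on the linear stage if the residual series still contains predictable structure. If the residuals are close to white noise, the second stage has little to learn.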

Performance statistic index
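The modeling errors and testing errors were used to compare the fitting and prediction performance of the SARIMA, NARNN and SARIMA-NARNN models. Three indices, the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), were selected to evaluate the errors. With $y_t$ the observed value at time t, $\hat{y}_t$ the predicted value at time t, and n the number of observations, they take the standard forms:
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}$
$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$
$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|y_t - \hat{y}_t\right|}{y_t}$

Results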

SARIMA model analysis
The monthly time series became stationary after regular differencing of order 1 followed by seasonal differencing of order 1 with a seasonal period of 12. The corresponding correlation plots are shown in Fig. 2 b, c, e, and f. Most of the correlations were around zero and within the 95% confidence interval, suggesting that the time series achieved stationarity.
The results of the ADF test for the monthly and daily original series (MOS and DOS) after differencing are shown in Table 1. All the P-values were less than 0.05, supporting the absence of a unit root. This provided further confirmation that the differenced series were stationary.
The autocorrelation of the residuals is presented in Table 3. All the P-values were greater than 0.05, showing that the residuals were white noise, which indicated that the information had been sufficiently extracted. All predicted values are available in Additional file 2. We then computed the monthly residual series (MRS) and daily residual series (DRS), which were subsequently used as the target series of the NARNN model.

NARNN model analysis
The optimal NARNN models we applied to forecast the MOS, MRS, DOS and DRS are shown in Table 4: target series MOS with 11 hidden units and 8 delays, MRS with 16 hidden units and 6 delays, DOS with 13 hidden units and 10 delays, and DRS with 14 hidden units and 7 delays. All MSE values of the training, validation, and testing subsets were relatively small, and all the R values were greater than 0.8.
The error autocorrelation function plots for the different target series are displayed in Fig. 3. The correlation coefficients for all the models, except for the one at zero lag, fell within the 95% confidence limits, demonstrating that the models were applicable. The time series response plots are displayed in Fig. 4, showing that the outputs were distributed evenly on both sides of the response curve and that the errors were small in the training, testing, and validation subsets, indicating that the models reliably reflected the data. We observed that the predicted residuals from July to October 2016 were − 240. 47

SARIMA-NARNN model analysis
The monthly and daily values predicted by the SARIMA-NARNN model are shown in Additional file 2. The point-to-point comparisons between the original observations and the values predicted by the SARIMA, NARNN and SARIMA-NARNN models are shown in Fig. 5 and Fig. 6. For the monthly data, the curve predicted by the SARIMA-NARNN model was closer to the original observations than those predicted by the SARIMA and NARNN models (Fig. 5 a, b and c), indicating that the hybrid model fitted the monthly new admission inpatient data well. For the daily data, however, the curve predicted by the NARNN model was the closest to the original curve among the three models (Fig. 6 a, b and c), indicating that the NARNN model was appropriate for forecasting the daily new admission inpatients.

Comparing analysis
The modeling errors and testing errors between the original observations and the predicted values of monthly and daily new admission inpatients are presented in Table 5.
For the monthly data, the modeling RMSE and the testing RMSE, MAE and MAPE of the SARIMA-NARNN model were lower than those of the single SARIMA or NARNN model, but its modeling MAE and MAPE were higher than those of the NARNN model.
For the daily data, we calculated the testing errors over one week and over four weeks. The NARNN model performed best, with the lowest RMSE, MAE and MAPE in both the modeling and testing stages, indicating that it fitted the daily new admission inpatient data well.
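A comparison of this kind can be sketched in Python using the standard definitions of the three error indices. The function names and all numbers are hypothetical and do not come from Table 5. MAPE is returned as a fraction, which is an assumption of this sketch.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, as a fraction (x100 for percent)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)))

def best_model(y, predictions, metric):
    """Return the name of the model with the lowest value of `metric`."""
    return min(predictions, key=lambda name: metric(y, predictions[name]))

# Hypothetical observations and predictions from two made-up models.
observed = [100.0, 200.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 400.0],  # two moderate errors
    "model_b": [100.0, 200.0, 430.0],  # one large error on a large value
}
best_by_rmse = best_model(observed, predictions, rmse)
best_by_mape = best_model(observed, predictions, mape)
```

In this made-up case the RMSE favours `model_a` while the MAPE favours `model_b`, because the RMSE penalises the single large error more heavily. Rankings can therefore differ between indices, which is why several are reported together.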

Discussion
To our knowledge, this study was the first to develop and apply time series models in research on admitted patients, with the specific purpose of forecasting trends in the number of new admission inpatients and guiding management strategies. We constructed a single SARIMA model, a single NARNN model, and a hybrid SARIMA-NARNN model based on the monthly and daily data of an entire hospital.
As shown in Fig. 5, the monthly number of new admission inpatients fluctuated every year, although an overall upward trend was observed. The SARIMA model analysis incorporated a 12-step seasonal differencing operation, and the monthly time series analysis supports a "month of the year" effect. The lowest numbers were observed in January or February each year, presumably because of the Spring Festival holiday. The numbers reached their maximum in March in 2010, 2012, 2015 and 2016, and numbers in March were also higher than in other months in the remaining years, a phenomenon that could potentially be attributed to the long holiday and seasonal change. Based on these findings, we suggest that hospital management should plan and assign medical resources accordingly. Compared with the SARIMA model alone, the modeling RMSE, MAE and MAPE of the SARIMA-NARNN model decreased by 42.89%, 47.85% and 48.86%, and the corresponding testing errors decreased by 11.35%, 20.25% and 19.99%, respectively. Compared with the NARNN model, the modeling RMSE of the SARIMA-NARNN model decreased by 3.12%, and the testing RMSE, MAE and MAPE decreased by 57.35%, 52.66% and 52.11%, respectively. Interestingly, the modeling MAE and MAPE of the SARIMA-NARNN model increased by 28.47% and 27.26%, respectively, relative to the NARNN model. As noted in [30,31], the RMSE is not always a superior measure to the MAE, and a combination of metrics is often required to evaluate model performance accurately. However, all testing errors of the SARIMA-NARNN model were the lowest among the three models and, overall, the predicted curves of the hybrid model were close to the original curves (Fig. 5 a, b and c). Therefore, we concluded that the hybrid model was the most appropriate for forecasting the monthly new admission inpatients.
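The percentage changes quoted for the monthly data can be read as relative error reductions. The sketch below assumes they were computed as the difference between the reference error and the new error, divided by the reference error. The paper does not state the formula, so this is an assumption, and the function name and numbers are hypothetical.

```python
def percent_decrease(reference_error, new_error):
    """Relative reduction of an error, in percent of the reference error."""
    return (reference_error - new_error) / reference_error * 100.0

# Hypothetical error values for illustration (not taken from Table 5).
rmse_single, rmse_hybrid = 200.0, 150.0
change = percent_decrease(rmse_single, rmse_hybrid)  # positive: error decreased
worse = percent_decrease(100.0, 125.0)               # negative: error increased
```

Under this convention a negative value corresponds to an increase in error, as reported for the modeling MAE and MAPE of the hybrid model relative to the NARNN model.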
As shown in Fig. 6 a, b and c, our analysis of the daily data indicates an obvious "day of the week" effect. Maximum values were usually observed on Mondays, while minimum values tended to fall on Saturdays or Sundays. Some fluctuations were associated with various festivals. For example, the lowest number was observed from the 7th to the 13th of February, likely because of the Spring Festival holiday, and a one-week maximum was observed on Tuesday (3rd of May), probably because this was the first day after the May Day holiday. In addition, a maximum was found on Sunday (18th of September), potentially because of the Mid-Autumn Festival holiday from the preceding Thursday to Saturday. Forecasting performance could be greatly influenced by these fluctuations. If the forecast period includes these holidays, extra caution should be taken when interpreting the prediction results. As compared to using the SARIMA model alone, the modeling RMSE, MAE, and MAPE of the NARNN model decreased by 55. 28
According to the trend in new admission inpatients, we offer the following suggestions for hospital managers: avoid scheduling medical staff leave at admission peaks; carry out repair work on inpatient beds on Saturdays or Sundays; have clinical departments with fewer admission inpatients provide vacant beds to departments with more; set up some waiting beds for turnover across the whole hospital; and make an "emergency plan for overcrowding"-once overcrowding occur the "overcrowding
Although the ARIMA model is one of the most mature time series forecasting methods, our studies [17,28] and other studies [32] have indicated that its forecasting performance on real-world cases is slightly lower than that of other models. Therefore, we do not recommend using the ARIMA model exclusively.
The NARNN model can successfully simulate some time series owing to its dynamic properties, high fault tolerance, and ability to capture nonlinear information [25,33]. In practical data analysis, the NARNN model should be constructed. In addition, our results were consistent with a previous comparative study of autoregressive neural network hybrids, which showed that hybrid models are not always better and that model construction should remain an important step despite the popularity of hybrid models [34]. The four-week testing errors were much greater than the one-week testing errors, showing that prediction accuracy decreased markedly as the forecasting horizon increased. This is an inherent disadvantage of time series forecasting models: their ability to extrapolate is limited, and the longer the forecasting horizon, the lower the prediction accuracy. Further studies are needed to develop synthetic approaches combining various types of models to improve the forecasting of new admission inpatients from different data.
From a clinical perspective, our research shows that it is beneficial to monitor trends in admission inpatients by adding a time series model to the hospital information system. When the predicted number of new admission inpatients is increasing, hospital managers can open more reserve beds or ask doctors to reduce admissions. From a methodological perspective, our research shows that time series models can be applied to study trends in admission inpatients. The NARNN model was implemented with the neural network time series tool of MATLAB, which provides a graphical environment that makes the model design process easy. Although many studies have indicated that hybrid models could improve forecasting performance, our results for the daily data do not support this point. Deciding which models to apply to which data, and how, requires a prudent choice by hospital managers.

Conclusions
In summary, the SARIMA-NARNN model did not always provide better forecasts than the single NARNN model. Our results show that combined models do not necessarily outperform their individual constituents. Therefore, it is worth exploring different reliable models with a high degree of accuracy for forecasting the number of new admission inpatients from different data.