 Research article
 Open access
Medical service demand forecasting using a hybrid model based on ARIMA and self-adaptive filtering method
BMC Medical Informatics and Decision Making volume 20, Article number: 237 (2020)
Abstract
Background
Accurate forecasting of medical service demand supports reasonable planning and allocation of healthcare resources. Daily outpatient volume is characterized by randomness, periodicity and trend, and time series methods such as ARIMA are often used for short-term forecasting of outpatient visits. To extend the prediction horizon and improve prediction accuracy, a hybrid prediction model integrating ARIMA and the self-adaptive filtering method is proposed.
Methods
The ARIMA model is first used to identify features such as the cyclicity and trend of the time series data and to estimate the model parameters. The parameters are then adjusted by the steepest descent algorithm of the adaptive filtering method to reduce the prediction error. The hybrid model is validated and compared with traditional ARIMA on several test sets from the Time Series Data Library (TSDL), on a weekly emergency department (ED) visit case from the literature, and on real cases of prenatal examinations and B-ultrasound examinations at a maternal and child health care center (MCHCC) in Ningbo.
Results
For the TSDL cases, the prediction accuracy of the hybrid model is improved by 80–99% compared with the ARIMA model. For the weekly ED visit case, the forecasting results of the hybrid model are better than those of both traditional ARIMA and the ANN model, and similar to those of the ANN model combined with data decomposition reported in the literature. For the actual data of the MCHCC in Ningbo, the MAPE of the ARIMA model for the two departments was 18.53% and 27.69%, respectively, whereas that of the hybrid model was 2.79% and 1.25%, respectively.
Conclusions
The hybrid prediction model outperforms the traditional ARIMA model both in prediction accuracy, with a smaller average relative error, and in its applicability to short-term and medium-term prediction.
Background
Public healthcare agencies are constantly challenged to deliver medical services in a timely and qualified manner. The allocation of medical resources is of increasing concern to the management of healthcare service providers, since it directly affects the timely delivery of medical services. Reasonable allocation of medical resources is a scientific decision that must take into account changes in the medical service needs of the regional population. A good understanding of medical service demand requires not only analysis of the current and historical volume of medical treatment delivered, but also accurate prediction of its trend in the near future. Such trends provide invaluable information for needs assessment, resource planning, facilities evaluation and policy formulation. Therefore, reliable health demand forecasting (e.g., of outpatient visits in the different departments of a hospital) can alert management to patient overflows and support the scientific allocation of critical medical resources, reducing the costs of supplies and staff redundancy.
At present, unlike hospitals in Western countries, most large general hospitals and specialist hospitals in China still operate on a “walk-in” outpatient basis rather than a “booking” basis. This walk-in system leads to serious overcrowding, which is the main factor behind patient complaints about public healthcare services. Both the increasing number of outpatient visits and crowding have a negative impact on the quality of medical service. To solve this problem in the long term, China strongly encourages patients to receive routine medical services at different levels of public healthcare institutions, such as community health centers and sub-district/district hospitals, and to book specialized treatment at large general hospitals through referral. A more direct (short-term) solution is to configure each healthcare service provider’s resources to match its actual demand. Taking maternity and child healthcare services as an example, if medical service demand is known in advance, some routine checks can be assigned to district-level maternal and child health care centers (MCHCCs) instead of large hospitals. The resources of district-level MCHCCs can then be fully utilized, and at the same time overcrowding in large hospitals can be alleviated.
The current challenge is to forecast the demand trend of individual medical services from historical time series data, so that public healthcare institutions at different levels can arrange resources and prepare in advance. Accurate healthcare forecasting is important for improving the management of medical institutions, especially hospitals that mainly serve walk-in outpatients.
Healthcare forecasting is about predicting future health services, healthcare needs and rates of service utilization based on foreknowledge acquired through a systematic process. This work focuses on medical service demand forecasting. In this field, several previous studies have addressed the prediction of daily emergency department (ED) visits [1,2,3,4,5] and the incidence of infectious diseases. Most published time series models were developed for short-term forecasting. Time series analysis is a powerful tool for trend forecasting: it finds a pattern in the historical data and then extrapolates the pattern into the future. The prediction accuracy of available time series models is normally acceptable, but when the data show wide variation, large fluctuations or extreme values, the prediction error becomes unsatisfactory. The purpose of this paper is to propose a hybrid forecasting method that integrates two traditional approaches to obtain more reliable forecasts of medical service demand. The hybrid forecasting model is applied to a district-level MCHCC to predict the daily outpatient visits in its two main departments, prenatal examination and B-ultrasound examination. The remainder of this paper is organized as follows. The related literature is summarized in Section 2. The traditional ARIMA model, the self-adaptive filtering method and the hybrid forecasting model are described in detail in Section 3. The hybrid model is verified on several test cases from both the Time Series Data Library and the literature in Section 4. Section 5 explores its application to forecasting outpatient visits for prenatal and B-ultrasound examinations. Finally, Section 6 summarizes the main conclusions and future prospects of this article.
Literature
In recent years, increasing attention has been paid to time series models for predicting medical service demand. Two categories of prediction methods are commonly used: statistics-based and AI-based. Statistics-based models include autoregressive integrated moving average (ARIMA) models [5,6,7,8,9,10,11], the moving average (MA) method and the exponential smoothing (ES) method [3, 12, 13]. AI-based models include artificial neural networks (ANN) [14,15,16,17] and support vector machines (SVM) [18].
Among statistics-based prediction methods, time series models are widely used because time series patterns [4, 5] can better capture short-term fluctuations [6, 19]. ARIMA has the advantage of systematically investigating time series data to obtain meaningful statistical and mathematical interpretations of the series, which makes it the most popular and direct time series prediction method in the healthcare field. The main ARIMA forecasting applications involve outpatient visits [4, 10, 11, 20], ED visits [3, 5, 9, 12, 13, 21, 22], hospital discharges [2], etc. Schweigler et al. [9] studied the short-term prediction accuracy of ARIMA and the traditional historical average model for emergency room hospitalization rates. Han et al. [10] used the ARIMA model to predict the monthly outpatient visits of a hospital in Sichuan. Some researchers have also used ARIMA models to forecast the incidence of infectious diseases [7, 8, 23,24,25]. Wang et al. [7] used an ARIMA model to predict the incidence of hepatitis B; Peng et al. [24] established a seasonal ARIMA model based on historical data from Jiangsu Province to help control the epidemic trend. Han et al. [25] developed and validated a predictive model for outbreaks of respiratory infectious diseases using the ARIMA model, and obtained a comprehensive monitoring and prediction model based on the number of emergency visits.
Recent ARIMA research in time series forecasting has focused on more accurate approaches that improve the traditional ARIMA model or integrate it with other models or data preprocessing techniques. Kadri et al. [26] proposed a statistical method based on the vector autoregressive moving average (VARMA) model, a multivariate time series prediction model, for short-term prediction of daily ED attendance. Lim et al. [27] used an ARIMA model and multiple linear regression models with ARIMA errors, with or without influenza predictors, to predict the number of ED admissions due to pneumonia in Singapore. The MAPEs of the two multiple linear regression models with ARIMA errors were below 10%. Luo et al. [11] proposed a new prediction model combining a seasonal ARIMA model and a single exponential smoothing model, taking into account the periodicity of the hospital’s daily outpatient visits and the day-of-week effect. The combined model predicted better than either single model. Aghelpour et al. [28] compared the SARIMA model with SVR and their combined model. The SARIMA model, now used in many forecasting fields, is a time series model that can describe unstable behavior in different seasons.
Since some complex time series contain nonlinear components, many researchers have introduced artificial intelligence into time series medical prediction, e.g., artificial neural networks (ANN) [14,15,16,17] and support vector machines (SVM) [18, 29]. Gul and Guneri [14] used an ANN to predict patient length of stay (LOS) in the ED from patient age, sex, mode of arrival, treatment unit, and medical tests and inspections. Yousefi and Ferreira [15] provided an ANN-based forecasting tool to predict the number of visitors to the emergency department of Hospital Risoleta Tolentino Neves. Both a feed-forward neural network (FWNN) and a recurrent neural network (RNN) gave reasonable one-week-ahead predictions of ED visits, and the RNN slightly outperformed the FWNN for this task. Diao et al. [18] built an SVM model to predict the incidence rate from the characteristics of viral hepatitis data; they reported that the viral hepatitis incidence series is relatively complex and that the SVM model fits it better than the ARIMA model. Furthermore, several studies have noted that, because some time series contain both linear and nonlinear patterns, it may be difficult to achieve high prediction accuracy using only linear models or only neural network models. Hybrid models, or models integrated with data decomposition, have therefore emerged recently for time series forecasting. Purwanto et al. [30] presented a dual hybrid forecasting model combining linear regression, neural network and fuzzy models to yield a qualitative output for decision making in healthcare management. Khaldi et al. [31] studied the effect of multistep prediction strategies on the performance of recurrent neural network models (SRN, LSTM and GRU) and proposed corresponding strategies for the three models. Bento et al. [32] proposed a novel Bat-inspired hybrid method integrating the bat algorithm and the scaled conjugate gradient algorithm to improve the learning ability of neural networks. Khaldi et al. [16] studied an artificial neural network (ANN) combined with a signal decomposition technique to predict weekly emergency department arrivals in hospitals. The time series is decomposed into several sub-signals, and each sub-signal is modeled with a different ANN model. They showed that data decomposition is a powerful data preprocessing tool that can improve the generalization ability of ANNs while reducing overfitting. Huang et al. [17] used a hybrid method combining empirical mode decomposition and a backpropagation artificial neural network optimized with particle swarm optimization to predict outpatient visits.
A good prediction model is characterized not only by high prediction accuracy but also by a good understanding of the data features. Moreover, it should be easy to implement, even for complicated time series. In this work, we attempt to simplify the outpatient visit prediction problem and improve forecasting accuracy by capturing the intrinsic fluctuation characteristics of a hospital’s daily outpatient visits.
Forecasting a hospital’s daily outpatient visits is a typical time series prediction problem, in which randomness and periodic fluctuation have the greatest impact on prediction accuracy. Because the ARIMA model is simple and needs only the endogenous time series as input, it is widely used in time series forecasting. This article focuses on forecasting outpatient visits in hospitals that mainly serve walk-in outpatients and aims to develop an efficient method for predicting trends in the number of daily visits. To achieve more accurate predictions, we first apply the ARIMA model to identify the periodicity and autocorrelation in the time series data. Time series prediction with a traditional ARIMA model is only suitable for short-term predictions (the next few days), and its accuracy is low for medium- and long-term predictions (more than one month). The adaptive filtering method is therefore introduced to adjust the parameters of the ARIMA model, compensating for the low long-term prediction accuracy of the traditional ARIMA model. Test cases from a benchmark library and from the literature are then used to validate the proposed hybrid model. Finally, the hybrid model is applied to predict outpatient visits for prenatal and B-ultrasound examinations at the Ningbo Yinzhou District Maternal and Child Health Care Hospital from January 2017 to March 2018.
Methods
This section presents the basics of the traditional ARIMA model and the adaptive filtering method, discusses the advantages and disadvantages of each, and proposes a hybrid model integrating both methods, together with its detailed workflow.
ARIMA model
The ARIMA model, generally denoted ARIMA(p, d, q), is a systematic approach to the predictive modeling of both stationary and non-stationary time series. A non-stationary series has an unstable mean and variance; it is generally first converted into a stationary series by differencing, and the stationary series is then used to establish an ARMA model. The general expression of the ARIMA model is:
The value of the time series at time t is a multiple linear function of the historical data of the previous p periods (x_{t − 1}, x_{t − 2}, …, x_{t − p}) and the prediction errors of the previous q periods (ε_{t − 1}, ε_{t − 2}, …, ε_{t − q}). The error terms ε_{t} are generally assumed to be independent, identically distributed variables drawn from a normal distribution with zero mean.
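For reference, a standard way to write the model described above is given below. It assumes that x_{t} denotes the series after d rounds of differencing, omits a constant term, and writes the moving-average terms with a plus sign (the convention used by software such as statsmodels; Box–Jenkins texts write −θ_{j} instead), so it is a reconstruction under those assumptions rather than necessarily the exact form of eq. (1):

\[ x_t = \sum_{i=1}^{p} \varphi_i\, x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \]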
The modeling process is divided into four steps:
Step 1: Stationarizing the time series: The series is made stationary by differencing, where d is the degree of differencing. Before modeling, the series must be preprocessed with stationarity and randomness tests. Stationarity can be checked in two ways: by observation, based on the characteristics of the time series plot and the autocorrelation plot, or by a statistical test, the so-called unit root test. The pure randomness test, also known as the white noise test, usually relies on the Q statistic and the LB (Ljung–Box) statistic.
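The differencing and white-noise checks of Step 1 can be sketched in Python with NumPy alone. The sketch computes the Ljung–Box (LB) statistic Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n − k) from the sample autocorrelations r_k and compares it with a chi-square critical value; the choice h = 10 and the function names are illustrative assumptions, and the unit root (ADF) test is not reproduced here. When the same test is applied to the residuals of a fitted ARMA(p, q) model in Step 3, the degrees of freedom are usually reduced to h − p − q.

```python
import numpy as np

def difference(x, d=1):
    """Apply d rounds of first differencing: y_t = x_t - x_{t-1}."""
    y = np.asarray(x, dtype=float)
    for _ in range(d):
        y = np.diff(y)
    return y

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])

def ljung_box_q(x, h=10):
    """Ljung-Box statistic Q = n(n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(x)
    r = sample_acf(x, h)
    k = np.arange(1, h + 1)
    return float(n * (n + 2) * np.sum(r ** 2 / (n - k)))

# 95% quantile of the chi-square distribution with 10 degrees of freedom,
# the reference distribution of Q for h = 10 lags of a raw series.
CHI2_95_DF10 = 18.307

weekly = np.sin(2 * np.pi * np.arange(200) / 7)   # strongly periodic toy series
print(ljung_box_q(weekly) > CHI2_95_DF10)          # True: not white noise
```

A value of Q above the critical value rejects the white-noise hypothesis, meaning the series still contains autocorrelation that an ARIMA model can exploit.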
Step 2: ARIMA model identification: This step determines the order of the model, that is, the order of the autoregressive part (p), the degree of differencing (d) and the order of the moving average part (q). Since d has already been determined in Step 1, p and q are determined from the behavior of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). When several candidate (p, q) pairs exist, the AIC (Akaike Information Criterion) or BIC (Schwarz Bayesian Criterion) of each pair is calculated by eqs. (2) and (3), and the (p, q) with the smallest AIC or BIC value is selected as the final order.
where n is the sample size, σ^{2} is the sum of the squared residuals, and p, d, q are the model orders.
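The BIC search can be sketched in Python for the special case of pure autoregressive candidates (q = 0), where ordinary least squares gives the coefficients directly; ARMA candidates with q > 0 need nonlinear estimation and are left out of this sketch. It assumes the common form BIC = n ln(SSR/n) + k ln n with k estimated coefficients; if eqs. (2) and (3) instead use n ln σ² with σ² the raw sum of squared residuals, the only difference is the constant n ln n, which does not change which order wins for a fixed sample. All candidates are fitted on the same effective sample so that their BIC values are comparable. The function names are hypothetical.

```python
import numpy as np

def fit_ar_least_squares(x, p, max_p):
    """Fit a zero-mean AR(p) by ordinary least squares.

    Targets are x[max_p:], so every candidate order uses the same sample.
    Returns (coefficients phi_1..phi_p, residuals).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # demean: the model has no constant term
    y = x[max_p:]
    if p == 0:
        return np.zeros(0), y             # AR(0): the prediction is the (zero) mean
    # Column i-1 holds the lag-i values x_{t-i} for each target x_t.
    X = np.column_stack([x[max_p - i:len(x) - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi, y - X @ phi

def bic(residuals, k):
    """BIC = n ln(SSR / n) + k ln n for k estimated coefficients."""
    n = len(residuals)
    ssr = float(np.sum(residuals ** 2))
    return n * np.log(ssr / n) + k * np.log(n)

def select_ar_order(x, max_p):
    """Return the AR order in 0..max_p with the smallest BIC, plus all scores."""
    scores = [bic(fit_ar_least_squares(x, p, max_p)[1], p) for p in range(max_p + 1)]
    return int(np.argmin(scores)), scores

# Usage: simulate AR(1) with phi = 0.8 and recover its order and coefficient.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
best_p, scores = select_ar_order(x, max_p=5)
phi_hat, _ = fit_ar_least_squares(x, 1, max_p=5)
```

Because AR(0) and AR(1) differ greatly in residual variance for this series, BIC picks an order of at least 1, and with n = 2000 the ln n penalty makes overfitting to higher orders unlikely.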
Step 3: Estimation of coefficients and validation: The coefficients are estimated by the least squares method or the maximum likelihood method. The adequacy of the model is generally checked by testing the fitted residuals: if they are normally distributed with zero mean and their autocorrelation at every lag is zero, the model is regarded as adequate for the time series. If the model fails this test, return to Step 2 and re-identify the model.
Step 4: Application of the model: The final ARIMA model is used to predict future time points by rolling one-step forecasting.
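Rolling one-step forecasting can be sketched as follows. The sketch assumes the coefficients have already been estimated, uses the plus-sign MA convention, no constant term and zero pre-sample residuals; the function name rolling_one_step is hypothetical. Each forecast uses only data observed up to the previous step, and its error becomes the lagged residual for the next step.

```python
import numpy as np

def rolling_one_step(x, phi, theta):
    """Rolling one-step-ahead ARMA(p, q) forecasts on a (differenced) series.

    x_hat_t = sum_i phi_i x_{t-i} + sum_j theta_j e_{t-j}, with e_t = x_t - x_hat_t.
    Pre-sample residuals are zero. Returns (in-sample forecasts, NaN for the
    first max(p, q) points; residuals; forecast of the next, unseen value).
    """
    x = np.asarray(x, dtype=float)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p, q = len(phi), len(theta)
    start = max(p, q)
    forecasts = np.full(len(x), np.nan)
    resid = np.zeros(len(x))
    for t in range(start, len(x)):
        # Lagged values and residuals, most recent first.
        forecasts[t] = phi @ x[t - p:t][::-1] + theta @ resid[t - q:t][::-1]
        resid[t] = x[t] - forecasts[t]
    n = len(x)
    next_value = phi @ x[n - p:][::-1] + theta @ resid[n - q:][::-1]
    return forecasts, resid, float(next_value)

forecasts, resid, nxt = rolling_one_step([2.0, 4.0, 6.0], [0.5], [])
# next forecast = 0.5 * 6 = 3.0
```

In a rolling evaluation, each newly observed value is appended to x and the function is re-run, so every forecast stays one step ahead. If the model was fitted to a differenced series (d = 1), a forecast ŷ_{t+1} of the difference is turned back into a level forecast as x_{t} + ŷ_{t+1}.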
Self-adaptive filtering method
The self-adaptive filtering method is based on a weighted average of historical time series observations. If all the weights are equal, the adaptive filtering method reduces to the moving average method. Following the principle of mathematical optimization, the weights of the moving average model are adjusted to reduce the prediction error [33]. The method is more common in economics and engineering testing applications [34], such as stock trend and futures market forecasting [35], but is rarely used in the healthcare field. Its general expression is:
where \( {\overline{x}}_{t+1} \) is the predicted value of the time series for period t + 1, w_{i} is the weight of the data in period t − i + 1, x_{t − i + 1} is the observed value of the time series in period t − i + 1, and N is the number of previous periods used for prediction. The core idea is to determine the weights by adaptive error tracking and immediate compensation: the weights are repeatedly adjusted according to the prediction error feedback over all the training data until the error converges and a set of “best” weights is found.
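Written out from the definitions above, the prediction is a weighted sum of the N most recent observations; this is the expression that the steps below call formula (4):

\[ \overline{x}_{t+1} = \sum_{i=1}^{N} w_i\, x_{t-i+1} = w_1 x_t + w_2 x_{t-1} + \cdots + w_N x_{t-N+1} \]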
The detail process is as follows:
Step 1: Determine N and the initial weights.
N is usually set to the cycle length if the time series is cyclic, or determined by autocorrelation analysis if there is no obvious cycle. The initial weights w_{i} (i = 1, 2, …, N) are all set to 1/N, which gives the basic moving average.
Step 2: Predicting and error tracking.
The predicted value \( {\overline{x}}_{t+1} \) is calculated according to formula (4), and the error is calculated as the difference between the actual and predicted values:
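With the sign convention assumed here (actual minus predicted, the convention under which the weight update in Step 3 reduces the error), the error is:

\[ e_{t+1} = x_{t+1} - \overline{x}_{t+1} \]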
Step 3: Adjusting weights in iteration.
The weights are adjusted iteratively according to the error e_{t + 1}, the observed values and a learning constant k in order to compensate for the prediction error. The iterative update formula for the weights is:
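The update below is the standard least-mean-squares form of adaptive filtering and is assumed here to be what eq. (6) expresses. It follows from one steepest-descent step on the squared error: since e_{t+1} = x_{t+1} − Σ_{i} w_{i} x_{t − i + 1}, the gradient is ∂e_{t+1}²/∂w_{i} = −2 e_{t+1} x_{t − i + 1}, and stepping against it with step size k gives the adjusted weight w_{i}′:

\[ w_i' = w_i + 2k\, e_{t+1}\, x_{t-i+1}, \qquad i = 1, 2, \ldots, N \]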
This formula is an approximation of the steepest descent method. Following the principle of optimization, it takes the minimization of the squared prediction error as the objective function. According to the literature [8], a sufficient condition for the convergence of (6) is:
The denominator is the (maximum) sum of squares of n observations of the historical time series. A well-chosen k reduces the number of iterations while still driving the error toward its minimum.
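Assuming eq. (7) has the same form as the bound quoted for the weekly ED case later in this article, it reads as follows. For real-valued data the partial sums of squares never decrease as n grows, so the maximum in the denominator is simply the sum of squares over all available observations:

\[ k \le \frac{1}{\max_{n}\left\{\sum_{i=1}^{n} x_i^{2}\right\}} \]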
The weights are adjusted until a set of “best” weights minimizing the error is found. If the training series contains M observations, the weights are adjusted (M − N) times in one round of iteration. The last set of weights obtained in a round is used as the initial weights of the next round. Rounds are repeated until the error converges, which yields the set of “best” weights.
Step 4: Predicting by using the best weight.
These “best” weights are then used for forecasting according to formula (4).
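Steps 1–4 can be sketched in Python as below. This is an illustrative implementation, not the code used in the article: the function names, the round limit max_rounds and the stopping tolerance tol on the change in per-round MAE are assumptions. Each round sweeps the M − N training targets, and the weights at the end of a round seed the next round, as described in Step 3.

```python
import numpy as np

def train_adaptive_filter(x, N, k, max_rounds=500, tol=1e-6):
    """Learn w for x_bar_{t+1} = sum_i w_i x_{t-i+1} by adaptive filtering.

    Weights start at 1/N (simple moving average). Each round sweeps the M - N
    training targets, applying w_i += 2 k e_{t+1} x_{t-i+1}; the last weights of
    a round seed the next. Stops when the round MAE changes by less than tol.
    """
    x = np.asarray(x, dtype=float)
    w = np.full(N, 1.0 / N)
    mae_per_round = []
    for _ in range(max_rounds):
        abs_errors = []
        for t in range(N - 1, len(x) - 1):          # M - N updates per round
            window = x[t - N + 1:t + 1][::-1]       # x_t, x_{t-1}, ..., x_{t-N+1}
            e = x[t + 1] - w @ window               # e_{t+1} = actual - predicted
            w = w + 2 * k * e * window
            abs_errors.append(abs(e))
        mae_per_round.append(float(np.mean(abs_errors)))
        if len(mae_per_round) > 1 and abs(mae_per_round[-2] - mae_per_round[-1]) < tol:
            break
    return w, mae_per_round

def predict_next(x, w):
    """Forecast x_bar_{t+1} from the last N observations with trained weights."""
    window = np.asarray(x[-len(w):], dtype=float)[::-1]
    return float(w @ window)

# Usage: a series repeating every 3 steps, so x_{t+1} = x_{t-2} exactly.
series = [1.0, 2.0, 3.0] * 20
N = 3
window_ss = np.convolve(np.square(series), np.ones(N), mode="valid")
k = 0.5 / window_ss.max()
w, mae = train_adaptive_filter(series, N, k)
forecast = predict_next(series, w)   # approximately 1.0, the next value of the cycle
```

Here every window has the same sum of squares, so this k gives 2k‖window‖² = 1 and each update removes the current error exactly. Starting from the moving average (1/3, 1/3, 1/3), the weights move toward (0, 0, 1), which reproduces the value three steps back.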
Adaptive filtering has two advantages. First, it is simple: the number of weights and the learning constant k can be chosen according to the researcher’s needs to control the forecast. Second, the method uses all observations of the series to find the “best” weights and keeps updating them as the historical data change, which makes the prediction more accurate. Moreover, the weights in the adaptive filtering method are unconstrained: the adjusted weights need not sum to 1 and can even be negative.
ARIMA and self-adaptive filtering hybrid forecasting model
Traditional ARIMA modeling normally achieves good accuracy only for short-term prediction, and its prediction error increases as the prediction horizon grows. Adaptive filtering, in contrast, is well suited to reducing prediction errors by iteratively adjusting the “weights”. Integrating the two methods can therefore make short-term predictions even more accurate than those of the traditional ARIMA model while maintaining good accuracy as the prediction horizon increases.
One of the steps in the traditional ARIMA modeling process is the estimation of the model parameters, i.e., φ_{1}, φ_{2}, …, φ_{p}, θ_{1}, θ_{2}, …, θ_{q} in eq. (1). Since eq. (1) can be viewed as a weighted polynomial, these parameters play the role of the “weights” in the adaptive filtering method. After the third step of ARIMA modeling, we use the estimated parameters as the initial weights and apply the self-adaptive filtering idea to adjust them so that the prediction error is reduced as much as possible. Finally, the set of “best” parameters is substituted back into the ARIMA model for prediction. Convergence is measured by the mean absolute error (MAE) of the ARIMA prediction results. Figure 1 gives the flow chart of this hybrid forecasting method. The specific modeling steps are as follows:
Step 1: Determine the traditional ARIMA model from the time series observations, including stationarity checking, ACF and PACF calculation, and parameter estimation, to obtain ARIMA(p, d, q) and the initial parameter estimates φ_{1}, φ_{2}, …, φ_{p}, θ_{1}, θ_{2}, …, θ_{q}.
Step 2: Use the ARIMA model for prediction, calculate the prediction error e_{t} of each predicted value, and compute the MAE of all predicted values in one iteration round to check the error convergence.
where e_{t} is the prediction error at time t, x_{t} is the actual observed value at time t, and \( {\hat{x}}_t \) is the predicted value.
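Written out, with the predicted value denoted \( \hat{x}_t \) and assuming that the MAE is the plain average of |e_{t}| over the n predictions made in one iteration round:

\[ e_t = x_t - \hat{x}_t, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert e_t \rvert \]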
Step 3: Start adaptive filtering to adjust the parameters repeatedly and iteratively compute the “best” parameters. The iteration loop terminates when either the error has converged or a maximum number of rounds has been completed.
Step 4: Return the “best” parameters to the ARIMA model for model prediction.
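The four steps can be sketched in Python as below. This is one possible reading of the procedure, not the authors’ MATLAB implementation: the ARMA coefficients (for the already differenced series, with the plus-sign MA convention and no constant) are treated as adaptive-filter weights whose inputs are the lagged values and lagged residuals, each coefficient receives the same kind of update as the adaptive-filter weights, residuals are recomputed with the current coefficients during each round, and rounds stop when the round MAE stops changing. The function name, the tolerance and the round limit are hypothetical.

```python
import numpy as np

def hybrid_arima_adaptive(x, phi0, theta0, k, max_rounds=50, tol=1e-8):
    """Refine ARMA(p, q) coefficients with adaptive-filtering updates.

    x: stationary (already differenced) series; phi0, theta0: initial ARIMA
    estimates (plus-sign MA convention). Each step predicts
        x_hat_t = sum_i phi_i x_{t-i} + sum_j theta_j e_{t-j}
    and nudges every coefficient by 2 k e_t times its own input.
    Returns the refined phi, theta and the MAE of each round.
    """
    x = np.asarray(x, dtype=float)
    phi = np.array(phi0, dtype=float)
    theta = np.array(theta0, dtype=float)
    p, q = len(phi), len(theta)
    start = max(p, q)
    mae_per_round = []
    for _ in range(max_rounds):
        resid = np.zeros(len(x))                   # pre-sample residuals = 0
        abs_errors = []
        for t in range(start, len(x)):
            x_lags = x[t - p:t][::-1]              # x_{t-1}, ..., x_{t-p}
            e_lags = resid[t - q:t][::-1]          # e_{t-1}, ..., e_{t-q}
            e = x[t] - (phi @ x_lags + theta @ e_lags)
            resid[t] = e
            # Adaptive-filtering update: coefficient += 2k * error * its input.
            phi = phi + 2 * k * e * x_lags
            theta = theta + 2 * k * e * e_lags
            abs_errors.append(abs(e))
        mae_per_round.append(float(np.mean(abs_errors)))
        if len(mae_per_round) > 1 and abs(mae_per_round[-2] - mae_per_round[-1]) < tol:
            break
    return phi, theta, mae_per_round

# Usage: AR(1) data with phi = 0.5, starting from a deliberately poor estimate.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
phi, theta, mae = hybrid_arima_adaptive(x, [0.2], [], k=1e-3, max_rounds=5)
```

With a small k the coefficient drifts from the poor starting value 0.2 toward the true 0.5. In the method described above, the starting values would instead be the ARIMA estimates from Step 1, and the refined coefficients would then be used for prediction, for example with a rolling one-step recursion.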
Results
Verification of the hybrid model
To validate the proposed hybrid forecasting method, test cases from a benchmark library and from the literature are used. We selected different time series from the Time Series Data Library (TSDL) (https://datamarket.com/data/list/?q=provider:tsdl) created by Monash University in Australia. The selected test cases cover stationary series, series with an upward or downward trend, series with both periodicity and trend, etc., and the prediction results of the hybrid model are compared with those of the traditional ARIMA model to evaluate prediction accuracy and the range of applicability. In addition, we applied the hybrid forecasting model to the weekly ED visit data of Khaldi et al. [16] and compared its predictions with those of the ARIMA, ANN and decomposition-based ANN models reported in that paper.
TSDL case verification
A total of 13 time series of different sizes (Table 1) selected from the TSDL are classified into four categories: stationary series; series with both periodicity and trend; series with a rising trend; and series with a falling trend. For the series with both periodicity and trend, two types are further selected. In one, the amplitude of the fluctuations within a period is constant over time (e.g., the Wisconsin employment series shown in Fig. 2). In the other, the fluctuation amplitude varies over time (e.g., the monthly gas production in Australia shown in Fig. 2). For series with only a rising or falling trend, we likewise chose two types: one with no fluctuation, or negligible fluctuation, along the trend, and one with visible fluctuations along the upward or downward trend.
The historical data of each case are divided into two parts, an observation set and a verification set. The observation set is used to predict the subsequent values, and the predicted values are compared with the true values in the verification set. Both the traditional ARIMA model and the hybrid model are used to predict the 13 series, and the relative prediction errors (PE) of both models are calculated. The maximum and minimum PE, the mean absolute percentage error (MAPE) and the standard deviation of the PE (σ_{PE}) are summarized in Table 1, together with the ARIMA order (p, d, q), the number of additional self-adaptive iterations and their computation time.
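The error summary in Table 1 can be computed as sketched below. The sketch assumes that the PE of each forecast is the absolute error relative to the actual value, in percent, so that MAPE is the mean PE, and it uses the sample standard deviation (ddof = 1); both are assumptions about the exact definitions, and pe_summary is a hypothetical name.

```python
import numpy as np

def pe_summary(actual, predicted):
    """Summarize relative prediction errors (in %) over a verification set.

    Assumes PE_t = |x_t - x_hat_t| / |x_t| * 100, so MAPE is the mean of PE.
    """
    a = np.asarray(actual, dtype=float)
    f = np.asarray(predicted, dtype=float)
    pe = np.abs(a - f) / np.abs(a) * 100.0
    return {"max": pe.max(), "min": pe.min(), "MAPE": pe.mean(), "std": pe.std(ddof=1)}

summary = pe_summary([100, 200, 50], [110, 190, 50])
# PE = [10, 5, 0] -> max 10, min 0, MAPE 5, std 5
```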
Table 1 shows that in short-term prediction, the PE of the ARIMA model is larger for stationary time series: its MAPE for these series is generally between 10 and 30%. For non-stationary time series, the ARIMA model is more accurate, with a MAPE between 1 and 5%. The proposed hybrid model achieves much better accuracy, over 97%, when forecasting time series with periodicity and trend, and in some cases (e.g., cases 3, 4 and 8) its predictions nearly coincide with the actual values. As for the standard deviation of the relative error, when the sample is small (e.g., cases 10 and 13), σ_{PE} of the hybrid model is larger than that of the ARIMA model. For the other cases, which have larger data volumes, the hybrid model gives a smaller σ_{PE} than the ARIMA model.
The additional computation time depends mainly on the number of iterations needed to search for the optimized parameters, so it differs from case to case. The iteration counts and additional computation times of the 13 test cases are given in Table 1; the additional computation ranges from a few seconds to 13 s.
Literature test case comparison
In this section, the hybrid model is applied to a test case from Khaldi et al. [16] for further verification and comparison. The time series of weekly emergency department visits at the Hassan II University Hospital in Fez, Morocco, from January 2010 to December 2016 was used as input to the forecasting models. The hybrid model is compared with traditional ARIMA, an ANN (feedforward neural network), an ANN combined with the Ensemble Empirical Mode Decomposition technique (EEMD-ANN), and an ANN combined with Discrete Wavelet Transform decomposition (DWT-ANN); the latter three forecasting models were proposed in that work. The data set contains 364 weekly ED visit values; following the literature [16], the first 80% (291 values) is used as the training set and the last 20% (73 values) as the test set. The root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (R) are used to evaluate the forecasting models.
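The three evaluation metrics and the 80/20 chronological split can be sketched as follows. R is taken to be the Pearson correlation between the actual and forecast series, which is an assumption about the definition used; the helper names are hypothetical. Splitting in time order, rather than randomly, keeps the test weeks strictly after the training weeks.

```python
import numpy as np

def chronological_split(x, train_fraction=0.8):
    """Split a series in time order: the first part trains, the rest tests."""
    x = np.asarray(x, dtype=float)
    n_train = int(train_fraction * len(x))
    return x[:n_train], x[n_train:]

def forecast_metrics(actual, predicted):
    """RMSE, MAE and Pearson correlation coefficient R of a forecast."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(predicted, dtype=float)
    err = a - f
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(a, f)[0, 1])
    return rmse, mae, r

train, test = chronological_split(np.arange(364))   # 291 training, 73 test values
rmse, mae, r = forecast_metrics([1, 2, 3, 4], [2, 3, 4, 5])
# every forecast is off by exactly 1: RMSE = 1.0, MAE = 1.0, R = 1 up to rounding
```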
The detailed procedure follows Sections 3.1 and 3.3.
(1) Stationarity is checked by the ADF test. The p-value of the original series is 0.5901, which is larger than 0.05 and indicates that the series is non-stationary. After first-order differencing, the p-value is 0.001 < 0.05, so the differenced series can be regarded as stationary, hence d = 1. The p-value of the white noise test is 1.7261e-05, which indicates that the series is not white noise.
(2) ARIMA model identification determines the orders of AR(p) and MA(q) according to the BIC of eq. (3). The calculated BIC matrix is as follows.
The minimal BIC is 3575, at p = 1 and q = 1, so ARIMA(1, 1, 1) is identified.
(3) The parameters are estimated by least squares, and the estimates are shown in Table 2.
(4) The estimated parameters are set as the initial weights of the self-adaptive filtering method and are further adjusted iteratively according to eq. (6), where the learning rate k is chosen to satisfy \( k\le \frac{1}{\underset{n}{\max}\left\{\sum \limits_{i=1}^n{x}_i^2\right\}} \), giving k = 9.147582522628832e-08. The self-adaptive adjustment stops once the error has converged, as shown in Fig. 3. An additional 1678 iterations (6 rounds) are needed to obtain the optimal parameters shown in Table 3.
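The learning-rate bound can be computed as sketched below. learning_rate_bound applies the expression above literally; because the partial sums of squares never decrease, its denominator equals the total sum of squares. Which series enters the sum (the raw or the differenced visits) is not stated here, so the input is left to the caller. windowed_learning_rate_bound is an alternative, labeled as such: it uses the largest sum of squares over any N consecutive inputs, which is the per-step limit for an update of the form w_i ← w_i + 2k e x_i, since one such step multiplies the error on its own input by (1 − 2k Σ x_i²) and therefore cannot increase it when k ≤ 1/Σ x_i². Both function names are hypothetical.

```python
import numpy as np

def learning_rate_bound(x):
    """k_max = 1 / max_n sum_{i=1}^{n} x_i^2, the expression as written.

    Partial sums of squares never decrease, so the maximum is the total.
    """
    x = np.asarray(x, dtype=float)
    partial_sums = np.cumsum(x ** 2)
    return 1.0 / partial_sums.max()

def windowed_learning_rate_bound(x, N):
    """Alternative: 1 / (largest sum of squares over any N consecutive values)."""
    x = np.asarray(x, dtype=float)
    window_ss = np.convolve(x ** 2, np.ones(N), mode="valid")
    return 1.0 / window_ss.max()

k_total = learning_rate_bound([1.0, 2.0, 3.0])                 # 1 / 14
k_window = windowed_learning_rate_bound([1.0, 2.0, 3.0], 2)    # 1 / 13
```

For the reported k = 9.147582522628832e-08, the denominator is 1/k ≈ 1.09 × 10^7.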
(5) Finally, the resulting model is used for prediction. The forecasting results are shown in Fig. 4.
Table 4 compares the hybrid model with the four forecasting methods reported by Khaldi et al. [16]. As Table 4 indicates, in terms of MAE the proposed hybrid model outperforms traditional ARIMA by 257%, ANN by 133% and DWT-ANN by 4%, and achieves accuracy similar to that of EEMD-ANN.
The practical application in medical service demand forecasting
Preliminary analysis and preprocessing of data
Daily visit data of the prenatal examination department and the B-ultrasound examination department of a Maternity and Child Health Care Hospital (MCHCH) in Ningbo from January 1, 2017 to March 30, 2018 were collected as the forecasting case data. Over this period, the prenatal examination visit (PEV) series of the MCHCH comprised 369 data points. Because B-ultrasound examinations are not performed at weekends and on holidays, the B-ultrasound examination visitor (BUEV) series comprised 310 data points. The time series are shown in Fig. 5. Considering the particular nature of weekends, we removed the weekend data from the raw data. After this preprocessing, the PEV and BUEV series each contain 309 data points.
As Fig. 5 shows, PEV and BUEV fluctuate greatly, especially on weekends, National Day, the Spring Festival and other holidays. Because of the long public holidays during the Spring Festival and National Day, the numbers of visits were lowest in February and October. In addition, both time series show periodic changes within a week, with large differences between time points within the same cycle. Neither PEV nor BUEV shows an obvious trend, so both belong to the class of time series containing only periodic features. After the data of the first week were removed, 283 observations (from January 9, 2017 to February 28, 2018; N = 283, 93% of the data) were used as the training set to build the model. The remaining 22 observations (from March 1, 2018 to March 30, 2018; T = 22, 7% of the data) were used to verify the predictive value of the model.
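The weekend removal and the date-based split can be sketched with the standard library. The (date, count) record layout, the function names and the counts in the example are hypothetical, and holiday handling is not included; the split dates follow the text (training from January 9, 2017, testing from March 1, 2018).

```python
from datetime import date, timedelta

def drop_weekends(records):
    """Keep (day, count) records that fall on Monday-Friday (weekday() < 5)."""
    return [(day, count) for day, count in records if day.weekday() < 5]

def split_by_date(records, train_start, test_start):
    """Training set: train_start <= day < test_start; test set: day >= test_start."""
    train = [r for r in records if train_start <= r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, test

# Usage with made-up counts for one week around the split date.
start = date(2018, 2, 26)                               # a Monday
records = [(start + timedelta(days=i), 100 + i) for i in range(7)]
weekdays = drop_weekends(records)                       # Feb 26-28 and Mar 1-2
train, test = split_by_date(weekdays, date(2017, 1, 9), date(2018, 3, 1))
# len(train) == 3, len(test) == 2
```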
Forecasting by ARIMA model and hybrid model
Both forecasting models were implemented in MATLAB R2014a.
Model fitting and parameter estimation
The PEV and BUEV series are stationary and display a strong weekly cycle.
The PEV and BUEV series were first subjected to the augmented Dickey-Fuller (ADF) test. As Table 5 shows, the t-statistic and p-value of the PEV series are −2.2588 and 0.023482, respectively, and those of the BUEV series are −2.6585 and 0.0082611. A t-statistic below the critical value, or equivalently a p-value below the 0.05 significance level, rejects the unit-root null hypothesis, which indicates that the series is stationary. Therefore both the PEV and the BUEV series are stationary and need no differencing, hence d = 0. Second, white-noise tests were performed on both series. Table 6 shows that both p-values are 0. Neither series is therefore white noise: both are stationary non-white-noise series whose autocorrelation can be modelled.
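The white-noise check can be illustrated with the Ljung-Box Q test. The paper does not name the test it used (MATLAB's `lbqtest` implements Ljung-Box), so treat this as an assumed stand-in. The stdlib-only sketch computes Q = n(n+2) Σ ρ_k² / (n - k) over h lags and takes the p-value from a chi-square distribution with h degrees of freedom, in closed form for even h. The default of 10 lags, two working weeks, is also an assumption.

```python
import math


def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    return num / den


def chi2_sf_even(q, dof):
    """Upper-tail probability P(X > q) of a chi-square variable with even dof."""
    if dof % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = q / 2.0
    term, total = 1.0, 1.0  # i = 0 term of sum (q/2)^i / i!
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(x, lags=10):
    """Ljung-Box Q statistic and p-value (null hypothesis: x is white noise)."""
    n = len(x)
    q = n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, lags + 1))
    return q, chi2_sf_even(q, lags)


# A series with a strong five-weekday cycle is far from white noise
weekly = [120, 100, 95, 90, 80] * 60
q, p = ljung_box(weekly, lags=10)
print(p < 0.05)  # True
```

The same function applied to model residuals implements the residual check: there, a p-value above 0.05 is the desired outcome.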
The orders p and q of the ARIMA model are determined by the BIC value calculated with Eq. (3).
BIC values were calculated for all combinations ARIMA(p, q) with p and q no greater than log(length(data)) = 6, where length(data) is the number of observations in the series. They were collected in a BIC matrix whose rows correspond to p = 0 to 6 and whose columns correspond to q = 0 to 6. The PEV and BUEV BIC matrices are given in Eqs. (10) and (11). In both matrices the minimum occurs at p = 1 and q = 1: the minimum BIC is 2804.5322 for PEV and 2109.1708 for BUEV. Hence both fitted models are ARIMA(1, 0, 1), and the coefficients estimated by least squares are listed in Table 7. Residual analysis of both models gives p-values greater than 0.05, i.e., the residuals are consistent with white noise, which indicates that both models are adequate.
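The order-selection step amounts to filling a BIC matrix and taking its smallest entry. The sketch below assumes the common form BIC = -2 ln L + k ln n, which may differ from Eq. (3) by a constant or a normalisation. It takes the upper bound for p and q as ⌈ln n⌉; MATLAB's `log` is the natural logarithm, and rounding up is an assumption. Fitting each ARMA(p, q) to obtain ln L is left to a time series library and is not shown.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def max_order(n_obs):
    """Upper bound for p and q, taken as ceil(ln n) (rounding up is an assumption)."""
    return math.ceil(math.log(n_obs))


def select_order(bic_matrix):
    """Return (p, q) of the smallest entry; rows index p, columns index q."""
    best = min(
        (value, p, q)
        for p, row in enumerate(bic_matrix)
        for q, value in enumerate(row)
    )
    return best[1], best[2]


# Toy 3 x 2 BIC matrix: rows p = 0..2, columns q = 0..1
print(select_order([[10.0, 9.0], [8.0, 7.5], [9.0, 8.0]]))  # (1, 1)
print(max_order(309))  # 6
```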
Model parameter adjustment
Formula (6) of the adaptive filtering method is applied to adjust the parameters. The error converges after 576 iterations for PEV and 286 iterations for BUEV. Table 8 compares the model parameters before and after adjustment.
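Formula (6) is not reproduced in this section, so the following shows only one common way to realise steepest-descent adjustment of ARMA(1,1) parameters: a normalised least-mean-squares (LMS) update that treats (1, y[t-1], e[t-1]) as the filter inputs. The normalisation, the step size `mu`, the stopping rule and all function names are assumptions for illustration, not the paper's exact algorithm. The starting values stand in for least-squares ARIMA estimates, and the series is synthetic.

```python
import random


def residuals(y, c, phi, theta):
    """One-step errors of y[t] = c + phi*y[t-1] + theta*e[t-1] + e[t]."""
    errs, prev_e = [], 0.0
    for t in range(1, len(y)):
        e = y[t] - (c + phi * y[t - 1] + theta * prev_e)
        errs.append(e)
        prev_e = e
    return errs


def mse(errs):
    return sum(e * e for e in errs) / len(errs)


def adapt_arma11(y, c, phi, theta, mu=0.1, tol=1e-6, max_iter=1000):
    """Normalised-LMS (steepest descent) adjustment of ARMA(1,1) parameters.

    Each pass walks through the series and moves the parameters along the
    negative gradient of the squared one-step error, treating the inputs
    (1, y[t-1], e[t-1]) as fixed. Stops when the mean squared error changes
    by less than tol between passes.
    """
    prev_mse = mse(residuals(y, c, phi, theta))
    for it in range(1, max_iter + 1):
        prev_e = 0.0
        for t in range(1, len(y)):
            x = (1.0, y[t - 1], prev_e)
            e = y[t] - (c * x[0] + phi * x[1] + theta * x[2])
            # Dividing by the input energy keeps the step stable for any data scale
            step = mu * e / (1.0 + sum(v * v for v in x))
            c, phi, theta = c + step * x[0], phi + step * x[1], theta + step * x[2]
            prev_e = e
        cur_mse = mse(residuals(y, c, phi, theta))
        if abs(prev_mse - cur_mse) < tol:
            break
        prev_mse = cur_mse
    return c, phi, theta, it


# Synthetic AR(1) series with mean 5; (1.0, 0.5, 0.0) are hypothetical initial estimates
rng = random.Random(42)
y = [5.0]
for _ in range(299):
    y.append(2.0 + 0.6 * y[-1] + rng.gauss(0, 1))
before = mse(residuals(y, 1.0, 0.5, 0.0))
c, phi, theta, n_iter = adapt_arma11(y, 1.0, 0.5, 0.0)
after = mse(residuals(y, c, phi, theta))
print(after < before)  # True
```

The iteration count returned plays the role of the convergence counts reported above, although the numbers depend on the assumed `mu` and `tol`.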
Model forecasting
Finally, the two fitted models were applied to forecast the PEV and BUEV series, giving 22-day predictions (Fig. 6).
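For reference, a fitted ARIMA(1, 0, 1) model is usually extended over a multi-day horizon by the standard recursion below, in which unknown future errors are set to zero. The paper does not state whether its 22-day predictions were generated this way, so this is an assumption. The recursion does show why a pure ARIMA forecast flattens toward the process mean c / (1 - φ), which is consistent with the smooth ARIMA curves in Fig. 6. The function name is hypothetical.

```python
def forecast_arma11(c, phi, theta, last_y, last_e, horizon):
    """Multi-step ARMA(1,1) forecasts from the end of the training data.

    Unknown future errors are replaced by their expectation (zero), so only
    the first step uses the MA term; later steps decay geometrically toward
    the process mean c / (1 - phi).
    """
    preds = [c + phi * last_y + theta * last_e]
    for _ in range(horizon - 1):
        preds.append(c + phi * preds[-1])
    return preds


print([round(v, 3) for v in forecast_arma11(2.0, 0.5, 0.3, 10.0, 1.0, 3)])
# [7.3, 5.65, 4.825]
```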
Table 9 compares the predicted values of the ARIMA model and the hybrid prediction model. As in the previous cases, the PE_{max}, PE_{min}, MAPE and σ_{PE} of the predicted results are compared. To show the difference between predicted and observed values more intuitively, Fig. 6 plots the predicted values of PEV and BUEV against the observations.
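The error statistics compared in Table 9 can be computed as follows. This is a sketch under assumed definitions. The relative error is taken as PE_t = (ŷ_t - y_t) / y_t × 100%, MAPE as the mean of |PE_t|, and σ_PE as the population standard deviation of PE_t. The paper's own definitions may differ in sign or normalisation, and `pe_summary` is a hypothetical name.

```python
import statistics


def pe_summary(actual, predicted):
    """Relative prediction errors (%) and their summary statistics.

    Assumes PE_t = (predicted_t - actual_t) / actual_t * 100 and the
    population standard deviation for sigma_PE.
    """
    pe = [(p - a) / a * 100.0 for a, p in zip(actual, predicted)]
    return {
        "PE_max": max(pe),
        "PE_min": min(pe),
        "MAPE": sum(abs(v) for v in pe) / len(pe),
        "sigma_PE": statistics.pstdev(pe),
    }


print(pe_summary([100, 200], [110, 190]))
```

Reporting σ_PE alongside MAPE separates a model that is consistently a little off from one that is sometimes very wrong.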
In summary, in both case studies the hybrid model shows better prediction performance, with a smaller mean and a smaller variance of the relative error.
Discussion
An efficient forecasting model must strike a balance between prediction accuracy and model complexity (number of parameters). In this paper, the basic ARIMA model is chosen to identify the features of the time series, describing its autocorrelation, trend and periodicity, and to obtain initial parameter estimates. The model parameters are then further adjusted by the steepest descent principle of the adaptive filtering method, which tracks the series and feeds each prediction error back immediately to compensate for it. Although optimizing the ARIMA parameters takes additional time, this self-adaptive iterative adjustment greatly increases forecasting accuracy. The cost of the additional computation varies from case to case.
As indicated in sections 4.1 and 4.2, the proposed hybrid forecasting method outperforms several other methods across the test cases from the TSDL benchmark library, the test case from the literature, and the practical application to outpatient visits at the MCHCH in Ningbo, China. In every case the hybrid model clearly improves forecasting accuracy compared with the traditional ARIMA model, and it performs even better on time series that display strong cyclicity. The traditional ARIMA model yields a MAPE of 18.53% for PEV, while the hybrid model yields 2.79%. For the BUEV series, the MAPE of the ARIMA model is 27.69%, and that of the hybrid model is 1.25%. Fig. 6 shows that the predictions of the ARIMA model change relatively smoothly and do not capture the variation in the data well. Referring to Fig. 2, the predicted results of the two models are compared with the observed values. The predictions of the proposed hybrid model follow the details of the series more closely than those of the ARIMA model and are closer to the actual values.
In addition, the PEV and BUEV of the MCHCH in Ningbo were studied further with the proposed hybrid forecasting model. Both series follow a one-week cycle, owing to the large number of patients and limited medical resources. Because the medical service demand of most health care hospitals in China has a weekly cycle, the model can be extended to other hospitals. If the data are extended to medium- and long-term horizons, the model may also be applied to medium- or long-term forecasting.
Conclusions
In this work, a traditional ARIMA model is integrated with a self-adaptive filtering model to forecast the demand for medical services. ARIMA is used to identify the demand features and obtain the initial prediction model, while the self-adaptive filtering model readjusts the prediction model weights to further improve prediction accuracy. The hybrid prediction model has advantages in both prediction accuracy and prediction horizon. Applied to forecasting the daily outpatient visits of the Maternal and Child Health Hospital in Ningbo, the hybrid model outperforms the traditional ARIMA model in prediction accuracy. The MAPE of the traditional ARIMA model in the prenatal examination and B-ultrasound examination departments is 18.53 and 27.69%, respectively, and that of the hybrid model is 2.79 and 1.25%, respectively. The results of this forecasting study can later be used in outpatient appointment scheduling decisions at the target Maternal and Child Health Hospital to optimize appointments for pregnant women, alleviate long queues in outpatient clinics and increase patient satisfaction.
This article studies the forecasting of medical service demand. With accurate demand forecasting, resources can be allocated and assigned to match the forecast demand, reducing both patients' waiting time and medical staff's idle time. Furthermore, accurate forecasting of daily outpatient visits is crucial to the scientific management of a medical service provider, since it is a critical input to several operational decisions, e.g. material resource planning and scheduling, inventory control, resource allocation and labor rostering.
In addition, the proposed hybrid model enhances the capability of the traditional ARIMA model while keeping its simplicity of using only endogenous time series data as input. It can also be applied to time series forecasting problems in other fields.
This research has some limitations that call for further study. First, in the practical application of the hybrid model, only the outpatient visits of two departments in one Ningbo Maternal and Child Health Hospital were predicted for the next 4 weeks, which is still short-term forecasting. Mid-term and long-term medical service demand forecasting with the hybrid model needs further verification, and follow-up studies should analyze in depth the conditions under which the hybrid model works best. Second, as several studies have shown [16, 36], real-world data are incomplete, noisy and inconsistent, so preprocessing is needed, and coupling a preprocessing technique with the ARIMA model can improve its forecasting capability. Hence, instead of using only the inherent features of the single visit-count series as input, preprocessing methods such as time series decomposition, and other data series such as climate factors, may be incorporated into the forecasting model to further improve prediction accuracy.
Availability of data and materials
All data generated or analysed during this study are included in this published article [and its supplementary information files].
Abbreviations
ARIMA: Autoregressive Integrated Moving Average
TSDL: Time Series Data Library
MCHCC: Maternal and child health care center
MAPE: Mean Absolute Percentage Error
ED: Emergency Department
MA: Moving Average
ES: Exponential Smoothing
ANN: Artificial Neural Networks
SVM: Support Vector Machines
IPD: Indoor Patient Department
OPD: Outdoor Patient Department
LOS: Length of Stay
FWNN: Feedforward Neural Network
RNN: Recurrent Neural Network
ACF: Autocorrelation function
PACF: Partial autocorrelation function
MAE: Average absolute error
PE: Prediction Errors
PEV: Prenatal Examination Visits
BUEV: B-ultrasound examination visitors
AR: Autoregressive
References
1. Kadri F, Harrou F, Chaabane S, et al. Time series modelling and forecasting of emergency department overcrowding. J Med Syst. 2014;38(9):107.
2. Zhu T, Luo L, Zhang X, et al. Time series approaches for forecasting the number of hospital daily discharged inpatients. IEEE J Biomed Health Informatics. 2017;21(2):515–26.
3. Champion R, Kinsman LD, Lee GA, et al. Forecasting emergency department presentations. Aust Health Rev A Publication Aust Hospital Assoc. 2007;31(1):83–90.
4. Wang Y, Gu J. Hybridization of Support Vector Regression and Firefly Algorithm for Diarrhoeal Outpatient Visits Forecasting. IEEE 26th International Conference on Tools with Artificial Intelligence (ICTAI); 2014. p. 70–4.
5. Sun Y, Heng BH, et al. Forecasting daily attendances at an emergency department to aid resource planning. BMC Emerg Med. 2009;9(1):1–9.
6. Bell AYS. Emergency department wait time Modelling and prediction at North York general hospital. Dissertation. University of Toronto; 2015.
7. Wang CP, Wang ZF, Shan J. Application in infectious disease forecasting by ARIMA model. Chin J Hospital Stat. 2006;03:229–32.
8. Jin RF, Qiu H, Zhou X, et al. Forecasting incidence of intestinal infectious diseases in mainland China with ARIMA model and GM(1,1) model. Fudan Univ J Med Sci. 2008;35(5):675–80.
9. Schweigler LM, Desmond JS, McCarthy ML, et al. Forecasting models of emergency department crowding. Acad Emerg Med. 2009;16(4):301–8.
10. Han CY. Experimental study on forecasting hospital outpatient visits by ARIMA seasonal product model. Computer CD Software Appl. 2014;17(2):72–4.
11. Luo L, Luo L, Zhang X, et al. Hospital daily outpatient visits forecasting using a combinatorial model based on ARIMA and SES models. BMC Health Serv Res. 2017;17(1):469.
12. Côté M, Smith M, Eitel D, et al. Forecasting emergency department arrivals: a tutorial for emergency department directors. Hosp Top. 2013;91(1):9–19.
13. Bergs J, Heerinckx P, Verelst S. Knowing what to expect, forecasting monthly emergency department visits: a time-series analysis. Int Emerg Nurs. 2014;22(2):112–5.
14. Gul M, Guneri AF. Forecasting patient length of stay in an emergency department by artificial neural networks. J Aeronautics Space Technol. 2015;8(2):43–8.
15. Yousefi M, Ferreira RPM, Yousefi M. A modeling approach for daily patient visits forecasting in an emergency department. 5th International Conference on Engineering Optimization; 2016.
16. Khaldi R, El Afia A, Chiheb R. Forecasting of weekly patient visits to emergency department: real case study. Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018). Procedia Computer Sci. 2019;148:532–41.
17. Huang D, Wu Z. Forecasting outpatient visits using empirical mode decomposition coupled with back-propagation artificial neural networks optimized by particle swarm optimization. PLoS One. 2017;12(2):e0172539.
18. Diao XF, Li WC. SVM and ARIMA based infectious disease forecasting. Modern Prevent Med. 2017;44(9):1545–8.
19. Hu LX, Chen YY, Li J, et al. Application of grey model to forecast incidence trend of intestinal infectious diseases. Dis Surveillance. 2009;24(2):135–6.
20. Sharma A, Mansotra V, Shastri S. Forecasting public healthcare Services in Jammu & Kashmir using time series data mining. Computer Sci Software Eng. 2015;5(12):570–5.
21. Marcilio I, Hajat S, Gouveia N. Forecasting daily emergency department visits using calendar variables and ambient temperature readings. Acad Emerg Med. 2013;20(8):769–77.
22. Kadri F, Harrou F, Chaabane S, Tahon C. Time series modeling and forecasting of emergency department overcrowding. J Med Syst. 2014;38(9):1–20.
23. Funk S, Camacho A, Kucharski AJ, et al. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics. 2016;22:56–61.
24. Peng ZX, Bao CJ, Zhao Y, et al. ARIMA product season model and its application on forecasting in incidence of infectious disease. Appl Stat Management. 2008;27(2):362–8.
25. Han KT, Jeong HK, et al. Forecasting respiratory infectious outbreaks using ED-based syndromic surveillance for febrile ED visits in a Metropolitan City. Am J Emerg Med. 2018;37:183–8.
26. Kadri F, Harrou F, Ying S. A multivariate time series approach to forecasting daily attendances at hospital emergency department. IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings, 2018-January; 2017. p. 1–6.
27. Lim C, Chen M. Forecasting Emergency Department Admissions for Pneumonia in Tropical Singapore. ISDS; 2018.
28. Aghelpour P, Mohammadi B, Biazar SM. Long-term monthly average temperature forecasting in some climate types of Iran, using the models SARIMA, SVR, and SVR-FA. Theor Appl Climatol. 2019;138:1471–80.
29. Bui C, Pham N, Vo A, et al. Time Series Forecasting for Healthcare Diagnosis and Prognostics with the Focus on Cardiovascular Diseases. 6th International Conference on the Development of Biomedical Engineering in Vietnam (BME6); 2017. p. 809–18.
30. Purwanto EC, Logeswaran R, et al. Adv Eng Softw. 2012;53(7):23–32.
31. Khaldi R, El Afia A, Chiheb R. Impact of Multi-step Forecasting Strategies on Recurrent Neural Networks Performance for Short and Long Horizons. The 4th International Conference on Big Data and Internet of Things; 2019. p. 1–8.
32. Bento PMR, Pombo JAN, Calado MRA, Mariano SJPS. Optimization of neural network with wavelet transform and improved data selection using bat algorithm for short-term load forecasting. Neurocomputing. 2019;358:53–71.
33. Zhou XP. Principles, procedures and applications of self-adaptive filtering methods. Forecasting. 1985;02:36–41.
34. Tao TY, Gao F, Wu ZF. Self-adaptive filtering and its application to dam monitoring. Sci Surveying Mapping. 2009;34(05):181–2.
35. Wang JQ. Application of self-adaptive filtering method in economic forecasting. Industrial Technol Econ. 1996;15(04):88–90.
36. Najah N, Ruhaidah S, Shabri A. Monthly streamflow forecasting with autoregressive integrated moving average. J Phys Conf Ser. 2017;890(1):1–6.
Acknowledgements
The authors are grateful to the managers of the Yinzhou District Maternal and Child Health Care Hospital for providing historical data and supporting our work.
Funding
No funding.
Author information
Contributions
YH wrote the final manuscript. CX and YH proposed the hybrid model and performed the experiments. WX supervised the project. MJ, DH and YH collected and analyzed the data. All authors discussed the results, reviewed and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Huang, Y., Xu, C., Ji, M. et al. Medical service demand forecasting using a hybrid model based on ARIMA and self-adaptive filtering method. BMC Med Inform Decis Mak 20, 237 (2020). https://doi.org/10.1186/s12911-020-01256-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12911-020-01256-1