Machine learning based forecast for the prediction of inpatient bed demand

Background Overcrowding is a serious problem that impacts the ability to provide an optimal level of care in a timely manner. High patient volume is known to increase boarding time in the emergency department (ED), as well as in the post-anesthesia care unit (PACU). Furthermore, the same high volume increases inpatient bed transfer times, which causes delays in elective surgeries and increases the probability of near misses, patient safety incidents, and adverse events. Objective The purpose of this study is to develop a Machine Learning (ML) based strategy to predict weekly forecasts of the inpatient bed demand in order to assist the resource planning for the ED and PACU, resulting in more efficient utilization. Methods The data utilized included all adult inpatient encounters at Geisinger Medical Center (GMC) for the last 5 years. The variables considered included the patient class (inpatient, observation, or surgical overnight recovery (SORU)) at the time of discharge. The ML based strategy is built using the K-means clustering method and the Support Vector Machine Regression technique (K-SVR). Results The performance obtained by the K-SVR strategy in the retrospective cohort amounts to a mean absolute percentage error (MAPE) that ranges between 0.49 and 4.10% based on the test period. Additionally, the results present reduced variability, which translates into more stable forecasting results. Conclusions The results from this study demonstrate the capacity of ML techniques to forecast inpatient bed demand, particularly using K-SVR. It is expected that the implementation of this model in the workflow of bed capacity management will create efficiencies, which will translate into more reliable, inexpensive and timely care for patients.


Introduction
Designing and implementing effective hospital capacity management decisions and efficient staffing decisions is a critical challenge in every healthcare system. Specifically, a mismatch between bed capacity, bed demand, and the corresponding clinical staffing requirements can have negative effects on key performance indicators like hospital access, wait times, quality of care, as well as patient and employee satisfaction. It also invariably results in an increase of an assortment of costs. When the supply of hospital beds exceeds the demand for beds, it will likely result in higher costs and wasted resources from maintaining and staffing open beds [1,2]. When the demand for beds exceeds the supply, the hospital will likely experience longer waiting times, especially for patients in the ED who are waiting for an inpatient bed, which can result in sub-standard quality of care, poor employee satisfaction, an increased rate of near-misses, and lower patient satisfaction [3][4][5].
The only independent variable in this phenomenon is the bed demand, which naturally fluctuates with flu season, holidays, vacations, etc. These changes in the demand for inpatient beds are a widely discussed and studied problem which impacts a hospital's ability to provide timely care for patients, among other negative effects such as increased probabilities of adverse events [6,7], higher length of stay in the ED [8,9], increased mortality [10], and low patient and staff satisfaction [11]. To tackle this issue, a myriad of strategies have been proposed and tested. For instance, the creation of holding units, the introduction of early discharge [12] and the adoption of surgical demand smoothing [13] were identified as effective approaches that can increase patient flow [14].
Many of these concepts include data insights provided by predictive modeling or ML approaches which rely on data from clinical admissions [15][16][17]. We recognize that providing near-term bed demand forecasts to administrative personnel such as operating room schedulers, inpatient bed coordinators, and operations managers can increase their ability to proactively maintain efficient levels of occupancy.
While highly variable patient bed demand makes hospital capacity management even more challenging, the utilization of ML algorithms can help hospital operations stakeholders make better decisions by combining their insights with advanced analytics. There have been recent advances in ML techniques that are utilized in predictive analytics, outperforming traditional time series techniques (described in the literature review). This study leverages ML techniques and implements a strategy in the area of predictive analytics at a non-profit integrated healthcare system in Danville, Pennsylvania.
The objective of this study is to develop a ML based strategy able to provide an accurate forecast of the inpatient bed demand for the week. The proposed model uses tailored ML techniques to achieve a minimal error in the prediction, so that the hospital's capacity management team, provided with the best possible information, can take proactive measures to ensure efficient patient flow and, therefore, better outcomes.

Background
This project takes place at Geisinger Medical Center (GMC), a hospital located in Danville, Pennsylvania and a part of Geisinger. GMC is a tertiary/quaternary teaching hospital with approximately 350 licensed and staffed adult inpatient beds. Since GMC is the only level 1 trauma center serving a large portion of the central Pennsylvania region, it is crucial that the capacity management plan keeps a reliable amount of bed capacity available to provide appropriate care to all patients in the area. The adult occupancy rate of GMC is frequently above 90% (well above the national average).
Previous efforts at GMC focused on the development of a Monte Carlo simulation (MC) to study the relationship between the surgical schedule and crowding, by modeling length of stay and patient flow for surgical and non-surgical patients [18]. The results of the MC simulations were then reported and used by the Geisinger Placement Services (GPS) to predict overcrowding and make management decisions. When excessive bed demand is forecasted, the GPS team can decide to take actions including decanting surgical cases to nearby affiliated hospitals to perform non-urgent surgeries, reorganizing surgical case schedules, giving priority to expedited discharge processes, and adjusting staffing levels. The MC could accept user input to run what-if scenarios based on planned changes to surgical volumes at GMC to re-estimate inpatient bed demand. When surgeries are moved to nearby affiliated hospitals, the beds at GMC can be reallocated to meet the demand of the ED, OR and other arrival sources. This approach was able to capture the logic of the problem, but fell short in performance, making it necessary to revise the model.
To improve the performance of the forecast, a predictive modeling approach was proposed. A series of neural network-based models were built to predict from 1 to 5 days ahead, improving the performance obtained by the MC.

Hospital census forecasting
The majority of work published about forecasting in hospitals relates to patient visits to the ED or other specific departments inside hospitals [13,[19][20][21][22][23][24]. An additional layer of complexity is added through the irregularity and volatility of in-patient visits when predictions are made for a hospital's in-patient census. These predictions involve patients not just in the ED but in other departments which have different process flows, service times and lengths-of-stay (LOS). ML based models overcome this input complexity by using relevant factors as predictors [25].
The analysis from one research study identified that at an occupancy level of 100%, there is a 28% chance of at least one severe event occurring and a 22% chance of more than one severe event occurring [6]. Depending on the strategic decision-making time horizon, different models have been developed for predicting inpatient demand. Longer term strategic needs give rise to models that predict monthly forecasts [25]. More immediate strategic needs give rise to models with shorter horizons, such as the next 3, 5, or 7 days [24].

Forecasting methods
There is relatively little previous research in the context of bed demand forecasting [26], particularly using ML models. Simple autoregressive integrated moving average (ARIMA) models are among the first tools used to forecast bed demand [27]. One of the most common methodologies for estimating hospital bed demand is based on the valuation of the patient's LOS at the hospital [28,29]. Mackay et al. in 2005 proposed that these models are defective due to the complexity of the long-stay distribution, among other reasons. Hence, the authors investigated model selection and assessment in relation to hospital bed compartment flow models using The Bed Occupancy Management Planning System (BOMPS) software, in which hospital bed prediction models are developed from the simplest to the most complex, depending on their prediction horizon and model structure [30]. Ordu et al., in 2019, developed a framework for generating demand prediction models for each of the hospital areas [31]. The models are simple in nature and vary from ARIMA models to exponential smoothing, multiple linear regression (MLR), and seasonal and trend decomposition, where the prediction horizons are daily, weekly, or monthly depending on the hospital unit. The models proposed, for instance for Accident and Emergency admissions, have a low adjusted R^2 value of 60%, where the monthly MLR produced the best goodness of fit. Kutafina et al. in 2019 developed a recursive neural network model for forecasting hospital bed occupancy (the most similar study to ours). In this article, the authors propose different training alternatives for prediction models based on different time intervals for the training data set (from 1 to 5 years) and different dates and horizons for prediction [26]. Their results show an average mean absolute percentage error (MAPE) of 7.22% (5.76-9.22%) with a mean absolute error (MAE) of 15.65 beds for yearly predictions between 2009 and 2015.
As discussed in [26], the model that is proposed here (K-SVR) requires only historical admissions and no private/sensitive patient information (e.g., age or vital signs), which other models use, for instance, to infer the patient's LOS. We show with our model that the decomposition of the historical demand provides a better means to generate accurate predictions of hospital bed demand for 1 and 2 days ahead.

Inpatient bed demand
We define inpatient bed demand as the number of patients occupying inpatient beds (census) plus the number of patients that should be in inpatient beds but are in another location (or in a holding pattern) such as the ED, PACU, or Cardiac Recovery Suite (CRS). The population in this study included all adult patient encounters at GMC for 5 years. The data was complemented with 2 years of the most recent data. Patient and hospital data were collected using Geisinger's electronic health record (EHR) and unified data-architecture (UDA). The data was then extracted by querying our data warehouse server and shared with our research partners via data use agreement. Our population included patients who had a patient class of inpatient, observation, or SORU at the time of their discharge. The population excluded patients who spent any time in a pediatric unit, had a level of care of psych, or did not have one of the following levels of care: critical care, step down, medical, surgical, telemetry, or surgical overnight.
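The inclusion/exclusion criteria above can be expressed as a simple record filter. The sketch below is illustrative only: the field names (`patient_class`, `spent_time_in_pediatric_unit`, `level_of_care`) are hypothetical and do not reflect the actual EHR/UDA schema or query.

```python
# Hypothetical encounter filter mirroring the cohort criteria; field names
# are illustrative, not the actual EHR/UDA schema.
INCLUDED_CLASSES = {"inpatient", "observation", "SORU"}
INCLUDED_LEVELS_OF_CARE = {"critical care", "step down", "medical",
                           "surgical", "telemetry", "surgical overnight"}

def in_cohort(encounter):
    """Return True if an adult encounter meets the study criteria."""
    return (encounter["patient_class"] in INCLUDED_CLASSES
            and not encounter["spent_time_in_pediatric_unit"]
            # Psych (and any other level of care) is excluded implicitly
            # because it is absent from the allowed set.
            and encounter["level_of_care"] in INCLUDED_LEVELS_OF_CARE)
```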

K-SVR forecasting model
The predictive strategy used in this study follows the model proposed by Feijoo, Silva and Das (2016), which combines a classification stage followed by a regression (forecast) stage. The engine seeks to assign (classification stage) the response variable, in this case, future bed demands, into one of several groups or clusters of demands. The classification stage of the forecast engine is performed using a Support Vector Machine (SVM) model. Following the classification stage, for each of the clusters, a Support Vector Regression (SVR) model is locally developed, i.e., each cluster considers a unique forecasting regression model that seeks to accurately predict bed demands of days that follow similar historical patterns (forecast stage of the forecasting engine). Hence, the forecast engine comprises a set of "K" SVR models (K-SVR), where K represents the number of clusters to be considered. The cluster analysis and the subsequent selection of K (number of clusters) is based on the K-means clustering method; however, any other clustering approach could be used. A schematic representation of the forecast engine is shown below in Fig. 1. As can be noted, the algorithm starts with the data processing, followed by the cluster analysis and the feature engineering step. As mentioned above, the clustering analysis guides the selection of K clusters, while the feature engineering step (based on partial and autocorrelation functions) is used to determine the lagged values and seasonal patterns used in the regression stage (SVR model). Once the K, lagged, and seasonal parameters are selected, we train an SVM model (classification stage) based on the clusters identified with the K-means method. For each of these clusters, a regression model, based on SVR, is trained, considering the lagged and seasonal parameters found in the feature engineering preprocessing step.
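The classify-then-regress pipeline described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the `KSVR` class name, default kernels, and synthetic usage are our assumptions, and the real engine additionally tunes hyper-parameters and includes the Table 1 covariates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC, SVR

class KSVR:
    """Sketch of a K-SVR forecast engine: K-means clusters, an SVM
    classifier to assign a day to a cluster, and one local SVR per cluster."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        # 1) Cluster the historical demand patterns (K-means).
        labels = KMeans(n_clusters=self.k, n_init=10,
                        random_state=0).fit_predict(X)
        # 2) Classification stage: learn to map a feature vector to a cluster.
        self.clf = SVC().fit(X, labels)
        # 3) Forecast stage: train one local SVR per cluster.
        self.regs = {c: SVR().fit(X[labels == c], y[labels == c])
                     for c in np.unique(labels)}
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        clusters = self.clf.predict(X)          # classify each day
        return np.array([self.regs[c].predict(x.reshape(1, -1))[0]
                         for c, x in zip(clusters, X)])
```

Usage follows the usual scikit-learn pattern: `KSVR(k=5).fit(X_train, y_train).predict(X_test)`, where each row of `X` holds the lagged and seasonal demand features for one day.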
Note that the framework is general enough to work as a simple 1 day ahead prediction, or for more complex tasks of recursively forecasting n days ahead. The SVR models are developed using a mixed-features selection method, based on the information provided by the autocorrelation and partial autocorrelation functions (see Fig. 2 in results). These functions suggest the use of previous or lagged values of demand (previous days) and seasonal patterns for the model. In addition, an incremental strategy is simultaneously used for variable or feature selection, which refers to starting with an empty model and then adding potential predictors. As predictors are added to the model, this strategy seeks to explain whether and to what extent each predictor reduces unexplained variation. The list of final selected covariates used in the model, which provide the best forecast accuracy in the training process, is shown in Table 1. Hence, bed demand can be forecasted using time series information from previous demand days as well as other covariates' historical information. Mathematically, the SVR model can be represented as a regression model as follows:

Y_{t+1} = f(Y_t, Y_{t-1}, ..., Y_{t-(l-1)}, Y_{t-(s-1)}, X^j_{t-i})  (1)

where Y_{t+1} represents the forecasted bed demand for 1 day ahead (t + 1), Y_{t-i} represents the lagged values (l lagged values) of bed demand, Y_{t-(s-1)} accounts for the seasonal association (s denotes the lag for a seasonal trend), and X^j_{t-i} considers every other covariate (P covariates) used for the prediction that is not a lagged value of bed demand (see Table 1 for a list of all covariates).
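The lag-based feature construction above can be sketched as follows; the function name and defaults are illustrative, and real covariates from Table 1 would be appended to each feature row.

```python
def build_features(demand, lags=(1, 2, 3), seasonal_lag=7):
    """Turn a daily demand series into (features, target) rows.

    Row t uses demand[t-1], demand[t-2], demand[t-3] (the l lagged values)
    and demand[t-7] (the seasonal lag s) to predict demand[t], mirroring
    the ACF/PACF-driven lag choices described in the text.
    """
    start = max(max(lags), seasonal_lag)  # first day with full history
    X, y = [], []
    for t in range(start, len(demand)):
        row = [demand[t - l] for l in lags] + [demand[t - seasonal_lag]]
        X.append(row)
        y.append(demand[t])
    return X, y
```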

Measures of forecast performance
The K-SVR model is developed and tested as follows. First, we chose 5 clusters to be used by the forecasting engine. The number of clusters is chosen by minimizing the sum of squares within clusters (the sum of the clusters' individual errors). It is important to balance the number of clusters against the amount of data that falls within each cluster: a large number of clusters will significantly reduce the within-cluster sum of squares, but it may produce clusters with few data points, for which the local SVR model will tend to overfit. Here, we selected 5 clusters based on the errors shown in Fig. 3. The classification SVM layer as well as the local SVR models are created using 70% of the data for training purposes, and tested on specific dates (days, weeks) that fall within the testing period. The training and testing time series do not share any data points. Additionally, the model is tested on four independent, non-overlapping test weeks. Both the SVM and the SVR models were developed by optimizing (tuning) their input hyper-parameters and then trained with ten-fold cross validation. Finally, using the ACF and PACF information, we selected a lagged value (l parameter in Eq. 1) of 3 and a seasonal value (s parameter in Eq. 1) of 7. The model performance was tested using the following objective measures, where the elements Y^r_i and Y^f_i represent the real and the forecasted bed demand for day i of a test period of n days:

MAPE = (100/n) Σ_i |Y^r_i − Y^f_i| / Y^r_i,   MAE = (1/n) Σ_i |Y^r_i − Y^f_i|,   RMSE = √((1/n) Σ_i (Y^r_i − Y^f_i)^2),

together with the variance of the forecast errors.
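These standard error measures can be computed directly; a minimal pure-Python sketch (the function name is ours):

```python
import math

def forecast_errors(y_real, y_pred):
    """Return (MAPE %, MAE, RMSE, error variance) for one test period."""
    n = len(y_real)
    errs = [r - p for r, p in zip(y_real, y_pred)]
    mape = 100.0 * sum(abs(e) / r for e, r in zip(errs, y_real)) / n
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_e = sum(errs) / n
    var = sum((e - mean_e) ** 2 for e in errs) / n  # population variance
    return mape, mae, rmse, var
```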

Results
We now present the results from the model on distinct instances. First, we developed a model to forecast the bed demand for 1 day ahead while considering patients coming into the ED during the weekends. This approach helps to better correlate the data, even though no surgeries are scheduled on those days. As a counterpart case, we present a model that only uses weekday (Monday to Friday) data, hence creating a different lagged pattern in the model. We then develop and present the results for a 2 day ahead bed demand prediction following the same data availability approach (with and without weekend data).

Data set statistical description
The K-SVR model is built based on GMC data corresponding to all adult patient encounters at GMC for 5 years. The specific variables obtained from the data and used to create the K-SVR model are described next; Table 1 shows the variable definitions. The data considers historical utilization of beds (demand). We used lagged values (3 days) and a weekly seasonal lag (7 days) of demand. These lags were obtained from the ACF and PACF functions, as shown in Fig. 2. The significance of the lagged values can also be observed in the correlation matrix shown in Fig. 4.
We developed four different models, each corresponding to a different forecasting timeframe (1 day and 2 days ahead, with and without weekend data). For all four models, we used K = 5 clusters. This number is obtained from the within-groups sum of squares results shown in Fig. 3: for a number of clusters (K) larger than 5, the reduction of the sum of squares (error or differences within clusters) is small. A larger value of K could be chosen if further reduction of the within-cluster variance is desired; however, excessively reducing the variance of each cluster may result in subsets with a small number of data points, increasing the risk of developing overfitted models. Therefore, using the standard "elbow" method in clustering analysis, K = 5 is chosen. Finally, all models are tested on four non-consecutive test weeks (the same weeks for each of the four K-SVR models developed) that do not belong to the training time series data. Results of the K-SVR models are compared to an autoregressive integrated moving average (ARIMA) model, as the data satisfies the assumptions for such models, as well as to a version of K-SVR with K = 3 (as compared to K = 5), to demonstrate that lower values of K in fact increase the variability within each cluster, increasing the forecast error.
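The elbow analysis behind the choice of K = 5 can be sketched with scikit-learn, whose `KMeans.inertia_` attribute is exactly the within-groups sum of squares; the function name and synthetic data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def within_cluster_ss(X, k_max=10):
    """Within-groups sum of squares (K-means inertia) for K = 1..k_max.

    Plotting this list against K and picking the "elbow" (where further
    increases in K give only small reductions) reproduces the selection
    procedure described in the text.
    """
    X = np.asarray(X, dtype=float)
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, k_max + 1)]
```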

Prediction of bed demand for a day considering weekends
As previously mentioned, an initial model with K = 5 was developed to generate a prediction for 1 day ahead while using weekend data. The optimal K-SVR model is compared against an ARIMA model and a version of K-SVR with a lower (non-optimal) number of clusters. The performance of all models was evaluated based on the MAPE, MAE, RMSE and error variance. Figure 5 shows the high precision of the forecast results for the four test weeks.

Prediction of bed demand for 1 day ahead without considering weekends
We tested the two K-SVR models (K = 5 and K = 3) and the ARIMA model on the same four sample weeks as in the "Data set statistical description" section, where the data available during the weekend days is removed from the time series. Figure 5 shows the comparison between the forecasted values and the actual values for each of the test weeks. For this case, opposite to the case where weekend data is available and to the two-day ahead forecasts (see next subsections), the two K-SVR models showed similar results, while the ARIMA model was again the least accurate. Generally, when compared to the previous case (1 day ahead with weekend data), we observed that the forecast accuracy slightly drops. Among the 4 weeks, the average MAPE for this case is 1.78%, compared to 1.11% when weekend data is considered, a difference of 0.67 percentage points. In absolute terms, the difference in the (average) MAE is 2.34 beds (6.15 versus 3.81 when weekend data is considered), with a RMSE difference of 3.05 beds per day (7.41 compared to 4.36). Depending on data access and availability, the small differences between the two models allow either model to be relied upon. The forecast comparison for both previous models (with and without weekend data) and the real demand, for the four random test weeks, is presented in Fig. 5.

Prediction of two-day bed demand considering weekends
In this section we present the results for the 2 day ahead forecast model. The model characteristics are the same as those presented earlier, except that the forecast is made for 2 days ahead; hence, proper data manipulation must be considered to correctly define the lag values used as inputs to the model. We first present the results for the 2 day ahead model with weekend data for all three models (two K-SVR and the ARIMA model). Table 3 shows the results for this scenario and for the four sample weeks used to test the 2 day ahead model. We obtain, on average among the 4 weeks, a MAE of 4.89 beds per day, with a RMSE of 5.77 beds per day, which translates to a MAPE of 1.42% (optimal K-SVR). Note that, for a fair comparison, we consider the same four random test weeks as in the 1 day ahead case. Figure 6 shows the forecasted demand and the actual values for each of the test weeks. Interestingly, when weekend data is considered, the optimal K-SVR model behaves similarly to the 1 day ahead model. The lowest MAPE values were obtained for sample week 3, followed by week 2. As previously noted, the ARIMA model again shows the least accurate results, while the K-SVR with K = 3 performs better than ARIMA but not as accurately as the K-SVR with 5 clusters.

Prediction of bed demand for 2 days without considering weekends
Lastly, we use both K-SVR models and the ARIMA model to forecast 2 days ahead without considering weekend data. We use the same four sample weeks as in the previous three cases. As in the 1 day ahead case, the data available during weekend days is not used to fit the K-SVR model. Hence, the bed demand prediction made for the first day of the week, Monday, considers information available on the Thursday of the corresponding previous week (the real prediction is 4 days ahead when no weekend data is considered). Since this is a 2 day ahead prediction, the forecast for Tuesday is made based on data from the Friday of the previous week. This process is repeated until the forecast for the whole test week is completed. The same logic applies for the lag values 2, 3, and 7 and the remaining days of the test weeks (Wednesday to Friday). Results of the models for the 4 performance measures are shown in Table 3. For the optimal K-SVR, on average, the MAE corresponds to 11.59 beds per day, with an RMSE of 14.10 beds per day, representing a MAPE of 3.34%. As in the previous case, the best performance was obtained for test week 3, with a MAPE value of 2.75%, representing a MAE of 9.89 beds per day and an RMSE of 12.79 beds per day. Similarly to previous cases, the alternative K-SVR (K = 3) and the ARIMA model do not achieve the accuracy of the best K-SVR (K = 5) model; in this particular case, with 3 clusters, K-SVR even performs similarly to the ARIMA model. Figure 6 shows the comparison between the forecasted and actual values for each of the test weeks. To facilitate the comparison, Fig. 6 also shows the results for 2 day ahead with weekend data. The results for the 2 day ahead model without weekends (i.e., 4 days ahead for the first 2 days of a week) are clearly the least accurate among all the models and cases presented here.
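The Monday-from-Thursday lag mapping described above can be illustrated with a small helper that steps back over weekdays only (a sketch; the paper does not publish its implementation):

```python
from datetime import date, timedelta

def business_days_back(d, n):
    """Step n weekdays back from date d, skipping Saturdays and Sundays.

    With weekend data excluded, a "2 day ahead" forecast for a Monday must
    reach back 2 weekdays, landing on the previous Thursday -- which is
    4 calendar days earlier, as noted in the text.
    """
    while n > 0:
        d -= timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return d
```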
When we compare the two cases (with and without weekends) for a 2 day ahead forecast (see Fig. 6) using the K = 5 SVR model, we observe that the forecast accuracy drops significantly for the latter case. Among the 4 weeks, the average MAPE for this case is 3.34%, compared to 1.42% when weekend data is considered, a difference of 1.92 percentage points. In absolute terms, the difference in the (average) MAE is 6.7 beds per day (11.59 versus 4.89 when weekend data is considered).
The reasons the model performs better with weekend data are twofold. First, as seen in Fig. 7 below, the census typically drops on the weekend after mid-week highs, primarily due to more scheduled surgeries and greater physician availability during the weekdays. Leaving this data out is analogous to leaving part of the trend out of the model. For example, the lows on Sunday might play an important role in the rate of increase of the census Monday through Wednesday; including weekend data helps identify these signals. Second, in our analysis the interactions between the previous few days were significant. Excluding weekend data means excluding these interactions of the past few days. Excluding weekends and predicting for Monday using the previous Friday's data is similar to predicting 3 days ahead instead of 1 day ahead. The further into the future we forecast, the more the prediction accuracy is expected to drop. This drop in performance also explains the performance difference between the models that used weekday-only versus weekday-plus-weekend data.

Discussion and conclusions
While hospitals in general would like to reduce occupancy rates to improve patient outcomes, hospitals also need to proactively plan on having enough capacity and improve processes in order to provide timely care for patients to improve patient wait times and satisfaction. Additionally, maintaining staffed beds is expensive and challenging, therefore, hospitals have an economic incentive to maintain high utilization without running out of capacity. When hospital operations decision makers have easy access to near term bed demand forecasts, combined with their expertise and real time assessment of their bed capacity, they can make proactive, data driven operational decisions. Inputs to these models consist of daily updated time-series data which is captured in our hospital's UDA. Once the model is trained, we deploy it on our servers where it makes daily forecasts with updated data. These forecasts can be made accessible to key hospital operations decision makers through interactive dashboards or other modes of communication.
The MAE and MAPE performance of the models presented in this paper are within parameters acceptable to hospital administration stakeholders, allowing them to make data driven decisions. When bed demand forecasts are high, they can begin expediting low acuity patient discharges, decant elective surgical cases to nearby affiliate hospitals or reschedule them, and mandate nursing overtime 1 to 3 days in advance to prepare for the increased bed demand. Mandating nursing overtime or increasing nursing staffing days in advance eliminates the need to rely on expensive, contracted nursing and support staff. Extra bed capacity can then be used to handle the increased patient demand, allow appropriate patient flow in accordance with their level of care, and minimize patient holding times in the PACU, the ED, and other patient arrival sources. When forecasts point to lower bed demand, stakeholders can ensure hospitals are not overstaffed and potentially increase patient satisfaction with private patient rooms. We currently predict the adult census for the entire hospital and do not delineate by level of care. In the future, with access to additional granular department level data, we can extend the models presented in this paper to obtain detailed forecasts for individual departments within a hospital. Considering what has been seen with the COVID-19 pandemic, this model, with additional COVID-19 hospitalization data, could be used to help manage hospital capacity in uncertain times. Now that COVID-19 will be a part of every hospital's bed demand, advanced bed demand models will be more necessary than ever to quantify and forecast hospital capacities for a variety of scenarios, including surges and shut-downs.