A hybrid seasonal prediction model for tuberculosis incidence in China
© Cao et al.; licensee BioMed Central Ltd. 2013
Received: 8 November 2012
Accepted: 26 April 2013
Published: 2 May 2013
Tuberculosis (TB) is a serious public health issue in developing countries. Early prediction of TB epidemics is very important for control and intervention. We aimed to develop an appropriate model for predicting TB epidemics and to analyze TB seasonality in China.
Monthly TB incidence data from January 2005 to December 2011 were obtained from the Ministry of Health, China. A seasonal autoregressive integrated moving average (SARIMA) model and a hybrid model combining the SARIMA model with a generalized regression neural network model were fitted to the data from 2005 to 2010. Mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to compare the goodness-of-fit of the two models. TB incidence data from 2011 were used to validate the chosen model.
Although both models could reasonably forecast the incidence of TB, the hybrid model demonstrated better goodness-of-fit than the SARIMA model. For the hybrid model, the MSE, MAE and MAPE were 38969150, 3406.593 and 0.030, respectively. For the SARIMA model, the corresponding figures were 161835310, 8781.971 and 0.076. The seasonal trend of TB incidence is predicted to have lower monthly incidence in January and February and higher incidence from March to June.
The hybrid model showed better TB incidence forecasting than the SARIMA model. There is an obvious seasonal trend of TB incidence in China that differs from that of other countries.
Keywords: Hybrid model, Incidence, Prediction, Seasonality, Tuberculosis
Tuberculosis (TB) is an often fatal infectious disease caused by Mycobacterium tuberculosis. Despite concerted worldwide efforts to control this disease, TB remains a major health issue with a high global health burden, particularly in low and middle-income countries [1, 2]. In 2009, it was estimated that there were approximately 9.4 million newly diagnosed cases, 14 million prevalent cases, and 1.7 million deaths attributable to TB worldwide.
Since tuberculosis can be disseminated widely from active cases through aerosol droplets, it is highly cost-effective to detect a TB epidemic in its early stages in order to optimize disease control and intervention. One important characteristic of TB that assists in predicting epidemics is seasonality. A recent systematic review of studies from multiple regions around the world noted a seasonal pattern of tuberculosis across countries. Moreover, the review noted that TB cases tended to reach their peak during the spring and summer seasons. The seasonality of TB epidemics allows more efficient and effective use of existing data, and it also provides clues for exploring the environmental and social factors that might influence TB epidemics.
Some retrospective studies have used routine surveillance data to describe the patterns of TB occurrence [5, 6], and TB forecasting models have been developed in various countries to understand and predict the trajectory of TB epidemics [7–9]. To date, however, no such studies have been conducted in China, even though China has the world’s second largest tuberculosis epidemic after India.
Given the high priority accorded to tuberculosis control, there have been many international TB prevalence studies. Nonetheless, measurements of prevalence are usually confined to the adult population, and these surveys typically do not include culture-negative pulmonary TB cases. Although an increasing number of countries possess TB prevalence data from representative, population-based surveys, there has been no nationwide survey of TB incidence in any country.
In order to address these gaps in the literature, in this study we aim to develop a model to predict TB epidemics and to analyze TB seasonality in China. Many models can be used in infectious disease forecasting, such as Markov chain models, grey models, general regression models, autoregressive integrated moving average (ARIMA) class models and artificial neural networks. For better forecasting performance, hybrid models that combine two or more single models have also been explored for communicable disease forecasting, and previous findings indicate that hybrid models outperformed single models [13, 14]. We therefore compare a SARIMA model with a SARIMA-based hybrid model to assess their relative performance. The findings of this study will be useful for forecasting TB epidemics and for providing reference information for TB control and intervention.
Study area and data collection
Reported monthly incidence number of tuberculosis from January 2005 to December 2011
SARIMA model construction
In the equation, B is the backward shift operator, ϵt is the estimated residual at time t with zero mean and constant variance, and Xt denotes the observed value at time t (t = 1, 2, …, k). The process is called SARIMA (p, d, q)(P, D, Q)s, where s is the length of the seasonal period. Autocorrelation and partial autocorrelation functions were examined to make a preliminary identification of the six main parameters. The Akaike information criterion and the Schwarz Bayesian criterion were used to determine the model that most closely fit the data. The SARIMA model was built using SPSS 13.0 (SPSS Inc., Chicago, IL, USA), and a p value < 0.05 was used as the cut-off for statistical significance.
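The equation referred to in the preceding paragraph does not appear in this copy of the text. As a hedged reconstruction, the standard multiplicative SARIMA form consistent with the symbols defined there (B, ϵt, Xt and the orders p, d, q, P, D, Q, s) is usually written as below. The polynomial names φ, Φ, θ, Θ and their coefficients are conventional notation assumed here, not taken from the original.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,X_t
  \;=\; \theta_q(B)\,\Theta_Q(B^{s})\,\epsilon_t ,
\qquad\text{where}
\begin{aligned}
\phi_p(B)     &= 1-\phi_1 B-\dots-\phi_p B^{p},          &
\Phi_P(B^{s}) &= 1-\Phi_1 B^{s}-\dots-\Phi_P B^{Ps},\\
\theta_q(B)   &= 1-\theta_1 B-\dots-\theta_q B^{q},      &
\Theta_Q(B^{s}) &= 1-\Theta_1 B^{s}-\dots-\Theta_Q B^{Qs}.
\end{aligned}
```

Here d and D are the orders of non-seasonal and seasonal differencing, and for monthly data with an annual cycle the seasonal period would be s = 12.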
Development of the hybrid model combining SARIMA model and Generalized Regression Neural Network (GRNN) model
Artificial neural network models are widely applied to multivariate nonlinear problems. They involve self-organizing and self-learning processes. Neural networks are trained on a set of data together with the outcomes that the trainer wishes the network to learn. The trained network can then be evaluated by inputting similar but previously unseen data. Among the various artificial neural network models, the GRNN model is a universal approximator for smooth functions based on nonlinear regression theory. Given enough data, it is able to solve any smooth function approximation problem. The training series of a GRNN consists of input values x, each with a corresponding output value y. An estimated value of y can be produced by minimizing the squared error. In constructing a good GRNN model, selection of an appropriate smoothing factor is very important, because the smoothing factor largely determines how closely the predictions match the data in the training patterns. The selection process is usually conducted by software.
Because the SARIMA model had been used to analyze the linear part of the actual data, the residuals should contain nonlinear relationships. In the SARIMA-GRNN model, we combined the linear and nonlinear parts of the models. We selected the monthly estimated incidence number of TB at time t from the SARIMA model and the time variable t as the two input variables. The single output variable y was the actual reported monthly incidence number of TB. The iterations of GRNN learning and simulation were conducted in the Matlab 7.0 software package (Math Works Inc., Natick, MA, USA) to determine the relationship between the input and output variables.
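To illustrate how a GRNN produces an estimate of y and what the smoothing factor does, a minimal sketch follows. It assumes the common formulation in which the estimate is a Gaussian-kernel-weighted average of the training outputs; the function name `grnn_predict`, the parameter name `sigma` and the toy data are illustrative assumptions and do not come from the Matlab implementation used in this study.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Estimate y at `query` as a Gaussian-weighted average of train_y.

    train_x : list of input tuples (one tuple per training pattern)
    train_y : list of outputs, same length as train_x
    query   : input tuple at which to estimate y
    sigma   : smoothing factor (assumed name), must be > 0
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed: fall back to the nearest training pattern.
        nearest = min(range(len(train_x)), key=lambda i: sq_dists[i])
        return train_y[nearest]
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy one-dimensional training patterns (made up for illustration).
toy_x = [(0.0,), (1.0,), (2.0,)]
toy_y = [0.0, 10.0, 20.0]

tight = grnn_predict(toy_x, toy_y, (0.0,), sigma=0.1)     # follows the nearest pattern
smooth = grnn_predict(toy_x, toy_y, (0.0,), sigma=100.0)  # close to the overall mean
```

In this sketch a small smoothing factor makes the estimate follow the nearest training pattern closely, while a large one pulls every estimate toward the mean of the training outputs.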
Comparison between the two models in simulation performance
where Xt is the real incidence number at time t, X̂t is the estimated incidence at time t, and n is the number of predictions. The smaller these three indices are, the better the fit.
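The formulas for the three indices are not reproduced in this copy of the text. The sketch below uses their conventional definitions, with MAPE expressed as a fraction rather than a percentage, which appears consistent with the reported values of 0.030 and 0.076. The function name `fit_errors` and the sample numbers are hypothetical.

```python
def fit_errors(actual, estimated):
    """Return (MSE, MAE, MAPE) for paired actual and estimated values.

    MSE  = mean of (X_t - Xhat_t)^2
    MAE  = mean of |X_t - Xhat_t|
    MAPE = mean of |X_t - Xhat_t| / X_t   (as a fraction; X_t must be non-zero)
    """
    n = len(actual)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [x - xh for x, xh in zip(actual, estimated)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mape = sum(abs(r) / x for r, x in zip(residuals, actual)) / n
    return mse, mae, mape


# Made-up monthly counts, for illustration only.
example_actual = [100.0, 200.0]
example_estimated = [110.0, 190.0]
example_mse, example_mae, example_mape = fit_errors(example_actual, example_estimated)
```

Because MSE is in squared case counts and MAE in case counts, only MAPE is directly comparable across series of different magnitude.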
Parameter estimates and their testing results of the SARIMA model
In constructing the SARIMA-GRNN model, the simulation accuracy of the GRNN model was determined by the smoothing factor δ. After exploring smoothing factors from 0.1 to 1.0, we found that the hybrid model had the lowest MSE, MAE and MAPE when the smoothing factor was 0.1. We therefore selected 0.1 as the appropriate smoothing factor.
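A search over smoothing factors of this kind can be sketched as follows. The text does not state how the error for each candidate was computed, so this sketch substitutes a leave-one-out MSE as an assumed criterion. It also assumes that the two inputs (the SARIMA estimate and the time index) have been rescaled to the range 0 to 1. All names and data values are hypothetical.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Gaussian-kernel-weighted average of train_y (compact GRNN sketch)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2.0 * sigma ** 2))
        for x in train_x
    ]
    total = sum(weights)
    if total == 0.0:  # all kernels underflowed; use the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def loo_mse(xs, ys, sigma):
    """Leave-one-out MSE: predict each pattern from all the others."""
    sq_err = 0.0
    for i in range(len(xs)):
        pred = grnn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], sigma)
        sq_err += (ys[i] - pred) ** 2
    return sq_err / len(xs)


def select_smoothing_factor(xs, ys, candidates):
    """Return the candidate with the lowest leave-one-out MSE."""
    return min(candidates, key=lambda s: loo_mse(xs, ys, s))


# Candidate smoothing factors 0.1, 0.2, ..., 1.0, as explored in the text.
candidates = [round(0.1 * k, 1) for k in range(1, 11)]

# Made-up, rescaled inputs: (SARIMA estimate, time index) -> reported value.
xs = [(0.10, 0.0), (0.35, 0.2), (0.80, 0.4), (0.55, 0.6), (0.20, 0.8), (0.40, 1.0)]
ys = [0.12, 0.33, 0.85, 0.50, 0.22, 0.41]

best = select_smoothing_factor(xs, ys, candidates)
```

An error computed on held-out patterns is used here because the error on the training patterns themselves tends to keep falling as the smoothing factor shrinks.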
Reported and forecasted TB cases for 2011
From the model curve (Figure 3), we found that the yearly TB incidence number in China showed a slightly decreasing trend. There was also a seasonal pattern in the number of new TB cases. The monthly incidence of TB was lower in January and February and higher from March to June.
We developed a SARIMA model and a SARIMA-GRNN hybrid model to predict the monthly incidence of TB in China. Although both models could simulate the TB time series data well, the hybrid model, which takes both linear and nonlinear components into account, outperformed the SARIMA model. We believe that models combining ARIMA and artificial neural networks capture more characteristics of the data than non-hybrid models and may thereby be better for forecasting.
Based upon the results of this study, we believe that there will be no obvious improvement in the high burden of TB in China in the near future. The results indicated that the reported annual TB incidence numbers in China will decrease only slightly. This study’s analysis was based only on reported cases of TB, since time series data on the real incidence of TB in China cannot be readily obtained. However, the web-based and case-based mandatory TB reporting system has been fully operational since 2005 and covers almost 100% of detected TB cases in China. Hence, the reported incidence number should closely mirror the real incidence of TB. Moreover, the trends and epidemics seen in reported incidence numbers are very similar to those of the actual incidence and epidemic situation of TB. These findings reveal that China’s progress in TB control has been very incremental and that intensified interventions are urgently needed.
In this study we concluded that there is a seasonal variation in TB incidence in China that shows periodicity. The monthly incidence number demonstrated a trough in January and February and was higher from March to June. This seasonal trend of TB incidence in China differed from that of many nearby regions. For instance, in Hong Kong, the peak season of TB incidence is summer only. In northern India, the peak season of TB incidence occurred between April and June and in October and December; although the TB incidence was lower in other months, there was no obvious seasonal trend. One plausible explanation of these seasonal trends in China may be that the annual Spring Festival, the most important traditional festival of the year, usually falls in mid-February. During the entire month of February, there are huge population movements throughout China by various transportation modes. In view of the TB incubation period, we suggest that the peak time of TB incidence may be partly attributed to the large population flows and the poor ventilation of public transportation during the Spring Festival period, which may be the peak time of TB transmission. Hence, measures to prevent TB transmission within public transportation during the Spring Festival may play an important role in decreasing the incidence number, particularly from March to June. Other possible mechanisms for the seasonal variations in TB incidence in China need to be further studied.
There are a number of limitations in the present study. Firstly, climate-related data and data related to population movements were not included in the model fitting because of limitations in data availability. Secondly, China is a vast country with wide variation in climate, so the seasonality of TB may differ between geographic regions. Due to the lack of available data, seasonality at smaller area levels was not analyzed. Lastly, the models were derived using data only from 2005 through 2010 and tested against only one year of data. Hence, these findings should be interpreted with caution and should be re-examined with additional time series data.
Limitations inherent in non-hybrid forecasting models can be compensated for by developing and applying a hybrid model that combines two or more models. The resulting hybrid model may thereby be more effective than a single model in producing reliable forecasts of TB. Since the models suggest that the TB epidemic in China will not decrease markedly in the coming years, there is a need to implement greater TB control measures in China. The seasonality of TB incidence suggested by the models also indicates the need for interventions focused on reducing infectious disease transmission on public transportation during the Spring Festival period.
Shiyi Cao: Doctoral candidate in Tongji Medical College, Huazhong University of Science and Technology
Feng Wang: Research Assistant Professor in School of Public Health and Primary Care, the Chinese University of Hong Kong
Wilson Tam: Research Assistant Professor in School of Public Health and Primary Care, the Chinese University of Hong Kong
Lap Ah Tse: Assistant Professor in School of Public Health and Primary Care, the Chinese University of Hong Kong
Jean Hee Kim: Assistant Professor in School of Public Health and Primary Care, the Chinese University of Hong Kong
Junan Liu: Associate professor in Tongji Medical College, Huazhong University of Science and Technology
Zuxun Lu: director and professor in Tongji Medical College, Huazhong University of Science and Technology
SARIMA: Seasonal autoregressive integrated moving average
GRNN: Generalized regression neural network
MSE: Mean square error
MAE: Mean absolute error
MAPE: Mean absolute percentage error
Ministry of Health of the People’s Republic of China.
This project was supported by the National Natural Science Foundation of China (ID: 70973041) and the Chinese Major National Science and Technology Programs (ID: 2009ZX10003-019).
1. Dye C: Global epidemiology of tuberculosis. Lancet. 2006, 367 (9514): 938-40. 10.1016/S0140-6736(06)68384-0. Epub 2006/03/21.
2. Corbett EL, Marston B, Churchyard GJ: Tuberculosis in sub-Saharan Africa: opportunities, challenges, and change in the era of antiretroviral treatment. Lancet. 2006, 367 (9514): 926-37. 10.1016/S0140-6736(06)68383-9. Epub 2006/03/21.
3. WHO: Global tuberculosis control 2010. 2010, Geneva: World Health Organization.
4. Fares A: Seasonality of tuberculosis. J Glob Infect Dis. 2011, 3 (1): 46-55. 10.4103/0974-777X.77296. Epub 2011/05/17.
5. Tanaka M: Tendency of seasonal disease in Japan. Global Environ Res-English Edition. 1998, 2: 169-76.
6. Naranbat N, Nymadawa P, Schopfer K: Seasonality of tuberculosis in an Eastern-Asian country with an extreme continental climate. Eur Respir J. 2009, 34: 921. 10.1183/09031936.00035309.
7. Munoz MP, Orcau A, Cayla J: Tuberculosis in Barcelona: a predictive model based on temporal series. Revista Espanola De Salud Publica. 2009, 83 (5): 751-7. 10.1590/S1135-57272009000500016.
8. Debanne SM, Bielefeld RA, Cauthen GM: Multivariate Markovian modeling of tuberculosis: forecast for the United States. Emerg Infect Dis. 2000, 6 (2): 148-57. 10.3201/eid0602.000207. Epub 2000/04/11.
9. Rios M, Garcia JM, Sanchez JA: A statistical analysis of the seasonality in pulmonary tuberculosis. Eur J Epidemiol. 2000, 16 (5): 483-8. 10.1023/A:1007653329972. Epub 2000/09/21.
10. WHO: Global tuberculosis control 2011. 2011, Geneva: World Health Organization.
11. Zhu JM, Tang LH, Zhou SS: [Study on the feasibility for ARIMA model application to predict malaria incidence in an unstable malaria area]. Zhongguo Ji Sheng Chong Xue Yu Ji Sheng Chong Bing Za Zhi. 2007, 25 (3): 232-6. Epub 2007/11/28.
12. Cunha GB, Luitgards-Moura JF, Naves EL: [Use of an artificial neural network to predict the incidence of malaria in the city of Canta, state of Roraima]. Rev Soc Bras Med Trop. 2010, 43 (5): 567-70. 10.1590/S0037-86822010000500019. Epub 2010/11/19. A utilizacao de uma rede neural artificial para previsao da incidencia da malaria no municipio de Canta, estado de Roraima.
13. Zhu Y, Xia JL, Wang J: [Comparison of predictive effect between the single auto regressive integrated moving average (ARIMA) model and the ARIMA-generalized regression neural network (GRNN) combination model on the incidence of scarlet fever]. Zhonghua Liu Xing Bing Xue Za Zhi. 2009, 30 (9): 964-8. Epub 2010/03/03.
14. Yan W, Xu Y, Yang X: A hybrid model for short-term bacillary dysentery prediction in Yichang City, China. Jpn J Infect Dis. 2010, 63 (4): 264-70.
15. Kirby SD, Eng P, Danter W: Neural network prediction of obstructive sleep apnea from clinical criteria. Chest. 1999, 116 (2): 409-15. 10.1378/chest.116.2.409. Epub 1999/08/24.
16. Wan L, Cheng S, Chin DP: A new disease reporting system increases TB case detection in China. Bull World Health Organ. 2007, 85 (5): 401. 10.2471/BLT.06.036376.
17. Leung CC, Yew WW, Chan TY: Seasonal pattern of tuberculosis in Hong Kong. Int J Epidemiol. 2005, 34 (4): 924-30. 10.1093/ije/dyi080. Epub 2005/04/27.
18. Thorpe L, Frieden T, Laserson K: Seasonality of tuberculosis in India: is it real and what does it tell us?. Lancet. 2004, 364 (9445): 1613-4. 10.1016/S0140-6736(04)17316-9.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/13/56/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.