 Research article
 Open Access
 Published:
A comparative study on the prediction of the BP artificial neural network model and the ARIMA model in the incidence of AIDS
BMC Medical Informatics and Decision Making volume 20, Article number: 143 (2020)
Abstract
Background
As a kind of widely distributed disease in China, acquired immune deficiency syndrome (AIDS) has been quickly growing each year, become a serious problem and caused serious damage to the life and health of people and the social events of China and the world because of its high fatality rate. It has been much concerned by all aspects of society. Therefore, developing early warning technology and finding the trend of early development are of quite significance to prevent and control human immunodeficiency virus (HIV)/AIDS. This study aimed to explore a suitable model for the morbidity of AIDS in China and establish a professional and feasible disease prediction model for the prevention and control works of AIDS.
Methods
At present, the traditional linear model is still utilized by most scholars to predict the incidence of HIV/AIDS. In addition, some scholars may attempt to use the nonlinear prediction model. Both prediction models showed good fitting and prediction effects. In China, the incidence of AIDS presents linear and nonlinear characteristics. In this research, the nonlinear back propagation artificial neural network (BPANN) model and the typical autoregressive integrated moving average (ARIMA) linear model were applied to predict the incidence of HIV/AIDS and compare their fitting effects.
Results
Both models were capable of predicting the expected cases of AIDS. It was seen that ARIMA and BPANN models could be used to forecast the monthly incidence of HIV/AIDS, but the fitting and forecasting effects of the nonlinear BP neural network model were better than those of the traditional linear ARIMA model.
Conclusions
In summary, it was further concluded that the BPANN model was a suitable way to monitor and predict the change trend and morbidity of AIDS in China.
Background
Human Immunodeficiency Virus (HIV) is a deadly virus weakening and attacking the immunity system, which can induce Acquired Immune Deficiency Syndrome (AIDS) that is recognized as one of notifiable communicable diseases around the world [1]. During the last decades, AIDS has been seen as an epidemic that becomes a serious public health problem and social event all over the world, causes serious damage to the life and health of people and affects all aspects of society. In the global context, 36.9 million people were carried with HIV, and 0.94 million people died of HIVassociated diseases by the end of 2017 [2]. Since 1998, the number of provinces affected by HIV/AIDS has reached 31, which still sees a rapid increase in China [3]. The epidemic of AIDS/HIV has been worsened to pose serious threats to public health. Each year, it seems that new infection cases are increasing in China [4, 5]. In 2015, about 571,000 people (15 years old and above) were infected with HIV [6].
Therefore, it is a must to prevent and control the prevalence of AIDS in China. A number of policies on the prevention and control works of HIV disease have been issued by the government. In order to supervise the spread of HIV/AIDS, the National Notifiable Disease Surveillance System was organized in 1995, and the surveillance data for primarily affected populations was collected [7, 8]. Since 2004, this system has been applied to monitor the prevalence of HIV and HIVrelated behaviors [9].
Over the past few years, mathematical models have been used to successfully predict the incidence of HIV/AIDS. In the 1980s, the model suggested by the Joint United Nations Programme on HIV/AIDS (UNAIDS) was adopted to forecast HIVinfected patients in many countries so as to identify the growing trend of the disease. The methods are the Workbook Method [10], Estimation and Projection Package (EPP) method [11], Spectrum AIDS Impact Model [12] as well as Asian Epidemic Model (AEM) [13]. Due to the changing incidence of AIDS, it is necessary to think through its influence factors. In these models, adequate indicators are required to fit in different estimation and prediction curves about the epidemic situation of HIV/AIDS. Otherwise, the results will greatly deviate from the actual situation. The features of four models are as follows, Workbook, the parameters required are some relatively fixed demographic indicators, including local adult population, gender composition, base of various highrisk groups and high and low values of infection rates, base of various lowrisk groups and high and low values of infection rates, etc. [10]. Spectrum AIDS Impact Model, HIVinfected people receive Antiretroviral Therapy (ART) to extend their survival time. The change in survival time will affect the prediction results of SPECTRUM [14]. EPP, the number of people receiving treatment has increased with the promotion and use of condoms. The improvement of treatment methods and other prevention and control works have reduced the quality and representativeness of monitored data, which exerts a direct influence on EPP’s estimation and prediction of epidemic situations [15]. AEM, its monitoring indicators have a large number of difficult items. Monitoring data has highquality requirements. Only on the premise of sufficient data and quality assurance can appropriate model parameters be obtained. Then, predictions can be made. Otherwise, major mistakes are easy to make [16].
Also known as the historical extension prediction method, the time series prediction method is a kind of historical data extension prediction that is a method of extrapolating and predicting the development trend of things, which can be reflected by time series. More common traditional time series prediction methods include the AutoRegressive Integrated Moving Average (ARIMA) model, exponential smoothing method, etc., among which ARIMA is the most representative. Considered as one of the major ways to make time series analysis, the ARIMA model involves the changes of trends, random interference and periodic variations and the invariance of other related random variables during time series analysis. Earnest et al. believed that the ARIMA model was quite easy and fast to set related parameters on the prediction of communicable diseases [17]. The establishment of the ARIMA model requires collecting relevant historical data, processing data in advance according to its stability requirements, drawing the diagram of autocorrelation coefficients and partial correlation coefficients to determine the optimal model and finally use it to predict the development trend. Nowadays, ARIMA is used to estimate the mortality of influenza, malaria and other infectious diseases.
In most cases, nonlinear structures are adopted during time series analysis as adequate results cannot be obtained from linear models. In many domains, the Artificial Neural Network (ANN) is applied due to its possibility of getting over the limitations of linear models [18] and analyzing the stronglycoupled and highlynonlinear correlations between multiple input and output variables. In nonlinear artificial neural network models, particularly the Back Propagation Artificial Neural Network (BPANN), the BPANN model can improve prediction accuracy close to various functions of arbitrary nonlinear structures [19], and accommodate more multidimensional inputs to improve the accuracy of predictions because of its inherent selflearning property, simple structure and strong simulation ability.
The data of AIDS incidence in China has shown a coexistence of linearity and nonlinearity. In this paper, it was suggested that the nonlinear relationships should exist for the monthly morbidity of AIDS while accuracy relations should not be extracted from the linear model. Two models, namely ARIMA and BPANN, were established to forecast the morbidity of HIV/AIDS during the period of 2007–2016. By comparison, the future growing trend of HIV/AIDS was described for early detection and warning.
Methods
ARIMA model
As a common linear model in time series analysis, the ARIMA model is usually constructed as ARIMA (p, d, q) (P, D, Q) _{S}, p, d, q, P, D, Q and S refer to autoregressive order, number of difference, moving average order, seasonal autoregressive order, number of seasonal difference, seasonal moving average order and timeseries of cyclical pattern respectively. Graphs of AutoCorrelation Function (ACF) and Partial AutoCorrelation Function (PACF) were utilized to determine the ARIMA model [20]. The construction of an optimal model needed to think about minimum Bayesian Information Criterions (BIC) and stable multicorrelation coefficient, statistically significant parameter estimates and residuals as white noise. The ARIMA model was constructed through former forecasting errors and past series values, and developed according to the following procedures: Diagnostic checking, estimation and identification. During the identification process, the ACF and PACF of transformed information would determine seasonal and nonseasonal orders. Conditional leastsquares modes were used to estimate parameters. During the diagnosis process, white noise tests were conducted to verify the adequacy of the model in the series and check whether residuals were independently and positively distributed. In this way, a few ARIMA models would be possibly identified [21]. Finally, a suitable model would be selected to forecast morbidity.
BPANN model
As one of artificial intelligence (AI) technologies, ANN has been generally applied to fit in nonlinear models with the capability of recognizing the principles of accurate forecasting and offering help to make decisions [22]. A large number of connected nonlinear units are contained in the ANN model for data storage selflearning process [23]. Among ANN models, the BPANN model is a type of multilayered feed forward neural network.
As a system with learning ability, ANN can develop knowledge so as to exceed the original knowledge level of designers. Its learning and training methods can be divided into two types: One is supervised or tutored learning in which given sample criteria are used for classification or imitation; the other is unsupervised or untutored learning in which only learning styles or certain rules are set and the specific learning content varies with the environment (namely the situation of input signal) of the system that can automatically find the characteristics and regularity of the environment.
ANN is an implicit mathematical processing method and a typical blackbox modeling tool. In general, it is only necessary to give the input and output data of the modeling object instead of knowing its structure, parameters and dynamic characteristics. Through the training of information samples, the neural network can have the brain’s ability of memory and recognition. Without any prior formulas or modeling, the ANN can selflearn, obtain the mapping relationship between input and output from existing data, store the mapping relationship in each neuron in the form of multigroup weights and thresholds to constitute network knowledge, and use it to predict similar factors. Neural network models are widely used in signal processing, pattern recognition, control, analysis and prediction and other aspects because of their nonlinear characteristics, numerous parallel distribution structures as well as learning and inductive ability.
Three layers of the BPANN model architecture were contained in the paper, which including the input layer, the hidden layer, and the output layer. Each layer has at least one neuron, which connects to neurons in different layers. The classic structure is shown in Fig. 1. This structure is simple, clarity and can enable each neuron to establish a suitable linear or nonlinear relationship between input and output, while without limiting in output between − 1 and 1. The core of BPANN is each neuron in the input layer as an independent variable; the hidden layer is responsible for internal operations (imitating the human brain), especially nonlinear operations; each neuron in the output layer represents a dependent variable. The calculation of BPANN is to find the minimum value of the error function.
Model validation and statistical comparisons
Based on the same training set, ARIMA and BPANN models were subsequently established to forecast exclusively experimental information. The validity of these models was evaluated by cross validation. Mean Absolute Error (MAE), Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE) were used to make a statistical comparison of forecast and real morbidity.
Information analysis based on computer software
The ARIMA model was analyzed by using software SPSS26 and Eviews6.0. Neural Network Toolbox in Matlab 2019 was used to evaluate the BPANN model. All the analysis results showed significant differences, namely P < 0.05.
Data sources
According to the report on statutory infectious diseases in China, the monthly data about China’s AIDS cases reported from January 2004 to December 2016 was collected as the original data to establish the models, to predict the incidence of AIDS in 2017. Compare forecast incidence of AIDS and actual incidence of 2017, to verify the model fitting effect.
In ARIMA model, The monthly incidence of AIDS in China from 2004 to 2016 was modeled, and predicting the monthly incidence in 2017. The actual value of monthly incidence in 2017 was used as a reference to verify the model. In BPANN model, the set of information was classified into three subsets, namely training, validation and test sets. In the training set, the incidence data of the past three years was used to predict the incidence of the fourth year in validation set. The incidence rate in January of t_{1}t_{3} years was used to estimate that in January of the t_{4} year; the incidence rate in February of t_{1}t_{3} years was used to estimate that in February of the t_{4} year, and so on. Then, the incidence rate in the same month of t_{2}t_{4} years was used to predict that in the same month of the t_{5} year, the same month of t_{3}t_{5} years was used to predict the incidence rate of the same month of the t_{6} year, in turns. Finally, the data of 2017 was selected as the test set to verify network performance. All incidence data were numbered in chronological order, with P1, P2 and P13 representing respectively the incidence data in January 2004, February 2004 and January 2005, and so on. The specific data diversity is presented in the following Table 1. Such data diversity method could be adopted to better learn and train network models, and avoid overlearning and overfitting [24].
Results
Features of time series analysis in the report rate of AIDS
According to the surveillance data from January 2004 to December 2016, the figure of monthly incidence rates showed a trend of sharp increase from 2010 to 2016 the peak incidence existed in 2012 (Fig. 2). Table 2 showed the average of monthly morbidity of AIDS at the period between 2004 and 2016. The annual incidence rate was between 0.2648 and 4.0211 per 100,000 people from 2004 to 2016. Figure 3 shown that the monthly incidence of AIDS in China was cyclical. The lowest point was generally between January and February of each year, and the highest point was generally from July to December of each year. In summary, the monthly incidence of AIDS in China during the 13year period from 2004 to 2016 had been cyclical and increasing year by year.
ARIMA model
Model identification
The time series from January 2004 to December 2016 were used to establish the model for the morbidity of AIDS, which were not stationary owing to seasonality. After the natural logarithmic transformation was performed, one general difference, one seasonal difference, time plots after transformation are shown in Fig. 4. Transformed time series appeared to be quite stationary.
ACF and PACF were used to describe the characteristics of series, select models and determine the order of key points. ACF was utilized to explain the correlation of several adjacent data as the coefficient of the relationship between series and their own historical or stagnant series. When the lag = 1, it is the firstorder autocorrelation coefficient (p = 1), which indicates that there is a correlation between adjoining points; lag = 2 means the secondorder autocorrelation coefficient (p = 2), which indicates that two adjoining points are also closely related, but generally the autocorrelation coefficient in ACF does not exceed 2. The ACF in Fig. 5a shows that the autoregressive value broke through the wireframe of confidence interval only when lag = 1, indicating that the series had a high correlation within the first order. PACF was to test whether the partial correlation coefficient of each order was statistically significant one by one from higher to lower order until the first one was significant. The order of coefficients of PACF determines the highest order in the model. As shown in the PACF diagram (Fig. 5b), the partial regression coefficient exceeded the confidence interval when lag = 1 and 2, indicating the feasibility of modeling within two orders. Therefore, this study considered that the partial regression coefficient decrease sharply after lag = 1, so neglected lag = 2.
As displayed from Fig. 5a and b, the model was initially determined as ARIMA(p, d, q) × (P, D, Q) s (General Multiplicative Seasonal Model). Since one general difference (d = 1) and one seasonal difference (D = 1) were performed in data preprocessing, ARIMA(p,1,q) × (p,1,q) _{12} models with all order combinations for all autocorrelation delay coefficients p ≤ 1(P ≤ 1) and moving average delay coefficients q ≤ 1(Q ≤ 1) were selected as primary models.
All primary models were used to simulate and model the monthly incidence of AIDS. The statistics, BIC and parameter estimates of the models obtained are shown in Table 3. The table selected stationary Rsquared and BIC with the relatively smallest value, and the model whose residual was white noise was the optimal one. According to the minimum BIC = 6.091 and white noise test for residual errors, LjungBox Q [18] =13.909, P > 0.05, which indicated that goodnessoffit considered ARIMA (0,1,1) × (0,1,1)_{12} as the most suitable model.
Forecast analysis with ARIMA
ARIMA(0,1,1) × (0,1,1)_{12} was used to predict the monthly incidence of ADIS from January to December 2017. The results are shown in Table 4. It can be seen from Fig. 6 that the change trend of monthly incidence fitted by this model was basically consistent with original data, and the fitting effect was satisfactory. With the extension of prediction time, 95% confidence interval of predicted value would widen and the accuracy of predictions saw a gradual decline, which was consistent with the conclusion of XiaoMei M [25] and LiPing R [26].
BPANN model
The set of information was divided into training, test and validation data sets in the ARIMA model. The BPANN model was established by Matlab 2019 to predict the incidence of HIV/AIDS in China in 2017.
Network architecture
The BPANN modeling process has the following three steps:
1) original data was divided into three data sets, namely training, validation and test sets. The training set was used to train models and select the optimal network; the verification set was utilized to monitor the entire network training process; the test set was applied to verify the performance of the selected optimal network model. In network training, training and validation sets are usually selected to enter the network alternately in order to avoid overfitting, which means that established network models explain not only the variation of the observed population but also the fluctuations and errors of individual samples in the population [24].
2) After centralized training, repeated learning, forward and backward propagation of information, and continuous adjustment of network weights, the mean square error (MSE) of validation set would be minimized or reach a predetermined number of iterations [27, 28].
3) As a set of data coming from the same whole with training and verification sets and failing to enter network training, the test set can be used to evaluate established network models to obtain objective and extrapolative effective results.
In the training set of this model, the incidence data of the past three years was used to learn the incidence of the fourth year, such as the incidence rate in January of t_{1}t_{3} years was used to estimate that in January of the t_{4} year, then the incidence rate in the same month of t_{2}t_{4} years was used to predict that in the same month of the t_{5} year, in turn. With such data diversity method could be adopted to better learn and train network models, and avoid overlearning and overfitting.
After dividing data into three sets, network parameters are set up, such as number of network layers, nodes and iterations, the allowable error, and the learning algorithm used.
After the data set has been partitioned, the number of network layers, number of neural nodes, number of iterations, allowable error, learning algorithms and other network parameters of the model should be set before starting training.
To determine the number of network layers. A study by Robert HechtNielsen in 1989 has shown that the feedforward network of a hidden layer can map continuous functions within all closed intervals [29]. A threelayer BP network model can complete any mapping from n to m dimensions. More than two hidden layers should only be considered when learning discontinuous functions. As long as the number of nodes in the hidden layer can be reasonably selected, the BP network model of a hidden layer has also strong nonlinear mapping capability, fast training speed, and good convergence ability. Hence, a threelayer BP network model was selected and a hidden layer was adopted in this study.
To determine the number of neurons (also called nodes) in each layer. The number of nodes in the input layer and the output layer is generally determined according to the data characteristics of the study. In this study, according to the data diversity and the predicted monthly incidence rate, the number of nodes in the input layer is 3, and the number of nodes in the output layer is 1. The number of nodes in the hidden layer has a certain influence on the performance of the neural network model. Too few neural nodes will cause small learning capacity, and failure to completely learn samples and laws of sample storage; Too many neural nodes will cause the network to be bloated, so that the learning speed may slow down and the irregular parts (such as white noise) of sample data may be stored into the network, resulting in poor network performance and generalization ability. At present, the number of nodes in the BPANN hidden layer is almost calculated and estimated by the empirical formula. Based on the literature review, this study uses four formulas and two empirical formulas that are the most commonly used to infer the approximate number of neural nodes in the hidden layer, and the formulas are as follows:
where M represents the number of input layer nodes; N represents the number of output layer nodes; “m” represents the number of hidden layer nodes; “a” is the regulation constant with values between 1 and 10. In this study, the number of nodes in hidden layer ranges from 3 to 12.
Select the learning algorithms and structures, initialize the model. Matlab provides 10 (a total of 11) BP neural network model learning algorithms, including LevenbergMarquardt algorithm (Trainlm), One Step Secant (OSS) algorithm (Trainoss), conjugate direction algorithm (Ttrainscg), PolakRibiere algorithm (Traincgp), FletcherReeves algorithm (Traincgf), resilient BP algorithm (Trainrp), selfadaptive learning rate algorithm (Traingda and Traingdx), gradient descent with momentum (Traingdm) and batch gradient descent training function (Traingd). Among them, the LevenbergMarquardt algorithm, the LM algorithm for short, is the most widely used nonlinear least square algorithm at present because of its fast convergence speed.
In this study, three years of data were randomly selected from the data set of the monthly incidence of AIDS. After the normalization of data by the PRESTD function, estimated from 3 to 12 nodes in the hidden layer and above 11 algorithms were used to combine into the neural network models of 110 structures. Small sample data was input, and each structure was iterated 20 times to calculate their MSEs respectively. The smaller MSE was, the better the fitting effect of the network model would be and the closer the neural network prediction would be to the real value. The combination of the structure and algorithm of the minimum MSE was shown in Table 5. The combined BP neural network model with the minimum MSE = 0.001863 was the optimal model, and the optimal learning algorithm was the LM algorithm.
Forecast analysis with BPANN
The BP neural network fitting curve for the incidence of HIV/AIDS in 2017 was obtained by inputting the test set into the trained BPANN and using the stored black box to operate network models (Fig. 7 and Table 6 of fitted value). By comparing the predicted value with the actual incidence, the fitted value of the BPANN model was very close to the actual monthly incidence of AIDS.
Comparative analysis
This study mainly compared and evaluated the prediction effects of the ARIMA time series model and BPANN model of the following three error evaluation indicators. In Table 7, the observed values were compared with the predicted ones in a pointtopoint manner. The modeled MSE, MAE and MAPE in the ARIMA model were 0.0020, 0.0301 and 22.4638 respectively. However, three residuals in the BPANN model were 0.0019, 0.0129 and 1.2139 respectively.
When the morbidity of HIV/AIDS from 2004 to 2016 was set as the original data, models were established to forecast the morbidity of AIDS in 2017. The predicted incidence of AIDS was compared with the actual incidence of AIDS in 2017 so as to verify the fitting effects of models. Ultimately, the ARIMA (0,1,1) (0,1,1)_{12} structure was considered to be the most suitable time series model with white noise testing LB [18] = 13.909, P > 0.05, which meant that the model was effective. In the model, error parameters were MSE = 0.0020, MAE = 0.0301 and MAPE = 22.4638. The selected BP neural network model was seen as the optimal one with the LM algorithm. In the model, MSE iterated 16 times was 0.0019, MAE was 0.0129 and MAPE was 1.2139. The fitting error of the BPANN model was significantly smaller than that of the ARIMA model while its forecasting accuracy was higher than that of the ARIMA model [30,31,32]. It was seen that the BPANN model was more effective in predicting the morbidity of AIDS in China.
In Fig. 8, the BPANN model had a fit value closer to the true value compared with the ARIMA model. Both prediction methods could be adopted to predict the incidence of AIDS in China. In terms of prediction accuracy, the BPANN model would be more suitable. The BPANNmodel could better improve forecasting duration than the ARIMA model. In this study, both methods just took into account the temporal variations of time series. However, the BPANN model was a nonlinear model, whose prediction accuracy could be enhanced by adjusting more dimensional inputs and development space was larger than that of the ARIMA model.
Discussion
Monitoring the prevalence of infectious illnesses is of great importance for conventional health education. The prediction of anticipated AIDS cases will not only detect outburst conditions or report the possibility of outburst cases, but also help decisionmakers to know about possible future change trends and past and present data [33].
Both ARIMA and BPANN models were based on the time series data prediction method with which the time series was extrapolated to the future through special development principles. In the model, morbidity could be predicted as special risk factors were not involved. Without complex transformations or additional alternative variables, autocorrelation, seasonal variations and secular change trends in the ARIMA model could be simply managed through seasonal functions, moving average, autoregression and difference. As long as the suitable model was established, it would be possible to predict anticipated cases at a given time interval in the future [34].
Both models were capable of predicting the expected cases of AIDS. It was seen that both ARIMA and BPANN models could be used to predict the monthly incidence of HIV/AIDS, but the fitting and forecasting effects of the nonlinear BPANN model were superior to those of the traditional linear ARIMA model. First, the modeling method of the BPANNmodel was simpler than that of the ARIMA model, while it was unnecessary to set up a complicated mathematical model or understand its mathematical structure and the correlation between variables. Second, the ANN was able to compute and deal with data spontaneously through a number of simple units. It was much better to fulfill the works that were involved with pattern recognition. The professional idea was compared with traditional statistics to significantly improve the precision accuracy in neural networks. The ARIMA model might be more suitable for making shortterm forecast analysis because of a gradual decline in its longterm prediction effect. As a whole, the nonlinear BPANN model forecasting the morbidity of AIDS in China was the most appropriate way for complicated dynamic and nonlinear systems [35]. Therefore, multidimensional inputs in the BP neural network would be gradually improved to find out the best model and accurately make predictions. It will be very promising in future [36].
Conclusions
In summary, an agreement was further reached that the BPANN model was a suitable way to monitor and predict the change trend and morbidity of AIDS in China. According to the prediction results, more health investments would be made during outburst periods while fewer investments would be made during lowrisk periods, which thus improved intervention effect and source scheduling.
Limitations
Several limitations still exist in this study. First of all, time series analysis was carried out without considering the factors affecting the incidence of AIDS, such as production methods, social environment, epidemic variations and humanities.
Secondly, the research objects were required to remain relatively constant in prediction models during the whole process. Meanwhile, diversified infection channels and disease prevalence would be generated for a variety of people under distinct living conditions. In local places, it was necessary to relearn and train prediction according to local conditions.
Thirdly, the BPANN model under blackbox testing would affect the possibility of extrapolation beyond its training information and the fulfillment of subjective initiatives by operators during the process of BPANN analysis.
Availability of data and materials
The data that support the findings of this study are available from China’s Statutory Infectious Disease Report of National Health Commission of the People’s Republic of China, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.
Abbreviations
 HIV:

Human Immunodeficiency Virus
 AIDS:

Acquired Immune Deficiency Syndrome
 UNAIDS:

Joint United Nations Programme on HIV/AIDS
 EPP:

Estimation and Projection Package
 AEM:

Asian Epidemic Model
 ARIMA:

AutoRegressive Integrated Moving Average
 ANN:

Artificial Neural Network
 BPANN:

Back Propagation Artificial Neural Network
 ACF:

AutoCorrelation Function
 PACF:

Partial AutoCorrelation Function
 BIC:

Bayesian Information Criterions
 AI:

Artificial Intelligence
 MAPE:

Mean Absolute Percentage Error
 MAE:

Mean Absolute Error
 MSE:

Mean Square Error
References
Gottlieb MS, Schroff R, Schanker HM, et al. Pneumocystis carinii pneumonia and mucosal candidiasis in previously healthy homosexual men: evidence of a new acquired cellular immunodeficiency [J]. N Engl J Med. 1981;305(24):1425–31.
World Health Organization. Global Health Observatory data repository [DB/OL]. http://www.who.int/hiv/data/en/. Accessed 24 June 2020.
Huang M B, Ye L, Liang B Y, et al. Characterizing the HIV/AIDS Epidemic in the United States and China [J]. Int J Environ Res Public Health, 2015, 13(1):30. doi: https://doi.org/10.3390/ijerph13010030.
WHO. Evaluation Report on China HIV/AIDS Epidemic 2011[J]. Chin J AIDS STD. 2012;18(01):1–5.
Murray CJ, Ortblad KF, Guinovart C, et al. Global, regional, and national incidence and mortality for HIV, tuberculosis, and malaria during 19902013: A systematic analysis for the global burden of disease study 2013.[J]. Lancet. 2014;384(9947):1005–70.
Wang LY, Qin QQ, Ding ZW, et al. Current Situation of AIDS epidemic in China [J]. Chin J AIDS STD. 2017;23(04):330–3.
Wu Z, Wang Y, Mao Y, et al. The integration of multiple HIV/AIDS projects into a coordinated national programme in China [J]. Bull World Health Organ. 2011;89(3):227.
Liu E, Rou K, Mcgoogan JM, et al. Factors associated with mortality of HIVpositive clients receiving methadone maintenance treatment in China.[J]. J Infect Dis. 2013;208(3):442–53.
Wang L, Guo W, Li D, et al. HIV epidemic among drug users in China: 1995–2011[J]. Addiction. 2015;110(Suppl 1(S1)):20.
Walker N, Stover J, Stanecki K, et al. The workbook approach to making estimates and projecting future scenarios of HIV/AIDS in countries with low level and concentrated epidemics [J]. Bri J Venereal Dis. 2004;80(suppl 1):i10.
Brown T, Le B, Eaton JW, et al. Improvements in prevalence trend fitting and incidence estimation in EPP 2013[J]. Aids. 2014;28(4):S415–25.
Stover J, Mckinnon R, Winfrey B. Spectrum: a model platform for linking maternal and child survival interventions with AIDS, family planning and demographic, projections [J]. Int J Epidemiol. 2010;39(Suppl 1):i7.
Lim SH, Cheung DH, Guadamuz TE, et al. Latent class analysis of substance use among men who have sex with men in Malaysia: findings from the Asian internet MSM sex survey [J]. Drug Alcohol Depend. 2015;151:31–7.
Stover J. Projecting the demographic impact of AIDS and the number of people in need of treatment: updates to the Spectrum projection package [J]. Sexually Transmitted Infections. 2006;82(suppl_3):iii45–50.
Tuhuma T, Gideon K, Japhet K, et al. Estimating and projecting HIV prevalence and AIDS deaths in Tanzania using antenatal surveillance data [J]. BMC Public Health. 2006;6(1):120.
Sharma SK, Kadhiravan T. Management of the Patient with HIV Disease[J]. Disease A Month. 2008;54(3):162–95.
Earnest A, Tan SB, Wildersmith A, et al. Comparing statistical models to predict dengue fever notifications.[J]. Computational and Mathematical Methods in Medicine, ,2012,(201238). 2012;2012(1):758674.
Yolcu U, Egrioglu E, Aladag CH. A new linear & nonlinear artificial neural network model for time series forecasting [J]. Decis Support Syst. 2013;54(3):1340–7.
Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators.[J]. Neural Netw. 1989;2(5):359–66.
Yu HK, NaYoung K, Soon KS, et al. Forecasting the number of human immunodeficiency virus infections in the Korean population using the autoregressive integrated moving average model [J]. Osong Public Health & Research Perspectives. 2013;4(6):358–62.
Wang T, Zhou Y, Wang L, et al. Using an autoregressive integrated moving average model to predict the incidence of hemorrhagic fever with renal syndrome in Zibo, China, 20042014[J]. Jpn J Infect Dis. 2015;69(4):279–84.
Guan P, Huang DS, Zhou BS. Forecasting model for the incidence of hepatitis a based on artificial neural network.[J]. World J Gastroenterol. 2004;10(24):3579–82.
Connor JT, Martin RD, Atlas LE. Recurrent neural networks and robust time series prediction[J]. IEEE Trans Neural Netw. 1994;5(2):240–54.
Mcguire V, Nelson LM, Koepsell TD, et al. Assessment of occupational exposures in communitybased CASEcontrol studies [j]. Annu Rev Public Health. 1998;19(1):35–53.
Xiao M, Xu Q X, Shi ZX, et al. Application of ARIMA model in predicting monthly incidence of syphilis[J]. Journal of Xi'an Jiaotong University (Medical Sciences). 2018;39(1):131–134,152.
Li PR , Ming L , Xin GY , et al. The prediction of the Japanese encephalitis invasion based on the ARIMA model in Guizhou in 2017[J]. Modern Preventive Medicine. 2018;45(08):1349–53.
Roman J, Jameel A. Backpropagation and Recurrent Neural Networks in Financial Analysis of Multiple Stock Market Returns [C]// System Sciences, 1996, Proceedings of the TwentyNinth Hawaii International Conference on. IEEE, vol. 2; 1996. p. 454–60.
Chua CG, Goh ATC. A hybrid Bayesian backpropagation neural network approach to multivariate modelling [J]. Int J Numerical Analytical Methods Geomechanics. 2003;27(27):651–67.
Robert HN. Theory of the backpropagation neural network[J]. Proc. 1989 IEEE IJCNN. 1989;1:593–605.
Rathnayaka RMKT, Seneviratna D, Jian GW, et al. A hybrid statistical approach for stock market forecasting based on artificial neural network and ARIMA time series models[C].//2015 International Conference on Behavioral, Economic and Sociocultural Computing (BESC). IEEE. 2015:54–60.
Chuang FK, Hung CY, Chang CY, et al. Deploying Arima and artificial neural networks models to predict energy consumption in Taiwan [J]. Sens Lett. 2013.
Lewis CD. Industrial and business forecasting methods : a practical guide to exponential smoothing and curve fitting [M]; 1982.
Fan YG, Wang J, Su H, et al. Prediction on the number of HIV with models of ARIMA and GM(1,1)[J]. Chin J Control Prev. 2012;12:1100–3.
Luo J, Yang S, Zhang Q, Wang L. ARIMA model of time series for forecasting epidemic situation of AIDS [J]. Chongqing Med. 2012;13:1255–9.
Jain A, Srinivasulu S. Development of effective and Efficientra infall runoff models using integration of deterministic,real coded genetic algorithms, and artificial neural networktechniques. Water Resour Res. 2004;40:W04302.
Ran L, Ma N. Comparison of four AIDS epidemic estimation and models [J]. Chin J AIDS STD. 2012;5:347–50.
Acknowledgments
We wish to thank the timely help given by The Proofreading Team in improving the language of this article.
Funding
This work was supported by the National Natural Science Foundation of China [No.81260450].
Author information
Authors and Affiliations
Contributions
ZL contributed significantly to analysis and manuscript preparation; YL contributed to the conception of the study and helped perform the analysis with constructive discussions. All authors have read and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, Z., Li, Y. A comparative study on the prediction of the BP artificial neural network model and the ARIMA model in the incidence of AIDS. BMC Med Inform Decis Mak 20, 143 (2020). https://doi.org/10.1186/s12911020011573
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12911020011573
Keywords
 AIDS
 Prediction
 BP artificial neural network model
 ARIMA model