A particle swarm optimization improved BP neural network intelligent model for electrocardiogram classification

Background Because it objectively reflects the working state of the heart and the physiological condition of the body, the electrocardiogram (ECG) is widely used in the assessment of human health, especially in the diagnosis of heart disease. The accuracy and reliability of abnormal ECG (AECG) decisions depend to a large extent on feature extraction. However, accurate features are often difficult or even impossible to obtain, because ECG acquisition is easily disturbed by the external environment and AECG has many types with great variation. Moreover, ECG results are often obtained only after a long delay, which defeats the purpose of early warning or real-time diagnosis. Developing an intelligent classification model with an accurate feature extraction method to identify AECG is therefore of great significance. This study aimed to explore an accurate feature extraction method for ECG and to establish a suitable model for identifying AECG and diagnosing heart disease. Methods Wavelet transform combined with four operations and an adaptive threshold method was first applied to filter the ECG and extract its feature waves. Then, a BP neural network (BPNN) intelligent model and a particle swarm optimization (PSO) improved BPNN (PSO-BPNN) intelligent model were established on the MIT-BIH open database to identify ECG. To reduce the complexity of the model, principal component analysis (PCA) was used to reduce the feature dimension. Results Wavelet transform combined with four operations and an adaptive threshold method was capable of ECG filtering and feature extraction. PCA significantly reduced the feature dimension used for modeling, which lowered model complexity and saved classification time. The PSO-BPNN intelligent model was suitable for identifying five types of ECG and performed better than the BPNN model.
Conclusion The PSO-BPNN intelligent model is a suitable way to identify AECG and provides a tool for the diagnosis of heart disease.


Background
The electrocardiogram (ECG) is a bioelectrical signal with low frequency and weak amplitude. It objectively reflects the working state of the heart and the physiological condition of the body, and it provides important information for the assessment of human health, especially for heart disease [1]. ECG has been shown to be the most accurate method for analyzing and diagnosing all kinds of arrhythmia [2]. People with cardiovascular disease usually tend to have abnormal heart rhythms in the early stages [3,4]. Detecting an abnormal heart rhythm in real time and identifying its type allows early warning and targeted treatment, which has important implications for prevention [5]. ECG has become a basic detection index in the clinic, and it is one of the leading tools for assessing the extent of cardiac involvement in COVID-19 patients [6]. Correct identification of ECG is therefore of great significance. The accuracy and reliability of ECG decisions depend to a large extent on feature extraction. However, accurate features are often difficult or even impossible to obtain, because the ECG detection process is easily disturbed by the external environment and abnormal ECG (AECG) has many types with great variation. Even the same type of AECG differs between patients. Recognizing ECG for early warning or real-time diagnosis has therefore become a focus of ECG research, and developing an intelligent classification model with an accurate feature extraction method is of great significance for identifying AECG. For ECG signal filtering, although the acquisition system contains a filter bank to remove noise during acquisition, hardware denoising has certain limitations. Software denoising mainly includes three kinds of methods: digital filters, wavelet filters, and neural networks and mathematical morphology [7][8][9]. In general, digital filters are difficult to design, relatively inefficient to execute, and computationally expensive.
The filtering effect of a neural network is easily affected by the characteristic waveform, and its operation takes longer. For feature extraction, the detection of the other characteristic waves depends on accurate detection of the QRS complex. Methods for detecting the QRS complex mainly include mathematical morphology, the difference threshold method, and the template matching method [10][11][12]. The template matching method is based on the amplitude-frequency characteristics of the signal. The difference threshold method is simple and fast, but its detection accuracy is relatively low. Mathematical morphology preprocessing is highly accurate but computationally complicated. Overall, wavelets give good results in both signal filtering and feature extraction [13].
After feature extraction, developing an intelligent and reliable system to recognize AECG remains a major challenge. Researchers have studied classification methods, with or without a feature reduction algorithm, to build models and reduce their complexity. However, most established ECG diagnostic systems cannot deliver a diagnosis and play only an auxiliary role, owing to the complexity of ECG and the differences among similar ECGs. In dynamic ECG, the classification of arrhythmia types has not yet achieved the real-time and accurate recognition needed to meet clinical requirements. The analysis and recognition of ECG therefore still need further study, and an accurate method to classify AECG remains a pressing problem. In recent years, deep learning has developed rapidly and has achieved considerable progress in fields such as image and speech processing [14,15]. Many scholars have begun to apply deep learning methods to the detection of atrial fibrillation (AF) or to other arrhythmia classification tasks, with superior performance [16][17][18][19][20][21][22][23][24]. With a convolutional neural network (CNN), the classification accuracy reached only 83%, and both the class imbalance problem of CNN frameworks and the accuracy need further improvement [25]. The ECG classification accuracy of a model based on adversarial domain adaptation reaches 92.3% [26]. Some scholars found that the support vector machine method classifies poorly, that linear discriminant analysis is prone to overfitting during training, and that the training time of deep learning algorithms is too long [27][28][29]. One study that built a back-propagation neural network (BPNN) model to classify AECG reported a classification accuracy of only 72.27% [30], which makes it less suitable than the CNN for detecting cardiovascular disease.
However, BPNN has strong self-learning ability and high classification and recognition ability. It has been used to predict the expected cases of acquired immune deficiency syndrome (AIDS) with good fitting and forecasting effects [31], and improved BPNNs also perform well in hand-motion recognition and water temperature forecasting [32,33]. An improved BPNN may therefore be suitable for establishing a classification model for detecting AECG and cardiovascular disease, but how to design a BPNN-based ECG classifier with superior performance is worth further study.
BPNN is a multi-layer feed-forward neural network that consists of an input layer, a hidden layer, and an output layer. It is a supervised learning method whose main characteristics are forward propagation of the signal and backpropagation of the error. In general, it uses the difference between a theoretical value and an experimental value as the supervision signal, with an error generated when the output differs from that expected [34]. BPNN is simple, has robust learning capability, and offers good solutions to nonlinear problems. However, it is sensitive to the initial weights of the network and is prone to local minima with slow convergence. In addition, its lack of theoretical guidance and the selection of training samples affect the generalization performance of the classifier [35]. Many improved algorithms have therefore emerged for practical applications. They fall mainly into two kinds: heuristic learning algorithms and optimization algorithms. The first kind is simple and easy to use, but its performance characteristics are not easy to set up. The second kind greatly improves the convergence speed but increases the computational complexity of the network. In this study, we choose the first method. The particle swarm optimization (PSO) algorithm originated from research on artificial intelligence and the hunting behavior of birds [36]. Based on a global search strategy over the population, PSO optimizes through cooperation and competition among the particles of the population. PSO has been widely used in many fields, for example function optimization and image processing [37]. Raj Sandeep and Garcia Gabriel, et al. studied cardiac arrhythmia beat classification using a PSO-tuned support vector machine (SVM) and obtained an accuracy of 89.10% for five classes [13,38].
Liu Zhishuai used two dimensionality reduction methods, principal component analysis (PCA) and time window selection, to obtain better performance in classifying ECG [39]. In theory, the advantages of PSO can compensate for the shortcomings of BPNN, making the combination suitable for classifying ECG. In this study, we aimed to explore an accurate denoising and feature extraction method for ECG based on wavelets and to perform intelligent modeling to classify AECG with a PSO-optimized BPNN that combines the advantages of BPNN and PSO. Because the training time of deep learning algorithms is too long [27][28][29], feature dimension reduction is also considered, to reduce the complexity of the model, save classification time, and improve accuracy.

BPNN model
For a neural network (NN) model with only one hidden layer, the BPNN process is divided into two stages. The first stage is the forward propagation of the signal, which passes from the input layer through the hidden layer to the output layer. The second stage is the backpropagation of the error from the output layer to the hidden layer and finally to the input layer, which adjusts in turn the weights and biases from the hidden layer to the output layer and the weights and biases from the input layer to the hidden layer. The BP learning algorithm adjusts the weights along the direction of the negative gradient, which is the direction in which the function decreases fastest [40]. The learning process of BPNN is shown in Fig. 1. The weight value is revised by Eq. (1):
$$x_{k+1} = x_k - a_k g_k \qquad (1)$$
where $x_k$ is the weight and threshold matrix, $g_k$ is the gradient of the function, and $a_k$ is the learning rate.
The derivation proceeds as follows [41]. Define $x_i$ as the input layer vector, $y_j$ as the hidden layer vector, $z_l$ as the output layer vector, $\omega_{ji}$ as the weight vector between the input layer and the hidden layer, and $\nu_{lj}$ as the weight vector between the hidden layer and the output layer. When the expected output vector is $t_l$, the outputs of the hidden layer and the output layer are given by Eqs. (2) and (3). The error between the expected output value and the actual output value is given by Eq. (4). The derivative of the error function with respect to the output vector is given by Eq. (5), and the derivative with respect to the hidden vector is given by Eq. (6). The weight correction functions are given by Eqs. (7) and (8). In the supervised learning process of the BP neural network, the error at output node $z_l$ is backpropagated to hidden layer node $y_j$ through the weight $\nu_{lj}$. The corresponding threshold correction formulas are given by Eqs. (9) and (10), where $\eta$ and $\eta'$ are the learning rates of the hidden layer and the output layer, respectively. When the transfer function is of the binary type, $f'(net_l)$ and $f'(net_j)$ are given by Eqs. (11) and (12).
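For reference, one standard form of this derivation that is consistent with the symbols defined above is sketched below. It is a textbook reconstruction rather than the authors' own equations: it assumes the logistic sigmoid as the transfer function $f$ and a squared-error loss, and it omits the threshold terms for brevity. The symbols $\delta_j$ and $\delta_l$ are auxiliary quantities introduced here for illustration.

```latex
\begin{aligned}
net_j &= \sum_i \omega_{ji} x_i, & y_j &= f(net_j) \\
net_l &= \sum_j \nu_{lj} y_j, & z_l &= f(net_l) \\
E &= \tfrac{1}{2} \sum_l (t_l - z_l)^2 \\
\frac{\partial E}{\partial z_l} &= -(t_l - z_l) \\
\frac{\partial E}{\partial y_j} &= -\sum_l (t_l - z_l)\, f'(net_l)\, \nu_{lj} \\
\delta_l &= (t_l - z_l)\, f'(net_l), & \Delta\nu_{lj} &= \eta'\, \delta_l\, y_j \\
\delta_j &= f'(net_j) \sum_l \delta_l\, \nu_{lj}, & \Delta\omega_{ji} &= \eta\, \delta_j\, x_i \\
f(x) &= \frac{1}{1 + e^{-x}}, & f'(net_l) &= z_l (1 - z_l), \quad f'(net_j) = y_j (1 - y_j)
\end{aligned}
```

Under these assumptions the corrections $\Delta\nu_{lj}$ and $\Delta\omega_{ji}$ are the negative-gradient steps of Eq. (1) written per weight, with $\eta'$ and $\eta$ playing the role of the learning rate $a_k$.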

Modeling process of BPNN
An artificial neural network (ANN) is a nonlinear information processing system, rooted in an emerging interdisciplinary field, that was developed to simulate the structure and function of the human brain. A BPNN is a multilayer feed-forward NN, and it is the most widely used ANN. Its most common transfer function is a nonlinear function (the sigmoid function) in the layers before the output layer and a linear function in the output layer. In a BPNN, the signal is transmitted by forward propagation and the error by backward propagation. A BPNN comprises an input layer, a hidden layer, and an output layer. In this study, the structure of the BPNN model includes multiple hidden layers, as shown in Fig. 2. The process of modeling by BPNN is as follows:
We used wavelets with several common basis functions to decompose the collected ECG signals to layer 8 and obtain the corresponding detail and approximation coefficients. From the wavelet principle, the detail coefficients of layers 1 and 2 contain most of the high-frequency noise, and the approximation coefficients of layer 8 contain the baseline drift [42]. We therefore set the detail coefficients of layers 1-2 and the approximation coefficients of layer 8 to 0 to eliminate the noise. After the corresponding wavelet reconstruction, we obtain a denoised signal free of high-frequency noise and baseline drift. To evaluate the filtering effect, we used the mean square error (MSE, smaller is better), the signal-to-noise ratio (SNR, the closer to 1, the better), and the absence of waveform distortion as indexes. The filtering results for wavelets with common basis functions, obtained in Matlab 2019b on the first 10 s of ECG record 100 from the MIT-BIH arrhythmia database, are shown in Table 1.
From the results, the wavelet with the sym2 basis function is the best at producing a smooth, clean ECG while keeping the original information, and its difference in MSE from sym8 is negligible.
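The coefficient-zeroing idea described above can be illustrated with a minimal, dependency-free sketch. It deliberately swaps in the Haar wavelet for sym2 so that the transform fits in a few lines; the function names (`haar_dwt`, `haar_idwt`, `wavelet_filter`, `mse`) are hypothetical, and the sketch is not the Matlab procedure used in the study.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def wavelet_filter(signal, levels=8, noisy_levels=(1, 2)):
    """Decompose to `levels`, zero the chosen detail levels and the final
    approximation (baseline drift), then reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []  # details[0] holds level 1, details[1] holds level 2, ...
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    for level in noisy_levels:  # high-frequency noise
        details[level - 1] = [0.0] * len(details[level - 1])
    approx = [0.0] * len(approx)  # baseline drift
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


def mse(a, b):
    """Mean square error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

With this sketch, a constant offset and a sample-to-sample alternation are both removed, while a component that lives in the level 3 detail coefficients passes through unchanged, which mirrors the roles the text assigns to layers 1-2 and layer 8.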

Heartbeat segmentation and data integration
Recognition of ECG mainly depends on the time differences and amplitudes of the feature waves and on the heart rate [43,44]. In AECG, abnormal heartbeats always arise, and they contain effective information for the diagnosis of heart disease. We therefore segmented the filtered data into cardiac beats according to the labels and then combined beats with the same label to form the data for each type of ECG for further processing. The heartbeats of the four different types of AECG collected are shown in Fig. 3: Fig. 3a is a ventricular premature beat (Vpb), Fig. 3b is a right bundle branch block (Rbbb) beat, Fig. 3c is an atrial premature beat (Apb), and Fig. 3d is a left bundle branch block (Lbbb) beat.
Through heartbeat segmentation, the numbers of beats of the five types of ECG collected from 23 records (records 100-233, each 30 minutes long) are 4615 (Lbbb), 4347 (Rbbb), 1596 (Vpb), 3016 (Apb), and 23,826 (Nb, the heartbeat of normal ECG). To reduce the amount of calculation and equalize the number of beats, we selected the first 30 heartbeats of each type and divided them into two groups for further study.

Feature extraction
The time differences and amplitudes of the characteristic waves and the heart rate are the basis of ECG diagnosis. An ECG mainly includes the P-wave, the QRS complex, the T-wave, and the U-wave. The normal heart rate is between 60 and 100 bpm. The P-wave represents the potential change of atrial depolarization. The PR interval, measured from the beginning of the P-wave to the beginning of the QRS complex, represents the time from the start of atrial depolarization to the start of ventricular depolarization. The QRS complex represents the potential change of ventricular depolarization. The ST segment, the line segment from the end of the QRS complex to the beginning of the T-wave, represents the slow phase of ventricular repolarization. The T-wave represents the potential change during rapid ventricular repolarization. The QT interval, measured from the beginning of the QRS complex to the end of the T-wave, represents the time required for the whole process of ventricular depolarization and repolarization. The U-wave follows the T-wave and represents the ventricular after-potential. Specific changes in the ECG always occur as a disease progresses. In clinical diagnosis and treatment, the indexes used to decide whether an ECG is normal mainly include the heart rate, the PR interval, the QRS interval, and so on.
Accurate detection of the QRS complex is the premise of feature extraction, and detecting the QRS complex requires locating the R-wave first. In this study, we present a sym2 wavelet combined with four operations and an adaptive threshold method to perform feature extraction. Observing the ECG decomposed into 8 layers with the sym2 wavelet, we found that the energy of the R-wave is mainly contained in the detail coefficients of layers 3-5. The decomposed signals at layers 3, 4, and 5 are therefore used for reconstruction. Because the R-wave is oscillatory, it is still difficult to detect even after the QRS complex has been detected. Through further analysis, we found that the negative energy at the Q-wave position in layer 5 nearly counteracts the positive energy at the same position in layers 3 and 4, and that the combined energy of layers 3 and 5 has nearly the same direction as the energy in layer 4. We thus infer that adding layers 3-5 together concentrates the positive energy at the R-wave position while canceling the Q-wave and S-wave, which are not currently considered, and that multiplying the sum of layers 3 and 5 by layer 4 leaves only positive energy concentrated at the R-wave position. Through experiments, we found that applying the four operations to the decomposed signals at layers 3-5 enhances the R-wave, as given by Eqs. (13) and (14), where $d_3$, $d_4$, and $d_5$ represent the decomposed signals in layers 3, 4, and 5, respectively.
After the four operations, the energy of the R-wave is enriched and the other characteristic waves are eliminated, so only the position of the R-wave is retained. To avoid missed and false R-wave detections, an adaptive threshold was used to extract the R-wave as follows: first, set a fixed window whose width and step length are both 215 epochs; then, obtain the maximum value within the window and use 60% of it as the threshold; lastly, slide the window to extract the R-wave. To evaluate the R-wave extraction, we use sensitivity (Se) and precision (P), calculated by Eq. (15). The results on the first group of collected data for the five types of ECG are shown in Table 2. Because Se and P are above 93% for all five types of ECG, we consider the sym2 wavelet combined with four operations and the adaptive threshold method suitable for extracting the R-wave. In the same way as for the R-wave, we located all the other feature waves, namely the P-wave, Q-wave, S-wave, and T-wave. The time differences and amplitudes of the characteristic waves and the heart rate can then be calculated.
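The windowed adaptive threshold and the two evaluation indexes can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: a candidate is accepted only if it is a local maximum at or above the threshold, Se and P use the usual definitions TP / (TP + FN) and TP / (TP + FP), and a detection counts as correct when it lies within a hypothetical `tolerance` of a reference annotation. The function names are likewise hypothetical.

```python
def detect_r_peaks(enhanced, window=215, ratio=0.6):
    """Slide a fixed window (step equal to its width) over the enhanced signal
    and keep local maxima that reach `ratio` times the window maximum."""
    peaks = []
    n = len(enhanced)
    for start in range(0, n, window):
        stop = min(start + window, n)
        window_max = max(enhanced[start:stop])
        if window_max <= 0:
            continue  # no positive energy in this window, so no R-wave
        threshold = ratio * window_max
        for i in range(start, stop):
            left = enhanced[i - 1] if i > 0 else float("-inf")
            right = enhanced[i + 1] if i < n - 1 else float("-inf")
            if enhanced[i] >= threshold and enhanced[i] > left and enhanced[i] >= right:
                peaks.append(i)
    return peaks


def sensitivity_precision(detected, reference, tolerance=5):
    """Return (Se, P), matching each reference peak to at most one detection."""
    unmatched = list(detected)
    true_pos = 0
    for ref in reference:
        match = next((d for d in unmatched if abs(d - ref) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_pos += 1
    false_neg = len(reference) - true_pos
    false_pos = len(unmatched)
    se = true_pos / (true_pos + false_neg) if reference else 0.0
    p = true_pos / (true_pos + false_pos) if detected else 0.0
    return se, p
```

Because the threshold is recomputed in every window, a beat with a smaller R-wave amplitude in a later window is still found, while a low bump that shares a window with a true R-wave is rejected.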

Fig. 4 Flow chart of feature wave location
After all the characteristic waves are extracted, we obtain the feature vector, which consists of the time differences, amplitudes, and heart rate extracted from the ECG, as shown in Eq. (16),
where PQ, PR, PS, PT, QR, QS, QT, RS, RT, and ST denote the time differences between the peaks of the two named waves (for example, PQ is the time difference between the peak of the P-wave and the peak of the Q-wave); ampP, ampQ, ampR, ampS, and ampT denote the peak values of the P-wave, Q-wave, R-wave, S-wave, and T-wave, respectively; and H denotes the heart rate of the ECG.
Through the above procedure, the features of 30 sets of each type of ECG were extracted. The feature data of each type of ECG were divided into two groups, and the corresponding groups were combined to form the training data set and the test data set for BPNN modeling. Before modeling, the data were normalized. The BPNN model for classifying AECG was then established following the BPNN modeling process.
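As an illustration of how the 16-element feature vector described above could be assembled from located peaks, consider the following sketch. The function names and the dictionary layout are hypothetical, and min-max scaling is an assumption: the text states only that the data were normalized, not which method was used.

```python
from itertools import combinations

WAVES = ("P", "Q", "R", "S", "T")


def build_feature_vector(peak_times, peak_amps, heart_rate):
    """Return [PQ, PR, PS, PT, QR, QS, QT, RS, RT, ST,
    ampP, ampQ, ampR, ampS, ampT, H] for one heartbeat."""
    # combinations() yields the wave pairs in exactly the order listed above.
    intervals = [peak_times[b] - peak_times[a] for a, b in combinations(WAVES, 2)]
    amplitudes = [peak_amps[w] for w in WAVES]
    return intervals + amplitudes + [heart_rate]


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

A single beat with peak times in seconds and amplitudes in millivolts would be passed as two dictionaries keyed by wave name, and the resulting rows for all beats would be normalized together before being fed to the network.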

PSO-BPNN model

PSO algorithm
The PSO algorithm originated from research on artificial intelligence and the hunting behavior of birds [45]. Based on a global search strategy over the population, the PSO algorithm optimizes through cooperation and competition among the particles of the population [46]. PSO has been widely used in many fields such as function optimization, image processing, and prediction, owing to its simple operation, fast convergence, and global optimization capability [47,48]. The particle swarm optimization algorithm is described as follows [49]. In an n-dimensional search space, m particles form a population $X = (X_1, X_2, \ldots, X_m)^T$. The position of the i-th particle is $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})^T$, its velocity is $V_i = (V_{i1}, V_{i2}, \ldots, V_{in})^T$, its individual extreme value is $P_i = (P_{i1}, P_{i2}, \ldots, P_{in})^T$, and the global extreme value of the population is $P_g = (P_{g1}, P_{g2}, \ldots, P_{gn})^T$. After finding the individual extreme value and the global extreme value, the particle updates its velocity and position according to Eqs. (17) and (18), respectively:
$$V_{id}^{k+1} = V_{id}^{k} + c_1\,\mathrm{rand}()\,(P_{id}^{k} - X_{id}^{k}) + c_2\,\mathrm{rand}()\,(P_{gd}^{k} - X_{id}^{k}) \qquad (17)$$
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1} \qquad (18)$$
where $c_1$ and $c_2$ are non-negative constants called learning factors, rand() is a random number in (0,1), $V_{id}^{k}$ and $X_{id}^{k}$ are the velocity and position of particle i in dimension d at the k-th iteration, $P_{id}^{k}$ is the position of the individual extreme value of particle i in dimension d, and $P_{gd}^{k}$ is the position of the global extreme value of the population in dimension d.

Initial weights and thresholds optimization of BPNN
In this study, we use the PSO algorithm with introduced speed adjustment factors to optimize the initial weights and thresholds of the conventional BP neural network model. In this way, the optimized BPNN model has both the global optimization ability of the PSO algorithm and the local search ability of the BP algorithm.
To accelerate convergence and balance the global and local search velocity of the particles, we introduce an inertia weight w and a shrinkage factor k into the above PSO algorithm. The particle then updates its speed according to Eq. (19).
We use the newff function to create the network object:
net = newff(PR, [S1 S2 ... SN], {TF1 TF2 ... TFN}, BTF, BLF, PF)
where PR is an R × 2 matrix that defines the minimum and maximum values of the R input vectors; Si is the number of neurons in layer i; TFi is the transfer function of layer i, with tansig as the default; BTF is the training function, with trainlm as the default, although traingdx is more applicable to pattern classification; BLF is the weight/threshold learning function, with learngdm as the default; and PF is the performance function, with MSE as the default. The training network and parameters are as follows:
net = newff(pr, [5,16]
The steps by which the PSO algorithm optimizes the conventional BPNN are shown in Fig. 5. Following the BPNN modeling process, we adopt the same data set for PSO-BPNN modeling. The initial and training parameters used in modeling are shown in Table 3.
where $\hat{y}_i$ is the predicted sequence and $y_i$ is the sequence of true values.
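A PSO loop with an inertia weight and a shrinkage factor can be sketched as below. The exact form of the velocity update used in the study is not reproduced here; the sketch assumes one common arrangement in which the shrinkage factor `k` scales the whole update and the inertia weight `w` scales the previous velocity. The function name, all default parameter values, the position bounds, and the stand-in fitness function are assumptions; in the setting described above, the fitness would instead be the network error obtained when a particle's coordinates are used as the initial weights and thresholds.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, iterations=100,
                 w=0.7, k=0.729, c1=2.0, c2=2.0, bounds=(-1.0, 1.0), seed=0):
    """Minimize `fitness` over `dim` dimensions; return (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # individual extreme values
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global extreme value

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Assumed update: shrinkage factor applied to inertia,
                # cognitive, and social terms together.
                vel[i][d] = k * (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = fitness(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val
```

For example, `pso_minimize(lambda p: sum(v * v for v in p), dim=3)` searches for the minimum of a simple quadratic bowl; the returned position would then serve as the starting point that gradient-based BP training refines.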

Feature dimension reduction and impact on modeling analysis
PCA is one of the most widely used data dimension reduction algorithms. Its main idea is to map the N-dimensional features onto K dimensions, which are new orthogonal features known as principal components, reconstructed from the original N-dimensional features [50]. In this study, to reduce the computation and network complexity in classification, we used PCA to directly reduce the dimensions of the heartbeat information of each type of ECG and form the feature matrix. In theory, it is acceptable to choose the dimension at which the accumulative contribution a > 85% [51]. To keep as much information as possible, we chose the dimensions at which a > 85% and a > 99%. The remaining dimensions, which contain nearly all the information, were then used to establish models with BPNN and PSO-BPNN and to analyze the impact on modeling.
Accuracy is evaluated from $X'$, the number of predicted values equal to the actual values, and $X$, the number of actual values. In the experiments, the accuracy of the BPNN model was only 77.33%, lower than the 81% obtained with a convolutional neural network (CNN) [52], and the time spent was 1334 s. This does not meet the demands of practical application.
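The accumulative-contribution rule and the accuracy measure described above can be sketched in a few lines. The sketch assumes the eigenvalues of the feature covariance matrix (the variance carried by each principal component) are already available, and it takes accuracy to be the ratio of correctly predicted labels to all labels; the function names and the example eigenvalues are hypothetical.

```python
def components_for_contribution(eigenvalues, threshold):
    """Smallest number of principal components whose accumulative
    contribution (share of total variance) exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return count
    return len(ordered)


def accuracy(predicted, actual):
    """Fraction of predicted labels that equal the actual labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For illustrative eigenvalues `[5.0, 3.0, 1.0, 0.5, 0.5]`, the accumulative contributions are 0.5, 0.8, 0.9, 0.95, and 1.0, so a threshold of 0.85 keeps three components and a threshold of 0.99 keeps all five. A stricter threshold therefore retains more dimensions, which is the trade-off between information kept and model complexity discussed in the text.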

PSO-BPNN model validation and performance analysis
To evaluate the PSO-BPNN model and compare it with the BPNN model, we used the same data and platform. The results are shown in Fig. 7. The accuracy of the PSO-BPNN model reaches 96%, and the time is reduced to 899 s. Although the accuracy meets the demands of practical application, the time is still too long for real-time diagnosis.

Dimension reduction and impact on modeling analysis
The results of dimension reduction are shown in Table 4. When a > 99%, the feature dimension is reduced to half of the original feature dimension.
Therefore, to analyze the impact on modeling time, seven feature dimensions were selected for modeling by BPNN and PSO-BPNN, respectively. The results are shown in Figs. 8 and 9.
The accuracy of the BPNN model rises to 80% and the time spent falls to 91 s. The accuracy of the PSO-BPNN model rises to 97% and the time spent falls to 65 s.

Discussion
In this study, through experiments on the same platform with the same data, the PSO-BPNN model improved the accuracy and reduced the time cost compared with the BPNN model. This indicates that the PSO algorithm compensates for the defects of BPNN, which is sensitive to the initial weights of the network and prone to local minima. However, classification is still too slow for practical application. The main reason may be what other scholars have found, namely that the training time of deep learning algorithms is too long [13,37,38]. Apart from the parameter adjustment and convergence rate of the BP neural network, the modeling data always contain some redundant information, and too many inputs complicate the model, which is another reason for the longer time. To further study this cause and the applicability of the PSO-BPNN model to arrhythmia classification, PCA, a feature dimension reduction algorithm, was used. The reduced time and improved accuracy after dimension reduction indicate that data redundancy increases the complexity of the model. The PSO-BPNN model established in this study is suitable for ECG classification.
Several limitations still exist in this study. First, future work will aim to establish data communication combined with more capable computer software to form a more complete, advanced, and easy-to-operate system. Second, the impact of the learning rate on the NN has not been studied sufficiently. In general, if the learning rate is too high, the learning process is unstable; if it is too low, training takes a long time. Selecting an appropriate learning rate is therefore the next step in optimizing the NN. In addition, the classification precision and time were analyzed qualitatively according to the given error.
In future work, more test data will be added for model training to obtain a more complete, more accurate, and faster soft measurement model of ECG classification.

Conclusions
Algorithm selection has a significant impact on the accuracy of classification results. This study employs wavelet transform combined with four operations and an adaptive threshold method to filter the ECG and extract its features. A BPNN algorithm is then employed to analyze and classify the ECG, because BPNN can make use of several optimization methods. To make up for its defects, the PSO algorithm is employed; it is expected to enhance the accuracy of the prediction results and to reduce the errors that arise during experimentation from flawed model design and hinder the final determination of ECG. The experiments show that the PSO-optimized BPNN intelligent model achieves greater accuracy and better classification results than the conventional BPNN model.
In addition, to analyze the reason for the long classification time, PCA was adopted to reduce the feature dimension and re-evaluate the performance of the BPNN and PSO-BPNN models. The results show that the PCA algorithm can effectively extract the key feature dimensions and reduce the complexity of modeling. After dimension reduction, both the BPNN and PSO-BPNN intelligent classification models spent less classification time while achieving higher accuracy. The PSO-BPNN intelligent classification model performs better than the BPNN model in identifying the five types of ECG. In conclusion, the PSO-BPNN intelligent classification model is a suitable method for recognizing ECG and provides a tool for the diagnosis of heart disease.