Atrial fibrillation classification based on convolutional neural networks

Background The global age-adjusted mortality rate related to atrial fibrillation (AF) registered a rapid growth in the last four decades, i.e., from 0.8 to 1.6 and 0.9 to 1.7 per 100,000 for men and women during 1990–2010, respectively. In this context, this study uses convolutional neural networks for classifying (diagnosing) AF, employing electrocardiogram data in a general hospital. Methods Data came from Anam Hospital in Seoul, Korea, with 20,000 unique patients (10,000 normal sinus rhythm and 10,000 AF). 30 convolutional neural networks were applied and compared for the diagnosis of the normal sinus rhythm vs. AF condition: 6 Alex networks with 5 convolutional layers, 3 fully connected layers and the number of kernels changing from 3 to 256; and 24 residual networks with the number of residuals blocks (or kernels) varying from 8 to 2 (or 64 to 2). Results In terms of the accuracy, the best Alex network was one with 24 initial kernels (i.e., kernels in the first layer), 5,268,818 parameters and the training time of 89 s (0.997), while the best residual network was one with 6 residual blocks, 32 initial kernels, 248,418 parameters and the training time of 253 s (0.999). In general, the performance of the residual network improved as the number of its residual blocks (its depth) increased. Conclusion For AF diagnosis, the residual network might be a good model with higher accuracy and fewer parameters than its Alex-network counterparts.


Background
Heart disease is the leading cause of disease burden in the world and Korea [1][2][3][4][5][6]. Cardiovascular disease accounted for the greatest part of global mortality in Year 2013 (Y2013 hereafter), i.e., 32% (17 million) of 54 million deaths in the world [1]. The global age-adjusted mortality rate per 100,000 related to atrial fibrillation (AF), the most common form of irregular heartbeat, registered a rapid growth from 0.8 to 1.6 (or 0.9 to 1.7) for men (or women) during 1990-2010 [2]. This global pattern is consistent with its local counterpart in Korea. Heart disease was the second cause of death in Korea for Y2016 (58.2 per 100, 000) [3] and the third cause of disease burden in the nation for Y2010 (562 disease-adjusted life years per 100, 000) [4]. Indeed, hospitalization for AF in the nation increased by 420% from 767 to 3986 per 1 million from Y2006 to Y2015 [5].
In the context above, an increasing amount of research has used deep neural networks to classify (diagnose) AF and other types of arrhythmia, given their superior performance compared to other machine learning methods [6][7][8][9][10][11][12]. This line of research applied convolutional neural networks (i.e., Alex, Residual) [6][7][8][9][10], their recurrent counterparts (i.e., long short term memory) [11] or both [12] to achieve the accuracy range of 80-99% with varying numbers of class numbers for electrocardiogram data. Most of these studies employed public data such as Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia data. Also, more comparison might be needed for a variety of deep neural networks with different degrees of their depth (i.e., numbers of their layers) and varying numbers of their kernels for the diagnosis of arrhythmia.
For this reason, this study used electrocardiogram (ECG) data in a general hospital and various convolutional neural networks for diagnosing arrhythmia. ECG is a graph of heartbeat in voltage versus time recorded by electrodes on the chest and the limbs. A normal ECG wave consists of five parts, i.e., P (atrial contraction), Q (downward deflection immediately before ventricular contraction), R (the peak of ventricular contraction), S (downward deflection immediately after ventricular contraction) and T (ventricular recovery) (Additional file 1: Figure S1A). Its AF counterpart shows an irregular pattern, for example, lacking a P part with an irregularly irregular QRS part (Additional file 1: Figure S1B). These ECG waves are arranged in a grid of four columns and three rows, i.e., the first column for "limb leads" (or voltage differences measured by limb electronodes) [I, II, III in Additional file 1: Figure S2], the second column for "augmented limb leads" (or voltage differences measured by limb electronodes with a different combination so called Goldberger's central terminal) [aVR, aVL, aVF in Additional file 1: Figure S2] and the last two columns for "precordial leads" (or voltage differences measured by chest electronodes) [V1-V6 in Additional file 1: Figure S2]. Based on the ECG data in a general hospital, 30 convolutional neural networks were applied and compared in this study for the diagnosis of the normal sinus rhythm (NSR) vs. AF condition: 6 Alex networks with 5 convolutional layers, 3 fully connected layers and the number of kernels changing from 3 to 256; and 24 residual networks with the number of residuals blocks (or kernels) varying from 8 to 2 (or 64 to 2).

Methods
Data came from Anam Hospital in Seoul, Korea, with 20,000 unique participants (10,000 NSR and 10,000 AF). Preprocessing processes for the ECG data are shown in Additional file 1: Figures S2A, B and C, i.e., removing the background grid, selecting target signals and getting numerical values, respectively. Here, selecting target signals consists of three sub-processes based on OpenCV functions such as connectedComponents: (1) computing connected components in a binary image with 8connectivity; (2) computing the bounding rectangle for each connected component; and (3) selecting the component whose bounding rectangle has the longest width. In Tables 1 and 2, input dimensions and the number of kernels are described for 30 convolutional neural networks (i.e., 6 Alex networks and 24 residual networks) for the diagnosis of the NSR vs. AF condition in this study: Alex 1-6 with 5 convolutional layers, 3 fully connected layers and the number of kernels changing from 3 to 256; and Residual 1-1 to Residual 4-6 with the number of residuals blocks (or kernels) varying from 8 to 2 (or 64 to 2). The original Alex network, which consists of 5 convolutional layers and 3 fully connected layers with Rectified Linear Unit (ReLU) activation functions, topped the 2012 ImageNet Large Scale Visual Recognition Challenge and demonstrated its superior performance over its traditional sigmoid activation function counterparts [13]. In convolutional layers of the Alex network, a kernel (or feature detector) slides across input data and operates "convolution", i.e., calculating the dot product of its elements and their input-data counterparts, detecting specific features of the input data, e.g., the shape of a dog's ear which differentiates it from a cat.
Then, many scholars tried to improve the original Alex network by deepening it (or adding more layers to it). However, this attempt turned out to be futile given that it brings back the old problem of gradient vanishing (the gradient of the loss with respect to the weight becomes 0 quickly) [14]. For this reason, several scholars introduced the original residual network with new features of residual mapping and shortcut connection, which managed its  [15]. Residual mapping and shortcut connection are a way of avoiding additional parameters and extra model complexity both by using simpler residual functions instead of their more complicated originals and skipping one or more layers. These methods contributed for the original residual network to achieve a lower error than and eight times as many layers as the Virtual Geometry Group network, the winner of the 2014 ImageNet Large Scale Visual Recognition Challenge, i.e., 3.57% and 152 layers, respectively. This study modified the original Alex and residual networks by changing the input, output and kernel dimensions from 224x224x3, 1000 (multi-class) and 3 (color image) to 1x2000x1, 2 (binary-class) and 2 (signal), respectively. All ECGs were reviewed manually by two cardiologists in the hospital. This retrospective study got approved by the Institutional Review Board of Korea University Anam Hospital on February 12, 2018 (2018AN0037). Informed consent was waived by the IRB given that data were de-identified.

Results
Accuracy measures, epoch numbers and training time for the 30 convolutional networks in this study are displayed in Table 3. In terms of the accuracy, the best network among Alex 1-6 was Alex 3 (0.997) with 24 initial kernels (i.e., kernels in the first layer), 5,268,818 parameters and the training time of 89 s while the best network among Residual 1-1, …, Residual 4-6 was Residual 2-2 (0.999) with 6 residual blocks, 32 initial kernels, 248,418 parameters and the training time of 253 s. Fig. 1 how the accuracy of the residual network changes as the numbers of residual blocks and initial kernels change. The performance of the residual network improved as the number of its residual blocks (its depth) increased. The results in Table 3 and Fig. 1 suggest that (1) the number of kernels might not be as significant as that of residual blocks in the case of the residual network and (2) the residual network might be a good model for AF diagnosis with higher accuracy and fewer parameters than its Alex-network counterparts.

Main finding of this study
In terms of the accuracy, the best Alex network was one with 24 initial kernels (i.e., kernels in the first layer), 5, 268,818 parameters and the training time of 89 s (0.997) while the best residual network was one with 6 residual blocks, 32 initial kernels, 248,418 parameters and the training time of 253 s (0.999). In general, the performance of the residual network improved as the number of its residual blocks (its depth) increased.

What is already known on this topic
An increasing amount of research has used deep neural networks to diagnose AF and other types of arrhythmia, given their superior performance compared to other machine learning methods. This line of research applied convolutional neural networks (i.e., Alex, Residual), their recurrent counterparts (i.e., long short term memory) or both to achieve the accuracy range of 80-99% with varying numbers of class numbers for ECG data. Most of these studies employed public data such as MIT-BIH arrhythmia data.

What this study adds
Based on ECG data in a general hospital, this study used more various convolutional neural networks and achieved higher accuracy measures compared to the existing literature for diagnosing arrhythmia as the basis of clinical decision support [6][7][8][9][10]. Specifically, 30 convolutional neural networks were applied and compared for the diagnosis of the NSR vs. AF condition with the accuracy range of 99.00-99.99%: 6 Alex networks with the number of kernels changing from 3 to 256; and 24 residual networks with the number of residuals blocks (or kernels) varying from 8 to 2 (or 64 to 2). Eight cases misspecified by the best residual networks in this study are shown in Additional file 1: Figures S3A -H, e.g., Additional file 1: Figure S3A, AF misspecified as normal by Residual 1-3, 1-4, 3-1 and 3-2. Indeed, six cases specified correctly by all residual networks in this study are shown in Additional file 1: Figures S3I-N. According to these figures, it can be noted that convolutional neural networks find some regular patterns human experts miss. For example, it might be the case in Additional file 1: Figure S3C that (1) the convolutional neural network took four or five beats as a basic unit and predicted the signal as NSR but (2) the human expert considered a single beat as a basic unit and made an opposite diagnosis of AF. It will be an interesting and important topic to understand better how deep neural networks look at signal data and make a diagnosis.

Limitations of this study
Firstly, this study focused on binary diagnosis of the NSR vs. AF condition. Expanding this study to other arrhythmia conditions might add a great contribution to this line of research. Secondly, comparisons between the convolutional neural networks and their recurrent counterparts in terms of model performance and training time might expand the horizon of research on this topic. Thirdly, a recent review indicates that the standardization of ECG diagnostic criteria is expected to improve the agreement of clinical experts and the performance of computer algorithms regarding ECG interpretation [16]. Much more effort needs to be made in this direction, given that even experienced clinical experts, the gold standard, often disagree in their ECG interpretation. Finally, this study used the training and test sets only. Including the validation set (whose element is not preselected into two rhythm types) might be an important next step for advancing science and its clinical practice.

Conclusions
For AF diagnosis, the residual network might be a good model with higher accuracy and fewer parameters than its Alex-network counterparts.
Additional file 1: Figure S1. Electrocardiogram Wave. A Normal. B Atrial Fibrillation vs. Normal. The atrial-fibrillation rhythm in the top does not have a P wave (purple arrow) of the normal rhythm in the bottom.