A new methodology for assessment of the performance of heartbeat classification systems
- John M Darrington^{1} and
- Livia C Hool^{2}
https://doi.org/10.1186/1472-6947-8-7
© Darrington and Hool; licensee BioMed Central Ltd. 2008
Received: 10 October 2007
Accepted: 30 January 2008
Published: 30 January 2008
Abstract
Background
The literature presents many different algorithms for classifying heartbeats from ECG signals. Classifier performance is normally presented in terms of sensitivity, specificity or other metrics describing the proportion of correct versus incorrect beat classifications. From the clinician's point of view, however, such metrics are insufficient to rate the performance of a classifier.
Methods
We propose a new methodology for the presentation of classifier performance, based on Bayesian classification theory. Our proposal lets investigators report their findings in terms of beat-by-beat comparisons, and defers the assessment of the classifier's utility to the statistician. Evaluation of the classifier's utility must be undertaken in conjunction with the set of relative costs applicable to the clinician's application. Such an evaluation produces a metric better tuned to the specific application, whilst preserving the information in the results.
Results
By way of demonstration, we propose a set of costs, based on clinical data from the literature, and examine the results of two published classifiers using our method. We make recommendations for reporting classifier performance, such that this method can be used for subsequent evaluation.
Conclusion
The proportion of misclassified beats contains insufficient information to fully evaluate a classifier. Performance reports should include a table of beat-by-beat comparisons, showing not only the number of misclassifications, but also the identity of the classes involved in each incorrect classification.
Background
In recent years there has been a surge of interest in computer implementations of automatic beat classification algorithms. The impetus for this research stems partly from advances in, and the miniaturisation of, electronics, which allow portable, wearable and implantable devices to provide greater functionality than was achievable in the past, and partly from the desire to automate tasks currently performed by intensive care and operating room staff.
There are several principles on which classifiers operate, and many variations and implementations of each. With the large number of published algorithms comes the need to analyse and compare their performance. The literature to date compares techniques on a low-level, parameter-by-parameter basis [1–3]. ANSI standard EC57 [4] attempts to formalise methods for reporting such comparisons. These comparisons are of interest to those developing new algorithms or enhancing existing ones, but are of little interest to a clinician deciding which algorithm suits their purpose.
Problems with current classification methods
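Classifier performance is conventionally reported in terms of sensitivity (Se), specificity (Sp) and positive predictivity (+P):
$Se = \frac{TP}{TP + FN}$, $Sp = \frac{TN}{TN + FP}$, $+P = \frac{TP}{TP + FP}$
where: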
TP = number of true positives
FP = number of false positives
TN = number of true negatives
FN = number of false negatives.
However, these measures are defined only for binary classification, and do not readily extend to problems involving more than two classes, $\{\omega_1, \omega_2, \ldots, \omega_n\}$. Nevertheless, beat classifier performance reports in the literature often quote sensitivity and specificity freely. Although the definitions of these measures in this context are rarely given, they appear to follow these extended definitions:
$TP_j$ = number of beats correctly identified as belonging to class $\omega_j$
$FP_j$ = number of beats incorrectly identified as belonging to class $\omega_j$
$TN_j$ = number of beats correctly identified as belonging to a class other than $\omega_j$
$FN_j$ = number of beats incorrectly identified as belonging to a class other than $\omega_j$
and hence $Se_j$, $Sp_j$ and $+P_j$ can be defined accordingly.
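To make the extended definitions concrete, the sketch below computes $Se_j$, $Sp_j$ and $+P_j$ from an $n \times n$ matrix of true versus assigned classes. It treats each class in turn as "positive" and all other classes as "negative". The function name and both example matrices are hypothetical. The example also shows that two matrices which confuse different pairs of classes can produce identical per-class figures.

```python
from typing import Dict, List


def one_vs_rest_stats(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class Se, Sp and +P from an n x n class conditional matrix.

    matrix[j][k] is the number of beats of true class j that were labelled class k.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    stats = []
    for j in range(n):
        tp = matrix[j][j]
        fn = sum(matrix[j]) - tp                       # class-j beats given another label
        fp = sum(matrix[i][j] for i in range(n)) - tp  # other beats labelled as class j
        tn = total - tp - fn - fp                      # beats neither of, nor labelled as, class j
        stats.append({
            "Se": tp / (tp + fn) if tp + fn else float("nan"),
            "Sp": tn / (tn + fp) if tn + fp else float("nan"),
            "+P": tp / (tp + fp) if tp + fp else float("nan"),
        })
    return stats


# Two hypothetical 3-class matrices with the same diagonal and the same row and
# column totals, but different patterns of confusion between classes.
m1 = [[90, 8, 2], [5, 15, 0], [1, 1, 28]]
m2 = [[90, 9, 1], [4, 15, 1], [2, 0, 28]]

if __name__ == "__main__":
    print(one_vs_rest_stats(m1) == one_vs_rest_stats(m2))  # True
```

In this sketch the per-class figures follow directly from the matrix. The reverse does not hold: `m1` and `m2` give identical statistics, yet they misclassify beats into different classes.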
A number of problems become apparent when using such statistics to evaluate the performance of beat classifiers:
1. They do not take into account the a priori probabilities of the beat classes.
2. They do not take into account the relative costs of false classification.
3. They can be presented only as a multi-dimensional value, even where only two classes are being considered. There is no obvious single ordinal value.
Problem 1 has been recognised in the medical literature [6]. We are not aware of any previous attempt to deal with problem 2. Problem 3 makes these reports particularly unhelpful to a clinician trying to compare systems with a view to adopting one. For an n-class classifier there are 2n scalar quantities, so there is no natural way to rank classifiers using these quantities alone.
We propose a new method, which overcomes these problems and aims to be generally useful for the quantitative comparison of beat classification schemes.
Proposed methodology
A system's utility as a prognostic medical tool is a measure of the benefit afforded by selecting it in preference to the alternatives. Choosing a system involves maximising the benefit or, alternatively, minimising the risk. The overall risk associated with making decisions based on the output of a beat classifier is therefore a useful measure of its performance. Risk is characterised by the probability of error and the costs of decisions based on erroneous classifications. We have used Bayesian decision theory to derive a method of calculating the risk associated with a beat classifier.
Introduction to Bayesian risk
Thus, a perfect classifier has a $\widehat{\mathcal{R}}$ value of zero, and the worst possible classifier has a value of unity.
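The following is a minimal sketch that assumes the standard Bayesian decision-theory definitions. Under these, the conditional risk of decision $\alpha_k$ is $R(\alpha_k) = \sum_j \lambda(\alpha_k|\omega_j) P(\omega_j|\alpha_k)$, and the overall risk is $\mathcal{R} = \sum_k R(\alpha_k) P(\alpha_k)$. The class conditional frequencies are normalised per true class to estimate $P(\alpha_k|\omega_j)$. This lets the a priori probabilities $P(\omega_j)$ of the target population be applied, rather than the class mix of the test set. The function name and argument layout are hypothetical. A normalised value would then be $\widehat{\mathcal{R}} = \mathcal{R}/\mathcal{R}_{\max}$, assuming $\mathcal{R}_{\max}$ is computed separately.

```python
from typing import List, Tuple


def bayes_risk(confusion: List[List[float]],
               priors: List[float],
               costs: List[List[float]]) -> Tuple[List[float], float]:
    """Conditional risks R(alpha_k) and overall risk for an n-class classifier.

    confusion[j][k]: test beats of true class omega_j assigned decision alpha_k
    priors[j]:       a priori probability P(omega_j)
    costs[k][j]:     lambda(alpha_k | omega_j); rows are decisions, columns true classes
    """
    n = len(priors)
    # P(alpha_k | omega_j): normalise each true-class row of the matrix.
    # A class with no test beats (row sum 0) is given zero probability here.
    p_a_given_w = []
    for row in confusion:
        total = sum(row)
        p_a_given_w.append([c / total if total else 0.0 for c in row])

    # Joint P(alpha_k, omega_j) = P(alpha_k | omega_j) P(omega_j), indexed [j][k].
    joint = [[p_a_given_w[j][k] * priors[j] for k in range(n)] for j in range(n)]
    # Marginal probability of each decision, P(alpha_k).
    p_a = [sum(joint[j][k] for j in range(n)) for k in range(n)]

    cond = []
    for k in range(n):
        if p_a[k] == 0.0:
            cond.append(0.0)  # decision never taken; contributes nothing to the total
            continue
        # R(alpha_k) = sum_j lambda(alpha_k|omega_j) P(omega_j|alpha_k), where
        # P(omega_j|alpha_k) = P(alpha_k, omega_j) / P(alpha_k) by Bayes' rule.
        cond.append(sum(costs[k][j] * joint[j][k] / p_a[k] for j in range(n)))

    overall = sum(cond[k] * p_a[k] for k in range(n))
    return cond, overall
```

For a hypothetical two-class case with `confusion = [[90, 10], [5, 15]]`, `priors = [0.9, 0.1]` and `costs = [[0, 10], [1, 0]]`, the overall risk is $10 \times 0.025 + 1 \times 0.09 = 0.34$. A perfect classifier gives zero.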
Results
ANSI EC57 section 4.3 identifies five classes of beats which are recommended for performance reports, viz: normal beats, Supra-Ventricular Ectopic Beats, Ventricular Ectopic Beats, fusions of normal and Ventricular Ectopic Beats, and other unclassified beats. From secondary data sources, we derived a priori probabilities and decision costs for these classes. The derivation and the data sources are described in detail in the Methods section.
A priori probabilities derived from the MIT-BIH Arrhythmia Database.
j | $\omega_j$ | $c_j$ | $P(\omega_j)$ |
---|---|---|---|
0 | Not a Beat | 577 | n/a |
1 (N) | Normal Beat | 46097 | 0.9670 |
2 (SVEB) | Supra-Ventricular Ectopic Beat | 192 | 0.0040 |
3 (VEB) | Ventricular Ectopic Beat | 1345 | 0.0280 |
4 (F) | Fusion of Normal and VEB | 13 | 0.0002 |
5 (Q) | Unclassified Beat | 0 | 0.0000 |
Costs of false classification, $\lambda(\alpha_k|\omega_j)$, in 1000s of AUD (rows: decision $\alpha_k$; columns: true class $\omega_j$)
$\lambda$ | $\omega_N$ | $\omega_{SVEB}$ | $\omega_{VEB}$ | $\omega_F$ | $\omega_Q$ |
---|---|---|---|---|---|
$\alpha_N$ | 0 | 38.63^{1} | 170.19^{2} | 170.19^{3} | 0 |
$\alpha_{SVEB}$ | 0 | 0 | 170.19^{4} | 170.19^{5} | 0 |
$\alpha_{VEB}$ | 2.15^{6} | 2.15^{7} | 0 | 0 | 0 |
$\alpha_F$ | 2.15^{8} | 2.15^{9} | 0 | 0 | 0 |
$\alpha_Q$ | 0 | 0 | 0 | 0 | 0 |
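Comparative performance of two classifiers ($R(\alpha_k)$ and $\mathcal{R}$ in 1000s of AUD)
Measure | de Chazal et al. [9] | Melo et al. [10] |
---|---|---|
$R(\alpha_N)$ | 2.05 | 1.67 |
$R(\alpha_{SVEB})$ | 65.46 | 17.91 |
$R(\alpha_{VEB})$ | 0.07 | 0.03 |
$R(\alpha_F)$ | 1.79 | 0.71 |
$\mathcal{R}$ | 6.67 | 1.71 |
$\widehat{\mathcal{R}}$ | 0.126 | 0.033 |
Here $R(\alpha_k) = \sum_{j=1}^{n} \lambda(\alpha_k|\omega_j) P(\omega_j|\alpha_k)$ is the conditional risk of decision $\alpha_k$, $\mathcal{R}$ is the overall risk and $\widehat{\mathcal{R}}$ is its normalised value.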
Discussion
We do not presume that the costs presented herein, or the propositions used in their calculation are universally applicable. Rather, we seek to demonstrate how, given the class conditional matrix, an ordinal measure for a beat classifier may be determined, applicable to any particular situation. Others might disagree with our cost calculation methods, or the application might demand consideration for classes other than those we have investigated. Whilst we have used monetary units to measure costs, we recognise the ethical issues raised by doing so, and our methodology imposes no requirement on the nature of the units of cost. Any unit acceptable to the community of interest may be used. In such cases, given the class conditional matrix, potential users may conduct their own studies and assess the performance of the classifier using the method we have described.
To facilitate such studies, we urge biomedical engineers to report not only the relative numbers of correctly versus incorrectly classified beats, but also the identity of the misclassified beats, in the form of a class conditional matrix. ANSI EC57 describes how to compile such a matrix, but makes no recommendation for its publication. Sensitivity and specificity are trivial to calculate from such a matrix if desired, and the matrix also allows more useful measures of performance, such as those described herein, to be derived. We recommend publication of the class conditional matrix and/or a table of beat-by-beat comparisons.
Conclusion
The utility of a beat classifier cannot be fully quantified in terms of the numbers of correctly and incorrectly classified beats. Instead, the number of misclassifications between each pair of classes is required. Together with the a priori probabilities and the costs of misclassification, these allow quantitative measures of a classifier's utility to be determined.
A system which claims to classify beats into more than two classes is not a binary classifier, and its performance should not be reported as if it were. Instead of reporting sensitivity/specificity/predictivity for each of the n classes, an n × n matrix of beat classifications (the class conditional frequencies) should be reported.
Clinicians wishing to assess a classifier need to obtain estimates for the costs of misclassification, and calculate the overall risk of reliance.
Methods
Equation (5) comprises the terms $P(\alpha_k|\omega_j)$, $P(\omega_j)$ and $\lambda(\alpha_k|\omega_j)$. The $P(\alpha_k|\omega_j)$ are parameters of the classifier and can be estimated experimentally. $P(\omega_j)$ and $\lambda(\alpha_k|\omega_j)$ are parameters of the classes of interest: respectively, the a priori probabilities and the costs of making decisions. In this section we examine a number of secondary sources to determine values for $P(\omega_j)$ and $\lambda(\alpha_k|\omega_j)$.
A priori probabilities
We used the MIT-BIH Arrhythmia Database to extract a priori probabilities. The records chosen were the first group of records (numbers 100–124) from the database. We omitted the second group (numbers 200–234), since these were deliberately selected by the authors of the database to contain "rare but clinically important phenomena", whereas the first group was randomly selected so as to "serve as a representative sample of the variety of waveforms and artifact that an arrhythmia detector might encounter in routine clinical use".
Table 1 shows the data extracted from the database. Class 0 was disregarded, since these annotations are not beats, but are used to mark other interesting features in the signal. $c_j$ is the count of beats of class $\omega_j$, and $P(\omega_j)$ was calculated by dividing $c_j$ by $\sum_{j=1}^{5} c_j$. Since $c_5 = 0$, we conclude that beat classes other than 1–4 are sufficiently rare to have a negligible effect on the utility of a system.
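A minimal sketch of this normalisation follows. It hard-codes the counts for classes 1–5 from Table 1, with class 0 already excluded. The function and variable names are hypothetical. As an example, $P(\omega_{VEB}) = 1345/47647 \approx 0.028$.

```python
from typing import Dict


def priors_from_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """P(omega_j) = c_j / sum of c_j over the beat classes supplied."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


# Beat counts c_j for classes 1-5 from Table 1 (MIT-BIH records 100-124).
mitbih_counts = {"N": 46097, "SVEB": 192, "VEB": 1345, "F": 13, "Q": 0}
priors = priors_from_counts(mitbih_counts)
```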
Extrapolation of data
Where possible, we referred to longitudinal studies giving data gathered over a period of 10 years or longer. In many instances the data were available only in graphical form, in which case integrals were approximated by the trapezoidal rule. Where data over a 10-year period were not available, we used data gathered over a shorter period and extrapolated by the following method.
We presume survival to be described by an exponential expression
$s = \kappa e^{\beta t}$ (10)
where κ and β are constants for which β < 0 and 0 < κ ≤ 1. Equation (10) implies that the mortality fraction m is
$m = 1 - \kappa e^{\beta t}$.
Writing X = ln κ, equation (10) also implies
$\ln(s) = X + \beta t$.
Hence, X and β can be found from the data by linear regression of ln(s) on t, and the total expected mortality over 10 years from equation (13).
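The following is a minimal sketch of this extrapolation. It assumes that the total expected loss of life over the 10-year horizon is the integral of the mortality fraction $m$ from $t = 0$ to $t = 10$. For the exponential model this integral has the closed form $10 - \frac{\kappa}{\beta}\left(e^{10\beta} - 1\right)$. The closed form follows directly from the model, but it is not necessarily identical to equation (13). The function names are hypothetical, and the check below uses synthetic data rather than any of the cited survival curves.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_survival(t: Sequence[float], s: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of ln(s) = X + beta*t; returns (kappa, beta), kappa = e^X."""
    n = len(t)
    if n < 2 or len(s) != n:
        raise ValueError("need at least two matching (t, s) points")
    y = [math.log(v) for v in s]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    beta = sxy / sxx
    x = y_mean - beta * t_mean
    return math.exp(x), beta


def expected_loss_exponential(kappa: float, beta: float, horizon: float = 10.0) -> float:
    """Integral of m(t) = 1 - kappa*e^(beta*t) from 0 to horizon (beta must be non-zero)."""
    return horizon - (kappa / beta) * (math.exp(beta * horizon) - 1.0)
```

Fitting in log space makes the regression linear, as described above. The closed-form integral then avoids numerically integrating the extrapolated curve.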
Costs of incorrect classification
In determining costs for incorrect decisions, we have presumed standard clinical treatment of abnormal beats, or non-treatment of normal beats, according to the system's output. We have then investigated the costs, in monetary terms, of taking that course of action under each state of nature. We have endeavoured to report 'costs' as the general cost to society, rather than the cost to any particular entity. A summary of these figures is presented in Table 2. All costs have been normalised to the year 2006, and are in Australian dollars except where otherwise noted.
Each application, however, may have different ancillary parameters, and these may affect the costs involved. The methods and figures provided in this study reflect the most general situation, as best we could determine. The propositions made in calculating the costs are stated below, and should be examined before the figures are applied.
Proposition 1 The clinical treatment for fusions of normal and Ventricular Ectopic Beats (class $\omega_F$) is identical to that for Ventricular Ectopic Beats.
We make this proposition on the basis that sustained Ventricular Ectopic Beats are potentially life threatening, and must be treated. To a clinician, the fact that the polarisation of the waveform coincides with the preceding beat is merely incidental.
Proposition 2 The maximum future projection which may affect costs of misclassification is 10 years.
Ten years is chosen as a reasonable horizon beyond which advances in medical technology can be expected to invalidate long-term predictions.
Datum 1 The expected loss of life of a healthy subject, projected over the next 10 years, is 1.21 years.
This datum is from the control group of Benjamin et al. [11], a longitudinal study which investigated the mortality of subjects who had developed Atrial Fibrillation. We integrated the survival results presented in Figure A of that paper to determine the expected number of years of life lost by a healthy subject.
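The trapezoidal approximation described under Extrapolation of data can be sketched as follows, for survival fractions digitised from a published curve. The sketch assumes that the expected loss over the horizon is $\int (1 - s(t))\,dt$, that is, the horizon minus the expected life-years lived. The function name is hypothetical, and the sample points in the usage line are made up for illustration rather than read from Figure A of [11].

```python
from typing import Sequence


def expected_loss_trapezoid(t: Sequence[float], s: Sequence[float]) -> float:
    """Expected years of life lost over [t[0], t[-1]] from survival fractions s(t).

    Integrates the mortality fraction 1 - s(t) with the trapezoidal rule.
    """
    if len(t) != len(s) or len(t) < 2:
        raise ValueError("need at least two matching (t, s) points")
    loss = 0.0
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        loss += dt * ((1.0 - s[i - 1]) + (1.0 - s[i])) / 2.0
    return loss


# Hypothetical digitised points: survival 1.0, 0.9 and 0.8 at 0, 5 and 10 years.
example_loss = expected_loss_trapezoid([0.0, 5.0, 10.0], [1.0, 0.9, 0.8])
```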
Datum 2 The probability that a subject with Supra-Ventricular Ectopic Beats will develop Atrial Fibrillation is 0.324.
Datum 2 comes from the results of Frost et al. [12].
Datum 3 The expected loss of life due to a person suffering from Atrial Fibrillation is 2.69 years (projected over 10 years).
Expected loss of life due to Atrial Fibrillation (years, projected over 10 years)
Group | Expected loss of life (years) |
---|---|
Women without Atrial Fibrillation | 0.87 |
Men without Atrial Fibrillation | 1.55 |
Mean, without Atrial Fibrillation | 1.21 |
Women with Atrial Fibrillation | 3.75 |
Men with Atrial Fibrillation | 4.05 |
Mean, with Atrial Fibrillation | 3.90 |
Expected loss due to AF (3.90 − 1.21) | 2.69 |
Datum 4 A person's contribution to society is $44,320 per annum.
Datum 4 is the mean wage in Australia for the year 2006 [13].
Datum 5 The cost of misdiagnosing a Supra-Ventricular Ectopic Beat as a normal beat, λ(α_N|ω_SVEB), is $38,627.
This figure is the product of Data 2, 3 and 4 (0.324 × 2.69 × $44,320).
Datum 6 The probability of initially surviving Ventricular Fibrillation is $\frac{146}{886}=0.164$.
Baum et al. [14] report that, in a 3-year study of Ventricular Fibrillation cases, 146 of 886 patients initially survived Ventricular Fibrillation.
Datum 7 The total expected loss of life by a person who initially survives Ventricular Fibrillation is 6.33 years (projected over 10 years).
To derive Datum 7 we used the results of Baum et al. [14]. Figure 2 of that paper presents survival curves for subjects who initially survived Ventricular Fibrillation, but only over 24 months. We extrapolated the survival data over a 10-year period by the method described above.
Datum 8 The expected loss of life, attributable to Ventricular Fibrillation, by a subject who suffers Ventricular Fibrillation is 8.19 years (projected over 10 years).
Expected loss of life due to Ventricular Fibrillation (years, projected over 10 years)
Group | P | | Loss of life (years) | | Expectation (years) |
---|---|---|---|---|---|
Survival | 0.164^{1} | × | 6.33^{2} | = | 1.04 |
Non-survival | 0.836 | × | 10 | = | 8.36 |
Less healthy-subject baseline | | | | | (1.21)^{3} |
Total loss due to VF | | | | | 8.19 |
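The arithmetic behind Datum 8 can be checked in a few lines. This sketch uses the unrounded survival probability 146/886 from Datum 6, the 6.33 years of Datum 7 and the 1.21-year healthy baseline of Datum 1. It assumes, as the table does, that non-survivors lose the whole 10-year horizon. The function name is illustrative.

```python
def vf_attributable_loss(p_survive: float, loss_if_survive: float,
                         horizon: float, healthy_loss: float) -> float:
    """Excess life-years lost over `horizon` attributable to VF.

    Non-survivors are assumed to lose the whole horizon; the healthy
    subject's expected loss (Datum 1) is subtracted as a baseline.
    """
    expected = p_survive * loss_if_survive + (1.0 - p_survive) * horizon
    return expected - healthy_loss


loss_vf = vf_attributable_loss(146 / 886, 6.33, 10.0, 1.21)  # about 8.19 years
```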
Datum 9 The expected loss of life, attributable to Ventricular Tachycardia by a subject who suffers Ventricular Tachycardia is 2.59 years (projected over 10 years).
This datum was obtained from Doval et al. [15] by the data extrapolation method described previously. In that study of 516 subjects, both with and without non-sustained Ventricular Tachycardia, the projected loss over 10 years was 9.24 years for the group with Ventricular Tachycardia and 6.65 years for the group without. The difference is 2.59 years.
Datum 10 The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats will develop Ventricular Fibrillation is 0.34.
Datum 11 The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats will develop Ventricular Tachycardia is 0.41.
Data 10 and 11 are derived from Carrim and Khan [16]. In that study of 44 subjects exhibiting Ventricular Ectopic Beats, VT is the set of subjects developing Ventricular Tachycardia and VF is the set of subjects developing Ventricular Fibrillation. From their data, we are given:
from which we can deduce that |VT ∩ VF| = 12, |VT| = 18 and |VF| = 15. Hence P(VT) = 18/44 ≈ 0.41 and P(VF) = 15/44 ≈ 0.34.
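A short sketch of these proportions, using the counts deduced above. The two events overlap, because 12 subjects developed both. The probability of developing either event therefore follows by inclusion-exclusion, giving 21/44 ≈ 0.48. The function name is hypothetical.

```python
from typing import Tuple


def cohort_probabilities(n_subjects: int, n_vt: int, n_vf: int,
                         n_both: int) -> Tuple[float, float, float]:
    """P(VT), P(VF) and P(VT or VF) from subject counts (inclusion-exclusion)."""
    p_vt = n_vt / n_subjects
    p_vf = n_vf / n_subjects
    p_either = (n_vt + n_vf - n_both) / n_subjects
    return p_vt, p_vf, p_either


# Counts deduced from Carrim and Khan: 44 subjects, |VT| = 18, |VF| = 15, 12 with both.
p_vt, p_vf, p_either = cohort_probabilities(44, 18, 15, 12)
```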
Datum 12 The expected loss of life, when a Ventricular Ectopic Beat is misdiagnosed as a normal beat is 3.84 years.
Expected loss of life due to Ventricular Ectopic Beats (years, projected over 10 years)
Group | P | | Loss of life (years) | | Expectation (years) |
---|---|---|---|---|---|
Tachycardia | 0.41^{1} | × | 2.59^{2} | = | 1.06 |
Fibrillation | 0.34^{3} | × | 8.19^{4} | = | 2.78 |
Total | | | | | 3.84 |
Datum 13 The cost of misdiagnosing a Ventricular Ectopic Beat as a normal beat, λ(α_N|ω_VEB), is $170,189.
This is the product of Datum 12 and Datum 4 (3.84 × $44,320). By Proposition 1 we attribute the same cost to λ(α_N|ω_F).
Costs of misclassification of a normal beat as abnormal
Ventricular Ectopic Beats are of potential concern to a physician. If a system misclassifies a normal beat ($\omega_N$) or a Supra-Ventricular Ectopic Beat ($\omega_{SVEB}$) as a Ventricular Ectopic Beat or a fusion beat (decisions $\alpha_{VEB}$ and $\alpha_F$), the likely result is that the patient will be unnecessarily detained for observation, pending further examination. Thus, the cost of misclassification in this case is the cost of retaining a patient in intensive care for one day.
Datum 14 The cost of retaining a patient in intensive care for 1 day is $2146.
This datum is the mean of figures obtained from two independent Australian health insurance providers. A case study by Rechner and Lipman [17] for the year 2003 cites the figure $2670. That study, however, was conducted within a teaching hospital, and we therefore expect its costs to be higher than average.
Accordingly, we use the cost of one day in intensive care as the misclassification penalty in such instances. Thus λ(α_VEB|ω_N) = λ(α_F|ω_N) = λ(α_VEB|ω_SVEB) = λ(α_F|ω_SVEB) = $2146.
From Proposition 1 we conclude that $\lambda(\alpha_F|\omega_{VEB}) = \lambda(\alpha_{VEB}|\omega_F) = 0$.
Erroneous classification of beats as Supra-Ventricular Ectopic Beats
Unlike Ventricular Ectopic Beats, Supra-Ventricular Ectopic Beats are typically not life-threatening, and are therefore not normally treated unless recurrent [18]. Thus, the cost of misclassifying a normal beat as a Supra-Ventricular Ectopic Beat, $\lambda(\alpha_{SVEB}|\omega_N)$, is zero. By the same token, misclassification of Ventricular Ectopic Beats or fusion beats as Supra-Ventricular Ectopic Beats bears the same penalty as their erroneous classification as normal beats. Thus $\lambda(\alpha_{SVEB}|\omega_{VEB}) = \lambda(\alpha_N|\omega_{VEB})$ and $\lambda(\alpha_{SVEB}|\omega_F) = \lambda(\alpha_N|\omega_F)$.
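The cost rules above can be collected into a single matrix. The sketch below builds it from Data 2–4, 12 and 14 and Proposition 1, in Australian dollars rather than thousands. The function and parameter names are illustrative, and the defaults are simply the figures stated in this section. Every entry not set explicitly is zero, which matches the zero costs argued for above.

```python
from typing import Dict

CLASSES = ("N", "SVEB", "VEB", "F", "Q")


def build_cost_matrix(wage: float = 44_320.0,    # Datum 4: AUD per annum
                      p_af: float = 0.324,       # Datum 2
                      af_loss: float = 2.69,     # Datum 3: years
                      veb_loss: float = 3.84,    # Datum 12: years
                      icu_day: float = 2_146.0,  # Datum 14: AUD
                      ) -> Dict[str, Dict[str, float]]:
    """Return lam[decision][true_class] in AUD; any entry not set below is zero."""
    lam = {k: {j: 0.0 for j in CLASSES} for k in CLASSES}
    missed_sveb = p_af * af_loss * wage  # Datum 5
    missed_veb = veb_loss * wage         # Datum 13
    lam["N"]["SVEB"] = missed_sveb
    # VEB or fusion beats labelled normal or SVEB (Proposition 1 covers fusions).
    for decision in ("N", "SVEB"):
        lam[decision]["VEB"] = missed_veb
        lam[decision]["F"] = missed_veb
    # Normal or SVEB beats labelled VEB or fusion: one unnecessary day in intensive care.
    for decision in ("VEB", "F"):
        for true_class in ("N", "SVEB"):
            lam[decision][true_class] = icu_day
    return lam
```

Dividing by 1000 and rounding to two decimal places gives the non-zero entries of Table 2 (38.63, 170.19 and 2.15).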
Upper bound of risk
From Tables 1 and 2, the value of ${\mathcal{R}}_{\mathrm{max}}$ was calculated as described in section as $52,817.
Declarations
Acknowledgements
The authors would like to express their thanks to Amitava Datta for his advice during this study.
References
- Christov I, Bortolan G: Ranking of pattern recognition parameters for premature ventricular contractions classification by neural networks. Physiological Measurement. 2004, 25: 1281-1290. 10.1088/0967-3334/25/5/017.
- de Chazal P, Reilly RB: A Comparison of the ECG Classification Performance of Different Feature Sets. Computers in Cardiology. 2000, 27: 327-330.
- Christov I, Gómez-Herrero G, Krasteva V, Jekova I, Gotchev A, Egiazarian K: Comparative study of morphological and time-frequency ECG descriptors for heartbeat classification. Medical Engineering and Physics. 2006, 28 (9): 876-887. 10.1016/j.medengphy.2005.12.010.
- Testing and reporting performance results of cardiac rhythm and ST-segment measuring algorithms. 1988, Arlington, VA, USA, Published as American National Standard ANSI/AAMI EC57:1988.
- Schluter P, Peterson S, Moody G, Siegal L, Jackson C, Perry D, Acarturk E, Aumiller J, Blake S, Blaustein A, Conrad C, Heller G, Malagold M, Mark R, Miklozek C: MIT-BIH Arrhythmia Database Directory. Online database. 1987, [http://www.physionet.org/physiobank/database/html/mitdbdir/mitdbdir.htm]
- Altman DG, Bland JM: Statistics Notes: Diagnostic tests 2: predictive values. British Medical Journal. 1994, 309 (6947): 102.
- Schlesinger MI, Hlaváč V: Ten Lectures on Statistical and Structural Pattern Recognition. 2002, Kluwer Academic Publishers, 24 (chap 1): 1-22.
- Duda RO, Hart PE: Pattern Classification and Scene Analysis. 1973, John Wiley & Sons, chap 2: 10-39.
- de Chazal P, O'Dwyer M, Reilly RB: Automatic Classification of Heartbeats Using ECG Morphology and Heartbeat Interval Features. IEEE Transactions on Biomedical Engineering. 2004, 51 (7): 1196-1206. 10.1109/TBME.2004.827359.
- Melo SL, Calôba LP, Nadal J: Arrhythmia Analysis Using Artificial Neural Network and Decimated Electrocardiographic Data. Computers in Cardiology. 2000, 27: 73-76.
- Benjamin EJ, Wolf PA, D'Agostino RB, Silbershatz H, Kannel WB, Levy D: Impact of Atrial Fibrillation on the Risk of Death: The Framingham Heart Study. Circulation. 1998, 98: 946-952.
- Frost LHM, Christiansen EH, Jacobsen CJ, Allermand H, Thomsen PEB: Low vagal tone and supraventricular ectopic activity predict atrial fibrillation and flutter after coronary artery bypass grafting. European Heart Journal. 1995, 16: 825-831.
- Australian Bureau of Statistics: 6306.0 – Employee Earnings and Hours, Australia, May 2006. 2616, Belconnen, ACT Australia, [http://www.abs.gov.au/AUSSTATS/abs@.nsf/ProductsbyCatalogue/27641437D6780D1FCA2568A9001393DF?OpenDocument#]
- Baum RS, Alvarez H III, Cobb LA: Survival after Resuscitation from Out-of-Hospital Ventricular Fibrillation. Circulation. 1974, 50: 1231-1235.
- Doval H, Nul D, Grancelli H, Varini S, Soifer S, Corrado G, Dubner S, Scapin O, Perrone S: Nonsustained Ventricular Tachycardia in Severe Heart Failure: Independent Marker of Increased Mortality due to Sudden Death. Circulation. 1996, 94 (12): 3198-3203.
- Carrim ZI, Khan AA: Mean Frequency of Premature Ventricular Complexes as Predictor of Malignant Ventricular Arrhythmias. The Mount Sinai Journal of Medicine. 2005, 72 (6): 374-380.
- Rechner IJ, Lipman J: The costs of caring for patients in a tertiary referral Australian Intensive Care Unit. Anaesthesia and Intensive Care. 2005, 33 (4): 477-482.
- Lundqvist CB: ACC/AHA/ESC Guidelines for the Management of Patients with Supraventricular Arrhythmias – executive summary a report of the American college of cardiology/American heart association task force on practice guidelines and the European society of cardiology committee for practice guidelines (writing committee to develop guidelines for the management of patients with supraventricular arrhythmias). Journal of the American College of Cardiology. 2003, 42 (8): 1493-1531. 10.1016/j.jacc.2003.08.013. Developed in Collaboration with NASPE-Heart Rhythm Society.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/8/7/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.