BMC Medical Informatics and Decision Making

Background: The literature presents many different algorithms for classifying heartbeats from ECG signals. The performance of the classifier is normally presented in terms of sensitivity, specificity or other metrics describing the proportion of correct versus incorrect beat classifications. From the clinician's point of view, however, such metrics are insufficient to rate the performance of a classifier.


Background
In recent years there has been a surge of interest in computer implementations of automatic beat classification algorithms. The impetus for this research stems partly from advances in, and the miniaturisation of, electronics, which allow portable, wearable and implantable devices to offer greater functionality than was achievable in the past, but also from the desire to automate tasks currently performed by intensive care and operating room staff.
There are several principles on which classifiers operate, and many variations and implementations of each. With the large number of published algorithms comes the need to analyse and compare their performance. The literature to date compares techniques on a low-level, parameter-by-parameter basis [1][2][3]. ANSI standard EC57 [4] attempts to formalise methods for reporting such comparisons. These comparisons are of interest to those working on the development of new algorithms or the enhancement of existing ones, but are of little interest to clinicians deciding which algorithm suits their purpose.

Problems with current classification methods
The contemporary method for reporting the performance of a beat classification algorithm involves beat-by-beat comparisons between the class as indicated by the algorithm and as indicated by some reference. The MIT-BIH database [5] is a commonly used reference source. Performance is reported either by a table giving counts of correctly and incorrectly classified beats, or by way of statistics inferred from such a table. Common statistics are sensitivity (Se), specificity (Sp) and positive predictivity (+P). These are defined as:
\[ Se = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad {+P} = \frac{TP}{TP + FP} \]
where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative classifications respectively.
However, these values are defined only for binary classification, and do not readily lend themselves to problems involving more than two classes, \(\{\omega_1, \omega_2 \ldots \omega_n\}\). Nevertheless, in the literature, one often sees beat classifier performance reports where sensitivity and specificity are freely quoted. Whilst the definitions of these measures for the context are normally not given, they appear to use the following extended definitions:
\(TP_j\) = number of beats correctly identified as belonging to class \(\omega_j\)
\(FP_j\) = number of beats incorrectly identified as belonging to class \(\omega_j\)
\(TN_j\) = number of beats correctly identified as belonging to a class other than \(\omega_j\)
\(FN_j\) = number of beats incorrectly identified as belonging to a class other than \(\omega_j\)
and hence \(Se_j\), \(Sp_j\) and \(+P_j\) can be defined accordingly.
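As an illustration of the extended definitions, the following minimal sketch derives the per-class statistics from a table of beat-by-beat comparisons. The function name `per_class_metrics`, the row/column convention and the example counts are assumptions made for illustration; they are not taken from any published classifier.

```python
from typing import Dict, List


def per_class_metrics(counts: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class Se, Sp and +P from a table of beat-by-beat comparisons.

    Assumed convention: counts[j][k] is the number of beats whose reference
    class is j and which the algorithm labelled as class k.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    nan = float("nan")
    results = []
    for j in range(n):
        tp = counts[j][j]
        fn = sum(counts[j]) - tp                        # class j beats labelled otherwise
        fp = sum(counts[i][j] for i in range(n)) - tp   # other beats labelled as class j
        tn = total - tp - fn - fp                       # other beats not labelled as class j
        results.append({
            "Se": tp / (tp + fn) if tp + fn else nan,
            "Sp": tn / (tn + fp) if tn + fp else nan,
            "+P": tp / (tp + fp) if tp + fp else nan,
        })
    return results


# Hypothetical three-class table, for illustration only.
example_counts = [[90, 5, 5], [10, 80, 10], [0, 20, 80]]
for j, metrics in enumerate(per_class_metrics(example_counts)):
    print(j, metrics)
```

Note that the result is a list of three numbers per class, which is the multi-dimensional presentation discussed in the text.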
A number of problems become apparent when using such statistics to evaluate the performance of beat classifiers:
1. They do not take into account the a priori probabilities of the beat classes.
2. They do not take into account the relative costs of false classification.
3. They can be presented only as a multi-dimensional value, even where only two classes are being considered. There is no obvious single ordinal value.
Problem 1 has been recognised in the medical literature [6]. We are not aware of any previous attempt to deal with problem 2. Problem 3 makes these reports particularly unhelpful from the point of view of the clinician trying to compare systems with a view to adopting one for use. For an n class classifier, there are 2n scalar quantities, so ranking classifiers using these quantities is not possible.
We propose a new method, which overcomes these problems and aims to be generally useful for the quantitative comparison of beat classification schemes.

Proposed methodology
A system's utility as a prognostic medical tool is a measure of the benefit afforded by selecting it against other alternatives. Choosing a system involves maximising the benefit, or alternatively, minimising the risk. A measure for the overall risk associated with making a decision based upon the output of a beat classifier is a useful measure of its performance. Risk is characterised by the probability of error and the costs associated with making a decision based upon the erroneous classification. We have used Bayesian decision theory to determine a method of calculating the risk associated with a beat classifier.

Introduction to Bayesian risk
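Bayesian decision theory is presented in many texts on statistics and classification theory [7,8] and will be introduced here only briefly. In a system which is claimed to recognise n different classes of beats \(\{\omega_1, \omega_2 \ldots \omega_n\}\), let \(\alpha_k\) denote the decision that a beat belongs to class \(\omega_k\). By Bayes' theorem:
\[ P(\omega_j \mid \alpha_k) = \frac{P(\alpha_k \mid \omega_j)\, P(\omega_j)}{P(\alpha_k)} \tag{1} \]
where
\[ P(\alpha_k) = \sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\, P(\omega_i) \tag{2} \]
The quantity \(P(\omega_j)\) is called the a priori probability, \(P(\alpha_k \mid \omega_j)\) the likelihood or class conditional probability and \(P(\omega_j \mid \alpha_k)\) the a posteriori probability. Note that \(P(\alpha_j \mid \omega_j) \equiv Se_j\). From (1) and (2) we can write:
\[ P(\omega_j \mid \alpha_k) = \frac{P(\alpha_k \mid \omega_j)\, P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\, P(\omega_i)} \tag{3} \]
Let \(\lambda(\alpha_k \mid \omega_j)\) be the cost incurred for making decision \(\alpha_k\) when \(\omega_j\) is the true beat class. Therefore, the risk of making decision \(\alpha_k\) based upon the classifier's output (the risk of reliance) is:
\[ R(\alpha_k) = \sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\, P(\omega_j \mid \alpha_k) \tag{4} \]
Combining the above gives
\[ R(\alpha_k) = \frac{\sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\, P(\alpha_k \mid \omega_j)\, P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k \mid \omega_i)\, P(\omega_i)} \tag{5} \]
The overall risk of relying on a classifier is
\[ R = \sum_{k=1}^{n} R(\alpha_k)\, P(\alpha_k) \tag{6} \]
or equivalently
\[ R = \sum_{k=1}^{n} \sum_{j=1}^{n} \lambda(\alpha_k \mid \omega_j)\, P(\alpha_k \mid \omega_j)\, P(\omega_j) \tag{7} \]
We propose that \(\{R(\alpha_1), R(\alpha_2) \ldots R(\alpha_n)\}\) be used in the consideration of a classifier's utility, and that \(R\) be used for overall rating of classifiers. \(R\) has the range \([0, \infty)\) and its units are dollars (or whatever units have been chosen for \(\lambda(\alpha_k \mid \omega_j)\)). We envisage that a unitless measure, having the range \([0, 1]\), is more useful in many circumstances. Accordingly, we also propose a normalised metric:
\[ \hat{R} = \frac{R}{R_{\max}} \tag{8} \]
where \(R_{\max}\) is the value obtained from equation (5) when the class conditional probabilities are set to the values that maximise the overall risk. Thus, a perfect classifier has \(\hat{R} = 0\) and, at the opposite extreme, the worst possible classifier has \(\hat{R} = 1\).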
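The quantities described above can be computed directly once the class conditional probabilities, the a priori probabilities and the costs are known. The sketch below is illustrative only. It assumes that the overall risk is the probability-weighted sum of the costs over all decision and class pairs, and that the upper bound is reached by a classifier which always makes the costliest decision for each true class (this is the maximum of that sum over all valid class conditional matrices, though it may not be the exact construction used for the published figure). All function names and the two-class numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def risks_of_reliance(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> List[float]:
    """R(alpha_k) for every decision k.

    Assumed conventions: likelihood[j][k] = P(alpha_k | omega_j) and
    cost[k][j] = lambda(alpha_k | omega_j).
    """
    n = len(prior)
    risks = []
    for k in range(n):
        p_alpha = sum(likelihood[i][k] * prior[i] for i in range(n))
        weighted = sum(cost[k][j] * likelihood[j][k] * prior[j] for j in range(n))
        # The risk is undefined for a decision the classifier never makes.
        risks.append(weighted / p_alpha if p_alpha > 0 else float("nan"))
    return risks


def overall_risk(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> float:
    """Sum of cost * P(alpha_k | omega_j) * P(omega_j) over all k and j."""
    n = len(prior)
    return sum(cost[k][j] * likelihood[j][k] * prior[j]
               for k in range(n) for j in range(n))


def upper_bound_risk(prior: Sequence[float], cost: Matrix) -> float:
    """ASSUMPTION: worst case = always the costliest decision per true class."""
    n = len(prior)
    return sum(prior[j] * max(cost[k][j] for k in range(n)) for j in range(n))


def normalised_risk(likelihood: Matrix, prior: Sequence[float], cost: Matrix) -> float:
    """Overall risk divided by its upper bound (unitless, between 0 and 1)."""
    return overall_risk(likelihood, prior, cost) / upper_bound_risk(prior, cost)


# Hypothetical two-class example, for illustration only.
prior = [0.9, 0.1]
likelihood = [[0.95, 0.05],   # true class 0: P(alpha_0 | w_0), P(alpha_1 | w_0)
              [0.20, 0.80]]   # true class 1
cost = [[0.0, 100.0],         # decision alpha_0 when truth is class 0 / class 1
        [10.0, 0.0]]          # decision alpha_1 when truth is class 0 / class 1

print(risks_of_reliance(likelihood, prior, cost))
print(overall_risk(likelihood, prior, cost))
print(normalised_risk(likelihood, prior, cost))
```

The overall risk returned here equals the sum of each risk of reliance weighted by the probability of the corresponding decision, which offers a simple consistency check on any implementation.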

Results
ANSI EC57 section 4.3 identifies 5 classes of beats which are recommended in performance reports, viz: normal beats, Supra-Ventricular Ectopic Beats, Ventricular Ectopic Beats, fusions of normal and Ventricular Ectopic Beats, and other unclassified beats. From secondary data sources, we derived a priori probabilities and costs of decisions for these classes. A detailed description of the derivation and the data sources is given in the Methods section. Table 1 shows the a priori probabilities. Table 2 shows the costs. Together with the values of \(P(\alpha_k \mid \omega_j)\), these tables enable the risk to be calculated. Unfortunately, in many cases the literature presents neither the values for \(P(\alpha_k \mid \omega_j)\), nor the table of beat-by-beat comparisons from which they could be deduced. We were able to find only two classifiers in the literature for which these data were reported: those of de Chazal et al. [9] and of Melo et al. [10]. Melo et al. publish separate results for aberrated atrial premature beats. For the purposes of comparison, we have regarded aberrated and non-aberrated atrial premature beats as a single class (\(\omega_{SVEB}\)).
The results are shown in Table 3. In these results, the overall risk is significantly lower for the Melo classifier, and the risks of reliance \(R(\alpha_i)\) are also lower for all i. In other words, this classifier dominates in all respects. In general

Discussion
We do not presume that the costs presented herein, or the propositions used in their calculation are universally applicable. Rather, we seek to demonstrate how, given the class conditional matrix, an ordinal measure for a beat classifier may be determined, applicable to any particular situation. Others might disagree with our cost calculation methods, or the application might demand consideration for classes other than those we have investigated. Whilst we have used monetary units to measure costs, we recognise the ethical issues raised by doing so, and our methodology imposes no requirement on the nature of the units of cost. Any unit acceptable to the community of interest may be used. In such cases, given the class conditional matrix, potential users may conduct their own studies and assess the performance of the classifier using the method we have described.
To facilitate such studies, we urge biomedical engineers to report not only the relative numbers of correctly versus incorrectly classified beats, but also the identity of the misclassified beats in the form of a class conditional matrix. ANSI EC57 describes how to compile such a matrix, but makes no recommendation for its publication. Sensitivity and specificity are trivial to calculate from such a matrix if desired, and the matrix also allows for more useful measures of performance as described herein. We recommend publication of the class conditional matrix and/or a table of beat-by-beat comparisons.

Conclusion
The utility of a beat classifier cannot be fully quantified in terms of the number of correct and incorrect beats. Instead, the number of misclassifications for each class is required. Together with the a priori probabilities and the costs of misclassification, quantitative measures of a classifier's utility can be determined.
A system which claims to classify beats into more than two classes is not a binary classifier, and its performance should not be reported as if it were. Instead of reporting sensitivity/specificity/predictivity for each of the n classes, an n × n matrix of beat classifications (the class conditional frequencies) should be reported.
Clinicians wishing to assess a classifier need to obtain estimates for the costs of misclassification, and calculate the overall risk of reliance.

Methods
Equation (5) comprises the terms P(α k |ω i ), P(ω j ) and λ(α k |ω j ). P(α k |ω i ) are parameters of the classifier and can be tested experimentally. P(ω j ) and λ(α k |ω j ) are parameters of the classes of interest. They are respectively the a priori probabilities and the costs of making decisions. In this section we examine a number of secondary sources to determine values for P(ω j ) and λ(α k |ω j ).

A priori probabilities
We used the MIT-BIH Arrhythmia Database to extract a priori probabilities. The records chosen were the first group of records (numbers 100-124) from the database. We omitted the second group (numbers 200-234), since these were deliberately selected by the authors of the database to contain "rare but clinically important phenomena", whereas the first group was randomly selected so as to "serve as a representative sample of the variety of waveforms and artifact that an arrhythmia detector might encounter in routine clinical use". Table 1 shows the data extracted from the database. Class 0 was disregarded, since these annotations are not beats, but are used to mark other interesting features in the signal. \(c_j\) are the counts of beats of class \(\omega_j\). \(P(\omega_j)\) was calculated by dividing \(c_j\) by \(\sum_{i} c_i\). Note that \(c_5 = 0\), and we therefore conclude that beat classes other than 1-4 are sufficiently rare to have negligible effect on the utility of a system.
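The calculation of the a priori probabilities from beat counts is a simple normalisation, sketched below. The counts shown are hypothetical placeholders, not the values of Table 1, and the function name `priors_from_counts` is an assumption made for illustration.

```python
from typing import List, Sequence


def priors_from_counts(counts: Sequence[int]) -> List[float]:
    """Estimate P(omega_j) as c_j divided by the total number of beats."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one beat is required")
    return [c / total for c in counts]


# Hypothetical counts c_1 .. c_5 (the last class has no beats, as in the text).
hypothetical_counts = [9000, 300, 600, 100, 0]
priors = priors_from_counts(hypothetical_counts)
print(priors)
```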

Extrapolation of data
Where possible, we referred to longitudinal studies, giving data gathered over a period of 10 years or longer. In many instances, the data was available only in graphical form in which case trapezoidal approximation of integrals was used. Where data over a 10 year period was not available, we used data gathered over a shorter period and extrapolated by the following method.
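We presume survival to be described by an exponential expression
\[ s = \kappa e^{\beta t} \tag{10} \]
where s is the fraction of subjects surviving at time t.
Equation (10) implies that the mortality fraction m is
\[ m = 1 - \kappa e^{\beta t} \tag{11} \]
Integrating m over 10 years gives the total loss:
\[ \int_0^{10} m \, dt = \int_0^{10} \left(1 - \kappa e^{\beta t}\right) dt \tag{12} \]
which can be expressed as
\[ 10 - \frac{e^{X}}{\beta}\left(e^{10\beta} - 1\right) \tag{13} \]
where \(X = \ln \kappa\). Thus equation (10) implies \(\ln(s) = X + \beta t\).
Hence, X and β can be found from the data by linear regression of ln(s) against t, and the total expected mortality over 10 years from equation (13).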
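The extrapolation procedure can be sketched as follows. This is a minimal illustration, assuming that time is measured in years, that the mortality fraction is one minus the survival fraction, and that an ordinary least-squares fit is used. The function names and the synthetic survival points are hypothetical and are not data from any of the cited studies.

```python
import math
from typing import Sequence, Tuple


def fit_exponential(times: Sequence[float], survival: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(s) = X + beta * t; returns (X, beta)."""
    logs = [math.log(s) for s in survival]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    beta = sxy / sxx
    x = y_mean - beta * t_mean
    return x, beta


def expected_loss(x: float, beta: float, horizon: float = 10.0) -> float:
    """Integral of m = 1 - exp(X + beta * t) from t = 0 to the horizon."""
    if beta == 0.0:
        return horizon * (1.0 - math.exp(x))
    return horizon - math.exp(x) * (math.exp(beta * horizon) - 1.0) / beta


# Synthetic survival fractions over 24 months, generated from s = 0.9 * exp(-0.1 t).
times_years = [0.0, 0.5, 1.0, 1.5, 2.0]
survival_fraction = [0.9 * math.exp(-0.1 * t) for t in times_years]

x_hat, beta_hat = fit_exponential(times_years, survival_fraction)
print(x_hat, beta_hat, expected_loss(x_hat, beta_hat))
```

Because the synthetic points lie exactly on an exponential curve, the fit recovers the generating parameters; with real survival data the regression gives only an approximation.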

Costs of incorrect classification
In determining costs for incorrect decisions, we have presumed standard clinical treatment of abnormal beats, or non-treatment of normal beats, according to the system's output. We have then investigated the costs, in monetary terms, of taking that course of action under each state of nature. We have endeavoured to report 'costs' as the general cost to society, rather than the cost to any particular entity. A summary of these figures is presented in Table 2.

Proposition 1
We make this proposition on the basis that sustained Ventricular Ectopic Beats are potentially life threatening, and must be treated. To a clinician, the fact that the polarisation of the waveform coincides with the preceding beat is merely incidental.

Proposition 2
The maximum future projection which may affect costs of misclassification is 10 years.
10 years is chosen as a reasonable period beyond which the advances in medical technology can be expected to invalidate the results of future prediction.

Datum 1 The expected loss of life of a healthy subject, projected over the next 10 years, is 1.21 years.
This datum is from the control group of Benjamin et al. [11]. This was a longitudinal study which investigated the mortality of subjects who had developed Atrial Fibrillation. We integrated the survival results presented by Figure  A of that paper to determine the expected number of years of life lost by a healthy subject.
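Integrating a survival curve read from a figure can be sketched with the trapezoidal rule. The example assumes that the expected loss of life is the area between the survival curve and full survival over the period. The function name and the sample points are hypothetical and are not the values read from Benjamin et al.

```python
from typing import Sequence


def years_of_life_lost(times: Sequence[float], survival: Sequence[float]) -> float:
    """Trapezoidal integral of (1 - s) over the sampled period (times in years)."""
    loss = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_mortality = 0.5 * ((1.0 - survival[i - 1]) + (1.0 - survival[i]))
        loss += mean_mortality * width
    return loss


# Hypothetical survival fractions at 0, 5 and 10 years.
print(years_of_life_lost([0.0, 5.0, 10.0], [1.0, 0.9, 0.8]))
```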

Datum 2 The probability that a subject with Supra-Ventricular Ectopic Beats will develop Atrial Fibrillation is 0.324.
Datum 2 comes from the results of Frost et al. [12].

Datum 3 The expected loss of life due to a person suffering from Atrial Fibrillation is 2.69 years (projected over 10 years).
Datum 3 is determined from Benjamin et al. in a similar fashion to Datum 1. The calculations are presented in Table 4.

Datum 4 A person's contribution to society is $44,320 per annum.
Datum 4 is the mean wage in Australia for the year 2006 [13].

Datum 6 The probability of initially surviving Ventricular Fibrillation is 146/886 ≈ 0.165.
From the text of Baum et al. [14], we know that, in a 3 year study of Ventricular Fibrillation cases, 146 patients out of 886 initially survived Ventricular Fibrillation.

Datum 7 The total expected loss of life by a person who initially survives Ventricular Fibrillation is 6.33 years (projected over 10 years).
To derive Datum 7 we used the results of Baum et al. [14]. In Figure 2 of that paper, survival curves are presented for subjects who initially survived Ventricular Fibrillation. Survival data are presented for only 24 months. We extrapolated survival data over a 10 year period by the method described above.

Datum 8 The expected loss of life, attributable to Ventricular Fibrillation, by a subject who suffers Ventricular Fibrillation is 8.19 years (projected over 10 years).
We know from Datum 1 that the expected loss of life of healthy subjects is 1.21 years. Hence we can calculate the loss due to Ventricular Fibrillation as per Table 5.
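One way to arrive at the figure in Datum 8 from the preceding data is sketched below. It assumes that a subject who does not initially survive Ventricular Fibrillation loses the full 10 year projection period. This assumption is ours for the purpose of illustration and may not match the layout of Table 5, although it reproduces the stated value to two decimal places.

```python
# Inputs taken from the data stated in the text.
P_INITIAL_SURVIVAL = 146 / 886   # Datum 6: probability of initially surviving VF
LOSS_IF_SURVIVED = 6.33          # Datum 7: years lost by initial survivors
LOSS_HEALTHY = 1.21              # Datum 1: years lost by a healthy subject
HORIZON_YEARS = 10.0             # Proposition 2: projection period

# ASSUMPTION: subjects who do not initially survive lose the whole horizon.
expected_loss_vf = (P_INITIAL_SURVIVAL * LOSS_IF_SURVIVED
                    + (1.0 - P_INITIAL_SURVIVAL) * HORIZON_YEARS)

# Loss attributable to VF is the excess over the loss of a healthy subject.
attributable_loss = expected_loss_vf - LOSS_HEALTHY

print(round(attributable_loss, 2))  # prints 8.19
```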

Datum 9 The expected loss of life, attributable to Ventricular Tachycardia by a subject who suffers Ventricular Tachycardia is 2.59 years (projected over 10 years).
This datum was obtained from Doval et al. [15] by the data extrapolation method described previously. In that study of 516 subjects both with and without non-sustained Ventricular Tachycardia, the projected loss over 10 years for the group with Ventricular Tachycardia was 9.24 years, whereas the projected loss for the group without Ventricular Tachycardia was 6.65 years. The difference is 2.59 years.

Datum 10
The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats, will develop Ventricular Fibrillation is 0.34.

Datum 11
The probability that a subject who experiences one or more episodes of Ventricular Ectopic Beats, will develop Ventricular Tachycardia is 0.41.
Datum 10 and Datum 11 are inferred from Carrim and Khan [16]. In that study of 44 subjects exhibiting Ventricular Ectopic Beats, \(V_T\) is the set of subjects developing Ventricular Tachycardia and \(V_F\) is the set of subjects developing Ventricular Fibrillation. From their data, we are given:

This is the product of Datum 12 and Datum 4. By Proposition 1 we attribute the same cost to \(\lambda(\alpha_N \mid \omega_F)\).

Costs of misclassification of a normal beat as abnormal
Ventricular Ectopic Beats are of potential concern to a physician. If a system misclassifies a normal beat (\(\omega_N\)) or a Supra-Ventricular Ectopic Beat (\(\omega_{SVEB}\)) as a Ventricular Ectopic Beat or a fusion beat (decisions \(\alpha_{VEB}\) and \(\alpha_F\)), the likely result is that the patient will be unnecessarily detained in observation, pending further examination. Thus, the cost of misclassification, in this case, is the cost of retaining a patient in intensive care for one day.

Datum 14 The cost of retaining a patient in intensive care for 1 day is $2146.
This datum is the mean of figures obtained from two independent Australian health insurance providers. A case study by Rechner and Lipman [17] for the year 2003 cites the figure $2670. This study, however, was conducted within a teaching hospital, and we therefore expect its costs to be higher than average.

Erroneous classification of beats as Supra-Ventricular Ectopic Beats
Unlike Ventricular Ectopic Beats, Supra-Ventricular Ectopic Beats are typically not life threatening, and are therefore not normally treated unless recurrent [18]. Thus, the cost of misclassifying a normal beat as a Supra-Ventricular Ectopic Beat, \(\lambda(\alpha_{SVEB} \mid \omega_N)\), is zero. By the same token, misclassification of Ventricular Ectopic Beats or fusion beats as Supra-Ventricular Ectopic Beats bears the same penalty as an erroneous classification as a normal beat. Thus \(\lambda(\alpha_{SVEB} \mid \omega_{VEB}) = \lambda(\alpha_N \mid \omega_{VEB})\) and \(\lambda(\alpha_{SVEB} \mid \omega_F) = \lambda(\alpha_N \mid \omega_F)\).
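The relationships between the cost entries stated in this and the preceding sections can be collected into a cost matrix, as sketched below. This is an illustration only: the cost of treating a Ventricular Ectopic Beat as normal is left as a required parameter because its value is not restated here, every entry not covered by a rule in the text is a placeholder zero, and the function name `build_cost_matrix` and the figure used in the usage line are hypothetical.

```python
from typing import List

CLASSES = ["N", "SVEB", "VEB", "F"]


def build_cost_matrix(cost_missed_veb: float, cost_icu_day: float = 2146.0) -> List[List[float]]:
    """Return cost[k][j] = lambda(alpha_k | omega_j) for the four beat classes.

    Entries not covered by a rule stated in the text are PLACEHOLDER zeros,
    not values from Table 2.
    """
    idx = {name: i for i, name in enumerate(CLASSES)}
    n = len(CLASSES)
    cost = [[0.0] * n for _ in range(n)]

    # A Ventricular Ectopic Beat or fusion beat treated as normal (Proposition 1
    # gives the fusion beat the same cost).
    cost[idx["N"]][idx["VEB"]] = cost_missed_veb
    cost[idx["N"]][idx["F"]] = cost_missed_veb

    # A normal or Supra-Ventricular Ectopic Beat labelled VEB or fusion:
    # one unnecessary day in intensive care (Datum 14).
    for decision in ("VEB", "F"):
        for truth in ("N", "SVEB"):
            cost[idx[decision]][idx[truth]] = cost_icu_day

    # Decisions of SVEB: no cost for a normal beat, and the same penalty as a
    # decision of normal for VEB and fusion beats.
    cost[idx["SVEB"]][idx["N"]] = 0.0
    cost[idx["SVEB"]][idx["VEB"]] = cost[idx["N"]][idx["VEB"]]
    cost[idx["SVEB"]][idx["F"]] = cost[idx["N"]][idx["F"]]
    return cost


# The first argument is a purely illustrative figure, not a value from the paper.
for row in build_cost_matrix(100000.0):
    print(row)
```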

Upper bound of risk
From Tables 1 and 2, the value of \(R_{\max}\), the upper bound of the overall risk, was calculated as described in the section "Introduction to Bayesian risk" as $52,817.