A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

Carrington, André M.; Fieguth, Paul W.; Qazi, Hammad; Holzinger, Andreas; Chen, Helen H.; Mayr, Franz; Manuel, Douglas G.

doi:10.1186/s12911-019-1014-6

BMC Medical Informatics and Decision Making

Table 1 An overview of definitions for proposed measures and concepts in sections that follow with the same name


1. The horizontal partial area under the curve (a section that follows) This partial area, denoted pAUC_x, was suggested by Walter [2] and is defined for part of an ROC curve r(·) specified by TPR = [y1, y2], with inverse function r⁻¹(·): \( pAUC_x := \int_{y_1}^{y_2} \left(1 - r^{-1}(y)\right) dy \)
2. The concordant partial area under the curve (a section that follows) This partial area, denoted pAUC_c (Fig. 1b), is defined for part of an ROC curve r(·) specified by FPR = [x1, x2] and TPR = [y1, y2], with inverse function r⁻¹(·): \( \begin{aligned} pAUC_c &\triangleq \frac{1}{2}\, pAUC + \frac{1}{2}\, pAUC_x \\ &= \frac{1}{2}\int_{x_1}^{x_2} r(x)\, dx + \frac{1}{2}\int_{y_1}^{y_2} \left(1 - r^{-1}(y)\right) dy \end{aligned} \)
3. The concordance matrix for ROC data (a section that follows) A matrix that depicts the exact relationship between the unique scores of positives and negatives in data and their corresponding points along a matrix border that exactly matches the (empirical) ROC curve. It geometrically and procedurally equates area measures AUC and pAUC_c to the statistics c and c_∆.
4. The partial c statistic for ROC data (a section that follows) This statistic, denoted c_∆, is defined for ROC data with P actual positives {p_1…P} and N actual negatives {n_1…N}, and a partial curve specified by a subset of J positives and K negatives, i.e., \( \{p^{\prime}_{1\dots J}\} \) and \( \{n^{\prime}_{1\dots K}\} \), with Heaviside function H(·) and classification scores g(·). We present simple c_∆ (the non-interpolated version) here: \( \begin{aligned} \mathrm{simple}\ c_{\Delta} &\triangleq \frac{1}{2JN}\sum_{j=1}^{J}\sum_{k=1}^{N} H\left(g\left(p^{\prime}_j\right) - g\left(n_k\right)\right) \\ &\quad + \frac{1}{2PK}\sum_{j=1}^{P}\sum_{k=1}^{K} H\left(g\left(p_j\right) - g\left(n^{\prime}_k\right)\right) \end{aligned} \)
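
Definitions 2 and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the tie convention H(0) = 0.5 is assumed, the midpoint rule stands in for exact integration, and the toy scores and the curve r(x) = √x are invented for the example. The denominators 2JN and 2PK follow the table as printed, so each double sum is divided by the number of pairs it covers.

```python
import math


def heaviside(d):
    """H(d): 1 if d > 0, 0 if d < 0; ties count 0.5 (assumed convention)."""
    if d > 0:
        return 1.0
    if d < 0:
        return 0.0
    return 0.5


def simple_partial_c(pos, neg, pos_subset, neg_subset):
    """Simple c_Delta as printed in the table (hypothetical helper).

    pos, neg: scores g(.) of all P positives and all N negatives.
    pos_subset, neg_subset: scores of the J positives and K negatives
    that specify the partial curve.
    """
    P, N = len(pos), len(neg)
    J, K = len(pos_subset), len(neg_subset)
    # First term: the J subset positives against all N negatives.
    horizontal = sum(heaviside(p - n) for p in pos_subset for n in neg)
    # Second term: all P positives against the K subset negatives.
    vertical = sum(heaviside(p - n) for p in pos for n in neg_subset)
    return horizontal / (2 * J * N) + vertical / (2 * P * K)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def concordant_partial_auc(r, r_inv, x1, x2, y1, y2):
    """pAUC_c = 1/2 pAUC + 1/2 pAUC_x for a curve r with inverse r_inv."""
    pauc = integrate(r, x1, x2)                           # vertical area
    pauc_x = integrate(lambda y: 1.0 - r_inv(y), y1, y2)  # horizontal area
    return 0.5 * pauc + 0.5 * pauc_x


# Toy scores (invented): three positives and three negatives.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

# With the full sets as "subsets", both terms reduce to half the c statistic.
full = simple_partial_c(pos, neg, pos, neg)
# Partial curve specified by the two top positives and the top negative.
part = simple_partial_c(pos, neg, [0.9, 0.8], [0.7])
# Illustrative smooth curve r(x) = sqrt(x) on FPR [0, 0.25], TPR [0, 0.5].
area = concordant_partial_auc(math.sqrt, lambda y: y * y, 0.0, 0.25, 0.0, 0.5)

print(round(full, 4), round(part, 4), round(area, 4))
# prints: 0.8889 0.8333 0.2708
```

For the smooth curve the integrals can be checked by hand: pAUC = 1/12 and pAUC_x = 11/24, so pAUC_c = 13/48.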


ISSN: 1472-6947
