Several measures can be used to evaluate ordinal regression models; the most common are the Mean Zero-one Error (MZE) and the Mean Absolute Error (MAE).
1. MZE
MZE is the error rate of the classifier:

$$MZE = \frac{1}{N}\sum_{i=1}^{N} [\![\hat{y}_i \neq y_i]\!] = 1 - Acc$$

where $N$ is the number of test samples, $y_i$ and $\hat{y}_i$ are the true and predicted ranks of the i-th sample, and $[\![\cdot]\!]$ equals 1 when its argument is true and 0 otherwise.
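Since MZE is simply one minus the usual accuracy, it can be computed directly with scikit-learn. A minimal sketch (the toy arrays below are made up for illustration):

from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 3]   # illustrative true ranks
y_pred = [0, 1, 1, 2, 4]   # illustrative predicted ranks

mze = 1 - accuracy_score(y_true, y_pred)   # MZE = 1 - Acc
print(mze)   # 0.4: two of the five predictions are wrong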
2. MAE
MAE is the average deviation in absolute value of the predicted rank ($\hat{y}_i$) from the true one ($y_i$):

$$MAE = \frac{1}{N}\sum_{i=1}^{N} \left|\hat{y}_i - y_i\right|$$
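A matching sketch for MAE (same made-up toy arrays, with the last prediction changed to be off by two ranks) shows how MAE, unlike MZE, penalizes a two-rank error twice as heavily as a one-rank error:

from sklearn.metrics import mean_absolute_error

y_true = [0, 1, 2, 2, 3]   # illustrative true ranks
y_pred = [0, 1, 1, 2, 5]   # one error of size 1, one of size 2

mae = mean_absolute_error(y_true, y_pred)
print(mae)   # 0.6 = (1 + 2) / 5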
3. SD (Standard Deviation)
Meanwhile, in order to measure the performance imbalance among classes for the CNN model, we adopt the standard deviation [10]; it is a measure of the dispersion of a data distribution. A smaller standard deviation implies less deviation of the values from the average, and vice versa.
We express it as:

$$SD = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(Acc_m - \overline{Acc}\right)^2}$$

where $\overline{Acc}$ is the average accuracy over all $M$ classes and $Acc_m$ is the classification accuracy of the m-th class achieved by the CNN model on the test set.
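A minimal sketch of this computation in Python (the helper name class_accuracy_sd is my own, not a library function):

import numpy as np

def class_accuracy_sd(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Acc_m: classification accuracy of each class on the test set
    accs = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    # np.std divides by M by default, matching the formula above
    return np.std(accs)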
4. C-index
The value of MAE ranges from 0 to r-1 (the maximum absolute error between classes, with r being the number of classes). Because the real distances among the class labels are unknown, the numerical representation of the class labels has a strong impact on the MAE performance.
In order to avoid the above-mentioned impact, a more suitable approach is to consider the relation between the observed class label and the predicted class label.
Here we use the concordance index, or C-index, to represent these relations. A pair of samples is comparable when their true labels differ, and concordant when the predicted labels preserve the order of the true labels; the C-index is computed as the proportion of concordant pairs among the comparable pairs, with tied predictions conventionally counted as half concordant [1].
[1] W. Waegeman, Learning to rank: a ROC-based graph-theoretic approach, Ph.D. thesis, Springer (2009).
I could not find a corresponding library function in Python, so I wrote one myself (see the code below).
Working through the code makes it very clear that the C-index greatly relaxes the constraints placed on the predictions, substantially lowering the bar for what counts as a correct result.
As a metric for evaluating ordinal classification performance, my own view is: far too lenient!
'''Daniel He'''
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error
from collections import OrderedDict
from itertools import combinations

y_true = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6]
y_pred = [0,0,1,0,1,2,1,2,2,2,3,5,2,4,4,5,5,6,6,6,6]

def C_index(y_true, y_pred):
    # Group every comparable pair of samples (different true labels)
    # under its (lower label, higher label) key, storing the indices so
    # that the first index always carries the lower true label.
    labels = np.sort(np.unique(y_true))
    pair_set = OrderedDict()
    for label_pair in combinations(labels, 2):
        pair_set[label_pair] = []
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] < y_true[j]:
            pair_set[(y_true[i], y_true[j])].append((i, j))
        elif y_true[i] > y_true[j]:
            pair_set[(y_true[j], y_true[i])].append((j, i))
        # samples with equal true labels are not comparable: skip them

    # A comparable pair is concordant when the predictions preserve the
    # true order; a tied prediction counts as half a concordant pair.
    nPairs = 0
    nResult = 0
    for label_pair, idx_pair_list in pair_set.items():
        nPairs += len(idx_pair_list)
        for lo, hi in idx_pair_list:
            if y_pred[lo] < y_pred[hi]:
                nResult += 1
            elif y_pred[lo] == y_pred[hi]:
                nResult += 0.5
    return nResult / nPairs

r = C_index(y_true=y_true, y_pred=y_pred)
print(r)
# print(accuracy_score(y_true=y_true, y_pred=y_pred))
# print(mean_absolute_error(y_true=y_true, y_pred=y_pred))
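For the sample arrays above, this script prints roughly 0.918 (173.5 of 189 comparable pairs counted as concordant, by my hand count), even though the plain accuracy is only about 0.62 and the MAE about 0.48; these numbers back up the complaint above about how lenient the C-index is.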