Evaluation Metrics for Multi-Class Multi-Label Models (Definitions + NumPy Implementations)

1. Defining the Multi-Class Multi-Label Problem

Multi-class classification is defined in contrast to binary classification. In a binary problem the label takes only two values, 0 or 1: for example, whether an image shows a dog, or whether a patient has a disease. A multi-class problem involves more categories: deciding whether an object is a cat, dog, bird, or rabbit, or which of diseases A, B, C, or D a patient has. Note that in a multi-class problem, usually exactly one class is correct.

What, then, is multi-label? Simply put, a single sample carries several labels at once. For example, a landscape photo may contain sky, a cat, a dog, a bird, and trees; if all of these belong to the classes the task is asked to recognize, the image has multiple labels. Multi-label tasks are clearly much harder.
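To make the distinction concrete, a minimal sketch (the class names and arrays below are illustrative, not from any particular dataset): in the multi-class setting each sample's target is a single one-hot row, while in the multi-label setting a row may contain several ones.

```python
import numpy as np

# Classes (illustrative): [sky, cat, dog, bird, tree]
# Multi-class: exactly one label per sample (one-hot rows).
multi_class_y = np.array([
    [0, 1, 0, 0, 0],   # sample 0 is "cat"
    [0, 0, 1, 0, 0],   # sample 1 is "dog"
])

# Multi-label: any number of labels per sample (multi-hot rows).
multi_label_y = np.array([
    [1, 1, 0, 1, 1],   # sample 0 contains sky, cat, bird, tree
    [0, 1, 1, 0, 0],   # sample 1 contains cat and dog
])

print(multi_class_y.sum(axis=1))   # [1 1] -- one positive per row
print(multi_label_y.sum(axis=1))   # [4 2] -- several positives per row
```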

2. Evaluation Metrics

Following [1] [2], evaluation metrics for multi-class multi-label models fall into two broad families: example-based metrics and label-based metrics.

Example-based Metrics

  1. Subset accuracy
     $$\mathrm{subsetacc}(h)=\frac{1}{p} \sum_{i=1}^{p} I\left[h\left(x_{i}\right)=Y_{i}\right]$$

where $h(\cdot)$ denotes a multi-label classifier $h: X \rightarrow 2^{Y}$, $h(x)$ returns the predicted label set, and $p$ is the number of samples.

import numpy as np

epsilon = 1e-8  # small constant to guard against division by zero below

# gt is the ground-truth label matrix, predict the predicted one,
# e.g. gt=[[1,0,0,1]], predict=[[1,0,1,1]]
def example_subset_accuracy(gt, predict):
    ex_equal = np.all(np.equal(gt, predict), axis=1).astype("float32")
    return np.mean(ex_equal)
  2. Example accuracy
     $$\text{Accuracy}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|Y_{i} \cup h\left(x_{i}\right)\right|}$$
def example_accuracy(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_or = np.sum(np.logical_or(gt, predict), axis=1).astype("float32")
    return np.mean(ex_and / (ex_or+epsilon))
  3. Example precision
     $$\text{Precision}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|h\left(x_{i}\right)\right|}$$
def example_precision(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_predict = np.sum(predict, axis=1).astype("float32")
    return np.mean(ex_and / (ex_predict + epsilon))
  4. Example recall
     $$\text{Recall}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|Y_{i}\right|}$$
def example_recall(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_gt = np.sum(gt, axis=1).astype("float32")
    return np.mean(ex_and / (ex_gt + epsilon))
  5. Example F1 (with $\beta$)
     $\beta>0$ weights the relative importance of recall versus precision: $\beta=1$ recovers the standard F1, $\beta>1$ gives recall more influence, and $\beta<1$ gives precision more influence.
     $$F_{\text{exam}}^{\beta}(h)=\frac{\left(1+\beta^{2}\right) \cdot \text{Precision}_{\text{exam}}(h) \cdot \text{Recall}_{\text{exam}}(h)}{\beta^{2} \cdot \text{Precision}_{\text{exam}}(h)+\text{Recall}_{\text{exam}}(h)}$$
def example_f1(gt, predict, beta=1):
    p = example_precision(gt, predict)
    r = example_recall(gt, predict)
    # denominator is beta^2 * P + R, matching the formula above
    return ((1 + beta**2) * p * r) / (beta**2 * p + r + epsilon)
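As a quick sanity check, here is a minimal sketch tracing the example-based quantities on a hypothetical two-sample batch (the arrays are illustrative); each metric is computed inline with the same NumPy operations used in the functions above.

```python
import numpy as np

gt      = np.array([[1, 0, 0, 1],
                    [0, 1, 0, 0]])
predict = np.array([[1, 0, 1, 1],
                    [0, 1, 0, 0]])

inter = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")  # [2, 1]
union = np.sum(np.logical_or(gt, predict), axis=1).astype("float32")   # [3, 1]

subset_acc = np.mean(np.all(gt == predict, axis=1))   # only sample 1 matches exactly: 0.5
ex_acc     = np.mean(inter / union)                   # (2/3 + 1)/2 ~= 0.833
ex_prec    = np.mean(inter / predict.sum(axis=1))     # (2/3 + 1)/2 ~= 0.833
ex_rec     = np.mean(inter / gt.sum(axis=1))          # (2/2 + 1)/2 = 1.0
```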

Label-based Metrics

Before computing the label-based metrics, we first need the basic per-class counts.

  • Computing $TP, TN, FP, FN$
    $$\begin{array}{l}TP_{j}=\left|\left\{x_{i} \mid y_{j} \in Y_{i} \wedge y_{j} \in h\left(x_{i}\right), 1 \leq i \leq p\right\}\right| \\ FP_{j}=\left|\left\{x_{i} \mid y_{j} \notin Y_{i} \wedge y_{j} \in h\left(x_{i}\right), 1 \leq i \leq p\right\}\right| \\ TN_{j}=\left|\left\{x_{i} \mid y_{j} \notin Y_{i} \wedge y_{j} \notin h\left(x_{i}\right), 1 \leq i \leq p\right\}\right| \\ FN_{j}=\left|\left\{x_{i} \mid y_{j} \in Y_{i} \wedge y_{j} \notin h\left(x_{i}\right), 1 \leq i \leq p\right\}\right|\end{array}$$

where $p$ is the number of samples and $y_j$ denotes the $j$-th class. The four counts $TP_j, FP_j, TN_j, FN_j$ characterize the binary classification performance on each individual class and satisfy $TP_j+FP_j+TN_j+FN_j=p$.

def _label_quantity(gt, predict):
    # per-class tp/fp/tn/fn counts, stacked into a (4, num_classes) array
    tp = np.sum(np.logical_and(gt, predict), axis=0)
    fp = np.sum(np.logical_and(1-gt, predict), axis=0)
    tn = np.sum(np.logical_and(1-gt, 1-predict), axis=0)
    fn = np.sum(np.logical_and(gt, 1-predict), axis=0)
    return np.stack([tp, fp, tn, fn], axis=0).astype("float")
  • Computing Accuracy, Precision, Recall, and F1
     $$\begin{array}{c}\text{Accuracy}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)=\frac{TP_{j}+TN_{j}}{TP_{j}+FP_{j}+TN_{j}+FN_{j}} \\ \text{Precision}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)=\frac{TP_{j}}{TP_{j}+FP_{j}} \\ \text{Recall}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)=\frac{TP_{j}}{TP_{j}+FN_{j}} \\ F^{\beta}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)=\frac{\left(1+\beta^{2}\right) \cdot TP_{j}}{\left(1+\beta^{2}\right) TP_{j}+\beta^{2} \cdot FN_{j}+FP_{j}}\end{array}$$

  • Macro and micro averaging

    $$B_{\text{macro}}(h)=\frac{1}{q} \sum_{j=1}^{q} B\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right) \qquad B_{\text{micro}}(h)=B\left(\sum_{j=1}^{q} TP_{j}, \sum_{j=1}^{q} FP_{j}, \sum_{j=1}^{q} TN_{j}, \sum_{j=1}^{q} FN_{j}\right)$$
    Here $B\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)$ stands for one of the metrics $B \in \{\text{Accuracy}, \text{Precision}, \text{Recall}, F^{\beta}\}$. Macro averaging computes the metric per class and then averages over the classes, whereas micro averaging first sums the four counts over all classes and evaluates the metric once on the pooled counts; $q$ is the total number of classes.
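The difference matters under class imbalance. A minimal sketch with hypothetical per-class counts shows macro weighting every class equally while micro is dominated by the frequent class:

```python
import numpy as np

# Hypothetical 2-class counts (illustrative): a frequent class predicted
# well and a rare class predicted poorly.
tp = np.array([90., 1.])
fp = np.array([10., 9.])

macro_precision = np.mean(tp / (tp + fp))            # (0.9 + 0.1) / 2 = 0.5
micro_precision = tp.sum() / (tp.sum() + fp.sum())   # 91 / 110 ~= 0.827
```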

  1. Label accuracy
  • Macro
def label_accuracy_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp_tn = np.add(quantity[0], quantity[2])
    tp_fp_tn_fn = np.sum(quantity, axis=0)
    return np.mean(tp_tn / (tp_fp_tn_fn + epsilon))
  • Micro
def label_accuracy_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return (sum_tp + sum_tn) / (
            sum_tp + sum_fp + sum_tn + sum_fn + epsilon)
  2. Label precision
  • Macro
def label_precision_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    tp_fp = np.add(quantity[0], quantity[1])
    return np.mean(tp / (tp_fp + epsilon))
  • Micro
def label_precision_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return sum_tp / (sum_tp + sum_fp + epsilon)
  3. Label recall
  • Macro
def label_recall_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    tp_fn = np.add(quantity[0], quantity[3])
    return np.mean(tp / (tp_fn + epsilon))
  • Micro
def label_recall_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return sum_tp / (sum_tp + sum_fn + epsilon)
  4. Label F1
  • Macro
def label_f1_macro(gt, predict, beta=1):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    fp = quantity[1]
    fn = quantity[3]
    return np.mean((1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp + epsilon))
  • Micro
def label_f1_micro(gt, predict, beta=1):
    quantity = _label_quantity(gt, predict)
    tp = np.sum(quantity[0])
    fp = np.sum(quantity[1])
    fn = np.sum(quantity[3])
    return (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp + epsilon)

Note: epsilon is set to a small constant such as 1e-8 to prevent division by zero.
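Putting the pieces together, a minimal end-to-end sketch (the 3-sample, 3-class arrays are illustrative) that reuses `_label_quantity` from above to compute micro-averaged precision, recall, and F1 on the pooled counts:

```python
import numpy as np

epsilon = 1e-8

def _label_quantity(gt, predict):
    # per-class tp/fp/tn/fn counts, stacked into a (4, num_classes) array
    tp = np.sum(np.logical_and(gt, predict), axis=0)
    fp = np.sum(np.logical_and(1 - gt, predict), axis=0)
    tn = np.sum(np.logical_and(1 - gt, 1 - predict), axis=0)
    fn = np.sum(np.logical_and(gt, 1 - predict), axis=0)
    return np.stack([tp, fp, tn, fn], axis=0).astype("float")

gt      = np.array([[1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 0]])
predict = np.array([[1, 0, 0],
                    [0, 1, 1],
                    [1, 0, 0]])

q = _label_quantity(gt, predict)          # shape (4, 3)
tp, fp, tn, fn = q

# micro averaging pools the counts over all classes first
sum_tp, sum_fp, sum_fn = tp.sum(), fp.sum(), fn.sum()
micro_p  = sum_tp / (sum_tp + sum_fp + epsilon)          # 4 / 4 = 1.0
micro_r  = sum_tp / (sum_tp + sum_fn + epsilon)          # 4 / 6 ~= 0.667
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r + epsilon)   # 0.8
```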

Reference

[1] M. Zhang and Z. Zhou, “A Review on Multi-Label Learning Algorithms,” in IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 8, pp. 1819-1837, Aug. 2014, doi: 10.1109/TKDE.2013.39.
[2] Wei Long, Yang Yang, Hong-Bin Shen, ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images, Bioinformatics, Volume 36, Issue 7, 1 April 2020, Pages 2244–2250, https://doi.org/10.1093/bioinformatics/btz909
[3] https://github.com/Outliers1106/Multi-Label-Metrics
