1. Multi-Class Multi-Label Problem Definition
Multi-class classification generalizes binary classification. In a binary problem the label takes only two values, 0 or 1 — for example, "is this a dog?" or "does this patient have the disease?". A multi-class problem involves more categories: is the object a cat, dog, bird, or rabbit; does the patient have disease A, B, C, or D. Note that in a multi-class problem, exactly one class is typically correct for each sample.
What, then, is multi-label? Simply put, a single sample carries several labels at once. A landscape photo may contain sky, a cat, a dog, a bird, and trees; if all of these belong to the label set the task must recognize, the image has multiple labels. Multi-label tasks are naturally much harder.
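Concretely, multi-label targets are usually encoded as multi-hot binary indicator vectors over the label set. A minimal sketch (the label vocabulary and the helper name `to_multi_hot` are made up for illustration):

```python
import numpy as np

# Hypothetical label vocabulary for the landscape-photo example above.
classes = ["sky", "cat", "dog", "bird", "tree"]

def to_multi_hot(labels, classes):
    """Encode a set of label names as a binary indicator (multi-hot) vector."""
    vec = np.zeros(len(classes), dtype="int64")
    for name in labels:
        vec[classes.index(name)] = 1
    return vec

print(to_multi_hot({"sky", "tree"}, classes))  # -> [1 0 0 0 1]
```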
2. Evaluation Metrics
Following [1] [2], evaluation metrics for multi-class multi-label models fall into two families: example-based metrics and label-based metrics.
Example-based Metrics
- Subset accuracy
$$\text{subsetacc}(h)=\frac{1}{p} \sum_{i=1}^{p} I\left[h\left(x_{i}\right)=Y_{i}\right]$$
where $h(\cdot)$ denotes a multi-label classifier $h: X \rightarrow 2^{Y}$, $h(x)$ returns the predicted label set, and $p$ is the number of samples.
import numpy as np

# gt is the ground-truth label matrix, predict the predicted one.
# Example: gt = [[1,0,0,1]], predict = [[1,0,1,1]]
def example_subset_accuracy(gt, predict):
    ex_equal = np.all(np.equal(gt, predict), axis=1).astype("float32")
    return np.mean(ex_equal)
- Example accuracy
$$\text{Accuracy}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|Y_{i} \cup h\left(x_{i}\right)\right|}$$
def example_accuracy(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_or = np.sum(np.logical_or(gt, predict), axis=1).astype("float32")
    return np.mean(ex_and / (ex_or + epsilon))
- Example precision
$$\text{Precision}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|h\left(x_{i}\right)\right|}$$
def example_precision(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_predict = np.sum(predict, axis=1).astype("float32")
    return np.mean(ex_and / (ex_predict + epsilon))
- Example recall
$$\text{Recall}_{\text{exam}}(h)=\frac{1}{p} \sum_{i=1}^{p} \frac{\left|Y_{i} \cap h\left(x_{i}\right)\right|}{\left|Y_{i}\right|}$$
def example_recall(gt, predict):
    ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")
    ex_gt = np.sum(gt, axis=1).astype("float32")
    return np.mean(ex_and / (ex_gt + epsilon))
- Example F1 (with $\beta$)
$\beta>0$ weights recall relative to precision: $\beta=1$ recovers the standard F1, $\beta>1$ gives recall more influence, and $\beta<1$ gives precision more influence.
$$F_{\text{exam}}^{\beta}(h)=\frac{\left(1+\beta^{2}\right) \cdot \text{Precision}_{\text{exam}}(h) \cdot \text{Recall}_{\text{exam}}(h)}{\beta^{2} \cdot \text{Precision}_{\text{exam}}(h)+\text{Recall}_{\text{exam}}(h)}$$
def example_f1(gt, predict, beta=1):
    p = example_precision(gt, predict)
    r = example_recall(gt, predict)
    # Note: only the beta^2 * precision term is scaled, per the formula above.
    return ((1 + beta**2) * p * r) / (beta**2 * p + r + epsilon)
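As a quick sanity check, the example-accuracy computation can be worked through by hand on a tiny batch (toy data; epsilon set to 1e-8 as suggested in the note at the end):

```python
import numpy as np

epsilon = 1e-8

gt      = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])
predict = np.array([[1, 0, 1, 1], [0, 1, 0, 0]])

# Per-sample intersection and union sizes.
ex_and = np.sum(np.logical_and(gt, predict), axis=1).astype("float32")  # [2, 1]
ex_or  = np.sum(np.logical_or(gt, predict), axis=1).astype("float32")   # [3, 2]

# Example accuracy = mean per-sample Jaccard = (2/3 + 1/2) / 2
example_acc = np.mean(ex_and / (ex_or + epsilon))
print(example_acc)  # ~0.5833
```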
Label-based Metrics
Before computing label-based metrics, we first need the basic per-class counts.
- Computing $TP, TN, FP, FN$
$$\begin{aligned}
TP_{j}&=\left|\left\{x_{i} \mid y_{j} \in Y_{i} \wedge y_{j} \in h\left(x_{i}\right),\ 1 \leq i \leq p\right\}\right| \\
FP_{j}&=\left|\left\{x_{i} \mid y_{j} \notin Y_{i} \wedge y_{j} \in h\left(x_{i}\right),\ 1 \leq i \leq p\right\}\right| \\
TN_{j}&=\left|\left\{x_{i} \mid y_{j} \notin Y_{i} \wedge y_{j} \notin h\left(x_{i}\right),\ 1 \leq i \leq p\right\}\right| \\
FN_{j}&=\left|\left\{x_{i} \mid y_{j} \in Y_{i} \wedge y_{j} \notin h\left(x_{i}\right),\ 1 \leq i \leq p\right\}\right|
\end{aligned}$$
where $p$ is the number of samples and $y_j$ denotes the $j$-th class. The four counts $TP_j, FP_j, TN_j, FN_j$ describe the binary-classification performance on class $j$ and satisfy $TP_j+FP_j+TN_j+FN_j=p$.
def _label_quantity(gt, predict):
    tp = np.sum(np.logical_and(gt, predict), axis=0)
    fp = np.sum(np.logical_and(1 - gt, predict), axis=0)
    tn = np.sum(np.logical_and(1 - gt, 1 - predict), axis=0)
    fn = np.sum(np.logical_and(gt, 1 - predict), axis=0)
    return np.stack([tp, fp, tn, fn], axis=0).astype("float")
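The identity $TP_j+FP_j+TN_j+FN_j=p$ doubles as a quick correctness check; a self-contained sketch (the helper is repeated here so the snippet runs on its own, and the batch is toy data):

```python
import numpy as np

def _label_quantity(gt, predict):
    tp = np.sum(np.logical_and(gt, predict), axis=0)
    fp = np.sum(np.logical_and(1 - gt, predict), axis=0)
    tn = np.sum(np.logical_and(1 - gt, 1 - predict), axis=0)
    fn = np.sum(np.logical_and(gt, 1 - predict), axis=0)
    return np.stack([tp, fp, tn, fn], axis=0).astype("float")

gt      = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]])
predict = np.array([[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]])

quantity = _label_quantity(gt, predict)  # shape (4, num_classes)
# Each column (class) must account for all p samples.
assert np.all(quantity.sum(axis=0) == gt.shape[0])
```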
- Computing Accuracy, Precision, Recall, and F1
$$\begin{aligned}
\text{Accuracy}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)&=\frac{TP_{j}+TN_{j}}{TP_{j}+FP_{j}+TN_{j}+FN_{j}} \\
\text{Precision}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)&=\frac{TP_{j}}{TP_{j}+FP_{j}} \\
\text{Recall}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)&=\frac{TP_{j}}{TP_{j}+FN_{j}} \\
F^{\beta}\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)&=\frac{\left(1+\beta^{2}\right) \cdot TP_{j}}{\left(1+\beta^{2}\right) TP_{j}+\beta^{2} \cdot FN_{j}+FP_{j}}
\end{aligned}$$
- Macro and micro averaging
$$B_{\text{macro}}(h)=\frac{1}{q} \sum_{j=1}^{q} B\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)$$
$$B_{\text{micro}}(h)=B\left(\sum_{j=1}^{q} TP_{j},\ \sum_{j=1}^{q} FP_{j},\ \sum_{j=1}^{q} TN_{j},\ \sum_{j=1}^{q} FN_{j}\right)$$
where $B\left(TP_{j}, FP_{j}, TN_{j}, FN_{j}\right)$ stands for one of the metrics $B \in \{\text{Accuracy},\ \text{Precision},\ \text{Recall},\ F^{\beta}\}$. Macro averages the metric over classes, while micro first pools the counts across all classes and then computes the metric once; $q$ is the total number of classes.
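In practice the two averages can differ sharply on imbalanced data: macro weights every class equally, so a rare class that is missed entirely drags the score down, while micro is dominated by the frequent classes. A self-contained toy illustration using recall:

```python
import numpy as np

epsilon = 1e-8

# Toy imbalanced batch: class 0 is frequent and always found,
# class 1 is rare and completely missed.
gt      = np.array([[1, 0], [1, 0], [1, 0], [1, 1]])
predict = np.array([[1, 0], [1, 0], [1, 0], [1, 0]])

tp = np.sum(np.logical_and(gt, predict), axis=0).astype("float")      # [4, 0]
fn = np.sum(np.logical_and(gt, 1 - predict), axis=0).astype("float")  # [0, 1]

macro_recall = np.mean(tp / (tp + fn + epsilon))           # (1.0 + 0.0) / 2 = 0.5
micro_recall = tp.sum() / (tp.sum() + fn.sum() + epsilon)  # 4 / 5 = 0.8
```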
- Label accuracy
- Macro
def label_accuracy_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp_tn = np.add(quantity[0], quantity[2])
    tp_fp_tn_fn = np.sum(quantity, axis=0)
    return np.mean(tp_tn / (tp_fp_tn_fn + epsilon))
- Micro
def label_accuracy_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return (sum_tp + sum_tn) / (
        sum_tp + sum_fp + sum_tn + sum_fn + epsilon)
- Label precision
- Macro
def label_precision_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    tp_fp = np.add(quantity[0], quantity[1])
    return np.mean(tp / (tp_fp + epsilon))
- Micro
def label_precision_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return sum_tp / (sum_tp + sum_fp + epsilon)
- Label recall
- Macro
def label_recall_macro(gt, predict):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    tp_fn = np.add(quantity[0], quantity[3])
    return np.mean(tp / (tp_fn + epsilon))
- Micro
def label_recall_micro(gt, predict):
    quantity = _label_quantity(gt, predict)
    sum_tp, sum_fp, sum_tn, sum_fn = np.sum(quantity, axis=1)
    return sum_tp / (sum_tp + sum_fn + epsilon)
- Label F1
- Macro
def label_f1_macro(gt, predict, beta=1):
    quantity = _label_quantity(gt, predict)
    tp = quantity[0]
    fp = quantity[1]
    fn = quantity[3]
    return np.mean((1 + beta**2) * tp /
                   ((1 + beta**2) * tp + beta**2 * fn + fp + epsilon))
- Micro
def label_f1_micro(gt, predict, beta=1):
    quantity = _label_quantity(gt, predict)
    tp = np.sum(quantity[0])
    fp = np.sum(quantity[1])
    fn = np.sum(quantity[3])
    return (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp + epsilon)
Note: epsilon is a small constant such as 1e-8 that guards against division by zero.
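To see what the epsilon buys: for a class that is never predicted positive, precision would be 0/0. With the smoothing term the result is simply 0 instead of NaN (a minimal illustration):

```python
epsilon = 1e-8

# A class that is never predicted positive: tp = fp = 0.
tp, fp = 0.0, 0.0

# Without the smoothing term this would be 0/0 (NaN, plus a runtime warning).
precision = tp / (tp + fp + epsilon)
print(precision)  # -> 0.0
```

The trade-off is a tiny downward bias in every score, negligible at epsilon = 1e-8.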
Reference
[1] M. Zhang and Z. Zhou, “A Review on Multi-Label Learning Algorithms,” in IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 8, pp. 1819-1837, Aug. 2014, doi: 10.1109/TKDE.2013.39.
[2] Wei Long, Yang Yang, Hong-Bin Shen, "ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images," Bioinformatics, Volume 36, Issue 7, April 2020, Pages 2244-2250, https://doi.org/10.1093/bioinformatics/btz909.
[3] https://github.com/Outliers1106/Multi-Label-Metrics