ML/DL Review Notes [3]: Evaluation Metrics for Algorithms

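This part of the ML/DL review notes covers evaluation metrics for algorithms: error rate, accuracy, recall, precision, F-Score, the P-R curve, the ROC curve, AUC, (m)AP, (m)IoU, (m)PA and FWIoU, together with their Python implementations.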

1. Error rate and accuracy

# Binary classification as the running example
import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])

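  The error rate is the fraction of misclassified samples among all samples, and accuracy is the fraction of correctly classified samples, i.e. $1-E(f,D)$. The error rate is computed as follows, where $\mathbb{I}(\cdot)$ is 1 when its argument holds and 0 otherwise:
$$E(f,D)=\frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)\neq y_i\right)$$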

  Code:

## 1. Error rate and accuracy
accuracy = np.mean(y_pred == y_true)
error = 1 - accuracy
print(accuracy, error)

from sklearn.metrics import accuracy_score
# Fraction of correctly classified samples
accuracy = accuracy_score(y_true, y_pred, normalize=True)
# Number of correctly classified samples
correct_num = accuracy_score(y_true, y_pred, normalize=False)
print(accuracy, correct_num)

2. Recall, precision, and F-Score

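  For a binary classification problem, define the following confusion matrix, where TP, FP, FN and TN are the numbers of true positives, false positives, false negatives and true negatives:
[Figure: confusion matrix for binary classification]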
  Precision asks "what fraction of the retrieved items is the user actually interested in", and is defined as:
$$P=\frac{TP}{TP+FP}$$

  Recall asks "what fraction of the items the user is interested in was retrieved", and is defined as:
$$R=\frac{TP}{TP+FN}$$

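  In general, high precision tends to come with lower recall, and low precision with higher recall; only on some simple tasks can both be high at the same time. The code is as follows: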

## 2. Precision and recall
from sklearn.metrics import precision_score, recall_score

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(precision, recall)
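
  As a cross-check on the definitions of P and R, the four confusion-matrix cells can be counted directly with NumPy. This is only a sketch on the toy labels from section 1; the names `tp`, `fp`, `fn`, `tn`, `precision_manual` and `recall_manual` are introduced here for illustration.

```python
import numpy as np

# Same toy labels as in section 1
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])

# Count the four cells of the confusion matrix
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted 1, actually 1
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted 1, actually 0
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted 0, actually 1
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted 0, actually 0

precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
print(tp, fp, fn, tn)                   # 2 3 3 2
print(precision_manual, recall_manual)  # 0.4 0.4
```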

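  Sort the samples by confidence from high to low, then walk through them, using each sample in turn as the threshold: samples before it are predicted positive and those after it negative. Each threshold yields one (P, R) pair, and plotting all of them gives the P-R curve. The P-R curve shows a learner's precision and recall over the whole sample set at a glance. As shown in the figure below, recall keeps increasing as more samples are labelled positive, while precision tends to fall. Usually, if one learner's P-R curve is completely enclosed by another's, the latter can be said to perform better; in the figure below, for example, learner A outperforms learner C. If the curves cross, F-Score is needed to compare them.
[Figure: P-R curves of several learners, including A and C]
  The code below plots P-R curves using a dataset bundled with the package: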

## 3. Plotting the P-R curve
from sklearn.metrics import precision_recall_curve
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np

iris = load_iris()
X = iris.data
y = iris.target
y = label_binarize(y, classes=[0, 1, 2])  # one-hot
n_classes = y.shape[1]
# Add noise features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
# Train the model (fit once, then score the test set)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
y_score = clf.decision_function(X_test)
# Plot the P-R curves, one per class
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
precision = {}
recall = {}
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
    ax.plot(recall[i], precision[i], label='target=%s' % i)
ax.set_xlabel("Recall Score")
ax.set_ylabel("Precision Score")
ax.set_title("P-R")
ax.legend(loc='best')
ax.set_xlim(0, 1.0)
ax.set_ylim(0, 1.0)
ax.grid()
plt.show()
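
  The threshold sweep behind the P-R curve can also be sketched by hand, without sklearn. The function name `pr_points` and the toy labels and scores are hypothetical; the sketch assumes the sample used as the threshold is itself counted as positive and does not treat tied scores specially.

```python
import numpy as np

def pr_points(y_true, scores):
    """Return (precision, recall) arrays, one pair per threshold position."""
    order = np.argsort(-scores, kind="stable")  # sort by confidence, descending
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted)                    # true positives among the top-k
    k = np.arange(1, len(y_sorted) + 1)         # number of samples predicted positive
    precision = tp / k
    recall = tp / y_sorted.sum()
    return precision, recall

toy_true = np.array([1, 0, 1, 1, 0])              # hypothetical labels
toy_scores = np.array([0.9, 0.8, 0.7, 0.4, 0.2])  # hypothetical confidences
p, r = pr_points(toy_true, toy_scores)
print(p)
print(r)
```

  Recall is non-decreasing along the sweep, while precision moves up and down, which is why P-R curves are often jagged.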

  F-Score combines precision and recall. The F1 measure is defined as their harmonic mean:
$$\frac{1}{F_1}=\frac{1}{2}\left(\frac{1}{P}+\frac{1}{R}\right)$$

  That is:
$$F_1=\frac{2PR}{P+R}=\frac{2\,TP}{N+TP-TN}$$

  where $N$ is the total number of samples.

  Some applications weight precision and recall differently, which leads to the general form of F-Score, $F_{\beta}$, the weighted harmonic mean of $P$ and $R$:
$$\frac{1}{F_{\beta}}=\frac{1}{1+\beta^2}\left(\frac{1}{P}+\frac{\beta^2}{R}\right)$$

  That is:
$$F_{\beta}=\frac{(1+\beta^2)PR}{\beta^2P+R}$$

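  Here $\beta>0$ measures the importance of recall relative to precision: $\beta>1$ means recall matters more, $\beta<1$ means precision matters more, and $\beta=1$ reduces to $F_1$.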
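
  The effect of $\beta$ is easy to see numerically. The helper `f_beta` below is a small sketch of the formula above, not a library function; returning 0 when both P and R are 0 is a convention assumed here to avoid a 0/0.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With P = 0.5 and R = 1.0, a larger beta rewards the high recall more
f_half = f_beta(0.5, 1.0, beta=0.5)
f_one = f_beta(0.5, 1.0, beta=1.0)
f_two = f_beta(0.5, 1.0, beta=2.0)
print(f_half, f_one, f_two)
```

  For the same P and R, the score increases with $\beta$ here because recall is the larger of the two.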

3. ROC and AUC

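  Sort the samples by confidence from high to low, then take each sample in turn as the threshold: samples before it are predicted positive and those after it negative. Each threshold gives a true positive rate (TPR) and a false positive rate (FPR), defined as:
$$TPR=\frac{TP}{TP+FN}$$
$$FPR=\frac{FP}{TN+FP}$$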

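  Plotting FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve. The point $(0,0)$ corresponds to a threshold above the highest confidence of any sample: every sample is predicted negative, so TP and FP are 0 and hence TPR and FPR are 0. The threshold is then lowered step by step until every sample is predicted positive, which gives the point $(1,1)$.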

  For random guessing, ideally $TPR=FPR$, so the ROC curve is the diagonal. If one learner's ROC curve is completely enclosed by another's, the latter performs better. If the curves cross, they can be compared by the area under the ROC curve, called the AUC. If the ROC curve is formed by connecting the points $\{(x_1,y_1),...,(x_N,y_N)\}$, the AUC can be estimated as:
$$AUC=\frac{1}{2}\sum_{i=1}^{N-1}(x_{i+1}-x_i)(y_i+y_{i+1})$$
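
  The sweep and the trapezoid estimate can be sketched in a few lines of NumPy. `roc_points` and `auc_trapezoid` are names made up for this illustration, the toy labels and scores are hypothetical, and tied scores are not handled specially.

```python
import numpy as np

def roc_points(y_true, scores):
    """Sweep the threshold from high to low; return FPR and TPR arrays."""
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)      # true positives among the top-k
    fps = np.cumsum(1 - y_sorted)  # false positives among the top-k
    # Prepend the (0, 0) point: threshold above every score
    tpr = np.concatenate(([0.0], tps / y_sorted.sum()))
    fpr = np.concatenate(([0.0], fps / (1 - y_sorted).sum()))
    return fpr, tpr

def auc_trapezoid(x, y):
    """AUC = 1/2 * sum over i of (x[i+1] - x[i]) * (y[i] + y[i+1])."""
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[:-1] + y[1:]))

toy_true = np.array([1, 0, 1, 0])              # hypothetical labels
toy_scores = np.array([0.9, 0.8, 0.7, 0.6])    # hypothetical confidences
fpr, tpr = roc_points(toy_true, toy_scores)
roc_auc = auc_trapezoid(fpr, tpr)
print(fpr, tpr, roc_auc)
```

  On this toy data three of the four (positive, negative) pairs are ranked correctly, which matches the estimated AUC of 0.75.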

  The code for plotting the ROC curve and computing the AUC is as follows:

## 5. ROC and AUC
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np

# Load the data
iris = load_iris()
X = iris.data
y = iris.target
# one-hot
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
# Add noise features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
# Train the model (fit once, then score the test set)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
y_score = clf.decision_function(X_test)
# Compute and plot the ROC curves, one per class
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
fpr = {}
tpr = {}
roc_auc = {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    ax.plot(fpr[i], tpr[i], label="target=%s,auc=%s" % (i, roc_auc[i]))
ax.plot([0, 1], [0, 1], 'k--')
ax.set_xlabel("FPR")
ax.set_ylabel("TPR")
ax.set_title("ROC")
ax.legend(loc="best")
ax.set_xlim(0, 1.1)
ax.set_ylim(0, 1.1)
ax.grid()
plt.show()

4. IoU and mAP in object detection

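  A common evaluation metric in object detection is the mean average precision (mAP) over classes. Before looking at how mAP is computed, here are the basics of IoU, i.e. intersection over union.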

4.1 IoU

  IoU is computed as:
$$IoU=\frac{\text{Area of Overlap}}{\text{Area of Union}}=\frac{A_{pred}\cap A_{true}}{A_{pred}\cup A_{true}}$$

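  A Python implementation follows. It assumes two rectangular boxes, each given either by its top-left and bottom-right coordinates or by its centre coordinates plus width and height: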

## 6. IoU
import numpy as np

def compute_iou(box1, box2, wh=False):
    """
    compute the iou of two boxes.
    Args:
        box1, box2: [xmin, ymin, xmax, ymax] (wh=False) or [xcenter, ycenter, w, h] (wh=True)
        wh: the format of coordinate.
    Return:
        iou: iou of box1 and box2.
    """
    if not wh:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    else:
        # Convert [xcenter, ycenter, w, h] to corners; kept as floats so that
        # fractional or normalized coordinates are not truncated
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0

    ## Top-left and bottom-right corners of the intersection rectangle
    xx1 = np.max([xmin1, xmin2])
    yy1 = np.max([ymin1, ymin2])
    xx2 = np.min([xmax1, xmax2])
    yy2 = np.min([ymax1, ymax2])

    ## Areas of the two boxes
    area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
    area2 = (xmax2 - xmin2) * (ymax2 - ymin2)

    ## Intersection area (0 if the boxes do not overlap)
    inter_area = np.max([0, xx2 - xx1]) * np.max([0, yy2 - yy1])

    ## Intersection over union
    IoU = inter_area / (area1 + area2 - inter_area)
    return IoU
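
  In detection pipelines one usually needs the IoU between every predicted box and every ground-truth box. The function below is a vectorized sketch of the same formula using NumPy broadcasting; `pairwise_iou` is a name chosen for this example, boxes are assumed to be in `[xmin, ymin, xmax, ymax]` format, and boxes are assumed to have non-zero area so the union is never 0.

```python
import numpy as np

def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix of shape (N, M) between boxes_a (N, 4) and boxes_b (M, 4)."""
    a = np.asarray(boxes_a, dtype=float)[:, None, :]  # shape (N, 1, 4)
    b = np.asarray(boxes_b, dtype=float)[None, :, :]  # shape (1, M, 4)
    # Width and height of each pairwise intersection, clipped at zero
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

# One box against a partially overlapping, an identical and a disjoint box
iou = pairwise_iou([[0, 0, 2, 2]], [[1, 1, 3, 3], [0, 0, 2, 2], [5, 5, 6, 6]])
print(iou)
```

  The first pair overlaps in a 1x1 square out of a union of area 7, so its IoU is 1/7; the identical pair gives 1 and the disjoint pair gives 0.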

4.2 mAP

  Suppose we have a set of detection results with three samples, each consisting of two boxes and a confidence score: the box predicted by the model, denoted $pre_i$, and the ground-truth box, denoted $label_i$, for $i=1,2,3$. Assume the confidences of the three predictions are $0.9$, $0.8$ and $0.7$.

  First compute the IoU between $pre$ and $label$ for each sample. With a threshold of 0.5, a $pre$ whose $IoU$ is greater than 0.5 counts as a $TP$ in the confusion matrix, otherwise as an $FP$. Suppose $pre_1$ and $pre_3$ are $TP$ and $pre_2$ is $FP$.

  Next, sort by confidence. Here $pre_1$, $pre_2$, $pre_3$ are already in descending order.

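  Then compute Precision and Recall at different confidence thresholds. First set the threshold to 0.9 and ignore every pre with confidence below 0.9. The detector then outputs one box, i.e. TP+FP=1, and pre_1 is a TP, so Precision=1; there are 3 labels, so Recall=1/3. The other two thresholds give (P, R) = (1/2, 1/3) and (2/3, 2/3) in the same way.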
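
  The three (P, R) pairs can be reproduced with cumulative sums over the detections sorted by confidence. This is a sketch on the toy example only; the variable names are chosen for illustration.

```python
import numpy as np

conf = np.array([0.9, 0.8, 0.7])  # confidences of pre_1, pre_2, pre_3
is_tp = np.array([1, 0, 1])       # pre_1 and pre_3 are TP, pre_2 is FP
n_label = 3                       # number of ground-truth boxes

order = np.argsort(-conf)               # highest confidence first
tp_cum = np.cumsum(is_tp[order])        # TPs kept at each threshold
fp_cum = np.cumsum(1 - is_tp[order])    # FPs kept at each threshold
precision_curve = tp_cum / (tp_cum + fp_cum)
recall_curve = tp_cum / n_label
print(precision_curve)  # P at thresholds 0.9, 0.8, 0.7
print(recall_curve)     # R at thresholds 0.9, 0.8, 0.7
```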

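  Plot the P-R curve, then from each peak draw a horizontal segment to the left until it meets the vertical line through the previous peak. The area enclosed by these (red) segments and the axes is the AP, as shown in the figure below. The mAP is simply the mean of the AP values over all classes.
[Figure: P-R curve with the interpolated envelope used to compute AP]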
  Python code:

# -*- coding: utf-8 -*-
# @File    : https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/master/pytorchyolo/utils/utils.py
# @Desc    :
import numpy as np
import tqdm
def ap_per_class(tp, conf, pred_cls, target_cls):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:    True positives (list).
        conf:  Objectness value from 0-1 (list).
        pred_cls: Predicted object classes (list).
        target_cls: True object classes (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """

    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]

    # Find unique classes
    unique_classes = np.unique(target_cls)

    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # Number of ground truth objects
        n_p = i.sum()  # Number of predicted objects

        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # Accumulate FPs and TPs
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()

            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])

            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])

            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))

    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)

    return p, r, ap, f1, unique_classes.astype("int32")


def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.
    Code originally from https://github.com/rbgirshick/py-faster-rcnn.
    # Arguments
        recall:    The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
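
  For the three-detection toy example of section 4.2, the AP can be checked by hand. The sketch below is a compact variant of the same all-point interpolation, with the envelope loop replaced by `np.maximum.accumulate`; the name `ap_all_points` is made up for this example and is not part of the repository quoted above.

```python
import numpy as np

def ap_all_points(recall, precision):
    """Area under the precision envelope of a P-R curve."""
    # Sentinel values at both ends, as in compute_ap
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: running maximum taken from right to left
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum (delta recall) * precision where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# (P, R) pairs of the toy example: (1, 1/3), (1/2, 1/3), (2/3, 2/3)
toy_recall = np.array([1/3, 1/3, 2/3])
toy_precision = np.array([1.0, 0.5, 2/3])
toy_ap = ap_all_points(toy_recall, toy_precision)
print(toy_ap)
```

  The envelope is 1 up to recall 1/3 and 2/3 up to recall 2/3, so the area is 1/3 + 2/9 = 5/9.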

5. PA, mIoU and FWIoU in image segmentation

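  Suppose there are $k+1$ classes, including one background class. Let $p_{ij}$ denote the number of pixels that belong to class $i$ but are predicted as class $j$; $p_{ii}$ is then the number of true positives for class $i$.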

  Pixel Accuracy (PA) is the fraction of correctly labelled pixels among all pixels:
$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$

  Mean Pixel Accuracy (MPA) computes, for each class, the fraction of its pixels that are classified correctly, then averages over classes:
$$MPA=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}}$$

  Mean Intersection over Union (mIoU) is the most commonly used metric. It is the ratio of the intersection to the union of two sets, which in semantic segmentation are the ground truth and the prediction. The ratio can be rewritten as the number of true positives divided by the sum of true positives, false negatives and false positives. IoU is computed per class and then averaged:
$$mIoU=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$

  Frequency Weighted Intersection over Union (FWIoU) weights each class's IoU by how often that class occurs:
$$FWIoU=\frac{1}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}\sum_{i=0}^{k}\frac{\left(\sum_{j=0}^{k}p_{ij}\right)p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$
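
  A small worked example may make the four metrics concrete. The 2-class confusion matrix below is hypothetical; rows are the true class $i$ and columns the predicted class $j$. The computation uses the standard definitions: per-class IoU is the diagonal entry divided by row sum plus column sum minus the diagonal entry.

```python
import numpy as np

# Hypothetical confusion matrix: cm[i, j] = pixels of class i predicted as j
cm = np.array([[3, 1],
               [1, 5]], dtype=float)

tp = np.diag(cm)             # p_ii
gt_total = cm.sum(axis=1)    # sum_j p_ij, pixels that truly belong to class i
pred_total = cm.sum(axis=0)  # sum_j p_ji, pixels predicted as class i

pa = tp.sum() / cm.sum()                   # pixel accuracy
mpa = np.mean(tp / gt_total)               # mean per-class accuracy
iou = tp / (gt_total + pred_total - tp)    # per-class IoU
miou = np.mean(iou)
fwiou = np.sum(gt_total / cm.sum() * iou)  # IoU weighted by class frequency
print(pa, mpa, miou, fwiou)
```

  Here the per-class IoUs are 3/5 and 5/7; FWIoU is slightly higher than mIoU because the more frequent class also has the higher IoU.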

  The implementation is as follows; project link:

# -*- coding: utf-8 -*-
# @Time    : 19-1-10 11:03 PM
# @Author  : Zhao Lei
# @File    : metrics.py
# @Desc    :

import numpy as np


class Evaluator(object):
    def __init__(self, num_class):
        self.num_class = num_class
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

    def Pixel_Accuracy(self):
        Acc = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return Acc

    def Pixel_Accuracy_Class(self):
        Acc = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        Acc = np.nanmean(Acc)
        return Acc

    def Mean_Intersection_over_Union(self):
        MIoU = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))
        MIoU = np.nanmean(MIoU)
        return MIoU

    def Frequency_Weighted_Intersection_over_Union(self):
        freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))

        FWIoU = (freq[freq > 0] * iu[freq > 0]).sum()
        return FWIoU

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class ** 2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def add_batch(self, gt_image, pre_image):
        assert gt_image.shape == pre_image.shape
        self.confusion_matrix += self._generate_matrix(gt_image, pre_image)

    def reset(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)
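
  The `np.bincount` trick in `_generate_matrix` encodes each (true, predicted) pair as a single integer `num_class * gt + pred`, counts the occurrences of each code, and reshapes the counts into a matrix. A tiny illustration on hypothetical 1-D label arrays:

```python
import numpy as np

num_class = 2
gt = np.array([0, 0, 1, 1])    # hypothetical ground-truth labels
pred = np.array([0, 1, 1, 1])  # hypothetical predicted labels

# Each (gt, pred) pair maps to a unique index in [0, num_class ** 2)
index = num_class * gt + pred
count = np.bincount(index, minlength=num_class ** 2)
cm = count.reshape(num_class, num_class)  # rows: true class, columns: predicted class
print(index)
print(cm)
```

  Row 0 shows that one class-0 pixel was predicted correctly and one was predicted as class 1; row 1 shows that both class-1 pixels were predicted correctly.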

