ML/DL Review Notes (3): Evaluation Metrics for Algorithms

不会算命的赵半仙

Tags: machine learning, deep learning, deep learning review notes

Article link: https://blog.csdn.net/kevin_zhao_zl/article/details/107207413

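This section is ML/DL Review Notes (3): Evaluation Metrics for Algorithms. It covers error rate, accuracy, recall, precision, F-Score, the P-R curve, the ROC curve, AUC, (m)AP, (m)IoU, (m)PA, FWIoU, and their Python implementations.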

1. Error Rate and Accuracy

# Use a binary classification problem as the running example
import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])

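The error rate is the fraction of all samples that are misclassified, and accuracy is the fraction that are classified correctly. With $\mathbb{I}(\cdot)$ denoting the indicator function (1 when its argument is true, 0 otherwise), the error rate is computed as follows, and accuracy is $1 - E(f, D)$:
$E(f,D)=\frac1m\sum_{i=1}^m\mathbb{I}(f(x_i)\neq y_i)$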

Implementation:

## 1. Error rate and accuracy
accuracy = np.mean(y_pred == y_true)
error = 1 - accuracy
print(accuracy, error)

from sklearn.metrics import accuracy_score
# Fraction of correctly classified samples
accuracy = accuracy_score(y_true, y_pred, normalize=True)
# Number of correctly classified samples
correct_num = accuracy_score(y_true, y_pred, normalize=False)
print(accuracy, correct_num)

2. Recall, Precision, and F-Score

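For a binary classification problem, define the following confusion matrix:
                  | Predicted positive  | Predicted negative
Actual positive   | TP (true positive)  | FN (false negative)
Actual negative   | FP (false positive) | TN (true negative)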
Precision asks "what fraction of the retrieved items is the user actually interested in?" and is defined as:
$P=\frac{TP}{TP+FP}$

Recall asks "what fraction of the items the user is interested in was retrieved?" and is defined as:
$R=\frac{TP}{TP+FN}$

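In general, precision and recall trade off against each other: when precision is high, recall tends to be low, and when precision is low, recall tends to be high. Only on some simple tasks can both be high at once. The code is as follows: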

## 2. Precision and recall
from sklearn.metrics import precision_score, recall_score

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(precision, recall)
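
As a sanity check on the definitions above, the four confusion-matrix counts can be tallied by hand with NumPy and plugged into the formulas for P and R. This is a minimal sketch that reuses the toy labels from section 1; the helper name `confusion_counts` is hypothetical and is not part of scikit-learn.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {0, 1}. Hypothetical helper."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # P = TP / (TP + FP)
recall = tp / (tp + fn)     # R = TP / (TP + FN)
print(tp, fp, fn, tn)       # 2 3 3 2
print(precision, recall)    # 0.4 0.4
```

These values should agree with what `precision_score` and `recall_score` report for the same arrays.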

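Sort the samples by confidence from high to low and walk through them, using each sample in turn as the threshold: samples ranked before it are treated as positive and those after it as negative. Each threshold yields one (P, R) pair, and plotting all of the pairs gives the P-R curve. The P-R curve shows a learner's recall and precision over the whole sample set at a glance. As the figure below shows, as more samples are classified as positive, recall keeps increasing while precision tends to drop. If one learner's P-R curve is completely enclosed by another's, the latter can usually be said to outperform the former; in the figure, for example, learner A outperforms learner C. If the curves cross, however, a summary measure such as the F-Score is needed.
[Figure: P-R curves of several learners, including A and C]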
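
The threshold sweep described above can be sketched directly in NumPy. The labels and scores below are made-up illustrative values (an assumption, not data from this article), `pr_points` is a hypothetical helper name, and tied scores are not handled specially.

```python
import numpy as np

def pr_points(y_true, scores):
    """Precision and recall after each cut in the confidence-sorted list."""
    order = np.argsort(-np.asarray(scores))  # sort by confidence, high to low
    hits = np.asarray(y_true)[order]         # 1 where the ranked sample is truly positive
    tp = np.cumsum(hits)                     # true positives among the top-k samples
    k = np.arange(1, len(hits) + 1)          # number of samples predicted positive
    precision = tp / k
    recall = tp / hits.sum()
    return precision, recall

# Illustrative data (assumed)
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
precision, recall = pr_points(y_true, scores)
for p, r in zip(precision, recall):
    print(f"P={p:.2f}  R={r:.2f}")
```

Recall never decreases as the cut moves down the list, while precision moves up or down depending on whether the newly admitted sample is a true positive.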
The code for plotting P-R curves is below, using a dataset bundled with scikit-learn:

## 3. Plotting P-R curves
from sklearn.metrics import precision_recall_curve
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np

iris = load_iris()
X = iris.data
y = iris.target
y = label_binarize(y, classes=[0, 1, 2])  # one-hot
n_classes = y.shape[1]
# Add noisy features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
# Train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
# The model is already fitted, so just score the test set
y_score = clf.decision_function(X_test)
# Plot the P-R curves
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
precision = {}
recall = {}
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
    ax.plot(recall[i], precision[i], label='target=%s' % i)
ax.set_xlabel("Recall Score")
ax.set_ylabel("Precision Score")
ax.set_title("P-R")
ax.legend(loc='best')
ax.set_xlim(0, 1.0)
ax.set_ylim(0, 1.0)
ax.grid()
plt.show()

The F-Score combines precision and recall. The F1 measure is defined as their harmonic mean:
$\frac1{F_1}=\frac12(\frac1P+\frac1R)$

That is:
$F_1=\frac{2PR}{P+R}=\frac{2TP}{N+TP-TN}$

where $N = TP + FP + FN + TN$ is the total number of samples, so that $N + TP - TN = 2TP + FP + FN$.

Some applications weight precision and recall differently, which leads to the general form of the F-Score, $F_{\beta}$, the weighted harmonic mean of $P$ and $R$:
$\frac1{F_{\beta}}=\frac1{1+\beta^2}(\frac1P+\frac{\beta^2}R)$

That is:
$F_{\beta}=(1+\beta^2)\frac{PR}{\beta^2P+R}$

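Here $\beta > 0$ measures the importance of recall relative to precision: $\beta > 1$ means recall matters more, $\beta < 1$ means precision matters more, and $\beta = 1$ recovers $F_1$.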
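
To make the effect of $\beta$ concrete, the sketch below evaluates the formula for three values of $\beta$ on a case where precision exceeds recall. The confusion-matrix counts are chosen purely for illustration (an assumption), and `f_beta` is a hypothetical helper name.

```python
def f_beta(p, r, beta):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). Hypothetical helper."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Illustrative counts (assumed): TP = 2, FP = 1, FN = 2
tp, fp, fn = 2, 1, 2
p = tp / (tp + fp)  # precision = 2/3
r = tp / (tp + fn)  # recall = 1/2

results = {beta: f_beta(p, r, beta) for beta in (0.5, 1.0, 2.0)}
for beta, score in results.items():
    print(f"beta={beta}: F={score:.4f}")
# beta=0.5: F=0.6250
# beta=1.0: F=0.5714
# beta=2.0: F=0.5263
```

Because recall is the weaker of the two here, weighting it more heavily ($\beta = 2$) lowers the score, while favoring precision ($\beta = 0.5$) raises it. `sklearn.metrics.fbeta_score(y_true, y_pred, beta=...)` should give the same values for labels with these counts.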

3. ROC and AUC

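Sort the samples by confidence from high to low, then take each sample in turn as the threshold: samples ranked before it are treated as positive and those after it as negative. Each threshold yields a true positive rate (TPR) and a false positive rate (FPR), defined as: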
$TPR=\frac{TP}{TP+FN}$
$FPR=\frac{FP}{TN+FP}$

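Plotting FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve. The point $(0, 0)$ corresponds to a threshold above the highest confidence of any sample: every sample is predicted negative, so TP and FP are both 0, and hence TPR and FPR are both 0. The threshold is then lowered step by step until every sample is predicted positive, which gives the point $(1, 1)$.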

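For random guessing, the ideal case has $TPR = FPR$, so the ROC curve is the diagonal. If one learner's ROC curve is completely enclosed by another's, the latter performs better. If the curves cross, the learners can be compared by the area under the ROC curve, called the AUC. If the ROC curve is formed by connecting the points $\{(x_1,y_1),\ldots,(x_N,y_N)\}$ in order, the AUC can be estimated as:
$AUC=\frac12\sum_{i=1}^{N-1}(x_{i+1}-x_i)(y_i+y_{i+1})$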
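
The trapezoidal estimate above takes only a few lines to write out. The sketch below builds the ROC points by sweeping the threshold over the sorted scores and then applies the formula. The data is illustrative (an assumption), `roc_points` and `trapezoid_auc` are hypothetical helper names, and tied scores are not merged.

```python
import numpy as np

def roc_points(y_true, scores):
    """FPR and TPR at every cut of the confidence-sorted list, starting from (0, 0)."""
    order = np.argsort(-np.asarray(scores))  # sort by confidence, high to low
    hits = np.asarray(y_true)[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - hits) / (1 - hits).sum()))
    return fpr, tpr

def trapezoid_auc(x, y):
    """AUC = 1/2 * sum((x[i+1] - x[i]) * (y[i] + y[i+1]))."""
    x, y = np.asarray(x), np.asarray(y)
    return float(0.5 * np.sum((x[1:] - x[:-1]) * (y[:-1] + y[1:])))

# Illustrative data (assumed): 3 positives, 2 negatives
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
fpr, tpr = roc_points(y_true, scores)
auc_value = trapezoid_auc(fpr, tpr)
print(auc_value)  # 2/3: four of the six (positive, negative) pairs are ranked correctly
```

The result matches the pairwise reading of AUC: the fraction of (positive, negative) pairs in which the positive sample receives the higher score.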

The code for plotting ROC curves and computing the AUC is as follows:

## 5. ROC and AUC
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np

# Load the data
iris = load_iris()
X = iris.data
y = iris.target
# one-hot
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
# Add noisy features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
# Train the model
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
# The model is already fitted, so just score the test set
y_score = clf.decision_function(X_test)
# Compute and plot the ROC curves
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
fpr = {}
tpr = {}
roc_auc = {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    ax.plot(fpr[i], tpr[i], label="target=%s,auc=%s" % (i, roc_auc[i]))
ax.plot([0, 1], [0, 1], 'k--')
ax.set_xlabel("FPR")
ax.set_ylabel("TPR")
ax.set_title("ROC")
ax.legend(loc="best")
ax.set_xlim(0, 1.1)
ax.set_ylim(0, 1.1)
ax.grid()
plt.show()

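4. IoU and mAP in Object Detection

A common evaluation metric in object detection is mean average precision (mAP), the average precision (AP) averaged over classes. Before computing mAP, it helps to review IoU, the intersection over union.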

4.1 IoU

IoU is computed as:
$IoU=\frac{Area\ of\ Overlap}{Area\ of\ Union}=\frac{|A_{pred}\cap A_{true}|}{|A_{pred}\cup A_{true}|}$

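A Python implementation follows. It takes two rectangular boxes, each given either by its top-left and bottom-right corners or by its center point plus width and height: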

## 6. IoU
import numpy as np

def compute_iou(box1, box2, wh=False):
    """
    compute the iou of two boxes.
    Args:
        box1, box2: [xmin, ymin, xmax, ymax] (wh=False) or [xcenter, ycenter, w, h] (wh=True)
        wh: the format of coordinate.
    Return:
        iou: iou of box1 and box2.
    """
    if not wh:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    else:
        # Convert [xcenter, ycenter, w, h] to corners; stay in floats so
        # half-pixel offsets are not truncated
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0

    ## Top-left and bottom-right corners of the intersection rectangle
    xx1 = np.max([xmin1, xmin2])
    yy1 = np.max([ymin1, ymin2])
    xx2 = np.min([xmax1, xmax2])
    yy2 = np.min([ymax1, ymax2])

    ## Areas of the two boxes
    area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
    area2 = (xmax2 - xmin2) * (ymax2 - ymin2)

    ## Intersection area (zero when the boxes do not overlap)
    inter_area = np.max([0, xx2 - xx1]) * np.max([0, yy2 - yy1])

    ## Intersection over union; guard against two degenerate zero-area boxes
    union_area = area1 + area2 - inter_area
    IoU = inter_area / union_area if union_area > 0 else 0.0
    return IoU
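
A hand-checkable case is a quick way to verify any IoU implementation. The sketch below uses a compact corner-format function written only for this example (`iou_xyxy` is a hypothetical name, not the function above) on boxes whose overlap is easy to work out on paper.

```python
def iou_xyxy(box1, box2):
    """IoU of two [xmin, ymin, xmax, ymax] boxes. Hypothetical helper for illustration."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 4 + 4 - 1 = 7
overlapping = iou_xyxy([0, 0, 2, 2], [1, 1, 3, 3])
# Boxes that do not touch: intersection 0
disjoint = iou_xyxy([0, 0, 1, 1], [2, 2, 3, 3])
# The same box twice: intersection equals union
identical = iou_xyxy([0, 0, 2, 2], [0, 0, 2, 2])
print(overlapping, disjoint, identical)
```

The three cases cover partial overlap (1/7), no overlap (0), and a perfect match (1), which are the boundary behaviors a detector's matching step relies on.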

4.2 mAP

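Suppose we have a set of object detection results with three data items, each consisting of two boxes and a confidence score. The box predicted by the model is denoted $pre_i$ and the ground-truth box $label_i$, $i = 1, 2, 3$. Assume the confidences of the three $pre$ boxes are $0.9$, $0.8$, and $0.7$.

First compute the IoU between $pre$ and $label$ in each item. With a threshold of 0.5, a $pre$ whose $IoU$ exceeds 0.5 counts as a $TP$ in the confusion matrix, and otherwise as an $FP$. Assume that $pre_1$ and $pre_3$ are $TP$ and $pre_2$ is $FP$.

Next, sort by confidence. Here $pre_1$, $pre_2$, and $pre_3$ are already in descending order.

Then compute Precision and Recall at different confidence thresholds. With the threshold at 0.9, every pre with confidence below 0.9 is ignored, so the detector outputs TP+FP=1 box; since pre1 is a TP, Precision=1, and since there are 3 labels, Recall=1/3. The other two thresholds give (P, R) = (1/2, 1/3) and (2/3, 2/3) in the same way.

Plot the P-R curve, then from each peak draw a horizontal segment to the left until it meets the vertical line through the previous peak. The area enclosed by these red segments and the axes is the AP, as shown in the figure below. The mAP is simply the mean of the per-class AP values.
[Figure: P-R curve with the red interpolated segments; the enclosed area is the AP]
Python code: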

# -*- coding: utf-8 -*-
# @File    : https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/master/pytorchyolo/utils/utils.py
# @Desc    :
import numpy as np
import tqdm
def ap_per_class(tp, conf, pred_cls, target_cls):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:    True positives (list).
        conf:  Objectness value from 0-1 (list).
        pred_cls: Predicted object classes (list).
        target_cls: True object classes (list).
    # Returns
        Per-class precision, recall, AP, and F1 arrays, plus the unique target classes (int32).
    """

    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]

    # Find unique classes
    unique_classes = np.unique(target_cls)

    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # Number of ground truth objects
        n_p = i.sum()  # Number of predicted objects

        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # Accumulate FPs and TPs
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()

            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])

            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])

            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))

    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)

    return p, r, ap, f1, unique_classes.astype("int32")


def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.
    Code originally from https://github.com/rbgirshick/py-faster-rcnn.
    # Arguments
        recall:    The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
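
Applying this procedure to the three-detection example from section 4.2 gives a number that can be checked by hand. The sketch below repeats the envelope-and-sum logic of `compute_ap` in a compact form so that it runs on its own; `average_precision` is a hypothetical name, and the TP/FP assignments are the ones assumed in the text.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision envelope (same logic as compute_ap above)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Envelope: each precision becomes the max of itself and everything to its right
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum (delta recall) * precision wherever recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Detections sorted by confidence 0.9, 0.8, 0.7: pre1 = TP, pre2 = FP, pre3 = TP
tp = np.array([1, 0, 1])
n_gt = 3  # three ground-truth boxes

tpc = np.cumsum(tp)
fpc = np.cumsum(1 - tp)
precision = tpc / (tpc + fpc)  # [1, 1/2, 2/3]
recall = tpc / n_gt            # [1/3, 1/3, 2/3]
ap = average_precision(recall, precision)
print(precision, recall, ap)   # ap = 1/3 * 1 + 1/3 * 2/3 = 5/9
```

The envelope lifts the dip at precision 1/2 up to 2/3, and the recall range from 2/3 to 1 contributes nothing because one ground-truth box is never detected.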

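5. PA, mIoU, and FWIoU in Image Segmentation

Assume there are $k + 1$ classes, one of which is the background class. Let $p_{ij}$ denote the number of pixels that belong to class $i$ but are predicted as class $j$, so $p_{ii}$ is the number of true positives for class $i$.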

Pixel Accuracy (PA) is the fraction of all pixels that are labeled correctly:
$PA=\frac{\sum_{i=0}^kp_{ii}}{\sum_{i=0}^k\sum_{j=0}^kp_{ij}}$

Mean Pixel Accuracy (MPA) computes, for each class, the fraction of its pixels that are classified correctly, then averages over the classes:
$MPA=\frac{1}{k+1}\sum_{i=0}^k\frac{p_{ii}}{\sum_{j=0}^kp_{ij}}$

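Mean Intersection over Union (mIoU) is the most commonly used metric. It is the ratio of the intersection to the union of two sets, which in semantic segmentation are the ground truth and the prediction. The ratio can be rewritten as the number of true positives divided by the sum of true positives, false negatives, and false positives. IoU is computed per class and then averaged:
$mIoU=\frac{1}{k+1}\sum_{i=0}^k\frac{p_{ii}}{\sum_{j=0}^kp_{ij}+\sum_{j=0}^kp_{ji}-p_{ii}}$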

Frequency Weighted Intersection over Union (FWIoU) weights each class's IoU by how frequently that class occurs in the ground truth:
$FWIoU=\frac{1}{\sum_{i=0}^k\sum_{j=0}^kp_{ij}}\sum_{i=0}^k\frac{p_{ii}\sum_{j=0}^kp_{ij}}{\sum_{j=0}^kp_{ij}+\sum_{j=0}^kp_{ji}-p_{ii}}$

The implementation is as follows:

# -*- coding: utf-8 -*-
# @Time    : 19-1-10 11:03 PM
# @Author  : Zhao Lei
# @File    : metrics.py
# @Desc    :

import numpy as np


class Evaluator(object):
    def __init__(self, num_class):
        self.num_class = num_class
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

    def Pixel_Accuracy(self):
        Acc = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return Acc

    def Pixel_Accuracy_Class(self):
        Acc = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        Acc = np.nanmean(Acc)
        return Acc

    def Mean_Intersection_over_Union(self):
        MIoU = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))
        MIoU = np.nanmean(MIoU)
        return MIoU

    def Frequency_Weighted_Intersection_over_Union(self):
        freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
                np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
                np.diag(self.confusion_matrix))

        FWIoU = (freq[freq > 0] * iu[freq > 0]).sum()
        return FWIoU

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class ** 2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def add_batch(self, gt_image, pre_image):
        assert gt_image.shape == pre_image.shape
        self.confusion_matrix += self._generate_matrix(gt_image, pre_image)

    def reset(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)
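
A tiny example with unequal class frequencies shows how the four metrics differ. The sketch below builds the confusion matrix with the same `np.bincount` indexing trick as `_generate_matrix` and evaluates each formula from section 5. The labels are made up for illustration (an assumption), `confusion_matrix` is a hypothetical standalone helper, and every class appears in the ground truth so no division by zero occurs.

```python
import numpy as np

def confusion_matrix(gt, pred, num_class):
    """Rows index the true class, columns the predicted class."""
    gt, pred = np.asarray(gt).ravel(), np.asarray(pred).ravel()
    index = num_class * gt + pred  # one flat index per (true, predicted) pair
    return np.bincount(index, minlength=num_class ** 2).reshape(num_class, num_class)

# Illustrative labels (assumed): class 0 is frequent, class 2 is rare and always missed
gt = np.array([0, 0, 0, 1, 1, 2])
pred = np.array([0, 0, 1, 1, 1, 0])
cm = confusion_matrix(gt, pred, num_class=3)

tp = np.diag(cm)             # p_ii
gt_count = cm.sum(axis=1)    # sum_j p_ij: pixels that truly belong to class i
pred_count = cm.sum(axis=0)  # sum_j p_ji: pixels predicted as class i
iou = tp / (gt_count + pred_count - tp)

pa = tp.sum() / cm.sum()                   # 4/6
mpa = np.mean(tp / gt_count)               # mean(2/3, 1, 0) = 5/9
miou = np.mean(iou)                        # mean(1/2, 2/3, 0) = 7/18
fwiou = np.sum(gt_count / cm.sum() * iou)  # 1/2 * 1/2 + 1/3 * 2/3 = 17/36
print(cm)
print(pa, mpa, miou, fwiou)
```

The rare, completely missed class drags mIoU down to 7/18, while FWIoU (17/36) penalizes it less because that class accounts for only one sixth of the pixels.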
