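This section is Part 3 of the ML/DL review notes: evaluation metrics for algorithms. It covers error rate, accuracy, recall, precision, F-Score, the P-R curve, the ROC curve, AUC, (m)AP, (m)IoU, (m)PA, and FWIoU, together with their Python implementations.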
1. Error rate and accuracy
# Use a binary classification problem as the example
import numpy as np
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])
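The error rate is the fraction of all samples that are misclassified, and accuracy is the fraction of all samples that are classified correctly. The error rate is computed as follows, where $\mathbb{I}(\cdot)$ is the indicator function; accuracy is then $1-E(f,D)$:
$$E(f,D)=\frac{1}{m}\sum_{i=1}^{m}\mathbb{I}(f(x_i)\neq y_i)$$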
Code implementation:
## 1. Error rate and accuracy
accuracy = np.mean(y_pred == y_true)
error = 1 - accuracy
print(accuracy, error)
from sklearn.metrics import accuracy_score
# Return the accuracy as a fraction
accuracy = accuracy_score(y_true, y_pred, normalize=True)
# Return the number of correctly classified samples
accuracy_num = accuracy_score(y_true, y_pred, normalize=False)
print(accuracy, accuracy_num)
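2. Recall, precision, F-Score
For a binary classification problem, define the confusion matrix with the following four entries: TP (positive samples predicted as positive), FN (positive samples predicted as negative), FP (negative samples predicted as positive), and TN (negative samples predicted as negative).
Precision asks "what fraction of the retrieved items is the user actually interested in" and is defined as:
$$P=\frac{TP}{TP+FP}$$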
Recall asks "what fraction of the items the user is interested in was actually retrieved" and is defined as:
$$R=\frac{TP}{TP+FN}$$
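In general, when precision is high recall tends to be low, and when precision is low recall tends to be high; only on some simple tasks can both be high at the same time. The code is as follows:
## 2. Precision and recall
from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(precision, recall)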
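As a cross-check on the definitions above, the four confusion-matrix counts can be tallied by hand with NumPy. This is a minimal sketch; the helper name `confusion_counts` is introduced here for illustration and is not part of the original notes.

```python
import numpy as np

# Same toy labels as in section 1
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {0, 1}."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision_manual = tp / (tp + fp)  # P = TP / (TP + FP)
recall_manual = tp / (tp + fn)     # R = TP / (TP + FN)
print(tp, fp, fn, tn)                   # 2 3 3 2
print(precision_manual, recall_manual)  # 0.4 0.4
```

Both values should agree with what `precision_score` and `recall_score` return for the same arrays.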
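Sort the samples by confidence from high to low, then walk through them, each time using the current sample as the threshold: the samples up to that point are predicted positive and those after it negative. Each threshold yields one (P, R) pair, and plotting all of them gives the P-R curve. The P-R curve shows a learner's recall and precision over the whole sample set at a glance: as shown in the figure below, as more samples are classified as positive, recall keeps increasing while precision tends to decrease. In general, when one learner's P-R curve is completely enclosed by another learner's curve, the latter can be said to outperform the former; for example, learner A in the figure outperforms learner C. If the curves cross, however, the F-Score is needed to compare them.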
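The threshold sweep described above can be sketched in a few lines of NumPy. The function name `pr_points` and the scores below are made up for illustration; the sketch assumes at least one positive label and no tied scores.

```python
import numpy as np

def pr_points(y_true, scores):
    """Sweep the threshold over samples sorted by descending score.

    After the k-th sample, the top-k samples are predicted positive.
    Returns (precision, recall), one entry per k = 1..n.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(y_true)[order]
    tp = np.cumsum(hits)              # true positives among the top-k
    k = np.arange(1, len(hits) + 1)   # number of samples predicted positive
    precision = tp / k
    recall = tp / hits.sum()
    return precision, recall

# Hypothetical labels and scores, invented for illustration only
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
p, r = pr_points(y_true, scores)
print(p)  # precision at each cut-off
print(r)  # recall at each cut-off; never decreases
```

Recall never decreases along the sweep, while precision can move up or down, which is why real P-R curves look jagged.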
The code for plotting the P-R curve is as follows, using a dataset bundled with the package:
## 3. Plotting the P-R curve
from sklearn.metrics import precision_recall_curve
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np
iris = load_iris()
X = iris.data
y = iris.target
y = label_binarize(y, classes=[0, 1, 2])  # one-hot encode the labels
n_classes = y.shape[1]
# Add noisy features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
# Train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
# Reuse the fitted model instead of fitting a second time
y_score = clf.decision_function(X_test)
# Plot one P-R curve per class
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
precision = {}
recall = {}
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
    ax.plot(recall[i], precision[i], label='target=%s' % i)
ax.set_xlabel("Recall Score")
ax.set_ylabel("Precision Score")
ax.set_title("P-R")
ax.legend(loc='best')
ax.set_xlim(0, 1.0)
ax.set_ylim(0, 1.0)
ax.grid()
plt.show()
The F-Score combines precision and recall. The F1 measure is defined as their harmonic mean:
$$\frac{1}{F_1}=\frac{1}{2}\left(\frac{1}{P}+\frac{1}{R}\right)$$
That is:
$$F_1=\frac{2PR}{P+R}=\frac{2\,TP}{N+TP-TN}$$
where $N$ is the total number of samples.
In some applications precision and recall are not equally important, which leads to the general form of the F-Score, $F_{\beta}$, the weighted harmonic mean of $P$ and $R$:
$$\frac{1}{F_{\beta}}=\frac{1}{1+\beta^2}\left(\frac{1}{P}+\frac{\beta^2}{R}\right)$$
That is:
$$F_{\beta}=(1+\beta^2)\frac{PR}{\beta^2P+R}$$
where $\beta>0$ measures the importance of recall relative to precision: $\beta>1$ means recall matters more, $\beta<1$ means precision matters more, and $\beta=1$ reduces to $F_1$.
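To see the effect of $\beta$, here is a small sketch of the $F_{\beta}$ formula as a plain Python function. The name `f_beta` and the zero-denominator convention (return 0.0) are assumptions made for this example, not something the notes specify.

```python
def f_beta(p, r, beta=1.0):
    """F-beta from precision p and recall r; returns 0.0 when the denominator is 0."""
    denom = beta ** 2 * p + r
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / denom

# A classifier with low precision but perfect recall
p, r = 0.5, 1.0
f1 = f_beta(p, r)          # equal weight
f2 = f_beta(p, r, 2.0)     # recall weighted more heavily
f05 = f_beta(p, r, 0.5)    # precision weighted more heavily
print(f1, f2, f05)
```

Because recall is the stronger of the two values here, the score rises as $\beta$ grows: $F_{0.5} < F_1 < F_2$.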
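3. ROC and AUC
Sort the samples by confidence from high to low, then pick the threshold sample by sample: the samples up to the current one are predicted positive and those after it negative. Each sample used as a threshold yields a true positive rate (TPR) and a false positive rate (FPR), defined as:
$$TPR=\frac{TP}{TP+FN}$$
$$FPR=\frac{FP}{TN+FP}$$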
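Plotting FPR on the horizontal axis and TPR on the vertical axis then gives the ROC curve. The point $(0,0)$ corresponds to a threshold above the highest confidence of any sample: every sample is predicted negative, so TP and FP are both 0 and therefore TPR and FPR are both 0. The threshold is then lowered step by step until every sample is predicted positive, which is the point $(1,1)$.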
For random guessing, ideally $TPR=FPR$, so the ROC curve is the diagonal. When one learner's ROC curve is completely enclosed by another learner's ROC curve, the latter performs better. If the curves cross, the comparison can be made with the area under the ROC curve, called AUC. If the ROC curve is formed by connecting the points $\{(x_1,y_1),\dots,(x_N,y_N)\}$ in order, AUC can be estimated as:
$$AUC=\frac{1}{2}\sum_{i=1}^{N-1}(x_{i+1}-x_i)(y_i+y_{i+1})$$
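The threshold sweep and the trapezoid estimate above can be sketched directly in NumPy. The function names `roc_points` and `auc_trapezoid` and the scores are hypothetical, chosen for illustration; the sketch assumes both classes are present and that no two scores are tied.

```python
import numpy as np

def roc_points(y_true, scores):
    """ROC points from a threshold sweep, starting at (0, 0) and ending at (1, 1)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(y_true)[order]
    tp = np.concatenate(([0], np.cumsum(hits)))      # positives captured so far
    fp = np.concatenate(([0], np.cumsum(1 - hits)))  # negatives captured so far
    tpr = tp / hits.sum()
    fpr = fp / (len(hits) - hits.sum())
    return fpr, tpr

def auc_trapezoid(x, y):
    """AUC = 1/2 * sum (x_{i+1} - x_i) * (y_i + y_{i+1})."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[:-1] + y[1:]))

# Hypothetical labels and scores, invented for illustration only
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
fpr, tpr = roc_points(y_true, scores)
area = auc_trapezoid(fpr, tpr)
print(area)
```

In this toy case 4 of the 6 positive/negative pairs are ranked correctly, so the area comes out as 2/3; the diagonal of a random guesser gives 0.5.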
The code for plotting the ROC curve and computing AUC is as follows:
## 5. ROC and AUC
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
import numpy as np
# Load the data
iris = load_iris()
X = iris.data
y = iris.target
# one-hot encode the labels
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
# Add noisy features
np.random.seed(0)
n_samples, n_features = X.shape
X = np.c_[X, np.random.randn(n_samples, 200 * n_features)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
# Train the model
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, random_state=0))
clf.fit(X_train, y_train)
# Reuse the fitted model instead of fitting a second time
y_score = clf.decision_function(X_test)
# Compute and plot one ROC curve per class
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
fpr = {}
tpr = {}
roc_auc = {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    ax.plot(fpr[i], tpr[i], label="target=%s,auc=%s" % (i, roc_auc[i]))
# Diagonal reference line for random guessing
ax.plot([0, 1], [0, 1], 'k--')
ax.set_xlabel("FPR")
ax.set_ylabel("TPR")
ax.set_title("ROC")
ax.legend(loc="best")
ax.set_xlim(0, 1.1)
ax.set_ylim(0, 1.1)
ax.grid()
plt.show()
4. IoU and mAP in object detection
The most common evaluation metric in object detection is mean average precision (mAP) over the classes. Before explaining and computing mAP, we first review IoU, the intersection over union.
4.1 IoU
IoU is computed as follows:
$$IoU=\frac{Area\ of\ Overlap}{Area\ of\ Union}=\frac{A_{pred}\cap A_{true}}{A_{pred}\cup A_{true}}$$
A Python implementation follows. We assume two rectangular boxes are given, each described either by its top-left and bottom-right corners or by its center coordinates plus width and height:
## 6. IoU
import numpy as np
def compute_iou(box1, box2, wh=False):
    """
    compute the iou of two boxes.
    Args:
        box1, box2: [xmin, ymin, xmax, ymax] (wh=False) or [xcenter, ycenter, w, h] (wh=True)
        wh: the format of coordinate.
    Return:
        iou: iou of box1 and box2.
    """
    if not wh:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    else:
        # Convert center/size format to corner format.
        # The corners are kept as floats: truncating with int() would
        # distort boxes given in fractional or normalized coordinates.
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
    ## Top-left and bottom-right corners of the intersection rectangle
    xx1 = np.max([xmin1, xmin2])
    yy1 = np.max([ymin1, ymin2])
    xx2 = np.min([xmax1, xmax2])
    yy2 = np.min([ymax1, ymax2])
    ## Areas of the two boxes
    area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
    area2 = (xmax2 - xmin2) * (ymax2 - ymin2)
    ## Intersection area (zero when the boxes do not overlap)
    inter_area = np.max([0, xx2 - xx1]) * np.max([0, yy2 - yy1])
    ## Intersection over union
    IoU = inter_area / (area1 + area2 - inter_area)
    return IoU
4.2 mAP
Suppose we have a set of object detection results made up of three data items, each consisting of two boxes and a confidence. The box predicted by the model is denoted $pre_i$ and the ground-truth label box $label_i$, $i=1,2,3$. Suppose the confidences of the three predictions are $0.9$, $0.8$ and $0.7$.
First we compute the IoU between $pre$ and $label$ for each item. Taking 0.5 as the threshold, a $pre$ whose $IoU$ is greater than 0.5 counts as a $TP$ in the confusion matrix, otherwise as an $FP$. Suppose that among our three items $pre_1$ and $pre_3$ are $TP$ and $pre_2$ is $FP$.
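The TP/FP assignment can be sketched as follows. The box coordinates are invented purely so that the outcome matches the example (the first and third predictions are TP, the second is FP), and the helper `iou_xyxy` is a hypothetical name for a compact corner-format IoU.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical boxes: one (prediction, label) pair per data item
preds = [[0, 0, 10, 10], [0, 0, 10, 10], [2, 0, 12, 10]]
labels = [[1, 0, 11, 10], [6, 0, 16, 10], [0, 0, 10, 10]]

ious = [iou_xyxy(p, l) for p, l in zip(preds, labels)]
is_tp = [iou > 0.5 for iou in ious]  # IoU above the 0.5 threshold counts as TP
print(ious)
print(is_tp)  # [True, False, True]
```

This one-to-one pairing is a simplification of the example in the text; a full detector evaluation also has to match each prediction to at most one ground-truth box.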
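Next, sort by confidence; here $pre_1$, $pre_2$ and $pre_3$ already happen to be in descending order.
Then compute Precision and Recall at different confidence thresholds. Start with a threshold of 0.9, which discards every pre with confidence below 0.9: the detector then outputs a single box, so TP+FP=1, and since pre1 is a TP, Precision=1; there are 3 labels, so Recall=1/3. The other two (P, R) pairs are obtained the same way: (1/2, 1/3) and (2/3, 2/3).
Plot the P-R curve, then from each peak draw a horizontal segment to the left until it meets the vertical line through the previous peak. The area enclosed by these red segments and the axes is the AP value, as shown in the figure below. mAP is simply the mean of the AP values over all classes.
Python code: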
# -*- coding: utf-8 -*-
# @File : https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/master/pytorchyolo/utils/utils.py
# @Desc :
import numpy as np
import tqdm
def ap_per_class(tp, conf, pred_cls, target_cls):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp: True positives (list).
        conf: Objectness value from 0-1 (list).
        pred_cls: Predicted object classes (list).
        target_cls: True object classes (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
    # Find unique classes
    unique_classes = np.unique(target_cls)
    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # Number of ground truth objects
        n_p = i.sum()  # Number of predicted objects
        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # Accumulate FPs and TPs
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()
            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])
            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])
            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))
    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)
    return p, r, ap, f1, unique_classes.astype("int32")
def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.
    Code originally from https://github.com/rbgirshick/py-faster-rcnn.
    # Arguments
        recall: The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]
    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
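To connect the code back to the three-detection example in section 4.2 (TP, FP, TP at confidences 0.9, 0.8, 0.7, with 3 ground-truth boxes), the sketch below repeats the same envelope-and-sum steps on those numbers. It is a standalone illustration with the steps written inline, not a call into the repository code above.

```python
import numpy as np

# Detections already sorted by confidence 0.9, 0.8, 0.7: TP, FP, TP
tp = np.array([1, 0, 1])
n_gt = 3  # number of ground-truth boxes

tpc = np.cumsum(tp)        # cumulative true positives
fpc = np.cumsum(1 - tp)    # cumulative false positives
recall = tpc / n_gt              # 1/3, 1/3, 2/3
precision = tpc / (tpc + fpc)    # 1, 1/2, 2/3

# Sentinels, then make precision non-increasing from right to left
mrec = np.concatenate(([0.0], recall, [1.0]))
mpre = np.concatenate(([0.0], precision, [0.0]))
for i in range(mpre.size - 1, 0, -1):
    mpre[i - 1] = max(mpre[i - 1], mpre[i])

# Sum rectangle areas wherever recall changes
idx = np.where(mrec[1:] != mrec[:-1])[0]
ap = np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])
print(ap)  # 1/3 * 1 + 1/3 * 2/3 = 5/9
```

The envelope step lifts the dip at precision 1/2 up to 2/3, which is exactly the "draw a segment to the left from each peak" construction described in the text.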
5. PA, mIoU and FWIoU in image segmentation
Suppose there are $k+1$ classes in total, one of which is the background class. Let $p_{ij}$ denote the number of pixels that belong to class $i$ but are predicted as class $j$, so $p_{ii}$ is the number of true positives.
Pixel Accuracy (PA) is the fraction of all pixels that are labeled correctly:
$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$
Mean Pixel Accuracy (MPA) computes, for each class, the fraction of its pixels that are classified correctly, and then averages over the classes:
$$MPA=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}}$$
Mean Intersection over Union (mIoU) is the most commonly used metric. It is the ratio between the intersection and the union of two sets, which in semantic segmentation are the ground truth and the prediction. The ratio can be rewritten as the true positives divided by the sum of true positives, false negatives and false positives. IoU is computed for each class and then averaged:
$$mIoU=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$
Frequency Weighted Intersection over Union (FWIoU) weights each class's IoU by the frequency with which that class occurs:
$$FWIoU=\frac{1}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}\sum_{i=0}^{k}\frac{\left(\sum_{j=0}^{k}p_{ij}\right)p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$
The implementation is as follows; project link:
# -*- coding: utf-8 -*-
# @Time : 19-1-10 11:03 PM
# @Author : Zhao Lei
# @File : metrics.py
# @Desc :
import numpy as np
class Evaluator(object):
    def __init__(self, num_class):
        self.num_class = num_class
        # Rows are ground-truth classes, columns are predicted classes
        self.confusion_matrix = np.zeros((self.num_class,) * 2)
    def Pixel_Accuracy(self):
        Acc = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return Acc
    def Pixel_Accuracy_Class(self):
        Acc = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        Acc = np.nanmean(Acc)
        return Acc
    def Mean_Intersection_over_Union(self):
        MIoU = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))
        MIoU = np.nanmean(MIoU)
        return MIoU
    def Frequency_Weighted_Intersection_over_Union(self):
        freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))
        FWIoU = (freq[freq > 0] * iu[freq > 0]).sum()
        return FWIoU
    def _generate_matrix(self, gt_image, pre_image):
        # Ignore pixels whose ground-truth label is outside [0, num_class)
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        # Encode each (ground truth, prediction) pair as a single index
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class ** 2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix
    def add_batch(self, gt_image, pre_image):
        assert gt_image.shape == pre_image.shape
        self.confusion_matrix += self._generate_matrix(gt_image, pre_image)
    def reset(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)
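For a quick numerical check of the four segmentation metrics, the sketch below evaluates them on a tiny 2-class confusion matrix (rows are ground truth, columns are predictions). The function name `segmentation_metrics` and the matrix values are made up for illustration, and the sketch assumes every class occurs at least once in the ground truth.

```python
import numpy as np

def segmentation_metrics(cm):
    """Return (PA, MPA, mIoU, FWIoU) from a confusion matrix cm[gt, pred]."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)          # p_ii
    gt = cm.sum(axis=1)       # sum_j p_ij: pixels that truly belong to class i
    pred = cm.sum(axis=0)     # sum_j p_ji: pixels predicted as class i
    iou = tp / (gt + pred - tp)
    pa = tp.sum() / cm.sum()
    mpa = np.mean(tp / gt)
    miou = np.mean(iou)
    fwiou = np.sum(gt / cm.sum() * iou)  # weight each IoU by class frequency
    return pa, mpa, miou, fwiou

# Hypothetical 10-pixel example: 4 pixels of class 0, 6 pixels of class 1
cm = [[3, 1],
      [2, 4]]
pa, mpa, miou, fwiou = segmentation_metrics(cm)
print(pa, mpa, miou, fwiou)
```

Here the per-class IoUs are 3/6 and 4/7; FWIoU ends up slightly above mIoU because the more frequent class 1 also has the higher IoU.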