Receiver Operating Characteristic (ROC)
ROC is mainly used to evaluate the quality of a classifier's output.
An ROC curve plots the TPR (True Positive Rate) on the Y axis and the FPR (False Positive Rate) on the X axis, so the ideal point for a classifier is the top-left corner of the plot: high TPR with low FPR. Although the point (0, 1) is rarely achievable in practice, a larger area under the ROC curve generally corresponds to better classification performance.
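As a quick illustration of these two axes, here is a minimal sketch (NumPy only; the labels and scores are invented for the example) that computes TPR and FPR at a single decision threshold:

import numpy as np

# Hypothetical ground truth and classifier scores, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.5])
y_pred = (y_score >= 0.5).astype(int)  # predictions at threshold 0.5

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

tpr = tp / (tp + fn)  # True Positive Rate: 0.75 here
fpr = fp / (fp + tn)  # False Positive Rate: 0.25 here
print(tpr, fpr)

An ROC curve is simply this pair of values traced out as the decision threshold sweeps from high to low.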
ROC curves are typically used with binary classifiers. To extend ROC curves (and ROC area) to multi-class (multi-label) classification, the output has to be binarized. An ROC curve can then be drawn per class (label), or a single curve can be drawn by micro-averaging over all the label elements. Another evaluation approach for multi-class problems is macro-averaging, which gives each label equal weight.
The code below comes from the scikit-learn website; it is reproduced here with some added comments.
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split  # split the dataset
from sklearn.preprocessing import label_binarize  # binarize the output
from sklearn.multiclass import OneVsRestClassifier  # multi-class classification (one-vs-rest)
# Note: scipy.interp has been removed from recent SciPy releases; np.interp is used below instead
# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]
# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)
# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                         random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)
# Compute ROC curve and ROC area for each class
fpr = dict()  # FPR = FP / (FP + TN)
tpr = dict()  # TPR = TP / (TP + FN)
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])  # pool the data and scores of all labels, then compute the micro-averaged performance
# Plot the ROC curve for a specific class
plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()
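If only the AUC values are needed rather than the full curves, scikit-learn's roc_auc_score can compute the micro- and macro-averaged areas in a single call. A short sketch, reusing y_test and y_score from above:

from sklearn.metrics import roc_auc_score

micro_auc = roc_auc_score(y_test, y_score, average='micro')
macro_auc = roc_auc_score(y_test, y_score, average='macro')
print(micro_auc, macro_auc)

Note that average='macro' here averages the per-class AUC values, whereas the code below interpolates the curves first and then takes the area under the averaged curve, so the two macro numbers can differ slightly.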
Next, compute the macro-average, the micro-average, and the ROC curves of the three individual classes:
# Compute macro-average ROC curve and ROC area
# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
# Finally average it and compute AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
# Plot all ROC curves
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
label='micro-average ROC curve (area = {0:0.2f})'
''.format(roc_auc["micro"]),
color='deeppink', linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
label='macro-average ROC curve (area = {0:0.2f})'
''.format(roc_auc["macro"]),
color='navy', linestyle=':', linewidth=4)
colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
                   ''.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()
This involves the concepts of macro-averaging and micro-averaging:
Precision p = TP / (TP + FP): the fraction of samples assigned to a class that actually belong to it.
Recall r = TP / (TP + FN): the fraction of a class's samples that are correctly classified (this is the TPR).
For multi-class problems, macro-averaging computes p and r within each class and then averages them,
while micro-averaging pools every instance in the dataset globally and computes p and r once.
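The difference is easiest to see on an imbalanced toy example. The sketch below uses scikit-learn's precision_score and recall_score on invented labels in which class 2 is deliberately rare:

from sklearn.metrics import precision_score, recall_score

# Invented labels for illustration only; class 2 occurs once and is never predicted
y_true = [0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]

print(precision_score(y_true, y_pred, average='macro', zero_division=0))  # ~0.42
print(precision_score(y_true, y_pred, average='micro'))                   # 0.625
print(recall_score(y_true, y_pred, average='macro'))                      # ~0.47
print(recall_score(y_true, y_pred, average='micro'))                      # 0.625

The macro averages are dragged down by the rare class 2, while the micro averages (which equal overall accuracy in single-label multi-class problems) are dominated by the common classes.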
To quote an English-language explanation:
When dealing with multiple classes there are two possible ways of averaging these measures (i.e. recall, precision, F1-measure), namely, macro-average and micro-average. The macro-average weights all the classes equally, regardless of how many documents belong to them. The micro-average weights all the documents equally, thus favouring the performance on common classes. Different classifiers will perform differently on common and rare categories. Learning algorithms are trained more often on more populated classes, thus risking local over-fitting.