Machine Learning: ROC Curves and the Confusion Matrix

Latest recommended article published 2024-08-12 18:57:34

geekac

Original article: https://blog.csdn.net/AchangeC/article/details/80104279

1. ROC curve

(To be written.)

------------------------------------------- divider -------------------------------------------

2. Confusion matrix

Purpose: the confusion matrix is a tool for evaluating how well a machine-learning model performs. Specifically, it is used to evaluate the accuracy of a classification.

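Definition: for a confusion matrix C, C(i, j) is the number of observations whose true class is i but whose predicted class is j. Rows therefore index the true class and columns index the predicted class.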
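To make the definition concrete, here is a minimal pure-Python sketch that builds C by counting (true, predicted) pairs. The helper name confusion_matrix_from_scratch is hypothetical, and unlike scikit-learn it assumes every value in y_true and y_pred appears in labels (otherwise it raises KeyError).

```python
def confusion_matrix_from_scratch(y_true, y_pred, labels):
    """Hypothetical helper: C[i][j] counts samples whose true label is
    labels[i] and whose predicted label is labels[j]."""
    # Map each label to its row/column position
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    C = [[0] * n for _ in range(n)]
    # Row = true class, column = predicted class
    for t, p in zip(y_true, y_pred):
        C[index[t]][index[p]] += 1
    return C


y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix_from_scratch(y_true, y_pred, labels=[0, 1, 2]))
# [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
```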

For binary classification (assuming class 1 is the positive class):

C(0,0) = TN (true negatives)	C(0,1) = FP (false positives)
C(1,0) = FN (false negatives)	C(1,1) = TP (true positives)
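With scikit-learn, the four binary counts can be unpacked directly: when labels=[0, 1], the matrix is laid out as [[TN, FP], [FN, TP]], so ravel() returns them in the order tn, fp, fn, tp. The data below is made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels; 1 is the positive class
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1]

# Layout is [[TN, FP], [FN, TP]], so flattening gives tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 2 1 1 3
```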

Computing the confusion matrix:

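The confusion matrix can be computed with scikit-learn:

sklearn.metrics.confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)

(This is the signature at the time of writing; newer scikit-learn releases make every argument after y_pred keyword-only and add a normalize parameter.)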

The parameters are described below. (Note: "array-like" means, for example, a Python list or a NumPy array.)

Parameters:

y_true : array-like of length n_samples

Ground truth (correct) target values, i.e. the true class of each sample.

y_pred : array-like of length n_samples

Estimated targets as returned by a classifier, i.e. the predicted class of each of the n samples.

labels : array-like of length n_classes, optional (n_classes is the number of classes/labels)

List of labels used to index the matrix. It determines which class each row and column of the output matrix C corresponds to, and in what order, so it can be used to reorder the labels or to select a subset of them.

If it is not given (None), the labels that appear at least once in y_true or y_pred are used, in sorted order.

Notes: 1. The order of labels sets the order of both the rows and the columns of the confusion matrix. 2. The values that occur in y_true and y_pred are the class labels.

sample_weight : array-like of length n_samples, optional

Sample weights, one per sample.

Returns:

C : array, shape = [n_classes, n_classes]

The confusion matrix C, of shape (n_classes, n_classes).
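A short sketch of how labels and sample_weight change the output. The data is the same as in example 1 below; the weights w are arbitrary values chosen only for illustration.

```python
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# labels=[2, 1, 0]: rows and columns now correspond to classes 2, 1, 0 (reversed order)
reordered = confusion_matrix(y_true, y_pred, labels=[2, 1, 0])
print(reordered)

# labels=[0, 2]: only classes 0 and 2 are kept; the sample involving class 1 is ignored
subset = confusion_matrix(y_true, y_pred, labels=[0, 2])
print(subset)

# sample_weight: each cell holds the sum of the weights of the samples that fall into it
w = [1.0, 1.0, 0.5, 0.5, 1.0, 2.0]
weighted = confusion_matrix(y_true, y_pred, sample_weight=w)
print(weighted)
```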

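Properties of the confusion matrix:

The diagonal entries count the correctly classified samples; the off-diagonal entries count the misclassified ones.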
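One consequence of this property: the trace of C divided by the sum of all entries is the overall accuracy. A minimal sketch checking this against scikit-learn's accuracy_score on the example data:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
C = confusion_matrix(y_true, y_pred)

correct = np.trace(C)        # sum of the diagonal: correctly classified samples
errors = C.sum() - correct   # sum of the off-diagonal entries: misclassified samples
accuracy = correct / C.sum()

# correct=4, errors=2, accuracy=4/6
print(correct, errors, accuracy)
print(np.isclose(accuracy, accuracy_score(y_true, y_pred)))
```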

Examples:

1. Computing a confusion matrix:

from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix(y_true, y_pred))
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]

2. Plotting the confusion matrix with matplotlib.pyplot:

import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Divide each row by its sum so every row (true class) sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each value into its cell; white text on dark cells for contrast
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    # Call after setting the labels so they are not clipped
    plt.tight_layout()


# Example data (same as example 1); the class_names strings are illustrative
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cnf_matrix = confusion_matrix(y_true, y_pred)
class_names = ['class 0', 'class 1', 'class 2']

# Plot non-normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names,
                      title='Confusion matrix, without normalization')

# Plot normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True,
                      title='Normalized confusion matrix')
plt.show()
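The normalize=True branch of plot_confusion_matrix divides each row by its sum, so each entry becomes the fraction of samples of a given true class that were predicted as each class. Assuming scikit-learn 0.22 or later, confusion_matrix can produce the same row normalization itself via normalize='true'; this sketch checks that the two agree on the example data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

# Manual row normalization, as in plot_confusion_matrix(normalize=True)
manual = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

# Built-in row normalization (assumes scikit-learn >= 0.22)
builtin = confusion_matrix(y_true, y_pred, normalize='true')

print(np.round(manual, 2))
print(np.allclose(manual, builtin))  # True
```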

-----end-----