Hands-On Machine Learning with Scikit-Learn and TensorFlow, Study Notes Part 3: Classification Models

This is the third installment of my study notes for Hands-On Machine Learning with Scikit-Learn and TensorFlow. It covers obtaining the MNIST dataset, training a model, and evaluating it. The evaluation part discusses cross-validation, the confusion matrix, recall, precision, the F1 score, the ROC curve, and AUC, and stresses the importance of choosing an appropriate performance metric for classification problems.

MNIST Classifier

1. Data Preparation

Load the MNIST data via TensorFlow
import tensorflow as tf
# input_data comes from the TF 1.x tutorials package (it was removed in TensorFlow 2.x)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/content', one_hot=True)

Check the size of the training set
print(mnist.train.images.shape)  # (55000, 784)
print(mnist.train.labels.shape)  # (55000, 10)

Check the size of the validation set
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.validation.labels.shape)  # (5000, 10)

Check the size of the test set
print(mnist.test.images.shape)  # (10000, 784)
print(mnist.test.labels.shape)  # (10000, 10)

Print the vector representation of the 0th training image
print(mnist.train.images[0, :])

Print the label of the 0th training image
print(mnist.train.labels[0, :])

Load the MNIST data via Keras
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
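
Note that the Keras loader returns the images as 28x28 uint8 arrays rather than flattened 784-dimensional vectors, and the labels as plain integer digits:

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)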

2. Training a Model
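
The evaluation code below assumes a trained binary classifier. A minimal sketch following the book's running "is it a 5?" example, built on the Keras arrays loaded above (the names sgd_clf, X_train and y_train_5 are assumptions taken from that example):

import numpy as np
from sklearn.linear_model import SGDClassifier

# Flatten the 28x28 images into 784-dimensional feature vectors
X_train = x_train.reshape(len(x_train), -1).astype(np.float64)
X_test = x_test.reshape(len(x_test), -1).astype(np.float64)

# Binary targets: True for images of a 5, False for everything else
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# Linear classifier trained with stochastic gradient descent
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)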

3. Evaluating the Model

See the notes on model evaluation methods for details.

Cross-validation: cross_val_score

Accuracy is usually not a good primary performance metric for a classifier, especially when you are dealing with a skewed dataset (one in which some classes appear much more often than others). In that case a plain accuracy score from cross_val_score is not enough to judge the model.
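
For example (a minimal sketch, reusing the sgd_clf, X_train and y_train_5 from the training sketch above):

from sklearn.model_selection import cross_val_score

# 3-fold cross-validated accuracy on the 5-vs-not-5 task.
# The score looks high (roughly 0.95), but a dummy model that always answers
# "not 5" would already be right about 90% of the time, since only about 10%
# of the digits are 5s, so accuracy alone says very little here.
print(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy"))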

Confusion matrix: confusion_matrix

Plot the confusion matrix

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

conf_mx = confusion_matrix(y_true, y_pred)  # y_true: labels, y_pred: (ideally out-of-fold) predictions
plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()

Analysis:
Slightly darker cells may mean either that the dataset contains fewer images of that class, or that the classifier does not perform as well on that class as on the others.
Improvement:
Divide each value in the confusion matrix by the number of images in the corresponding class, fill the diagonal with zeros so that only the errors remain, and plot the result again.

import numpy as np

# Compare error rates instead of absolute counts
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums
np.fill_diagonal(norm_conf_mx, 0)  # keep only the errors
plt.matshow(norm_conf_mx, cmap=plt.cm.gray)
plt.show()

Three metrics

Recall: the fraction of the actual positive samples that are predicted positive: recall_score(y_true, y_pred)
Precision: the fraction of the samples predicted positive that are actually positive: precision_score(y_true, y_pred)
F1 score: the harmonic mean of precision and recall: f1_score(y_true, y_pred) (see the usage sketch after this list)
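
A minimal usage sketch (assuming the sgd_clf, X_train and y_train_5 from the training step, with out-of-fold predictions obtained via cross_val_predict):

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score, f1_score

# Out-of-fold predictions: each instance is predicted by a model that did not see it during training
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

print(precision_score(y_train_5, y_train_pred))  # of the predicted 5s, how many really are 5s
print(recall_score(y_train_5, y_train_pred))     # of the real 5s, how many were found
print(f1_score(y_train_5, y_train_pred))         # harmonic mean of precision and recall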

You can use the precision_recall_curve() function to compute the precision and recall for all possible thresholds and plot them. The goal is to find a suitable threshold: raising the threshold does reduce recall, but by simply picking a threshold you can reach the precision/recall trade-off you want, i.e. changing the threshold lets you build a classifier with whatever precision you need.
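
The y_scores used in the code below are decision-function scores rather than hard class predictions; a minimal sketch of how they can be obtained (assuming the same sgd_clf, X_train and y_train_5 as above, following the book's approach):

from sklearn.model_selection import cross_val_predict

# Return decision scores instead of predictions, so any threshold can be applied afterwards
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")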

from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precisions and recalls have one more element than thresholds, hence the [:-1]
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.xlabel("Threshold", fontsize=16)
    plt.legend(loc="upper left", fontsize=16)
    plt.ylim([0, 1])

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
plt.xlim([-700000, 700000])
# save_fig() is a helper defined in the book's notebooks; remove this line if you don't have it
save_fig("precision_recall_vs_threshold_plot")
plt.show()
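
To actually pick a threshold, for example the lowest one that reaches 90% precision, you can index into the arrays returned above (a minimal sketch reusing precisions, thresholds and y_scores; the 0.90 target is just an illustration):

import numpy as np

# Lowest threshold whose precision is at least 90%
threshold_90_precision = thresholds[np.argmax(precisions >= 0.90)]
# Predictions at that threshold: roughly 90% precision, at the cost of lower recall
y_train_pred_90 = (y_scores >= threshold_90_precision)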

ROC curve for a binary classifier

For binary classification you can use roc_curve to compute fpr, tpr and the thresholds, plot the ROC curve, and use roc_auc_score to compute the AUC.

from sklearn.metrics import roc_curve, auc, roc_auc_score

# Reuse the binary labels and the cross-validated decision scores from above;
# roc_curve expects scores (or probabilities), not hard class predictions
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
print(tpr, fpr, thresholds)

roc_auc = auc(fpr, tpr)                       # AUC computed from the points of the curve
roc_auc = roc_auc_score(y_train_5, y_scores)  # AUC computed directly from the (binary) labels and scores

plt.plot(fpr, tpr, lw=1, label='ROC (area = %0.2f)' % roc_auc)
plt.xlabel("FPR (False Positive Rate)")
plt.ylabel("TPR (True Positive Rate)")
plt.title("Receiver Operating Characteristic (AUC = %0.2f)" % roc_auc)
plt.show()

Two ways to compute the AUC (both return a value between 0 and 1):
  • roc_auc = auc(fpr, tpr)
    computes the AUC from the (fpr, tpr) points of the ROC curve
  • roc_auc = roc_auc_score(y_true, y_scores)
    computes the AUC directly from the (binary) true labels and the predicted scores

The diagonal represents the ROC curve of a purely random classifier; a good classifier stays as far away from it as possible (toward the top-left corner). Measure the area under the curve (AUC): the closer it is to 1, the better the classifier.
