Recognizing MNIST Handwritten Digits with TensorFlow

文科升

Article link: https://blog.csdn.net/moyu123456789/article/details/83794942

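These are my study notes on Chapter 1 of the book 《21个项目玩转深度学习：基于TensorFlow的实践详解》 (roughly, "21 Projects for Mastering Deep Learning: A Hands-On Guide with TensorFlow").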

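1. Softmax regression

1) How softmax regression works

Softmax regression is a linear multi-class classification model. It is a direct generalization of logistic regression: logistic regression handles two classes, while softmax regression handles any number of classes.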

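The softmax function converts per-class "scores" into probabilities, and the sample is assigned to the class with the highest probability. It does two things: exponentiation (raising e to each score) and normalization. Suppose an image scores 3.2 for cat, 5.1 for car and -1.7 for frog. After exponentiation the values become $\exp(3.2)=24.5$ for cat, $\exp(5.1)=164.0$ for car and $\exp(-1.7)=0.18$ for frog. Normalizing then gives the probability of each class: cat $\frac{24.5}{24.5+164.0+0.18}=0.13$ , car $\frac{164.0}{24.5+164.0+0.18}=0.87$ , frog $\frac{0.18}{24.5+164.0+0.18}=0.00$ , and the probabilities sum to 1. Exponentiation makes every score positive and stretches the gaps between scores: the car score is only 1.9 above the cat score, yet its exponentiated value is about 6.7 times larger. After normalization the probabilities are therefore far apart, and this image is classified as a car with high probability.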
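The arithmetic above can be reproduced in a few lines of NumPy. This is a minimal sketch rather than the book's code; the function name `softmax` and the max-subtraction trick (a common way to avoid overflow that does not change the result) are my own additions.

```python
import numpy as np

def softmax(scores):
    """Turn a 1-D array of class scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Scores from the example in the text: cat, car, frog.
scores = np.array([3.2, 5.1, -1.7])
probs = softmax(scores)

# Rounded to two decimals these match the hand calculation (0.13, 0.87, 0.00).
print(np.round(probs, 2))
```

The largest score always receives the largest probability, so taking the argmax of the probabilities gives the same class as taking the argmax of the raw scores.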

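Let x be the sample's feature vector and W, b the weights and bias. The linear model first produces a score for each class, $W^{T}x+b$ , and the softmax function then turns those scores into class probabilities: $y=\mathrm{softmax}(W^{T}x+b)$ . Below we use TensorFlow to classify the MNIST handwritten digits with softmax regression.

2) Code and walkthrough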
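Before turning to TensorFlow, the forward pass can be sketched in NumPy to make the shapes concrete. The input here is random data standing in for images (an assumption for illustration, not MNIST), and `softmax_rows` is a hypothetical helper name. Samples are stored as rows, so the scores are computed as `x @ W + b`, which is the batched form of $W^{T}x+b$.

```python
import numpy as np

def softmax_rows(logits):
    """Apply softmax independently to each row of a 2-D array."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((3, 784))   # 3 stand-in "images", 784 pixel values each
W = np.zeros((784, 10))    # weights: one column per class
b = np.zeros(10)           # bias: one entry per class

y = softmax_rows(x @ W + b)
print(y.shape)             # one row of 10 class probabilities per image
print(y[0])
```

With W and b both zero, every score is zero, so each of the 10 classes gets probability 0.1. An untrained model initialized this way starts from a uniform guess.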

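# This listing targets TensorFlow 1.x: tensorflow.examples.tutorials.mnist
# and tf.placeholder are not part of the default TensorFlow 2.x API
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
import scipy.misc
import os

# Download (if needed) and load MNIST with one-hot labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Training set: images (55000, 784), labels (55000, 10)
print(mnist.train.images.shape)
print(mnist.train.labels.shape)
# Validation set: images (5000, 784), labels (5000, 10)
print(mnist.validation.images.shape)
print(mnist.validation.labels.shape)

# Test set: images (10000, 784), labels (10000, 10)
print(mnist.test.images.shape)
print(mnist.test.labels.shape)

# Print the first training image as a 784-dimensional vector
print(mnist.train.images[0,:])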
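# Save some of the data as image files
save_dir = 'MNIST_data/raw'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

# Save the first 20 training images
for i in range(20):
    image_array = mnist.train.images[i, :]
    image_array = image_array.reshape(28, 28)
    # Build the output path inside save_dir
    filename = os.path.join(save_dir, 'mnist_train_%d.jpg' % i)
    # Convert the array to an image with scipy.misc.toimage, then save it
    # (toimage was removed in SciPy 1.2, so this needs an older SciPy)
    scipy.misc.toimage(image_array, cmin=0.0, cmax=1.0).save(filename)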

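# x holds a batch of flattened 28x28 images; None leaves the batch size open
x = tf.placeholder(tf.float32, [None, 784])
# W and b are the model parameters, initialized to zero
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# y is the predicted probability of each of the 10 classes
y = tf.nn.softmax(tf.matmul(x, W) + b)
# y_ holds the true labels of the images (one-hot)
y_ = tf.placeholder(tf.float32, [None, 10])

# Build the cross-entropy loss from y and y_. reduce_sum has no axis argument
# here, so it sums over the whole batch and reduce_mean receives a scalar
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y)))
# Minimize the loss with gradient descent (learning rate 0.01)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)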

# Create the session
sess = tf.InteractiveSession()
# Initialize all variables before running anything
tf.global_variables_initializer().run()

# Run 1000 steps of gradient descent, 100 images per step
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Check the accuracy
# argmax(t, 1) returns the index of the largest value in each row;
# argmax(t, 0) returns the index of the largest value in each column
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Compute the model's accuracy on the test data
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
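One detail of the loss in the listing is worth spelling out: `tf.reduce_sum` is called without an axis, so the cross-entropy is summed over the whole batch and the outer `tf.reduce_mean` is applied to a single number. The NumPy sketch below compares that summed loss with the per-sample mean on a tiny batch. The labels and predictions are invented for illustration.

```python
import numpy as np

def cross_entropy_sum(y_true, y_pred):
    """Cross-entropy summed over every sample in the batch (as in the listing)."""
    return -np.sum(y_true * np.log(y_pred))

def cross_entropy_mean(y_true, y_pred):
    """Cross-entropy summed per sample (axis=1), then averaged over the batch."""
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# A made-up batch of 4 samples and 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=float)
y_pred = np.array([[0.70, 0.20, 0.10],
                   [0.10, 0.80, 0.10],
                   [0.30, 0.30, 0.40],
                   [0.25, 0.50, 0.25]])

total = cross_entropy_sum(y_true, y_pred)
mean = cross_entropy_mean(y_true, y_pred)
print(total, mean, total / mean)   # the ratio equals the batch size
```

The two losses differ only by a factor equal to the batch size, and so do their gradients. With plain gradient descent and batches of 100, a learning rate of 0.01 on the summed loss therefore takes the same steps as a learning rate of 1.0 on the mean loss.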

3) Results

The output is shown below: softmax regression reaches an accuracy of 0.9171 on the handwritten digits of the test set.

0.9171
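The accuracy figure is the fraction of test images whose most probable predicted class equals the true class. As a minimal sketch of what `tf.argmax`, `tf.equal`, `tf.cast` and `tf.reduce_mean` compute together, here is the same calculation in NumPy on four invented predictions.

```python
import numpy as np

# Made-up predicted probabilities for 4 samples and 3 classes.
pred = np.array([[0.1, 0.7, 0.2],
                 [0.8, 0.1, 0.1],
                 [0.2, 0.3, 0.5],
                 [0.6, 0.3, 0.1]])
# One-hot true labels for the same 4 samples.
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# axis=1 picks the index of the largest value within each row (each sample).
correct = np.argmax(pred, axis=1) == np.argmax(labels, axis=1)
# Casting booleans to floats and averaging gives the fraction that is correct.
accuracy = correct.astype(np.float32).mean()
print(accuracy)
```

Here the third sample is misclassified (predicted class 2, true class 1), so three of the four predictions are correct.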

2. Classification with a two-layer convolutional network

1) Code and walkthrough:

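import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Placeholders are typically used to hold the samples and the labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Reshape each image from a 784-dimensional vector to a 28x28 image with 1 channel
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Helpers for the convolution layers
def weight_variable(shape):
    # Weights start as small random values from a truncated normal distribution
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Biases start at a small positive constant
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv_2d(x, W):
    # Stride-1 convolution; SAME padding keeps the spatial size unchanged
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves the height and the width
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')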

# First convolution layer: 5x5 kernels, 1 input channel, 32 output channels
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv_2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)   # 28x28x32 -> 14x14x32

# Second convolution layer: 5x5 kernels, 32 input channels, 64 output channels
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv_2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)   # 14x14x64 -> 7x7x64

# The two convolution layers are followed by fully connected layers
# First fully connected layer: outputs a 1024-dimensional vector
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# Dropout: keep_prob is a placeholder, fed 0.5 during training and 1 during testing
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Second fully connected layer: maps the 1024-dimensional vector to 10 values, one per class
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Cross-entropy loss, computed directly from the logits and averaged over the batch
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Minimize the loss with the Adam optimizer (learning rate 1e-4)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Define the accuracy used for evaluation
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Create the session and initialize the variables
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

# Train for 2000 steps, 50 images per step
for i in range(2000):
    batch = mnist.train.next_batch(50)
    # Every 100 steps, report the accuracy on the current training batch
    # (dropout is disabled for this measurement by feeding keep_prob 1.0)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))

    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# After training, report the accuracy on the test set
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
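The `7 * 7 * 64` used for the first fully connected layer follows from how SAME padding sets the output size: each spatial dimension becomes ceil(input / stride). The short sketch below traces the 28x28 input through both convolution and pooling stages. The helper name `same_out` is my own.

```python
import math

def same_out(size, stride):
    """Output length along one spatial dimension under SAME padding."""
    return math.ceil(size / stride)

size = 28
size = same_out(size, 1)   # conv1, stride 1: stays 28
size = same_out(size, 2)   # pool1, stride 2: 14
size = same_out(size, 1)   # conv2, stride 1: stays 14
size = same_out(size, 2)   # pool2, stride 2: 7

channels = 64              # output channels of the second convolution layer
flat = size * size * channels
print(size, flat)          # spatial size and length of the flattened vector
```

The flattened vector has 3136 entries, so `W_fc1` alone holds 3136 x 1024 weights, which is most of the parameters in this network.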

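2) Results

The output is shown below. Accuracy on individual training batches (50 images each) reaches 100% at several steps, and accuracy on the test set is 0.9785, a clear improvement over the 0.9171 of softmax regression.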

step 0, training accuracy 0.12
step 100, training accuracy 0.8
step 200, training accuracy 0.96
step 300, training accuracy 0.96
step 400, training accuracy 0.96
step 500, training accuracy 0.98
step 600, training accuracy 1
step 700, training accuracy 0.94
step 800, training accuracy 0.96
step 900, training accuracy 1
step 1000, training accuracy 0.94
step 1100, training accuracy 0.94
step 1200, training accuracy 0.98
step 1300, training accuracy 0.96
step 1400, training accuracy 0.96
step 1500, training accuracy 1
step 1600, training accuracy 0.96
step 1700, training accuracy 0.96
step 1800, training accuracy 1
step 1900, training accuracy 1
test accuracy 0.9785
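The training loop feeds `keep_prob` as 0.5 for training steps and 1.0 for every accuracy measurement. As a rough NumPy sketch of what `tf.nn.dropout(h, keep_prob)` does, each activation is kept with probability `keep_prob` and scaled by `1 / keep_prob`, and otherwise set to zero. The function below is my own illustration, not TensorFlow's implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the survivors."""
    if keep_prob >= 1.0:
        return activations                      # nothing is dropped
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected value of each activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 1024))                       # stand-in for a layer's activations

train_out = dropout(h, 0.5, rng)                # training: about half are zeroed
test_out = dropout(h, 1.0, rng)                 # evaluation: passed through as is

print(np.unique(train_out))                     # only 0.0 and 2.0 appear
print(train_out.mean())                         # close to the original mean of 1.0
```

Because of the rescaling, the layer's expected output is the same in both modes, which is why no extra correction is needed when `keep_prob` is set to 1.0 at evaluation time.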

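3. Summary notes

1) In TensorFlow, placeholders and variables are both "Tensors", but they are different kinds of tensor. A placeholder does not depend on any other Tensor; its value is passed to TensorFlow by the user, and it is typically used to hold sample data and labels. A variable is a value that can change during computation; its value is kept after each computation, and it is typically used to hold model parameters.

2) The session is another core concept in TensorFlow. A Tensor is a node that we "want" TensorFlow to compute, and a session can be seen as the context in which those nodes are computed. The values of variables are stored in the session. Variables must be initialized before they are used, which really means initializing the values held in the session. The call for this is: tf.global_variables_initializer().run().

3) Convolution, an activation function and pooling are the "standard kit" of a convolutional layer. A convolutional layer usually includes all three steps, although some leave out the pooling.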
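To make point 3) concrete, here is a small single-channel NumPy sketch of the three steps on a 4x4 input. It is a simplified illustration under my own assumptions (one channel, odd kernel size, even input size, a made-up kernel), not a replacement for `tf.nn.conv2d` and `tf.nn.max_pool`. As in TensorFlow, the "convolution" is computed without flipping the kernel.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 convolution with zero padding so the output size equals the input size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))    # zero padding on all sides
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
# Hypothetical kernel: each pixel minus its lower-right neighbour.
kernel = np.array([[0, 0, 0],
                   [0, 1, 0],
                   [0, 0, -1]], dtype=float)

feature_map = max_pool_2x2(relu(conv2d_same(image, kernel)))
print(feature_map)
```

The convolution keeps the 4x4 size, the activation clips the negative responses, and the pooling halves each dimension, leaving a 2x2 feature map.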

Code and data on GitHub: https://github.com/zhuwsh/21DeepLearningProjects