Tensoflow+CNN实现简单的mnist手写数字识别

本文链接：https://blog.csdn.net/coding_sun/article/details/78512619

MNIST数据集

该数据集的下载官网是 mnist
在tensorflow里面可直接用下面代码实现：

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

改代码运行之后所在路径下会多出一个MNIST_data 的文件夹，里面为mnist数据集的四个文件（假如运行时间很长那就是在下载，可以自己把mnist的四个文件拷贝到该文件夹下面）。
下载下来的数据集被分成两部分：60000行的训练数据集（mnist.train）和10000行的测试数据集（mnist.test），每一个MNIST数据单元有两部分组成：一张包含手写数字的图片和一个对应的标签。我们把这些图片设为“xs”，把这些标签设为“ys”。训练数据集和测试数据集都包含xs和ys，比如训练数据集的图片是mnist.train.images，训练数据集的标签是mnist.train.labels。
在MNIST训练数据集中，mnist.train.images 是一个形状为 [60000, 784] 的张量，第一个维度数字用来索引图片，第二个维度数字用来索引每张图片中的像素点。在此张量里的每一个元素，都表示某张图片里的某个像素的强度值，值介于0和1之间。

CNN网络框架

该图是利用Tensorflow中的Tensorboard功能画出来的。
可以看到从下往上分别是input–>conv1–>pool1–>conv2–>pool2–>fc1–>fc2–output，即输入，卷积层1，池化层1，卷积层2，池化层2，全连接层1，全连接层2，输出标签信息。
最原始的输入为[n_samples, 784]，reshape 之后为 x_image ，大小为[n_samples, 28, 28, 1]，其中 n_samples是样本数量，28*28为一幅图片的规格，1为颜色通道，因为在这是黑白图，假如是彩色的RGB图，颜色通道就是3。
在conv1层中，滤波器的尺寸为5*5（即patch的大小），滤波器的数量有32个，所以经过conv1层后的输出规格为 [n_samples, 28, 28, 32]。
在pool1层中，滤波器的尺寸为2*2，移动步长在x轴方向和y轴方向分别为2，所以经过这一层后的输出为[n_samples, 14, 14, 32]。
在conv2层中，滤波器的尺寸为5*5，滤波器的数量有64个，每一个对图像在深度方向上做卷积操作，所以输出的规格为[n_samples, 14, 14 ,64]。
在pool2层中，滤波器与pool1层一样，输出为[n_samples, 7, 7, 64]。
在fc1即第一层全连接层中，加入了dropout，防止过拟合，dropout的作用就是部分连接（这也是cnn与传统的神经网络的区别之一）。
fc2层输入为[n_samples, 1024]，输出是一个10维的列向量，按0,1编码，它的标签是什么，就在对应位置处置1，否则为0。比如输出的标签为2，则编码为[0 0 1 0 0 0 0 0 0 0]。

整个神经网络最终的优化目标函数为交叉熵：
$H_{y^{'}}(y)=-\sum_{i}y_{i}^{'}log(y_{i})$
y 是我们预测的概率分布, y’ 是实际的分布, 在实际操作的时候，因为每个batch大小为100，即每次取100个样本进行训练，这样的话交叉熵用代码表示为
cross_entropy=tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction), reduction_indices=[1]))
然后用Adam优化器来优化这个目标函数，使其最小
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

交叉熵随训练次数的增加其变化趋势如图：
这里写图片描述

以上参考极客学院的tensorflow官方中文版手册

源代码

# -*- coding:utf-8: -*-
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)


def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
    # print('y_pre=', np.shape(y_pre))
    # print('tf.argmax(y_pre, 1)=', sess.run(tf.argmax(y_pre, 1)))
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result


def weight_variable(shape, layer_name):
    with tf.name_scope('weights'):
        initial = tf.truncated_normal(shape, stddev=0.1)
        weights = tf.Variable(initial, name='W')
        tf.summary.histogram(layer_name+'/weights', weights)
    return weights


def bias_variable(shape, layer_name):
    with tf.name_scope('biases'):
        initial = tf.constant(0.1, shape=shape)
        biases = tf.Variable(initial, name='b')
        tf.summary.histogram(layer_name+'biases', biases)
    return biases


def conv2d(x, W, layer_name):  # 卷积层函数
    # stride 格式为： [1, x_movement, y_movement, 1]
    # must have strides[0]=strides[3]=1
    outputs = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    tf.summary.histogram(layer_name+'/outputs', outputs)
    return outputs
    # W: [filter_height, filter_width, in_channels, out_channels]


def max_pool_2x2(x, layer_name):  # 池化层函数
    outputs = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    tf.summary.histogram(layer_name+'/outputs', outputs)
    return outputs


with tf.name_scope('inputs'):
    # define placeholder for inputs to network
    xs = tf.placeholder(tf.float32, [None, 784], name='x_input')  # 28*28
    ys = tf.placeholder(tf.float32, [None, 10], name='y_input')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
# keep_prob是保留概率，即我们要保留的结果所占比例，
# 它作为一个placeholder，在run时传入， 当keep_prob=1的时候，相当于100%保留，也就是dropout没有起作用。
x_image = tf.reshape(xs, [-1, 28, 28, 1], name='x_image')  # 图片高度为1
# print(x_image.shape)  # [n_samples, 28, 28, 1]

##########################################################################
###  构建整个卷积神经网络
##########################################################################

# conv1 layer #
with tf.name_scope('conv1'):
    W_conv1 = weight_variable([5, 5, 1, 32], 'conv1')  # patch 5*5, in_size=1. out_size=32
    b_conv1 = bias_variable([32], 'conv1')
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 'conv1')+b_conv1)  # output_size = 28*28*32

with tf.name_scope('pool1'):
    h_pool1 = max_pool_2x2(h_conv1, 'pool1')  # output_size = 14*14*32

with tf.name_scope('conv2'):
    # conv2 layer #
    W_conv2 = weight_variable([5, 5, 32, 64], 'conv2')  # patch 5*5, in_size=32. out_size=64
    b_conv2 = bias_variable([64], 'conv2')
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 'conv2')+b_conv2)  # output_size = 14*14*64

with tf.name_scope('pool2'):
    h_pool2 = max_pool_2x2(h_conv2, 'pool2')  # output_size = 7*7*64

with tf.name_scope('fc1'):
    # func1 layer #
    W_fc1 = weight_variable([7*7*64, 1024], 'fc1')
    b_fc1 = bias_variable([1024], 'fc1')

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])  # [n_samples, 7, 7, 64]-->[n_samples,7*7*64]
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1)+b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # dropout防止过拟合

with tf.name_scope('fc2'):
    # func2 layer #
    W_fc2 = weight_variable([1024, 10], 'fc2')
    b_fc2 = bias_variable([10], 'fc2')
    prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2)+b_fc2)


with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction), reduction_indices=[1]))
    tf.summary.scalar('corss_entropy', cross_entropy)


with tf.name_scope('train'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

sess = tf.Session()
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("logs/", sess.graph)

sess.run(tf.global_variables_initializer())

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # print batch_xs.shape, batch_ys.shape
    ## 输出 (100, 784) (100, 10)
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    if i % 50 == 0:
        rs = sess.run(merged, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        writer.add_summary(rs, i)
        print(compute_accuracy(mnist.test.images[: 1000], mnist.test.labels[: 1000]))