TensorFlow Study Notes (3): Recognizing MNIST Handwritten Digits with a CNN

海上的独木舟

Article link: https://blog.csdn.net/PhDat101/article/details/52403127

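A convolutional neural network (CNN) is a widely used network architecture that takes the spatial structure of images into account, and it gives somewhat better results on MNIST. The two main CNN operations, convolution and pooling, both have corresponding functions in TensorFlow (TF), so the network can be assembled directly. Introductions to CNNs are plentiful and are not repeated here. This post focuses on the TF implementation, with notes on the main functions and steps. First, a look at the program: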

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

import tensorflow as tf

sess = tf.InteractiveSession()

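# Helper functions: weight initialization, bias initialization, convolution, and pooling
def weight_variable(shape):
  # Truncated normal with mean 0 and stddev 0.1; values beyond 2 standard deviations are redrawn
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  # Every bias starts at the constant 0.1
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  # x: input data; W: filter; strides: step along each tensor dimension;
  # padding='SAME' with stride 1 keeps the output the same height and width as the input
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  # 2x2 pooling window moved with a 2x2 stride (non-overlapping pooling)
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')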


# x holds the input images; y_ holds the corresponding one-hot labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

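# First convolutional layer parameters: 5x5 receptive field, 1 input channel mapped to 32 output channels
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

# Reshape each flat image vector back into a 28x28 single-channel image
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolutional layer with ReLU activation, followed by 2x2 max pooling
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)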

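# Second convolutional layer parameters: 5x5 receptive field, 32 input channels mapped to 64 output channels
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

# Second convolutional layer with ReLU activation, followed by 2x2 max pooling
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)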

# Fully connected layer parameters
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

# Fully connected layer: flatten the 64-channel feature maps so they can be fed to it
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Apply dropout to the fully connected layer's units to reduce overfitting
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Softmax layer parameters
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

# Softmax layer
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Cross-entropy loss (summed over the batch) and the training op, which uses AdamOptimizer
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Evaluation ops: fraction of images whose predicted class matches the label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Initialize all variables
sess.run(tf.initialize_all_variables())

# Training loop
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    # Evaluate with keep_prob = 1.0 (no dropout); train with keep_prob = 0.5
    train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# Final accuracy on the test set
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
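
The softmax output and the summed cross-entropy used as the loss in the program can be sketched for a single example in plain Python. This is only an illustration of the arithmetic, not how TF computes it; the function names `softmax` and `cross_entropy` are chosen here for the sketch. Terms whose label entry is 0 are skipped, which sidesteps `log(0)` when a non-target probability underflows to 0.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score does not change the result but avoids overflow in exp
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """-sum(label * log(probability)), skipping terms whose label entry is 0."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_label, probabilities) if t > 0)

probabilities = softmax([1.0, 2.0, 3.0])
loss = cross_entropy([0.0, 0.0, 1.0], probabilities)
```

For a one-hot label the loss reduces to minus the log of the probability assigned to the correct class, so it approaches 0 as that probability approaches 1.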

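Some notes:

1) All biases are initialized to 0.1

ReLU, f(x) = max(x, 0), is an effective activation function. It only needs a threshold and involves no exponentials, so it is fast to compute. Its derivative is 1 for x > 0 (and 0 for x < 0), so the gradient does not saturate on the positive side. (What saturation means: for other functions such as sigmoid or tanh, the derivative approaches 0 toward both ends, so the gradient vanishes across many layers.) A rule of thumb: "because ReLU neurons are used, it is good practice to initialize the bias terms with a small positive value, to avoid units whose output is always 0 (dead neurons)."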
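
A minimal plain-Python sketch of ReLU, its derivative, and a single ReLU unit, to make the dead-neuron point concrete. The names `relu`, `relu_grad`, and `unit_output` are hypothetical helpers for this illustration and are not part of TF.

```python
def relu(x):
    """ReLU activation: f(x) = max(x, 0)."""
    return max(x, 0.0)

def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, otherwise 0 (0 is used at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def unit_output(inputs, weights, bias):
    """Return (output, gradient of the activation) for one ReLU unit."""
    pre_activation = sum(w * v for w, v in zip(weights, inputs)) + bias
    return relu(pre_activation), relu_grad(pre_activation)

# With a zero pre-activation from the weights, a bias of 0.1 keeps the unit active
active = unit_output([0.0, 0.0], [0.5, -0.5], 0.1)
# With a negative pre-activation the output and the gradient are both 0
inactive = unit_output([1.0, 1.0], [-0.5, -0.5], 0.1)
```

A unit whose pre-activation is negative for every input passes no gradient back, which is why a small positive initial bias is a reasonable default.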

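2) Initializer functions

`tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)`

Draws random values from a truncated normal distribution with the given mean and standard deviation. Any value more than 2 standard deviations from the mean is discarded and redrawn.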
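
The discard-and-redraw rule can be sketched with rejection sampling in plain Python. This illustrates the idea only and is not TF's implementation; the function name mirrors the TF one for readability, and the `rng` parameter is an addition made here so the example is reproducible.

```python
import random

def truncated_normal(count, mean=0.0, stddev=1.0, rng=None):
    """Draw `count` normal samples, redrawing any that fall outside mean +/- 2 * stddev."""
    if rng is None:
        rng = random.Random()
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        # Keep only values within two standard deviations of the mean
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

weights = truncated_normal(1000, stddev=0.1, rng=random.Random(0))
```

With `stddev=0.1`, as in `weight_variable`, every initial weight therefore lies in [-0.2, 0.2].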

`tf.constant(value, dtype=None, shape=None, name='Const')`

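Creates a constant tensor of type dtype. value may be a single value or a list; a list must not have more elements than shape implies, and in the TF version used here any missing entries are filled with the last value. shape specifies the shape of the result.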

3) Convolution function

`tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)`

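input is a 4-D tensor with layout [batch, in_height, in_width, in_channels].

filter is a window that slides over the data along the height and width directions. Its shape, [filter_height, filter_width, in_channels, out_channels], gives the window size and the numbers of input and output channels.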
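
To make the sliding window concrete, here is a plain-Python sketch for one single-channel image and one filter, with stride 1 and zero padding so the output keeps the input size (the 'SAME' case). It assumes an odd filter size and is far slower than the real op; the name `conv2d_same` is made up for this illustration.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and zero padding."""
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    pad_top, pad_left = (k_height - 1) // 2, (k_width - 1) // 2
    output = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0.0
            for i in range(k_height):
                for j in range(k_width):
                    r, c = row + i - pad_top, col + j - pad_left
                    # Positions outside the image count as zero padding
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c] * kernel[i][j]
            output[row][col] = total
    return output

ones_image = [[1.0] * 3 for _ in range(3)]
ones_kernel = [[1.0] * 3 for _ in range(3)]
result = conv2d_same(ones_image, ones_kernel)
```

With an all-ones 3x3 image and filter, each output value simply counts how many filter cells overlap the image: 9 in the center, 6 on the edges, and 4 in the corners.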

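(Note: to keep the shapes uniform, the output tensor can be viewed as a stack of images. These functions can simply be used as the API documentation describes. A detailed study touches on how C/C++ arrays are laid out in memory, which basically amounts to the very intuitive and efficient "pointers flying everywhere". Understanding that layout may become necessary later, for operations such as reversing a tensor.)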
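
With `padding='SAME'`, the output height and width depend only on the input size and the stride: output = ceil(input / stride). Tracing the program's layers this way shows where the 7 * 7 * 64 in the fully connected layer comes from. The helper names `same_output_size` and `trace_shapes` are made up for this sketch.

```python
import math

def same_output_size(input_size, stride):
    """Output length along one dimension for padding='SAME'."""
    return int(math.ceil(input_size / float(stride)))

def trace_shapes(height=28, width=28):
    """Return (height, width, channels) after each conv and pool layer of the program."""
    shapes = [(height, width, 1)]
    for out_channels in (32, 64):
        # Convolution with stride 1 keeps the spatial size
        height, width = same_output_size(height, 1), same_output_size(width, 1)
        shapes.append((height, width, out_channels))
        # 2x2 max pooling with stride 2 halves the spatial size
        height, width = same_output_size(height, 2), same_output_size(width, 2)
        shapes.append((height, width, out_channels))
    return shapes

shapes = trace_shapes()
flat_size = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

The final shape is 7 x 7 x 64, so each image is flattened into a vector of 3136 values before the fully connected layer.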

4) Pooling functions

`tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)`

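value is a tensor of shape [batch, height, width, channels]. ksize is a 4-element vector giving the window size along each of those dimensions, in the same order, and strides gives the sliding step along the same dimensions. Other pooling functions include: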

`tf.nn.avg_pool`

`tf.nn.max_pool_with_argmax`

`tf.nn.avg_pool3d`

`tf.nn.max_pool3d`
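
As an illustration of what the program's `max_pool_2x2` does to one channel of one image, here is a plain-Python sketch of 2x2 max pooling with stride 2. It is a simplified stand-in for `tf.nn.max_pool`, written for this note; windows that run past the edge are truncated, which matches the effect of 'SAME' padding for max pooling.

```python
def max_pool_2x2(image):
    """Take the maximum over non-overlapping 2x2 windows of a list-of-lists image."""
    height, width = len(image), len(image[0])
    pooled = []
    for row in range(0, height, 2):
        pooled_row = []
        for col in range(0, width, 2):
            # Collect the values inside the window, clipped at the image border
            window = [image[r][c]
                      for r in range(row, min(row + 2, height))
                      for c in range(col, min(col + 2, width))]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

pooled_even = max_pool_2x2([[1, 2, 3, 4],
                            [5, 6, 7, 8],
                            [9, 10, 11, 12],
                            [13, 14, 15, 16]])
pooled_odd = max_pool_2x2([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])
```

A 4x4 input becomes 2x2, and a 3x3 input also becomes 2x2 because the last window in each direction is clipped at the border.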

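5) About dropout

This is another common empirical technique. Intuitively, during training a random subset of neurons is switched off at each step, so those neurons are not updated in that step. There is currently no rigorous theory explaining why this gives better results (neural networks themselves still lack a rigorous theory); the usual explanations appeal to sparsity, combinations of sub-networks, and similar ideas.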
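
A plain-Python sketch of the idea: each activation is kept with probability `keep_prob` and scaled by 1 / `keep_prob` so the expected sum stays the same, otherwise it is set to 0. This mirrors the behavior of `tf.nn.dropout` but is not its implementation; the `rng` parameter is an addition made here for reproducibility.

```python
import random

def dropout(activations, keep_prob, rng=None):
    """Zero each activation with probability 1 - keep_prob and rescale the ones kept."""
    if rng is None:
        rng = random.Random()
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

# Training: roughly half of the units are zeroed, the rest are doubled
train_output = dropout([1.0] * 100, 0.5, random.Random(0))
# Evaluation: keep_prob = 1.0 leaves every activation unchanged
eval_output = dropout([1.0, 2.0, 3.0], 1.0)
```

This is why the program feeds `keep_prob: 0.5` for training steps and `keep_prob: 1.0` when measuring accuracy.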

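Some qualitative results (not rigorously tested; only 1 or 2 runs):

The standard program ran in 215.7 s with 99.18% accuracy. Without dropout it ran in 245.3 s with 98.96% accuracy. Removing dropout made no significant difference, probably because overfitting is not pronounced for this problem.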

References:

Activation functions: http://blog.csdn.net/u012526120/article/details/49149317

http://www.cnblogs.com/neopenx/p/4453161.html

dropout: http://blog.csdn.net/stdcoutzyx/article/details/4902244

Official guide: https://www.tensorflow.org/versions/r0.10/get_started/index.html

Chinese community: http://www.tensorfly.cn/