TensorFlow Tutorial (5): Improving the MNIST Project

Latest recommended article published on 2020-05-29 13:28:31

泉伟

Category: Tensorflow  Tags: TensorFlow

Article link: https://blog.csdn.net/qq_35451572/article/details/86488036

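The MNIST dataset and Softmax were covered in the previous section, so they are not described in detail again here. Below we build a multilayer convolutional network to improve recognition accuracy on the MNIST dataset.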

tf.InteractiveSession()

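Name	tf.InteractiveSession()	tf.Session()
Typical usage	Create the session first, then define operations (ops)	Define the operations (ops) first, then create the session

tf.InteractiveSession() installs itself as the default session when it is created, so you can keep adding operations to the computation graph and evaluate them with Tensor.eval() and Operation.run() without passing the session explicitly; the operations do not all have to be defined in advance.
With tf.Session(), the usual pattern is to build the whole computation graph before starting the session and then launch that graph.

By comparison, tf.InteractiveSession() is more convenient for step-by-step work, so the programs in later sections will use it frequently.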

import tensorflow as tf
sess = tf.InteractiveSession()

Initializing weights and biases

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

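The two functions above set up the weights and biases; calling them directly creates an initialized weight or bias variable of the given shape.
tf.truncated_normal(shape, mean, stddev) generates values from a truncated normal distribution whose mean and standard deviation you choose; values that fall more than two standard deviations from the mean are dropped and redrawn. shape is the shape of the generated tensor; mean is the mean, which defaults to 0 when not set; stddev is the standard deviation.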
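
The redraw rule described above can be sketched in plain Python. This is only an illustration of the idea, not TensorFlow's implementation; the function name truncated_normal_samples and the rng parameter are hypothetical names introduced here.

```python
import random


def truncated_normal_samples(n, mean=0.0, stddev=1.0, rng=None):
    """Draw n values from a normal distribution, redrawing any value that
    falls more than two standard deviations from the mean."""
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# Same settings as weight_variable: mean 0, stddev 0.1.
weights = truncated_normal_samples(1000, stddev=0.1, rng=random.Random(0))
print(min(weights), max(weights))  # both lie within [-0.2, 0.2]
```

With stddev=0.1 every initial weight therefore stays inside [-0.2, 0.2], which keeps extreme starting values out of the network.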

Defining convolution and pooling

Defining convolution

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

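The custom function above defines a convolutional layer. The function it uses is:

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

We will use this function often later on, so it is described in detail here. The operation takes six parameters, with the following meanings: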

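Parameter	Meaning
input	The input image to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels], that is, [number of images in a training batch, image height, image width, number of image channels]. Note that this is a 4-D Tensor, and its type must be float32 or float64
filter	The convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is, [kernel height, kernel width, number of input channels, number of kernels], with the same type as input. Note that its third dimension, in_channels, must equal the fourth dimension of input
strides	The stride along each dimension of the input during convolution; a 1-D vector of length 4
padding	A string that must be either "SAME" or "VALID"; it selects the padding scheme used for the convolution
use_cudnn_on_gpu	bool; whether to use cuDNN acceleration; defaults to true
name	A name for the operation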

The function returns a Tensor. This output is what is usually called the feature map, and its shape is still of the form [batch, height, width, channels].
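
How padding affects the height and width of the feature map can be sketched with the usual output-size rules for the two modes. The helper below is plain Python and only computes sizes along one spatial dimension; the name conv_output_size is hypothetical and not part of TensorFlow.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output size along one spatial dimension for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        # Zeros are added around the input, so only the stride matters.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the filter has to fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(28, 5, 1, "SAME"))   # 28
print(conv_output_size(28, 5, 1, "VALID"))  # 24
```

With strides=[1, 1, 1, 1] and padding='SAME', as in conv2d above, a 28x28 input keeps its 28x28 size; with 'VALID' a 5x5 kernel would shrink it to 24x24.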

Defining pooling

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

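The custom function above defines a pooling layer. The function it uses is:

tf.nn.max_pool(value, ksize, strides, padding, name=None)

Its four main parameters are very similar to those of the convolution: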

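Parameter	Meaning
value	The input to pool. A pooling layer usually follows a convolutional layer, so the input is normally a feature map, again of shape [batch, height, width, channels]
ksize	The size of the pooling window, a vector of four values, usually [1, height, width, 1]; the batch and channels entries are set to 1 because we do not want to pool over those dimensions
strides	As with convolution, the step of the window along each dimension, usually [1, stride, stride, 1]
padding	As with convolution, either 'VALID' or 'SAME'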

It returns a Tensor of the same type, whose shape is still of the form [batch, height, width, channels].
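
To make the effect of a 2x2 window with stride 2 concrete, here is a minimal pure-Python sketch on a single-channel 2-D list. It is an illustration only; max_pool_2x2_list is a hypothetical name, and clipping the window at the border is an assumption that mimics 'SAME' padding for max pooling.

```python
def max_pool_2x2_list(image):
    """2x2 max pooling with stride 2 on a 2-D list (one image, one channel).

    Windows that run past the border are clipped to the image.
    """
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height, 2):
        row = []
        for left in range(0, width, 2):
            window = [image[r][c]
                      for r in range(top, min(top + 2, height))
                      for c in range(left, min(left + 2, width))]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 6, 1, 1]]
pooled_map = max_pool_2x2_list(feature_map)
print(pooled_map)  # [[4, 5], [6, 7]]
```

Each output value is the largest value in its 2x2 window, so height and width are halved while the strongest responses are kept.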

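Defining the network structure

First convolutional layer

We can now implement the first layer. It consists of a convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch. Its weight tensor has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also one bias for each output channel.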

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
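
As a quick check of what these shapes mean, the number of trainable values in a layer is the product of the weight shape plus the product of the bias shape. The helper num_parameters below is a hypothetical name used only for this illustration.

```python
from functools import reduce
from operator import mul


def num_parameters(weight_shape, bias_shape):
    """Count trainable values: all weights plus all biases."""
    n_weights = reduce(mul, weight_shape, 1)
    n_biases = reduce(mul, bias_shape, 1)
    return n_weights + n_biases


first_layer_params = num_parameters([5, 5, 1, 32], [32])
print(first_layer_params)  # 832
```

The first layer therefore has 5 * 5 * 1 * 32 = 800 weights and 32 biases.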

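To apply the layer, we reshape x into a 4-D tensor whose second and third dimensions correspond to the image height and width, and whose last dimension is the number of color channels (1 here because the images are grayscale; it would be 3 for RGB color images). Here x is the input placeholder of shape [None, 784] that is defined in the full program at the end of this article.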

x_image = tf.reshape(x, [-1,28,28,1])

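tf.reshape() has the following form:

tf.reshape( tensor, shape, name=None)

The tensor argument takes a tensor.
The shape argument takes a vector giving the number of dimensions of the new tensor and the length of each one. Passing [3,4,5] returns a 3x4x5 tensor containing exactly the same element values as the input tensor. If one component of shape is set to -1, for example [-1,4,5], the size of that dimension is computed automatically.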
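
How the -1 entry gets filled in can be sketched as follows: the missing size is the total number of elements divided by the product of the known sizes. This is a plain-Python illustration; resolve_shape is a hypothetical helper, not a TensorFlow function.

```python
def resolve_shape(num_elements, shape):
    """Replace a single -1 in shape with the size implied by num_elements."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("shape does not fit the number of elements")
        return [num_elements // known if dim == -1 else dim for dim in shape]
    if known != num_elements:
        raise ValueError("shape does not fit the number of elements")
    return list(shape)


# A batch of 50 flattened MNIST images holds 50 * 784 values.
batch_shape = resolve_shape(50 * 784, [-1, 28, 28, 1])
print(batch_shape)  # [50, 28, 28, 1]
```

This is why tf.reshape(x, [-1,28,28,1]) works for any batch size: the first dimension is inferred from the data.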

We convolve x_image with the weight tensor, add the bias, apply the ReLU activation function, and finally apply max pooling.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second convolutional layer

To build a deeper network, we stack several layers of this type. In the second layer, each 5x5 patch produces 64 features.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

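Densely connected layer

The image size has now been reduced to 7x7. We add a fully connected layer with 1024 neurons to process the entire image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply ReLU.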

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
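
Where the 7x7 and the 7 * 7 * 64 come from can be traced layer by layer. The sketch below assumes the 'SAME' output-size rule (ceil of input size divided by stride) for both the convolutions and the pooling; same_size and trace are hypothetical names used only here.

```python
import math


def same_size(in_size, stride):
    """Output size along one dimension with 'SAME' padding."""
    return math.ceil(in_size / stride)


size, channels = 28, 1
trace = [("input", size, size, channels)]
for index, out_channels in enumerate([32, 64], start=1):
    size = same_size(size, 1)      # 5x5 convolution, stride 1
    channels = out_channels
    trace.append(("conv%d" % index, size, size, channels))
    size = same_size(size, 2)      # 2x2 max pooling, stride 2
    trace.append(("pool%d" % index, size, size, channels))

for layer in trace:
    print(layer)

flat_size = size * size * channels
print(flat_size)  # 3136
```

The trace goes 28x28x1, 28x28x32, 14x14x32, 14x14x64, 7x7x64, so each flattened vector has 7 * 7 * 64 = 3136 entries, which matches the first dimension of W_fc1.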

Dropout

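To reduce overfitting, we apply dropout before the output layer. We use a placeholder for the probability that a neuron's output is kept during dropout. This lets us turn dropout on during training and off during testing. Besides masking neuron outputs, TensorFlow's tf.nn.dropout op automatically handles the scaling of the outputs, so dropout can be used without any extra scaling.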

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

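tf.nn.dropout() is used as follows:

tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

Parameter	Meaning
x	The input tensor
keep_prob	float; the probability that each element is kept. When the graph is built, keep_prob is a placeholder, keep_prob = tf.placeholder(tf.float32), and TensorFlow receives its concrete value at run time through feed_dict (the training loop in this article feeds keep_prob: 0.5)
noise_shape	A 1-D int32 tensor giving the shape of the randomly generated keep/drop flags
seed	An integer used as the random seed
name	A name for the operation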
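
The masking and scaling described for dropout can be sketched in plain Python: each value is kept with probability keep_prob and divided by keep_prob, otherwise it is set to 0. This is only an illustration of the idea; dropout_list and the rng parameter are hypothetical names.

```python
import random


def dropout_list(values, keep_prob, rng=None):
    """Keep each value with probability keep_prob and scale it by 1/keep_prob;
    replace dropped values with 0."""
    rng = rng or random.Random()
    scale = 1.0 / keep_prob
    return [v * scale if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout_list(activations, 0.5, random.Random(0))
kept_all = dropout_list(activations, 1.0)
print(dropped)   # every entry is either 0.0 or twice the original value
print(kept_all)  # [1.0, 2.0, 3.0, 4.0]
```

Scaling the kept values by 1/keep_prob keeps the expected sum of the outputs unchanged, which is why feeding keep_prob: 1.0 at test time needs no further correction.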

Output layer

Finally, we add a softmax layer, just like the single-layer softmax regression in the previous section.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
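
For reference, softmax turns a vector of scores into probabilities that sum to 1. The sketch below is a plain-Python version for a single example; subtracting the maximum before exponentiating is a common stability trick added here and not something the code above does explicitly.

```python
import math


def softmax(logits):
    """Convert a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest score receives the largest probability, so taking the argmax of y_conv gives the predicted digit.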

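Training and evaluating the model

To train and evaluate the model, we use almost the same code as for the simple single-layer softmax network in the previous section. The differences are that we replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer, add the extra parameter keep_prob to feed_dict to control the dropout rate, and log once every 100 iterations.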

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
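
The loss and accuracy used in the training code can be reproduced on a tiny hand-made batch in plain Python. This is a sketch under the assumption of one-hot labels; the data is made up for illustration, and terms whose label is 0 are skipped because they contribute nothing to the sum.

```python
import math


def cross_entropy(labels, predictions):
    """Summed cross-entropy over a batch: -sum(y_ * log(y_conv))."""
    return -sum(y * math.log(p)
                for label_row, pred_row in zip(labels, predictions)
                for y, p in zip(label_row, pred_row)
                if y > 0)


def accuracy(labels, predictions):
    """Fraction of rows where the argmax of the prediction matches the label."""
    correct = [label_row.index(max(label_row)) == pred_row.index(max(pred_row))
               for label_row, pred_row in zip(labels, predictions)]
    return sum(correct) / len(correct)


labels = [[0, 1, 0], [1, 0, 0]]                  # made-up one-hot labels
predictions = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]  # made-up softmax outputs
loss_value = cross_entropy(labels, predictions)
accuracy_value = accuracy(labels, predictions)
print(loss_value)
print(accuracy_value)  # 0.5
```

Only the first example is classified correctly, so the accuracy is 0.5, while the loss is the sum of -log(0.8) and -log(0.3).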

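According to the official TensorFlow documentation, the training code above reaches an accuracy of about 99.2% on the final test set, but in my test it reached 93.6%. If we replace

train_step = tf.train.AdagradOptimizer(1e-4).minimize(cross_entropy)

with

train_step = tf.train.GradientDescentOptimizer(1e-3).minimize(cross_entropy)

the accuracy can be raised to 99.2%. Both training methods will be introduced in later sections.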

Full program

#!/usr/bin/python
# coding:utf-8
import tensorflow as tf
import input_data
import time

mnist = input_data.read_data_sets("Mnist_data/", one_hot=True)

sess = tf.InteractiveSession()


# weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


# convolution
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


# pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


# Create the model
# placeholder
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
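# variables of the single-layer softmax model from the previous section
# (W, b and y are not used by the convolutional network below)
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)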

# first convolutional layer
w_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# second convolutional layer
w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# densely connected layer
w_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# readout layer
w_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)

# train and evaluate the model
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
# train_step = tf.train.AdagradOptimizer(1e-4).minimize(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(1e-3).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
time_start = time.time()

for i in range(2000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        time_end = time.time()
        time_cost = time_end - time_start
        print("step %d, train accuracy %g, time cost %g s" % (i, train_accuracy, time_cost))
        time_start = time.time()
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))