Generally speaking, the first few layers of a CNN are convolutional layers and subsampling (i.e. pooling) layers. After several rounds of convolution and pooling come a few fully connected layers (in other words, a traditional neural network), and the final layer outputs the classification. A rough diagram of this structure is shown in the figure below.
As the figure shows, the biggest difference between a CNN and a traditional neural network is the introduction of convolutional and pooling layers, and those are the parts of the code to focus on.
In the code below, convolution is done with tf.nn.conv2d and pooling with tf.nn.max_pool. Let's go through how these two functions are used in detail.
tf.nn.conv2d
This function computes a 2-D convolution given 4-D input and filter tensors (the result is itself a 4-D tensor). Its definition in TensorFlow 1.x is:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format='NHWC', name=None)
The first few parameters are input, filter, strides, padding, use_cudnn_on_gpu, … Each is explained below, and a small shape-checking example follows the list.
input: the data to be convolved, a 4-D tensor of shape [batch, in_height, in_width, in_channels]: batch size, image height, image width, and number of input channels.
filter: the convolution kernel, of shape [filter_height, filter_width, in_channels, out_channels]: kernel height, kernel width, number of input channels, and number of output channels.
strides: a list of length 4 giving how far the convolution window slides over input along each dimension after each step (for image data this is typically [1, stride_h, stride_w, 1]).
padding: either 'SAME' or 'VALID', controlling whether the strip along the image border that the kernel only partially covers is kept. 'SAME' keeps it (the input is zero-padded, so at stride 1 the output has the same height and width as the input); 'VALID' discards it.
use_cudnn_on_gpu: whether to accelerate with cuDNN; defaults to True.
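To make the shape bookkeeping concrete, here is a minimal sketch (assuming TensorFlow 1.x; the all-ones tensors are dummies used only to exercise the shapes) that convolves a batch of one 28x28 single-channel image with 32 filters of size 5x5:

import tensorflow as tf

x = tf.ones([1, 28, 28, 1])   # [batch, in_height, in_width, in_channels]
w = tf.ones([5, 5, 1, 32])    # [filter_height, filter_width, in_channels, out_channels]
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
print(y.get_shape())          # (1, 28, 28, 32): SAME padding keeps the 28x28 size

With padding='VALID' the same call would instead yield (1, 24, 24, 32), since a 5x5 window fits only 24 times along each 28-pixel side.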
tf.nn.max_pool
This performs max pooling, while avg_pool performs average pooling. Its definition in TensorFlow 1.x is shown below, followed by its parameters and a small example.
tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
value: a 4-D tensor of shape [batch, height, width, channels], the same layout as input in conv2d.
ksize: a list of length 4 giving the size of the pooling window along each dimension.
strides: how far the pooling window slides, with the same meaning as in conv2d.
padding: same usage as in conv2d.
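A matching sketch for the 2x2 pooling used later in this post (again TensorFlow 1.x, with a dummy input tensor):

import tensorflow as tf

v = tf.ones([1, 28, 28, 32])  # [batch, height, width, channels]
p = tf.nn.max_pool(v, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(p.get_shape())          # (1, 14, 14, 32): a 2x2 window with stride 2 halves height and width

With both functions in hand, here is the complete program: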
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
#import data
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('data_dir','data/',"directory for storing data")
print (FLAGS.data_dir)
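# with one_hot=True each label arrives as a 10-vector, e.g. digit 3 -> [0,0,0,1,0,0,0,0,0,0]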
mnist = input_data.read_data_sets(FLAGS.data_dir,one_hot = True)
def weight_variable(shape):
    # initialize weights with small Gaussian noise to break symmetry
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # small positive bias so the activations start out alive
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    # stride 1 and SAME padding leave the spatial size unchanged
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 window with stride 2 halves the height and width
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
sess = tf.InteractiveSession()
# each MNIST image comes in flattened to a 784-vector
x = tf.placeholder(tf.float32, [None, 784])
# reshape to NHWC format: batch x 28 x 28 x 1 channel
x_image = tf.reshape(x, [-1, 28, 28, 1])
# layer 1: convolution + 2x2 pooling, 28x28x1 -> 14x14x32
w_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.elu(conv2d(x_image,w_conv1)+b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# layer 2: convolution + 2x2 pooling, 14x14x32 -> 7x7x64
w_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.elu(conv2d(h_pool1,w_conv2)+b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
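# after two rounds of 2x2 pooling, 28x28 -> 14x14 -> 7x7 with 64 channels,
# which is where the 7*7*64 below comes from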
# layer 3: fully connected, 7*7*64 -> 1024
w_fc1 = weight_variable([7*7*64,1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
h_fc1 = tf.nn.elu(tf.matmul(h_pool2_flat,w_fc1)+b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1,keep_prob)
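# keep_prob is fed at run time: 0.5 while training, 1.0 when evaluating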
# layer 4: linear map to the 10 digit classes, followed by softmax
w_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2) + b_fc2)
y_ = tf.placeholder(tf.float32,[None,10])
# loss + training step
# cross-entropy loss
loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv),reduction_indices = [1]))
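# note: a separate softmax followed by log can underflow for very confident predictions;
# tf.nn.softmax_cross_entropy_with_logits is the numerically safer alternative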
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
# a prediction counts as correct when the argmax of the softmax output matches the label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())  # initialize_all_variables() is the deprecated pre-1.0 spelling
for i in range(1001):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # evaluate on the current batch with dropout disabled
        train_accuracy = sess.run(accuracy, feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print('step: %d, training accuracy: %g' % (i, train_accuracy))
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print ("text accuracy %g" % sess.run(accuracy,feed_dict = {x:mnist.test.images,y_:mnist.test.labels,keep_prob :1.0}))