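Recognizing MNIST handwritten digits with a CNN.
The MNIST dataset is split into a training set and a test set. Each image is a 28*28*1 grayscale picture of a digit, and each label is a 10-dimensional one-hot vector: exactly one position is 1 and the rest are 0, indicating which of the digits 0-9 the image shows.
An earlier post implemented handwritten digit recognition with logistic regression; this one solves the same task with a CNN, using TensorFlow (the 1.x API, since the code relies on tf.placeholder and tf.Session).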
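As a small illustration of the one-hot label format described above, the following NumPy sketch encodes digits as 10-dimensional vectors and recovers them with argmax. The helper names `one_hot` and `decode` are made up for this example and are not part of the MNIST loader.

```python
import numpy as np

def one_hot(digits, num_classes=10):
    """Encode integer class ids as one-hot rows (hypothetical helper)."""
    digits = np.asarray(digits)
    encoded = np.zeros((digits.size, num_classes), dtype=np.float32)
    # Put a single 1 in each row, at the column given by the digit.
    encoded[np.arange(digits.size), digits] = 1.0
    return encoded

def decode(labels):
    """Recover the class id: the index of the single 1 in each row."""
    return np.argmax(labels, axis=1)

labels = one_hot([1, 7, 0])
print(labels[0])       # the one-hot row for the digit 1
print(decode(labels))  # back to the integer digits
```

The same argmax decoding is what the accuracy computation later in the post applies to both the labels and the network output.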
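Network design
The overall CNN design is as follows:
Input: read the image data. The MNIST loader stores each image flattened into a 784-dimensional vector, so it is first reshaped to 28*28*1;
Convolution layer 1, filter {3*3*1*64}: extracts features and produces a 28*28*64 output, i.e. 64 feature maps;
Add the bias b and apply the ReLU activation to obtain a non-linear mapping;
Pooling layer 1, filter {2*2}: downsamples the data, taking the 28*28*64 input to a 14*14*64 output;
Convolution layer 2, filter {3*3*64*128}: extracts features and produces a 14*14*128 output, i.e. 128 feature maps;
Add the bias and apply the ReLU activation to obtain a non-linear mapping;
Pooling layer 2, filter {2*2}: downsamples the data to 7*7*128;
Fully connected layer 1, 1024 neurons: turns the 7*7*128 input (flattened to 6272 values) into a 1024-dimensional vector;
Apply the ReLU activation to obtain a non-linear mapping;
Fully connected layer 2, 10 neurons: turns the 1024-dimensional vector into a 10-dimensional vector, which is the final output;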
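The shape bookkeeping in the list above can be checked with plain Python. This is only a sketch of the arithmetic, assuming SAME padding with stride 1 for the convolutions and stride 2 for the pooling layers (as in the program below); `conv_same` and `pool_same` are illustrative helper names, not TensorFlow functions.

```python
import math

def conv_same(shape, out_channels, stride=1):
    """Shape after a SAME-padded convolution: spatial size = ceil(size / stride)."""
    h, w, _ = shape
    return (math.ceil(h / stride), math.ceil(w / stride), out_channels)

def pool_same(shape, stride=2):
    """Shape after SAME-padded pooling: the channel count is unchanged."""
    h, w, c = shape
    return (math.ceil(h / stride), math.ceil(w / stride), c)

shape = (28, 28, 1)  # the reshaped input image
trace = [("input", shape)]

shape = conv_same(shape, 64)
trace.append(("conv1", shape))
shape = pool_same(shape)
trace.append(("pool1", shape))
shape = conv_same(shape, 128)
trace.append(("conv2", shape))
shape = pool_same(shape)
trace.append(("pool2", shape))

# Flatten height * width * channels before the fully connected layers.
flat = shape[0] * shape[1] * shape[2]
trace.append(("flatten", (flat,)))
trace.append(("fc1", (1024,)))
trace.append(("fc2", (10,)))

for name, s in trace:
    print(name, s)
```

The flattened size of 6272 is the first dimension of the fully connected weight matrix 'wd1' in the program.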
Program design
The complete program is shown below:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import input_data
# Load the MNIST dataset
mnist = input_data.read_data_sets('data/', one_hot=True)
# Take out the training and test images together with their labels
trainimg = mnist.train.images
trainlabel = mnist.train.labels
testimg = mnist.test.images
testlabel = mnist.test.labels
print ("MNIST ready")
# Each input image is 28*28*1, so the input is a 784-dimensional vector
n_input = 784
# The output has one value for each of the ten classes 0-9
n_output = 10
# Weight parameters W of the convolution and fully connected layers
weights = {
    # Convolution layer 1 weights
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64], stddev=0.1)),
    # Convolution layer 2 weights
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128], stddev=0.1)),
    # Fully connected layer 1 weights
    'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
    # Fully connected layer 2 weights
    'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
}
# Bias parameters b of the convolution and fully connected layers
biases = {
    # Convolution layer 1 bias
    'bc1': tf.Variable(tf.random_normal([64], stddev=0.1)),
    # Convolution layer 2 bias
    'bc2': tf.Variable(tf.random_normal([128], stddev=0.1)),
    # Fully connected layer 1 bias
    'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
    # Fully connected layer 2 bias
    'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
}
# Definition of the CNN model
def conv_basic(_input, _w, _b, _keepratio):
    # INPUT: reshape the flat input into the standard layout [batch, 28, 28, 1]
    _input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])
    # CONV LAYER 1: 64 filters, each of size 3*3*1
    _conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')
    print(_conv1)
    # Add the bias and apply the activation function
    _conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))
    # Pooling layer 1: 2*2 window with stride 2, halves the height and width
    _pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Dropout: randomly drop some activations to prevent overfitting
    _pool_dr1 = tf.nn.dropout(_pool1, _keepratio)
    # CONV LAYER 2
    _conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
    # Add the bias and apply the activation function
    _conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
    # Pooling layer 2
    _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Dropout
    _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)
    # VECTORIZE: flatten to [batch, 7*7*128] for the fully connected layer (1024 neurons)
    _dense1 = tf.reshape(_pool_dr2, [-1, _w['wd1'].get_shape().as_list()[0]])
    # FULLY CONNECTED LAYER 1
    _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))
    # Dropout
    _fc_dr1 = tf.nn.dropout(_fc1, _keepratio)
    # FULLY CONNECTED LAYER 2
    _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
    # RETURN: the output of every layer, keyed by name
    out = { 'input_r': _input_r, 'conv1': _conv1, 'pool1': _pool1, 'pool1_dr1': _pool_dr1,
        'conv2': _conv2, 'pool2': _pool2, 'pool_dr2': _pool_dr2, 'dense1': _dense1,
        'fc1': _fc1, 'fc_dr1': _fc_dr1, 'out': _out
    }
    return out
print ("CNN READY")
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
keepratio = tf.placeholder(tf.float32)
# FUNCTIONS
_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1))
accr = tf.reduce_mean(tf.cast(_corr, tf.float32))
init = tf.global_variables_initializer()
# SAVER
print ("GRAPH READY")
sess = tf.Session()
sess.run(init)
# Number of training epochs
training_epochs = 15
# Number of images in each training batch
batch_size = 16
# Print a log entry every display_step epochs
display_step = 1
for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch
    # Display logs per epoch step
    if epoch % display_step == 0:
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        # Training accuracy is measured on the last batch of the epoch only
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print (" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print (" Test accuracy: %.3f" % (test_acc))
print ("OPTIMIZATION FINISHED")
Program walkthrough
1. Load the MNIST dataset and get the image and label data
mnist = input_data.read_data_sets('data/', one_hot=True)
trainimg = mnist.train.images
trainlabel = mnist.train.labels
testimg = mnist.test.images
testlabel = mnist.test.labels
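2. Use reshape to convert the input image data into the standard layout
_input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])
The reshape function:
def reshape(tensor, shape, name=None):
reshape rearranges the data of tensor into the given shape while keeping the total number of elements unchanged. shape is given as a list. An entry of -1 is special: it means that dimension's size is not specified by hand and is computed automatically from the total element count and the other dimensions. Only one -1 may appear in the list (with more than one, the equation would have multiple solutions). Conceptually, the tensor is first flattened to one dimension and then laid out in the target shape:
reshape(t,shape) =>reshape(t,[-1]) =>reshape(t,shape)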
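The -1 rule can be tried out with NumPy, whose reshape follows the same convention as tf.reshape. The sketch below uses three made-up flattened "images" rather than real MNIST data.

```python
import numpy as np

# Three fake flattened images, 784 values each.
batch = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# -1 is inferred from the total size: 3 * 784 / (28 * 28 * 1) = 3.
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)  # (3, 28, 28, 1)

# Reshaping keeps the element order: flattening again gives the same sequence.
same_order = np.array_equal(images.reshape(-1), batch.reshape(-1))

# Two unknown dimensions are ambiguous, so NumPy rejects them.
try:
    batch.reshape(-1, -1, 1)
    two_unknowns_ok = True
except ValueError:
    two_unknowns_ok = False
print(same_order, two_unknowns_ok)
```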
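3. Convolution layer 1: convolve the input image with 64 filters of size 3*3*1 to obtain 64 feature maps
Convolution layer 1 performs feature extraction. With the filter 3*3*1*64 it turns the 28*28*1 input image into a 28*28*64 output, i.e. 64 feature maps of the image
Because padding=SAME, new length = ceil(old length / stride); with stride 1 each feature map keeps the size 28*28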
_conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')
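To make the shapes of this step concrete, here is a deliberately naive NumPy sketch of a stride-1, SAME-padded convolution. It is not how TensorFlow implements conv2d; `conv2d_same` is a hypothetical helper that only supports odd filter sizes and stride 1, and like TensorFlow it does not flip the kernel (it computes a cross-correlation).

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution with zero padding that preserves height and width.

    x: [batch, height, width, in_channels]
    w: [filter_height, filter_width, in_channels, out_channels], odd filter sizes
    """
    n, h, wd, _ = x.shape
    fh, fw, _, out_c = w.shape
    ph, pw = fh // 2, fw // 2
    # Zero-pad the two spatial dimensions only.
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)), mode="constant")
    out = np.zeros((n, h, wd, out_c), dtype=x.dtype)
    for i in range(h):
        for j in range(wd):
            patch = padded[:, i:i + fh, j:j + fw, :]  # [n, fh, fw, in_channels]
            # Sum over filter height, width and input channels for every output channel.
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.ones((1, 28, 28, 1), dtype=np.float32)  # one all-ones "image"
w = np.ones((3, 3, 1, 64), dtype=np.float32)   # 64 all-ones 3*3*1 filters
out = conv2d_same(x, w)
print(out.shape)  # (1, 28, 28, 64)
```

With all-ones inputs each output value simply counts how many real (non-padded) pixels the 3*3 window covered, which makes the effect of zero padding at the borders easy to see.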
The convolution function conv2d:
def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None):
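input is the batch of input images: the number of images fed into each convolution, and the height, width, and number of color channels of each image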
[batch, in_height, in_width, in_channels]
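filter is the convolution kernel: the filter's height, width, and number of input channels (which must match the channels of input), followed by the number of output feature maps
[filter_height, filter_width, in_channels, out_channels]
strides gives how far the filter moves along each dimension of input at every step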
strides: A list of `ints`.
1-D tensor of length 4. The stride of the sliding window for each
dimension of `input`. The dimension order is determined by the value of
`data_format`, see below for details.
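padding selects the strategy used when the filter reaches the image border and the remaining pixels are too few to cover a full filter window. It takes the value SAME or VALID.
When the filter slides to the border and the remaining part is too small for another convolution, SAME zero-pads the input so that the window still fits, while VALID stops there and discards the leftover. The two padding modes therefore produce feature maps of different sizes.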
padding: A `string` from: `"SAME", "VALID"`.
The type of padding algorithm to use.
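[Figure: VALID (left) stops computing at the border; SAME (right) pads with zeros.]
For VALID, the output shape is computed as:
out_height = ceil((in_height - filter_height + 1) / strides[1])
out_width = ceil((in_width - filter_width + 1) / strides[2])
For SAME, the output shape is computed as:
out_height = ceil(in_height / strides[1])
out_width = ceil(in_width / strides[2])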
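The output sizes for the two padding modes can be checked with a few lines of Python. This sketch applies the rules given in the TensorFlow documentation to a single spatial dimension; `conv_output_size` is an illustrative helper name.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension.

    SAME:  ceil(in_size / stride)
    VALID: ceil((in_size - filter_size + 1) / stride)
    """
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv_output_size(28, 3, 1, "SAME"))   # 28: convolution layer 1 keeps the size
print(conv_output_size(28, 3, 1, "VALID"))  # 26: without padding the map shrinks
print(conv_output_size(28, 2, 2, "SAME"))   # 14: pooling layer 1
print(conv_output_size(14, 2, 2, "SAME"))   # 7: pooling layer 2
```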
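4. The ReLU activation function: a non-linear mapping
Add the bias b to the output of the convolution layer, then apply the relu activation function to obtain a non-linear mapping
_conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))
The bias_add function: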
def bias_add(value, bias, data_format=None, name=None):
Args:
value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
`int16`, `int8`, `complex64`, or `complex128`.
bias: A 1-D `Tensor` with size matching the last dimension of `value`.
Must be the same type as `value` unless `value` is a quantized type,
in which case a different quantized type may be used.
The relu function implements the non-linear mapping:
def relu(features, name=None):
r"""Computes rectified linear: `max(features, 0)`.
Args:
features: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`, `qint8`.
name: A name for the operation (optional).
Returns:
A `Tensor`. Has the same type as `features`.
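What bias_add followed by relu does to a tensor can be reproduced in NumPy on a tiny hand-made example. The array values below are invented for illustration; the bias has one entry per channel and is broadcast over the last dimension.

```python
import numpy as np

# A tiny "convolution output" in NHWC layout: 1 image, 1 row, 2 columns, 2 channels.
conv_out = np.array([[[[-1.0, 2.0], [0.5, -3.0]]]], dtype=np.float32)
bias = np.array([0.5, 1.0], dtype=np.float32)  # one bias per channel

# bias_add: the 1-D bias is added along the last (channel) dimension.
biased = conv_out + bias

# relu: max(features, 0), applied element by element.
activated = np.maximum(biased, 0.0)

print(biased.reshape(-1))     # [-0.5  3.   1.  -2. ]
print(activated.reshape(-1))  # [0. 3. 1. 0.]
```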
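5. The pooling layer max_pool: downsample the feature maps
The pooling layer downsamples the feature maps. A 2*2 window moved with stride 2 takes the 28*28*64 input to a 14*14*64 output, halving the height and the width
_pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
The pooling layer compresses the image data; max_pool implements the pooling operation: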
def max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
"""Performs the max pooling on the input.
Args:
value: A 4-D `Tensor` of the format specified by `data_format`.
ksize: A list or tuple of 4 ints. The size of the window for each dimension
of the input tensor.
strides: A list or tuple of 4 ints. The stride of the sliding window for
each dimension of the input tensor.
padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
See the "returns" section of `tf.nn.convolution` for details.
data_format: A string. 'NHWC', 'NCHW' and 'NCHW_VECT_C' are supported.
name: Optional name for the operation.
Returns:
A `Tensor` of format specified by `data_format`.
The max pooled output tensor.
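For the 2*2 window with stride 2 used in this post, max pooling can be sketched in NumPy with a reshape trick. `max_pool_2x2` is a hypothetical helper that assumes an NHWC array whose height and width are even, in which case SAME and VALID padding give the same result.

```python
import numpy as np

def max_pool_2x2(x):
    """2*2 max pooling with stride 2 on an NHWC array with even height and width."""
    n, h, w, c = x.shape
    # Split height and width into (block index, position inside the 2*2 block).
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    # Take the maximum over the two in-block axes.
    return blocks.max(axis=(2, 4))

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = max_pool_2x2(x)
print(x[0, :, :, 0])
print(pooled[0, :, :, 0])  # the largest value of each 2*2 block

# The shapes from the post: 28*28*64 becomes 14*14*64.
print(max_pool_2x2(np.zeros((1, 28, 28, 64), dtype=np.float32)).shape)
```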
6. Dropout: randomly drop some activations to prevent overfitting
_pool_dr1 = tf.nn.dropout(_pool1, _keepratio)
The dropout function is as follows:
@tf_export("nn.dropout")
def dropout(x, keep_prob, noise_shape=None, seed=None, name=None): # pylint: disable=invalid-name
"""Computes dropout.
With probability `keep_prob`, outputs the input element scaled up by
`1 / keep_prob`, otherwise outputs `0`. The scaling is so that the expected
sum is unchanged.
By default, each element is kept or dropped independently. If `noise_shape`
is specified, it must be
[broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
will make independent decisions. For example, if `shape(x) = [k, l, m, n]`
and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
kept independently and each row and column will be kept or not kept together.
Args:
x: A floating point tensor.
keep_prob: A scalar `Tensor` with the same type as x. The probability
that each element is kept.
noise_shape: A 1-D `Tensor` of type `int32`, representing the
shape for randomly generated keep/drop flags.
seed: A Python integer. Used to create random seeds. See
`tf.set_random_seed`
for behavior.
name: A name for this operation (optional).
Returns:
A Tensor of the same shape of `x`.
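The scaling described in the docstring (kept elements are multiplied by 1 / keep_prob so that the expected sum is unchanged) can be seen in a short NumPy sketch. This `dropout` function is a stand-in written for this example, not TensorFlow's implementation, and it ignores noise_shape.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale it by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 100), dtype=np.float32)

dropped = dropout(x, keep_prob=0.7, rng=rng)
kept_fraction = np.mean(dropped != 0)  # close to 0.7
mean_value = dropped.mean()            # close to 1.0, the mean of x

print(kept_fraction, mean_value)

# With keep_prob = 1.0 (as used for evaluation in the post) nothing is dropped.
unchanged = np.array_equal(dropout(x, 1.0, rng), x)
```

This is why the training loop feeds keepratio 0.7 for the optimizer step and 1.0 when measuring the cost and accuracy.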
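7. Convolution layer 2 and pooling layer 2
# CONV LAYER 2
_conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
# Activation function
_conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
# Pooling layer 2
_pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# Dropout
_pool_dr2 = tf.nn.dropout(_pool2, _keepratio)
Convolution layer 2, filter 3*3*64*128: extracts features from the 14*14*64 input and turns it into 14*14*128, i.e. 128 feature maps
Add the bias to the convolution result and apply the relu activation function to obtain a non-linear mapping
Pooling layer 2 downsamples the data, turning the 14*14*128 input into 7*7*128
Apply dropout to the output of pooling layer 2 to prevent overfitting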
8. Fully connected layers
Fully connected layer 1 has 1024 neurons. The output of pooling layer 2 is 7*7*128, so the layer's weights are W: [7*7*128, 1024] and b: [1024]
'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
# FULLY CONNECTED LAYER 1
_fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))
The matmul function: multiplies two matrices
def matmul(a,
b,
transpose_a=False,
transpose_b=False,
adjoint_a=False,
adjoint_b=False,
a_is_sparse=False,
b_is_sparse=False,
name=None):
"""Multiplies matrix `a` by matrix `b`, producing `a` * `b`.
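Fully connected layer 2 maps the 1024-dimensional output of fully connected layer 1 to a 10-dimensional vector, which is the final recognition result. Hence W: [1024, 10] and b: [10]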
'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
# FULLY CONNECTED LAYER 2
_out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
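The two fully connected layers are plain matrix products, which the following NumPy sketch mirrors on random data. The input is a made-up batch of two pooled feature maps, and the random weights only stand in for the trained variables; dropout is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of pooling layer 2 for a batch of 2 images.
pool2 = rng.standard_normal((2, 7, 7, 128)).astype(np.float32)

# Randomly initialised parameters with the same shapes as in the post.
wd1 = (0.1 * rng.standard_normal((7 * 7 * 128, 1024))).astype(np.float32)
bd1 = (0.1 * rng.standard_normal(1024)).astype(np.float32)
wd2 = (0.1 * rng.standard_normal((1024, 10))).astype(np.float32)
bd2 = (0.1 * rng.standard_normal(10)).astype(np.float32)

# Flatten each image to a row vector of length 7*7*128 = 6272.
dense1 = pool2.reshape(-1, wd1.shape[0])
# Fully connected layer 1 with ReLU: [2, 6272] x [6272, 1024] -> [2, 1024].
fc1 = np.maximum(dense1 @ wd1 + bd1, 0.0)
# Fully connected layer 2: [2, 1024] x [1024, 10] -> [2, 10] class scores.
out = fc1 @ wd2 + bd2

print(dense1.shape, fc1.shape, out.shape)
```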
9. Loss function
_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1))
accr = tf.reduce_mean(tf.cast(_corr, tf.float32))
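conv_basic returns the outputs of each layer of the CNN in order; out is the final 10-dimensional class score vector. The true labels of the input images are y, which come from the training batch: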
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
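The accuracy expression (argmax of the prediction compared with argmax of the one-hot label, then averaged) can be traced by hand in NumPy. The numbers below are invented and use 3 classes instead of 10 to keep the example short.

```python
import numpy as np

# Made-up network outputs for 4 examples and 3 classes.
logits = np.array([[0.1, 2.5, -1.0],
                   [1.2, 0.3, 0.0],
                   [-0.5, 0.2, 3.1],
                   [2.0, 2.2, 0.1]])
# One-hot true labels: classes 1, 0, 2, 0.
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

# Same steps as tf.equal(tf.argmax(_pred, 1), tf.argmax(y, 1)).
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
# Same steps as tf.reduce_mean(tf.cast(_corr, tf.float32)).
accuracy = correct.astype(np.float32).mean()

print(correct)   # [ True  True  True False]
print(accuracy)  # 0.75
```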
The function softmax_cross_entropy_with_logits is used here to compute the loss between the predictions and the true labels:
def softmax_cross_entropy_with_logits(
_sentinel=None, # pylint: disable=invalid-name
labels=None,
logits=None,
dim=-1,
name=None):
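When calling it, logits and labels must be passed as keyword arguments:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
softmax_cross_entropy_with_logits computes the cross entropy: it applies softmax to the logits and measures the cross entropy against the labels. Cross entropy measures the difference between two probability distributions.
For a detailed explanation of cross entropy, see: https://blog.csdn.net/yhily2008/article/details/80262321
In brief:
Probability of an event: the probability is the ratio of the number of times event A occurs to the number of times all events occur
Odds of an event: the odds are the ratio of the probability that the event happens to the probability that it does not, Odds = P / (1 - P)
The logit is the logarithm of the odds (the logit transform): logit(P) = ln(P / (1 - P))
[Figure: plot of the logit function.]
Near P=0 or P=1 the logit is very sensitive (its value changes very rapidly). Under the logit transform, as P goes from 0 to 1 the logit goes from -∞ to +∞. Because the range of the logit is unbounded, regression fitting becomes easier.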
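A NumPy sketch may help connect these ideas: the network outputs unbounded scores (logits), softmax turns them into probabilities, and cross entropy compares those probabilities with the one-hot label. The functions below are simplified stand-ins written for this example, not TensorFlow's implementation, and the input numbers are invented.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between softmax(logits) and one-hot labels."""
    # Subtracting the row maximum does not change softmax but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return np.log(p / (1.0 - p))

logits = np.array([[2.0, 1.0, 0.1],   # fairly confident in class 0
                   [0.0, 0.0, 0.0]])  # no preference between the classes
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()  # mirrors tf.reduce_mean over the batch
print(losses, cost)

# The logit is 0 at P = 0.5 and grows without bound towards P = 0 or P = 1.
print(logit(0.5), logit(0.9), logit(0.999))
```

For the second example every class gets probability 1/3, so its loss is ln(3), the value a network that has learned nothing would produce on a 3-class problem.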
10. Training and testing
for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch
    # Display logs per epoch step
    if epoch % display_step == 0:
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print (" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print (" Test accuracy: %.3f" % (test_acc))
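For each training epoch:
Based on the batch size, compute how many iterations the epoch needs. The dataset is too large to feed into the network all at once, so it is processed one batch at a time;
For every batch, run the optimizer step (which backpropagates and updates the parameters) and then evaluate the loss to accumulate the epoch's average cost. At the end of the epoch, compute the accuracy on the last training batch and on the test set.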
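The batching logic described above can be sketched independently of TensorFlow. The generator below is a hypothetical stand-in for mnist.train.next_batch, not the loader's actual implementation: it shuffles once per epoch and, as an assumption of this sketch, drops a trailing partial batch. The data are zero-filled placeholders.

```python
import numpy as np

def iterate_batches(images, labels, batch_size, rng):
    """Yield shuffled mini-batches covering the data once (partial batch dropped)."""
    order = rng.permutation(len(images))
    total_batch = len(images) // batch_size
    for i in range(total_batch):
        idx = order[i * batch_size:(i + 1) * batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
images = np.zeros((100, 784), dtype=np.float32)  # toy stand-in for the training images
labels = np.zeros((100, 10), dtype=np.float32)   # toy stand-in for the one-hot labels

batches = list(iterate_batches(images, labels, batch_size=16, rng=rng))
print(len(batches))           # 6 full batches out of 100 examples
print(batches[0][0].shape)    # (16, 784)
print(batches[0][1].shape)    # (16, 10)
```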
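Observations while running the program:
Take the first image and its label data
Label data: [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]], so this image is the digit 1.
Tensor shapes output by each layer during training: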
_input_r Tensor("Reshape:0", shape=(?, 28, 28, 1), dtype=float32)
_conv1 Tensor("Conv2D:0", shape=(?, 28, 28, 64), dtype=float32)
relu_conv1 Tensor("Relu:0", shape=(?, 28, 28, 64), dtype=float32)
pool1 Tensor("MaxPool:0", shape=(?, 14, 14, 64), dtype=float32)
_pool_dr1 Tensor("dropout/mul:0", shape=(?, 14, 14, 64), dtype=float32)
_conv2 Tensor("Conv2D_1:0", shape=(?, 14, 14, 128), dtype=float32)
relu_conv2 Tensor("Relu_1:0", shape=(?, 14, 14, 128), dtype=float32)
_pool2 Tensor("MaxPool_1:0", shape=(?, 7, 7, 128), dtype=float32)
_pool_dr2 Tensor("dropout_1/mul:0", shape=(?, 7, 7, 128), dtype=float32)
_dense1 Tensor("Reshape_1:0", shape=(?, 6272), dtype=float32)
_fc1 Tensor("Relu_2:0", shape=(?, 1024), dtype=float32)
_fc_dr1 Tensor("dropout_2/mul:0", shape=(?, 1024), dtype=float32)
_out Tensor("Add_1:0", shape=(?, 10), dtype=float32)
References
Tang Yudi's deep learning course materials