Deep Learning -- Convolutional Neural Networks (CNN) -- MNIST Handwritten Digit Recognition

暗夜猎手-大魔王

Article link: https://blog.csdn.net/u014106644/article/details/89071902

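This post uses a CNN to recognize handwritten digits from the MNIST dataset.

The MNIST dataset is split into a training set and a test set. Each image is a 28*28*1 grayscale image of a digit. Each label is a 10-dimensional one-hot vector: exactly one entry is 1 and the rest are 0, which indicates which digit from 0 to 9 the image shows.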
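To make the label format concrete, here is a minimal NumPy sketch of one-hot encoding. The helper name `one_hot` is made up for this illustration and is not part of the MNIST loader.

```python
import numpy as np


def one_hot(digit, num_classes=10):
    """Return a vector of zeros with a single 1.0 at index `digit`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[digit] = 1.0
    return vec


label = one_hot(3)
# The digit is recovered as the position of the largest entry.
decoded = int(np.argmax(label))
print(label, decoded)
```

The same argmax step is what the accuracy computation later in the post relies on.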

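An earlier post implemented handwritten digit recognition with logistic regression. This one uses a CNN instead, built with TensorFlow (the TensorFlow 1.x graph API: tf.placeholder and tf.Session).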

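Network design

The overall design of the CNN is as follows:

Input: read the image data. The MNIST loader stores each image as a flattened 784-dimensional vector, so it is first reshaped to 28*28*1;

Convolutional layer 1, filter:{3*3*1*64}, extracts features and produces a 28*28*64 output, i.e. 64 feature maps;

The bias b is added and the result goes through the relu activation function, which provides the non-linear mapping;

Pooling layer 1, filter:{2*2}, downsamples the data: the input is 28*28*64 and the output is 14*14*64;

Convolutional layer 2, filter:{3*3*64*128}, extracts features and produces 14*14*128, i.e. 128 feature maps;

A bias is added and the result goes through the relu activation function;

Pooling layer 2, filter:{2*2}, downsamples the data to 7*7*128;

Fully connected layer 1 has 1024 neurons and maps the flattened 7*7*128 input to a 1024-dimensional vector;

The relu activation function provides the non-linear mapping;

Fully connected layer 2 has 10 neurons and maps the 1024-dimensional vector to a 10-dimensional vector, which is the final output.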
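The layer sizes listed above can be checked with a few lines of plain Python. This is only a bookkeeping sketch, not TensorFlow code: it assumes SAME padding everywhere (as in the program below), stride 1 for the convolutions and stride 2 for the pooling layers, and the function names are made up for this illustration.

```python
import math


def same_out(size, stride):
    """Spatial output size under SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return (layer name, output shape) pairs for the network in this post."""
    shapes = [("input", (height, width, 1))]
    h, w = same_out(height, 1), same_out(width, 1)   # conv1, stride 1
    shapes.append(("conv1", (h, w, 64)))
    h, w = same_out(h, 2), same_out(w, 2)            # pool1, stride 2
    shapes.append(("pool1", (h, w, 64)))
    h, w = same_out(h, 1), same_out(w, 1)            # conv2, stride 1
    shapes.append(("conv2", (h, w, 128)))
    h, w = same_out(h, 2), same_out(w, 2)            # pool2, stride 2
    shapes.append(("pool2", (h, w, 128)))
    shapes.append(("flatten", (h * w * 128,)))
    shapes.append(("fc1", (1024,)))
    shapes.append(("out", (10,)))
    return shapes


shape_table = dict(trace_shapes())
for name, shape in shape_table.items():
    print(name, shape)
```

Note that each feature map tensor has three dimensions (height, width, channels); the number of feature maps is the channel count, not an extra fourth dimension.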

Program design

The complete program is shown below:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import input_data

# Load the MNIST dataset with one-hot labels
mnist = input_data.read_data_sets('data/', one_hot=True)

# Extract the image and label arrays of the training and test sets
trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels
print("MNIST ready")

# Each input image is 28*28*1, i.e. a 784-dimensional vector
n_input  = 784
# Ten output classes, one per digit 0-9
n_output = 10
# Weights W of the convolutional and fully connected layers
weights  = {
        # Conv layer 1 weights: 3*3 kernel, 1 input channel, 64 output channels
        'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64], stddev=0.1)),
        # Conv layer 2 weights: 3*3 kernel, 64 input channels, 128 output channels
        'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128], stddev=0.1)),
        # Fully connected layer 1 weights
        'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
        # Fully connected layer 2 weights
        'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
    }
# Biases b of the convolutional and fully connected layers
biases   = {
        # Conv layer 1 bias
        'bc1': tf.Variable(tf.random_normal([64], stddev=0.1)),
        # Conv layer 2 bias
        'bc2': tf.Variable(tf.random_normal([128], stddev=0.1)),
        # Fully connected layer 1 bias
        'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
        # Fully connected layer 2 bias
        'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
    }

# Definition of the CNN model
def conv_basic(_input, _w, _b, _keepratio):
        # INPUT: reshape the flat input into the NHWC layout [batch, 28, 28, 1]
        _input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])
        # CONV LAYER 1: 64 filters, each of size 3*3*1
        _conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')
        print(_conv1)
        # Add the bias, then apply the relu activation
        _conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))
        # Pooling layer 1: 2*2 window with stride 2 halves height and width
        _pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Dropout: randomly drop some activations to prevent overfitting
        _pool_dr1 = tf.nn.dropout(_pool1, _keepratio)
        # CONV LAYER 2: 128 filters, each of size 3*3*64
        _conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
        # Add the bias, then apply the relu activation
        _conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
        # Pooling layer 2
        _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Dropout
        _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)
        # VECTORIZE: flatten the pooled output to [batch, 7*7*128] for the fully connected layer
        _dense1 = tf.reshape(_pool_dr2, [-1, _w['wd1'].get_shape().as_list()[0]])
        # FULLY CONNECTED LAYER 1 (1024 neurons)
        _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))
        # Dropout
        _fc_dr1 = tf.nn.dropout(_fc1, _keepratio)
        # FULLY CONNECTED LAYER 2 (10 outputs, the logits)
        _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
        # RETURN the output of every layer
        out = { 'input_r': _input_r, 'conv1': _conv1, 'pool1': _pool1, 'pool1_dr1': _pool_dr1,
            'conv2': _conv2, 'pool2': _pool2, 'pool_dr2': _pool_dr2, 'dense1': _dense1,
            'fc1': _fc1, 'fc_dr1': _fc_dr1, 'out': _out
        }
        return out
print("CNN READY")

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output]) 
keepratio = tf.placeholder(tf.float32)

# FUNCTIONS

_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss: mean softmax cross entropy between the logits and the one-hot labels
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
# Adam optimizer minimizing the loss
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
# Accuracy: fraction of samples whose predicted class equals the labeled class
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1))
accr = tf.reduce_mean(tf.cast(_corr, tf.float32))
init = tf.global_variables_initializer()

print("GRAPH READY")

sess = tf.Session()
sess.run(init)

# Number of training epochs
training_epochs = 15
# Number of images in each mini-batch
batch_size      = 16
# Print logs every display_step epochs
display_step    = 1
for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data (dropout active, keep probability 0.7)
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss (dropout disabled, keep probability 1.0)
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch

    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        # Training accuracy is measured on the last batch of the epoch only
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print(" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print(" Test accuracy: %.3f" % (test_acc))

print("OPTIMIZATION FINISHED")
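As a rough cross-check of the weight and bias shapes defined in the program above, the following sketch counts the trainable parameters in plain Python. It does not use TensorFlow: the dictionaries simply repeat the shapes from the listing, and the variable names are chosen for this illustration only.

```python
from math import prod

# Shapes copied from the `weights` and `biases` dictionaries in the program.
weight_shapes = {
    "wc1": (3, 3, 1, 64),
    "wc2": (3, 3, 64, 128),
    "wd1": (7 * 7 * 128, 1024),
    "wd2": (1024, 10),
}
bias_shapes = {"bc1": (64,), "bc2": (128,), "bd1": (1024,), "bd2": (10,)}

# Number of scalars in each variable is the product of its dimensions.
param_counts = {name: prod(shape)
                for name, shape in {**weight_shapes, **bias_shapes}.items()}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(f"{name}: {count}")
print("total:", total_params)
```

Almost all parameters sit in `wd1`, the first fully connected layer, which is typical for this kind of small CNN.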

Program explanation

1. Load the MNIST dataset and get the image and label data

mnist = input_data.read_data_sets('data/', one_hot=True)

trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels

2. Use reshape to convert the input image data into the standard layout

_input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])

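The reshape function:

def reshape(tensor, shape, name=None):

reshape rearranges the data in tensor into the given shape while keeping the total number of elements unchanged. shape is a list. An entry of -1 means that the size of that dimension does not have to be specified by hand: the function infers it from the total number of elements and the other dimensions. At most one entry may be -1 (with several -1 entries the sizes would be an equation with multiple solutions). Conceptually, the transformation is:

reshape(t,shape) =>reshape(t,[-1]) =>reshape(t,shape)

That is, the tensor is first flattened to one dimension and its elements are then laid out in the new shape.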
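NumPy's `reshape` follows the same -1 convention, so the behaviour can be tried without building a TensorFlow graph. The sketch below uses a made-up batch of five all-zero "images"; it illustrates the rule and is not the code from the post.

```python
import numpy as np

# Five flattened 28*28 images, as the MNIST loader would deliver them.
batch = np.zeros((5, 784), dtype=np.float32)

# -1 lets reshape infer the batch dimension: 5 * 784 / (28 * 28 * 1) = 5.
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)

# Two unknown dimensions are ambiguous, so the call is rejected.
try:
    batch.reshape(-1, -1, 28, 1)
    two_unknowns_rejected = False
except ValueError:
    two_unknowns_rejected = True
print(two_unknowns_rejected)
```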

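3. Convolutional layer 1: 64 filters of size 3*3*1 are convolved with the input image, producing 64 feature maps

Convolutional layer 1 performs feature extraction. With filter:3*3*1*64, the 28*28*1 input image is turned into a 28*28*64 output, i.e. 64 feature maps of the image

Because padding=SAME, new length = ceil(old length / stride). With stride 1, each feature map keeps the size 28*28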

_conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')

The convolution function conv2d:

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None):

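input is the batch of input images: the number of images processed per convolution call, and the height, width and number of color channels of each image

[batch, in_height, in_width, in_channels]

filter is the convolution kernel: its height, width, number of input channels (which must match the channels of input), and the number of output feature maps

[filter_height, filter_width, in_channels, out_channels]

strides is the step of the sliding window along each dimension of input

 strides: A list of `ints`.
      1-D tensor of length 4.  The stride of the sliding window for each
      dimension of `input`. The dimension order is determined by the value of
      `data_format`, see below for details.

padding is the strategy used when the filter reaches the image border and the remaining pixels are not enough for a full window. Its value is either SAME or VALID.

When the filter slides to the border and the remaining part is too small for a convolution, SAME pads the input with zeros, while VALID stops and discards the remainder. The two padding modes can therefore produce feature maps of different sizes.

padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.

[Figure: on the left, VALID stops at the border; on the right, SAME pads the input with zeros.]

For VALID, the output shape is computed as follows:

out_height = ceil((in_height - filter_height + 1) / strides[1])
out_width  = ceil((in_width - filter_width + 1) / strides[2])

For SAME, the output shape is computed as follows:

out_height = ceil(in_height / strides[1])
out_width  = ceil(in_width / strides[2])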
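The two padding rules can be written as a small helper to work out the sizes used in this network. This is a sketch of the standard output-size formulas for one spatial dimension, written in plain Python; the function name `conv_output_size` is made up for this illustration.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension for SAME or VALID padding."""
    if padding == "SAME":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: only positions where the whole window fits are used.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


conv1_same = conv_output_size(28, 3, 1, "SAME")    # 3*3 conv, stride 1
conv1_valid = conv_output_size(28, 3, 1, "VALID")  # same conv without padding
pool1_same = conv_output_size(28, 2, 2, "SAME")    # 2*2 pooling, stride 2
print(conv1_same, conv1_valid, pool1_same)
```

With SAME padding the 3*3 convolution keeps the 28*28 size, whereas VALID would shrink it to 26*26, which is why the post's shapes only change at the pooling layers.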

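4. relu activation function: non-linear mapping

The bias b is added to the output of the convolutional layer, and the result is passed through the relu activation function, which provides the non-linear mapping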

_conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))

The bias_add function:

def bias_add(value, bias, data_format=None, name=None):

Args:
    value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
      `int16`, `int8`, `complex64`, or `complex128`.
    bias: A 1-D `Tensor` with size matching the last dimension of `value`.
      Must be the same type as `value` unless `value` is a quantized type,
      in which case a different quantized type may be used.

The relu function implements the non-linear mapping:

def relu(features, name=None):
  r"""Computes rectified linear: `max(features, 0)`.

  Args:
    features: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`, `qint8`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `features`.
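The combined effect of `bias_add` followed by `relu` can be imitated in NumPy, where broadcasting adds the 1-D bias along the last (channel) dimension. The tiny tensor below is made up for the illustration; it stands in for a convolution output in NHWC layout.

```python
import numpy as np

# One image, 1x2 spatial positions, 2 channels: shape (1, 1, 2, 2).
feature_maps = np.array([[[[-1.0, 0.5],
                           [2.0, -3.0]]]])
# One bias value per channel, matching the last dimension.
bias = np.array([0.5, 1.0])

# bias_add: broadcast the bias over batch, height and width.
shifted = feature_maps + bias
# relu: max(features, 0), applied element-wise.
activated = np.maximum(shifted, 0.0)
print(activated)
```

Negative values become 0 and positive values pass through unchanged, which is the non-linearity the post refers to.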

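5. Pooling layer max_pool: downsampling

The pooling layer downsamples the feature maps. A 2*2 window moves with stride 2, so a 28*28*64 input becomes a 14*14*64 output: height and width are halved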

_pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

max_pool performs the pooling operation:

def max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
  """Performs the max pooling on the input.

  Args:
    value: A 4-D `Tensor` of the format specified by `data_format`.
    ksize: A list or tuple of 4 ints. The size of the window for each dimension
      of the input tensor.
    strides: A list or tuple of 4 ints. The stride of the sliding window for
      each dimension of the input tensor.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
      See the "returns" section of `tf.nn.convolution` for details.
    data_format: A string. 'NHWC', 'NCHW' and 'NCHW_VECT_C' are supported.
    name: Optional name for the operation.

  Returns:
    A `Tensor` of format specified by `data_format`.
    The max pooled output tensor.
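To see what a 2*2 max pooling with stride 2 does to a single feature map, here is a small NumPy sketch. It handles only a 2-D map with even height and width, which is a simplification compared with `tf.nn.max_pool`; the helper name `max_pool_2x2` is made up for this illustration.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array, H and W even."""
    h, w = feature_map.shape
    # Split rows and columns into non-overlapping 2x2 blocks.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    # Keep the largest value of every block.
    return blocks.max(axis=(1, 3))


feature_map = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value is the maximum of one 2*2 block, so a 4*4 map becomes 2*2, just as 28*28 becomes 14*14 in the network.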

6. dropout: randomly drop some activations to prevent overfitting

_pool_dr1 = tf.nn.dropout(_pool1, _keepratio)

The dropout function is as follows:

@tf_export("nn.dropout")
def dropout(x, keep_prob, noise_shape=None, seed=None, name=None):  # pylint: disable=invalid-name
  """Computes dropout.

  With probability `keep_prob`, outputs the input element scaled up by
  `1 / keep_prob`, otherwise outputs `0`.  The scaling is so that the expected
  sum is unchanged.

  By default, each element is kept or dropped independently.  If `noise_shape`
  is specified, it must be
  [broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
  to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
  will make independent decisions.  For example, if `shape(x) = [k, l, m, n]`
  and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
  kept independently and each row and column will be kept or not kept together.

  Args:
    x: A floating point tensor.
    keep_prob: A scalar `Tensor` with the same type as x. The probability
      that each element is kept.
    noise_shape: A 1-D `Tensor` of type `int32`, representing the
      shape for randomly generated keep/drop flags.
    seed: A Python integer. Used to create random seeds. See
      `tf.set_random_seed`
      for behavior.
    name: A name for this operation (optional).

  Returns:
    A Tensor of the same shape of `x`.
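The docstring's description (keep each element with probability `keep_prob` and scale it by `1 / keep_prob`) can be sketched in NumPy as follows. This is an illustration of the idea under that description, not TensorFlow's implementation; the function name and the random generator argument are assumptions of this sketch.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob, scaled by 1/keep_prob."""
    keep_mask = rng.random(x.shape) < keep_prob
    return np.where(keep_mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
activations = np.ones(10000)

# Training-time behaviour, with the keep probability used in the post.
dropped = dropout(activations, 0.7, rng)
# Evaluation-time behaviour: keep_prob = 1.0 leaves the input untouched.
unchanged = dropout(activations, 1.0, rng)
print(dropped.mean(), unchanged.mean())
```

Because the kept values are scaled up, the mean activation stays close to its original value, which is why the program can simply feed `keepratio: 1.` at evaluation time.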

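7. Convolutional layer 2 and pooling layer 2

        # CONV LAYER 2
        _conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
        # Add the bias, then apply the relu activation
        _conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
        # Pooling layer 2
        _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Dropout
        _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)

Convolutional layer 2, filter:3*3*64*128, extracts features from the 14*14*64 input and produces 14*14*128, i.e. 128 feature maps

A bias is added to the convolution result, which then goes through the relu activation function for the non-linear mapping

Pooling layer 2 downsamples the 14*14*128 input to 7*7*128

dropout is applied to the output of pooling layer 2 to prevent overfitting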

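8. Fully connected layers

Fully connected layer 1 has 1024 neurons. The output of pooling layer 2 is 7*7*128 = 6272 values per image, so the parameters of this layer are W:[7*7*128,1024] and b:[1024]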

'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
 'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),

 # FULLY CONNECTED LAYER 1
        _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))

The matmul function multiplies two matrices:

def matmul(a,
           b,
           transpose_a=False,
           transpose_b=False,
           adjoint_a=False,
           adjoint_b=False,
           a_is_sparse=False,
           b_is_sparse=False,
           name=None):
  """Multiplies matrix `a` by matrix `b`, producing `a` * `b`.

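Fully connected layer 2 maps the 1024-dimensional output of fully connected layer 1 to a 10-dimensional vector, which is the final recognition result. Therefore W:[1024,10] and b:[10]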

'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))

        # FULLY CONNECTED LAYER 2
        _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
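The shape bookkeeping of the two fully connected layers can be reproduced with NumPy matrix products. The sketch below uses random values in place of trained weights and a made-up batch of four inputs, and it leaves out dropout; it is meant only to show how the dimensions line up.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4

# Stand-in for the output of pooling layer 2: [batch, 7, 7, 128].
pooled = rng.normal(size=(batch_size, 7, 7, 128))
# Flatten every sample to a 6272-dimensional row vector.
flat = pooled.reshape(batch_size, -1)

# Fully connected layer 1: [batch, 6272] x [6272, 1024] -> [batch, 1024].
wd1 = rng.normal(scale=0.1, size=(7 * 7 * 128, 1024))
bd1 = rng.normal(scale=0.1, size=1024)
fc1 = np.maximum(flat @ wd1 + bd1, 0.0)  # relu

# Fully connected layer 2: [batch, 1024] x [1024, 10] -> [batch, 10].
wd2 = rng.normal(scale=0.1, size=(1024, 10))
bd2 = rng.normal(scale=0.1, size=10)
logits = fc1 @ wd2 + bd2

print(flat.shape, fc1.shape, logits.shape)
```

The inner dimensions of each product must match, which is why `wd1` has exactly 7*7*128 rows.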

9. Loss function

_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1)) 
accr = tf.reduce_mean(tf.cast(_corr, tf.float32))

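conv_basic returns the outputs of every layer of the CNN in order. out is the final 10-dimensional vector of class scores (logits), and y holds the true labels of the input images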

batch_xs, batch_ys = mnist.train.next_batch(batch_size)
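The accuracy ops `_corr` and `accr` in the excerpt above compare the argmax of the predictions with the argmax of the one-hot labels. The same computation in NumPy, on three made-up samples with three classes, looks like this:

```python
import numpy as np

# Made-up logits for 3 samples and 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 0.3, 3.0],
                   [1.5, 1.0, 0.0]])
# One-hot labels: the true classes are 0, 2 and 1.
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])

# Counterpart of tf.equal(tf.argmax(_pred, 1), tf.argmax(y, 1)).
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
# Counterpart of tf.reduce_mean(tf.cast(_corr, tf.float32)).
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)
```

Here the first two samples are classified correctly and the third is not, so the accuracy is two thirds.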

The function softmax_cross_entropy_with_logits is used here to compute the loss between the predictions and the true labels:

def softmax_cross_entropy_with_logits(
    _sentinel=None,  # pylint: disable=invalid-name
    labels=None,
    logits=None,
    dim=-1,
    name=None):

When calling it, logits and labels must be passed as keyword arguments

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))

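softmax_cross_entropy_with_logits computes the cross entropy: it applies softmax to the logits to obtain a probability distribution and then measures its cross entropy against the labels. Cross entropy measures the difference between two probability distributions.

For a detailed explanation of cross entropy, see: https://blog.csdn.net/yhily2008/article/details/80262321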
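For intuition, the per-sample quantity can be written out in NumPy: softmax turns the logits into probabilities, and the loss is the negative log-probability assigned to the labeled class. This is a sketch of the textbook formula with the usual max-subtraction for numerical stability, not TensorFlow's implementation; the function name is made up for this illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-sample cross entropy between softmax(logits) and one-hot labels."""
    # Subtracting the row maximum does not change softmax but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


# Two samples, two classes; the true class is class 0 in both cases.
logits = np.array([[0.0, 0.0],     # undecided prediction
                   [10.0, 0.0]])   # confident and correct prediction
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

losses = softmax_cross_entropy(logits, labels)
mean_loss = losses.mean()  # counterpart of tf.reduce_mean in the program
print(losses, mean_loss)
```

The undecided prediction costs ln 2, while the confident correct one costs almost nothing, which is the behaviour the optimizer exploits.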

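A brief explanation:

Probability of an event: the probability describes the ratio of the number of times event A occurs to the number of times all events occur

Odds of an event: the odds are the ratio of the probability that the event occurs to the probability that it does not occur, Odds = P / (1 - P)

The logit is defined by the logit transform: logit(P) = ln(P / (1 - P))

[Figure: graph of the logit function.]

Near P=0 or P=1 the logit is very sensitive (its value changes sharply). Under the logit transform, as P goes from 0 to 1 the logit goes from $- \infty$ to $+ \infty$. Because the range of the logit is unbounded, fitting a regression becomes easier.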
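The relationship between probability, odds and logit can be tried out numerically. The sketch below uses the standard definition logit(P) = ln(P / (1 - P)) and its inverse, the sigmoid function; the function names are chosen for this illustration.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(odds(p))


def sigmoid(z):
    """Inverse of logit: maps a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(p, logit(p))

# Round trip: probability -> logit -> probability.
recovered = sigmoid(logit(0.9))
```

A probability of 0.5 gives logit 0, and the values grow quickly in magnitude as P approaches 0 or 1, which matches the description of the curve.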

10. Training and testing

for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch

    # Display logs per epoch step
    if epoch % display_step == 0: 
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print (" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print (" Test accuracy: %.3f" % (test_acc))

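For each epoch:

The number of batches is computed from the batch size. The dataset is too large to feed into the network all at once, so it is processed one batch at a time;

For each batch, the loss is computed and one backpropagation step updates the parameters. At the end of each epoch, the accuracy is evaluated on the last training batch and on the full test set.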
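The batching logic can be sketched without TensorFlow. The generator below is a hypothetical stand-in for `mnist.train.next_batch`: it shuffles the sample order once per epoch and yields full batches only, mirroring the `int(num_examples / batch_size)` computation in the program. The array sizes are made up for the illustration.

```python
import numpy as np


def iterate_minibatches(images, labels, batch_size, rng):
    """Yield shuffled (images, labels) batches; a trailing partial batch is dropped."""
    order = rng.permutation(len(images))
    total_batch = len(images) // batch_size
    for i in range(total_batch):
        idx = order[i * batch_size:(i + 1) * batch_size]
        yield images[idx], labels[idx]


rng = np.random.default_rng(0)
# Made-up data: 100 flattened images with one-hot labels for 10 classes.
fake_images = rng.random((100, 784)).astype(np.float32)
fake_labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=100)]

batches = list(iterate_minibatches(fake_images, fake_labels, 16, rng))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

With 100 samples and a batch size of 16, six full batches are produced and the remaining four samples are skipped in that epoch.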

Observations from a run:

Take the first image and its label data

Label data: [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]], so this image shows the digit 1.

Output shapes of each layer during training:

_input_r Tensor("Reshape:0", shape=(?, 28, 28, 1), dtype=float32)
_conv1 Tensor("Conv2D:0", shape=(?, 28, 28, 64), dtype=float32)
relu_conv1 Tensor("Relu:0", shape=(?, 28, 28, 64), dtype=float32)
pool1 Tensor("MaxPool:0", shape=(?, 14, 14, 64), dtype=float32)
_pool_dr1 Tensor("dropout/mul:0", shape=(?, 14, 14, 64), dtype=float32)
_conv2 Tensor("Conv2D_1:0", shape=(?, 14, 14, 128), dtype=float32)
relu_conv2 Tensor("Relu_1:0", shape=(?, 14, 14, 128), dtype=float32)
_pool2 Tensor("MaxPool_1:0", shape=(?, 7, 7, 128), dtype=float32)
_pool_dr2 Tensor("dropout_1/mul:0", shape=(?, 7, 7, 128), dtype=float32)
_dense1 Tensor("Reshape_1:0", shape=(?, 6272), dtype=float32)
_fc1 Tensor("Relu_2:0", shape=(?, 1024), dtype=float32)
_fc_dr1 Tensor("dropout_2/mul:0", shape=(?, 1024), dtype=float32)
_out Tensor("Add_1:0", shape=(?, 10), dtype=float32)

References

唐宇迪 (Tang Yudi), deep learning course materials

https://blog.csdn.net/yhily2008/article/details/80262321
