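Deep Learning: Convolutional Neural Networks (CNN): MNIST Handwritten Digit Recognition

This post uses a CNN to recognize handwritten digits from the MNIST dataset.

The MNIST dataset is split into a training set and a test set. Each image is a 28*28*1 grayscale picture of a digit, and each label is a 10-dimensional one-hot vector: exactly one entry is 1 and the rest are 0, indicating which digit from 0 to 9 the image shows.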
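The one-hot label format can be illustrated with a small sketch. The helper names one_hot and decode are hypothetical and are not part of the program in this post; they only show how a digit maps to a 10-dimensional vector and back.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in [0, num_classes)")
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec


def decode(vec):
    """Recover the digit as the index of the largest entry."""
    return vec.index(max(vec))


label = one_hot(1)
print(label)
print(decode(label))
```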

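An earlier post implemented handwritten digit recognition with logistic regression; here the same task is solved with a CNN built in TensorFlow.

Network design

The overall CNN design is as follows: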

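Input: read the image data. MNIST stores each image as a flattened 784-dimensional vector, so the first step is to reshape it back to 28*28*1;

Convolution layer 1, filter:{3*3*1*64}, extracts features and outputs 28*28*64, i.e. 64 feature maps;

Add the bias b and apply the relu activation function to obtain a nonlinear mapping;

Pooling layer 1, filter:{2*2}, downsamples the feature maps: the input is 28*28*64 and the output is 14*14*64;

Convolution layer 2, filter:{3*3*64*128}, extracts features and outputs 14*14*128, i.e. 128 feature maps;

Apply the relu activation function to obtain a nonlinear mapping;

Pooling layer 2, filter:{2*2}, downsamples the feature maps to 7*7*128;

Fully connected layer 1, 1024 neurons, maps the flattened 7*7*128 input to a 1024-dimensional vector;

Apply the relu activation function to obtain a nonlinear mapping;

Fully connected layer 2, 10 neurons, maps the 1024-dimensional vector to a 10-dimensional vector, which is the final output.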
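The layer sizes listed above can be checked with a short shape trace. This is a minimal sketch that assumes SAME padding, stride 1 for the convolutions and stride 2 for the pooling layers, as in the program below; the function names are hypothetical.

```python
import math


def same_padding_size(size, stride):
    """Spatial size after a SAME-padded convolution or pooling step."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return (layer name, output shape) pairs for the network described above."""
    shapes = [("input", (height, width, 1))]
    # (layer name, stride, output channels); None keeps the channel count
    layers = [("conv1", 1, 64), ("pool1", 2, None),
              ("conv2", 1, 128), ("pool2", 2, None)]
    channels = 1
    for name, stride, out_channels in layers:
        height = same_padding_size(height, stride)
        width = same_padding_size(width, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    shapes.append(("fc1", (1024,)))
    shapes.append(("fc2", (10,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```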

Program design

The complete program is shown below:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import input_data

# Load the MNIST dataset with one-hot labels
mnist = input_data.read_data_sets('data/', one_hot=True)

# Extract the training and test images and their labels
trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels
print ("MNIST ready")

# Each input image is 28*28*1, so the input is a 784-dimensional vector
n_input  = 784
# Output: one score for each of the ten classes 0-9
n_output = 10
# Weights W of the convolution and fully connected layers
weights  = {
        # Convolution layer 1 weights
        'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64], stddev=0.1)),
        # Convolution layer 2 weights
        'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128], stddev=0.1)),
        # Fully connected layer 1 weights
        'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
        # Fully connected layer 2 weights
        'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
    }
# Biases b of the convolution and fully connected layers
biases   = {
        # Convolution layer 1 bias
        'bc1': tf.Variable(tf.random_normal([64], stddev=0.1)),
        # Convolution layer 2 bias
        'bc2': tf.Variable(tf.random_normal([128], stddev=0.1)),
        # Fully connected layer 1 bias
        'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
        # Fully connected layer 2 bias
        'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
    }

# CNN model definition
def conv_basic(_input, _w, _b, _keepratio):
        # INPUT: reshape the flat input into the standard [batch, height, width, channels] format
        _input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])
        # CONV LAYER 1: 64 filters, each of size 3*3*1
        _conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')
        print(_conv1)
        # Add the bias, then apply the activation function
        _conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))
        # Pooling layer 1: 2*2 window, downsamples the feature maps
        _pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # Dropout: randomly drop some activations to reduce overfitting
        _pool_dr1 = tf.nn.dropout(_pool1, _keepratio)
        # CONV LAYER 2
        _conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
        # Add the bias, then apply the activation function
        _conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
        # Pooling layer 2
        _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        #dropout
        _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)
        # VECTORIZE: flatten the pooled output to [batch, 7*7*128] for fully connected layer 1 (1024 neurons)
        _dense1 = tf.reshape(_pool_dr2, [-1, _w['wd1'].get_shape().as_list()[0]])
        # FULLY CONNECTED LAYER 1
        _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))
        #dropout
        _fc_dr1 = tf.nn.dropout(_fc1, _keepratio)
        # FULLY CONNECTED LAYER 2
        _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
        # RETURN
        out = { 'input_r': _input_r, 'conv1': _conv1, 'pool1': _pool1, 'pool1_dr1': _pool_dr1,
            'conv2': _conv2, 'pool2': _pool2, 'pool_dr2': _pool_dr2, 'dense1': _dense1,
            'fc1': _fc1, 'fc_dr1': _fc_dr1, 'out': _out
        }
        return out
print ("CNN READY")

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output]) 
keepratio = tf.placeholder(tf.float32)

# FUNCTIONS

_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1)) 
accr = tf.reduce_mean(tf.cast(_corr, tf.float32)) 
init = tf.global_variables_initializer()
    
# SAVER
print ("GRAPH READY")

sess = tf.Session()
sess.run(init)

# Number of training epochs
training_epochs = 15
# Number of images in each training batch
batch_size      = 16
# Print logs every display_step epochs
display_step    = 1
for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch

    # Display logs per epoch step
    if epoch % display_step == 0: 
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print (" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print (" Test accuracy: %.3f" % (test_acc))

print ("OPTIMIZATION FINISHED")

Program walkthrough

1. Load the MNIST dataset and extract the images and labels

mnist = input_data.read_data_sets('data/', one_hot=True)

trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels

2. Use reshape to convert the input image data into the standard format

_input_r = tf.reshape(_input, shape=[-1, 28, 28, 1])

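The reshape function:

def reshape(tensor, shape, name=None):

reshape rearranges the data of tensor into the given shape while keeping the total number of elements unchanged. shape is a list, and one of its entries may be -1: the size of that dimension is not specified by hand but is computed by the function from the total element count and the other dimensions. Only one -1 is allowed in the list (with more than one, the equation would have multiple solutions). Conceptually the transformation is: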

reshape(t,shape) =>reshape(t,[-1]) =>reshape(t,shape)
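The -1 rule can be tried out with NumPy, whose reshape follows the same convention for the inferred dimension. This is only an illustrative sketch on a dummy batch of 16 flattened images, not part of the TensorFlow program.

```python
import numpy as np

# A dummy batch of 16 flattened 28*28 images
batch = np.zeros((16, 784), dtype=np.float32)

# -1 lets reshape infer the batch dimension: 16 * 784 / (28 * 28 * 1) = 16
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)

# Flattening everything into one dimension
flat = images.reshape(-1)
print(flat.shape)

# Two unknown dimensions cannot be inferred, so NumPy rejects the request
try:
    batch.reshape(-1, -1, 28, 1)
    two_unknowns_rejected = False
except ValueError:
    two_unknowns_rejected = True
print(two_unknowns_rejected)
```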

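3. Convolution layer 1: 64 filters of size 3*3*1 are convolved with the input image to produce 64 feature maps

Convolution layer 1 performs feature extraction: with filter 3*3*1*64, the 28*28*1 input image becomes 28*28*64, i.e. 64 feature maps of the image.

Because padding=SAME, new length = ceil(old length / stride), so with stride 1 each output feature map is still 28*28.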

_conv1 = tf.nn.conv2d(_input_r, _w['wc1'], strides=[1, 1, 1, 1], padding='SAME')

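The convolution function conv2d:

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", dilations=[1, 1, 1, 1], name=None):

input is the batch of input images, a 4-D tensor giving the number of images fed into each convolution and the height, width and number of channels of each image:

[batch, in_height, in_width, in_channels]

filter is the convolution kernel, a 4-D tensor giving the filter height, the filter width, the number of input channels (which must match the channels of input) and the number of output feature maps:

[filter_height, filter_width, in_channels, out_channels]

strides is the step the filter moves along each dimension of input; with NHWC data, [1, 1, 1, 1] moves one pixel at a time in both height and width:

 strides: A list of `ints`.
      1-D tensor of length 4.  The stride of the sliding window for each
      dimension of `input`. The dimension order is determined by the value of
      `data_format`, see below for details.

padding is the strategy used when the filter reaches the image border and there are not enough pixels left for a full convolution window. Its value is either SAME or VALID.

When the filter slides to the border and the remaining part is too small for the filter, SAME pads the input with zeros, while VALID stops there and discards the remainder. The two padding modes therefore produce feature maps of different sizes.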

padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.

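As the original figure illustrated (left: VALID, right: SAME), SAME pads with zeros and VALID simply stops computing.

[Figure not reproduced: VALID (left) versus SAME (right) padding]

For VALID, the output shape is computed as:

out_height = ceil((in_height - filter_height + 1) / strides[1])
out_width  = ceil((in_width - filter_width + 1) / strides[2])

For SAME, the output shape is computed as:

out_height = ceil(in_height / strides[1])
out_width  = ceil(in_width / strides[2])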
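The two output-size rules, ceil((in - filter + 1) / stride) for VALID and ceil(in / stride) for SAME, can be written as a small helper. The function name conv_output_size is hypothetical; it handles one spatial dimension at a time.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension for SAME or VALID padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# 3*3 convolution with stride 1 on a 28*28 image
print(conv_output_size(28, 3, 1, "SAME"))
print(conv_output_size(28, 3, 1, "VALID"))
# 2*2 pooling with stride 2
print(conv_output_size(28, 2, 2, "SAME"))
```

With the settings used in this post (SAME padding everywhere), the spatial size only changes in the pooling layers: 28 to 14 to 7.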

 

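4. Activation function relu: nonlinear mapping

The bias b is added to the convolution output, and the result is passed through the relu activation function to obtain a nonlinear mapping

_conv1 = tf.nn.relu(tf.nn.bias_add(_conv1, _b['bc1']))

The bias_add function: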

def bias_add(value, bias, data_format=None, name=None):
Args:
    value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
      `int16`, `int8`, `complex64`, or `complex128`.
    bias: A 1-D `Tensor` with size matching the last dimension of `value`.
      Must be the same type as `value` unless `value` is a quantized type,
      in which case a different quantized type may be used.

The relu function implements the nonlinear mapping:

def relu(features, name=None):
  r"""Computes rectified linear: `max(features, 0)`.

  Args:
    features: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`, `qint8`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `features`.
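What bias_add followed by relu computes can be reproduced on a tiny array. This is a NumPy sketch rather than the TensorFlow implementation: the bias has one entry per channel and is broadcast over the last dimension, then negative values are clipped to zero.

```python
import numpy as np

# One image, 1 row, 2 columns, 2 channels (NHWC layout)
conv_out = np.array([[[[-1.0, 0.5],
                       [2.0, -3.0]]]])
# One bias value per channel, matching the last dimension of conv_out
bias = np.array([0.5, 1.0])

# Broadcasting adds bias[c] to every position of channel c
biased = conv_out + bias
# relu: max(features, 0)
activated = np.maximum(biased, 0.0)

print(biased.tolist())
print(activated.tolist())
```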

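5. Pooling layer max_pool: downsampling the feature maps

The pooling layer compresses the feature maps with a sliding 2*2 window: the input is 28*28*64 and the output is 14*14*64, so the height and width are halved

_pool1 = tf.nn.max_pool(_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

max_pool performs the pooling operation: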

def max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
  """Performs the max pooling on the input.

  Args:
    value: A 4-D `Tensor` of the format specified by `data_format`.
    ksize: A list or tuple of 4 ints. The size of the window for each dimension
      of the input tensor.
    strides: A list or tuple of 4 ints. The stride of the sliding window for
      each dimension of the input tensor.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
      See the "returns" section of `tf.nn.convolution` for details.
    data_format: A string. 'NHWC', 'NCHW' and 'NCHW_VECT_C' are supported.
    name: Optional name for the operation.

  Returns:
    A `Tensor` of format specified by `data_format`.
    The max pooled output tensor.
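The effect of a 2*2 window with stride 2 can be sketched in NumPy. The helper max_pool_2x2 is hypothetical and assumes the height and width are even, so no padding is needed; it is not how TensorFlow implements the operation.

```python
import numpy as np


def max_pool_2x2(x):
    """2*2 max pooling with stride 2 on an NHWC array with even height and width."""
    n, h, w, c = x.shape
    # Split height and width into (blocks, 2) pairs, then take the max of each 2*2 block
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(2, 4))


x = np.arange(16).reshape(1, 4, 4, 1)
pooled = max_pool_2x2(x)
print(pooled[0, :, :, 0])

# Same spatial reduction as pooling layer 1 in this post
print(max_pool_2x2(np.zeros((1, 28, 28, 64))).shape)
```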

6. Dropout: randomly dropping activations to reduce overfitting

_pool_dr1 = tf.nn.dropout(_pool1, _keepratio)

The dropout function:

@tf_export("nn.dropout")
def dropout(x, keep_prob, noise_shape=None, seed=None, name=None):  # pylint: disable=invalid-name
  """Computes dropout.

  With probability `keep_prob`, outputs the input element scaled up by
  `1 / keep_prob`, otherwise outputs `0`.  The scaling is so that the expected
  sum is unchanged.

  By default, each element is kept or dropped independently.  If `noise_shape`
  is specified, it must be
  [broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
  to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
  will make independent decisions.  For example, if `shape(x) = [k, l, m, n]`
  and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
  kept independently and each row and column will be kept or not kept together.

  Args:
    x: A floating point tensor.
    keep_prob: A scalar `Tensor` with the same type as x. The probability
      that each element is kept.
    noise_shape: A 1-D `Tensor` of type `int32`, representing the
      shape for randomly generated keep/drop flags.
    seed: A Python integer. Used to create random seeds. See
      `tf.set_random_seed`
      for behavior.
    name: A name for this operation (optional).

  Returns:
    A Tensor of the same shape of `x`.
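The behaviour described in the docstring (keep each element with probability keep_prob and scale kept elements by 1 / keep_prob) can be sketched in NumPy. This is an illustration under that description, not the TensorFlow source; the rng argument is an assumption added to make the example reproducible.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale it by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
x = np.ones((1000, 10))
y = dropout(x, 0.7, rng)

# Kept entries become 1 / 0.7, dropped entries become 0, and the mean stays close to 1
print(np.unique(y))
print(y.mean())
```

This also shows why the program feeds keepratio 0.7 during training and 1. during evaluation: with keep_prob equal to 1 nothing is dropped and the input passes through unchanged.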

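7. Convolution layer 2 and pooling layer 2

        # CONV LAYER 2
        _conv2 = tf.nn.conv2d(_pool_dr1, _w['wc2'], strides=[1, 1, 1, 1], padding='SAME')
        # Add the bias, then apply the activation function
        _conv2 = tf.nn.relu(tf.nn.bias_add(_conv2, _b['bc2']))
        # Pooling layer 2
        _pool2 = tf.nn.max_pool(_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        #dropout
        _pool_dr2 = tf.nn.dropout(_pool2, _keepratio)

Convolution layer 2 uses filter 3*3*64*128 to extract features from the 14*14*64 input, producing 14*14*128, i.e. 128 feature maps

The bias is added to the convolution result, which then passes through the relu activation function for the nonlinear mapping

Pooling layer 2 downsamples the data, turning the 14*14*128 input into 7*7*128

Dropout is applied to the output of pooling layer 2 to reduce overfitting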

8. Fully connected layers

Fully connected layer 1 has 1024 neurons. The output of pooling layer 2 is 7*7*128, so the weights of this layer are W:[7*7*128,1024] and b:[1024]

'wd1': tf.Variable(tf.random_normal([7*7*128, 1024], stddev=0.1)),
 'bd1': tf.Variable(tf.random_normal([1024], stddev=0.1)),
 # FULLY CONNECTED LAYER 1
        _fc1 = tf.nn.relu(tf.add(tf.matmul(_dense1, _w['wd1']), _b['bd1']))

The matmul function multiplies two matrices:

def matmul(a,
           b,
           transpose_a=False,
           transpose_b=False,
           adjoint_a=False,
           adjoint_b=False,
           a_is_sparse=False,
           b_is_sparse=False,
           name=None):
  """Multiplies matrix `a` by matrix `b`, producing `a` * `b`.

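Fully connected layer 2 maps the 1024-dimensional output of fully connected layer 1 to a 10-dimensional vector, which is the final recognition result. Therefore W:[1024,10] and b:[10]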

'wd2': tf.Variable(tf.random_normal([1024, n_output], stddev=0.1))
'bd2': tf.Variable(tf.random_normal([n_output], stddev=0.1))
        # FULLY CONNECTED LAYER 2
        _out = tf.add(tf.matmul(_fc_dr1, _w['wd2']), _b['bd2'])
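The shape bookkeeping of the two fully connected layers can be checked with a NumPy sketch. The random inputs and weights below are placeholders (assumptions for illustration only) and dropout is omitted; only the shapes mirror the program.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the output of pooling layer 2 for a batch of 2 images
pool2 = rng.standard_normal((2, 7, 7, 128))

# Placeholder weights with the same shapes as wd1/bd1 and wd2/bd2
w1 = rng.standard_normal((7 * 7 * 128, 1024)) * 0.1
b1 = np.zeros(1024)
w2 = rng.standard_normal((1024, 10)) * 0.1
b2 = np.zeros(10)

# Flatten each image to a 6272-dimensional row vector
flat = pool2.reshape(-1, 7 * 7 * 128)
# Fully connected layer 1 followed by relu
fc1 = np.maximum(flat @ w1 + b1, 0.0)
# Fully connected layer 2 produces one score per class
logits = fc1 @ w2 + b2

print(flat.shape, fc1.shape, logits.shape)
```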

9. Loss function

_pred = conv_basic(x, weights, biases, keepratio)['out']
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))
optm = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
_corr = tf.equal(tf.argmax(_pred,1), tf.argmax(y,1)) 
accr = tf.reduce_mean(tf.cast(_corr, tf.float32)) 

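conv_basic returns the outputs of each layer of the CNN in order; out is the final 10-dimensional class vector, and y is the true label of the input image

batch_xs, batch_ys = mnist.train.next_batch(batch_size)

The function softmax_cross_entropy_with_logits computes the loss between the prediction and the true value: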

def softmax_cross_entropy_with_logits(
    _sentinel=None,  # pylint: disable=invalid-name
    labels=None,
    logits=None,
    dim=-1,
    name=None):

When calling it, the arguments must be passed by name (logits and labels):

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=_pred, labels=y))

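softmax_cross_entropy_with_logits computes the cross entropy; cross entropy measures the difference between two probability distributions. In this program the logits argument is _pred, the raw 10-dimensional output of the last layer; the function applies softmax to it internally before computing the cross entropy against y.

For a detailed explanation of cross entropy, see: https://blog.csdn.net/yhily2008/article/details/80262321

In brief:

Probability of an event: the probability describes the ratio of the number of times event A occurs to the number of times all events occur

Odds of an event: the odds are the ratio of the probability that the event occurs to the probability that it does not occur

odds = P / (1 - P)

The logit is the logit transform, the logarithm of the odds:

logit(P) = log(P / (1 - P))

[Figure not reproduced: graph of the logit function]

Near P=0 or P=1 the logit is very sensitive (its value changes very quickly). Under the logit transform, as P goes from 0 to 1 the logit goes from -∞ to +∞. Because the range of the logit is unbounded, regression fitting becomes easier.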
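The logit transform and the softmax cross entropy can be sketched in plain Python for a single example. This is an illustrative sketch and not the TensorFlow implementation; the function names are hypothetical.

```python
import math


def logit(p):
    """Log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(scores, one_hot_label):
    """Cross entropy between a one-hot label and softmax(scores)."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


label = [0.0, 1.0] + [0.0] * 8
# All-equal scores give a uniform prediction, so the loss is log(10)
print(softmax_cross_entropy([0.0] * 10, label))
# A confident, correct prediction gives a loss close to 0
print(softmax_cross_entropy([0.0, 20.0] + [0.0] * 8, label))
print(logit(0.5), logit(0.99), logit(0.01))
```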

                                                                          

10. Training and testing

for epoch in range(training_epochs):
    avg_cost = 0.
    # Number of batches needed to go through the training set once
    total_batch = int(mnist.train.num_examples / batch_size)
    #total_batch = 1
    # Loop over all batches
    for i in range(total_batch):
        # Fetch the next batch
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optm, feed_dict={x: batch_xs, y: batch_ys, keepratio:0.7})
        # Compute average loss
        avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.}) / total_batch

    # Display logs per epoch step
    if epoch % display_step == 0: 
        print ("Epoch: %03d/%03d cost: %.9f" % (epoch, training_epochs, avg_cost))
        train_acc = sess.run(accr, feed_dict={x: batch_xs, y: batch_ys, keepratio:1.})
        print (" Training accuracy: %.3f" % (train_acc))
        test_acc = sess.run(accr, feed_dict={x: testimg, y: testlabel, keepratio:1.})
        print (" Test accuracy: %.3f" % (test_acc))

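For each training epoch:

      The number of iterations is computed from the batch size. The dataset is too large to feed into the network all at once, so it is processed one batch at a time;

     For each batch, one optimizer step runs the forward pass, computes the loss and back-propagates to update the parameters, and the batch loss is added to the epoch average. At the end of each epoch, the accuracy is computed on the last training batch and on the full test set.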
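The batch count and the argmax-based accuracy can be sketched in plain Python. The figure of 55,000 training images is an assumption about the default split returned by read_data_sets, and the helper names are hypothetical.

```python
def num_batches(num_examples, batch_size):
    """Whole batches per epoch; a final partial batch is not counted."""
    return num_examples // batch_size


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows whose largest score sits at the one-hot label position."""
    correct = sum(argmax(p) == argmax(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


# Assumed training-set size of 55,000 with the batch size used in this post
print(num_batches(55000, 16))

preds = [[0.1, 0.9], [0.8, 0.2]]
truth = [[0.0, 1.0], [0.0, 1.0]]
print(accuracy(preds, truth))
```

Because the training accuracy in the program is measured on a single batch of 16 images, it moves in steps of 1/16 and is much noisier than the test accuracy.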

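Observations from running the program:

Take the first image and its label:

[Figure not reproduced: the first image]

Label data: [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]], so this image is the digit 1.

The shape of each layer's output during training: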

_input_r Tensor("Reshape:0", shape=(?, 28, 28, 1), dtype=float32)
_conv1 Tensor("Conv2D:0", shape=(?, 28, 28, 64), dtype=float32)
relu_conv1 Tensor("Relu:0", shape=(?, 28, 28, 64), dtype=float32)
pool1 Tensor("MaxPool:0", shape=(?, 14, 14, 64), dtype=float32)
_pool_dr1 Tensor("dropout/mul:0", shape=(?, 14, 14, 64), dtype=float32)
_conv2 Tensor("Conv2D_1:0", shape=(?, 14, 14, 128), dtype=float32)
relu_conv2 Tensor("Relu_1:0", shape=(?, 14, 14, 128), dtype=float32)
_pool2 Tensor("MaxPool_1:0", shape=(?, 7, 7, 128), dtype=float32)
_pool_dr2 Tensor("dropout_1/mul:0", shape=(?, 7, 7, 128), dtype=float32)
_dense1 Tensor("Reshape_1:0", shape=(?, 6272), dtype=float32)
_fc1 Tensor("Relu_2:0", shape=(?, 1024), dtype=float32)
_fc_dr1 Tensor("dropout_2/mul:0", shape=(?, 1024), dtype=float32)
_out Tensor("Add_1:0", shape=(?, 10), dtype=float32)

 


References

Tang Yudi's deep learning course materials

https://blog.csdn.net/yhily2008/article/details/80262321
