Getting Started with Deep Learning: MNIST Handwritten Digit Recognition with a Convolutional Neural Network

Latest recommended article published on 2024-07-22 13:42:36

YukinoSiro

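Category: Artificial Intelligence (AI). Column: Deep Learning Primer (based on TensorFlow). Tags: deep learning, MNIST, convolutional neural network, handwritten digit recognition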

Article link: https://blog.csdn.net/yukinoai/article/details/84421560

A very good introductory video on deep learning: https://www.bilibili.com/video/av15532370

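The MNIST (Modified National Institute of Standards and Technology) database is a large database of handwritten digits. It is commonly used to train image processing systems and is widely used for training and testing in machine learning. It was created by re-mixing samples from NIST's original datasets. Its creators felt that NIST's data was not well suited to machine learning experiments, because the training set was collected from American Census Bureau employees while the test set was collected from American high school students. In addition, the black-and-white NIST images were normalized to fit a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

The MNIST database contains 60,000 training images and 10,000 test images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other halves were taken from NIST's testing dataset. Many scientific papers have tried to achieve the lowest possible error rate; one paper, using a hierarchical system of convolutional neural networks, achieved an error rate of 0.23% on the MNIST database.

Kaggle also uses it for its entry-level machine learning competition, Digit Recognizer ("Learn computer vision fundamentals with the famous MNIST data"), so it can fairly be called the "hello world" dataset of computer vision.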

Loading the MNIST data

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

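Using a single-layer neural network

Placeholders

We start building the computation graph by creating nodes for the input images and the target output classes.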

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

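The input images x consist of a 2-D tensor of floating-point numbers. We give it the shape [None, 784], where 784 is the number of pixels in one flattened 28x28 image and None means the first dimension (the batch size) can be of any size. The target output y_ is also a 2-D tensor, in which each row is a one-hot 10-dimensional vector indicating which digit (0 to 9) the corresponding MNIST image shows.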
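
To make the one-hot label rows concrete, here is a minimal pure-Python sketch. The helper names `one_hot` and `from_one_hot` are made up for this illustration and are not part of TensorFlow; `read_data_sets(..., one_hot=True)` already returns labels in this form.

```python
def one_hot(label, num_classes=10):
    """Return a one-hot list for an integer class label (hypothetical helper)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def from_one_hot(vec):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(vec)), key=vec.__getitem__)


# A batch of three labels becomes a 3 x 10 matrix, matching the shape [None, 10] of y_.
labels = [3, 0, 9]
batch = [one_hot(d) for d in labels]
print(batch[0])
print([from_one_hot(row) for row in batch])
```

Each row contains a single 1.0 at the index of the digit, which is why `tf.argmax(y_, 1)` later recovers the true class.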

Variables

We now define the weights W and the biases b of the model.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

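We pass the initial value of each parameter in the call to tf.Variable. Here W and b are both initialized as tensors full of zeros. W is a 784x10 matrix (784 input features, 10 outputs) and b is a 10-dimensional vector (one entry per digit class).

Before variables can be used within a session, they must be initialized using that session. This step takes the initial values already specified (here, tensors of zeros) and assigns them to each variable, and it can be done for all variables at once. The session has to be created first; an InteractiveSession installs itself as the default session, which the later train_step.run and accuracy.eval calls rely on:

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())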

The model's predicted output (the unnormalized logits) is

y = tf.matmul(x,W) + b

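Training aims to make the value of the loss function as small as possible. The loss function here is the cross-entropy between the target and the softmax activation applied to the model's prediction.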

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

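Note that tf.nn.softmax_cross_entropy_with_logits internally applies softmax to the model's unnormalized predictions and sums the cross-entropy terms across all classes, while tf.reduce_mean takes the average of these sums over the batch.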
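
The computation described above can be sketched in plain Python. This is only an illustration of the math, not TensorFlow's implementation; the function names are chosen for this example, and the log-sum-exp shift is one common way to keep the calculation numerically stable.

```python
import math


def softmax(logits):
    """Softmax with the maximum subtracted first to avoid overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(labels, logits):
    """Per-example loss: -sum_k labels[k] * log(softmax(logits)[k])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))


def mean_loss(batch_labels, batch_logits):
    """Average of the per-example losses, the role tf.reduce_mean plays above."""
    losses = [softmax_cross_entropy(y, z)
              for y, z in zip(batch_labels, batch_logits)]
    return sum(losses) / len(losses)


# With all-zero weights every logit is 0, so each class gets probability 1/10
# and the loss equals log(10).
zero_logits = [0.0] * 10
label = [0.0] * 10
label[4] = 1.0
print(mean_loss([label], [zero_logits]), math.log(10))
```

This also explains why the loss of the zero-initialized model starts near 2.3 before any training step has run.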

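Training the model

Now that the model and the training loss are defined, we use gradient descent with a step size of 0.5 to minimize the cross-entropy.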

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

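Each time the train_step operation is run, it applies one gradient-descent update to the parameters, and the updated parameters are the starting point for the next step. Training the model therefore amounts to running train_step repeatedly.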

for _ in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

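Each training iteration loads 100 training examples. We then run the train_step operation, using feed_dict to replace the placeholder tensors x and y_ with those training examples.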
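
What one such update does can be sketched without TensorFlow. The following is a minimal illustration for a tiny softmax regression, assuming the mean cross-entropy loss used above; `sgd_step` and the 2-feature toy data are invented for this example. It relies on the standard result that the gradient of the loss with respect to the logits is softmax(logits) minus the one-hot label.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(W, b, xs, ys, lr):
    """Apply one gradient-descent update in place; return the mean loss before it."""
    n_in, n_out, n = len(W), len(b), len(xs)
    grad_W = [[0.0] * n_out for _ in range(n_in)]
    grad_b = [0.0] * n_out
    loss = 0.0
    for x, y in zip(xs, ys):
        logits = [sum(x[i] * W[i][k] for i in range(n_in)) + b[k]
                  for k in range(n_out)]
        p = softmax(logits)
        loss -= sum(y[k] * math.log(p[k]) for k in range(n_out) if y[k] > 0)
        for k in range(n_out):
            d = (p[k] - y[k]) / n          # gradient of the mean loss w.r.t. logit k
            grad_b[k] += d
            for i in range(n_in):
                grad_W[i][k] += x[i] * d
    for i in range(n_in):
        for k in range(n_out):
            W[i][k] -= lr * grad_W[i][k]
    for k in range(n_out):
        b[k] -= lr * grad_b[k]
    return loss / n


# Toy problem: 2 input features, 2 classes, zero-initialized parameters.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
losses = [sgd_step(W, b, xs, ys, lr=0.5) for _ in range(20)]
print(losses[0], losses[-1])
```

The loss starts at log(2) for this two-class toy problem and shrinks as the step is repeated, which is the same pattern the 1000-iteration loop above follows on MNIST.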

Accuracy

We can use tf.equal to check whether our predictions match the true labels.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

We then cast the booleans to floating-point numbers and take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], whose mean is 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
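
The same argmax-compare-average logic can be written in plain Python. This is an illustrative sketch with made-up scores for a 3-class problem; `argmax` and `accuracy_of` are helper names chosen for this example.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy_of(batch_scores, batch_labels):
    """Fraction of rows whose predicted class equals the labelled class."""
    correct = [argmax(s) == argmax(y)
               for s, y in zip(batch_scores, batch_labels)]
    return sum(correct) / len(correct)   # True counts as 1, False as 0


scores = [[2.0, 0.1, 0.3],   # predicts class 0
          [0.2, 0.1, 1.5],   # predicts class 2
          [0.0, 3.0, 0.5],   # predicts class 1
          [0.9, 0.2, 0.1]]   # predicts class 0
labels = [[1, 0, 0],
          [0, 1, 0],         # true class 1, so this row is wrong
          [0, 1, 0],
          [1, 0, 0]]
print(accuracy_of(scores, labels))
```

Three of the four rows match, reproducing the [True, False, True, True] case from the text.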

Finally, we evaluate the accuracy on the test data. This approach reaches an accuracy of roughly 92%.

print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

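Full code

# From the book "21个项目玩转深度学习"
# coding:utf-8
# Import TensorFlow.
# "import tensorflow as tf" is the conventional way to import TensorFlow.
import tensorflow as tf
# Import the MNIST tutorial module
from tensorflow.examples.tutorials.mnist import input_data
# As before, read in the MNIST data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Create x, a placeholder that stands for the images to be recognized
x = tf.placeholder(tf.float32, [None, 784])

# W is a parameter of the softmax model; it maps a 784-dimensional input to a 10-dimensional output
# In TensorFlow, model parameters are represented with tf.Variable
W = tf.Variable(tf.zeros([784, 10]))
# b is the other parameter of the softmax model, usually called the bias
b = tf.Variable(tf.zeros([10]))

# y = softmax(xW + b); y is the output of the model
y = tf.nn.softmax(tf.matmul(x, W) + b)

# y_ holds the true image labels and is also a placeholder
y_ = tf.placeholder(tf.float32, [None, 10])

# At this point we have two important Tensors: y and y_.
# y is the model output and y_ is the true label; remember that y_ is one-hot encoded
# Next we build the loss from y and y_

# Build the cross-entropy loss from y and y_
# Note: reduce_sum without an axis sums over the whole batch, so this is the summed
# (not averaged) batch loss, and the outer reduce_mean leaves that scalar unchanged
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y)))

# With the loss defined, optimize the model parameters (W and b) with stochastic gradient descent
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Create a Session. The optimization step train_step can only be run inside a Session.
sess = tf.InteractiveSession()
# All variables must be initialized (and their memory allocated) before running.
tf.global_variables_initializer().run()
print('start training...')

# Run 1000 steps of gradient descent
for _ in range(1000):
    # Take 100 training examples from mnist.train
    # batch_xs is image data of shape (100, 784); batch_ys holds the true labels, of shape (100, 10)
    # batch_xs and batch_ys correspond to the two placeholders x and y_
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Run train_step in the Session, passing in values for the placeholders
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Which predictions are correct
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
# Compute the prediction accuracy; both of these are Tensors
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Running a Tensor in the Session returns its value
# Here we obtain the accuracy of the final model
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))  # 0.9185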

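Building a multilayer convolutional neural network

An accuracy of 92% on MNIST is not very high. Next we use a convolutional neural network, which will reach an accuracy of about 99.2%.

Weight initialization

Building this model requires creating many weights and biases. Weights should generally be initialized with a small amount of noise, both to break symmetry and to prevent zero gradients. Because we are using ReLU neurons, it is also good practice to initialize them with a slightly positive bias to avoid "dead neurons".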

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
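
To show what "a small amount of noise" means here, the following pure-Python sketch mimics the sampling rule of tf.truncated_normal: values farther than two standard deviations from the mean are dropped and redrawn. The function `truncated_normal_samples` is written for this illustration only and is not TensorFlow's implementation.

```python
import random


def truncated_normal_samples(n, stddev=0.1, mean=0.0, rng=random):
    """Draw n values from N(mean, stddev^2), redrawing any value that lies
    more than 2 standard deviations from the mean."""
    samples = []
    while len(samples) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:
            samples.append(v)
    return samples


rng = random.Random(0)                      # seeded only to make the demo repeatable
weights = truncated_normal_samples(5 * 5 * 1 * 32, stddev=0.1, rng=rng)
biases = [0.1] * 32                         # slightly positive, as in bias_variable
print(min(weights), max(weights))
```

With stddev=0.1 every initial weight lies in [-0.2, 0.2], so no unit starts with an extreme weight, while the constant 0.1 bias keeps the ReLU units active at the start.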

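Convolution and pooling

The convolutions use a stride of 1 and are zero-padded so that the output has the same size as the input. Pooling is 2x2 max pooling. To keep the code clean, we wrap these operations in functions.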

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
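
For intuition, the two operations can be written out for a single-channel image stored as a list of lists. This is a simplified sketch under stated assumptions (one channel, no batch dimension, odd square kernel); `conv2d_same_2d` and `max_pool_2x2_2d` are names invented here. Like tf.nn.conv2d, the first function does not flip the kernel, so it is strictly a cross-correlation.

```python
def conv2d_same_2d(image, kernel):
    """Stride-1 cross-correlation with zero padding, so output shape == input shape."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:   # positions outside count as zero
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def max_pool_2x2_2d(image):
    """2x2 max pooling with stride 2; at odd edges only the existing cells are used."""
    h, w = len(image), len(image[0])
    return [[max(image[rr][cc]
                 for rr in range(r, min(r + 2, h))
                 for cc in range(c, min(c + 2, w)))
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]     # sums each 3x3 neighbourhood
summed = conv2d_same_2d(image, box)
pooled = max_pool_2x2_2d(image)
print(len(summed), len(summed[0]))          # spatial size is unchanged by the convolution
print(pooled)                               # each side is halved by the pooling
```

The convolution keeps the 4x4 size while the pooling halves each side, the same pattern that takes the 28x28 MNIST images to 14x14 and then 7x7.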

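First convolutional layer

The convolution computes 32 features for each 5x5 patch. Its weight tensor has the shape [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also a bias vector with one component for each output channel.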

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

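To apply the layer, we first reshape x into a 4-D tensor. The second and third dimensions correspond to the image height and width, and the last dimension corresponds to the number of color channels (1 here, since the images are grayscale).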

x_image = tf.reshape(x, [-1, 28, 28, 1])

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally apply max pooling. The max_pool_2x2 function reduces the image size to 14x14.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second convolutional layer

The second layer computes 64 features for each 5x5 patch.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

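Densely connected layer

Now that the image size has been reduced to 7x7, we add a fully connected layer with 1024 neurons so that the whole image can be processed at once. We reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.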

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
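
The 7x7 size and the 7 * 7 * 64 flattened length follow from the rule that, with 'SAME' padding, the spatial output size is ceil(input size / stride). A short sketch of that arithmetic (the helper name `same_out_size` is chosen for this example):

```python
import math


def same_out_size(size, stride):
    """Spatial output size of a conv or pool op under 'SAME' padding."""
    return math.ceil(size / stride)


size = 28
sizes = [size]
for _ in range(2):                      # two conv + pool stages
    size = same_out_size(size, 1)       # 5x5 convolution, stride 1: size unchanged
    size = same_out_size(size, 2)       # 2x2 max pooling, stride 2: size halved
    sizes.append(size)

flat_length = size * size * 64          # 64 feature maps after the second layer
print(sizes, flat_length)
```

This gives the sizes 28, 14 and 7, and a flattened length of 3136, which is the first dimension of W_fc1.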

Dropout

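To reduce overfitting, we apply dropout before the output layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This lets us turn dropout on during training and turn it off during testing. TensorFlow's tf.nn.dropout automatically handles scaling the neuron outputs in addition to masking them, so dropout works without any additional scaling.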

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
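
The masking and scaling mentioned above can be sketched as follows. This is a plain-Python illustration of the usual "inverted dropout" rule (keep each value with probability keep_prob and divide the survivors by keep_prob), not TensorFlow's code; `dropout_list` is a name made up for this example.

```python
import random


def dropout_list(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)                       # seeded only to make the demo repeatable
activations = [1.0] * 10000

train_out = dropout_list(activations, 0.5, rng)   # training: about half are zeroed
test_out = dropout_list(activations, 1.0, rng)    # testing: nothing is dropped

print(sum(train_out) / len(train_out))       # stays close to the original mean of 1.0
print(test_out == activations)
```

Because the surviving values are scaled up, the expected activation is the same with and without dropout, which is why feeding keep_prob: 1.0 at test time needs no further correction.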

Output layer

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
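
With all four layers defined, the number of trainable parameters follows directly from the weight and bias shapes used above. The tally below is simple arithmetic on those shapes; the layer labels are only for display.

```python
from math import prod

# (label, weight shape, bias shape) as passed to weight_variable / bias_variable
layers = [
    ("conv1", [5, 5, 1, 32], [32]),
    ("conv2", [5, 5, 32, 64], [64]),
    ("fc1", [7 * 7 * 64, 1024], [1024]),
    ("fc2", [1024, 10], [10]),
]

counts = {name: prod(w_shape) + prod(b_shape) for name, w_shape, b_shape in layers}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

The first fully connected layer accounts for about 98% of the parameters, which helps explain why dropout is applied to its output rather than to the convolutional layers.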

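Training and evaluating the model

The training and evaluation code is nearly identical to that of the single-layer network above. The differences are:

We replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.
We include the additional parameter keep_prob in feed_dict to control the dropout rate.
We add logging to every 100th iteration of the training process.

Using tf.Session rather than tf.InteractiveSession better separates the process of creating the graph (model specification) from the process of evaluating the graph (model fitting). It generally makes for cleaner code.

The tf.Session is created inside a with block, so it is automatically destroyed once the block is exited.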

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

The final test set accuracy after running this code should be approximately 99.2%.

Full code for the convolutional neural network

# coding: utf-8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


if __name__ == '__main__':
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # First convolutional layer
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    # Second convolutional layer
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer; the output is a 1024-dimensional vector
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    # Apply dropout; keep_prob is a placeholder, set to 0.5 for training and 1 for testing
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Map the 1024-dimensional vector to 10 dimensions, one per digit class
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2


    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    # Define train_step as before
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    # Define the accuracy used for evaluation
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Create the Session and initialize the variables
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())

    # Train for 20000 steps
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        # Every 100 steps, report the accuracy on the current training batch (dropout disabled)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    # After training, report the accuracy on the test set
    print("test accuracy %g" % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

References

https://blog.csdn.net/qq_17550379/article/details/78703026

21个项目玩转深度学习