MNIST Handwritten Digit Recognition, Continued: Model Refactoring and Model Saving

MYH永恒

Article link: https://blog.csdn.net/mu_yongheng/article/details/114944994

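After studying for a while, today I have finally worked through every MNIST handwritten digit recognition topic in the course: from a single neuron at the start, to a neural network with one hidden layer, then networks with several hidden layers, and now the topics I want to summarize today, namely refactoring the model and saving and reusing it.

I started out having only heard of deep learning, and by now I have typed out some simple machine learning code by following a MOOC. After these few days I feel I only roughly understand the workflow and cannot yet implement it on my own. After all, I have only been following along with the course code, so the results may not be very visible yet. Still, I think that when learning something new, taking the first step is already real progress; with steady study and review, my understanding should gradually deepen.

I have actually had a lot on my mind lately. I am in my third year of university and facing many choices, such as whether to work or go to graduate school, along with other things in life and study, and it has been rather stressful. Never mind, let me finish this article first. When I have time I may write down this period of confusion as a record of growing up. Without further ado, on to today's topic.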

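1. Model Refactoring

First, what does refactoring the model mean? Here it simply means consolidating the repeated parts of the model so that the code is easier to write and is not duplicated. In the previous article I wrote a neural network with two hidden layers to recognize handwritten digits. The model-building code was as follows: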

# Build the hidden layers
H1_NN = 256   # number of neurons in hidden layer 1
H2_NN = 64    # number of neurons in hidden layer 2

# Input layer -> hidden layer 1: weights and bias
w1 = tf.Variable(tf.truncated_normal([784, H1_NN], stddev=0.1))
b1 = tf.Variable(tf.zeros([H1_NN]))

# Hidden layer 1 -> hidden layer 2: weights and bias
w2 = tf.Variable(tf.truncated_normal([H1_NN, H2_NN], stddev=0.1))
b2 = tf.Variable(tf.zeros([H2_NN]))

# Hidden layer 2 -> output layer: weights and bias
w3 = tf.Variable(tf.truncated_normal([H2_NN, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

# Output of hidden layer 1
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

# Output of hidden layer 2
y2 = tf.nn.relu(tf.matmul(y1, w2) + b2)

# Output layer: logits, then softmax probabilities
forward = tf.matmul(y2, w3) + b3
pred = tf.nn.softmax(forward)

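As the code above shows, the code for each layer is almost identical; only the input and output dimensions differ. With just two hidden layers the repetition is not very noticeable, but a network with 10 or 20 hidden layers written this way would be tedious and repetitive. The natural question is whether this step can be wrapped in a function that we simply call whenever a layer is needed. That is what refactoring the model means here. We call this function the fully connected layer function and define it as follows: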

# Define the fully connected layer function
def fcn_layer(inputs,               # input tensor
              input_dim,            # number of input neurons
              output_dim,           # number of output neurons
              activation=None):     # activation function (None means linear)

    w = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
    b = tf.Variable(tf.zeros([output_dim]))

    xwb = tf.matmul(inputs, w) + b

    if activation is None:
        outputs = xwb
    else:
        outputs = activation(xwb)

    return outputs

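From now on we can call it directly when building the model, which removes a lot of repetition, as shown below (this version also adds a third hidden layer):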

# Build the hidden layers
H1_NN = 256   # number of neurons in hidden layer 1
H2_NN = 64    # number of neurons in hidden layer 2
H3_NN = 32    # number of neurons in hidden layer 3

# Input layer -> hidden layer 1 (builds hidden layer 1)
h1 = fcn_layer(inputs=x, input_dim=784, output_dim=H1_NN, activation=tf.nn.relu)

# Hidden layer 1 -> hidden layer 2 (builds hidden layer 2)
h2 = fcn_layer(inputs=h1, input_dim=H1_NN, output_dim=H2_NN, activation=tf.nn.relu)

# Hidden layer 2 -> hidden layer 3 (builds hidden layer 3)
h3 = fcn_layer(inputs=h2, input_dim=H2_NN, output_dim=H3_NN, activation=tf.nn.relu)

# Hidden layer 3 -> output layer (builds the output layer, no activation)
forward = fcn_layer(inputs=h3, input_dim=H3_NN, output_dim=10, activation=None)

pred = tf.nn.softmax(forward)
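
To see why the helper pays off as the network gets deeper, here is a minimal NumPy sketch (not the TensorFlow code above) that builds the whole stack from a list of layer sizes. The names `dense_layer`, `build_network` and `layer_sizes` are my own illustrative assumptions, and a plain normal initializer stands in for `tf.truncated_normal`.

```python
import numpy as np


def relu(z):
    """Element-wise ReLU."""
    return np.maximum(z, 0.0)


def dense_layer(inputs, input_dim, output_dim, rng, activation=None):
    """NumPy analogue of fcn_layer: returns the layer output and its (w, b)."""
    w = rng.normal(0.0, 0.1, size=(input_dim, output_dim))
    b = np.zeros(output_dim)
    xwb = inputs @ w + b
    outputs = xwb if activation is None else activation(xwb)
    return outputs, (w, b)


def build_network(x, layer_sizes, rng):
    """Chain one dense layer per consecutive pair of sizes.

    Hidden layers use ReLU; the last layer stays linear (logits).
    """
    params = []
    h = x
    last = len(layer_sizes) - 2
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        act = None if i == last else relu
        h, wb = dense_layer(h, n_in, n_out, rng, activation=act)
        params.append(wb)
    return h, params


rng = np.random.default_rng(0)
batch = rng.random((4, 784))            # four fake flattened 28x28 images
layer_sizes = [784, 256, 64, 32, 10]    # same layout as the network above
logits, params = build_network(batch, layer_sizes, rng)
print(logits.shape)                     # (4, 10)
```

Going from 3 hidden layers to 20 would then only mean making `layer_sizes` longer; the layer code itself would not be repeated.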

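2. Model Saving

Saving the model is both very practical and very necessary. Suppose our model is complex and takes hours or even days to train: if the power fails or something else goes wrong partway through, we would probably have to start over. Or suppose we have already trained a model and want to keep it for later use. In both cases we need to save the model.

Saving the model here really means saving the parameters learned during training. The goal of training is to tune the parameters to their best values, so a saved model is essentially a collection of parameters (weights).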
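
To make the idea that "a saved model is just its parameters" concrete, here is a small NumPy sketch, independent of TensorFlow. The toy 784-to-10 model, the `predict` helper and the file name `toy_model.npz` are hypothetical, and the weights are random rather than trained.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
# Stand-in for trained parameters of a tiny 784 -> 10 model (random here).
trained = {"w": rng.normal(0.0, 0.1, size=(784, 10)), "b": np.zeros(10)}


def predict(x, p):
    """Return the predicted class index for each row of x."""
    return np.argmax(x @ p["w"] + p["b"], axis=1)


x = rng.random((5, 784))
before = predict(x, trained)

# "Saving the model" = writing the parameter arrays to disk.
path = os.path.join(tempfile.mkdtemp(), "toy_model.npz")
np.savez(path, **trained)

# "Restoring the model" = reading the same arrays back.
with np.load(path) as data:
    restored = {name: data[name] for name in data.files}

after = predict(x, restored)
print(np.array_equal(before, after))   # True: same parameters, same predictions
```

The code that defines the computation (`predict` here, the network graph in TensorFlow) is not stored in the file, which is why the network has to be rebuilt before the parameters can be loaded back.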

In the TensorFlow program, only a little extra code is needed on top of what we already have.

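(1) Import the package and create the checkpoint directory

import os     # used to build paths and create the checkpoint directory

# Saving the model
# Save interval: write a checkpoint every save_step epochs
save_step = 5

# Create the directory that will hold the checkpoint files
ckpt_dir = "./ckpt_dir/"
if not os.path.exists(ckpt_dir):
    os.makedirs(ckpt_dir)

(2) Create the Saver() object

# Create the tf.train.Saver after all variables have been declared
saver = tf.train.Saver()

(3) Save the model

Save the model during training at the interval set by save_step.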

for epoch in range(train_epochs):
    for batch in range(total_batch):
        xs, ys = mnist.train.next_batch(batch_size)  # read one batch of data
        sess.run(optimizer, feed_dict={x: xs, y: ys})   # run one training step

    # After total_batch batches, compute loss and accuracy on the validation data
    loss, acc = sess.run([loss_function, accuracy], feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
    # Print progress information
    if (epoch + 1) % display_step == 0:
        print("Epoch:", epoch + 1, "Loss:", format(loss), "Accuracy:", format(acc))

    if (epoch+1) % save_step == 0:
        saver.save(sess, os.path.join(ckpt_dir, 'mnist_h256_model_{:06d}.ckpt'.format(epoch+1)))  # save a checkpoint
        print('mnist_h256_model_{:06d}.ckpt saved'.format(epoch+1))

# Save the final model once training has finished
saver.save(sess, os.path.join(ckpt_dir, 'mnist_h256_model.ckpt'))
print("Model saved!")

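After the run, the checkpoint directory holds only the five most recent checkpoints, because tf.train.Saver keeps the latest five by default (max_to_keep=5):
[Screenshot: contents of the checkpoint directory]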
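
Which files are left can be worked out with a small pure-Python simulation. This is a simplified model of the "keep the newest five" behavior, not the Saver's actual implementation; `simulate_saves` is a hypothetical helper, and the values 40 and 5 are the `train_epochs` and `save_step` used in this article.

```python
from collections import deque


def simulate_saves(train_epochs, save_step, max_to_keep=5):
    """Return the checkpoint names that would remain after training."""
    kept = deque(maxlen=max_to_keep)   # the oldest entry is dropped automatically
    for epoch in range(1, train_epochs + 1):
        if epoch % save_step == 0:
            kept.append("mnist_h256_model_{:06d}.ckpt".format(epoch))
    kept.append("mnist_h256_model.ckpt")   # the final save after the loop
    return list(kept)


for name in simulate_saves(train_epochs=40, save_step=5):
    print(name)
```

Under these assumptions the checkpoints from epochs 5 to 20 are gone by the end of the run. To keep more of them, the `max_to_keep` argument can be passed when creating `tf.train.Saver`.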

3. Complete Code (Model Saving)

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import matplotlib.pyplot as plt
import numpy as np
import os     # used to build paths and create the checkpoint directory


# Read the data files
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


# Define the fully connected layer function
def fcn_layer(inputs,               # input tensor
              input_dim,            # number of input neurons
              output_dim,           # number of output neurons
              activation=None):     # activation function (None means linear)

    w = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
    b = tf.Variable(tf.zeros([output_dim]))

    xwb = tf.matmul(inputs, w) + b

    if activation is None:
        outputs = xwb
    else:
        outputs = activation(xwb)

    return outputs


# Saving the model
# Save interval: write a checkpoint every save_step epochs
save_step = 5

# Create the directory that will hold the checkpoint files
ckpt_dir = "./ckpt_dir/"
if not os.path.exists(ckpt_dir):
    os.makedirs(ckpt_dir)


# Build the input layer
# Define placeholders for the input images (x) and the labels (y)
x = tf.placeholder(tf.float32, [None, 784], name="X")
y = tf.placeholder(tf.float32, [None, 10], name="Y")


# Build the hidden layers
H1_NN = 256   # number of neurons in hidden layer 1
H2_NN = 64    # number of neurons in hidden layer 2
H3_NN = 32    # number of neurons in hidden layer 3

# Input layer -> hidden layer 1 (builds hidden layer 1)
h1 = fcn_layer(inputs=x, input_dim=784, output_dim=H1_NN, activation=tf.nn.relu)

# Hidden layer 1 -> hidden layer 2 (builds hidden layer 2)
h2 = fcn_layer(inputs=h1, input_dim=H1_NN, output_dim=H2_NN, activation=tf.nn.relu)

# Hidden layer 2 -> hidden layer 3 (builds hidden layer 3)
h3 = fcn_layer(inputs=h2, input_dim=H2_NN, output_dim=H3_NN, activation=tf.nn.relu)

# Hidden layer 3 -> output layer (builds the output layer, no activation)
forward = fcn_layer(inputs=h3, input_dim=H3_NN, output_dim=10, activation=None)

pred = tf.nn.softmax(forward)


# Training hyperparameters
train_epochs = 40  # number of training epochs
batch_size = 50  # number of samples per training batch
total_batch = int(mnist.train.num_examples/batch_size)
display_step = 1  # print progress every display_step epochs
learning_rate = 0.01  # learning rate

# Define the loss function
# loss_function = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
loss_function = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=forward, labels=y))  # cross-entropy loss computed together with softmax, directly from the logits

# Define the optimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)


# Define the accuracy
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Accuracy: cast the booleans to floats and take the mean
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Create the tf.train.Saver after all variables have been declared
saver = tf.train.Saver()

# Train the model
# Record the time at which training starts
from time import time
startTime = time()

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(train_epochs):
    for batch in range(total_batch):
        xs, ys = mnist.train.next_batch(batch_size)  # read one batch of data
        sess.run(optimizer, feed_dict={x: xs, y: ys})   # run one training step

    # After total_batch batches, compute loss and accuracy on the validation data
    loss, acc = sess.run([loss_function, accuracy], feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
    # Print progress information
    if (epoch + 1) % display_step == 0:
        print("Epoch:", epoch + 1, "Loss:", format(loss), "Accuracy:", format(acc))

    if (epoch+1) % save_step == 0:
        saver.save(sess, os.path.join(ckpt_dir, 'mnist_h256_model_{:06d}.ckpt'.format(epoch+1)))  # save a checkpoint
        print('mnist_h256_model_{:06d}.ckpt saved'.format(epoch+1))

# Save the final model once training has finished
saver.save(sess, os.path.join(ckpt_dir, 'mnist_h256_model.ckpt'))
print("Model saved!")


# Show the total running time
duration = time() - startTime
print("Total training time (seconds):", duration)

Output:
[Screenshot: training log]
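
The listing keeps the hand-written cross-entropy as a comment and computes the loss from the logits with `tf.nn.softmax_cross_entropy_with_logits` instead. One practical reason for preferring the combined form is numerical stability, which the NumPy sketch below illustrates. The two functions are my own simplified stand-ins, not TensorFlow's implementation, and the example logits are made up.

```python
import numpy as np


def naive_cross_entropy(logits, labels):
    """Softmax first, then -sum(y * log(p)), like the commented-out line."""
    e = np.exp(logits)
    p = e / e.sum(axis=1, keepdims=True)
    return np.mean(-np.sum(labels * np.log(p), axis=1))


def stable_cross_entropy(logits, labels):
    """Same quantity computed from the logits with the log-sum-exp trick."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return np.mean(-np.sum(labels * log_p, axis=1))


labels = np.array([[0.0, 1.0, 0.0]])
small = np.array([[2.0, 1.0, 0.1]])       # ordinary logits
large = np.array([[1000.0, 0.0, 0.0]])    # extreme logits

with np.errstate(all="ignore"):           # silence overflow/invalid warnings
    naive_small = naive_cross_entropy(small, labels)
    naive_large = naive_cross_entropy(large, labels)
stable_small = stable_cross_entropy(small, labels)
stable_large = stable_cross_entropy(large, labels)

print(naive_small, stable_small)   # the two agree on ordinary logits
print(naive_large, stable_large)   # the naive value is not finite, the stable one is
```

With ordinary logits both versions give the same loss; with extreme logits the naive version overflows inside `exp`, while the version that works on the logits stays finite.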

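4. Model Reuse

Once the model has been saved, it can be used directly without retraining, which saves a lot of time.

Reading the data and building the model are unchanged from before; only the training step is replaced by loading the trained model. The key code is as follows: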

# -------------------------------- Restore the model ------------------------------------
# 1. Must be the directory where the checkpoint files are stored
ckpt_dir = "./ckpt_dir"

# 2. Load the model
saver = tf.train.Saver()    # create the saver

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Look up the most recent checkpoint recorded in ckpt_dir
ckpt = tf.train.get_checkpoint_state(ckpt_dir)

if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)   # load the parameters from the saved model
    print("Restore model from " + ckpt.model_checkpoint_path)

print("Accuracy: ", accuracy.eval(session=sess, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
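
Restoring works because the saver matches the values in the checkpoint to the variables in the current graph by name. Variables created without an explicit name get automatic ones (`Variable`, `Variable_1`, and so on), so the network has to be rebuilt with the same layers in the same order as in the training script. The pure-Python sketch below is a simplified picture of that matching, not TensorFlow's implementation; `restore`, the dictionaries and their string values are hypothetical stand-ins.

```python
def restore(graph_variables, checkpoint):
    """Copy values from checkpoint into graph_variables, matching by name."""
    missing = [name for name in graph_variables if name not in checkpoint]
    if missing:
        raise KeyError("not found in checkpoint: " + ", ".join(missing))
    for name in graph_variables:
        graph_variables[name] = checkpoint[name]
    return graph_variables


# Hypothetical checkpoint written by the training script (name -> value).
checkpoint = {"Variable": "w1 (trained)", "Variable_1": "b1 (trained)"}

# The same graph rebuilt in the same order: every name lines up.
same_graph = {"Variable": "w1 (random)", "Variable_1": "b1 (random)"}
print(restore(same_graph, checkpoint))

# A graph with an extra layer asks for a name the checkpoint does not contain.
bigger_graph = {"Variable": None, "Variable_1": None, "Variable_2": None}
try:
    restore(bigger_graph, checkpoint)
except KeyError as err:
    print("restore failed:", err)
```

Note also that in the code above, if no checkpoint is found the `if` branch is skipped silently and the accuracy is computed with freshly initialized weights, so a very low accuracy is a hint that nothing was restored.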

5. Complete Code (Model Reuse)

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data

# Read the data files
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


# Define the fully connected layer function
def fcn_layer(inputs,               # input tensor
              input_dim,            # number of input neurons
              output_dim,           # number of output neurons
              activation=None):     # activation function (None means linear)

    w = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
    b = tf.Variable(tf.zeros([output_dim]))

    xwb = tf.matmul(inputs, w) + b

    if activation is None:
        outputs = xwb
    else:
        outputs = activation(xwb)

    return outputs


# Build the input layer
# Define placeholders for the input images (x) and the labels (y)
x = tf.placeholder(tf.float32, [None, 784], name="X")
y = tf.placeholder(tf.float32, [None, 10], name="Y")

# Build the hidden layers
H1_NN = 256   # number of neurons in hidden layer 1
H2_NN = 64    # number of neurons in hidden layer 2
H3_NN = 32    # number of neurons in hidden layer 3

# Input layer -> hidden layer 1 (builds hidden layer 1)
h1 = fcn_layer(inputs=x, input_dim=784, output_dim=H1_NN, activation=tf.nn.relu)

# Hidden layer 1 -> hidden layer 2 (builds hidden layer 2)
h2 = fcn_layer(inputs=h1, input_dim=H1_NN, output_dim=H2_NN, activation=tf.nn.relu)

# Hidden layer 2 -> hidden layer 3 (builds hidden layer 3)
h3 = fcn_layer(inputs=h2, input_dim=H2_NN, output_dim=H3_NN, activation=tf.nn.relu)

# Hidden layer 3 -> output layer (builds the output layer, no activation)
forward = fcn_layer(inputs=h3, input_dim=H3_NN, output_dim=10, activation=None)

pred = tf.nn.softmax(forward)


# Define the accuracy
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Accuracy: cast the booleans to floats and take the mean
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# -------------------------------- Restore the model ------------------------------------
# 1. Must be the directory where the checkpoint files are stored
ckpt_dir = "./ckpt_dir"

# 2. Load the model
saver = tf.train.Saver()    # create the saver

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Look up the most recent checkpoint recorded in ckpt_dir
ckpt = tf.train.get_checkpoint_state(ckpt_dir)

if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)   # load the parameters from the saved model
    print("Restore model from " + ckpt.model_checkpoint_path)

print("Accuracy: ", accuracy.eval(session=sess, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

# Apply the model: predict the digit for every test image
prediction_result = sess.run(tf.argmax(pred, 1), feed_dict={x: mnist.test.images})
print(prediction_result[0:10])

The final output is shown below, which confirms that saving and reloading the model worked:
[Screenshot: program output]
