Walkthrough of Morvan's (莫烦) RNN_classification example

吉均

Last modified 2023-06-27 16:07:01

Column: AI-原理 (AI principles). Tags: tensorflow, neural networks, deep learning, python

First published 2020-10-20 19:21:30

Original link: https://blog.csdn.net/m0_51246196/article/details/109170959

A few notes

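1. After studying for a while, I built RNN and LSTM models with TensorFlow. Drawing on Morvan's videos and source code, I am summarizing TensorFlow usage here for future reference. The example uses an RNN to classify the MNIST dataset.
2. On the TensorFlow version: Morvan's code targets TensorFlow 1.1, while I use TensorFlow 2.3, so a couple of lines at the top are needed to run the 1.x code through the compatibility API.
3. Morvan's RNN example uses the MNIST dataset loaded through TensorFlow's tutorial helper (input_data).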

Source code

First, here is Morvan's original RNN_classification code:

"""
Dependencies:
tensorflow: 1.1.0
matplotlib
numpy
"""
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt

tf.set_random_seed(1)
np.random.seed(1)

# Hyper Parameters
BATCH_SIZE = 64
TIME_STEP = 28          # rnn time step / image height
INPUT_SIZE = 28         # rnn input size / image width
LR = 0.01               # learning rate

# data
mnist = input_data.read_data_sets('./mnist', one_hot=True)              # they have been normalized to range [0, 1]
test_x = mnist.test.images[:2000]
test_y = mnist.test.labels[:2000]

# plot one example
print(mnist.train.images.shape)     # (55000, 28 * 28)
print(mnist.train.labels.shape)   # (55000, 10)
plt.imshow(mnist.train.images[0].reshape((28, 28)), cmap='gray')
plt.title('%i' % np.argmax(mnist.train.labels[0]))
plt.show()

# tensorflow placeholders
tf_x = tf.placeholder(tf.float32, [None, TIME_STEP * INPUT_SIZE])       # shape(batch, 784)
image = tf.reshape(tf_x, [-1, TIME_STEP, INPUT_SIZE])                   # (batch, height, width) = (batch, TIME_STEP, INPUT_SIZE)
tf_y = tf.placeholder(tf.int32, [None, 10])                             # input y

# RNN
rnn_cell = tf.nn.rnn_cell.LSTMCell(num_units=64)
outputs, (h_c, h_n) = tf.nn.dynamic_rnn(
    rnn_cell,                   # cell you have chosen
    image,                      # input
    initial_state=None,         # the initial hidden state
    dtype=tf.float32,           # must given if set initial_state = None
    time_major=False,           # False: (batch, time step, input); True: (time step, batch, input)
)
output = tf.layers.dense(outputs[:, -1, :], 10)              # output based on the last output step

loss = tf.losses.softmax_cross_entropy(onehot_labels=tf_y, logits=output)           # compute cost
train_op = tf.train.AdamOptimizer(LR).minimize(loss)

accuracy = tf.metrics.accuracy(          # return (acc, update_op), and create 2 local variables
    labels=tf.argmax(tf_y, axis=1), predictions=tf.argmax(output, axis=1),)[1]

sess = tf.Session()
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()) # the local var is for accuracy_op
sess.run(init_op)     # initialize var in graph

for step in range(1200):    # training
    b_x, b_y = mnist.train.next_batch(BATCH_SIZE)
    _, loss_ = sess.run([train_op, loss], {tf_x: b_x, tf_y: b_y})
    if step % 50 == 0:      # testing
        accuracy_ = sess.run(accuracy, {tf_x: test_x, tf_y: test_y})
        print('train loss: %.4f' % loss_, '| test accuracy: %.2f' % accuracy_)

# print 10 predictions from test data
test_output = sess.run(output, {tf_x: test_x[:10]})
pred_y = np.argmax(test_output, 1)
print(pred_y, 'prediction number')
print(np.argmax(test_y[:10], 1), 'real number')

Results

[Figure: screenshot of the results]

Step-by-step explanation

My environment:
Dependencies:
python3.8
tensorflow: 2.3.1

1. Import the required libraries (fixed random seeds make runs reproducible)

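import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # use the v1 compatibility module in TF 2.x so the 1.x code keeps working
from tensorflow.examples.tutorials.mnist import input_data  # MNIST: handwritten digits 0-9
# TensorFlow 1.x shipped a ready-made MNIST helper; input_data.* reads the images and labels directly.
# Note: TF 2.x pip packages no longer ship tensorflow.examples.tutorials, so if this import fails,
# a common workaround is to copy input_data.py from the TF 1.x source tree.

import numpy as np   # numpy: the basic library for numerical computing
import matplotlib.pyplot as plt  # matplotlib.pyplot: MATLAB-style plotting, handy for showing images and results

# When training on images, random.seed(), numpy.random.seed() and tf.set_random_seed() are the three random seeds involved.
# Fixing the seeds up front makes every run produce the same "random" results,
# e.g. the same data split and shuffling each time, which makes it easier to judge parameter changes and track down problems.
tf.set_random_seed(1)  # graph-level seed for TensorFlow random ops (an integer)
np.random.seed(1)  # seed for numpy random functions: an integer in [0, 2**32 - 1] or a 1-D integer array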
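To see what fixing a seed buys you, here is a small NumPy-only sketch (the helper shuffled_order is made up for illustration): re-seeding before drawing a permutation, the kind of shuffling a batching helper performs, gives the same order on every run, while a different seed gives a different order.

```python
import numpy as np

def shuffled_order(seed, n=10):
    # Re-seed NumPy's global generator, then draw a random ordering of n indices
    np.random.seed(seed)
    return np.random.permutation(n)

run1 = shuffled_order(1)
run2 = shuffled_order(1)   # same seed -> same "random" order
run3 = shuffled_order(2)   # different seed -> (almost surely) a different order

print(run1, run2, run3)
```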

2. Define the hyperparameters for the model

Hyperparameters are configuration external to the model; their values cannot be estimated from the data.

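# Hyper Parameters
# Deep-learning hyperparameters are set from experience and shape how the weights and biases are learned, e.g. number of iterations, number of hidden layers, neurons per layer, learning rate
BATCH_SIZE = 64  # number of samples used per training step: 64 images per batch
TIME_STEP = 28   # rnn time step: steps in each input sequence, i.e. the 28 pixel rows of the image / image height
INPUT_SIZE = 28  # rnn input size / image width: values fed per step, i.e. the 28 pixel columns of each row
LR = 0.01               # learning rate
# To pick an initial learning rate, start from a very small value (e.g. 1e-7) and multiply it by a constant each step (e.g. 1.05) while training.
# After a few hundred steps the loss-vs-step curve should look like a check mark; choose a learning rate from the segment where the loss drops fastest.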
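A minimal sketch of the learning-rate range test described above, assuming a toy one-weight linear regression (y = 3x) trained with plain gradient descent instead of the RNN; lr_range_test and its stop rule (loss above 4 times its starting value) are my own choices. The learning rate starts at 1e-7 and is multiplied by 1.05 each step, and the suggested rate is the one at which the loss fell fastest.

```python
import numpy as np

# Toy problem (assumption, for illustration only): fit y = 3x with a single weight w
rng = np.random.default_rng(0)
x = rng.normal(size=256)
y = 3.0 * x

def lr_range_test(lr_start=1e-7, factor=1.05, steps=400):
    """Train with a learning rate that grows geometrically each step; record (lr, loss)."""
    w, lr = 0.0, lr_start
    lrs, losses = [], []
    for _ in range(steps):
        pred = w * x
        loss = np.mean((pred - y) ** 2)          # mean squared error
        grad = np.mean(2.0 * (pred - y) * x)     # d(loss)/dw
        lrs.append(lr)
        losses.append(loss)
        if not np.isfinite(loss) or loss > 4 * losses[0]:
            break                                # loss exploded: stop the sweep
        w -= lr * grad
        lr *= factor
    return np.array(lrs), np.array(losses)

lrs, losses = lr_range_test()
drops = losses[:-1] - losses[1:]                 # decrease in loss produced by each step's lr
best_lr = lrs[np.argmax(drops)]                  # lr at which the loss fell fastest
print(f"steps run: {len(lrs)}, suggested lr ~ {best_lr:.2e}")
```

With the real model you would run the same sweep on the RNN and plot losses against lrs on a log axis to see the check-mark shape.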

3. Data preprocessing (the built-in MNIST data, reshaped)

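# data
mnist = input_data.read_data_sets('./mnist', one_hot=True)  # download (if needed) and load MNIST into ./mnist, with one-hot labels
# the pixel values have already been normalized to the range [0, 1]
test_x = mnist.test.images[:2000]  # use the first 2000 MNIST test images as the test set
test_y = mnist.test.labels[:2000]

# plot one example
# Each MNIST image is 28x28 pixels, so it is stored as one row of 784 (28x28) values, one value per pixel.
# The printed shape shows the training images form a 55000 x 784 matrix, i.e. the training set holds 55,000 images.
print(mnist.train.images.shape)     # (55000, 28 * 28) training images
print(mnist.train.labels.shape)   # (55000, 10) one-hot labels over 10 classes
plt.imshow(mnist.train.images[0].reshape((28, 28)), cmap='gray')   # reshape one row back to 28x28 and show it
plt.title('%i' % np.argmax(mnist.train.labels[0]))  # title = index of the 1 in the one-hot label
plt.show()  # the sample handwritten digit shown here is a 7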
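How the flattened rows and one-hot labels relate, as a NumPy sketch with random pixels standing in for mnist.train.images[0] (the data here is invented): reshape((28, 28)) restores the image for plotting, and np.argmax on the one-hot vector gives back the digit used in plt.title.

```python
import numpy as np

# Stand-in for one MNIST row: a flattened 28x28 image with values in [0, 1]
# (random pixels here; real data would come from mnist.train.images[0])
flat = np.random.default_rng(0).random(28 * 28)
label_one_hot = np.zeros(10)
label_one_hot[7] = 1.0                  # one-hot encoding of the digit 7

img = flat.reshape((28, 28))            # back to height x width for plt.imshow
digit = int(np.argmax(label_one_hot))   # index of the 1 -> the class label

print(flat.shape, img.shape, digit)     # (784,) (28, 28) 7
```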

4. TensorFlow placeholders (defined from the shape of the data that will be fed in)

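# input layer
# tensorflow placeholders
# Define the placeholders tf_x and tf_y (image is derived from tf_x). A placeholder is like a formal parameter: its value must be fed in by the user at run time.
tf_x = tf.placeholder(tf.float32, [None, TIME_STEP * INPUT_SIZE])       # shape(batch, 784)
# dtype: required; the element type, given as a TensorFlow dtype (tf.float32, tf.float64, ...)
# shape: optional; if omitted, any shape can be fed and different calls can feed different shapes. None in a dimension means "any size", here the batch size
# The batch size is a hyperparameter: the number of samples processed before the model's internal parameters are updated. Think of a batch as looping over one or more samples and making predictions.
image = tf.reshape(tf_x, [-1, TIME_STEP, INPUT_SIZE])  # reshape the flat 784 values into the layout the RNN expects: (batch, height 28, width 28); -1 infers the batch size
# batch_size is the number of samples in the batch being fed.
tf_y = tf.placeholder(tf.int32, [None, 10])                             # input y: one-hot labels, shape (batch, 10)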
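The reshape in the image line can be checked with NumPy, whose reshape follows the same row-major order as tf.reshape. In this sketch, with a dummy batch of 3 flattened images, -1 infers the batch size, and time step t of each image is simply pixel row t, which is what the RNN reads at step t.

```python
import numpy as np

TIME_STEP, INPUT_SIZE = 28, 28
# Dummy batch standing in for a fed tf_x: 3 flattened images, shape (3, 784)
batch = np.arange(3 * TIME_STEP * INPUT_SIZE, dtype=np.float32).reshape(3, -1)

# Same layout as tf.reshape(tf_x, [-1, TIME_STEP, INPUT_SIZE]); -1 lets the batch size be inferred
seq = batch.reshape(-1, TIME_STEP, INPUT_SIZE)   # (batch, time step, input) = (3, 28, 28)

# Time step t of image 0 is pixel row t of that image: 28 values fed to the RNN at step t
t = 5
row_t = seq[0, t]
print(seq.shape, row_t.shape)
```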

5. Define the network (RNN, output layer, training op, loss and accuracy)

rnn_cell: the LSTM cell; num_units=64 sets the number of hidden units (neurons) in the cell.

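outputs, (h_c, h_n): for an LSTM the final state has two parts. h_c is the cell state, which carries information from the past (the "main storyline" in Morvan's analogy); h_n is the hidden state, the cell's current output that would be passed on to the next step (the "side storyline"). h_n is therefore the same as the last time step of outputs.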
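To make the roles of outputs, h_c and h_n concrete, here is a NumPy sketch of a basic LSTM run over 28 steps, using the standard gate equations with random weights (lstm_forward and its weight layout are illustrative assumptions; TensorFlow's LSTMCell differs in details such as its kernel layout and a default forget-gate bias of 1.0). It confirms that the last time step of outputs equals the final hidden state h_n.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, b, num_units):
    """Run a basic LSTM over x of shape (batch, time, input).

    W: (input + num_units, 4 * num_units) and b: (4 * num_units,) are random
    stand-in weights, not a trained TensorFlow kernel.
    Returns all hidden outputs plus the final (c, h) state.
    """
    batch, steps, _ = x.shape
    c = np.zeros((batch, num_units))   # cell state: the long-term "main line"
    h = np.zeros((batch, num_units))   # hidden state: the per-step output
    outputs = []
    for t in range(steps):
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, g, f, o = np.split(z, 4, axis=1)   # input gate, candidate, forget gate, output gate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), (c, h)

rng = np.random.default_rng(1)
NUM_UNITS = 64
x = rng.random((2, 28, 28))                       # 2 images, 28 steps, 28 inputs per step
W = rng.normal(scale=0.1, size=(28 + NUM_UNITS, 4 * NUM_UNITS))
b = np.zeros(4 * NUM_UNITS)

outputs, (h_c, h_n) = lstm_forward(x, W, b, NUM_UNITS)
print(outputs.shape)                              # (2, 28, 64)
print(np.allclose(outputs[:, -1, :], h_n))        # True
```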

output = tf.layers.dense(outputs[:, -1, :], 10)

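The fully connected layer tf.layers.dense(inputs, units=k) creates a weight matrix kernel and a bias vector bias internally. For a 2-D input of shape [m, n], it creates a kernel of shape [n, k] and a bias of shape [k], which is broadcast across the m rows. It computes y = inputs * kernel + bias (a matrix product, followed by an activation only if one is passed; none is passed here), so the output y has shape [m, k].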
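A NumPy sketch of the shapes involved in the dense layer, using the sizes from this model (m = 64 images, n = 64 LSTM units, k = 10 classes) and random values in place of learned weights:

```python
import numpy as np

m, n, k = 64, 64, 10                 # batch size, LSTM units (input features), classes
rng = np.random.default_rng(0)
inputs = rng.normal(size=(m, n))     # stands in for outputs[:, -1, :]
kernel = rng.normal(size=(n, k))     # weight matrix the dense layer creates
bias = np.zeros(k)                   # one bias per output unit, shape (k,)

y = inputs @ kernel + bias           # bias broadcasts across the m rows
print(y.shape)                       # (64, 10)
```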

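loss = tf.losses.softmax_cross_entropy(onehot_labels=tf_y, logits=output): the loss function. onehot_labels are the one-hot encoded labels, shape [batch_size, num_classes]; logits are the raw (pre-softmax) network outputs, also shape [batch_size, num_classes]. With the default weights, the per-example losses are averaged over the batch.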
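A NumPy sketch of the computation, assuming unit weights so that, like the TensorFlow loss with its default weights, the per-example losses are averaged over the batch; softmax_cross_entropy here is my own helper, not the TF function. It applies softmax to the raw logits internally, which is why the network should output logits rather than probabilities.

```python
import numpy as np

def softmax_cross_entropy(onehot_labels, logits):
    """Mean softmax cross-entropy over the batch, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(onehot_labels * log_probs).sum(axis=1)
    return per_example.mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(softmax_cross_entropy(labels, logits))
```

For the second row (all logits equal) the loss is ln 3, since softmax gives each of the 3 classes probability 1/3.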

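train_op = tf.train.AdamOptimizer(LR).minimize(loss): creates an Adam optimizer with learning rate LR; minimize() returns an op that, each time it is run, computes the gradients of loss and updates all trainable variables.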
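Roughly what each run of train_op does to a parameter, as a textbook Adam update in NumPy on a toy quadratic (adam_minimize and the objective are illustrative assumptions; beta1=0.9, beta2=0.999 and eps=1e-8 match the defaults of tf.train.AdamOptimizer, though TensorFlow's own implementation places eps slightly differently):

```python
import numpy as np

def adam_minimize(grad_fn, theta, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Plain Adam: running averages of the gradient (m) and of the squared gradient (v)."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g          # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g ** 2     # second moment (squared gradient)
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter step, typically on the order of lr
    return theta

# Toy quadratic (assumption): f(theta) = sum((theta - target)^2), gradient 2 * (theta - target)
target = np.array([1.0, -2.0])
theta = adam_minimize(lambda th: 2 * (th - target), np.zeros(2), lr=0.01, steps=2000)
print(theta)
```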

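accuracy = tf.metrics.accuracy(...): computes how often predictions match labels. It returns two values, (accuracy, update_op): accuracy reads the running accuracy accumulated so far, and update_op adds the current batch to the running totals and returns the updated accuracy. The code keeps update_op ([1]). Because the totals are never reset, the value printed during training is the accuracy over all test evaluations so far, not just the latest one.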
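The running-total behaviour of tf.metrics.accuracy can be mimicked in a few lines of NumPy. StreamingAccuracy below is a hypothetical helper that mirrors the idea of the metric's two local variables (total and count); it shows why a second, worse batch only pulls the reported accuracy down to the average over both batches:

```python
import numpy as np

class StreamingAccuracy:
    """Hypothetical stand-in for the idea behind tf.metrics.accuracy: two running totals."""
    def __init__(self):
        self.total = 0.0   # correct predictions seen so far
        self.count = 0.0   # predictions seen so far

    def update(self, labels, predictions):
        self.total += float(np.sum(labels == predictions))
        self.count += float(labels.size)
        return self.total / self.count   # accuracy over *all* updates so far, not just this batch

acc = StreamingAccuracy()
first = acc.update(np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4]))   # 4/4 correct
second = acc.update(np.array([1, 2, 3, 4]), np.array([0, 0, 0, 0]))  # 0/4 correct this time
print(first, second)   # 1.0 0.5
```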

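# RNN: LSTM (long short-term memory) network
# hidden layer
rnn_cell = tf.nn.rnn_cell.LSTMCell(num_units=64)  # create one LSTM cell with 64 hidden units
# num_units: int, the number of units in the LSTM cell, i.e. the number of hidden nodes.
# For a training batch, outputs has shape (64, 28, 64): (images, time steps, hidden units); image has shape (64, 28, 28): (images, time steps, inputs per step)
outputs, (h_c, h_n) = tf.nn.dynamic_rnn(
    # dynamic_rnn returns two things: outputs and the final state; for an LSTM the state has two parts, the main line (c) and the side line (h)
    rnn_cell,                   # cell you have chosen: the 64-unit LSTM cell above
    image,                      # input: (batch, 28 time steps, 28 inputs per step)
    initial_state=None,         # the initial hidden state (None: start from zeros)
    dtype=tf.float32,           # must be given if initial_state is None
    time_major=False,           # False: (batch, time step, input); True: (time step, batch, input)
    # time steps are the second dimension of image, not the first, so time_major=False
)
# Finally add a fully connected layer on the last time step of outputs, outputs[:, -1, :]; h_n gives the same result here
output = tf.layers.dense(outputs[:, -1, :], 10)  # output based on the last output step
# h_c is the cell state, carrying past information; h_n is the hidden state, the current output that would be passed to the next step
# outputs[:, -1, :] (the last time step) == state[1] == h_n
# The dense layer has no activation, so it outputs raw logits. Softmax is applied inside the loss below: it turns the scores
# into class probabilities that sum to 1, pushing the most likely class toward 1 and the others toward 0.
# Softmax is used for multi-class classification where the classes are mutually exclusive (each sample belongs to exactly one class).

loss = tf.losses.softmax_cross_entropy(onehot_labels=tf_y, logits=output)  # compute cost
# logits are the network outputs and must be the values before softmax,
# because tf.losses.softmax_cross_entropy() applies softmax to the logits internally

train_op = tf.train.AdamOptimizer(LR).minimize(loss)  # LR (learning rate) = 0.01
# Adam is a gradient-based optimizer; like other such methods it finds a (local) minimum, not a guaranteed global optimum.
# It keeps running averages of each parameter's gradient (first moment) and squared gradient (second moment).
# Each parameter's step is LR times the averaged gradient divided by the root of the averaged squared gradient,
# so the step size adapts per parameter and is typically on the order of LR.

accuracy = tf.metrics.accuracy(          # return (acc, update_op), and create 2 local variables
    labels=tf.argmax(tf_y, axis=1), predictions=tf.argmax(output, axis=1),)[1]
# tf.argmax returns the index of the largest value: for tf_y that is the true digit, for output the predicted digit.
# The metric creates two local variables (total and count); [1] picks update_op, which updates them and returns the accuracy so far.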

tf.nn.dynamic_rnn():

Function signature:
tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

cell: the memory cell, e.g. an LSTM or GRU cell. For example, cell = tf.nn.rnn_cell.LSTMCell(num_units), where num_units is the number of units in the RNN cell, exposed as cell.output_size below. The constructed LSTM or GRU cell is passed in as this argument.

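inputs: the training or test data, usually shaped [batch_size, max_time, embed_size] (when time_major=False), where batch_size is the number of sequences in the batch, max_time is the length of the longest sequence in the batch, and embed_size is the size of each step's input vector (e.g. the word-embedding dimension in NLP; here, the 28 pixels of an image row).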

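sequence_length: an optional int vector of length batch_size giving each sequence's true length. For example, if you feed three sentences of lengths 5, 10 and 25, then sequence_length=[5, 10, 25]; steps past a sequence's length are not computed and their outputs are zeros.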
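A NumPy sketch of why sequence_length matters when picking the last output, assuming a padded batch with the three example lengths (random values stand in for dynamic_rnn's outputs). With variable lengths, outputs[:, -1, :] is the true last output only for the full-length sequence; for the others, index length - 1 must be used. In this MNIST example every image has all 28 steps, so outputs[:, -1, :] is fine.

```python
import numpy as np

# Padded batch of 3 sequences with true lengths 5, 10 and 25 (max_time = 25)
lengths = np.array([5, 10, 25])
batch, max_time, units = 3, 25, 4
outputs = np.random.default_rng(0).normal(size=(batch, max_time, units))

# The meaningful "last output" of each sequence sits at index length - 1
last_valid = outputs[np.arange(batch), lengths - 1]   # (batch, units)
naive_last = outputs[:, -1, :]                         # only correct for the full-length sequence
print(last_valid.shape)
```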

time_major: determines the layout of the input and output tensors. If True, they have shape [max_time, batch_size, depth]; if False (the default), [batch_size, max_time, depth]. For outputs, depth is cell.output_size, the number of units in the RNN cell.
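Switching between the two layouts is a transpose of the first two axes, sketched here in NumPy with this model's sizes (64 images, 28 steps, 28 inputs per step):

```python
import numpy as np

batch_major = np.random.default_rng(0).random((64, 28, 28))   # time_major=False: (batch, max_time, depth)
time_major = np.transpose(batch_major, (1, 0, 2))              # time_major=True:  (max_time, batch, depth)

# Step t of example i holds the same values in both layouts
t, i = 5, 3
same = np.array_equal(batch_major[i, t], time_major[t, i])
print(time_major.shape, same)   # (28, 64, 28) True
```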

Return value: a tuple (outputs, state).

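outputs: straightforward; it holds the cell's output at every time step, shape [batch_size, max_time, cell.output_size] when time_major=False.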

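state: the final state, i.e. the state after the last time step of the sequence. For a plain RNN or GRU cell it has shape [batch_size, cell.state_size]. For an LSTM cell (BasicLSTMCell, or LSTMCell as used here, with the default state_is_tuple=True) it is an LSTMStateTuple (c, h), each of shape [batch_size, num_units], where c is the cell state and h the hidden state; this is why the code unpacks it as (h_c, h_n).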

6. Initialization

# initialization
sess = tf.Session()  # a Session runs the graph: it controls execution and returns the values of the ops you ask for
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
# the local variables are the ones created by tf.metrics.accuracy
sess.run(init_op)     # initialize the variables in the graph

7. Run training and show the results

1. Choose the number of training steps

2. Run the ops with sess.run

# session.run() returns the results of the ops you ask for, or runs the part of the graph you want executed (here train_op, loss and accuracy)
for step in range(1200):    # training
    b_x, b_y = mnist.train.next_batch(BATCH_SIZE)  # next_batch keeps returning new training data: BATCH_SIZE images b_x and their labels b_y
    _, loss_ = sess.run([train_op, loss], {tf_x: b_x, tf_y: b_y})  # run train_op and loss, feeding b_x into tf_x and b_y into tf_y; keep the loss value
    if step % 50 == 0:      # testing: report every 50 steps
        accuracy_ = sess.run(accuracy, {tf_x: test_x, tf_y: test_y})   # accuracy on the test data

        print('train loss: %.4f' % loss_, '| test accuracy: %.2f' % accuracy_)


# print 10 predictions from the test data
test_output = sess.run(output, {tf_x: test_x[:10]})
pred_y = np.argmax(test_output, 1)
print(pred_y, 'prediction number')
print(np.argmax(test_y[:10], 1), 'real number')
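The training loop runs 1200 steps × 64 images = 76,800 images, roughly 1.4 passes over the 55,000 training images, so next_batch has to wrap around into a second epoch. A hypothetical NumPy stand-in for that behaviour (MiniDataSet is my own sketch, not TensorFlow's DataSet class; it simply drops the leftover partial batch at the end of an epoch, whereas TensorFlow's helper, as far as I know, fills the batch from the next epoch):

```python
import numpy as np

class MiniDataSet:
    """Hypothetical stand-in for mnist.train: serves shuffled mini-batches, reshuffling each epoch."""
    def __init__(self, images, labels, seed=1):
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(len(images))
        self.pos = 0

    def next_batch(self, batch_size):
        if self.pos + batch_size > len(self.order):      # epoch finished: reshuffle and start over
            self.order = self.rng.permutation(len(self.images))
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.images[idx], self.labels[idx]

# 10 fake flattened images; image i starts with the value i * 784, and its one-hot label is class i
images = np.arange(10 * 784, dtype=np.float32).reshape(10, 784)
labels = np.eye(10)[np.arange(10)]
ds = MiniDataSet(images, labels)
b_x, b_y = ds.next_batch(4)
print(b_x.shape, b_y.shape)
```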