Classifying MNIST with a Multi-Layer RNN-LSTM Network, with a Summary of Common Pitfalls

little_fat_sheep

Article link: https://blog.csdn.net/m0_37602827/article/details/90232240

1 Introduction

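A recurrent neural network (RNN) is a class of neural networks designed for processing and predicting sequence data. RNNs emerged in the 1980s; an early precursor is the Hopfield network, an interconnected network that can serve as an associative memory, proposed by the American physicist J. J. Hopfield in 1982.

[Figure: RNN network structure]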

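In 1997, Sepp Hochreiter and Jürgen Schmidhuber proposed the long short-term memory network (Long Short-Term Memory, LSTM) to address the long-term dependency problem, in which a plain RNN struggles to carry information across many time steps. LSTMs are mainly applied to text classification, speech recognition, machine translation, dialogue systems, image captioning and similar tasks.

[Figure: LSTM network structure]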

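This post again uses the MNIST dataset for its experiments. For a description of MNIST and how to set it up, see the earlier post "Classifying MNIST with TensorFlow" (使用TensorFlow实现MNIST数据集分类).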

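The RNN reads each image one row at a time: at every time step it takes the 28 pixels of one row, so an image becomes a sequence of 28 time steps (28 rows). The output at the last time step summarizes the information from all earlier steps, so only that output is used to predict the image class.

[Figure: data conversion]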
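The row-by-row reading described above can be sketched with NumPy. This is only an illustration, not the TensorFlow code used in this post: the random `batch` is an assumed stand-in for flattened MNIST images, and `rows_as_time_steps` is a hypothetical helper name.

```python
import numpy as np

time_size = 28   # number of time steps (image rows)
input_size = 28  # pixels fed to the network at each time step

def rows_as_time_steps(flat_images):
    """Reshape [batch, 784] flattened images into [batch, time_size, input_size]."""
    return flat_images.reshape(-1, time_size, input_size)

# Assumption: random data standing in for a batch of 50 flattened MNIST images.
batch = np.random.rand(50, time_size * input_size).astype(np.float32)
sequences = rows_as_time_steps(batch)

# Time step t of sample n is row t of image n.
first_row = sequences[0, 0, :]
last_row = sequences[0, -1, :]
print(sequences.shape)
```

Each `sequences[n, t, :]` is the vector the LSTM would receive at time step `t` for sample `n`.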

2 Single-Layer RNN-LSTM Network

[Figure: data flow of the single-layer RNN-LSTM network]

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

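# Load the dataset
mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)

# Input vector size of the LSTM cell: one image row (28 pixels) per time step
input_size = 28
# Sequence length: 28 time steps, i.e. each prediction consumes 28 rows
time_size = 28
# Number of units in each hidden layer
hidden_size = 100
# 10 classes
class_num = 10
# 50 samples per batch
batch_size = 50
# Total number of training batches
batch_num = mnist.train.num_examples // batch_size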

x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])

weights=tf.Variable(tf.truncated_normal([hidden_size,class_num],stddev=0.1))
biases=tf.Variable(tf.constant(0.1,shape=[class_num,]))

# Define the RNN-LSTM network
def RNN_LSTM(x,weights,biases):
    #[batch_size,time_size*input_size]==>[batch_size,time_size,input_size]
    inputs=tf.reshape(x,[-1,time_size,input_size])
    # Define the basic LSTM unit lstm_cell
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size,forget_bias=1.0,state_is_tuple=True)
    outputs,state = tf.nn.dynamic_rnn(lstm_cell,inputs,dtype=tf.float32,time_major=False)
    # Output layer: map the last time step's output to class scores
    results = tf.matmul(outputs[:,-1,:],weights)+biases
    return results
    
y_=RNN_LSTM(x,weights,biases)
# Cross-entropy loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_,labels=y))
# Optimize with AdamOptimizer
train = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

# Initialize variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    test_feed={x:mnist.test.images,y:mnist.test.labels}
    for epoch in range(6):
        # Training
        for batch in range(batch_num):
            batch_x,batch_y=mnist.train.next_batch(batch_size)
            sess.run(train,feed_dict={x:batch_x,y:batch_y})
        # Prediction
        acc=sess.run(accuracy,feed_dict=test_feed)
        print("Iter "+str(epoch)+", Testing Accuracy =",acc)

3 Multi-Layer RNN-LSTM Network

[Figure: data flow of the multi-layer RNN-LSTM network]

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

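# Load the dataset
mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)

# Input vector size of the LSTM cell: one image row (28 pixels) per time step
input_size = 28
# Sequence length: 28 time steps, i.e. each prediction consumes 28 rows
time_size = 28
# Number of units in each hidden layer
hidden_size = 100
# Number of LSTM layers
layer_num = 2
# 10 classes
class_num = 10
# 50 samples per batch
batch_size = 50
# Total number of training batches
batch_num = mnist.train.num_examples // batch_size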

x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])

weights={'in':tf.Variable(tf.truncated_normal([input_size,hidden_size],stddev=0.1)),
         'out':tf.Variable(tf.truncated_normal([hidden_size,class_num]))}
biases={'in':tf.Variable(tf.constant(0.1,shape=[hidden_size,])),
        'out':tf.Variable(tf.constant(0.1,shape=[class_num,]))}

# Define the RNN-LSTM network
def RNN_LSTM(x,weights,biases):
    #[batch_size,time_size*input_size]==>[batch_size*time_size,input_size]
    x=tf.reshape(x,[-1,input_size])
    # Input layer: map each 28-pixel row to hidden_size dimensions
    inputs=tf.matmul(x,weights["in"])+biases["in"]
    #[batch_size*time_size,hidden_size]==>[batch_size,time_size,hidden_size]
    inputs=tf.reshape(inputs,[-1,time_size,hidden_size])
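    # Create a separate LSTM cell for each layer; reusing one cell object ([lstm_cell]*layer_num) would make all layers share the same weights
    lstm_cells = [tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size,forget_bias=1.0,state_is_tuple=True) for _ in range(layer_num)]
    # Stack the LSTM cells into a multi-layer cell
    mlstm_cell = tf.contrib.rnn.MultiRNNCell(lstm_cells,state_is_tuple=True)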
    outputs,state = tf.nn.dynamic_rnn(mlstm_cell,inputs,dtype=tf.float32,time_major=False)
    # Output layer: map the last time step's output to class scores
    results = tf.matmul(outputs[:,-1,:],weights["out"])+biases["out"]
    return results
    
y_=RNN_LSTM(x,weights,biases)
# Cross-entropy loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_,labels=y))
# Optimize with AdamOptimizer
train = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

# Initialize variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    test_feed={x:mnist.test.images,y:mnist.test.labels}
    for epoch in range(6):
        # Training
        for batch in range(batch_num):
            batch_x,batch_y=mnist.train.next_batch(batch_size)
            sess.run(train,feed_dict={x:batch_x,y:batch_y})
        # Prediction
        acc=sess.run(accuracy,feed_dict=test_feed)
        print("Iter "+str(epoch)+", Testing Accuracy =",acc)

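4 Summary of Common Errors

The single-layer RNN-LSTM network rarely causes trouble, so this section focuses on common errors in the multi-layer RNN-LSTM network.

4.1 No dimension transformation in the input layer

Error message: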

ValueError: Dimensions must be equal, but are 200 and 128 for 'rnn/while/rnn/multi_rnn_cell/cell_0/
basic_lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [?,200], [128,400].

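An LSTM cell contains a forget gate, an input gate and an output gate, and its weights and biases are shared across all time steps. At every step the cell concatenates its input with the previous hidden state, so without an input transformation the first layer sees 28+100=128 dimensions while the second layer sees 100+100=200 dimensions. The error above appears when a single cell object is reused for every layer ([lstm_cell]*layer_num), because both layers then share one weight matrix that was sized for 128 inputs. Mapping the 28-dimensional input to 100 dimensions before the LSTM makes both layers receive 200 dimensions, which removes the mismatch; creating a separate cell object for each layer also avoids it.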
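The shape arithmetic behind this error can be reproduced with a small NumPy sketch. It is not TensorFlow's implementation: `make_kernel` and `lstm_step` are hypothetical helpers, bias terms other than the forget bias are omitted, and the gate order is an assumption chosen for illustration. The only point is that one kernel of shape [input_dim + hidden_size, 4 * hidden_size] cannot serve two layers whose input sizes differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_kernel(input_dim, hidden_size, rng):
    """One weight matrix for all four gates: [input_dim + hidden_size, 4 * hidden_size]."""
    return rng.normal(scale=0.1, size=(input_dim + hidden_size, 4 * hidden_size))

def lstm_step(x, h, c, kernel, forget_bias=1.0):
    """One LSTM time step on the concatenation of input and previous hidden state."""
    gates = np.concatenate([x, h], axis=1) @ kernel
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, new_c

rng = np.random.default_rng(0)
batch, input_size, hidden_size = 4, 28, 100
x = rng.normal(size=(batch, input_size))
h0 = np.zeros((batch, hidden_size))
c0 = np.zeros((batch, hidden_size))

kernel_1 = make_kernel(input_size, hidden_size, rng)   # shape (128, 400)
h1, c1 = lstm_step(x, h0, c0, kernel_1)

# Layer 2 receives h1 (100 dims), so its concatenated input has 200 dims.
try:
    lstm_step(h1, h0, c0, kernel_1)  # reusing the layer-1 kernel
    shared_kernel_ok = True
except ValueError:
    shared_kernel_ok = False

kernel_2 = make_kernel(hidden_size, hidden_size, rng)  # shape (200, 400)
h2, c2 = lstm_step(h1, h0, c0, kernel_2)
```

Here `shared_kernel_ok` ends up `False`, mirroring the [?,200] versus [128,400] mismatch in the error message, while a second kernel sized for 200 inputs works.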

4.2 Training batch_size and prediction batch_size differ

Many blog posts and videos take the following code

outputs,state = tf.nn.dynamic_rnn(mlstm_cell,inputs,dtype=tf.float32,time_major=False)

and write it as:

# Initialize the state with all zeros
init_state = mlstm_cell.zero_state(batch_size,dtype=tf.float32)
outputs,state=tf.nn.dynamic_rnn(mlstm_cell,inputs,initial_state=init_state,time_major=False)

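This ties batch_size to the RNN-LSTM graph. However, the batch_size used for training differs from the one used for prediction (a major pitfall), which leads to the following error: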

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [10000,100] vs. shape[1] = [50,100]
[[node rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat (defined at G:/Anaconda/Spyder/lstm.py:44) ]]

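The 10000 here is the batch_size of the test set. If init_state is kept, there are two ways to fix this:

(1) Keep the test batch_size the same as the training batch_size

# Prediction
# Number of test batches (this variable was not defined in the original snippet)
test_batch_num = mnist.test.num_examples // batch_size
total_acc=0.0
for batch in range(test_batch_num):
    batch_x,batch_y=mnist.test.next_batch(batch_size)
    total_acc+=sess.run(accuracy,feed_dict={x:batch_x,y:batch_y})
acc=total_acc/test_batch_num
print("Iter "+str(epoch)+", Testing Accuracy =",acc)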
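Averaging per-batch accuracies as in option (1) matches the accuracy over the whole test set only when every batch has the same size, which holds here because 10000 test samples split evenly into batches of 50. A small NumPy check, using synthetic labels and predictions as an assumed stand-in for real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
num_examples, batch_size = 10000, 50
test_batch_num = num_examples // batch_size  # full batches, no remainder

# Assumption: synthetic labels and predictions, roughly 90% correct.
labels = rng.integers(0, 10, size=num_examples)
predictions = np.where(rng.random(num_examples) < 0.9, labels, (labels + 1) % 10)

total_acc = 0.0
for b in range(test_batch_num):
    start = b * batch_size
    end = start + batch_size
    total_acc += (predictions[start:end] == labels[start:end]).mean()
batched_acc = total_acc / test_batch_num

# Accuracy computed in one pass over the whole test set.
overall_acc = (predictions == labels).mean()
```

If the test set size were not a multiple of batch_size, the leftover samples would need separate handling for the two numbers to agree.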

(2) Define batch_size with a placeholder

.................

# 50 samples per training batch
train_batch_size = 50
# Total number of training batches
batch_num = mnist.train.num_examples//train_batch_size
batch_size = tf.placeholder(tf.int32,[])

.................

with tf.Session() as sess:
    sess.run(init)
    test_feed={x:mnist.test.images,y:mnist.test.labels,batch_size:mnist.test.num_examples}
    for epoch in range(6):
        # Training
        for batch in range(batch_num):
            batch_x,batch_y=mnist.train.next_batch(train_batch_size)
            sess.run(train,feed_dict={x:batch_x,y:batch_y,batch_size:train_batch_size})
        # Prediction
        acc=sess.run(accuracy,feed_dict=test_feed)
        print("Iter "+str(epoch)+", Testing Accuracy =",acc)
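The mismatch in 4.2 can also be reproduced outside TensorFlow. The NumPy sketch below uses hypothetical helpers (`zero_state`, `concat_input_and_state`) that only imitate the concatenation of input and hidden state inside an LSTM cell. It shows why a state sized for 50 samples fails on 10000 samples, and why sizing the state from the data actually fed avoids the problem.

```python
import numpy as np

hidden_size = 100

def zero_state(batch_size):
    """All-zero hidden state for a given batch size."""
    return np.zeros((batch_size, hidden_size), dtype=np.float32)

def concat_input_and_state(inputs, h):
    """Imitates the cell's concatenation of input and hidden state along the feature axis."""
    return np.concatenate([inputs, h], axis=1)

train_inputs = np.ones((50, hidden_size), dtype=np.float32)
test_inputs = np.ones((10000, hidden_size), dtype=np.float32)

# State hard-coded to the training batch size, as in the problematic code.
fixed_state = zero_state(50)
train_concat = concat_input_and_state(train_inputs, fixed_state)

try:
    concat_input_and_state(test_inputs, fixed_state)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Sizing the state from the fed data avoids the mismatch.
test_concat = concat_input_and_state(test_inputs, zero_state(test_inputs.shape[0]))
```

Feeding batch_size through a placeholder, as in option (2), plays the same role as `test_inputs.shape[0]` here: the state size follows whatever batch is being run.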