Implementing Recurrent Neural Networks in TensorFlow: Classic Architectures (LSTM, GRU, BRNN)

尼古拉斯·two_dog

Article link: https://blog.csdn.net/gm_ergou/article/details/118360593

Reference links:

https://www.cnblogs.com/tensorflownews/p/7293859.html

http://www.360doc.com/content/17/0321/10/10408243_638692495.shtml

http://blog.itpub.net/31555081/viewspace-2221434/

https://blog.csdn.net/qq_34000894/article/details/80421007

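Recurrent neural networks (RNN):

An RNN extends an ordinary multilayer BP (backpropagation) network with recurrent connections between the hidden-layer units across time steps. A weight matrix passes the hidden-unit values from the previous time step to the current one, which gives the network a form of memory. This makes RNNs well suited to problems with contextual dependencies, such as NLP tasks and time-series learning problems.

Advantage: the model has memory.
Disadvantage: it cannot retain information from time steps that are too far away, because gradients explode or vanish during training.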
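
The recurrence described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the TensorFlow implementation used later in this article; the names `rnn_forward`, `W_xh`, `W_hh` and `b_h` are assumptions introduced for the example.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a plain tanh RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state: all zeros
    states = []
    for x_t in xs:
        # W_hh carries the previous hidden state into the current step;
        # this recurrent term is what gives the network its memory.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 5
xs = rng.normal(size=(steps, input_size))
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

states = rnn_forward(xs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4): one hidden vector per time step
```

Because the same `W_hh` is multiplied in at every step, gradients flowing back through many steps are repeatedly scaled by it, which is where the vanishing and exploding behaviour comes from.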

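Common RNN variants:
1. Long short-term memory network (LSTM): designed to address the gradient problems of the plain RNN. It mainly mitigates vanishing gradients; exploding gradients are usually still handled with gradient clipping.

Basic principle:
On top of a plain RNN, each hidden unit gains a memory cell, so the information carried along the time sequence becomes controllable.
Each time the state is passed between time steps it goes through several controllable gates (forget gate, input gate, candidate gate, output gate),
which control how much of the previous and current information is remembered or forgotten. This gives the RNN long-term memory.

1. Forget gate: decides what information is discarded from the cell state (a sigmoid function: 1 means keep completely, 0 means discard completely).
2. Input gate: decides how much new information is added to the cell state (also decided by a sigmoid function).
3. Candidate gate: a tanh layer generates a vector of candidate values that may be used for the update.
4. Output gate: a sigmoid layer decides which parts of the cell state are output. The cell state is then passed through tanh and multiplied by the output of the sigmoid gate, so that only the selected parts are output.

Advantages: compared with a plain RNN it has long-term memory, and the memory is controllable.
Disadvantages: the network structure is more complex, and the many gates reduce computational efficiency.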
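
The four gates listed above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration under assumptions: the function name `lstm_step`, the stacked weight matrix `W` and the gate ordering inside it are choices made for this example, not the layout TensorFlow uses internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to four stacked pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new content to add
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the update
    o = sigmoid(z[3 * n:4 * n])    # output gate: which parts of the cell to expose
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state (the step's output)
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state is updated additively (`f * c_prev + i * g`), which is the part of the design that lets information survive over many steps when the forget gate stays close to 1.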

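2. Gated recurrent unit (GRU): a variant of the LSTM.

It merges the input gate and the forget gate into a single gate, and merges S_t and C_t (the output state and the memory cell) into a single state, which improves efficiency.

1. Update gate: decides how much information from the previous hidden state is passed into the current hidden state h_t, in other words how much of the previous and current information should be carried forward. A value close to 0 means the previous hidden state is forgotten at this step; a value close to 1 means it is retained.

2. Reset gate: computed in the same way as the update gate, only with a different weight matrix.
The reset gate decides how much of the previous hidden state should be forgotten. A value close to 0 means the previous information is dropped from the current memory content; a value close to 1 means it is kept in the current memory content.

Difference between the two:
The update gate acts on both the previous hidden state and the memory content, and ultimately determines the hidden state at the current time step;
the reset gate acts only on the current memory content.
Note:
GRU and LSTM have basically the same structure and differ only in their gates, so in the code it is enough to replace the LSTM cell definition with a GRU cell definition.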
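
A single GRU step can be sketched in NumPy as follows. This is a sketch under assumptions: the names `gru_step`, `Wz`, `Wr` and `Wh` are introduced for the example, and the update follows the convention used in the text above (update gate close to 1 keeps the previous hidden state). Some papers and libraries use the opposite convention, with the roles of `z` and `1 - z` swapped.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step; an update gate near 1 keeps the previous hidden state."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx + bz)      # update gate
    r = sigmoid(Wr @ hx + br)      # reset gate
    # The reset gate only affects the candidate (current memory content).
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    # The update gate blends the previous state with the candidate.
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
shape = (hidden_size, hidden_size + input_size)
Wz, Wr, Wh = (rng.normal(scale=0.1, size=shape) for _ in range(3))
bz = br = bh = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, Wz, Wr, Wh, bz, br, bh)
print(h.shape)  # (4,)
```

Compared with the LSTM there is no separate cell state and one gate fewer, which is where the efficiency gain mentioned above comes from.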

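3. Bidirectional recurrent neural network (BRNN): a BRNN consists of two RNNs stacked on top of each other that read the sequence in opposite directions; the output is determined jointly by the states of both RNNs.

Basic idea:
The output at the current time step depends not only on earlier states but possibly also on future states.
For example, predicting a missing word in a sentence requires looking not only at the preceding text but also at what follows it, so that the prediction is truly based on the full context.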
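
The idea can be sketched in NumPy by running one plain RNN from left to right, a second one from right to left, and concatenating their outputs at each time step. This is an illustrative sketch only; `run_rnn`, `bidirectional_rnn` and the parameter tuples are names assumed for the example, and plain tanh cells stand in for the LSTM cells used in the TensorFlow code below.

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh):
    """Plain tanh RNN returning the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    out = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        out.append(h)
    return np.stack(out)

def bidirectional_rnn(xs, fw_params, bw_params):
    fw = run_rnn(xs, *fw_params)
    # Feed the reversed sequence, then flip the result back so that
    # row t of bw lines up with row t of fw.
    bw = run_rnn(xs[::-1], *bw_params)[::-1]
    return np.concatenate([fw, bw], axis=1)

rng = np.random.default_rng(0)
steps, input_size, hidden_size = 6, 3, 4

def make_params():
    return (rng.normal(scale=0.5, size=(hidden_size, input_size)),
            rng.normal(scale=0.5, size=(hidden_size, hidden_size)))

fw_params, bw_params = make_params(), make_params()
xs = rng.normal(size=(steps, input_size))
outputs = bidirectional_rnn(xs, fw_params, bw_params)
print(outputs.shape)  # (6, 8): forward and backward halves side by side
```

At each time step the first half of the output vector summarizes the past and the second half summarizes the future, which is why the output layer of a bidirectional model takes `2 * hidden_size` inputs.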

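Solving for the model parameters: BPTT (backpropagation through time):
Its basic principle is the same as that of the BP algorithm, and it consists of the same three steps.
First choose initial values for the parameters, then:
1. Forward pass: compute the output value of every neuron. This yields the value of the cost function; next, the gradient of each parameter is needed so that the parameters can be moved along the direction of gradient descent.

2. Backward pass: compute the error term of every neuron, which is the partial derivative of the error function E with respect to the weighted input of neuron j.
Backpropagation happens along two dimensions:
one in space, propagating the error term to the previous layer of the network;
one in time, propagating backward through time, that is, starting from the current time t and computing the error term at every earlier time step.

3. Compute the gradient of every weight.

Finally, repeat steps 1-3, updating the weights with stochastic gradient descent.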
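
The three steps can be sketched in NumPy for a single-layer tanh RNN, so only the backward pass in time is shown. The setup is an assumption made for illustration: the loss is the squared error between the last hidden state and a `target` vector, and the names `forward`, `loss_fn` and `bptt` are hypothetical.

```python
import numpy as np

def forward(xs, W_xh, W_hh):
    """Step 1: forward pass. hs[0] is the initial state, hs[t + 1] follows xs[t]."""
    hs = [np.zeros(W_hh.shape[0])]
    for x_t in xs:
        hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))
    return hs

def loss_fn(xs, target, W_xh, W_hh):
    return 0.5 * np.sum((forward(xs, W_xh, W_hh)[-1] - target) ** 2)

def bptt(xs, target, W_xh, W_hh):
    hs = forward(xs, W_xh, W_hh)
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dh = hs[-1] - target                      # dE/dh at the last step
    for t in reversed(range(len(xs))):
        # Step 2: error term = dE/d(weighted input) at time t.
        delta = dh * (1.0 - hs[t + 1] ** 2)
        # Step 3: accumulate the weight gradients over all time steps.
        dW_xh += np.outer(delta, xs[t])
        dW_hh += np.outer(delta, hs[t])
        # Propagate the error one step back in time.
        dh = W_hh.T @ delta
    return dW_xh, dW_hh

rng = np.random.default_rng(0)
steps, input_size, hidden_size = 5, 3, 4
xs = rng.normal(size=(steps, input_size))
target = rng.uniform(-0.5, 0.5, size=hidden_size)
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

lr = 0.05
print("loss before:", loss_fn(xs, target, W_xh, W_hh))
for _ in range(20):                           # repeat steps 1-3
    dW_xh, dW_hh = bptt(xs, target, W_xh, W_hh)
    W_xh -= lr * dW_xh
    W_hh -= lr * dW_hh
print("loss after:", loss_fn(xs, target, W_xh, W_hh))
```

A quick way to gain confidence in a hand-written backward pass like this is to compare individual entries of `dW_hh` against a finite-difference estimate of `loss_fn`.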

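Steps for implementing a bidirectional LSTM:

1. First define two LSTM cells:
# LSTM cell for the forward direction
lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(embedding_size, forget_bias=1.0)
# LSTM cell for the backward direction
lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(embedding_size, forget_bias=1.0)

2. Call the bidirectional function bidirectional_dynamic_rnn():
# outputs is a tuple (output_fw, output_bw); output_states is a tuple (state_fw, state_bw)
(outputs, output_states) = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell, lstm_bw_cell, embedded_chars, dtype=tf.float32)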

Code implementation: LSTM, GRU

"""
Difference between LSTM and GRU:
replace BasicLSTMCell with GRUCell and the network becomes a GRU network
lstm_cell = tf.nn.rnn_cell.GRUCell(num_units=hidden_size)
"""

# -*- coding: utf-8 -*-
# Note: this script uses the TensorFlow 1.x API (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

config = tf.ConfigProto()
sess = tf.Session(config=config)
# Load MNIST from the 'mnist' directory with one-hot labels
mnist = input_data.read_data_sets('mnist', one_hot=True)
print(mnist.train.images.shape)

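# Hyperparameters
lr = 1e-3
# A placeholder is used so that training and testing can use different batch sizes
batch_size = tf.placeholder(tf.int32, [])
# Each input vector has 28 dimensions: one image row of 28 pixels
input_size = 28
# The sequence length is 28: each prediction first consumes 28 rows
timestep_size = 28
# Number of units in each hidden layer
hidden_size = 64
# Number of LSTM layers
layer_num = 2
# Number of output classes; for a regression task this would be 1
class_num = 10
_X = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, class_num])
keep_prob = tf.placeholder(tf.float32)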

# Restore the 784 flat pixel values to a 28*28 image: 28 time steps of 28 features each
X = tf.reshape(_X, [-1, timestep_size, input_size])


def unit_lstm():
    # Define one LSTM cell; num_units is the hidden size, and the input dimension is inferred from X
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)

    # To build a GRU network instead, replace BasicLSTMCell with GRUCell:
    # lstm_cell = tf.nn.rnn_cell.GRUCell(num_units=hidden_size)

    # Add a dropout layer; usually only output_keep_prob is set
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(cell=lstm_cell, input_keep_prob=1.0, output_keep_prob=keep_prob)
    return lstm_cell


# Stack layer_num cells with MultiRNNCell to build a multi-layer LSTM
mlstm_cell = tf.nn.rnn_cell.MultiRNNCell([unit_lstm() for _ in range(layer_num)], state_is_tuple=True)

# Initialize the state with zeros
init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)
outputs, state = tf.nn.dynamic_rnn(mlstm_cell, inputs=X, initial_state=init_state,
                                   time_major=False)
# Use the output of the last time step as the representation of the whole sequence
h_state = outputs[:, -1, :]

# Output layer, loss function and optimizer
W = tf.Variable(tf.truncated_normal([hidden_size, class_num], stddev=0.1), dtype=tf.float32)
bias = tf.Variable(tf.constant(0.1, shape=[class_num]), dtype=tf.float32)
y_pre = tf.nn.softmax(tf.matmul(h_state, W) + bias)
# Loss and evaluation ops; the probabilities are clipped so that log(0) cannot produce NaN
cross_entropy = -tf.reduce_mean(y * tf.log(tf.clip_by_value(y_pre, 1e-10, 1.0)))
train_op = tf.train.AdamOptimizer(lr).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Start training (reuses the session created at the top of the script)
sess.run(tf.global_variables_initializer())
for i in range(1000):
    _batch_size = 128
    batch = mnist.train.next_batch(_batch_size)
    if (i + 1) % 200 == 0:
        # Report accuracy on the current batch with dropout disabled
        train_accuracy = sess.run(accuracy, feed_dict={
            _X: batch[0], y: batch[1], keep_prob: 1.0, batch_size: _batch_size
        })
        print("step %d, training accuracy %g" % ((i + 1), train_accuracy))
    sess.run(train_op, feed_dict={_X: batch[0], y: batch[1], keep_prob: 0.5,
                                  batch_size: _batch_size})

# Evaluate on the full test set
images = mnist.test.images
labels = mnist.test.labels
print("test accuracy %g" % sess.run(accuracy, feed_dict={_X: images, y: labels, keep_prob: 1.0,
                                                         batch_size: mnist.test.images.shape[0]}))

# Pick one training sample and display it
current_y = mnist.train.labels[5]
current_x = mnist.train.images[5]
print(current_y)
plt.imshow(current_x.reshape(28, 28), cmap='gray')
plt.show()

# Reshape the sample into the batch format the model expects
current_x = current_x.reshape(1, 784)
current_y = current_y.reshape(1, class_num)
current_outputs = sess.run(outputs, feed_dict={
    _X: current_x, keep_prob: 1.0, batch_size: 1})
current_outputs = current_outputs.reshape(28, hidden_size)

# Fetch the trained output-layer parameters
h_W, h_bias = sess.run([W, bias])
h_bias = h_bias.reshape(1, class_num)

# Recognition process: class probabilities after each of the 28 time steps
bar_index = range(class_num)
for i in range(current_outputs.shape[0]):
    plt.subplot(7, 4, i + 1)
    current_h_state = current_outputs[i, :].reshape(1, hidden_size)
    # Apply the output layer in NumPy so that no new graph ops are created inside the loop
    logits = np.matmul(current_h_state, h_W) + h_bias
    exp_logits = np.exp(logits - logits.max())
    pro = exp_logits / exp_logits.sum()
    plt.bar(bar_index, pro[0], width=0.2)
    plt.axis('off')
plt.show()
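
The script above computes the cross-entropy from the softmax probabilities. A common alternative is to compute it directly from the logits with the log-sum-exp trick, which stays finite even for very large logits. The NumPy function below is only a sketch of that idea; the name `softmax_cross_entropy` is an assumption for this example and it is not part of the script.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, computed from logits in a stable way."""
    # Subtracting the row maximum leaves the softmax unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
loss = softmax_cross_entropy(logits, labels)
print(loss)
```

The first row is classified with near certainty and contributes almost nothing; the second row has uniform probabilities over three classes and contributes log(3).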

Code implementation: BRNN (bidirectional recurrent neural network)

# coding: utf-8
# Note: this script uses the TensorFlow 1.x API, including tf.contrib.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('mnist', one_hot=True)

learning_rate = 0.01
max_samples = 400000
batch_size = 128
display_step = 10


n_input = 28    # image width: number of features per time step
n_steps = 28    # number of LSTM unrolling steps: image height
n_hidden = 256
n_classes = 10

x = tf.placeholder(tf.float32, [None, n_steps, n_input])  # (batch, height, width)
y = tf.placeholder(tf.float32, [None, n_classes])
# Forward and backward outputs are concatenated, hence 2 * n_hidden inputs
weights = tf.Variable(tf.random_normal([2 * n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))


def BiRNN(x, weights, biases):
    # Convert (batch, n_steps, n_input) into a list of n_steps tensors of shape (batch, n_input)
    x = tf.transpose(x, [1, 0, 2])
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, n_steps)

    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)   # forward cell
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)   # backward cell

    # Note: for a bidirectional GRU network, use GRUCell instead:
    # lstm_fw_cell = tf.contrib.rnn.GRUCell(n_hidden)
    # lstm_bw_cell = tf.contrib.rnn.GRUCell(n_hidden)

    # Call the bidirectional RNN function
    outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)

    # Classify from the output at the last time step
    return tf.matmul(outputs[-1], weights) + biases



pred = BiRNN(x,weights,biases)


# Optimization
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))

accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    step = 1
    while step * batch_size < max_samples:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Evaluate accuracy and loss on the current batch in a single run
            acc, loss = sess.run([accuracy, cost], feed_dict={x: batch_x, y: batch_y})
            print("step", step, "    acc = ", acc, "   loss = ", loss)
        step += 1

    print("finished!")
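
The transpose, reshape and split at the start of BiRNN in the script above turn one batch tensor into a list with one entry per time step. The same three operations can be checked on a small array in NumPy; the sizes below are arbitrary values chosen for the illustration, and `np.split` stands in for `tf.split`.

```python
import numpy as np

batch, n_steps, n_input = 2, 3, 4
x = np.arange(batch * n_steps * n_input).reshape(batch, n_steps, n_input)

t = np.transpose(x, (1, 0, 2))    # (n_steps, batch, n_input): time axis first
t = t.reshape(-1, n_input)        # (n_steps * batch, n_input): steps stacked in order
step_list = np.split(t, n_steps)  # n_steps arrays, each of shape (batch, n_input)

print(len(step_list), step_list[0].shape)  # 3 (2, 4)
# Entry i holds row i of every image in the batch.
print(all(np.array_equal(step_list[i], x[:, i, :]) for i in range(n_steps)))  # True
```

Putting the time axis first before the reshape is what makes each split chunk correspond to a single time step rather than a single image.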
