TensorFlow tf.nn.rnn_cell.BasicLSTMCell: How It Works, with a Code Walkthrough

Latest recommended article published on 2023-10-22 13:45:00

weixin_42713739

Article link: https://blog.csdn.net/weixin_42713739/article/details/103391813

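LSTM internal structure

Function:

 tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True)

num_units: the number of units in the cell, i.e. the dimension of the hidden state h and of the cell state c.
forget_bias: a constant added to the forget gate's pre-activation before the sigmoid. It is not a switch where 1 means "forget nothing" and 0 means "forget everything"; the default of 1.0 only pushes the forget gate toward 1 at the start of training, so the cell tends to keep its state early on.
state_is_tuple: defaults to True, which is the officially recommended setting. The returned state is then a tuple (c, h).

The cell also provides a state initialization function, zero_state(batch_size, dtype), which takes two arguments. batch_size is the number of samples in an input batch and dtype is the data type.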
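To make the effect of forget_bias concrete, here is a minimal sketch in plain Python. It assumes that the learned part of the forget gate's pre-activation is roughly zero right after initialization (the value `pre_activation` is hypothetical), and it mirrors the `sigmoid(add(f, forget_bias_tensor))` expression in the `call` source shown later in this article.

```python
import math

def sigmoid(z):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation of the forget gate right after initialization,
# when the learned weights and biases contribute roughly zero.
pre_activation = 0.0

gate_without_bias = sigmoid(pre_activation + 0.0)  # forget_bias = 0.0
gate_with_bias = sigmoid(pre_activation + 1.0)     # forget_bias = 1.0 (default)

print(round(gate_without_bias, 3))  # 0.5
print(round(gate_with_bias, 3))     # 0.731
```

With forget_bias=1.0 the gate starts at about 0.73 instead of 0.5, so more of the previous cell state is kept early in training. The gate is still learned and can move anywhere between 0 and 1.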

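Figure:

[Figure: LSTM cell structure]
At each time step the LSTM takes 3 inputs: x_t, h_t-1 and C_t-1. It produces 2 outputs: C_t and h_t.

Inside the cell in the middle there are four small yellow boxes. Once you understand what they stand for, everything else falls into place: each yellow box is a feed-forward layer, and num_units is the number of hidden units in that layer. Boxes 1, 2 and 4 use a sigmoid activation; box 3 uses tanh.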

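A few more points worth noting:

1. The cell state is a vector with many values, not a single number. Until this is clear, the rest is very hard to follow.

2. How is the previous state h(t-1) combined (concat) with the current input x(t)? Many references do not spell this out, but it is simple: concat just joins the two vectors end to end. For example, if x is a 28-dimensional vector and h(t-1) is 128-dimensional, the result is a 156-dimensional vector.

3. The cell's weights are shared. What does that mean? The figure shows three large green boxes, which look like three cells, but they are really one cell at different time steps. All the data passes through this single cell; the same weights are reused at every time step and updated during training.

4. How many parameters does one LSTM layer have? From point 3, the number of parameters is determined by the number of cells, and there is only one cell here, so it equals the number of parameters used inside that cell. Suppose num_units is 128 and the input is 28-dimensional. From point 2, the four yellow boxes together have (128+28) * (128 * 4) = 156 * 512 = 79872 weights, plus 4 * 128 = 512 gate biases. You can check this against TensorFlow's simplest LSTM example, where the middle layer has exactly these parameters. On top of that come the parameters of the output layer: with 10 classes, a 128 * 10 matrix W and 10 bias values.

5. The line along the top of the cell, the cell state C_t (written s(t) in some figures), represents long-term memory, while h(t) below it represents working memory or short-term memory.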
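The arithmetic in points 2 and 4 can be checked with a few lines of plain Python. This is only a sketch: the variable names are made up for illustration, and the gate biases are counted separately because the weight count quoted above covers the weight matrices only.

```python
input_dim = 28    # length of x_t
num_units = 128   # length of h_{t-1}
num_classes = 10  # size of the classification layer on top

# Point 2: concat simply joins the two vectors end to end.
x_t = [0.0] * input_dim
h_prev = [0.0] * num_units
concat = x_t + h_prev
print(len(concat))  # 156

# Point 4: four gate layers share the concatenated input.
kernel_weights = (input_dim + num_units) * (4 * num_units)
gate_biases = 4 * num_units
output_layer = num_units * num_classes + num_classes

print(kernel_weights)  # 79872
print(gate_biases)     # 512
print(output_layer)    # 1290
```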

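Number of units (num_units)

Again, the figure first:
[Figure: LSTM gates]
Forget gate layer: decides what information to discard from the cell state, based on the current input and the previous time step's output.
Cell state: determines the new information and writes it into the cell state at the current time step.
Output gate layer: decides the output at this time step based on the current cell state.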

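A simple example

Suppose we have one sample with Shape=(13,5): 13 time steps, each with a feature length of 5. To make it concrete, here is the sample drawn out:
[Figure: one sample with 13 time steps and 5 features per step]
In Keras, I add the LSTM layer as keras.layers.LSTM(10). This sets the hidden vector produced at each time step to 10 dimensions (5 -> 10). Feeding in 13 time steps yields (13*10) values if the output of every step is kept (return_sequences=True); by default only the 10-dimensional vector of the last step is returned.

Every time step uses the same units (the same parameters). So to count the parameters of an LSTM, it is enough to count the ones involved in a single time step. The sections below go through each part of the LSTM computation.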

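Parameter analysis

Forget gate

Formula:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

The forget gate decides which information to discard and which to keep. The previous hidden state and the current input are passed together through a sigmoid function, whose output lies between 0 and 1. The closer to 0, the more should be discarded; the closer to 1, the more should be kept.

How is the previous state h_t-1 combined (concat) with the current input x_t?
As noted earlier, concat simply joins the two vectors end to end: a 28-dimensional x and a 128-dimensional h(t-1) give a 156-dimensional vector.

In the formula, h_t-1 is the hidden vector of the previous step (its length has been set to 10) and x_t is the input at the current step (length 5), so [h_t-1, x_t] has length 10+5=15. W_f and b_f are the parameters of this layer.

The output of W_f * [h_t-1, x_t] has the length of the hidden vector, 10 (a (1,10) vector), and the activation does not change that length. So we only need to look at the operations that produce the 10-dimensional result.

[h_t-1, x_t] is a (1,15) vector, and multiplying it by W_f gives a (1,10) vector. By the rules of matrix multiplication,
W_f is a (15,10) matrix. The (1,10) result is then added to the gate's bias, which must have the same shape, so b_f is a (1,10) matrix.

So the parameter count for this layer is:
$$(5 + 10) \times 10 + 10 = 160 \tag{1}$$
that is, the entries of W_f plus those of b_f.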
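The shape reasoning for the forget gate can be checked with NumPy. This is a minimal sketch under the example's assumptions (feature length 5, hidden length 10); the random values and the seed are arbitrary and only serve to exercise the shapes.

```python
import numpy as np

n, m = 5, 10                        # feature length and hidden length
rng = np.random.default_rng(0)      # arbitrary seed, illustration only

h_prev = np.zeros((1, m))           # h_{t-1}
x_t = rng.normal(size=(1, n))       # x_t
W_f = rng.normal(size=(n + m, m))   # forget gate weights
b_f = np.zeros((1, m))              # forget gate bias

concat = np.concatenate([h_prev, x_t], axis=1)       # shape (1, 15)
f_t = 1.0 / (1.0 + np.exp(-(concat @ W_f + b_f)))    # sigmoid, shape (1, 10)

print(concat.shape, f_t.shape)   # (1, 15) (1, 10)
print(W_f.size + b_f.size)       # 160
```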

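Cell state

Again, the figure first:
[Figure: cell state update]

The cell state has two parts:

(1) Determining the update information. The formulas are:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
and
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

These formulas have the same form as the forget gate's. $\sigma$ and $\tanh$ are both activation functions and do not affect the parameter count. By the same reasoning as for the forget gate, the count for this step is:
$$2 \times ((5 + 10) \times 10 + 10) = 320 \tag{2}$$

(2) The update itself:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

The four values in this formula are all results computed earlier, so this step has no W or b parameters to learn.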
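A tiny element-wise example shows why the update step needs no parameters of its own. The numbers below are made up for illustration and assume the standard update C_t = f_t * C_{t-1} + i_t * C̃_t.

```python
# Hypothetical values for a 3-dimensional cell state.
c_prev = [1.0, -2.0, 0.5]    # C_{t-1}
f_t = [1.0, 0.0, 0.5]        # forget gate: keep all, drop all, keep half
i_t = [0.0, 1.0, 0.5]        # input gate
c_tilde = [0.9, 0.3, -1.0]   # candidate values

# Element-wise update; only previously computed values are combined.
c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
print(c_t)  # [1.0, 0.3, -0.25]
```

The first component is kept unchanged, the second is fully replaced by the candidate, and the third is a mix of both.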

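Output layer

The figure first:
[Figure: output gate]
First, a sigmoid layer decides which parts of the cell state will be output.

Next, the cell state is passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the chosen parts are output.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t * \tanh(C_t)$$
The formula has the same form, so the count is the same:
$$(5 + 10) \times 10 + 10 = 160 \tag{3}$$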

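Summary:
Adding up the counts from (1), (2) and (3) gives the parameter count of this LSTM: 160 + 320 + 160 = 640.

The result generalizes beyond this example. If the feature length of one time step is n and the length produced by the LSTM is m, the parameter count of the LSTM layer is:
$$4 \times ((n + m) \times m + m)$$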
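The general count can be written as a small helper. This is a sketch, assuming four gate layers that each have an (n+m) x m weight matrix and m biases; the function name is made up for illustration. For n=5 and m=10 it agrees with the 640 reported by the Keras summary below.

```python
def lstm_param_count(n, m):
    """Parameters of one LSTM layer with input length n and hidden length m."""
    per_gate = (n + m) * m + m   # weight matrix plus bias of one gate layer
    return 4 * per_gate          # forget, input, candidate and output layers

print(lstm_param_count(5, 10))    # 640
print(lstm_param_count(28, 128))  # 80384
```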
Test:

from keras.layers import LSTM
from keras.models import Sequential

time_step = 13       # number of time steps
feature = 5          # n: feature length per time step
hidden_feature = 10  # m: length of the hidden vector

model = Sequential()
model.add(LSTM(hidden_feature, input_shape=(time_step, feature)))
model.summary()

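input_shape=(13,5): in NLP terms, this is a sentence of 13 words, so the LSTM unrolled in time has 13 boxes. Each word is a 5-dimensional vector, i.e. the input X_t at each time step is a 5-dimensional vector.
In Keras, the 10 in model.add(LSTM(10)) means that h_t (the hidden state) of the LSTM is 10-dimensional. Do not misread it as a layer made of 10 cells.

The output is: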

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10)                640       
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
_________________________________________________________________

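The code in detail

The conclusion first

An LSTM cell has 4 weight matrices, all of the same shape [output_size+n, output_size], where n is the dimension of the input tensor and output_size is set through BasicLSTMCell(num_units=output_size). TensorFlow stores them concatenated as a single kernel of shape [n+output_size, 4*output_size], together with a bias of shape [4*output_size].

Where does this come from?

Let's work it out step by step from the TensorFlow source.

1. cell.state_size

First, in TensorFlow the state describes which states a cell carries. A BasicRNNCell has only one state, h, while a BasicLSTMCell has two, h and c. Second, state_size gives the size of the second dimension of each state, which is output_size.

Code: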

import tensorflow as tf

output_size = 10
batch_size = 32
dim = 50
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=output_size)
print('cell: ', cell)
print('cell.state_size: ', cell.state_size)

Result:

cell:  <tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x000002185F6190F0>
cell.state_size:  LSTMStateTuple(c=10, h=10)

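LSTMStateTuple(c=10, h=10) means that c and h both have an output_size of 10, i.e. each has shape [batch_size,10]. In its implementation TensorFlow bundles c and h together as a tuple, which is also the form TensorFlow recommends.

2. cell.zero_state

In an LSTM, zero_state naturally has two parts, h₀ and c₀.

Code: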

import tensorflow as tf

output_size = 10
batch_size = 32
dim = 50
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=output_size)
h0 = cell.zero_state(batch_size=batch_size, dtype=tf.float32)
print(h0)

Result:

LSTMStateTuple(c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(32, 10) dtype=float32>, h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(32, 10) dtype=float32>)

As you can see, it returns two zero states, c and h.

3. The key step: cell.call

Test code:

import tensorflow as tf

output_size = 10
batch_size = 32
dim = 50
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=output_size)
inputs = tf.placeholder(dtype=tf.float32, shape=[batch_size, dim])
h0 = cell.zero_state(batch_size=batch_size, dtype=tf.float32)
new_h, new_state = cell(inputs, h0)  # this invokes cell.call()
print('new_h: ', new_h)
print('new_state: ', new_state)

Result:

new_h:  Tensor("basic_lstm_cell/Mul_2:0", shape=(32, 10), dtype=float32)
new_state:  LSTMStateTuple(c=<tf.Tensor 'basic_lstm_cell/Add_1:0' shape=(32, 10) dtype=float32>, h=<tf.Tensor 'basic_lstm_cell/Mul_2:0' shape=(32, 10) dtype=float32>)

Source of call:

def call(self, inputs, state):
    _check_rnn_cell_input_dtypes([inputs, state])

    sigmoid = math_ops.sigmoid
    one = constant_op.constant(1, dtype=dtypes.int32)
    # Parameters of gates are concatenated into one multiply for efficiency.
    if self._state_is_tuple:
        # In the test code the input is [32,50], so inputs here is [32,50].
        # With state_is_tuple=True, the initial c and h both come from zero_state,
        # so each is a [32,10] tensor of zeros.
        c, h = state
    else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=one)

    gate_inputs = math_ops.matmul(array_ops.concat([inputs, h], 1), self._kernel)
    gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)

    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    i, j, f, o = array_ops.split(value=gate_inputs, num_or_size_splits=4, axis=one)
    forget_bias_tensor = constant_op.constant(self._forget_bias, dtype=f.dtype)

    # Compute new_c and new_h for this cell:
    #   forget_gate_output = sigmoid(add(f, forget_bias_tensor))
    #   input_gate_output = multiply(sigmoid(i), tanh(j))
    #   update_c = add(multiply(c, forget_gate_output), input_gate_output)
    #   output_gate_output = multiply(tanh(new_c), sigmoid(o))
    add = math_ops.add
    multiply = math_ops.multiply
    new_c = add(
        multiply(c, sigmoid(add(f, forget_bias_tensor))),
        multiply(sigmoid(i), self._activation(j)))
    new_h = multiply(self._activation(new_c), sigmoid(o))

    if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
    else:
        new_state = array_ops.concat([new_c, new_h], 1)
    return new_h, new_state

1. Get c and h:

c, h = state: first, the incoming c and h are taken from state, i.e. unpacked from the argument h₀.
If state_is_tuple=True, c and h are:

c: Tensor("BasicLSTMCellZeroState/zeros:0", shape=(32, 10), dtype=float32)
h: Tensor("BasicLSTMCellZeroState/zeros_1:0", shape=(32, 10), dtype=float32)

If state_is_tuple=False, c and h are:

c: Tensor("basic_lstm_cell/split:0", shape=(32, 10), dtype=float32)
h: Tensor("basic_lstm_cell/split:1", shape=(32, 10), dtype=float32)

2. Compute the gate inputs:

gate_inputs = math_ops.matmul(array_ops.concat([inputs, h], 1), self._kernel)
gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)

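mn = array_ops.concat([A, D], 0)    # join along the first dimension: A=[a,m], D=[b,m], mn=[a+b,m]
mn_1 = array_ops.concat([A, C], 1)  # join along the second dimension: A=[m,a], C=[m,b], mn_1=[m,a+b]

inputs=[32,50] and h=[32,10], so array_ops.concat([inputs, h], 1)=[32,60]. _kernel=[60,40],
so the first gate_inputs=[32,40].

_bias=[40] (one value per gate unit, 4*num_units in total), and bias_add broadcasts it over the batch, so the second gate_inputs=[32,40]. The final gate input is [32,40].

3. Compute the input gate, forget gate and output gate: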

# i = input_gate, j = new_input, f = forget_gate, o = output_gate
i, j, f, o = array_ops.split(value=gate_inputs, num_or_size_splits=4, axis=one)

gate_inputs=[32,40], so i, j, f and o are each [32,10].
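The shapes in steps 2 and 3 can be reproduced with NumPy, without TensorFlow. In this sketch `kernel` and `bias` are zero-filled stand-ins for `self._kernel` and `self._bias`; only their shapes matter here.

```python
import numpy as np

batch_size, dim, num_units = 32, 50, 10

inputs = np.zeros((batch_size, dim))                  # x_t for the whole batch
h = np.zeros((batch_size, num_units))                 # h_{t-1} for the whole batch
kernel = np.zeros((dim + num_units, 4 * num_units))   # stand-in for self._kernel
bias = np.zeros(4 * num_units)                        # stand-in for self._bias

# One matmul computes the pre-activations of all four gate layers at once.
gate_inputs = np.concatenate([inputs, h], axis=1) @ kernel + bias

# Split the 40 columns into four equal blocks of 10.
i, j, f, o = np.split(gate_inputs, 4, axis=1)

print(gate_inputs.shape)                    # (32, 40)
print(i.shape, j.shape, f.shape, o.shape)   # (32, 10) (32, 10) (32, 10) (32, 10)
```

Doing one [60,40] multiplication instead of four [60,10] multiplications is the efficiency trick mentioned in the source comment "Parameters of gates are concatenated into one multiply for efficiency".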

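4. Compute the outputs C_t and h_t
forget_gate_output is: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (TensorFlow additionally adds forget_bias inside the sigmoid)
input_gate_output is: $i_t * \tilde{C}_t$
update_c = new_c is: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
output_gate_output = new_h is: $h_t = o_t * \tanh(C_t)$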

#   forget_gate_output =  sigmoid(add(f, forget_bias_tensor))
#   input_gate_output = multiply(sigmoid(i), tanh(j))
#   update_c = add(multiply(c, forget_gate_output), input_gate_output)
#   output_gate_output = multiply(tanh(new_c), sigmoid(o))
new_c = add(
    multiply(c, sigmoid(add(f, forget_bias_tensor))),
    multiply(sigmoid(i), self._activation(j))
    )
new_h = multiply(self._activation(new_c), sigmoid(o))
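The same update can be written in NumPy to try out with concrete numbers. This is a sketch rather than TensorFlow's code: `lstm_update` is a made-up helper name, and it assumes the cell's activation is tanh, the default for BasicLSTMCell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_update(c, i, j, f, o, forget_bias=1.0):
    """Combine the previous cell state c with the four gate pre-activations."""
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_c, new_h

# Previous cell state of ones, all gate pre-activations zero.
c = np.ones((32, 10))
zeros = np.zeros((32, 10))
new_c, new_h = lstm_update(c, i=zeros, j=zeros, f=zeros, o=zeros)

print(new_c.shape, new_h.shape)        # (32, 10) (32, 10)
print(round(float(new_c[0, 0]), 3))    # 0.731
```

With all pre-activations at zero, the candidate tanh(j) is 0, so new_c is just the old state scaled by sigmoid(forget_bias), about 0.731.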

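5. Return the final result. The types of the returned new_state and new_h are as shown in the test code.

new_state = LSTMStateTuple(new_c, new_h)
LSTMStateTuple: the tuple used by LSTM cells for state_size, zero_state and the output state. It stores two elements in the order (c, h), where c is the cell state (called the hidden state in the TensorFlow documentation) and h is the output. It is only used when state_is_tuple=True.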
