NNDL Assignment 11: LSTM

02（网络界泥石流）

Last modified: 2023-12-19 03:26:19

Tags: deep learning, lstm, rnn

First published: 2023-12-19 03:24:23

Article link: https://blog.csdn.net/m0_63591032/article/details/135071961

Exercise 6-4: Derive the gradients of the parameters in an LSTM network and analyze how well it avoids vanishing gradients

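LSTM (Long Short-Term Memory) is a variant of the recurrent neural network (RNN) for processing sequence data. It uses a gating mechanism to mitigate the vanishing-gradient problem of conventional RNNs. Below we derive the gradients of the parameters in an LSTM network and analyze how it mitigates vanishing gradients.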

First, we define the input gate, forget gate, output gate, cell state, and hidden state of an LSTM unit as follows:

Input gate: $i_{t}=\sigma (W_{xi}x_{t}+W_{hi}h_{t-1}+b_{i})$

Forget gate: $f_{t}=\sigma (W_{xf}x_{t}+W_{hf}h_{t-1}+b_{f})$

Output gate: $o_{t}=\sigma (W_{xo}x_{t}+W_{ho}h_{t-1}+b_{o})$

Cell state: $c_{t}=f_{t}c_{t-1}+i_{t}\tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})$

Hidden state: $h_{t}=o_{t}\tanh(c_{t})$

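Here, $x_{t}$ is the input at the current time step, $h_{t-1}$ is the hidden state from the previous time step, $i_{t}$, $f_{t}$, and $o_{t}$ are the activations of the input, forget, and output gates, $c_{t}$ is the cell state, and $h_{t}$ is the hidden state at the current time step. $W_{*}$ and $b_{*}$ are model parameters, and $\sigma$ is the sigmoid function. Products between gates and states are element-wise.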
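
The definitions above can be sketched as a single forward step. This is a minimal illustration for a scalar input and a single hidden unit, using only the standard library. The function `lstm_step` and the parameter dictionary `p` are hypothetical names introduced here, and the cell update is taken in its standard form $c_{t}=f_{t}c_{t-1}+i_{t}\tanh(\cdot)$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step for a scalar input and a single hidden unit.

    p maps hypothetical parameter names (e.g. "W_xi", "W_hi", "b_i")
    to scalar values, mirroring the symbols in the equations.
    """
    i = sigmoid(p["W_xi"] * x + p["W_hi"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["W_xf"] * x + p["W_hf"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["W_xo"] * x + p["W_ho"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["W_xc"] * x + p["W_hc"] * h_prev + p["b_c"])  # candidate
    c = f * c_prev + i * g   # cell state
    h = o * math.tanh(c)     # hidden state
    return h, c

# Illustrative parameters: with all zeros, every gate is 0.5 and the candidate is 0
names = ["W_xi", "W_hi", "b_i", "W_xf", "W_hf", "b_f",
         "W_xo", "W_ho", "b_o", "W_xc", "W_hc", "b_c"]
params = {name: 0.0 for name in names}

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, p=params)
print(c, h)  # c = 0.5 * 1.0, h = 0.5 * tanh(0.5)
```

With vector-valued states, the same step would use matrix products for the `W` terms and element-wise products for the gates.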

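We now derive the gradients of the parameters in the LSTM network. We take a single time step as an example; for the other time steps, the gradients can be computed by unrolling the network.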

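First, compute $\partial h_{t}/\partial h_{t-1}$. Along the path through the cell state, the chain rule gives the term $(\partial h_{t}/\partial c_{t})\,(\partial c_{t}/\partial h_{t-1})$; the output gate $o_{t}$ also depends on $h_{t-1}$ and contributes the additional term $\tanh(c_{t})\,\partial o_{t}/\partial h_{t-1}$.

Since $h_{t}=o_{t}\tanh(c_{t})$, we have $\partial h_{t}/\partial c_{t}=o_{t}\,\partial \tanh(c_{t})/\partial c_{t}=o_{t}\,(1-\tanh^{2}(c_{t}))$. Because the gates depend on $h_{t-1}$ rather than directly on $c_{t-1}$, the direct derivative of the cell-state recurrence is $\partial c_{t}/\partial c_{t-1}=f_{t}$.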
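
As a sanity check, the sketch below compares central finite differences with the closed forms $\partial h_{t}/\partial c_{t}=o_{t}(1-\tanh^{2}(c_{t}))$ and $\partial c_{t}/\partial c_{t-1}=f_{t}$ for a single scalar unit. The gate values are arbitrary numbers chosen for illustration, and the helper names are hypothetical.

```python
import math

def cell_update(c_prev, f, i, g):
    # c_t = f_t * c_{t-1} + i_t * g_t, where g_t is the tanh candidate
    return f * c_prev + i * g

def hidden(c, o):
    # h_t = o_t * tanh(c_t)
    return o * math.tanh(c)

def central_diff(fn, x, eps=1e-6):
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

# Arbitrary illustrative values for the gates and the previous cell state
f, i, o, g = 0.9, 0.3, 0.7, 0.5
c_prev = 0.8
c = cell_update(c_prev, f, i, g)

numeric_dc = central_diff(lambda v: cell_update(v, f, i, g), c_prev)
numeric_dh = central_diff(lambda v: hidden(v, o), c)
analytic_dh = o * (1 - math.tanh(c) ** 2)

print(numeric_dc, f)            # both approximately the forget gate value
print(numeric_dh, analytic_dh)  # both approximately o * (1 - tanh(c)^2)
```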

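Next, compute the gradient with respect to the input-gate weights, $\partial h_{t}/\partial W_{xi}$: $\partial h_{t}/\partial W_{xi}=(\partial h_{t}/\partial c_{t})\,(\partial c_{t}/\partial W_{xi})$. Since $c_{t}=f_{t}c_{t-1}+i_{t}\tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})$ and $W_{xi}$ enters only through $i_{t}$, we have:

$\partial c_{t}/\partial W_{xi}=\tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})\,\partial i_{t}/\partial W_{xi}=\tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})\,i_{t}(1-i_{t})\,x_{t}$

using $\sigma'(z)=\sigma(z)(1-\sigma(z))$. The candidate weights $W_{xc}$ appear inside the tanh instead, so:

$\partial c_{t}/\partial W_{xc}=i_{t}\tanh'(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})\,x_{t}$

where $\tanh'$ denotes the derivative of tanh, $\tanh'(z)=1-\tanh^{2}(z)$.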

Similarly, we can compute the gradients with respect to the other parameters, such as $\partial h_{t}/\partial W_{hi}$, $\partial h_{t}/\partial W_{xf}$, and $\partial h_{t}/\partial W_{hf}$.
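
The single-step weight gradients can be checked numerically in the same way. The sketch below assumes a scalar unit with $h_{t-1}$, $c_{t-1}$, and $f_{t}$ held fixed, and verifies $\partial c_{t}/\partial W_{xi}=\tanh(\cdot)\,i_{t}(1-i_{t})\,x_{t}$ and $\partial c_{t}/\partial W_{xc}=i_{t}(1-\tanh^{2}(\cdot))\,x_{t}$. All numeric values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative fixed quantities for one time step (assumed values)
x, h_prev, c_prev, f = 1.5, 0.2, 0.8, 0.9
W_hi, b_i, W_hc, b_c = 0.4, 0.1, -0.3, 0.2

def cell_state(W_xi, W_xc):
    i = sigmoid(W_xi * x + W_hi * h_prev + b_i)
    g = math.tanh(W_xc * x + W_hc * h_prev + b_c)
    return f * c_prev + i * g

W_xi, W_xc = 0.5, -0.7
i = sigmoid(W_xi * x + W_hi * h_prev + b_i)
g = math.tanh(W_xc * x + W_hc * h_prev + b_c)

# Closed forms for the single-step derivatives
analytic_dWxi = g * i * (1 - i) * x
analytic_dWxc = i * (1 - g * g) * x

# Central finite differences
eps = 1e-6
numeric_dWxi = (cell_state(W_xi + eps, W_xc) - cell_state(W_xi - eps, W_xc)) / (2 * eps)
numeric_dWxc = (cell_state(W_xi, W_xc + eps) - cell_state(W_xi, W_xc - eps)) / (2 * eps)

print(analytic_dWxi, numeric_dWxi)
print(analytic_dWxc, numeric_dWxc)
```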

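With backpropagation through time, we can compute the gradient of the loss function with respect to each parameter of the LSTM network and update the parameters with gradient descent.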

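The LSTM mitigates the vanishing-gradient problem because its gating mechanism controls how information flows along the time sequence. The forget gate decides whether to discard the past state, the input gate decides whether to accept new input, and the output gate decides what the hidden state exposes. Because the cell state is updated additively, $\partial c_{t}/\partial c_{t-1}=f_{t}$ along the direct path, so the gradient flowing from $c_{t}$ back to $c_{k}$ is the product $f_{t}f_{t-1}\cdots f_{k+1}$. When the forget gates stay close to 1, this product decays slowly, unlike the repeated multiplication by a recurrent weight and an activation derivative in a conventional RNN. The gates therefore let the network handle long-term dependencies better, which mitigates (but does not fully eliminate) vanishing gradients.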
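
A rough numerical illustration of this argument: along the cell-state path the gradient is scaled by one forget-gate value per step, so its size after many steps depends on how close those values are to 1. The numbers below are assumed values for illustration, not measurements from a trained model.

```python
import math

T = 100  # number of time steps the gradient travels back

# LSTM cell-state path: the gradient is the product of the forget gates
grad_gates_near_one = math.prod([0.99] * T)  # forget gates close to 1
grad_gates_half = math.prod([0.5] * T)       # forget gates far from 1

print(f"forget gates 0.99: {grad_gates_near_one:.3f}")
print(f"forget gates 0.50: {grad_gates_half:.3e}")
```

The second case shows that the benefit is conditional: if the forget gates sit well below 1, the gradient along the cell path still shrinks geometrically.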

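In summary, the LSTM alleviates the vanishing-gradient problem of conventional RNNs through its gating mechanism and the memory held in the cell state. This lets it capture long-term dependencies in sequences better and improves model performance.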

Exercise 6-3P: Implement in code the LSTM run shown in the figure below

A classmate pointed out that an input could not be found. The example may be modified as appropriate to add that input.

Implement the LSTM operator; the code in the lab textbook can be used as a reference.

1. Implementing the LSTM operator with NumPy

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

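# Each input row is [x1, x2, x3, 1]; the trailing 1 multiplies the bias weight.
Wi = np.array([0, 100, 0, -10])  # input gate
Wc = np.array([1, 0, 0, 0])      # candidate value (passes x1 through)
Wf = np.array([0, 100, 0, 10])   # forget gate
Wo = np.array([0, 0, 100, -10])  # output gate

x = [[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1], [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1], [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]]
xt = np.array(x)

a_prev = 0  # previous output
c_prev = 0  # previous cell (memory) value
memory = []
y = []

for i in range(xt.shape[0]):
    # Record the memory value held before this step's update
    memory.append(round(float(c_prev)))
    ft = sigmoid(np.dot(xt[i], Wf))
    it = sigmoid(np.dot(xt[i], Wi))
    ct = np.dot(xt[i], Wc)
    ot = sigmoid(np.dot(xt[i], Wo))
    # No tanh is applied here: the candidate and the output are linear
    c_prev = c_prev * ft + it * ct
    a_prev = ot * c_prev
    y.append(round(float(a_prev)))

print("Memory: ", memory)
print("y", y)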
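
To see why these weights behave like explicit write, reset, and read switches, the sketch below evaluates the three gates for a few representative input rows. It uses only the standard library. The helper `gate` and the pattern labels are names introduced here for illustration; the weights and rows are taken from the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Gate weights from the example; each input row is [x1, x2, x3, 1]
Wi = [0, 100, 0, -10]
Wf = [0, 100, 0, 10]
Wo = [0, 0, 100, -10]

def gate(w, x):
    # Sigmoid of the weighted sum of one input row
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))

patterns = {
    "x2 = 1  (write)": [3, 1, 0, 1],
    "x2 = -1 (reset)": [3, -1, 0, 1],
    "x3 = 1  (read)": [1, 0, 1, 1],
    "x2 = x3 = 0 (hold)": [2, 0, 0, 1],
}

for label, row in patterns.items():
    i, f, o = gate(Wi, row), gate(Wf, row), gate(Wo, row)
    print(f"{label:20s} i={i:.4f} f={f:.4f} o={o:.4f}")
```

The large weights (100) saturate the sigmoids, so each gate is effectively 0 or 1: `x2 = 1` opens the input gate, `x2 = -1` closes the forget gate and clears the memory, and `x3 = 1` opens the output gate.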

2. Implementation with nn.LSTMCell

import torch
import torch.nn as nn

# Set the random seed
torch.manual_seed(0)

# Input and hidden sizes
input_size = 4
hidden_size = 1

# Input data
xt = torch.tensor([
    [1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1],
    [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1],
    [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]
], dtype=torch.float32)

# LSTM cell without bias terms
lstm_cell = nn.LSTMCell(input_size, hidden_size, bias=False)

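# Initialize the weights. PyTorch stacks the gate rows in the order
# input, forget, cell candidate, output (W_ii | W_if | W_ig | W_io).
with torch.no_grad():
    lstm_cell.weight_ih[:] = torch.tensor([
        [0., 100., 0., -10.],  # input gate
        [0., 100., 0., 10.],   # forget gate
        [1., 0., 0., 0.],      # cell candidate (passed through tanh)
        [0., 0., 100., -10.]   # output gate
    ])
    lstm_cell.weight_hh[:] = torch.zeros([hidden_size * 4, hidden_size])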

# Initialize the hidden state and cell state
hx = torch.zeros(1, hidden_size)
cx = torch.zeros(1, hidden_size)

# Store the cell state and output at each step
cell_memory = []
cell_y = []

# Iterate over the input sequence
for i in range(xt.shape[0]):
    hx, cx = lstm_cell(xt[i].unsqueeze(0), (hx, cx))
    cell_memory.append(round(cx.item()))
    cell_y.append(round(hx.item()))

# Print the cell states and outputs
print(cell_memory)
print(cell_y)
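
For comparison with the NumPy version, the sketch below re-implements the equations that nn.LSTMCell documents, in plain Python for a single hidden unit. It assumes the weight rows are given in PyTorch's documented gate order (input, forget, cell candidate, output) and that the hidden-to-hidden weights are zero, as in this exercise. It is an emulation for illustration, not PyTorch itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

# Rows in PyTorch's documented order: input, forget, cell candidate, output
W_ii = [0, 100, 0, -10]
W_if = [0, 100, 0, 10]
W_ig = [1, 0, 0, 0]
W_io = [0, 0, 100, -10]

xs = [[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1],
      [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1],
      [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]]

h, c = 0.0, 0.0
memory, y = [], []
for x in xs:
    i = sigmoid(dot(W_ii, x))
    f = sigmoid(dot(W_if, x))
    o = sigmoid(dot(W_io, x))
    g = math.tanh(dot(W_ig, x))  # tanh bounds the candidate to (-1, 1)
    c = f * c + i * g
    h = o * math.tanh(c)         # tanh also bounds the output
    memory.append(round(c))
    y.append(round(h))

print(memory)
print(y)
```

Because of the two tanh nonlinearities, each write adds at most 1 to the cell state and the output never exceeds 1 in magnitude, so these values differ from the linear NumPy version, where the outputs at the two read steps are 7 and 6.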

3. Implementation with nn.LSTM

import torch
import torch.nn as nn

# Set the random seed
torch.manual_seed(0)

# Input size, hidden size, and number of layers
input_size = 4
hidden_size = 1
num_layers = 1

# Input data
xt = torch.tensor([
    [1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1],
    [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1],
    [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]
], dtype=torch.float32)

# LSTM model without bias terms
lstm = nn.LSTM(input_size, hidden_size, num_layers, bias=False)

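# Initialize the weights. PyTorch stacks the gate rows in the order
# input, forget, cell candidate, output (W_ii | W_if | W_ig | W_io).
with torch.no_grad():
    lstm.weight_ih_l0[:] = torch.tensor([
        [0., 100., 0., -10.],  # input gate
        [0., 100., 0., 10.],   # forget gate
        [1., 0., 0., 0.],      # cell candidate (passed through tanh)
        [0., 0., 100., -10.]   # output gate
    ])
    lstm.weight_hh_l0[:] = torch.zeros([hidden_size * 4, hidden_size])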

# Initialize the hidden state and cell state
hx = torch.zeros(num_layers, 1, hidden_size)
cx = torch.zeros(num_layers, 1, hidden_size)

# Forward pass; the input has shape (seq_len, batch, input_size)
output, _ = lstm(xt.unsqueeze(1), (hx, cx))

# Print the results
for i, y in enumerate(output.squeeze(1)):
    print(f"{i} : {round(y.item())}")