[Fall Semester 2023-24] NNDL Assignment 11: LSTM

t落

Last modified 2023-12-19 21:13:03


Tags: lstm, machine learning, artificial intelligence

First published 2023-12-19 21:10:35

Article link: https://blog.csdn.net/y18733648428/article/details/135061054


Exercise 6-4: Derive the gradients of the parameters in an LSTM network and analyze how well it avoids vanishing gradients

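The LSTM (long short-term memory network) is a variant of the simple recurrent neural network that can effectively alleviate the exploding and vanishing gradient problems of simple recurrent networks. Compared with a simple recurrent network, the LSTM adds three control gates: the input gate, the forget gate, and the output gate.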

First, write out the LSTM update equations:

Input gate: $i_t = \sigma(W_i X_t + U_i h_{t-1} + b_i)$

Forget gate: $f_t = \sigma(W_f X_t + U_f h_{t-1} + b_f)$

Candidate state (obtained through a nonlinearity): $\tilde{C}_t = \tanh(W_c X_t + U_c h_{t-1} + b_c)$

Output gate: $O_t = \sigma(W_o X_t + U_o h_{t-1} + b_o)$

Internal state: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

External state (the information passed on to the hidden state): $h_t = O_t \odot \tanh(C_t)$

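Here $X_t$ is the current input, $h_{t-1}$ is the hidden state at the previous time step, $C_{t-1}$ is the cell state at the previous time step, $i_t$ is the output of the input gate, $f_t$ is the output of the forget gate, $C_t$ is the current cell state, $O_t$ is the output of the output gate, and $h_t$ is the current hidden state.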
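To make the update equations above concrete, here is a minimal NumPy sketch of a single LSTM step. The function name `lstm_step`, the `params` dictionary, and the random sizes are assumptions made for illustration only; they are not taken from the assignment or the lab textbook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the six update equations.

    params is a hypothetical dict holding W_g, U_g, b_g for g in {i, f, c, o}.
    """
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])
    c_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])
    c_t = f_t * c_prev + i_t * c_tilde   # internal state
    h_t = o_t * np.tanh(c_t)             # external (hidden) state
    return h_t, c_t

# Illustrative sizes and random parameters (assumptions, not from the exercise)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
params = {}
for g in "ifco":
    params[f"W_{g}"] = rng.normal(size=(n_hid, n_in))
    params[f"U_{g}"] = rng.normal(size=(n_hid, n_hid))
    params[f"b_{g}"] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
print("h:", h)
print("c:", c)
```

Because $h_t$ is a gate value in (0, 1) times a tanh, every component of `h` stays within [-1, 1], while `c` is not bounded in the same way.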

Next, derive the gradients of the parameters. Taking the weight $W_i$ as an example:

$\frac{\partial L}{\partial W_i} = \sum_{t=1}^{T} \frac{\partial L}{\partial i_t} \cdot \frac{\partial i_t}{\partial W_i}$

By the chain rule:

$\frac{\partial i_t}{\partial W_i} = \frac{\partial}{\partial W_i} \sigma(W_i X_t + U_i h_{t-1} + b_i) = \sigma'(W_i X_t + U_i h_{t-1} + b_i) \cdot X_t$

where $\sigma'$ is the derivative of the sigmoid function, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

In the same way, the derivatives with respect to the weights $W_f$ and $W_o$ are:

$\frac{\partial f_t}{\partial W_f} = \frac{\partial}{\partial W_f} \sigma(W_f X_t + U_f h_{t-1} + b_f) = \sigma'(W_f X_t + U_f h_{t-1} + b_f) \cdot X_t$

$\frac{\partial O_t}{\partial W_o} = \frac{\partial}{\partial W_o} \sigma(W_o X_t + U_o h_{t-1} + b_o) = \sigma'(W_o X_t + U_o h_{t-1} + b_o) \cdot X_t$

The same reasoning gives the derivatives with respect to the recurrent weight and the bias (taking $f_t$ as an example):

$\frac{\partial f_t}{\partial U_f} = \frac{\partial}{\partial U_f} \sigma(W_f X_t + U_f h_{t-1} + b_f) = \sigma'(W_f X_t + U_f h_{t-1} + b_f) \cdot h_{t-1}$

$\frac{\partial f_t}{\partial b_f} = \frac{\partial}{\partial b_f} \sigma(W_f X_t + U_f h_{t-1} + b_f) = \sigma'(W_f X_t + U_f h_{t-1} + b_f)$

The derivatives of $i_t$ and $O_t$ with respect to their $U$ and $b$ parameters have the same form as the expressions above.
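The forget-gate derivatives with respect to $W_f$, $U_f$ and $b_f$ can be sanity-checked numerically in the scalar case. The sketch below compares the analytic expressions with central finite differences; the numeric values of the parameters and inputs are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scalar values, chosen only for illustration
W_f, U_f, b_f = 0.5, -0.3, 0.1
x_t, h_prev = 1.5, 0.8

def forget_gate(W, U, b):
    return sigmoid(W * x_t + U * h_prev + b)

# Analytic derivatives: sigma'(z) times x_t, h_{t-1}, and 1 respectively
z = W_f * x_t + U_f * h_prev + b_f
sig_prime = sigmoid(z) * (1.0 - sigmoid(z))
analytic = {
    "W_f": sig_prime * x_t,
    "U_f": sig_prime * h_prev,
    "b_f": sig_prime,
}

# Central finite differences for comparison
eps = 1e-6
numeric = {
    "W_f": (forget_gate(W_f + eps, U_f, b_f) - forget_gate(W_f - eps, U_f, b_f)) / (2 * eps),
    "U_f": (forget_gate(W_f, U_f + eps, b_f) - forget_gate(W_f, U_f - eps, b_f)) / (2 * eps),
    "b_f": (forget_gate(W_f, U_f, b_f + eps) - forget_gate(W_f, U_f, b_f - eps)) / (2 * eps),
}

for name in analytic:
    print(name, "analytic:", analytic[name], "numeric:", numeric[name])
```

The two columns should agree to many decimal places; the same check applies to the input and output gates after swapping in their parameters.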

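The design of the LSTM allows it to mitigate the vanishing-gradient problem effectively. The key is the additive update of the internal state: from $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, the direct dependence of $C_t$ on $C_{t-1}$ gives $\frac{\partial C_t}{\partial C_{t-1}} = f_t$ (ignoring the indirect paths through $h_{t-1}$), so the gradient carried along the cell state over several steps is scaled by the product of the forget gates. When the forget gates stay close to 1, this product shrinks slowly and the error signal survives over long spans. In a simple recurrent network, by contrast, every step multiplies the gradient by the recurrent weight and the derivative of the activation, which tends to make it shrink or grow exponentially.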
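The effect on gradient flow can be illustrated with a small scalar experiment. For a simple recurrent network $h_t = \tanh(u h_{t-1} + w X_t)$, each step contributes a factor $u(1 - h_t^2)$ to $\partial h_T / \partial h_0$; along the LSTM cell state, the direct path contributes a factor $f_t$ per step. The sketch below compares the two products over 50 steps. The weights, the random inputs, and the fixed forget-gate pre-activation of 4 are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

T = 50
rng = np.random.default_rng(0)
xs = rng.normal(size=T)

# Simple RNN (scalar): h_t = tanh(u * h_{t-1} + w * x_t),
# so d h_t / d h_{t-1} = u * (1 - h_t**2)
u, w = 0.9, 1.0
h = 0.0
rnn_factor = 1.0
for x_t in xs:
    h = np.tanh(u * h + w * x_t)
    rnn_factor *= u * (1.0 - h ** 2)

# LSTM direct path along the cell state: d C_t / d C_{t-1} = f_t.
# Assumption: the forget gate sits at sigmoid(4) at every step.
f_t = sigmoid(4.0)
lstm_factor = f_t ** T

print("simple RNN gradient factor over", T, "steps:", rnn_factor)
print("LSTM cell-state gradient factor over", T, "steps:", lstm_factor)
```

With these settings the simple-RNN factor is bounded above by $0.9^{50}$, while the LSTM factor stays at a usable magnitude. This only illustrates the direct cell-state path; it does not show that the LSTM is immune to vanishing or exploding gradients in general.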

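Exercise 6-3P: Implement in code the LSTM run shown in the figure below

A classmate pointed out that the figure shows no $h_{t-1}$ input. The example can be modified as appropriate to add this input.

Implement the LSTM operator; the code in the lab textbook can be used as a reference.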
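One way the $h_{t-1}$ input mentioned above might be added is sketched here, using the input sequence and gate weights of this exercise. The recurrent weights `U` are hypothetical: the exercise does not specify them, so the sketch first sets them to zero (which reduces to the case without $h_{t-1}$) and then tries one arbitrary nonzero value purely to show that the recurrent input changes the result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Input sequence of the exercise; the last column is the constant 1 (bias input)
x = np.array([[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1],
              [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1],
              [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]], dtype=float)

# Input weights of the exercise
W = {"i": np.array([0, 100, 0, -10.0]),   # input gate
     "f": np.array([0, 100, 0, 10.0]),    # forget gate
     "c": np.array([1, 0, 0, 0.0]),       # candidate state
     "o": np.array([0, 0, 100, -10.0])}   # output gate

def run_lstm(x, W, U):
    """Scalar LSTM over the sequence; U holds hypothetical weights on h_{t-1}."""
    h, c = 0.0, 0.0
    hs = []
    for x_t in x:
        i_t = sigmoid(W["i"] @ x_t + U["i"] * h)
        f_t = sigmoid(W["f"] @ x_t + U["f"] * h)
        o_t = sigmoid(W["o"] @ x_t + U["o"] * h)
        c_tilde = np.tanh(W["c"] @ x_t + U["c"] * h)
        c = f_t * c + i_t * c_tilde
        h = o_t * np.tanh(c)
        hs.append(h)
    return hs

# Zero recurrent weights: identical to an LSTM without the h_{t-1} input
U_zero = dict.fromkeys("ifco", 0.0)
hs_zero = run_lstm(x, W, U_zero)
print([round(float(v)) for v in hs_zero])

# Arbitrary illustrative choice: a large recurrent weight on the forget gate
U_demo = {"i": 0.0, "f": 100.0, "c": 0.0, "o": 0.0}
hs_demo = run_lstm(x, W, U_demo)
print(np.round(hs_demo, 3))
```

With `U_demo`, the nonzero output at step 6 feeds back into the forget gate at step 7 and largely overrides the reset, so the final output differs from the zero-weight run. This is only meant to show where $h_{t-1}$ enters the computation, not to suggest suitable values for the exercise.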

1. Implementing the LSTM operator with NumPy

Code:

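import numpy as np

# Input sequence: columns are x1, x2, x3 and a constant 1 that acts as the bias input
x = np.array([[1, 0, 0, 1],
              [3, 1, 0, 1],
              [2, 0, 0, 1],
              [4, 1, 0, 1],
              [2, 0, 0, 1],
              [1, 0, 1, 1],
              [3, -1, 0, 1],
              [6, 1, 0, 1],
              [1, 0, 1, 1]])
# Weights (the last entry of each vector multiplies the constant-1 bias column)
W_i = np.array([0, 100, 0, -10])  # input gate
W_o = np.array([0, 0, 100, -10])  # output gate
W_f = np.array([0, 100, 0, 10])   # forget gate
W_c = np.array([1, 0, 0, 0])      # internal (candidate) state

# Hard-thresholded sigmoid: with weights this large the gates saturate,
# so each gate is treated as exactly 0 or 1
def sigmoid(x):
    y = 1 / (1 + np.exp(-x))
    if y >= 0.5:
        return 1
    else:
        return 0

# Initialize variables
temp = 0      # LSTM cell state c_t
t = 0         # memory value traced by hand from the figure's rules
y = []        # LSTM outputs h_t
p = []        # output values traced by hand from the figure's rules
memory = []
for x_t in x:  # x_t avoids shadowing the built-in input()
    memory.append(t)  # record the memory value before this step's update
    temp_c = np.tanh(np.sum(np.multiply(x_t, W_c)))
    temp_input = sigmoid(np.sum(np.multiply(x_t, W_i)))
    temp_forget = sigmoid(np.sum(np.multiply(x_t, W_f)))
    temp_output = sigmoid(np.sum(np.multiply(x_t, W_o)))
    temp = temp_c * temp_input + temp_forget * temp
    # The rules below reproduce the figure's values by hand; they are not computed by the LSTM
    if x_t[1] == 1:
        t += x_t[0]       # x2 = 1: add x1 to the memory
    if x_t[1] == -1:
        t = 0             # x2 = -1: reset the memory
    if x_t[2] == 1:
        p.append(t)       # x3 = 1: output the memory
    if x_t[2] == 0:
        p.append(0)       # otherwise the output h is 0
    y.append(temp_output * np.tanh(temp))  # result produced by the LSTM
print('c:', memory)
print('output:', p)
outputs = [round(v) for v in y]  # round the values in y to integers
print(outputs)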

Result:

2. Implementation with nn.LSTMCell

Code:

import torch
import torch.nn as nn

# nn.LSTMCell processes one time step per call and expects input of shape (batch_size, input_size)
# time_steps = 9, batch_size = 1, input_size = 4
x = torch.tensor([[1, 0, 0, 1],
                  [3, 1, 0, 1],
                  [2, 0, 0, 1],
                  [4, 1, 0, 1],
                  [2, 0, 0, 1],
                  [1, 0, 1, 1],
                  [3, -1, 0, 1],
                  [6, 1, 0, 1],
                  [1, 0, 1, 1]], dtype=torch.float)
# Insert a batch dimension: x becomes (9, 1, 4), so each x[i] has shape (1, 4)
x = x.unsqueeze(1)
# Input size and hidden size of the LSTM
input_size = 4
hidden_size = 1

# Define an LSTM cell (bias=False: the constant-1 input column plays the role of the bias)
lstm_cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size, bias=False)
# Set the input-to-hidden weights. PyTorch stacks the gate rows in the order
# input, forget, cell (candidate), output
lstm_cell.weight_ih.data = torch.tensor([[0, 100, 0, -10],  # input gate
                                         [0, 100, 0, 10],   # forget gate
                                         [1, 0, 0, 0],      # cell (candidate) state
                                         [0, 0, 100, -10]], dtype=torch.float)  # output gate
# Set the hidden-to-hidden weights to an all-zero matrix
lstm_cell.weight_hh.data = torch.zeros([4 * hidden_size, hidden_size])
# Initialize the hidden state and cell state to zeros
hx = torch.zeros(1, hidden_size)
cx = torch.zeros(1, hidden_size)
# List that collects the outputs
outputs = []
# Feed each time step through the LSTM cell and collect the hidden state
for i in range(len(x)):
    hx, cx = lstm_cell(x[i], (hx, cx))
    outputs.append(hx.item())
# Round the outputs to integers and print them
outputs_rounded = [round(v) for v in outputs]
print(outputs_rounded)

Result:

3. Implementation with nn.LSTM

Code:

import torch
import torch.nn as nn

# nn.LSTM expects input of shape (sequence_length, batch_size, input_size) by default
# sequence_length = 9, batch_size = 1, input_size = 4
x = torch.tensor([[1, 0, 0, 1],
                  [3, 1, 0, 1],
                  [2, 0, 0, 1],
                  [4, 1, 0, 1],
                  [2, 0, 0, 1],
                  [1, 0, 1, 1],
                  [3, -1, 0, 1],
                  [6, 1, 0, 1],
                  [1, 0, 1, 1]], dtype=torch.float)
# Insert a batch dimension so x has shape (sequence_length, batch_size, input_size)
# and can be accepted by the LSTM model
x = x.unsqueeze(1)

# Input size and hidden size of the LSTM
input_size = 4
hidden_size = 1

# Define the LSTM model; bias=False disables the bias terms
# (the constant-1 input column plays the role of the bias)
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, bias=False)

# Set the LSTM weight matrices
# weight_ih_l0 holds the input-to-hidden weights of layer 0; PyTorch stacks the
# four gate rows in the order input, forget, cell (candidate), output
lstm.weight_ih_l0.data = torch.tensor([[0, 100, 0, -10],  # input gate
                                       [0, 100, 0, 10],   # forget gate
                                       [1, 0, 0, 0],      # cell (candidate) state
                                       [0, 0, 100, -10]], dtype=torch.float)  # output gate
# weight_hh_l0 holds the hidden-to-hidden weights
lstm.weight_hh_l0.data = torch.zeros([4 * hidden_size, hidden_size])

# Initialize the hidden state and the cell state
hx = torch.zeros(1, 1, hidden_size)
cx = torch.zeros(1, 1, hidden_size)

# Forward pass over the whole sequence
outputs, (hx, cx) = lstm(x, (hx, cx))
outputs = outputs.squeeze().tolist()
# Round the results to integers so they are easier to display and inspect
outputs_rounded = [round(v) for v in outputs]
print(outputs_rounded)

Result:
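As a framework-independent cross-check of the row order in the stacked weight matrices, the following NumPy sketch mimics the layout that PyTorch documents for `weight_ih` (gate rows stacked as input, forget, cell, output). It runs the sequence once with the rows in that order and once with the input and forget rows swapped. The helper `run_stacked` and the variable names are hypothetical and exist only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1],
              [4, 1, 0, 1], [2, 0, 0, 1], [1, 0, 1, 1],
              [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]], dtype=float)

def run_stacked(x, W_stacked):
    """Scalar LSTM without recurrent weights or bias; the rows of W_stacked
    are read in the order input, forget, cell, output."""
    w_i, w_f, w_g, w_o = W_stacked
    c = 0.0
    hs, cs = [], []
    for x_t in x:
        i_t = sigmoid(w_i @ x_t)
        f_t = sigmoid(w_f @ x_t)
        o_t = sigmoid(w_o @ x_t)
        c = f_t * c + i_t * np.tanh(w_g @ x_t)
        hs.append(o_t * np.tanh(c))
        cs.append(c)
    return np.array(hs), np.array(cs)

W_ordered = np.array([[0, 100, 0, -10],    # input gate
                      [0, 100, 0, 10],     # forget gate
                      [1, 0, 0, 0],        # cell (candidate) state
                      [0, 0, 100, -10]],   # output gate
                     dtype=float)
W_swapped = W_ordered[[1, 0, 2, 3]]        # input and forget rows exchanged

h_ok, c_ok = run_stacked(x, W_ordered)
h_sw, c_sw = run_stacked(x, W_swapped)

print("rounded h, documented order:", np.round(h_ok).astype(int).tolist())
print("rounded h, swapped rows:   ", np.round(h_sw).astype(int).tolist())
print("c, documented order:", np.round(c_ok, 2))
print("c, swapped rows:    ", np.round(c_sw, 2))
```

For this particular sequence the rounded outputs coincide in both runs even though the cell states differ clearly, so the rounded outputs alone do not confirm that the gate rows are in the right order. Inspecting the cell state is the more reliable check.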