NNDL Assignment 11: LSTM

Latest recommended article published on 2024-09-12 18:02:47

dabulalala



Tags: lstm, artificial intelligence, rnn

Article link: https://blog.csdn.net/m0_72169474/article/details/135023550


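This article explains how an LSTM network uses its gating mechanism (forget gate, input gate and output gate) to mitigate vanishing gradients, and gives examples of implementing an LSTM operator and an LSTM cell with NumPy and PyTorch.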


Exercise 6-4: Derive the gradients of the parameters in an LSTM network and analyze how well it avoids vanishing gradients

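In backpropagation, the parameters to update are still W, U and b. The difference is that the LSTM adds gates, so the input gate, the output gate, the forget gate and the candidate state each have their own W, U and b.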
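As a reference for the derivation, the forward pass and the gradient of the forget-gate parameters can be written as follows. This is a sketch in the common textbook notation, which is an assumption here since the post does not fix its own symbols: $\delta^c_t$ and $\delta^h_t$ denote the total derivatives of the loss $\mathcal{L}$ with respect to $c_t$ and $h_t$.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t). \\[6pt]
\delta^c_t &= \delta^h_t \odot o_t \odot \bigl(1 - \tanh^2(c_t)\bigr)
             + \delta^c_{t+1} \odot f_{t+1}, \\
\frac{\partial \mathcal{L}}{\partial W_f}
  &= \sum_t \bigl(\delta^c_t \odot c_{t-1} \odot f_t \odot (1 - f_t)\bigr)\, x_t^{\top}, \\
\frac{\partial \mathcal{L}}{\partial U_f}
  &= \sum_t \bigl(\delta^c_t \odot c_{t-1} \odot f_t \odot (1 - f_t)\bigr)\, h_{t-1}^{\top}, \\
\frac{\partial \mathcal{L}}{\partial b_f}
  &= \sum_t \delta^c_t \odot c_{t-1} \odot f_t \odot (1 - f_t).
\end{aligned}
```

The gradients for the input gate, the output gate and the candidate state follow the same pattern, with the corresponding local derivative in place of $c_{t-1} \odot f_t \odot (1 - f_t)$.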

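The LSTM mitigates the vanishing-gradient problem through its gating mechanism. The forget, input and output gates are sigmoid outputs in (0, 1) that in practice often saturate close to 0 or 1, and the cell state is updated additively: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t. Along this path the gradient from c_t back to c_{t-1} is scaled only by f_t, rather than being repeatedly multiplied by a weight matrix and an activation derivative, so when the forget gate is close to 1 the gradient passes through almost unchanged and vanishing gradients become much less likely. When a gate is close to 0, the information from the previous time step has no effect on the current one, so no gradient needs to be passed back through it to update the parameters.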
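The effect can be illustrated numerically. The sketch below is a simplified scalar illustration, not part of the original exercise: it compares the gradient along the direct cell-state path of an LSTM (a product of forget-gate values) with the gradient of a scalar simple RNN h_t = tanh(u * h_{t-1}). The function names and the values 0.99, 0.5 and u = 0.9 are assumptions chosen for the demonstration.

```python
import math


def lstm_cell_state_gradient(forget_gates):
    """d c_T / d c_0 along the direct cell-state path: the product of the forget gates."""
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad


def simple_rnn_gradient(u, h0, steps):
    """d h_T / d h_0 for the scalar recurrence h_t = tanh(u * h_{t-1})."""
    grad, h = 1.0, h0
    for _ in range(steps):
        h = math.tanh(u * h)
        grad *= u * (1.0 - h * h)  # chain rule through tanh(u * h_prev)
    return grad


steps = 50
open_gate = lstm_cell_state_gradient([0.99] * steps)   # forget gate almost fully open
closed_gate = lstm_cell_state_gradient([0.5] * steps)  # forget gate half closed
rnn = simple_rnn_gradient(u=0.9, h0=0.5, steps=steps)

print("LSTM, f = 0.99:", open_gate)
print("LSTM, f = 0.50:", closed_gate)
print("simple RNN    :", rnn)
```

With the forget gate near 1 the gradient survives 50 steps at a useful magnitude, while a half-closed gate or the simple RNN shrinks it by orders of magnitude. This is why the gating reduces vanishing gradients but does not rule them out.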

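Exercise 6-3P: Implement in code the LSTM run shown in the figure below

A classmate pointed out that the figure has no h(t-1) input. The example can be modified to add it.

Implement the LSTM operator; the code in the lab textbook can be used as a reference.

1. Implementing the LSTM operator with NumPy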

import numpy as np
 
x = np.array([[1, 0, 0, 1],
              [3, 1, 0, 1],
              [2, 0, 0, 1],
              [4, 1, 0, 1],
              [2, 0, 0, 1],
              [1, 0, 1, 1],
              [3, -1, 0, 1],
              [6, 1, 0, 1],
              [1, 0, 1, 1]])
# Each weight vector acts on [x1, x2, x3, 1]; the last input component is a constant 1 (bias).
inputGate_W = np.array([0, 100, 0, -10])
outputGate_W = np.array([0, 0, 100, -10])
forgetGate_W = np.array([0, 100, 0, 10])
c_W = np.array([1, 0, 0, 0])


def sigmoid(x):
    # Hard-thresholded sigmoid: the gates in this example saturate, so round them to 0 or 1.
    y = 1 / (1 + np.exp(-x))
    return 1 if y >= 0.5 else 0


temp = 0  # current value of the memory cell
y = []    # output at each time step
c = []    # memory value at the start of each time step (before the update)
for x_t in x:
    c.append(temp)
    temp_c = np.sum(np.multiply(x_t, c_W))  # candidate value to write into memory
    temp_input = sigmoid(np.sum(np.multiply(x_t, inputGate_W)))
    temp_forget = sigmoid(np.sum(np.multiply(x_t, forgetGate_W)))
    temp_output = sigmoid(np.sum(np.multiply(x_t, outputGate_W)))
    # New memory = gated candidate + gated old memory; int() keeps the printed lists clean.
    temp = int(temp_c * temp_input + temp_forget * temp)
    y.append(int(temp_output * temp))
print("memory:", c)
print("y     :", y)
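The h(t-1) input mentioned in the exercise could be added along the following lines. This is a minimal stdlib-only sketch, not the textbook's solution: the recurrent weights `U` are hypothetical (the figure defines none), the gates use the same hard threshold as the NumPy example, and memory is recorded after each update rather than before it.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def hard_gate(z):
    """Step approximation of a saturated sigmoid: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def run_lstm(xs, W, U):
    """Run the toy LSTM. W holds input weights per gate; U holds the
    hypothetical scalar weight applied to the previous output h_{t-1}."""
    c, h = 0, 0
    memory, outputs = [], []
    for x_t in xs:
        i = hard_gate(dot(x_t, W["i"]) + U["i"] * h)
        f = hard_gate(dot(x_t, W["f"]) + U["f"] * h)
        o = hard_gate(dot(x_t, W["o"]) + U["o"] * h)
        g = dot(x_t, W["c"]) + U["c"] * h  # linear candidate, as in the example above
        c = f * c + i * g                  # update the memory cell
        h = o * c                          # gated output, fed back at the next step
        memory.append(c)
        outputs.append(h)
    return memory, outputs


xs = [[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1], [4, 1, 0, 1], [2, 0, 0, 1],
      [1, 0, 1, 1], [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]]
W = {"i": [0, 100, 0, -10], "f": [0, 100, 0, 10],
     "o": [0, 0, 100, -10], "c": [1, 0, 0, 0]}

# With all recurrent weights at zero, h_{t-1} is ignored.
no_recurrence = {"i": 0, "f": 0, "o": 0, "c": 0}
mem_a, out_a = run_lstm(xs, W, no_recurrence)

# Hypothetical choice: a non-zero previous output forces the input gate open.
with_recurrence = {"i": 100, "f": 0, "o": 0, "c": 0}
mem_b, out_b = run_lstm(xs, W, with_recurrence)

print("outputs without h(t-1):", out_a)
print("outputs with h(t-1)   :", out_b)
```

Setting every entry of `U` to zero reproduces the outputs of the NumPy version, which makes it easy to check that the added input does not break the original behaviour.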

2. Implementing it with nn.LSTMCell

import torch
import torch.nn as nn
 
 
# Reshape x to (time_steps, batch_size, input_size) so that each x[i] has the
# (batch_size, input_size) shape that nn.LSTMCell expects
# time_steps = 9, batch_size = 1, input_size = 4
x = torch.tensor([[1, 0, 0, 1],
                  [3, 1, 0, 1],
                  [2, 0, 0, 1],
                  [4, 1, 0, 1],
                  [2, 0, 0, 1],
                  [1, 0, 1, 1],
                  [3, -1, 0, 1],
                  [6, 1, 0, 1],
                  [1, 0, 1, 1]], dtype=torch.float)
x = x.unsqueeze(1)
# Input size and hidden size of the LSTM
input_size = 4
hidden_size = 1

# Define the LSTM cell
lstm_cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size, bias=False)

# PyTorch stacks the rows of weight_ih in the order: input, forget, cell (g), output.
lstm_cell.weight_ih.data = torch.tensor([[0, 100, 0, -10],   # input gate
                                         [0, 100, 0, 10],    # forget gate
                                         [1, 0, 0, 0],       # cell (candidate) gate
                                         [0, 0, 100, -10]]).float()  # output gate
lstm_cell.weight_hh.data = torch.zeros([4 * hidden_size, hidden_size])
#https://runebook.dev/zh/docs/pytorch/generated/torch.nn.lstmcell

hx = torch.zeros(1, hidden_size)
cx = torch.zeros(1, hidden_size)
outputs = []
for i in range(len(x)):
    hx, cx = lstm_cell(x[i], (hx, cx))
    outputs.append(hx.item())
# nn.LSTMCell applies tanh to the candidate and to the cell state, so every output lies
# in (-1, 1) and cannot reproduce the 7 and 6 of the NumPy version.
outputs_rounded = [round(v) for v in outputs]
print(outputs_rounded)
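To see what the PyTorch cell computes with these weights, the standard LSTM cell equations can be evaluated by hand. The sketch below is a stdlib-only illustration under stated assumptions: it uses the gate weights from the NumPy example, zero recurrent weights and no bias, and the equations documented for nn.LSTMCell (sigmoid gates, tanh on the candidate and on the cell state). The helper names are made up for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Gate weights from the NumPy example, one vector per gate.
W_i = [0, 100, 0, -10]   # input gate
W_f = [0, 100, 0, 10]    # forget gate
W_g = [1, 0, 0, 0]       # cell candidate
W_o = [0, 0, 100, -10]   # output gate


def lstm_cell_step(x_t, c_prev):
    """One LSTM step with zero recurrent weights, so h_{t-1} does not appear."""
    i = sigmoid(dot(x_t, W_i))
    f = sigmoid(dot(x_t, W_f))
    g = math.tanh(dot(x_t, W_g))   # the candidate is squashed into (-1, 1)
    o = sigmoid(dot(x_t, W_o))
    c = f * c_prev + i * g
    h = o * math.tanh(c)           # the output is squashed again
    return h, c


xs = [[1, 0, 0, 1], [3, 1, 0, 1], [2, 0, 0, 1], [4, 1, 0, 1], [2, 0, 0, 1],
      [1, 0, 1, 1], [3, -1, 0, 1], [6, 1, 0, 1], [1, 0, 1, 1]]

c = 0.0
hs = []
for x_t in xs:
    h, c = lstm_cell_step(x_t, c)
    hs.append(h)

rounded = [round(h) for h in hs]
print(rounded)
```

Because of the two tanh applications, the non-zero outputs are roughly 0.96 and 0.76 rather than 7 and 6, so after rounding the sequence only marks the time steps at which the output gate opens.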

3. Implementing it with nn.LSTM

import torch
import torch.nn as nn
 
 
# Reshape x because nn.LSTM expects (sequence_length, batch_size, input_size) by default
# sequence_length = 9, batch_size = 1, input_size = 4
x = torch.tensor([[1, 0, 0, 1],
                  [3, 1, 0, 1],
                  [2, 0, 0, 1],
                  [4, 1, 0, 1],
                  [2, 0, 0, 1],
                  [1, 0, 1, 1],
                  [3, -1, 0, 1],
                  [6, 1, 0, 1],
                  [1, 0, 1, 1]], dtype=torch.float)
x = x.unsqueeze(1)
 
# Input size and hidden size of the LSTM
input_size = 4
hidden_size = 1

# Define the LSTM model
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, bias=False)

# Set the LSTM weight matrices.
# PyTorch stacks the rows of weight_ih_l0 in the order: input, forget, cell (g), output.
lstm.weight_ih_l0.data = torch.tensor([[0, 100, 0, -10],   # input gate
                                        [0, 100, 0, 10],    # forget gate
                                        [1, 0, 0, 0],       # cell (candidate) gate
                                        [0, 0, 100, -10]]).float()  # output gate
lstm.weight_hh_l0.data = torch.zeros([4 * hidden_size, hidden_size])

# Initialize the hidden state and the cell state: (num_layers, batch_size, hidden_size)
hx = torch.zeros(1, 1, hidden_size)
cx = torch.zeros(1, 1, hidden_size)

# Forward pass over the whole sequence
outputs, (hx, cx) = lstm(x, (hx, cx))
outputs = outputs.squeeze().tolist()

# As with nn.LSTMCell, the tanh activations keep every output in (-1, 1).
outputs_rounded = [round(v) for v in outputs]
print(outputs_rounded)