Recurrent Neural Networks: the LSTM Model

m0_64592880

Last modified: 2024-09-26 17:08:19

First published: 2024-09-26 17:07:41

Article link: https://blog.csdn.net/m0_64592880/article/details/142559957

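The LSTM Model

1. Overview:

LSTM (Long Short-Term Memory) is a special kind of recurrent neural network (RNN) that can learn and retain long-term dependencies. It introduces a gating mechanism to mitigate the vanishing-gradient problem that traditional RNNs run into on long sequences. The gates (a forget gate, an input gate, and an output gate) control how information flows through the network, and this is what allows it to learn long-term dependencies.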

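2. Gates:

A gate is a way of letting information through selectively. It consists of a sigmoid neural network layer followed by a pointwise multiplication. The sigmoid outputs a value between 0 and 1 for each element: 0 blocks that element entirely, 1 lets it through unchanged.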
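A minimal NumPy sketch of this idea, using made-up numbers. The names `scores` and `signal` are illustrative only: `scores` stands in for whatever the gate's sigmoid layer receives, and `signal` for the information being gated.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activation scores computed by the gate's layer.
scores = np.array([-10.0, 0.0, 10.0])
# Hypothetical information that the gate controls.
signal = np.array([2.0, 2.0, 2.0])

gate = sigmoid(scores)   # close to 0, exactly 0.5, close to 1
passed = gate * signal   # pointwise (element-wise) multiplication
print(gate)
print(passed)
```

The first element is almost completely blocked, the second is halved, and the third passes through nearly unchanged.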

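1. Forget Gate

The forget gate decides which information in the cell state is discarded and which is kept. It is computed as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $\sigma$ is the sigmoid activation function, $W_f$ is the weight matrix of the forget gate, $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the input at the current time step, $[h_{t-1}, x_t]$ is their concatenation, and $b_f$ is the bias term.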
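The forget gate computation can be sketched in NumPy as follows. The sizes and the random values are assumptions chosen only for illustration; what matters is the shapes and the fact that every entry of the result lies between 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4   # assumed sizes for illustration

x_t = rng.standard_normal(input_size)       # current input
h_prev = rng.standard_normal(hidden_size)   # previous hidden state
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

z = np.concatenate([h_prev, x_t])   # concatenation of previous hidden state and input
f_t = sigmoid(W_f @ z + b_f)        # one keep-ratio per cell-state entry
print(f_t.shape)
print(f_t)
```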

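2. Input Gate

The input gate has two parts: a sigmoid layer that decides which values will be updated, and a tanh layer that creates a vector of new candidate values to be added to the state. It is computed as:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

where $i_t$ is the output of the input gate and $\tilde{C}_t$ is the candidate value vector.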
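A small NumPy sketch of the two parts of the input gate, under assumed sizes and random weights. The variable `write` is a hypothetical name for the product of the gate and the candidate, that is, the amount that would actually be written into the cell state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4   # assumed sizes for illustration

# Stands in for the concatenation of the previous hidden state and the input.
z = rng.standard_normal(hidden_size + input_size)
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ z + b_i)       # how much of each candidate to write, in (0, 1)
c_tilde = np.tanh(W_C @ z + b_C)   # candidate values, in (-1, 1)
write = i_t * c_tilde              # contribution that reaches the cell state
print(i_t)
print(c_tilde)
print(write)
```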

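3. Cell State Update

Updating the cell state is the most critical part of the LSTM. It combines the information from the forget gate and the input gate:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

where $C_t$ is the cell state at the current time step, $C_{t-1}$ is the cell state of the previous time step, and $\odot$ denotes element-wise multiplication.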
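A worked example of the update with hand-picked numbers (all values are made up for illustration). Each position shows a different behavior: keep the old value, overwrite it, or blend the two.

```python
import numpy as np

c_prev = np.array([1.0, -2.0, 0.5])    # previous cell state
f_t = np.array([1.0, 0.0, 0.5])        # forget gate: keep, drop, keep half
i_t = np.array([0.0, 1.0, 0.5])        # input gate: ignore, write, write half
c_tilde = np.array([0.9, 0.3, -1.0])   # candidate values

c_t = f_t * c_prev + i_t * c_tilde
print(c_t)   # values: 1.0, 0.3, -0.25
```

Position 0 keeps the old state (1.0), position 1 is replaced by the candidate (0.3), and position 2 is the blend 0.5 * 0.5 + 0.5 * (-1.0) = -0.25.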

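4. Output Gate

The output gate determines the value of the hidden state, which carries information about the sequence observed so far. It is computed as:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$

where $o_t$ is the output of the output gate and $h_t$ is the hidden state at the current time step.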
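One consequence of this step is that the hidden state always stays inside (-1, 1), even when the cell state itself grows large. The sketch below illustrates this with made-up numbers; `o_pre` is a hypothetical name for the pre-activation of the output gate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

c_t = np.array([5.0, -5.0, 0.0])      # hypothetical cell state; it is not bounded
o_pre = np.array([10.0, 0.0, 10.0])   # hypothetical output-gate pre-activations

o_t = sigmoid(o_pre)         # roughly 1.0, 0.5, 1.0
h_t = o_t * np.tanh(c_t)     # roughly 1.0, -0.5, 0.0
print(np.round(h_t, 3))
```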

3. Code Implementation

A simple hand-written LSTM:

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize the weights and biases as nn.Parameter so the module tracks them
        self.w_f = nn.Parameter(torch.rand(hidden_size, input_size + hidden_size))
        self.b_f = nn.Parameter(torch.rand(hidden_size))

        self.w_i = nn.Parameter(torch.rand(hidden_size, input_size + hidden_size))
        self.b_i = nn.Parameter(torch.rand(hidden_size))

        self.w_c = nn.Parameter(torch.rand(hidden_size, input_size + hidden_size))
        self.b_c = nn.Parameter(torch.rand(hidden_size))

        self.w_o = nn.Parameter(torch.rand(hidden_size, input_size + hidden_size))
        self.b_o = nn.Parameter(torch.rand(hidden_size))

        # Output layer
        self.w_y = nn.Parameter(torch.rand(output_size, hidden_size))
        self.b_y = nn.Parameter(torch.rand(output_size))

    def forward(self, x):
        # x has shape (seq_len, input_size): a single sequence without a batch dimension
        # Initialize the hidden state and the cell state
        h_t = torch.zeros(self.hidden_size)
        c_t = torch.zeros(self.hidden_size)

        h_states = []
        c_states = []

        for t in range(x.size(0)):
            # Concatenate the current input with the previous hidden state
            z_t = torch.cat([x[t], h_t])
            # Forget gate
            f_t = torch.sigmoid(self.w_f @ z_t + self.b_f)
            # Input gate
            i_t = torch.sigmoid(self.w_i @ z_t + self.b_i)
            # Candidate cell state
            c_hat_t = torch.tanh(self.w_c @ z_t + self.b_c)
            # Update the cell state
            c_t = f_t * c_t + i_t * c_hat_t
            # Output gate
            o_t = torch.sigmoid(self.w_o @ z_t + self.b_o)
            # Update the hidden state
            h_t = o_t * torch.tanh(c_t)
            # Save the hidden state and cell state of every time step
            h_states.append(h_t)
            c_states.append(c_t)
        # Project the last hidden state and normalize it into a probability distribution
        y_t = self.w_y @ h_t + self.b_y
        output = torch.softmax(y_t, dim=0)
        return torch.stack(h_states), torch.stack(c_states), output

# Random input sequence: 3 time steps, 2 features per step
x = torch.randn(3, 2)
hidden_size = 5

lstm = LSTM(input_size=2, hidden_size=hidden_size, output_size=6)
hidden_states, cell_states, output = lstm(x)
print(hidden_states, cell_states, output)
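As a sanity check on the gate equations, one step can be computed by hand from the weights of PyTorch's built-in `nn.LSTMCell` and compared with the cell's own output. This is a sketch rather than part of the original article; it relies on the layout documented for `nn.LSTMCell`, where the four gate blocks are stacked in the order input, forget, candidate, output, and where there are two bias vectors (`bias_ih` and `bias_hh`) instead of one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 2, 5
cell = nn.LSTMCell(input_size, hidden_size)

x_t = torch.randn(1, input_size)       # one sample, one time step
h_prev = torch.randn(1, hidden_size)   # arbitrary previous hidden state
c_prev = torch.randn(1, hidden_size)   # arbitrary previous cell state

with torch.no_grad():
    # Pre-activations of all four gates, stacked along dimension 1.
    gates = (x_t @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    i_pre, f_pre, g_pre, o_pre = gates.chunk(4, dim=1)

    # Same equations as in the gate section.
    c_manual = torch.sigmoid(f_pre) * c_prev + torch.sigmoid(i_pre) * torch.tanh(g_pre)
    h_manual = torch.sigmoid(o_pre) * torch.tanh(c_manual)

    h_ref, c_ref = cell(x_t, (h_prev, c_prev))

print(torch.allclose(h_manual, h_ref, atol=1e-5))
print(torch.allclose(c_manual, c_ref, atol=1e-5))
```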

Many-to-one (simple example):

import torch
import torch.nn as nn

class ManyToOneLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(ManyToOneLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden state and the cell state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Forward pass through the LSTM
        out, _ = self.lstm(x, (h0, c0))
        # Keep only the output of the last time step
        out = out[:, -1, :]
        # Pass it through the fully connected layer to get the final output
        output = self.fc(out)
        return output

# Example usage
input_size = 10
hidden_size = 20
output_size = 2
model = ManyToOneLSTM(input_size, hidden_size, output_size)
x = torch.randn(4, 7, input_size)  # (batch_size, sequence_length, input_size)
output = model(x)
print(output.shape)  # Output shape: (4, output_size)
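For a single-layer, unidirectional LSTM fed with unpadded sequences, the last time step of `out` is the same tensor that `nn.LSTM` returns as the final hidden state `h_n`. The short check below is a sketch under exactly those assumptions; it also relies on the fact that `nn.LSTM` uses zero initial states when none are passed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 7, 10)   # (batch_size, sequence_length, input_size)

out, (h_n, c_n) = lstm(x)   # initial states default to zeros
last_step = out[:, -1, :]   # (4, 20): output at the last time step
final_hidden = h_n[-1]      # (4, 20): final hidden state of the last layer

print(last_step.shape, final_hidden.shape)
print(torch.allclose(last_step, final_hidden))
```

With padded batches or a bidirectional LSTM the two are no longer interchangeable, so the choice matters there.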

Many-to-many (simple example):

import torch
import torch.nn as nn

class ManyToManyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(ManyToManyLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden state and the cell state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Forward pass through the LSTM
        out, _ = self.lstm(x, (h0, c0))
        # Apply the fully connected layer to the output of every time step
        out = self.fc(out)
        return out

# Hyperparameters
input_size = 10   # dimension of the input features
hidden_size = 20  # dimension of the LSTM hidden layer
output_size = 5   # dimension of the output

# Create the model instance
model = ManyToManyLSTM(input_size, hidden_size, output_size)

# Example input data (batch_size, sequence_length, input_size)
x = torch.randn(4, 7, input_size)  # 4 samples, each a sequence of 7 time steps

# Forward pass
output = model(x)
print(output.shape)  # Output shape: (batch_size, sequence_length, output_size)
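If a many-to-many model is used for per-step classification (for example, tagging every token), the loss is usually computed over all time steps at once. The sketch below assumes that setting, which the original does not spell out: `logits` is a random stand-in for the model output and `targets` holds hypothetical class labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, num_classes = 4, 7, 5

# Stand-in for the model output: one score vector per time step.
logits = torch.randn(batch_size, seq_len, num_classes)
# Hypothetical integer labels, one per time step.
targets = torch.randint(0, num_classes, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss()
# Merge the batch and time dimensions so every time step counts as one sample.
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))
print(loss.item())
```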

One-to-many (simple example):

import torch
import torch.nn as nn

class OneToManyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(OneToManyLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden state and the cell state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Forward pass through the LSTM
        out, _ = self.lstm(x, (h0, c0))
        # Apply the fully connected layer to the output of every time step
        out = self.fc(out)
        return out

# Example usage
input_size = 10
hidden_size = 20
output_size = 2
model = OneToManyLSTM(input_size, hidden_size, output_size)
x = torch.randn(4, 1, input_size)  # each sample has a single time step
output = model(x)
# Output shape: (4, 1, output_size). nn.LSTM returns one output step per input step,
# so a single input step yields a single output step here.
print(output.shape)
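The example above produces exactly one output step for its one input step. One common way to obtain several output steps from a single input, shown here as an assumption rather than as the article's method, is to repeat the input vector once per desired output step. The class name `RepeatInputOneToMany` and the parameter `num_steps` are hypothetical.

```python
import torch
import torch.nn as nn

class RepeatInputOneToMany(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_steps):
        super().__init__()
        self.num_steps = num_steps   # number of output steps to generate
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: (batch_size, 1, input_size), a single step per sample
        repeated = x.repeat(1, self.num_steps, 1)   # (batch_size, num_steps, input_size)
        out, _ = self.lstm(repeated)                # (batch_size, num_steps, hidden_size)
        return self.fc(out)                         # (batch_size, num_steps, output_size)

torch.manual_seed(0)
model = RepeatInputOneToMany(input_size=10, hidden_size=20, output_size=2, num_steps=5)
x = torch.randn(4, 1, 10)
output = model(x)
print(output.shape)
```

Although the input is identical at every step, the outputs differ from step to step because the hidden and cell states keep evolving.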

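4. Sequence Pooling

1. Max Pooling:

Max pooling takes, for each feature dimension, the maximum value over the sequence, which yields a fixed-length output regardless of the sequence length. In NLP it can be used to extract the most salient features of keywords or phrases. Because only the largest activation is kept, max pooling is insensitive to small or noisy activations elsewhere in the sequence; by the same token, a single spuriously large value will dominate the output.

A very simple example:

import torch
import torch.nn as nn

# (batch_size, sequence_length, feature_size)
input_data = torch.randn(100, 1000, 32)

# Pool the last dimension down to length 1
max_pool = nn.AdaptiveMaxPool1d(1)

# The pooling layer works on the last dimension, so move the sequence dimension there:
# (batch_size, feature_size, sequence_length)
input_data = input_data.permute(0, 2, 1)

output = max_pool(input_data)
print(output.size())
# torch.Size([100, 32, 1])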
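In practice, batches of sentences are often padded to a common length, which the example above does not cover. The following is a minimal sketch of max pooling that ignores padded positions, assuming zero padding at the end of each sequence and a hypothetical `lengths` tensor holding the true lengths.

```python
import torch

# Two padded sequences with 2 features each; the true lengths are 3 and 1.
x = torch.tensor([
    [[1.0, -1.0], [4.0, -3.0], [2.0, -2.0]],
    [[-5.0, 0.5], [0.0, 0.0], [0.0, 0.0]],   # the last two steps are padding
])
lengths = torch.tensor([3, 1])

# mask[b, t] is True for real time steps and False for padding.
mask = torch.arange(x.size(1))[None, :] < lengths[:, None]
# Replace padded positions with -inf so they can never win the max.
masked = x.masked_fill(~mask[:, :, None], float("-inf"))
pooled = masked.max(dim=1).values   # (batch_size, feature_size)
print(pooled)
```

Without the mask, the first feature of the second sequence would come out as 0.0 (taken from the padding) instead of -5.0.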

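2. Average Pooling:

Average pooling computes, for each feature dimension, the mean of all values in the sequence. This tends to smooth the features and reduce the influence of noise. However, it may lose some important information, because it gives every position the same weight.

Another very simple example:

import torch
import torch.nn as nn

# (batch_size, sequence_length, feature_size)
input_data = torch.randn(100, 1000, 32)

# Pool the last dimension down to length 1
avg_pool = nn.AdaptiveAvgPool1d(1)

# Move the sequence dimension to the end: (batch_size, feature_size, sequence_length)
input_data = input_data.permute(0, 2, 1)

output = avg_pool(input_data)

print(output.size())
# torch.Size([100, 32, 1])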
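With padded batches, a plain mean divides by the padded length and therefore shrinks the result for short sequences. The sketch below averages only over real time steps; as before, zero padding and the `lengths` tensor are assumptions made for illustration.

```python
import torch

# Two padded sequences with 2 features each; the true lengths are 3 and 1.
x = torch.tensor([
    [[1.0, -1.0], [4.0, -3.0], [1.0, -2.0]],
    [[-5.0, 0.5], [0.0, 0.0], [0.0, 0.0]],   # the last two steps are padding
])
lengths = torch.tensor([3, 1])

# mask[b, t] is True for real time steps and False for padding.
mask = torch.arange(x.size(1))[None, :] < lengths[:, None]
# Sum only the real steps, then divide by the true length of each sequence.
summed = (x * mask[:, :, None]).sum(dim=1)
mean_pooled = summed / lengths[:, None]
print(mean_pooled)
```

The second sequence keeps its single real step (-5.0, 0.5) instead of being divided by 3.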