An Introduction to Bidirectional Long Short-Term Memory Networks (BiLSTM)

SimpleLearing

Last modified: 2025-05-02 22:31:42


First published: 2024-05-17 16:16:55

Article link: https://blog.csdn.net/yiqiedouhao11/article/details/139007679


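A bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) is a variant of the recurrent neural network (Recurrent Neural Network, RNN) designed for sequence data. It combines the outputs of a forward LSTM and a backward LSTM, so the representation at each position reflects dependencies in both directions of the sequence.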

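Basic Concepts

LSTM

A long short-term memory network (LSTM) is a special kind of RNN that can learn long-term dependencies. By introducing gating mechanisms (an input gate, a forget gate, and an output gate), LSTM mitigates the vanishing-gradient problem that traditional RNNs face on long sequences. Gating does not by itself prevent exploding gradients, which are usually handled separately, for example with gradient clipping.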

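The basic structure of an LSTM includes:

Forget Gate: decides how much of the cell state from the previous time step is discarded.
Input Gate: decides how much of the current time step's information is written into the cell state.
Output Gate: decides how much of the cell state is exposed as the hidden state, which is both the output at this time step and the input to the next one.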
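
The three gates listed above can be written out as a single time step. The following is a minimal sketch, assuming the parameter layout of PyTorch's `nn.LSTMCell` (the four gate blocks are packed in the order input, forget, candidate, output). The helper name `lstm_step` is hypothetical and only used for illustration; besides the three gates it also computes the candidate values `g` that the input gate scales.

```python
import torch
import torch.nn as nn

def lstm_step(x, h_prev, c_prev, cell):
    """One LSTM time step written gate by gate, reusing the weights of an nn.LSTMCell."""
    # All four gate pre-activations are computed in one matrix product.
    gates = x @ cell.weight_ih.T + cell.bias_ih + h_prev @ cell.weight_hh.T + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)  # input, forget, candidate, output

    i = torch.sigmoid(i)   # input gate: how much new information to write
    f = torch.sigmoid(f)   # forget gate: how much of the old cell state to keep
    g = torch.tanh(g)      # candidate values to write into the cell state
    o = torch.sigmoid(o)   # output gate: how much of the cell state to expose

    c = f * c_prev + i * g         # updated cell state
    h = o * torch.tanh(c)          # updated hidden state
    return h, c

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=10, hidden_size=20)
x = torch.randn(4, 10)            # batch of 4, feature size 10
h_prev = torch.zeros(4, 20)
c_prev = torch.zeros(4, 20)

h_manual, c_manual = lstm_step(x, h_prev, c_prev, cell)
h_ref, c_ref = cell(x, (h_prev, c_prev))  # reference result from PyTorch
print(h_manual.shape, c_manual.shape)
```

Comparing `h_manual` and `c_manual` against `h_ref` and `c_ref` is a quick way to check that the hand-written step matches the library cell.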

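Bidirectional LSTM

A BiLSTM runs two independent LSTMs over the same sequence: one reads it from start to end (the forward LSTM) and the other reads it from end to start (the backward LSTM). The outputs of the two LSTMs are combined at each time step, so each position has access to both the preceding and the following context.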

Structure of BiLSTM

The BiLSTM architecture is shown below:

Input sequence: x1, x2, x3, ..., xn

Forward LSTM:    --> h1, h2, h3, ..., hn

Backward LSTM:   <-- h1', h2', h3', ..., hn'

Combined output: [h1, h1'], [h2, h2'], [h3, h3'], ..., [hn, hn']

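The output at each time step is the concatenation of the forward and backward LSTM hidden states (other combinations, such as summation, are also possible).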
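
The diagram above can be reproduced with two ordinary LSTMs. The following is a sketch, assuming a single layer and PyTorch's parameter naming for `nn.LSTM` (the backward direction's weights carry the suffix `_reverse`). The names `fwd`, `bwd`, and `combined` are illustrative. The backward LSTM is fed the time-reversed sequence, and its outputs are flipped back so that position t lines up with the forward output at position t.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 10, 20

# Reference: PyTorch's built-in bidirectional LSTM (one layer).
bi = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

# Two independent unidirectional LSTMs that reuse the same weights.
fwd = nn.LSTM(input_size, hidden_size, batch_first=True)
bwd = nn.LSTM(input_size, hidden_size, batch_first=True)
with torch.no_grad():
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

x = torch.randn(3, 5, input_size)  # (batch_size, seq_length, input_size)

with torch.no_grad():
    out_f, _ = fwd(x)                          # reads x1 ... xn
    out_b, _ = bwd(torch.flip(x, dims=[1]))    # reads xn ... x1
    out_b = torch.flip(out_b, dims=[1])        # re-align with the original time order

    combined = torch.cat([out_f, out_b], dim=2)  # concatenation: [h_t, h_t']
    summed = out_f + out_b                       # alternative combination: element-wise sum
    out_bi, _ = bi(x)

print(combined.shape, summed.shape)
```

Concatenation doubles the feature size (here 40), while summation keeps it at `hidden_size` (here 20). With the weights copied as above, `combined` should match the output of the built-in bidirectional layer.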

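Advantages

Captures bidirectional dependencies: compared with a unidirectional LSTM, a BiLSTM takes both the preceding and the following context into account, which helps tasks that need the whole-sequence context (such as named entity recognition and machine translation).
Improved performance: on many natural language processing (NLP) tasks where the full input sequence is available, a BiLSTM usually outperforms a unidirectional LSTM, at the cost of roughly twice the parameters and computation.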
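
A small experiment can make the first point concrete. The following is a sketch with randomly initialized, untrained layers and made-up sizes: only the last time step of the input is changed, and the output at the first time step is compared before and after. In a unidirectional LSTM the first output cannot see the change; in a BiLSTM the backward half of the first output can.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 20
uni = nn.LSTM(input_size=10, hidden_size=hidden_size, batch_first=True)
bi = nn.LSTM(input_size=10, hidden_size=hidden_size, batch_first=True, bidirectional=True)

x = torch.randn(1, 5, 10)
x_changed = x.clone()
x_changed[:, -1, :] += 1.0  # modify only the last time step

with torch.no_grad():
    uni_a, _ = uni(x)
    uni_b, _ = uni(x_changed)
    bi_a, _ = bi(x)
    bi_b, _ = bi(x_changed)

# Output at the first time step, before vs. after the change.
uni_first_same = torch.allclose(uni_a[:, 0], uni_b[:, 0])
bi_fwd_first_same = torch.allclose(bi_a[:, 0, :hidden_size], bi_b[:, 0, :hidden_size])
bi_bwd_first_same = torch.allclose(bi_a[:, 0, hidden_size:], bi_b[:, 0, hidden_size:])

print("unidirectional, first step unchanged:", uni_first_same)
print("BiLSTM forward half, first step unchanged:", bi_fwd_first_same)
print("BiLSTM backward half, first step unchanged:", bi_bwd_first_same)
```

The backward half is the only part of the first-step output that carries information from later positions.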

Implementation Example

Implementing a BiLSTM in PyTorch is straightforward. Here is a simple example:

import torch
import torch.nn as nn

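class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Bidirectional LSTM; batch_first=True means inputs are (batch_size, seq_length, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        # Fully connected layer; its input size is hidden_size * 2 because both directions are concatenated
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        # Initialize the hidden and cell states to zeros (also the default when no states are passed)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)  # 2 for bidirectional
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)

        # Run the LSTM
        out, _ = self.lstm(x, (h0, c0))  # out has shape (batch_size, seq_length, hidden_size * 2)

        # Apply the fully connected layer to the output at the last time step.
        # Note: at this position the backward direction has processed only the final input,
        # so the backward half of this vector does not summarize the whole sequence.
        out = self.fc(out[:, -1, :])

        return out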

# Example: input feature size 10, LSTM hidden size 20, 2 LSTM layers, output size 1
input_size = 10
hidden_size = 20
num_layers = 2
output_size = 1

model = BiLSTM(input_size, hidden_size, num_layers, output_size)

# Assume a batch size of 32, a sequence length of 5, and a feature size of 10
inputs = torch.randn(32, 5, input_size)

# Forward pass
outputs = model(inputs)
print(outputs.shape)  # torch.Size([32, 1])
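
The example above feeds `out[:, -1, :]` to the linear layer. At that position the forward LSTM has read the whole sequence, but the backward LSTM has read only the last input. A common alternative for sequence-level prediction, sketched below as an option rather than as the article's method, is to concatenate the final hidden state of each direction, taken from `h_n`. The sketch assumes full-length (unpadded) sequences; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 20
lstm = nn.LSTM(input_size=10, hidden_size=hidden_size, num_layers=2,
               batch_first=True, bidirectional=True)
fc = nn.Linear(hidden_size * 2, 1)

x = torch.randn(32, 5, 10)  # (batch_size, seq_length, input_size)

with torch.no_grad():
    out, (h_n, c_n) = lstm(x)

    # h_n has shape (num_layers * 2, batch_size, hidden_size).
    # For the top layer, h_n[-2] is the forward direction's final state (after reading xn)
    # and h_n[-1] is the backward direction's final state (after reading x1).
    forward_final = h_n[-2]
    backward_final = h_n[-1]

    summary = torch.cat([forward_final, backward_final], dim=1)  # (32, hidden_size * 2)
    prediction = fc(summary)

print(summary.shape, prediction.shape)
```

Equivalently, the forward final state is the first half of `out` at the last time step, and the backward final state is the second half of `out` at the first time step.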