【有啥问啥】工作中经常听到的Encoder-Decoder结构框架是什么？

有啥问啥

已于 2024-09-04 09:23:06 修改

阅读量949

点赞数 22

分类专栏：机器学习算法科普文章标签：人工智能 ai 深度学习机器学习

于 2024-08-17 00:15:00 首次发布

本文链接：https://blog.csdn.net/mieshizhishou/article/details/141249332

版权

科普同时被 2 个专栏收录

102 篇文章 3 订阅

订阅专栏

机器学习算法

62 篇文章 0 订阅

订阅专栏

Encoder-Decoder

工作中经常听到的Encoder-Decoder结构框架是什么？

在人工智能和深度学习领域，Encoder-Decoder结构是一个非常重要且广泛应用的框架。它能够处理从序列到序列的任务到复杂的图像生成任务。本文将详细解释Encoder-Decoder结构的工作原理，介绍相关的公式，并通过经典的例子和代码示例来帮助读者理解其应用。

1. Encoder-Decoder结构概述

Encoder-Decoder结构通常包括两个主要部分：编码器（Encoder）和解码器（Decoder）。编码器负责将输入数据转换为一个固定长度的隐层表示（latent representation），而解码器则使用这一表示生成目标输出。

2. 编码器（Encoder）

2.1 编码器的功能

编码器的任务是将输入序列 $(x_1, x_2, \dots, x_n)$ 转换为一个固定长度的向量表示 $h_n$ 。这个表示可以视为对输入数据的一种压缩。

2.2 编码器的公式表示

对于一个序列 $(x_1, x_2, \dots, x_n)$ ，编码器逐步计算每个时间步的隐藏状态 $h_t$ ：

$h_t = f(h_{t-1}, x_t)$

其中， $f$ 通常是一个RNN、LSTM或GRU单元。最终，编码器输出的最后一个隐藏状态 $h_n$ 就是整个输入序列的编码。

2.3 代码示例：简单的LSTM编码器

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)

    def forward(self, x):
        _, (hidden, _) = self.lstm(x)
        return hidden

# 示例输入
input_seq = torch.randn(10, 1, 20)  # (sequence_length, batch_size, input_size)
encoder = Encoder(input_size=20, hidden_size=50)
encoded = encoder(input_seq)
print(encoded.shape)  # 输出形状应为 (1, batch_size, hidden_size)

3. 解码器（Decoder）

3.1 解码器的功能

解码器使用编码器的输出 $h_n$ 来生成目标序列 $(y_1, y_2, \dots, y_m)$ 。解码器通常是一个递归神经网络，每一步生成的输出依赖于前一步的输出和编码向量。

3.2 解码器的公式表示

解码器逐步生成目标序列中的每个元素：

$s_t = g(s_{t-1}, y_{t-1}, h_n)$

$y_t = \text{softmax}(Ws_t + b)$

其中， $s_t$ 是解码器在时间步 $t$ 的隐藏状态， $g$ 通常是一个RNN、LSTM或GRU单元， $W$ 和 $b$ 是解码器的权重矩阵和偏置。

3.3 代码示例：简单的LSTM解码器

class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        self.lstm = nn.LSTM(hidden_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        output, (hidden, _) = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden

# 示例输入
decoder = Decoder(hidden_size=50, output_size=30)
output_seq, _ = decoder(encoded, None)
print(output_seq.shape)  # 输出形状应为 (sequence_length, batch_size, output_size)

4. 注意力机制（Attention Mechanism）

4.1 注意力机制的原理

注意力机制通过计算当前解码时间步与输入序列中各个时间步的相似度，生成一个注意力权重分布 $\alpha_t$ 。然后，将输入序列的隐藏状态按权重加权平均，得到上下文向量 $c_t$ ，并结合它生成当前的输出。

4.2 注意力机制的公式表示

注意力权重的计算：

$\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j=1}^{n} \exp(e_{t,j})}$

其中， $e_{t,i} = \text{score}(s_{t-1}, h_i)$ 是解码器隐藏状态 $s_{t-1}$ 与编码器隐藏状态 $h_i$ 的相似度得分。

上下文向量的计算：

$c_t = \sum_{i=1}^{n} \alpha_{t,i} h_i$

4.3 代码示例：添加注意力机制的解码器

class AttentionDecoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(AttentionDecoder, self).__init__()
        self.lstm = nn.LSTM(hidden_size, hidden_size)
        self.fc = nn.Linear(hidden_size * 2, output_size)
        self.attn = nn.Linear(hidden_size * 2, 1)

    def forward(self, x, hidden, encoder_outputs):
        attn_weights = torch.softmax(self.attn(torch.cat((x, hidden[0]), dim=2)), dim=1)
        context = attn_weights * encoder_outputs
        context = torch.sum(context, dim=1)
        output, (hidden, _) = self.lstm(x, hidden)
        output = torch.cat((output, context.unsqueeze(0)), dim=2)
        output = self.fc(output)
        return output, hidden

# 示例输入
decoder = AttentionDecoder(hidden_size=50, output_size=30)
encoder_outputs = torch.randn(10, 1, 50)  # (sequence_length, batch_size, hidden_size)
output_seq, _ = decoder(encoded, None, encoder_outputs)
print(output_seq.shape)  # 输出形状应为 (sequence_length, batch_size, output_size)

5. Transformer架构

5.1 Self-Attention机制

在Self-Attention中，每个输入元素都会与序列中的所有其他元素计算相关性，并根据这些相关性进行加权平均。这种机制能够捕捉输入序列中的长距离依赖关系，而无需逐步处理序列。

5.2 公式表示

在Transformer中，Self-Attention的计算步骤如下：

查询（Query）、键（Key）、值（Value）计算：

$XW^Q, \quad K = XW^K, \quad V = XW^V$

注意力权重计算：

$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$

其中， $W^Q, W^K, W^V$ 是可训练的权重矩阵， $d_k$ 是键向量的维度。

5.3 代码示例：Self-Attention层

class SelfAttention(nn.Module):
    def __init__(self, hidden_size):
        super(SelfAttention, self).__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.scale = torch.sqrt(torch.FloatTensor([hidden_size]))

    def forward(self, x):
        Q = self.query(x)
        K = self.key(x)
        V = self.value(x)
        attn_weights = torch.softmax(torch.bmm(Q, K.transpose(1, 2)) / self.scale, dim=2)
        output = torch.bmm(attn_weights, V)
        return output

# 示例输入
self_attention = SelfAttention(hidden_size=50)
output = self_attention(encoder_outputs)
print(output.shape)  # 输出形状应为 (sequence_length, batch_size, hidden_size)

6. 举个栗子：Seq2Seq模型在机器翻译中的应用

6.1 问题背景

机器翻译任务是将一句话从一种语言翻译成另一种语言，例如将英语句子“ChatGPT is powerful”翻译成中文句子“ChatGPT很强大”。这是一个典型的序列到序列任务。

6.2 Seq2Seq模型的完整实现

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, source, target):
        encoder_hidden = self.encoder(source)
        outputs = []
        decoder_input = target[0].unsqueeze(0)
        hidden = encoder_hidden
        for t

 in range(1, target.size(0)):
            output, hidden = self.decoder(decoder_input, hidden)
            outputs.append(output)
            decoder_input = output
        outputs = torch.cat(outputs, dim=0)
        return outputs

# 示例输入
encoder = Encoder(input_size=20, hidden_size=50)
decoder = Decoder(hidden_size=50, output_size=30)
seq2seq = Seq2Seq(encoder, decoder)

input_seq = torch.randn(10, 1, 20)  # 英文输入句子
target_seq = torch.randn(10, 1, 30)  # 中文目标句子

output_seq = seq2seq(input_seq, target_seq)
print(output_seq.shape)  # 输出形状应为 (sequence_length, batch_size, output_size)