Two Ways to Implement Attention

Mark_Aussie

Category: nlp  Tags: deep learning

Article link: https://blog.csdn.net/markaustralia/article/details/129854781

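When building an LSTM + Attention network, there are two ways to implement the attention layer. attention_1 follows the original definition of attention, i.e. scaled dot-product attention in which the LSTM output x serves as Q, K and V at the same time: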

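import math

import torch
import torch.nn.functional as func


def attention_1(x):  # x: [batch, seq_len, hidden_dim * 2], 2: bidirectional LSTM
    """
    Attention implemented from its definition (scaled dot-product, Q = K = V = x).
    @param x: LSTM output, [batch, seq_len, hidden_dim * 2]
    @return: context vector, [batch, hidden_dim * 2]
    """
    d_k = x.size(-1)  # d_k is the dimension of the query
    k = x.transpose(1, 2)  # [batch, hidden_dim * 2, seq_len]
    att = torch.matmul(x, k) / math.sqrt(d_k)  # [batch, seq_len, seq_len]
    att_score = func.softmax(att, dim=-1)  # normalize over the key positions
    context = torch.matmul(att_score, x).sum(1)  # sum over positions: [batch, hidden_dim * 2]

    return context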
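As a quick sanity check, the sketch below runs the same computation on a random tensor and inspects the shapes. The tensor sizes are arbitrary, and the helper name `attention_1_with_scores` is hypothetical; it only differs from attention_1 in that it also returns the attention matrix.

```python
import math

import torch
import torch.nn.functional as func


def attention_1_with_scores(x):
    """Scaled dot-product self-attention (Q = K = V = x), summed over positions."""
    d_k = x.size(-1)
    att = torch.matmul(x, x.transpose(1, 2)) / math.sqrt(d_k)  # [batch, seq_len, seq_len]
    att_score = func.softmax(att, dim=-1)  # each row sums to 1
    context = torch.matmul(att_score, x).sum(1)  # [batch, hidden_dim * 2]
    return context, att_score


torch.manual_seed(0)
x = torch.randn(2, 5, 8)  # assumed sizes: batch=2, seq_len=5, hidden_dim * 2 = 8
context, att_score = attention_1_with_scores(x)
print(context.shape)    # torch.Size([2, 8])
print(att_score.shape)  # torch.Size([2, 5, 5])
```

Note that the final `.sum(1)` is a pooling step on top of the attention output: it collapses the per-token results into a single vector per sample so that it can be fed to a classifier.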

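The second implementation is based on the following formula (the word-level attention of the Hierarchical Attention Networks paper discussed below):

$$u_t = \tanh(W_w h_t + b_w), \qquad \alpha_t = \frac{\exp(u_t^\top u_w)}{\sum_{t'} \exp(u_{t'}^\top u_w)}, \qquad s = \sum_t \alpha_t h_t$$

Here $h_t$ is the LSTM hidden state of token $t$, $u_w$ is the context vector, and $s$ is the resulting sentence vector. In the code, $W_w$ corresponds to w_omega and $u_w$ to u_omega.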

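Attention is a weighted sum (in some cases the vectors are only weighted, not summed), where the weights come from the similarity between vectors. The original attention has Q, K and V. V can be omitted; what matters is K and Q. K represents the sequence itself and Q represents the other side. The similarity between the vectors in Q and K is computed to obtain the weights (the larger the similarity, the larger the weight), and each vector in K is then weighted accordingly.
When attention is applied to a text, the text itself is K, and in self-attention Q and K are the same. Another approach is the one proposed in "Hierarchical Attention Networks for Document Classification", which is the basis of attention_2.
In that approach Q is randomly initialized and acts as a context vector that represents the semantics of the whole sentence. It is multiplied with each vector in the sentence to obtain the weights, which are then used for the weighted sum.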
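The "similarity, then weights, then weighted sum" idea described above can be sketched on a toy example. The vectors below are made up for illustration, and plain dot products are assumed as the similarity measure.

```python
import torch
import torch.nn.functional as func

# Three "token" vectors (K) and one query vector (Q); the values are made up.
keys = torch.tensor([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
query = torch.tensor([1.0, 0.0])

scores = keys @ query                   # dot-product similarity: [1., 0., 1.]
weights = func.softmax(scores, dim=0)   # normalized weights, summing to 1
weighted = keys * weights.unsqueeze(1)  # weighting only: [3, 2]
context = weighted.sum(dim=0)           # weighted sum: [2]

print(weights)
print(context)
```

The first and third vectors are the most similar to the query, so they receive the largest (and equal) weights; stopping at `weighted` corresponds to the "weighting without summing" case.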

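When the model is initialized, w_omega and u_omega are randomly initialized. The input x of attention_2 is the sequence of hidden states output by the LSTM. x is matrix-multiplied by w_omega (the bias b in the formula above is omitted) and passed through tanh; the result is then multiplied by u_omega, which gives a tensor of shape batch_size * seq_len * 1, i.e. one score per token in each sample. A softmax over the seq_len dimension turns these scores into weights, and the weights are used to take the weighted sum of x.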

# Attention parameters, created in the model's __init__:
# w_omega is the projection matrix, u_omega is the context vector
self.w_omega = nn.Parameter(torch.Tensor(n_hidden * 2, n_hidden * 2))
self.u_omega = nn.Parameter(torch.Tensor(n_hidden * 2, 1))
nn.init.uniform_(self.w_omega, -0.1, 0.1)
nn.init.uniform_(self.u_omega, -0.1, 0.1)

def attention_2(self, x):  # x: [batch, seq_len, hidden_dim * 2]
    """
    Attention implemented from the formula (learned context vector).
    @param x: LSTM output, [batch, seq_len, hidden_dim * 2]
    @return: context vector, [batch, hidden_dim * 2]
    """
    u = torch.tanh(torch.matmul(x, self.w_omega))  # [batch, seq_len, hidden_dim * 2], projection of x, 2: bidirectional LSTM
    att = torch.matmul(u, self.u_omega)  # [batch, seq_len, 1], similarity with the context vector
    att_score = func.softmax(att, dim=1)  # normalize over seq_len
    scored_x = x * att_score  # [batch, seq_len, hidden_dim * 2]
    context = torch.sum(scored_x, dim=1)  # [batch, hidden_dim * 2]

    return context
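To try attention_2 in isolation, the parameters and the method can be wrapped in a small module. This is a minimal sketch: the class name `ContextAttention` and the tensor sizes are assumptions made for illustration, and `torch.empty` is used in place of `torch.Tensor(...)` to allocate the uninitialized parameters before `uniform_` fills them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as func


class ContextAttention(nn.Module):  # hypothetical name, not from the original post
    def __init__(self, n_hidden):
        super().__init__()
        dim = n_hidden * 2  # output size of a bidirectional LSTM
        self.w_omega = nn.Parameter(torch.empty(dim, dim))
        self.u_omega = nn.Parameter(torch.empty(dim, 1))
        nn.init.uniform_(self.w_omega, -0.1, 0.1)
        nn.init.uniform_(self.u_omega, -0.1, 0.1)

    def forward(self, x):  # x: [batch, seq_len, n_hidden * 2]
        u = torch.tanh(torch.matmul(x, self.w_omega))  # [batch, seq_len, n_hidden * 2]
        att = torch.matmul(u, self.u_omega)            # [batch, seq_len, 1]
        att_score = func.softmax(att, dim=1)           # weights over the tokens
        context = torch.sum(x * att_score, dim=1)      # [batch, n_hidden * 2]
        return context, att_score


torch.manual_seed(0)
layer = ContextAttention(n_hidden=4)
x = torch.randn(2, 5, 8)  # assumed sizes: batch=2, seq_len=5, n_hidden * 2 = 8
context, att_score = layer(x)
print(context.shape)    # torch.Size([2, 8])
print(att_score.shape)  # torch.Size([2, 5, 1])
```

Unlike attention_1, which produces a seq_len * seq_len weight matrix, this variant produces a single weight per token, so the weighted sum is already one vector per sample.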

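In a multi-class classification task with the LSTM + attention network, attention_1 (the attention definition) performed better than attention_2 (the formula-based method). This is only the result of one experiment, and it is not clear whether it holds in general.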
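The post does not show the full network, so the following is only a sketch of how such a comparison could be set up. The class name `BiLSTMAttention`, the `use_dot_product` switch, and all layer sizes are assumptions, not the author's actual model.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as func


class BiLSTMAttention(nn.Module):  # hypothetical model for illustration
    def __init__(self, vocab_size, embed_dim, n_hidden, num_classes, use_dot_product=True):
        super().__init__()
        self.use_dot_product = use_dot_product
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, n_hidden, batch_first=True, bidirectional=True)
        # Parameters used only by attention_2
        self.w_omega = nn.Parameter(torch.empty(n_hidden * 2, n_hidden * 2))
        self.u_omega = nn.Parameter(torch.empty(n_hidden * 2, 1))
        nn.init.uniform_(self.w_omega, -0.1, 0.1)
        nn.init.uniform_(self.u_omega, -0.1, 0.1)
        self.fc = nn.Linear(n_hidden * 2, num_classes)

    def attention_1(self, x):  # scaled dot-product self-attention
        att = torch.matmul(x, x.transpose(1, 2)) / math.sqrt(x.size(-1))
        return torch.matmul(func.softmax(att, dim=-1), x).sum(1)

    def attention_2(self, x):  # learned context vector
        u = torch.tanh(torch.matmul(x, self.w_omega))
        att_score = func.softmax(torch.matmul(u, self.u_omega), dim=1)
        return torch.sum(x * att_score, dim=1)

    def forward(self, tokens):  # tokens: [batch, seq_len] of token ids
        x, _ = self.lstm(self.embedding(tokens))  # [batch, seq_len, n_hidden * 2]
        context = self.attention_1(x) if self.use_dot_product else self.attention_2(x)
        return self.fc(context)  # [batch, num_classes]


torch.manual_seed(0)
tokens = torch.randint(0, 100, (2, 7))  # 2 samples, 7 tokens each
logits_1 = BiLSTMAttention(100, 16, 8, 3, use_dot_product=True)(tokens)
logits_2 = BiLSTMAttention(100, 16, 8, 3, use_dot_product=False)(tokens)
print(logits_1.shape, logits_2.shape)
```

Training both variants with the same data, seed and hyperparameters would be one way to check whether the difference observed above carries over to another dataset.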

References:

Implementing LSTM+Attention text classification in PyTorch (blog of 明日何其多_, CSDN)

https://www.cnblogs.com/douzujun/p/13511237.html#autoid-2-2-0