Learning Attention

吴小东丶

Last modified 2022-04-04 13:55:45

Categories: pytorch, python. Article tags: editor, java, vim

First published 2022-04-01 16:03:33

Article link: https://blog.csdn.net/qq_42325691/article/details/123900597


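Introduction:

The attention mechanism measures, when generating the current state, how much attention to pay to each of the earlier states. The current hidden state is compared against the earlier hidden states, which amounts to computing a similarity. (In code, the two tensors are concatenated and passed through a Linear layer, so that in the end there is one score per Encoder hidden state. Those scores are then combined with the hidden states in a dot product, i.e. a weighted sum, giving one attention vector per item in the batch. Finally the attention vector, decoder_input, and hidden_state are concatenated and passed through another Linear layer to produce the prediction.)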

Reference: https://wmathor.com/index.php/archives/1451/

The same blogger's Bilibili video: https://www.bilibili.com/video/BV1op4y1U7ag/

1. Encoder

The encoder is the same as in a traditional seq2seq model.
Running the encoder gives two things:
1. encoder_output: the hidden states of the last RNN layer at every time step
2. hidden: the final hidden state of every RNN layer
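
To make the difference between the two outputs concrete, here is a minimal sketch using `nn.GRU`. The choice of a GRU, the sizes, and the random input are my own assumptions for illustration; they are not taken from the referenced post.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, emb_dim, hid_dim, num_layers = 4, 7, 16, 32, 2

rnn = nn.GRU(emb_dim, hid_dim, num_layers=num_layers, batch_first=True)
embedded = torch.randn(batch_size, seq_len, emb_dim)  # stand-in for embedded source tokens

encoder_output, hidden = rnn(embedded)

# encoder_output: the last layer's hidden state at every time step
print(encoder_output.shape)  # (batch_size, seq_len, hid_dim)
# hidden: the final time step's hidden state of every layer
print(hidden.shape)          # (num_layers, batch_size, hid_dim)

# For a unidirectional RNN, the last time step of encoder_output
# is the same tensor as the top layer's final hidden state.
same = torch.allclose(encoder_output[:, -1, :], hidden[-1])
print(same)
```

Note that `hidden` keeps the layer dimension first even with `batch_first=True`.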


2. Attention

Attention has two parts:

Compute the weights to get the scores
Take the weighted sum to get the Attention_vector


Computing the weights:

Formula:
$Weight: a_i = \mathrm{align}(h_i, s_0)$

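In actual code (pseudocode):
```
# h_i: encoder hidden state at step i; s0: current decoder hidden state
energy = torch.tanh(W(torch.cat((h_i, s0), dim=-1)))  # W: Linear mapping the concatenation to hidden_num
a_i = v(energy)                                       # v: Linear mapping hidden_num to 1
```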
This yields seq_len values of a, one score per encoder time step, which are then normalized with a softmax:
$\mathrm{softmax}(a_1, \ldots, a_{seq\_len})$
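
The score computation and the softmax can be sketched as one small module. This is only my sketch of the additive ("concat") scoring described above; the class name `AdditiveAttention`, the layer names `W` and `v`, and all sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Hypothetical module: scores every encoder state against the decoder state."""

    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(enc_dim + dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s, encoder_output):
        # s: (batch, dec_dim); encoder_output: (batch, seq_len, enc_dim)
        seq_len = encoder_output.size(1)
        # Repeat the decoder state so it can be paired with every h_i.
        s_rep = s.unsqueeze(1).expand(-1, seq_len, -1)
        energy = torch.tanh(self.W(torch.cat((encoder_output, s_rep), dim=-1)))
        scores = self.v(energy).squeeze(-1)   # (batch, seq_len), unnormalized a_i
        return F.softmax(scores, dim=-1)      # each row sums to 1

torch.manual_seed(0)
attn = AdditiveAttention(enc_dim=32, dec_dim=32, attn_dim=24)
weights = attn(torch.randn(4, 32), torch.randn(4, 7, 32))
print(weights.shape)  # (batch, seq_len)
```

The softmax is taken over the seq_len dimension, so each sample ends up with one distribution over the source positions.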
Weighted sum:

Formula (here $m$ = seq_len and the $a_i$ are the softmax-normalized weights):
$Attention\_vector: c_0 = a_1 h_1 + \cdots + a_m h_m$
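
In batched form the weighted sum is a single batched matrix multiply. The sketch below uses random tensors as stand-ins for the real weights and encoder states (the sizes are assumptions) and checks `torch.bmm` against the explicit sum from the formula.

```python
import torch

torch.manual_seed(0)
batch_size, seq_len, enc_dim = 4, 7, 32  # hypothetical sizes

encoder_output = torch.randn(batch_size, seq_len, enc_dim)          # h_1 ... h_m
weights = torch.softmax(torch.randn(batch_size, seq_len), dim=-1)   # a_1 ... a_m

# (batch, 1, seq_len) @ (batch, seq_len, enc_dim) -> (batch, 1, enc_dim)
c = torch.bmm(weights.unsqueeze(1), encoder_output).squeeze(1)

# The same thing written as the explicit sum a_1*h_1 + ... + a_m*h_m
c_explicit = (weights.unsqueeze(-1) * encoder_output).sum(dim=1)

print(c.shape)  # (batch, enc_dim): one attention vector per sample
print(torch.allclose(c, c_explicit, atol=1e-6))
```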

3. Decoder

After the previous two steps, I now have:

$s_{t-1}$ (the previous hidden state)
Attention_vec
decoder_input

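Steps:

Concatenate Attention_vec and decoder_input with cat to get input.
Feed input and s(t-1) into the RNN to get output, which is in fact the new s(t).
Finally, make the prediction with a Linear layer: output, Attention_vec, and decoder_input are concatenated and passed through the Linear layer, which maps them to a vector of size word_size_dim.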

Pseudocode:

input = torch.cat((c, decoder_input), dim=-1)   # c is the Attention_vec
output, s = rnn(input, s)                       # s is s(t-1) going in and s(t) coming out
pre = Linear(torch.cat((output, c, decoder_input), dim=-1))
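
The three steps can be written out as a runnable single decoding step. This is a sketch under my own assumptions, not the referenced implementation: the class name `DecoderStep`, the single-layer GRU, the embedding layer, and all sizes are hypothetical.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """Hypothetical one-step decoder; vocab_size plays the role of word_size_dim."""

    def __init__(self, vocab_size, emb_dim, enc_dim, dec_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + enc_dim, dec_dim, batch_first=True)
        self.fc_out = nn.Linear(dec_dim + enc_dim + emb_dim, vocab_size)

    def forward(self, token, s_prev, c):
        # token: (batch,), s_prev: (batch, dec_dim), c: (batch, enc_dim)
        embedded = self.embedding(token)                            # decoder_input
        rnn_input = torch.cat((c, embedded), dim=-1).unsqueeze(1)   # step 1: cat
        output, hidden = self.rnn(rnn_input, s_prev.unsqueeze(0))   # step 2: RNN
        output = output.squeeze(1)                                  # (batch, dec_dim)
        # step 3: predict from output, Attention_vec and decoder_input together
        logits = self.fc_out(torch.cat((output, c, embedded), dim=-1))
        return logits, hidden.squeeze(0)                            # logits, s(t)

torch.manual_seed(0)
step = DecoderStep(vocab_size=100, emb_dim=16, enc_dim=32, dec_dim=32)
logits, s_t = step(torch.randint(0, 100, (4,)), torch.randn(4, 32), torch.randn(4, 32))
print(logits.shape, s_t.shape)
```

With a single-layer GRU and a sequence of length 1, `output` and the returned hidden state hold the same values, which matches the remark above that output is really the new s(t).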

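4. Things to note

The last hidden state from the Encoder has to go through a tanh nonlinearity before it is used as the Decoder's initial hidden state.
The final prediction does not need a softmax, because when cross-entropy is chosen as the loss (e.g. nn.CrossEntropyLoss in PyTorch), the softmax is already applied inside the loss.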
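
The second point can be checked directly in PyTorch: `nn.CrossEntropyLoss` on raw logits gives the same value as applying `log_softmax` by hand followed by `nll_loss`. The logits and targets below are random stand-ins with made-up sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw decoder outputs, no softmax applied
targets = torch.randint(0, 10, (4,))   # hypothetical target word indices

loss = nn.CrossEntropyLoss()(logits, targets)

# The same loss with the (log-)softmax written out explicitly.
manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

print(torch.allclose(loss, manual))
```

Applying an extra softmax to the logits before passing them to the loss would therefore normalize twice and change the result.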
