Transformer

Latest recommended article published on 2024-05-29 21:14:18

Doooer

Category: Deep Learning

Article link: https://blog.csdn.net/YQMind/article/details/80864133

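Highly recommended: https://jalammar.github.io/illustrated-transformer/
Strengths: simple, clear, and easy to follow. It gives a basic understanding of Transformer concepts such as (multi-head) self-attention and positional encoding.
Weaknesses: for the finer details you still need to read other material.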

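For readers who prefer not to read English, there is an excellent Chinese-language write-up: https://kexue.fm/archives/4765
Strengths: it includes more of the author's own thinking and analysis, which helps readers understand the modules inside the Transformer more deeply. Kudos to the author!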

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
This link gives a visual explanation of attention in seq2seq models.

The Transformer consists of an encoding component and a decoding component. The encoding component is a stack of 6 Encoders, and the decoding component is a stack of 6 Decoders.
Each Encoder has two parts: self-attention + feed forward neural network
Each Decoder has three parts: self-attention + encoder-decoder attention + feed forward
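
The layout described above can be written down as a small data structure. This is only a sketch of the structure, not an implementation: the names `Layer`, `ENCODER_SUBLAYERS`, `DECODER_SUBLAYERS`, and `build_transformer_layout` are made up for illustration, and the sub-layers are plain labels rather than real computations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Sub-layer names taken from the description above (labels only, no computation).
ENCODER_SUBLAYERS: Tuple[str, ...] = ("self_attention", "feed_forward")
DECODER_SUBLAYERS: Tuple[str, ...] = (
    "self_attention",
    "encoder_decoder_attention",
    "feed_forward",
)


@dataclass
class Layer:
    """One Encoder or Decoder, described only by the sub-layers it contains."""
    kind: str
    sublayers: Tuple[str, ...]


def build_transformer_layout(num_layers: int = 6) -> Dict[str, List[Layer]]:
    """Return the encoding and decoding components as two stacks of layers."""
    return {
        "encoder": [Layer("encoder", ENCODER_SUBLAYERS) for _ in range(num_layers)],
        "decoder": [Layer("decoder", DECODER_SUBLAYERS) for _ in range(num_layers)],
    }


layout = build_transformer_layout()
for name, stack in layout.items():
    print(name, len(stack), "layers, each with:", " + ".join(stack[0].sublayers))
```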

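A key point: as with the outputs of an RNN, a vector representation is produced for every time step. Self-attention and the feed forward NN are therefore applied at every position. The 6 stacked Encoders mentioned above work like a stacked RNN: the output of the previous layer at each time step is the input to the current layer at the same time step.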
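
To make the "one vector per position, layer feeds layer" idea concrete, here is a toy sketch. `toy_encoder_layer` is a hypothetical placeholder, not real self-attention or a real feed forward network: it mixes information across positions by adding the mean vector, then applies a ReLU to each position independently. The only thing it is meant to show is that the sequence length and vector size are preserved from layer to layer.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]


def toy_encoder_layer(inputs: Sequence) -> Sequence:
    """Placeholder layer that returns one output vector per input position."""
    n = len(inputs)
    dim = len(inputs[0])
    # Stand-in for self-attention: every position sees the mean over all positions.
    mean = [sum(vec[d] for vec in inputs) / n for d in range(dim)]
    mixed = [[vec[d] + mean[d] for d in range(dim)] for vec in inputs]
    # Stand-in for the feed forward NN: applied to each position independently.
    return [[max(0.0, x) for x in vec] for vec in mixed]


def run_stack(inputs: Sequence, num_layers: int = 6) -> Sequence:
    """Feed each layer's per-position outputs into the next layer."""
    hidden = inputs
    for _ in range(num_layers):
        hidden = toy_encoder_layer(hidden)
    return hidden


tokens = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]]  # 3 positions, 2 dimensions each
outputs = run_stack(tokens)
print(len(outputs), "output vectors of size", len(outputs[0]))
```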

In self-attention, each position has its own Query vector, Key vector, and Value vector. The three vectors are obtained through matrix multiplication, for example:
$x_1 \times W_Q = q_1$
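
The projection can be sketched in plain Python. Assumptions here, none of which come from the post: the sizes `d_model = 4` and `d_k = 3`, the random weights, and the names `W_K` and `W_V` for the Key and Value matrices (by analogy with `W_Q`). The helpers `matmul` and `random_matrix` are hypothetical and written only for this example.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Fill a matrix with illustrative random weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_k = 4, 3  # illustrative sizes, not taken from the post

X = random_matrix(2, d_model, rng)      # rows are the inputs x_1 and x_2
W_Q = random_matrix(d_model, d_k, rng)  # Query projection
W_K = random_matrix(d_model, d_k, rng)  # Key projection (assumed name)
W_V = random_matrix(d_model, d_k, rng)  # Value projection (assumed name)

Q = matmul(X, W_Q)  # row i is the Query vector for position i
K = matmul(X, W_K)  # row i is the Key vector for position i
V = matmul(X, W_V)  # row i is the Value vector for position i

q_1 = Q[0]  # same as x_1 multiplied by W_Q
print("q_1 has", len(q_1), "components")
```

Stacking the inputs as rows of `X` computes the vectors for all positions in one matrix product, which is the same operation as multiplying each $x_i$ by the weight matrix separately.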
