An Explanation of the Transformer Architecture

Latest recommended article published on 2024-06-03 20:19:24

Takoony


Category: deep learning

Go Forth And Transform

I hope you’ve found this a useful place to start to break the ice with the major concepts of the Transformer. If you want to go deeper, I’d suggest these next steps:

Read the Attention Is All You Need paper, the Transformer blog post (Transformer: A Novel Neural Network Architecture for Language Understanding), and the Tensor2Tensor announcement.
Watch Łukasz Kaiser’s talk walking through the model and its details
Play with the Jupyter Notebook provided as part of the Tensor2Tensor repo
Explore the Tensor2Tensor repo.

Follow-up works:

Depthwise Separable Convolutions for Neural Machine Translation
One Model To Learn Them All
Discrete Autoencoders for Sequence Models
Generating Wikipedia by Summarizing Long Sequences
Image Transformer
Training Tips for the Transformer Model
Self-Attention with Relative Position Representations
Fast Decoding in Sequence Models using Discrete Latent Variables
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Acknowledgements

Thanks to Illia Polosukhin, Jakob Uszkoreit, Llion Jones, Lukasz Kaiser, Niki Parmar, and Noam Shazeer for providing feedback on earlier versions of this post.

Please hit me up on Twitter for any corrections or feedback.

Reposted from: http://jalammar.github.io/illustrated-transformer/
