Xiao Zhou Reads Papers with You, Part 2: a fresh take on an old classic, "Attention is all you need" (1), a Transformer walkthrough even a paramecium can follow

This article is a personal reading of the classic paper "Attention is all you need", aiming to introduce the Transformer architecture in a simple, accessible way. It explains how the Transformer addresses the sequential-computation problem of traditional Seq2Seq models, compares the training setups of GPT and BERT to explore why decoder-only architectures have gradually become mainstream, and discusses the expressiveness and advantages of decoder-only models. It also touches on the low-rank problem and zero-shot evaluation to argue for the strengths of decoder-only models.

This paper hardly needs an introduction; I'd guess more than 70% of my readers have already read it.

But as usual, 1, 2, 3, here's the link:

      1706.03762.pdf (arxiv.org)

If I just lectured straight through "Attention is all you need" it might get a bit dry, since so many people have already read it. But this is a paper you simply cannot skip if you want to work with LLMs, so I'll read it from my own angle, smuggle in a few personal takes, and promise you a different, richer reading of "Attention is all you need".

My goal is to make absolutely sure everyone understands, so I'll go into a lot of detail; hopefully this ends up as a Transformer paper walkthrough that even a paramecium could follow.

I'll paste only one part of the original text, the Background section:

Background

The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difficult to learn dependencies between distant positions [12]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2.

Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22].

End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks.
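The key claim in that paragraph is that self-attention relates any two positions in a constant number of operations, versus a path length that grows linearly for ConvS2S and logarithmically for ByteNet. Here is a minimal NumPy sketch of scaled dot-product self-attention to make that concrete; it is my own illustration rather than the paper's reference code, and the function name `self_attention` and the toy dimensions are made up for the example. The single `Q @ K.T` product scores every pair of positions at once, which is why distance no longer matters; the "averaging" the authors mention is the softmax-weighted sum over values, and multi-head attention simply runs several such maps in parallel on lower-dimensional projections and concatenates the results.

```python
# Minimal sketch of scaled dot-product self-attention (illustration only,
# not the paper's reference implementation).
import numpy as np

def self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_k) projection matrices."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v              # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len): all position pairs at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted average of value vectors

# Toy usage with made-up sizes: 5 tokens, d_model=8, d_k=4
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```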
