Transformer Paper Analysis

赛文忆莱文

Tags: transformer, deep learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_45477628/article/details/131302748

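Summary (generated automatically by CSDN): Transformer is a model proposed by Google, built on the Encoder-Decoder architecture and used mainly for neural language modeling and machine translation. It relies entirely on attention, which addresses the speed limits that RNNs and CNNs face in sequence processing. The article discusses the concept of self-attention and the use of residual connections in the model, and emphasizes how Transformer improves efficiency by removing recurrence and convolution.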

Background

Transformer is a model proposed by Google researchers, built on the Encoder-Decoder architecture.
Paper: arXiv: Attention Is All You Need
Code: GitHub: tensorflow/tensor2tensor

Task definition

In neural language modelling, a neural network estimates a distribution over sequences of words or characters that belong to a given language (Bengio et al., 2003). In neural machine translation, the network estimates a distribution over sequences in the target language conditioned on a given sequence in the source language.
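Reference: ByteNet: arXiv: Neural Machine Translation in Linear Time (ByteNet)
Language modeling means estimating the probability distribution over sequences of a given language. Machine translation means estimating the probability distribution over target-language sequences, conditioned on a given source-language sequence.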
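
To make the two definitions concrete, both distributions can be written with the chain rule of probability. This is a sketch in notation chosen here for illustration and not taken from the post: x and y are token sequences, T and T' are their lengths, and x_{<t} stands for the tokens before position t.

```latex
% Language modeling: distribution over a sequence x = (x_1, ..., x_T)
p(x) = \prod_{t=1}^{T} p(x_t \mid x_{<t})

% Machine translation: distribution over a target sequence y = (y_1, ..., y_{T'})
% conditioned on the source sequence x
p(y \mid x) = \prod_{t=1}^{T'} p(y_t \mid y_{<t}, x)
```

The only difference between the two tasks in this form is the extra conditioning on the source sequence x.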

Content analysis

Introduction

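Recurrent neural networks. Reference: IEEE Xplore: Long Short-Term Memory
Long Short-Term Memory uses a recurrent neural network to solve the modeling problem above, but because the computation proceeds step by step along the sequence, its running speed is limited.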
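
The sequential bottleneck can be seen in a minimal sketch of a recurrence. This is a plain scalar recurrent cell, not an LSTM (which adds gating on top of the same step-by-step pattern); the function name `rnn_states` and the weights `w_h` and `w_x` are hypothetical and chosen only for illustration.

```python
import math


def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Scalar recurrence h_t = tanh(w_h * h_{t-1} + w_x * x_t), starting from h_0 = 0."""
    h = 0.0
    states = []
    for x in xs:
        # Step t needs the hidden state from step t-1,
        # so the loop cannot be parallelized across positions.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


states = rnn_states([1.0, 0.0, -1.0])
print(states)
```

Each hidden state depends on the previous one, so processing a sequence of length n takes n dependent steps no matter how much parallel hardware is available.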
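Convolutional networks have also been applied to this problem. Starting from VGG-16 (arXiv: Very Deep Convolutional Networks for Large-Scale Image Recognition), Google researchers made the network fully convolutional to obtain the fine-tuned VGG-16 (arXiv: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs). Using the dilation technique described in that paper, Google researchers then developed ByteNet (arXiv: Neural Machine Translation in Linear Time). Finally, to address the problems with the attention mechanisms in ByteNet and other networks, they proposed a network built entirely on attention, the Transformer (arXiv: Attention Is All You Need).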
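
As a rough illustration of the dilation technique mentioned above, the sketch below applies a 1D convolution whose kernel taps are spaced `dilation` positions apart. It is a toy version under simplifying assumptions (one channel, no padding, plain Python lists), not the implementation used in either paper; the name `dilated_conv1d` is hypothetical.

```python
def dilated_conv1d(xs, kernel, dilation=1):
    """1D convolution without padding; kernel taps are `dilation` positions apart."""
    span = (len(kernel) - 1) * dilation  # input distance covered by one kernel application
    out = []
    for i in range(len(xs) - span):
        out.append(sum(k * xs[i + j * dilation] for j, k in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7, 8]
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # each output sees 3 adjacent inputs
sparse = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # each output spans 5 inputs
print(dense)   # [6, 9, 12, 15, 18, 21]
print(sparse)  # [9, 12, 15, 18]
```

With the same three weights, a dilation of 2 lets each output cover a window of five inputs instead of three, which is how dilation widens the receptive field without adding parameters.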
[Figure: conversion from VGG-16 to the fine-tuned VGG-16]

Model structure