Original Text 7
Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations.
Translation
Self-attention (sometimes called intra-attention) is a mechanism that computes a representation of a sequence by relating different positions within that single sequence. It has been applied successfully in a variety of tasks, including reading comprehension, abstractive summarization, textual entailment, and task-independent sentence representation learning.
Key Sentence Analysis
- Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.
[Analysis]
The core of this sentence is: Self-attention is an attention mechanism. The phrase sometimes called intra-attention, set off by the two commas, is a parenthetical that adds information about Self-attention; it is equivalent to a passive relative clause: which is sometimes called intra-attention. The present-participle phrase relating different positions of a single sequence is a postmodifier of attention mechanism; within it, the prepositional phrase of a single sequence postmodifies positions. The infinitive phrase in order to compute a representation of the sequence is an adverbial of purpose; within it, the prepositional phrase of the sequence postmodifies a representation.
[Reference Translation]
Self-attention (sometimes called intra-attention) is a mechanism that computes a representation of a sequence by relating different positions within that single sequence,
- Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations.
[Analysis]
The core of the sentence is: Self-attention has been used. The adverb successfully that follows is an adverbial of degree; the prepositional phrase in a variety of tasks, indicating the range of application, is also an adverbial; including… introduces examples and postmodifies a variety of tasks. Within it, and links four parallel noun or gerund phrases.
[Reference Translation]
Self-attention has been applied successfully in a variety of tasks, including reading comprehension, abstractive summarization, textual entailment, and task-independent sentence representation learning.
Original Text 8
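The idea in Original Text 7, relating every position of a sequence to every other position in order to compute a new representation for each position, can be sketched in a few lines. The following is a minimal toy example in plain NumPy, not the paper's full architecture: it assumes, for brevity, that queries, keys, and values are all the raw input itself (identity projections), whereas the actual Transformer uses learned projection matrices and multiple heads.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a single sequence.

    x: (seq_len, d) array. Queries, keys, and values are all taken
    to be x itself (a simplifying assumption; the Transformer learns
    separate projections for each).
    """
    d = x.shape[-1]
    # Relate every position to every other position.
    scores = x @ x.T / np.sqrt(d)
    # Softmax over positions, computed stably.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of all positions:
    # the new representation of that position.
    return weights @ x

seq = np.random.rand(5, 8)   # 5 positions, 8-dimensional embeddings
out = self_attention(seq)
print(out.shape)             # one representation per position
```

Note how no recurrence is involved: every position attends to every other in a single matrix operation, which is exactly the contrast with sequence-aligned RNNs drawn in Original Text 9.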
End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks.
Translation
End-to-end memory networks are based on a recurrent attention mechanism rather than sequence-aligned recurrence. They have been shown to perform well on simple-language question answering and language modeling tasks.
Key Sentence Analysis
[Analysis]
This sentence stands on its own as a paragraph. It contains two coordinated predicates, one in the simple-present passive (are based on) and one in the present-perfect passive (have been shown). Its core can be summarized as: subject (End-to-end memory networks) + predicate 1 (are based on) + object 1 (a recurrent attention mechanism) + coordinated predicate 2 (have been shown to perform well). The phrase instead of sequence-aligned recurrence expresses a contrast, meaning "rather than…". The prepositional phrase on simple-language question answering and language modeling tasks is an adverbial indicating the range of tasks; within it, and links two parallel gerund phrases, question answering and language modeling, which jointly modify tasks.
Original Text 9
To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].
Translation
To the best of our knowledge, however, the Transformer is the first transduction model that relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution. In the following sections, we will describe the Transformer architecture, motivate self-attention, and discuss its advantages over models such as [17, 18] and [9].
Key Sentence Analysis
- To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.
[Analysis]
The core of the sentence is: the Transformer is the first transduction model. The sentence-initial To the best of our knowledge is an adverbial meaning "as far as we know". The parenthetical however, set off by the two commas, is also an adverbial and expresses contrast; when translating, it should be moved to the front of the sentence. relying entirely on self-attention is a present-participle phrase postmodifying transduction model; the infinitive to compute representations… is an adverbial of purpose; the prepositional phrase of its input and output postmodifies representations; and the prepositional phrase without using sequence-aligned RNNs or convolution is a conditional adverbial modifying compute.
[Reference Translation]
To the best of our knowledge, however, the Transformer is the first transduction model that relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution.
- In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].
[Analysis]
The sentence-initial prepositional phrase In the following sections is an adverbial indicating when and where the action takes place. The main clause that follows has a subject-verb-object structure, except that the subject we is followed by three coordinated verb-object phrases. In the third of these, the prepositional phrase over… postmodifies advantages; "advantages over…" means "advantages relative to…". such as… introduces examples of the preceding models.
[Reference Translation]
In the following sections, we will describe the Transformer architecture, motivate self-attention, and discuss its advantages over models such as [17, 18] and [9].