Encoder and Decoder with Attention Model

最新推荐文章于 2024-10-12 19:01:58 发布

来自宇宙岛的海龟

最新推荐文章于 2024-10-12 19:01:58 发布

阅读量1k

点赞数

分类专栏： DL

本文链接：https://blog.csdn.net/weixin_44064649/article/details/103314676

版权

本文介绍了Encoder-Decoder模型与注意力机制在序列学习中的应用，通过对比RNNs和LSTMs的局限性，阐述了注意力机制如何解决长距离依赖问题。Transformer模型利用注意力机制获取单词的相关上下文信息，实现更智能的表示，为机器翻译和自然语言处理任务带来突破。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Encoder Decoder with Attention model is a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. It uses a multilayered Gated Recurrent Unit (GRU) to map the input sequence to a vector of a fixed dimensionality, and then another deep GRU to decode the target sequence from the vector.

kaggle
在这里插入图片描述
A sequence to sequence model has two parts – an encoder and a decoder. Both the parts are practically two different neural network models combined into one giant network. the task of an encoder network is to understand the input sequence, and create a smaller dimensional representation of it. This representation is then forwarded to a decoder network which generates a sequence of its own that represents the output. The input is put through an encoder model which gives us the encoder output. Here, each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence. We use Bahdanau attention for the encoder.

One type of network built with attention is called a transformer (explained below). If you understand the transformer, you understand attention. And the best way to understand the transformer is to contrast it with the neural networks that came before. They differ in the way they process input (which in turn contains assumptions about the structure of the data to be processed, assumptions about the world) and automatically recombine that input into relevant features.

Recurrent Networks and LSTMs
How do words work? Wel