20240604 Log: Attention

十年伴树

Last modified: 2024-06-05 10:42:04

Tags: natural language processing, gpt-3, language model, machine translation, chatgpt

First published: 2024-06-04 22:33:08

Article link: https://blog.csdn.net/JiajunSun/article/details/139442148

# location:beijing

Attention[^1]

single head of attention

The attention block shown in Fig. 1.1 allows vectors to talk to each other, passing information back and forth to update their values.

Fig. 1.1 single head attention (red rectangle)

Here is an example to illustrate the attention mechanism. Fig. 1.2 shows that the vector $\vec{\mathbf{E}_n}$ combines a token's meaning and its positional information (M&P). $\vec{\mathbf{E}_n}$ is then sent to the attention block.

Fig. 1.2 vector with information of meaning and position
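
As a rough illustration of how a vector can carry both meaning and position, here is a minimal sketch. It assumes the two are combined by element-wise addition of a token embedding and a learned position embedding, as in GPT-style models; the post itself does not say how they are combined. The table sizes, the random initialization, and the names `token_table`, `position_table`, and `embed` are all hypothetical.

```python
import random

random.seed(0)

d_model = 8      # embedding width (hypothetical, kept tiny for illustration)
vocab_size = 5   # hypothetical vocabulary size
max_len = 4      # hypothetical maximum context length


def random_table(rows, cols):
    """Stand-in for a learned embedding table (random values here)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


token_table = random_table(vocab_size, d_model)    # one row per token: "meaning"
position_table = random_table(max_len, d_model)    # one row per position


def embed(token_ids):
    """Return one vector E_n per position n: token row + position row.

    Assumption: meaning and position are combined by element-wise addition.
    """
    return [
        [m + p for m, p in zip(token_table[tok], position_table[n])]
        for n, tok in enumerate(token_ids)
    ]


# The same token (id 3) appears at positions 0 and 2.
E = embed([3, 1, 3])
```

Under this assumption, `E[0]` and `E[2]` start from the same token row but end up different, because each has a different position row added to it.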

The attention input $\vec{\mathbf{E}_n}$ is processed by three linear layers, V, K, and Q, each of which transforms its features.
We can view Q as a layer that gives $\vec{\mathbf{E}_n}$ attributes able to receive M&P information from in-context words (vectors). The output is $\vec{\mathbf{Q}_n}$.
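
The three linear layers can be sketched as plain matrix-vector products. This is only an illustration under assumptions the post does not state: the layers are bias-free, all three project from `d_model` down to a smaller `d_head`, and the weights are random rather than trained. The names `W_Q`, `W_K`, `W_V`, and `linear` are hypothetical.

```python
import random

random.seed(0)

d_model = 8   # width of each input vector E_n (hypothetical)
d_head = 4    # width of each projected vector (hypothetical)


def random_matrix(rows, cols):
    """Stand-in for trained weights (small random values here)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(W, x):
    """Bias-free linear layer: returns the matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# One weight matrix per linear layer.
W_Q = random_matrix(d_head, d_model)
W_K = random_matrix(d_head, d_model)
W_V = random_matrix(d_head, d_model)

# Three stand-in input vectors E_0, E_1, E_2 (random in place of real embeddings).
E = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]

# Each E_n yields its own Q_n, K_n, and V_n.
Q = [linear(W_Q, e) for e in E]
K = [linear(W_K, e) for e in E]
V = [linear(W_V, e) for e in E]
```

Every input vector passes through the same three weight matrices, so `Q[n]`, `K[n]`, and `V[n]` are three different views of the same $\vec{\mathbf{E}_n}$.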