Abstract
This week I studied the essential idea of the attention mechanism, its computation process, and the concept and advantages of the self-attention model. Self-attention makes it easier to capture long-distance interdependent features within a sentence and increases computational parallelism. I also read a paper on attention mechanisms in natural language processing; the Transformer model proposed in that paper speeds up training significantly.
Deep Learning
1. The essential idea of the attention mechanism
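As a one-formula summary, consistent with the step-by-step computation in section 2 below: attention maps a Query and the Source's Key-Value pairs to a weighted sum of the Values, with weights given by the similarity between the Query and each Key. The symbols here ($\mathrm{Key}_i$, $\mathrm{Value}_i$, and the Source length $L_x$) are the conventional notation for this formulation, not taken from this post:

$$\mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_x} \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) \cdot \mathrm{Value}_i$$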
2. The specific computation process of the attention mechanism
(1) Compute the relevance of the Query and each Key
Different functions and computation mechanisms can be introduced to compute the similarity or relevance of the Query and a Key. Common methods include the following:
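The three similarity functions usually cited at this step are the dot product, cosine similarity, and a small MLP network; this list is a reconstruction of the standard options under the usual notation, where $\mathrm{Key}_i$ is the $i$-th key of the Source:

$$
\begin{aligned}
\mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) &= \mathrm{Query} \cdot \mathrm{Key}_i && \text{(dot product)} \\
\mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) &= \frac{\mathrm{Query} \cdot \mathrm{Key}_i}{\lVert \mathrm{Query} \rVert \, \lVert \mathrm{Key}_i \rVert} && \text{(cosine similarity)} \\
\mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) &= \mathrm{MLP}(\mathrm{Query}, \mathrm{Key}_i) && \text{(MLP network)}
\end{aligned}
$$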
(2) Apply softmax normalization to the raw scores from step (1)
![](https://i-blog.csdnimg.cn/blog_migrate/777cc875b141d35809f192c6f30156d7.png)
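To make the two steps above concrete, here is a minimal NumPy sketch; it is my own illustration, not code from the referenced paper. It assumes a single query vector `q` and stacked key/value matrices `K` and `V`, uses the dot product from step (1) as the similarity function, softmax-normalizes the scores as in step (2), and finishes with the standard weighted sum over the Values:

```python
import numpy as np

def attention(q, K, V):
    """Single-query attention: dot-product scores, softmax weights, weighted sum.

    q: query vector, shape (d,)
    K: key matrix, shape (L, d)    -- one key per source position
    V: value matrix, shape (L, d_v)
    """
    # Step (1): raw relevance scores between the Query and each Key (dot product).
    scores = K @ q                       # shape (L,)
    # Step (2): softmax normalization of the raw scores into attention weights.
    scores = scores - scores.max()       # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # shape (L,), sums to 1
    # Final step: the attention output is the weighted sum of the Values.
    return weights @ V                   # shape (d_v,)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
print(attention(q, K, V))
```

Because the scores for all positions are computed as one matrix-vector product rather than sequentially, this also illustrates the parallelism advantage mentioned in the abstract.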