[Base] Agent Attention

Xy-unu

Published 2024-08-19 18:55:19


Tags: paper reading, transformer

Article link: https://blog.csdn.net/weixin_45863274/article/details/141331149


1. BaseInfo


Title	Agent Attention: On the Integration of Softmax and Linear Attention
Address	https://arxiv.org/pdf/2312.08874
Journal/Time	2023-12, ECCV 2024
Author	Department of Automation, Tsinghua University
Code	https://github.com/LeapLabTHU/Agent-Attention
Table	Attention

2. Creative Q&A

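Q:
Swin Transformer: shrinks the receptive field and uses local self-attention.
PVT: uses a sparse attention pattern by reducing the number of K and V tokens.
Both approaches weaken long-range dependency modeling and still fall short of global self-attention.
A:
The work mainly targets the computational complexity of Softmax attention. It introduces an extra set of agent tokens A and exploits the redundancy among attention weights to get high expressiveness at low computational cost. Once the cost is lower, a larger receptive field becomes affordable.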
Motivation
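To make the complexity argument concrete for myself, here is a rough sketch that counts multiply-adds for one attention head. It only counts the matrix products (Softmax, bias terms and the depthwise convolution are ignored), and the values `dim = 64` and `num_agents = 49` are hypothetical numbers I picked for illustration, not settings taken from the paper.

```python
def softmax_attention_cost(num_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for one head of Softmax attention.

    Q K^T costs N*N*d, and the weighted sum over V costs another N*N*d.
    """
    return 2 * num_tokens * num_tokens * dim


def agent_attention_cost(num_tokens: int, num_agents: int, dim: int) -> int:
    """Approximate multiply-adds for one head of agent attention.

    Agent aggregation (A K^T, then times V) costs 2*n*N*d.
    Agent broadcast (Q A^T, then times the agent values) costs 2*N*n*d.
    """
    return 4 * num_tokens * num_agents * dim


if __name__ == "__main__":
    dim = 64          # hypothetical per-head channel dimension
    num_agents = 49   # hypothetical number of agent tokens
    for side in (14, 28, 56):
        n = side * side  # number of tokens for a side x side feature map
        full = softmax_attention_cost(n, dim)
        agent = agent_attention_cost(n, num_agents, dim)
        print(f"N={n:5d}  softmax={full:>12,d}  agent={agent:>12,d}  "
              f"ratio={full / agent:.1f}x")
```

Under these assumptions the Softmax cost grows quadratically with the number of tokens N while the agent cost grows linearly, so the gap widens as the feature map (and hence the receptive field) gets larger.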

3. Concrete

Applicable downstream tasks:

Classification
Segmentation
Detection
Agent Attention for Stable Diffusion

3.1. Model

model
An integration of Softmax and Linear Attention.
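As I understand the paper, the integration can be written down as follows. This is a simplified form: $\sigma$ denotes Softmax, $Q, K, V \in \mathbb{R}^{N \times d}$, $A \in \mathbb{R}^{n \times d}$ are the agent tokens with $n \ll N$, and I leave out the scaling factor, the agent bias terms and the depthwise-convolution branch.

```latex
\begin{aligned}
O^{S} &= \sigma\!\left(QK^{\top}\right)V
  && \text{Softmax attention, } \mathcal{O}(N^{2}d) \\
O^{A} &= \underbrace{\sigma\!\left(QA^{\top}\right)}_{\text{agent broadcast}}\,
         \underbrace{\sigma\!\left(AK^{\top}\right)V}_{\text{agent aggregation}}
  && \text{agent attention, } \mathcal{O}(Nnd) \\
O^{A} &= \phi_q(Q)\,\phi_k(K)^{\top}V,
  \quad \phi_q(Q) = \sigma\!\left(QA^{\top}\right),
  \quad \phi_k(K) = \left(\sigma\!\left(AK^{\top}\right)\right)^{\top}
  && \text{generalized linear attention view}
\end{aligned}
```

Each factor is an ordinary Softmax attention, but the product has the form of linear attention with Softmax-based feature maps, which is where the "integration" in the title comes from.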

DWC: depthwise convolution
The agent tokens A are obtained by pooling the queries Q.
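A minimal single-head sketch of the two-step computation, written with plain Python lists so it runs without dependencies. It is not the authors' implementation: the 1-D average pooling in `pool_tokens` is my stand-in for the 2-D pooling over the feature map, the `1/sqrt(d)` scaling is an assumption, and the agent bias and the DWC branch are omitted. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, subtracting the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def pool_tokens(q, num_agents):
    """Average-pool N query tokens into num_agents agent tokens (1-D stand-in)."""
    n = len(q)
    assert n % num_agents == 0, "sketch assumes N is divisible by the agent count"
    group = n // num_agents
    agents = []
    for i in range(num_agents):
        chunk = q[i * group:(i + 1) * group]
        agents.append([sum(col) / group for col in zip(*chunk)])
    return agents


def agent_attention(q, k, v, num_agents):
    """Return an (N, d) output computed through n agent tokens."""
    scale = 1.0 / math.sqrt(len(q[0]))  # assumed scaling, as in standard attention
    a = pool_tokens(q, num_agents)      # agent tokens, shape (n, d)

    # Agent aggregation: agents act as queries and gather information from K, V.
    agg_scores = [[s * scale for s in row] for row in matmul(a, transpose(k))]
    v_agent = matmul(softmax_rows(agg_scores), v)  # shape (n, d)

    # Agent broadcast: the original queries read from the agent values.
    bc_scores = [[s * scale for s in row] for row in matmul(q, transpose(a))]
    return matmul(softmax_rows(bc_scores), v_agent)  # shape (N, d)


if __name__ == "__main__":
    random.seed(0)
    num_tokens, dim, num_agents = 16, 8, 4
    q, k, v = ([[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
               for _ in range(3))
    out = agent_attention(q, k, v, num_agents)
    print(len(out), len(out[0]))
```

Note that the full N x N attention map is never formed: the largest intermediate matrices are n x N and N x n, which is the source of the linear cost in N.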

3.2. Dataset

ImageNet1K classification
ADE20K semantic segmentation
COCO object detection

3.3. Eval


3.4. Ablation

Ablation on key designs.
Ablation on number of agent tokens.
Comparison with Other Linear Attention

3.5. Appendix

Differences from GPViT and GRL.
There is a lot of material here; I will go through it carefully later when I run experiments…

4. Reference

Our code is developed on the top of PVT, Swin Transformer, CSwin Transformer and ToMeSD.

5. Additional

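I don't feel like reading code, so I'm reading papers first. Finish all 3 papers before Wednesday and try out SLViT.
The questions I raised for myself in the bridging-paper notes from 0815 are still unanswered.
One thing at a time.
I have a feeling I have read this paper before, and even wrote about it in a weekly report.
The derivations are concise and easy to follow, and the figures are clear.
The experiments are really extensive. The appendix also contains many experimental settings.
A 30-page paper. It is a plug-and-play attention module.
These notes are brief.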