Abstract
Transformer strength: the superior ability of global dependency modeling.
Open problem: how to dynamically schedule the global and local dependency modeling in a Transformer has become an emerging issue.
This paper: an example-dependent routing scheme called TRAnsformer Routing (TRAR) to address this issue.
Key design: each visual Transformer layer is equipped with a routing module that selects among different attention spans.
1. Introduction
Observation:
multi-modal inference often requires visual attentions from different receptive fields
grid features:
the semantic information of grid features is more fragmented
Key problem to solve: helping Transformer networks to explore different attention spans
Method: a novel yet lightweight routing scheme called Transformer Routing (TRAR), which selects the attention span automatically
Design: each visual SA layer is equipped with a path controller that predicts the next attention span (or receptive field) based on the output of the previous step
2. Related Work
2.1. Visual Question Answering
2.2. Referring Expression Comprehension
multi-stage modeling (first generate candidate boxes, then select the best region according to the referring expression)
single-stage modeling (text-guided object detection)
2.3. Dynamic Neural Networks
Advantage: such networks can adapt their structures or parameters to the given example during inference.
3. Transformer Routing
3.1. Routing Process
Notation: X denotes the features from the previous inference step; F_1, ..., F_n are the n candidate feature-update functions (feature spaces); X' is the output of the next inference step, obtained as the routing combination X' = Σ_{i=1..n} α_i F_i(X); the routing weights α can be either soft (continuous) or hard (one-hot).
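The routing step above can be sketched in NumPy; the two candidate functions below are toy placeholders for the actual span-restricted attentions:

```python
import numpy as np

def soft_route(X, fns, alpha):
    """Soft routing: the next-step feature X' is the alpha-weighted
    sum of every candidate function F_i applied to X."""
    return sum(a * f(X) for a, f in zip(alpha, fns))

# Toy candidate functions standing in for span-restricted attentions.
fns = [lambda x: x, lambda x: 2.0 * x]
alpha = np.array([0.25, 0.75])          # soft weights, sum to 1
X = np.ones((2, 3))
Xp = soft_route(X, fns, alpha)          # 0.25*X + 0.75*2X = 1.75*X
```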
Standard self-attention (which can be viewed as a feature-update function): SA(X) = softmax(QK^T / √d) V, with Q = XW_q, K = XW_k, V = XW_v.
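A minimal NumPy sketch of standard scaled dot-product self-attention (the identity projection weights below are arbitrary stand-ins):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """SA(X) = softmax(Q K^T / sqrt(d)) V, with Q, K, V as linear
    projections of the input features X (n tokens x d dims)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # n x n attention map
    return A @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = self_attention(X, np.eye(8), np.eye(8), np.eye(8))
```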
Modification in this paper:
D is a binary mask: an entry is 1 if the corresponding position falls within the attention span and 0 otherwise, so the attention matrix (effectively an adjacency matrix) becomes relatively sparse.
The masked self-attention restricts each query to its span with D. To reduce computation, instead of running n masked attentions and α-weighting their outputs, the α-weighted sum that controls the information flow is moved onto the D matrices: the masks are merged into a single soft mask D̄ = Σ_i α_i D_i, and only one attention is computed.
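A sketch of both variants, assuming the common convention that masked-out logits are sent to -inf and that the merged soft mask is folded in additively as log D̄ (the paper's exact formulation may differ):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(Q, K, V, D):
    """One span-restricted attention: positions with D == 0 get a
    very negative logit, so they receive (near-)zero weight."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(np.where(D > 0, logits, -1e9)) @ V

def merged_attention(Q, K, V, Ds, alpha):
    """TRAR-style simplification: merge the binary masks into one
    soft mask D_bar = sum_i alpha_i * D_i and run a single attention.
    The soft mask is folded in as an additive log-term (an assumed
    convention; fully masked entries, D_bar == 0, still get -inf)."""
    D_bar = sum(a * D for a, D in zip(alpha, Ds))
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    logits = np.where(D_bar > 0,
                      logits + np.log(np.maximum(D_bar, 1e-12)), -1e9)
    return softmax(logits) @ V
```

With a one-hot α the merged form reduces exactly to the single masked attention of the selected span, which makes a useful sanity check.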
3.2. Path Controller
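The notes do not detail the controller's architecture; a plausible minimal sketch (assumed here: global average pooling, a small two-layer MLP, then a softmax over the candidate spans) would be:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def path_controller(X, W1, b1, W2, b2):
    """Predict routing weights alpha over the candidate attention
    spans from the previous step's features X (n tokens x d dims):
    average-pool, small MLP, softmax. (Assumed architecture.)"""
    h = X.mean(axis=0)                  # global average pooling -> (d,)
    h = np.maximum(W1 @ h + b1, 0.0)    # hidden layer, ReLU
    return softmax(W2 @ h + b2)         # alpha: one weight per span

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, 8 dims
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)   # 3 candidate spans
alpha = path_controller(X, W1, b1, W2, b2)
```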
3.3. Attention Span
The spans borrow the sliding-window design of convolution, which the authors rename the order neighborhood (order adjacency in graph theory): 1-order corresponds to a 3×3 window, 2-order to 5×5.
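The order-neighborhood masks can be generated directly from grid coordinates; a small sketch for an h×w feature grid:

```python
import numpy as np

def order_neighborhood_mask(h, w, order):
    """Binary mask D for an h x w grid: cell i may attend to cell j
    iff j lies in the (2*order+1) x (2*order+1) window around i,
    mirroring a convolution's sliding window (order 1 -> 3x3,
    order 2 -> 5x5)."""
    rows, cols = np.divmod(np.arange(h * w), w)
    dr = np.abs(rows[:, None] - rows[None, :])
    dc = np.abs(cols[:, None] - cols[None, :])
    return ((dr <= order) & (dc <= order)).astype(np.int64)
```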
3.4. Optimization
Soft routing: a continuous, differentiable α.
Hard routing: a discrete path selection via the Gumbel-max trick.
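A NumPy sketch of the Gumbel-max/Gumbel-softmax sampling (forward pass only; in a real framework the straight-through estimator would keep the hard sample differentiable):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, hard=True, rng=None):
    """Sample routing weights: add Gumbel noise to the logits and
    apply a temperature-tau softmax; with hard=True, discretize to
    a one-hot path via argmax (the Gumbel-max trick)."""
    if rng is None:
        rng = np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    y = y / y.sum()
    if hard:
        one_hot = np.zeros_like(y)
        one_hot[np.argmax(y)] = 1.0
        return one_hot
    return y
```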