Hierarchical Question-Image Co-Attention for Visual Question Answering

这样子的话

Category: VQA. Tags: VQA, visual question answering

Article link: https://blog.csdn.net/lsh894609937/article/details/70057577

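Current attention-based VQA methods mainly focus on "where to look", i.e. visual attention. This paper argues that question-based attention, "which words to listen to", i.e. question attention, is equally important. Motivated by this, the paper proposes a multimodal attention model: Co-attention + Question Hierarchy.
Co-attention: this part comprises image attention and question attention. The image representation helps extract the question attention, and likewise the question representation helps extract the visual attention.
Question Hierarchy: the paper proposes a hierarchical architecture for image-question co-attention, with three levels.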
a) Word level: each word is represented as a vector.
b) Phrase level: a 1D CNN extracts phrase features.
c) Question level: an RNN encodes the whole question.
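The three levels above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the window sizes 1, 2 and 3, the zero padding, the tanh activations, the plain tanh RNN, the random weights, and every function and variable name are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def word_level(token_ids, embedding):
    """Word level: look up one d-dimensional vector per token -> (T, d)."""
    return embedding[token_ids]

def phrase_level(words, filters):
    """Phrase level: 1D convolutions with several window sizes, then a max over sizes.

    words: (T, d). filters[s] has shape (s * d, d) for window size s (assumed layout).
    """
    T, d = words.shape
    per_size = []
    for s, W in filters.items():
        # Zero-pad so that every window size yields exactly T outputs.
        left = (s - 1) // 2
        right = s - 1 - left
        padded = np.pad(words, ((left, right), (0, 0)))
        windows = np.stack([padded[t:t + s].reshape(-1) for t in range(T)])  # (T, s*d)
        per_size.append(np.tanh(windows @ W))  # (T, d)
    # Max-pool across window sizes at each word position.
    return np.max(np.stack(per_size), axis=0)

def question_level(phrases, W_x, W_h):
    """Question level: a plain tanh RNN over the phrase features -> (T, d)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in phrases:
        h = np.tanh(x @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)

# Toy sizes and random weights, purely for illustration.
vocab_size, d, T = 50, 8, 6
embedding = rng.normal(size=(vocab_size, d))
filters = {s: rng.normal(scale=0.1, size=(s * d, d)) for s in (1, 2, 3)}
W_x = rng.normal(scale=0.1, size=(d, d))
W_h = rng.normal(scale=0.1, size=(d, d))

token_ids = rng.integers(0, vocab_size, size=T)
words = word_level(token_ids, embedding)
phrases = phrase_level(words, filters)
question = question_level(phrases, W_x, W_h)
print(words.shape, phrases.shape, question.shape)
```

Each level keeps one feature vector per word position, so a co-attention map can be computed against the image at every level. Taking the max over window sizes is what lets each position pick its own phrase length.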
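The main contributions of the paper:
1. It proposes a co-attention mechanism for the VQA task and applies it with two strategies: parallel and alternating co-attention.
2. It represents the question hierarchically, so the image-question co-attention maps are built at three levels: word level, phrase level and question level.
3. At the phrase level, it uses a convolution-pooling strategy to select the phrase size adaptively.
4. It is evaluated on the VQA dataset and on COCO-QA.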
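The two co-attention strategies named in contribution 1 can be sketched as below. The post itself gives no equations, so the formulas follow my reading of the original paper and should be treated as an approximation to check against it. The shapes, the random weights, all names, and the reuse of one parameter set across the three alternating steps are assumptions made for brevity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def parallel_coattention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """Attend to image and question simultaneously.

    V: (d, N) image region features; Q: (d, T) question features.
    """
    C = np.tanh(Q.T @ W_b @ V)                 # (T, N) word-region affinity
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)     # (k, N)
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)   # (k, T)
    a_v = softmax(w_hv @ H_v)                  # (N,) attention over regions
    a_q = softmax(w_hq @ H_q)                  # (T,) attention over words
    return a_v, a_q, V @ a_v, Q @ a_q          # weights plus attended vectors

def attend(X, g, W_x, W_g, w_hx):
    """One attention step over the columns of X, guided by the vector g."""
    H = np.tanh(W_x @ X + (W_g @ g)[:, None])  # (k, n)
    a = softmax(w_hx @ H)
    return a, X @ a

def alternating_coattention(V, Q, W_x, W_g, w_hx):
    """Summarise the question, attend to the image, then re-attend to the question."""
    d = Q.shape[0]
    _, s = attend(Q, np.zeros(d), W_x, W_g, w_hx)   # no guidance on the first step
    a_v, v_hat = attend(V, s, W_x, W_g, w_hx)       # the question guides the image
    a_q, q_hat = attend(Q, v_hat, W_x, W_g, w_hx)   # the image guides the question
    return a_v, a_q, v_hat, q_hat

# Toy sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
d, k, N, T = 8, 5, 4, 6
V = rng.normal(size=(d, N))
Q = rng.normal(size=(d, T))
W_b = rng.normal(scale=0.1, size=(d, d))
W_v = rng.normal(scale=0.1, size=(k, d))
W_q = rng.normal(scale=0.1, size=(k, d))
w_hv = rng.normal(size=k)
w_hq = rng.normal(size=k)

a_v, a_q, v_hat, q_hat = parallel_coattention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
alt_a_v, alt_a_q, alt_v_hat, alt_q_hat = alternating_coattention(V, Q, W_v, W_q, w_hv)
print(a_v.shape, a_q.shape, v_hat.shape, q_hat.shape)
```

In this sketch the parallel variant computes both attention maps from a single affinity matrix, while the alternating variant runs three sequential attention steps. Either function can be applied to the word-, phrase- and question-level features in turn to get the three co-attention maps.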
Overall framework of the paper:
