解释与注意:用于视觉问答的一场获得注意的两人游戏模型《Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA》

本文提出了一种对抗性注意网络(AAN),结合Grad-CAM,通过对抗训练提高VQA任务的注意力。通过让注意力网络与鉴别器博弈,使注意力分布更接近Grad-CAM,从而提升模型性能并增强与人类注意力的相关性。
摘要由CSDN通过智能技术生成

目录

一、文献摘要介绍

二、网络框架介绍

三、实验分析

四、结论


这是视觉问答论文阅读的系列笔记之一,本文有点长,请耐心阅读,定会有收货。如有不足,随时欢迎交流和探讨。

一、文献摘要介绍

In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is challenging to provide supervision for attention. An observation we make is that visual explanations as obtained through class activation mappings (specifically Grad-CAM) that are meant to explain the performance of various networks could form a means of supervision. However, as the distributions of attention maps and that of Grad-CAMs differ, it would not be suitable to directly use these as a form of supervision. Rather, we propose the use of a discriminator that aims to distinguish samples of visual explanation and attention maps. The use of adversarial training of the attention regions as a two-player game between attention and explanation serves to bring the distributions of attention maps and visual explanations closer. Significantly, we observe that providing such a means of supervision also results in attention maps that are more closely related to human attention resulting in a substantial improvement over baseline stacked attention network (SAN) models. It also results in a good improvement in rank correlation metric on the VQA task. This method can also be combined with recent MCB based methods and results in consistent improvement. We also provide comparisons with other means for learning distributions such as based on Correlation Alignment (Coral), Maximum Mean Discrepancy (MMD) and Mean Square Error (MSE) losses and observe that the adversarial loss outperforms the other forms of learning the attention maps. Visualization of the results also confirms our hypothesis that attention maps improve using this form of supervision.

在本文中,作者旨在提高对视觉问题解答(VQA)任务的关注。提供监督以引起注意是一项挑战。作者所做的观察是,通过类激活映射(特别是Grad-CAM)获得的视觉解释(旨在解释各种网络的性能)可以形成一种监管手段。但是,由于注意力图的分布和Grad-CAM的分布不同,因此不适合将其直接用作监督形式。相反,作者建议使用区分器,以区分视觉解释和注意图的样本。使用注意力区域的对抗训练作为注意力和解释之间的两人游戏,可以使注意力图和视觉解释的分布更加接近。重要的是,我们观察到,提供这种监管手段还可以使注意力图与人的注意力更加紧密相关,从而大大改善了基线堆叠注意力网络(SAN)模型。这也导致VQA任务的等级相关度量得到了很好的改善。该方法也可以与最近基于MCB的方法结合使用,从而获得一致的改进。作者还提供了与其他学习分布方式的比较,例如基于相关比对(Coral),最大平均差异(MMD)和均方误差(MSE)损失,并观察到对抗损失优于学习注意图的其他形式。结果的可视化也证实了我们的假设,即使用这种形式的监督可以改善注意力图。

二、网络框架介绍

作者解决视觉问题解答(VQA)的方法的主要重点是利用从视觉解释方法(例如Grad-

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值