Paper notes: a differential attention model for visual question answering, "Differential Attention for Visual Question Answering"

License: CC 4.0 BY-SA

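This post introduces the differential attention models for visual question answering (VQA): the Differential Attention Network (DAN) and the Differential Context Network (DCN). By finding supporting and opposing exemplars, the models focus on regions closer to those that humans attend to, which improves answer accuracy. Experiments show that the approach outperforms image-based attention methods on the VQA task and is competitive with other state-of-the-art methods.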

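This is one of a series of reading notes on visual question answering papers. The post is fairly long, so please read it patiently; you should come away with something useful. If you spot any shortcomings, feel free to get in touch and discuss.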

1. Abstract of the Paper

In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that the previous systems focus on are not correlated with the regions that humans focus on. The accuracy is limited due to this drawback. In this paper, we propose to solve this problem by using an exemplar based method. We obtain one or more supporting and opposing exemplars to obtain a differential attention region. This differential attention is closer to human attention than other image based attention methods. It also helps in obtaining improved accuracy when answering questions. The method is evaluated on challenging benchmark datasets. We perform better than other image based attention methods and are competitive with other state of the art methods that focus on both image and questions.

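In short: given a dataset of question-answer pairs for a number of images at training time, the goal is to answer questions about images. Many earlier methods tackle this with image-based attention, focusing on a specific part of the image while answering, much as humans do. The regions those systems attend to, however, do not correlate with the regions humans attend to, and this limits accuracy. The authors address the problem with an exemplar-based method: they obtain one or more supporting and opposing exemplars and use them to derive a differential attention region. This differential attention is closer to human attention than other image-based attention methods and also improves answer accuracy. Evaluated on challenging benchmark datasets, the proposed model outperforms other image-based attention methods and is competitive with state-of-the-art methods that attend to both the image and the question. The overall pipeline is shown in Figure 1 below.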