[Paper Reading] A VQA Model That Handles Both Image Understanding and Reasoning (CVPR 2019)

Author: 全部梭哈一夜暴富

Category: Visual Question Answering (VQA)

Article link: https://blog.csdn.net/AI_girl/article/details/115346474

I. Paper Overview

Paper title: "Answer Them All! Toward Universal Visual Question Answering Models"

Paper download link: http://openaccess.thecvf.com/content_CVPR_2019/papers/Shrestha_Answer_Them_All_Toward_Universal_Visual_Question_Answering_Models_CVPR_2019_paper.pdf

II. Reading Guide

Abstract:

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.

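In short, VQA research is split into two camps: one works on datasets that require understanding natural images, the other on synthetic datasets that test reasoning. A good algorithm should be capable of both, yet few VQA algorithms are evaluated this way. The paper compares five state-of-the-art VQA algorithms on eight datasets spanning both domains and finds that they do not generalize across the two. To address this, the authors propose a new VQA algorithm whose performance rivals or exceeds the state of the art in both domains.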

III. Detailed Walkthrough

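VQA poses open-ended questions about images, so it calls for a model that can understand and reason about visual-linguistic concepts. However, most state-of-the-art VQA algorithms merely exploit biases and superficial correlations without truly understanding the visual content of the image. Later datasets were designed to mitigate these problems. For example, VQAv2 reduces language bias by associating each question with complementary images that have different answers; TDIUC analyzes generalization across multiple question types and rarer answers; CVQA tests concept compositionality; and VQACPv2 evaluates performance when the training and test distributions differ.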
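To make the bias problem concrete, here is a minimal sketch of a "blind" baseline that never looks at the image and simply returns the most frequent training answer for each question type. It is not from the paper: the helper names (`question_type`, `fit_prior`, `accuracy`) and all question-answer pairs are made up for illustration, and taking the first two words of a question as its "type" is a simplifying assumption. The point is only to show why a model that relies on answer priors can look strong when training and test data share the same distribution, yet fail once the distribution shifts, which is the situation VQACPv2 is meant to test.

```python
from collections import Counter, defaultdict


def question_type(question: str) -> str:
    """Crude question type: the first two lowercased words (an assumption for this sketch)."""
    return " ".join(question.lower().split()[:2])


def fit_prior(train):
    """Map each question type to its most frequent training answer, ignoring the image."""
    counts = defaultdict(Counter)
    for question, answer in train:
        counts[question_type(question)][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, data):
    """Fraction of questions the blind prior answers correctly."""
    hits = sum(prior.get(question_type(q)) == a for q, a in data)
    return hits / len(data)


# Toy, made-up question-answer pairs (not taken from any real dataset).
train = [
    ("What color is the banana?", "yellow"),
    ("What color is the bus?", "yellow"),
    ("What color is the sky?", "blue"),
    ("Is there a dog?", "yes"),
    ("Is there a cat?", "yes"),
    ("Is there a car?", "no"),
]
# Test set whose answers follow the same priors as the training set.
same_dist_test = [
    ("What color is the taxi?", "yellow"),
    ("Is there a person?", "yes"),
]
# Test set whose answers deliberately contradict the training priors.
shifted_test = [
    ("What color is the grass?", "green"),
    ("Is there a unicorn?", "no"),
]

prior = fit_prior(train)
acc_same = accuracy(prior, same_dist_test)
acc_shifted = accuracy(prior, shifted_test)
print(f"same-distribution accuracy: {acc_same:.2f}")
print(f"shifted-distribution accuracy: {acc_shifted:.2f}")
```

On this toy data the prior answers every same-distribution question correctly and every shifted question incorrectly, even though it has no access to visual content in either case.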
