Paper notes: "Visual Question Answering as Reading Comprehension"

Paper download

Abstract:

Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sense or general knowledge which usually appear in the form of text. Current methods jointly embed both the visual information and the textual feature into the same space. Nevertheless, how to model the complex interactions between the two different modalities is not an easy work. In contrast to struggling on multimodal feature fusion, in this paper, we propose to unify all the input information by natural language so as to convert VQA into a machine reading comprehension problem. With this transformation, our method not only can tackle VQA datasets that focus on observation based questions, but can also be naturally extended to handle knowledge-based VQA which requires to explore large-scale external knowledge base. It is a step towards being able to exploit large volumes of text and natural language processing techniques to address VQA problem. Two types of models are proposed to deal with open-ended VQA and multiple-choice VQA respectively. We evaluate our models on three VQA benchmarks. The comparable performance with the state-of-the-art demonstrates the effectiveness of the proposed method.

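VQA requires understanding both the image content and the natural language question at the same time, and in some cases it also needs external knowledge. Most current methods fuse visual information and textual features in a shared embedding space, and modeling the interaction between the two modalities is not easy. The authors therefore propose expressing all of the input information in natural language, which turns the VQA task into a machine reading comprehension problem. With this conversion, the method can handle not only VQA datasets built around observation-based questions, but also knowledge-based VQA datasets that require a large external knowledge base. It also lets VQA draw on large volumes of text and on existing natural language processing techniques. The authors propose two models, one for open-ended VQA and one for multiple-choice VQA. Both are evaluated on three VQA benchmarks, and performance comparable to the state of the art demonstrates the effectiveness of the approach.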
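To make the conversion concrete, here is a minimal sketch of the idea in Python. It is not the authors' model: the region descriptions are hand-written strings standing in for the output of an image captioning step, and the hypothetical `select_supporting_sentence` function uses simple word overlap as a toy stand-in for a trained reading comprehension model. All function names are assumptions made for illustration.

```python
import re


def build_passage(region_descriptions, knowledge_facts=()):
    """Join image descriptions and optional external facts into one text passage."""
    sentences = list(region_descriptions) + list(knowledge_facts)
    return " ".join(s.strip().rstrip(".") + "." for s in sentences if s.strip())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_supporting_sentence(passage, question):
    """Toy reader (assumption): pick the sentence sharing the most words with the question."""
    question_words = set(tokenize(question))
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(question_words & set(tokenize(s))))


# Hypothetical descriptions of an image, plus one external fact.
descriptions = [
    "a man is riding a brown horse",
    "a red barn stands behind the fence",
]
facts = ["horses are domesticated mammals"]

passage = build_passage(descriptions, facts)
support = select_supporting_sentence(passage, "What is the man riding?")
print(passage)
print(support)
```

Once the image is represented as text, adding external knowledge is just a matter of appending more sentences to the passage, which is why the approach extends naturally to knowledge-based VQA. A real system would replace the overlap scorer with a learned reading comprehension model.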

 

Background:

Introduction:

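VQA is an emerging problem that requires an algorithm to answer natural language questions about an image, and it has promising applications in both computer vision and natural language processing. To some extent VQA is close to the textual question answering (TQA) task, but the added visual information makes VQA more challenging.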

Most VQA methods use a CNN to represent the image
