Reading notes: "Graph Reasoning Networks for Visual Question Answering"

Contents

1. Abstract

2. Network Framework

3. Experimental Analysis

4. Conclusion


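This is one of a series of reading notes on visual question answering papers. The post is fairly long, but reading it through should be worthwhile. If anything falls short, comments and discussion are always welcome.

1. Abstract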

The interaction between language and visual information has been emphasized in visual question answering (VQA) with the help of attention mechanism. However, the relationship between words in question has been underestimated, which makes it hard to answer questions that involve the relationship between multiple entities, such as comparison and counting. In this paper, we develop the graph reasoning networks to tackle this problem. Two kinds of graphs are investigated, namely inter-graph and intra-graph. The inter-graph transfers features of the detected objects to their related query words, enabling the output nodes to have both semantic and factual information. The intra-graph exchanges information between these output nodes from inter-graph to amplify implicit yet important relationship between objects. These two kinds of graphs cooperate with each other, and thus our resulting model can reason the relationship and dependence between objects, which leads to realization of multi-step reasoning. Experimental results on the GQA v1.1 dataset demonstrate the reasoning ability of our method to handle compositional questions about real-world images. We achieve state-of-the-art performance, boosting accuracy to 57.04%. On the VQA 2.0 dataset, we also receive a promising improvement on overall accuracy, especially on counting problem.

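The authors note that attention mechanisms in visual question answering (VQA) have put the emphasis on the interaction between language and visual information, while the relationships between the words of the question have been underestimated. This makes it hard to answer questions that involve relationships among multiple entities, such as comparison and counting. To address this, the paper develops graph reasoning networks built on two kinds of graphs: the inter-graph and the intra-graph. The inter-graph transfers the features of detected objects to their related query words, so that the output nodes carry both semantic and factual information. The intra-graph then exchanges information among these output nodes to amplify implicit yet important relationships between objects. The two graphs cooperate, which lets the model reason about relationships and dependencies between objects and thereby perform multi-step reasoning. Experiments on the GQA v1.1 dataset demonstrate the method's ability to handle compositional questions about real-world images, reaching a state-of-the-art accuracy of 57.04%. On the VQA 2.0 dataset the method also yields a promising improvement in overall accuracy, especially on counting questions.

2. Network Framework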

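        The goal of the VQA task is to answer a given question Q about an image I. Using the object detector Faster R-CNN, the input image I is converted into object features V, one D-dimensional vector per detected object, where n is the number of detected objects and D is the feature dimension. The question is a sequence of m words, which an LSTM encodes into question features, one vector per word. Figure 1 shows the framework of the network model.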

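        BAN (bilinear attention network) is introduced to reduce both input channels simultaneously and to obtain a unified representation of the question features Q and the image features V. It first computes a bilinear attention map G between Q and V, and then generates the joint embedding z conditioned on that map. The projection parameters used to compute the attention map are variables to be learned.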
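To make the two steps above concrete (an attention map G between Q and V, then a joint embedding z conditioned on G), here is a minimal sketch of a low-rank bilinear attention in the general style of BAN. It is an illustration under stated assumptions, not the exact formulation from the paper: the projection matrices `w_q`, `w_v`, `u_q`, `u_v`, the vector `p`, the toy sizes, and the random weights are all hypothetical, nonlinearities and multiple glimpses are left out, and the softmax is assumed to run over all m x n entries of the map.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def bilinear_attention_map(q_feats, v_feats, w_q, w_v, p):
    """Return an m x n attention map G whose entries sum to 1."""
    q_proj = matmul(q_feats, w_q)  # m x K projected word features
    v_proj = matmul(v_feats, w_v)  # n x K projected object features
    # Low-rank bilinear score for every (word, object) pair.
    logits = [[sum(pk * qk * vk for pk, qk, vk in zip(p, q_row, v_row))
               for v_row in v_proj] for q_row in q_proj]
    peak = max(max(row) for row in logits)  # subtract the max for stability
    exps = [[math.exp(x - peak) for x in row] for row in logits]
    total = sum(sum(row) for row in exps)   # softmax over all m * n entries
    return [[e / total for e in row] for row in exps]


def joint_embedding(q_feats, v_feats, g, u_q, u_v):
    """Pool both inputs under G into a single K-dimensional vector z."""
    q_proj = matmul(q_feats, u_q)  # m x K
    v_proj = matmul(v_feats, u_v)  # n x K
    k_dim = len(u_q[0])
    return [sum(q_proj[i][k] * g[i][j] * v_proj[j][k]
                for i in range(len(q_feats)) for j in range(len(v_feats)))
            for k in range(k_dim)]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or extracted features (illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
m, n, d_q, d_v, k_dim = 4, 3, 5, 6, 2  # toy sizes, not the paper's settings
q_feats = random_matrix(m, d_q, rng)   # m word features (would come from the LSTM)
v_feats = random_matrix(n, d_v, rng)   # n object features (would come from the detector)
w_q, w_v = random_matrix(d_q, k_dim, rng), random_matrix(d_v, k_dim, rng)
u_q, u_v = random_matrix(d_q, k_dim, rng), random_matrix(d_v, k_dim, rng)
p = [rng.uniform(-1.0, 1.0) for _ in range(k_dim)]

g = bilinear_attention_map(q_feats, v_feats, w_q, w_v, p)
z = joint_embedding(q_feats, v_feats, g, u_q, u_v)
print(len(g), len(g[0]), len(z))
```

Each entry of `g` pairs one question word with one detected object, which is why the map can collapse both input channels at once: `z` has a fixed length `k_dim` regardless of how many words or objects went in.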
