VQA paper reading: Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Author: 今天也要学习！ ("Keep learning today, too!")

Published 2021-05-21 10:55:56

Category: VQA · Tags: deep learning, machine learning, pytorch

Article link: https://blog.csdn.net/avast510/article/details/117112223

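This paper proposes a new approach to visual question answering (VQA) based on graph convolutional networks, with an emphasis on interpretability. By combining spatial, image, and textual features, the model learns an adjacency matrix over the objects in an image, conditioned on the question. Experiments on the VQAv2 dataset reach 66.18% accuracy, demonstrating both the model's interpretability and its performance.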

(Summary generated by C知道, powered by DeepSeek-R1.)
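A minimal NumPy sketch of the question-conditioned adjacency matrix described above: each object feature is concatenated with a question embedding, passed through a small learned mapping, and the pairwise dot products of the resulting embeddings give the adjacency matrix. Keeping only each node's strongest neighbours yields the sparse set of weighted edges one could draw on the image. The one-layer ReLU embedding, the top-k cut-off, the softmax over neighbours, the toy sizes and the name `learn_graph` are all assumptions of this sketch, not the authors' exact implementation.

```python
import numpy as np


def learn_graph(v, q, W, b, k):
    """Sketch of a question-conditioned graph learner (hypothetical helper).

    v: (N, d_v) features, one row per detected bounding box
    q: (d_q,) question embedding
    W, b: weights of an assumed one-layer embedding, shapes (d_v + d_q, d_e) and (d_e,)
    k: number of neighbours kept per node
    Returns the dense affinity matrix A (N x N), the indices of each
    node's k strongest neighbours, and softmax weights over them.
    """
    n = v.shape[0]
    # Condition every object on the question by appending q to each row.
    vq = np.concatenate([v, np.tile(q, (n, 1))], axis=1)
    e = np.maximum(vq @ W + b, 0.0)      # ReLU embedding, shape (N, d_e)
    a = e @ e.T                          # pairwise affinities, shape (N, N)
    # Keep the k strongest edges per node (a node may select itself).
    idx = np.argsort(-a, axis=1)[:, :k]
    top = np.take_along_axis(a, idx, axis=1)
    # Numerically stable softmax over each node's kept neighbours.
    top = top - top.max(axis=1, keepdims=True)
    w = np.exp(top)
    w = w / w.sum(axis=1, keepdims=True)
    return a, idx, w


# Toy example: 5 objects with random features and weights.
rng = np.random.default_rng(0)
N, d_v, d_q, d_e, k = 5, 8, 4, 6, 3
v = rng.normal(size=(N, d_v))
q = rng.normal(size=(d_q,))
W = rng.normal(size=(d_v + d_q, d_e)) * 0.1
b = np.zeros(d_e)

a, idx, w = learn_graph(v, q, W, b, k)
for i in range(N):
    # The largest weights correspond to the "thick" edges drawn on the image.
    print(f"node {i}: neighbours {idx[i].tolist()}, weights {np.round(w[i], 3).tolist()}")
```

Because `q` enters every row before the embedding, feeding a different question over the same objects produces a different adjacency matrix; that is what "conditioned on the question" means in this sketch.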

1. Motivation

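The authors argue that:
1. Existing graph-based VQA methods are custom-built for their setting and cannot be extended from abstract (synthetic) scenes to real images.
2. They do not incorporate question information into the graph.
3. They do not show intuitively how the answer is reached (i.e., they are not interpretable).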

2. Contributions

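1. A new, interpretable VQA method based on graph convolutional networks
Each node in the graph is a bounding box from the image features, and each edge represents how strongly two objects in the image are related (the stronger the relation, the thicker the edge).
The edges are learned using prior knowledge, namely the question, so the graph is conditioned on the question.
2. Model interpretability
Interpretability is demonstrated by visualizing the bounding boxes on the image together with the learned edges between them.
3. Experimental results
66.18% accuracy on the VQAv2 dataset.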
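The notes do not spell out how the 66.18% is computed. VQAv2 results are conventionally reported with the standard VQA accuracy, under which an answer counts as fully correct when at least 3 of the 10 human annotators gave it. Below is a minimal pure-Python sketch of that per-question score, assuming the answer strings are already normalised (the official evaluation script also strips punctuation and articles and maps number words to digits before matching, which is omitted here); `vqa_accuracy` is a hypothetical helper name.

```python
def vqa_accuracy(predicted, human_answers):
    """Standard VQA accuracy of one predicted answer for one question.

    human_answers: the 10 answers collected from annotators (assumed
    already normalised). For each annotator left out, the prediction
    scores min(1, matches among the other 9 answers / 3); the result is
    the mean over the 10 leave-one-out subsets.
    """
    scores = []
    for i in range(len(human_answers)):
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(1 for ans in others if ans == predicted)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


answers = ["2"] * 3 + ["3"] * 7
print(round(vqa_accuracy("3", answers), 4))  # 1.0 (7 annotators agree)
print(round(vqa_accuracy("2", answers), 4))  # 0.9 (3 annotators agree)
print(round(vqa_accuracy("4", answers), 4))  # 0.0 (no annotator agrees)
```

A dataset-level figure such as the one above is the mean of this score over all questions. The leave-one-out averaging is why exactly 3 agreeing annotators give 0.9 rather than 1.0.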

3. Network architecture

[Figure: network architecture]

1. We develop a deep neural network that combines spatial, image and textual features in a novel manner in order to answer a question about an image.
2. Our graph learning module then learns an adjacency matrix of the image objects that is conditioned on a given question.
3. The spatial gra