Visual Reasoning(2): Inferring and Executing Programs for Visual Reasoning

judgechen1997

Article link: https://blog.csdn.net/judgechen1997/article/details/106751306

Inferring and Executing Programs for Visual Reasoning

Introduction
Methods
- Training
Experiments
Conclusion

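Some background reading:
https://zhuanlan.zhihu.com/p/28654835
This work also contributes a new dataset, the CLEVR-Humans Dataset: human annotators wrote new questions and answers for the synthetic CLEVR images, so the wording and logic are more natural than in CLEVR's generated questions.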

Introduction

Motivation
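Earlier VQA models learn direct input-output mappings and have no explicit reasoning ability.
The paper therefore proposes
a new model for visual question answering that consists of two parts: a program generator and an execution engine.
This breaks from the earlier brute-force VQA recipe of simply stacking an LSTM on a CNN.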

Recall that every CLEVR question is generated from a dedicated functional program; executing that program with its arguments filled in yields the answer.
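So this work first predicts the program, and then uses the program to predict the final answer.
(The model feels designed entirely around the way CLEVR is generated. In essence, people use program logic to create synthetic data, and then an algorithm observes that data to imitate the same programmatic reasoning. Real-world scenes cannot necessarily be parsed into logic this cleanly.)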
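
To make the "program first, answer second" idea concrete, here is a minimal sketch of executing a CLEVR-style functional program on a symbolic scene. The scene format, the (op, arg) program encoding, and the name run_program are assumptions made for illustration. They are not the CLEVR generator's actual data format, and the paper's execution engine runs neural modules on image features rather than on a symbolic scene.

```python
# Toy scene: each object is a dict of attributes (hypothetical format).
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]


def run_program(program, scene):
    """Execute a linear program; each step transforms the previous result."""
    state = None
    for op, arg in program:
        if op == "scene":
            # Start from the set of all objects.
            state = list(scene)
        elif op == "filter":
            # Keep only objects whose attribute matches the given value.
            attr, value = arg
            state = [obj for obj in state if obj[attr] == value]
        elif op == "unique":
            # Expect exactly one object and unwrap it.
            if len(state) != 1:
                raise ValueError("expected exactly one object")
            state = state[0]
        elif op == "query":
            # Read one attribute of a single object.
            state = state[arg]
        elif op == "count":
            state = len(state)
        else:
            raise ValueError(f"unknown op: {op}")
    return state


# "How many blue things are there?"
HOW_MANY_BLUE = [("scene", None), ("filter", ("color", "blue")), ("count", None)]

# "What shape is the red thing?"
RED_SHAPE = [
    ("scene", None),
    ("filter", ("color", "red")),
    ("unique", None),
    ("query", "shape"),
]

if __name__ == "__main__":
    print(run_program(HOW_MANY_BLUE, SCENE))  # 2
    print(run_program(RED_SHAPE, SCENE))      # cube
```

Once the right program is known, the answer follows mechanically, which is why predicting the program is the hard part of the first step.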

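There are two ways to train the two parts:
they can be trained separately when ground-truth programs are available, or jointly in an end-to-end fashion.
Using the intermediate programs from CLEVR's generation process as a training signal feels a bit like cheating. It is a tricky move, and it makes the comparison with earlier VQA models somewhat unfair.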

Methods

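The program generator, on the left of the architecture figure, is a seq2seq model that predicts the program.
The execution engine, on the right, is also built entirely from neural networks. Its inputs are the program and the image, and its output is a probability distribution over all possible answers, so it acts as a classifier.
The execution engine is assembled from a set of modules.
Depending on the program z, different modules are selected and assembled, and the assembled network is then executed to produce the answer.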
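
The module assembly can be sketched as follows. This is only an illustration under several assumptions: the program is given as a prefix-order list of function names, each function has a fixed arity, and the "modules" are hand-written functions over per-object attention masks. In the paper the modules are learned neural networks operating on image features; the names MODULES, execute, and classify, and the function names such as filter_red, are hypothetical.

```python
# Toy "image": a list of objects standing in for CNN features (assumption).
IMAGE = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cube", "color": "blue"},
]


def make_filter(attr, value):
    """Build a unary stand-in module that masks out non-matching objects."""
    def module(image, mask):
        return [m * float(obj[attr] == value) for obj, m in zip(image, mask)]
    return module


# Function name -> (arity, module). Names and arities are illustrative.
MODULES = {
    "scene": (0, lambda image: [1.0] * len(image)),
    "filter_red": (1, make_filter("color", "red")),
    "filter_cube": (1, make_filter("shape", "cube")),
    "intersect": (2, lambda image, a, b: [min(x, y) for x, y in zip(a, b)]),
}


def execute(program, image):
    """Assemble modules from a prefix-order program and run them on the image."""
    tokens = iter(program)

    def build():
        name = next(tokens)
        arity, module = MODULES[name]
        # Because every function has a fixed arity, the flat token list
        # determines the tree: recursively build each input first.
        inputs = [build() for _ in range(arity)]
        return module(image, *inputs)

    output = build()
    if next(tokens, None) is not None:
        raise ValueError("program has trailing tokens")
    return output


def classify(mask):
    """Stand-in for the final classifier: count the attended objects."""
    return str(int(sum(mask)))


# intersect(filter_red(scene), filter_cube(scene)), then count.
PROGRAM = ["intersect", "filter_red", "scene", "filter_cube", "scene"]

if __name__ == "__main__":
    print(classify(execute(PROGRAM, IMAGE)))  # 1
```

A different program z selects a different set of modules and wires them differently, while the image and the final classifier stay the same.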

See also the Zhihu post:
https://zhuanlan.zhihu.com/p/28654835

Training

Training the program generator and the execution engine separately on ground-truth programs naturally works very well. However,
Annotating ground-truth programs for free-form natural language questions is expensive, so in practice we may have few or no ground-truth programs.

Without ground-truth programs, the only option is joint training with REINFORCE.
In practice this performs poorly and is hard to optimize, so the paper also proposes a semi-supervised learning approach:

First, use a small set of ground-truth programs to train the program generator
Then, fix the program generator and train the execution engine using predicted programs on a large dataset of (x,q,a) triples.
Finally, we use REINFORCE to jointly finetune the program generator and execution engine.
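
The REINFORCE step is needed because sampling a discrete program is not differentiable, so the program generator is updated with the score-function estimator: the gradient of log p(z) scaled by the reward. Below is a minimal toy sketch of that update, not the paper's training code. It assumes a policy that is just a softmax over a handful of candidate programs, a reward of 1 when the sampled program is the one that gives the right answer, and a moving-average baseline; the function name reinforce and all hyperparameters are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(correct_index, num_programs=3, steps=500, lr=0.5, seed=0):
    """Train a softmax policy over candidate programs with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * num_programs
    baseline = 0.0  # moving average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a program z from the current policy.
        z = rng.choices(range(num_programs), weights=probs)[0]
        # Stand-in for "the execution engine answered correctly".
        reward = 1.0 if z == correct_index else 0.0
        advantage = reward - baseline
        # For a softmax policy, d log p(z) / d logit_i = 1[i == z] - p_i.
        for i in range(num_programs):
            grad_log_prob = (1.0 if i == z else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


if __name__ == "__main__":
    final_probs = reinforce(correct_index=2)
    print(final_probs)
```

With only three candidates the correct program is sampled quickly and then reinforced. With the real, enormous space of programs a randomly initialized generator almost never samples a rewarding program, which is one way to see why pretraining on a small set of ground-truth programs helps before REINFORCE finetuning.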