Contents
1. Scene Graph Generation with External Knowledge and Image Reconstruction

2. Knowledge Acquisition for Visual Question Answering via Iterative Querying
Author: 李飞飞
publish: CVPR 2017

3. Towards VQA Models That Can Read


4. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

1.1 Applies BERT to image RoIs and text, then uses the resulting text word embeddings and image RoI embeddings for other tasks, yielding gains of 2-10 percentage points.

5. VL-BERT: PRE-TRAINING OF GENERIC VISUAL LINGUISTIC REPRESENTATIONS

6. VISUALBERT: A SIMPLE AND PERFORMANT BASELINE FOR VISION AND LANGUAGE


7. Dynamic Memory Networks for Visual and Textual Question Answering
publish: ICML 2016





