Tasks
Visual Description Generation
Image Description Generation
Standard Image Description Generation

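Dense Image Description Generation: generates descriptions for local objects (regions) in the image
Image Paragraph Generation: generates a paragraph-length description
Spoken Language Image Description Generation: the description is spoken rather than written
Stylistic Image Description Generation: adds a linguistic style to the description, for example humor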
Unseen Objects Image Description Generation:
Diverse Image Description Generation:
Controllable Image Description Generation: control and select the objects in an image to generate descriptions.
Video Description Generation
Global Video Description Generation:

Dense Video Description Generation: analogous to Dense Image Description Generation
Movie Description Generation: movie clips are used as input
Visual Storytelling
Image Storytelling:

Video Storytelling:

Visual Question Answering
Image Question Answering

Video Question Answering

Visual Dialog
Image Dialog

Video Dialog

Visual Reasoning
Image Reasoning

Video Reasoning

Visual Referring Expression
Image Referring Expression

Video Referring Expression

Visual Entailment
Image Entailment

Language-to-Vision Generation
Language-to-Image Generation
Sentence-level Language-to-Image Generation
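Image Manipulation: edits an image under the guidance of text while preserving the regions the text does not refer to. Other variants modify the image content interactively or through dialog.
Fine-grained Image Generation:
Sequential Image Generation: given a passage of text (multiple sentences), generates a sequence of images, like a visualization of a story; the inverse of Image Storytelling.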
Language-to-Video Generation
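Requires a stronger conditional generator than image generation, because the temporal dimension must also be taken into account.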
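
To make the extra temporal dimension concrete, the following minimal sketch compares how many values a generator has to produce for one image versus one short clip. The resolution and frame count are hypothetical numbers chosen only for illustration, not figures from any particular model.

```python
from math import prod

# Hypothetical output shapes, chosen only for illustration.
image_shape = (3, 64, 64)       # (channels, height, width)
video_shape = (16, 3, 64, 64)   # (frames, channels, height, width)

image_values = prod(image_shape)  # values generated for one image
video_values = prod(video_shape)  # values generated for one clip

print("image:", image_values)
print("video:", video_values)
print("ratio:", video_values // image_values)  # equals the frame count
```

Beyond the larger output, the frames of a clip also have to be consistent with one another over time, which a per-image generator does not need to handle.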

Vision-and-Language Navigation
Image and Language Navigation

Multimodal Machine Translation
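Machine Translation with Image: translates a source-language sentence that describes an image into the target language.

Multisource MMT: differs in that sentences in several languages describe the same image at the same time.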

Machine Translation with Video

Dataset
Image Description Generation
- Flickr

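This article surveys recent progress in vision-and-language integration, covering tasks such as visual description generation, visual storytelling, visual question answering, visual dialog, and visual reasoning, along with the associated datasets and evaluation metrics. It also discusses future research directions, such as exploiting external knowledge, addressing the limitations of large-scale data, and developing new neural architectures.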