Tasks
Visual Description Generation
Image Description Generation
Standard Image Description Generation
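Dense Image Description Generation: generates descriptions for local objects (regions) in an image
Image Paragraph Generation: generates a paragraph rather than a single sentence
Spoken Language Image Description Generation: produces spoken rather than written descriptions
Stylistic Image Description Generation: adds a linguistic style to the description, e.g., humor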
Unseen Objects Image Description Generation:
Diverse Image Description Generation:
Controllable Image Description Generation: control and select the objects in an image to generate descriptions.
Video Description Generation
Global Video Description Generation:
Dense Video Description Generation: similar to Dense Image Description Generation
Movie Description Generation: movie clips are used as input
Visual Storytelling
Image Storytelling:
Video Storytelling:
Visual Question Answering
Image Question Answering
Video Question Answering
Visual Dialog
Image Dialog
Video Dialog
Visual Reasoning
Image Reasoning
Video Reasoning
Visual Referring Expression
Image Referring Expression
Video Referring Expression
Visual Entailment
Image Entailment
Language-to-Vision Generation
Language-to-Image Generation
Sentence-level Language-to-Image Generation
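Image Manipulation: uses text to guide the editing of an image while preserving the regions that are irrelevant to the text. Another approach modifies the image content interactively, and a third modifies it through dialog.
Fine-grained Image Generation:
Sequential Image Generation: given a passage of text (multiple sentences), generates a sequence of images, like a visualization of a story; the inverse of image storytelling.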
Language-to-Video Generation
Requires a stronger conditional generator, because the temporal dimension must also be taken into account
Vision-and-Language Navigation
Image and Language Navigation
Multimodal Machine Translation
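Machine Translation with Image: translates a source-language sentence that describes an image into the target language.
Multisource MMT: differs in that sentences in multiple languages describe the same image simultaneously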
Machine Translation with Video
Dataset
Image Description Generation
- Flickr