1. Basic information
Title: Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Year: 2016
Venue: CVPR
Field: video captioning
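2. Research background
Problem definition: given a video, generate a paragraph (multiple sentences) describing it.
Challenges: sentences in a paragraph depend on one another, and a paragraph is inherently hierarchical (words form sentences, sentences form the paragraph).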
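3. Proposed method
- Framework:
(A) sentence generator: an RNN that produces one sentence, word by word
(B) paragraph generator: an RNN that carries context across sentences and sets the initial state of the sentence generator for the next sentence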
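
The two-level framework above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the paper's model: it uses vanilla tanh RNN cells with random, untrained weights, adds the video feature directly to each word input, and generates a fixed number of sentences. All names (`generate_sentence`, `generate_paragraph`, `HIDDEN`, `MAX_WORDS`, the tiny `VOCAB`) are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

HIDDEN = 8      # hypothetical state size
MAX_WORDS = 6   # hypothetical cap on sentence length
VOCAB = ["<eos>", "the", "person", "cuts", "washes", "carrot", "knife"]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """Vanilla RNN update: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


# Untrained random parameters, for illustration only.
EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one embedding row per word
S_IN, S_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
S_OUT = rand_matrix(len(VOCAB), HIDDEN)   # hidden state -> word scores
P_IN, P_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
P_OUT = rand_matrix(HIDDEN, HIDDEN)       # paragraph state -> next initial state


def generate_sentence(video_feat, h):
    """(A) Sentence generator: greedy word-by-word decoding from state h."""
    words, prev = [], 0  # reuse index 0 (<eos>) as the start token
    for _ in range(MAX_WORDS):
        x = [e + v for e, v in zip(EMBED[prev], video_feat)]
        h = rnn_step(S_IN, S_REC, x, h)
        scores = matvec(S_OUT, h)
        prev = scores.index(max(scores))
        if VOCAB[prev] == "<eos>":
            break
        words.append(VOCAB[prev])
    return words, h


def generate_paragraph(video_feat, num_sentences):
    """(B) Paragraph generator: carries context from sentence to sentence."""
    p = [0.0] * HIDDEN   # paragraph-level state
    h0 = [0.0] * HIDDEN  # initial state for the first sentence
    paragraph = []
    for _ in range(num_sentences):
        words, h_last = generate_sentence(video_feat, h0)
        paragraph.append(words)
        # Fold the finished sentence into the paragraph state, then derive
        # the initial state for the next sentence from it.
        p = rnn_step(P_IN, P_REC, h_last, p)
        h0 = [math.tanh(v) for v in matvec(P_OUT, p)]
    return paragraph


video_feat = [random.uniform(-1, 1) for _ in range(HIDDEN)]
paragraph = generate_paragraph(video_feat, num_sentences=3)
for sentence in paragraph:
    print(" ".join(sentence))
```

Because the weights are random, the printed words are meaningless; the point is the control flow, in which the paragraph-level state is the only channel through which earlier sentences influence later ones.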
4. Experiments
Datasets:
- YouTube2Text
  open-domain
  1,970 videos, ~80k video-sentence pairs, 12k unique words; each description is a single sentence (a special case of a paragraph)
- TACoS-MultiLevel
  closed-domain: cooking
  173 videos, 16,145 intervals, ~40k interval-sentence pairs, 2k unique words; each video is described by several interdependent sentences
Evaluation metrics:
- BLEU
- METEOR
- CIDEr
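
As a rough illustration of how the first of these metrics works, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty). It assumes whitespace-tokenized input and no smoothing, so any candidate with no matching 4-gram scores 0; published results are normally computed with a standard evaluation toolkit, not with code like this. The function names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU for one candidate and a list of references."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # For each n-gram, the maximum count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated n-grams are not over-rewarded.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length to the candidate.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    if len(candidate) > ref_len:
        penalty = 1.0
    else:
        penalty = math.exp(1 - ref_len / len(candidate))
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "the person cuts the carrot".split()
print(bleu("the person cuts the onion".split(), [reference]))
```

A candidate identical to its reference scores 1, and a candidate sharing no words with any reference scores 0.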
5. Conclusions & Discussion
The hierarchical RNN improves paragraph generation.
Issues:
- Most errors occur when generating nouns; small objects are hard to recognize (on TACoS-MultiLevel)
- One-way information flow: context passes only from earlier sentences to later ones
- The language model helps, but it sometimes wrongly overrides the computer vision result