I. Basic Information
Title: Localizing Moments in Video with Natural Language
Year: 2017
Venue: ICCV
Area: video retrieval
II. Research Background
Problem definition: localizing natural language queries in videos. Given a video and a text description, identify the start and end points in the video that correspond to the description.
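The problem definition can be read as a search over candidate (start, end) spans. The sketch below is only an illustration of that reading, not the paper's implementation: it assumes the video has already been cut into a fixed number of segments, and `score` is a hypothetical placeholder for any model that rates how well a span matches the query.

```python
from typing import Callable, List, Tuple

# A moment is an inclusive (start_segment, end_segment) pair.
# Splitting the video into fixed segments is an assumption of this sketch.
Moment = Tuple[int, int]


def candidate_moments(num_segments: int) -> List[Moment]:
    """Enumerate every contiguous span of segments."""
    return [(s, e) for s in range(num_segments) for e in range(s, num_segments)]


def localize(num_segments: int, score: Callable[[Moment], float]) -> Moment:
    """Return the candidate span with the highest score.

    `score` is a hypothetical stand-in for a video-language matching model.
    """
    return max(candidate_moments(num_segments), key=score)


if __name__ == "__main__":
    # Toy scorer that prefers the span (2, 3); purely illustrative.
    toy_score = lambda m: -abs(m[0] - 2) - abs(m[1] - 3)
    print(localize(6, toy_score))
```

With `n` segments there are `n * (n + 1) / 2` candidate spans, so exhaustive scoring stays cheap for short videos.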
Challenges:
1. Existing video datasets do not include pairs of localized video segments and referring expressions.
2. The task requires both language and video understanding.
Related work:
III. Contributions
1. Proposes the Moment Context Network (MCN), which relies on both local and global video features.
2. Collects the Distinct Describable Moments (DiDeMo) dataset, which consists of over 40,000 pairs of referring descriptions and localized moments in unedited videos.
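To make "local and global video features" concrete, here is a minimal sketch. These notes do not say how the features are pooled or combined, so mean-pooling and concatenation are assumptions of this example, and `moment_context_features` is a hypothetical helper name rather than anything from the paper's code.

```python
from typing import List, Sequence


def mean_pool(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average a list of equal-length feature vectors, dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def moment_context_features(
    segment_feats: Sequence[Sequence[float]], start: int, end: int
) -> List[float]:
    """Build a feature vector for the moment [start, end] (inclusive).

    local  : pooled over the segments inside the moment (assumed mean-pooling)
    global : pooled over the whole video, giving the moment its context
    The two parts are concatenated (also an assumption of this sketch).
    """
    local = mean_pool(segment_feats[start : end + 1])
    global_ctx = mean_pool(segment_feats)
    return local + global_ctx


if __name__ == "__main__":
    feats = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]  # 3 segments, 2-dim features
    print(moment_context_features(feats, 0, 1))
```

The global part is identical for every candidate moment in a video; it only helps when combined with the local part, which is why the two are kept together in one vector here.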
IV. Experiments
Dataset: Distinct Describable Moments (DiDeMo) dataset (newly proposed in this paper)
Evaluation metrics: Rank@1, Rank@5, mIoU
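The three metrics can be sketched as follows. This is a simplified illustration that assumes one ground-truth moment per query and moments given as inclusive segment indices; it does not reproduce the paper's exact evaluation protocol (for example, any handling of multiple annotators). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Moment = Tuple[int, int]  # inclusive (start_segment, end_segment); an assumption


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two inclusive segment spans."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
    return inter / union


def rank_at_k(ranked: Sequence[Sequence[Moment]], gts: Sequence[Moment], k: int) -> float:
    """Fraction of queries whose ground truth appears in the top-k predictions."""
    hits = sum(1 for preds, gt in zip(ranked, gts) if gt in preds[:k])
    return hits / len(gts)


def mean_iou(ranked: Sequence[Sequence[Moment]], gts: Sequence[Moment]) -> float:
    """Average IoU between each query's top-1 prediction and its ground truth."""
    return sum(temporal_iou(preds[0], gt) for preds, gt in zip(ranked, gts)) / len(gts)


if __name__ == "__main__":
    ranked: List[List[Moment]] = [[(0, 1), (1, 2)], [(3, 3), (2, 3)]]
    gts: List[Moment] = [(1, 2), (4, 5)]
    print(rank_at_k(ranked, gts, 1), rank_at_k(ranked, gts, 5), mean_iou(ranked, gts))
```

Rank@k rewards only exact matches, while mIoU gives partial credit for overlapping spans, so the two can disagree on which model is better.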
Baseline comparison:
V. Conclusion
Authors' summary: introduces the task of localizing moments in video with natural language.
My assessment: accuracy could be improved by modeling complex (temporal) sentence structure and by using a more sophisticated language model.