Localizing moments in video with natural language

Latest recommended article published on 2023-07-23 16:10:06

sakus

Category: NII

Article link: https://blog.csdn.net/sakus/article/details/84035370

I. Basic Information

Title: Localizing moments in video with natural language

Year: 2017

Venue: ICCV

Field: video retrieval

II. Research Background

Problem definition: localize natural language queries in videos. Given a video and a text description, identify the start and end points in the video that correspond to the description.
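
One way to make this problem definition concrete is to treat it as ranking candidate moments. The sketch below is only an illustration under stated assumptions: it assumes the video has already been split into fixed-length segments, so that a moment is an inclusive pair of segment indices, and the names `enumerate_moments`, `localize`, and the toy scoring function are hypothetical, not taken from the paper.

```python
from itertools import combinations_with_replacement
from typing import Callable, List, Tuple

# A moment is (start_segment, end_segment), both inclusive.
# Assumption: the video is pre-split into fixed-length segments.
Moment = Tuple[int, int]


def enumerate_moments(num_segments: int) -> List[Moment]:
    """List every contiguous moment with start <= end."""
    return list(combinations_with_replacement(range(num_segments), 2))


def localize(num_segments: int, score: Callable[[Moment], float]) -> Moment:
    """Return the candidate moment with the highest score.

    `score` is a hypothetical stand-in for a learned model that rates
    how well a moment matches the text description.
    """
    return max(enumerate_moments(num_segments), key=score)


def toy_score(moment: Moment) -> float:
    """Toy scorer that pretends the description matches segments 2 to 3."""
    start, end = moment
    return -(abs(start - 2) + abs(end - 3))


if __name__ == "__main__":
    print(len(enumerate_moments(6)))  # number of candidates for 6 segments
    print(localize(6, toy_score))     # best-scoring (start, end) pair
```

With n segments there are n(n+1)/2 candidates, so exhaustive scoring stays cheap for short videos.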
 
Challenges:
1. Existing video datasets do not include pairs of localized video segments and referring expressions.
2. The task requires both language and video understanding.

Related work:

III. Contributions

1. Propose the Moment Context Network (MCN), which relies on both local and global video features.
2. Collect the Distinct Describable Moments (DiDeMo) dataset, which consists of over 40,000 pairs of referring descriptions and localized moments in unedited videos.
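
The idea of combining local and global video features can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes per-segment feature vectors are already available, uses mean pooling as the aggregation (an assumption), and the function names `mean_pool` and `moment_context_feature` are hypothetical.

```python
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def moment_context_feature(
    segment_feats: Sequence[Sequence[float]], start: int, end: int
) -> List[float]:
    """Concatenate a local and a global feature for one candidate moment.

    local  = pooled features of the segments inside the moment (inclusive)
    global = pooled features of the whole video, giving temporal context
    """
    local_feat = mean_pool(segment_feats[start:end + 1])
    global_feat = mean_pool(segment_feats)
    return local_feat + global_feat  # list concatenation


if __name__ == "__main__":
    # Three segments with 2-dimensional toy features.
    feats = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
    print(moment_context_feature(feats, 1, 2))
```

The global part is identical for every candidate in a video; what changes between candidates is the local part, so the concatenation lets a model judge a moment relative to the rest of the video.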

IV. Experiments

    dataset: Distinct Describable Moments (DiDeMo) dataset (newly proposed)

    evaluation metrics: Rank@1, Rank@5, mIoU

    baseline comparison: [figure: results table, image missing]
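
The Rank@1, Rank@5, and mIoU metrics listed above can be sketched as below. This is a simplified reading, not the paper's exact protocol: it assumes one ground-truth moment per query, counts a Rank@k hit only on an exact match, computes mIoU from the top-ranked prediction, and represents moments as inclusive segment-index pairs. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Moment = Tuple[int, int]  # (start_segment, end_segment), both inclusive


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection over union of two moments measured in segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def evaluate(
    ranked_per_query: Sequence[List[Moment]],
    ground_truth: Sequence[Moment],
    ks: Sequence[int] = (1, 5),
) -> Dict[str, float]:
    """Compute Rank@k and mIoU over a set of queries.

    ranked_per_query[i] holds the candidates for query i, best first.
    Assumption: a Rank@k hit requires an exact match in the top k.
    """
    num_queries = len(ground_truth)
    pairs = list(zip(ranked_per_query, ground_truth))
    results: Dict[str, float] = {}
    for k in ks:
        hits = sum(gt in ranked[:k] for ranked, gt in pairs)
        results[f"Rank@{k}"] = hits / num_queries
    # mIoU: mean IoU between the top-1 prediction and the ground truth.
    results["mIoU"] = sum(temporal_iou(r[0], gt) for r, gt in pairs) / num_queries
    return results


if __name__ == "__main__":
    ranked = [[(2, 3), (0, 0)], [(0, 1), (1, 2)]]
    truth = [(2, 3), (1, 2)]
    print(evaluate(ranked, truth))
```

Rank@k rewards only exact localization, while mIoU gives partial credit to a top prediction that overlaps the ground truth, which is why the two are reported together.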

V. Conclusion

Authors' summary: the paper introduces the task of localizing moments in video with natural language.

My assessment: accuracy could be improved by modeling complex (temporal) sentence structure and by adding a more sophisticated language model.