【Paper】CVPR13_Learning video saliency from human gaze using candidate selection

These are my first steps into the saliency field as a graduate student: the first paper I have read and the first set of reading notes I have written, so please bear with me; criticism and corrections are very welcome.

Learning Video Saliency from Human Gaze Using Candidate Selection (CVPR 2013)


This paper is a collaboration between Technion (the Israel Institute of Technology) and Adobe Research Seattle. Its starting point is that video differs from static images, so video saliency also differs considerably from image saliency. The authors propose a new method for video saliency estimation that is inspired by the way people actually watch videos.


Contents
-----------------------------------------
1 Motivation
2 Method
3 Experiments
4 Contributions
——————————————————————————————————————————————————————————————————————————————

1 Motivation

·Image saliency vs. video saliency
Image saliency and video saliency differ: figure (a) is an image saliency heat map, and figure (b) is the saliency heat map of a single video frame. Video saliency is tighter and concentrated on a single object, whereas image saliency covers several interesting locations.
 
·Predicting where people look in video is relevant in many applications.
Predicting where people look in a video matters for many applications. For example, an advertisement needs to hold the audience's attention; in video editing, shot transitions can be handled more smoothly; and it also helps with video compression and key-frame selection.
·Most previous saliency modeling methods calculate a saliency value for every pixel.
Most previous saliency-modeling methods compute a saliency value for every pixel; this paper computes saliency only for a small set of candidate locations, which reduces the amount of computation.


2 Method

Our work differs from previous video saliency methods by narrowing the focus to a small number of candidate gaze locations, and learning conditional gaze transitions over time.
In other words, instead of computing a dense per-pixel map, the method tracks a small set of candidate gaze locations and learns, over time, the conditional probability of the gaze shifting between them.

2.1 Candidate selection

Three types of candidates are considered (minimal code sketches for each type follow at the end of this subsection):
a. Static candidates indicate the locations that capture attention due to local contrast or uniqueness, irrespective of motion.
Static candidates are selected according to local contrast and uniqueness, using the classic graph-based visual saliency (GBVS) method.
 


b. Motion candidates reflect the areas that are attractive due to the motion between frames.
Motion candidates mark regions that draw the eye because of inter-frame motion: the optical flow between two consecutive frames is computed, and a Difference-of-Gaussians (DoG) filter is applied to the flow.
 


c. Semantic candidates are those that arise from higher-level human visual processing.
Semantic candidates come from higher-level human visual processing. Because viewers tend to look at the center of the frame, a constant-size center candidate is created at the frame center. Candidates are also derived from detections:
small detections: create a single candidate at their center;
large detections: create several candidates:
  four for body detections (head, shoulders and torso),
  three for faces (the eyes, and the nose together with the mouth).
(In the figure below, the center candidate is shown in red, face candidates in green, and body candidates in blue.)
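
To make the candidate types concrete, here is a minimal Python sketch of static candidate extraction. It does not implement GBVS itself; it assumes a static saliency map has already been computed (for example by any off-the-shelf GBVS implementation) and simply picks the strongest local maxima as candidate locations. The function name and the parameter values are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def static_candidates(saliency_map, num_candidates=3, neighborhood=15):
    """Pick candidate gaze locations as the strongest local maxima of a
    precomputed static saliency map (e.g. the output of a GBVS implementation)."""
    smoothed = gaussian_filter(saliency_map.astype(np.float64), sigma=2)
    # A pixel is a local maximum if it equals the maximum of its neighborhood.
    local_max = smoothed == maximum_filter(smoothed, size=neighborhood)
    ys, xs = np.nonzero(local_max)
    order = np.argsort(smoothed[ys, xs])[::-1]  # strongest peaks first
    return [(int(xs[i]), int(ys[i]), float(smoothed[ys[i], xs[i]]))
            for i in order[:num_candidates]]    # (x, y, score) triples
```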
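
The motion candidates can be sketched in the same spirit, assuming two consecutive grayscale frames as uint8 NumPy arrays. OpenCV's Farneback optical flow is used here as a stand-in for whichever flow estimator the paper actually uses; the DoG sigmas and the other parameters are illustrative.

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def motion_candidates(prev_gray, curr_gray, num_candidates=3,
                      sigma_small=3, sigma_large=12, neighborhood=15):
    """Candidate gaze locations from inter-frame motion: the optical-flow
    magnitude is filtered with a Difference-of-Gaussians and peak-picked."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)              # per-pixel flow speed
    dog = (gaussian_filter(magnitude, sigma_small)
           - gaussian_filter(magnitude, sigma_large))     # center-surround contrast
    dog = np.clip(dog, 0, None)                           # keep positive responses only
    local_max = (dog == maximum_filter(dog, size=neighborhood)) & (dog > 0)
    ys, xs = np.nonzero(local_max)
    order = np.argsort(dog[ys, xs])[::-1]
    return [(int(xs[i]), int(ys[i]), float(dog[ys[i], xs[i]]))
            for i in order[:num_candidates]]
```

The DoG step plays the same role as center-surround contrast in static saliency, only applied to the motion channel, so locally distinctive motion stands out against global camera motion.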
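
Semantic candidates can be sketched as a constant center candidate plus detector-based candidates. An OpenCV Haar face detector is used here purely as a stand-in for the paper's face and body detectors, and the size threshold and the positions of the eye / nose-with-mouth candidates inside a large detection are rough assumptions.

```python
import cv2

def semantic_candidates(frame_bgr, small_fraction=0.01):
    """Center candidate plus detector-based candidates: small detections get one
    candidate at their center, large face detections are split into the two eyes
    and a nose-with-mouth region, roughly mirroring the scheme in the paper."""
    h, w = frame_bgr.shape[:2]
    candidates = [("center", (w // 2, h // 2))]          # constant center candidate

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, fw, fh) in cascade.detectMultiScale(gray, 1.1, 5):
        if fw * fh < small_fraction * w * h:             # "small" detection: one candidate
            candidates.append(("face", (x + fw // 2, y + fh // 2)))
        else:                                            # "large" detection: several candidates
            candidates.append(("eye", (x + fw // 3, y + fh // 3)))
            candidates.append(("eye", (x + 2 * fw // 3, y + fh // 3)))
            candidates.append(("nose_mouth", (x + fw // 2, y + 2 * fh // 3)))
    return candidates
```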
 

2.2 Modeling gaze dynamics

Once the candidates are found, the most salient one has to be identified. This is done by learning transition probabilities, where a transition probability is the probability of moving from a gaze location in the source frame to a new gaze location in the destination frame; in other words, gaze dynamics must be modeled. (Minimal sketches of the training and inference steps follow at the end of this subsection.)
a. Features
Features are divided into destination-frame features and inter-frame features. The authors also experimented with source-frame features, but found that they caused overfitting during learning because they differ only slightly from the destination-frame features.
Static, motion, and semantic features are computed for each candidate.
b. Gaze transitions for training
The learning problem is cast as a classification problem: did a gaze transition occur from a candidate in the given (source) frame to a candidate in the destination frame?
This requires (i) selecting relevant frame pairs, e.g., around scene cuts, and (ii) labeling the positive and negative gaze transitions between those frames.
 
c. Learning transition probability
First, the mean and standard deviation of each feature are computed over the training set and used to normalize the features. A standard random forest classifier is then trained on the normalized features and their labels.
The resulting saliency of a destination candidate d_j is obtained by summing over all source candidates: Sal(d_j) = Σ_{s_i ∈ S} P(d_j | s_i) · Sal(s_i), where Sal(s_i) is the saliency of source candidate s_i and S is the set of all source candidates.
 
Finally, each candidate is replaced by a Gaussian with the corresponding covariance, and the Gaussians are summed, weighted by the candidate saliencies, to produce the final saliency map.
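
To illustrate the training side, the sketch below assembles a feature vector for one (source, destination) candidate pair and z-score-normalizes a feature matrix with statistics computed on the training set, as described above. The individual entries of the feature vector are placeholders of my own; the actual static, motion, semantic, and inter-frame features follow the paper.

```python
import numpy as np

def pair_features(src_cand, dst_cand, dst_frame_feats, flow_stats):
    """Hypothetical feature vector for a (source, destination) candidate pair:
    destination-frame features plus inter-frame features such as the distance
    the gaze would have to travel."""
    dx = dst_cand["x"] - src_cand["x"]
    dy = dst_cand["y"] - src_cand["y"]
    return np.array([
        dst_frame_feats["static"],     # static saliency at the destination candidate
        dst_frame_feats["motion"],     # motion strength at the destination candidate
        dst_frame_feats["semantic"],   # e.g. 1.0 for a face/body/center candidate
        np.hypot(dx, dy),              # inter-frame feature: gaze shift distance
        flow_stats["mean_magnitude"],  # inter-frame feature: overall motion in the frame pair
    ], dtype=np.float64)

def zscore(X, mean=None, std=None):
    """Normalize features; mean/std are computed on the training set once
    and reused unchanged at test time."""
    X = np.asarray(X, dtype=np.float64)
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    return (X - mean) / std, mean, std
```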
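
And a sketch of the learning and inference steps: a standard random forest (scikit-learn's RandomForestClassifier as a stand-in) is trained on labeled transitions, its predicted probability is used as P(d_j | s_i), the destination-candidate saliency is accumulated as Sal(d_j) = Σ_i P(d_j | s_i) · Sal(s_i), and the map is rendered as a sum of Gaussians weighted by Sal(d_j). Candidates are assumed to be dicts with x, y and saliency keys, and a fixed isotropic sigma stands in for the per-candidate covariance mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_transition_model(X_train, y_train, n_trees=100):
    """X_train: normalized pair features; y_train: 1 if the gaze actually moved
    to that destination candidate, 0 otherwise."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    return model

def frame_saliency(model, src_candidates, dst_candidates, pair_feature_fn,
                   frame_shape, sigma=20.0):
    """Sal(d_j) = sum_i P(d_j | s_i) * Sal(s_i), rendered as a sum of Gaussians.
    pair_feature_fn must return the already-normalized feature vector for (s, d)."""
    h, w = frame_shape
    sal_d = np.zeros(len(dst_candidates))
    for j, d in enumerate(dst_candidates):
        for s in src_candidates:
            x = pair_feature_fn(s, d).reshape(1, -1)
            p_trans = model.predict_proba(x)[0, 1]   # P(d_j | s_i)
            sal_d[j] += p_trans * s["saliency"]      # weighted by the source saliency

    ys, xs = np.mgrid[0:h, 0:w]
    sal_map = np.zeros((h, w))
    for j, d in enumerate(dst_candidates):
        blob = np.exp(-((xs - d["x"]) ** 2 + (ys - d["y"]) ** 2) / (2 * sigma ** 2))
        sal_map += sal_d[j] * blob                   # Gaussian blob per candidate
    return sal_map / (sal_map.max() + 1e-8)          # normalized final saliency map
```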

3 Experiments

Datasets:
DIEM (Dynamic Images and Eye Movements) dataset
CRCNS dataset
 


4 Contributions

·The method is substantially different from existing methods: it uses a sparse candidate set to model the saliency map.
·Using candidates boosts the accuracy of the saliency prediction and speeds up the algorithm.
·The proposed method accounts for the temporal dimension of the video by learning the probability of shifting between salient locations.






 

