论文笔记之视频:Slow and steady feature analysishigher order temporal coherence in video

Abstract

Capture how the visual content changes
Generalize slow feature analysis to “steady” feature analysis
Key idea
Impose a prior that higher order derivatives in the learned feature space must be small.
Method
Train a convolutional neural network with a regularizer on tuples of sequential frames from unlabeled video.

1.Introduction

predictable
Key idea
Impose higher order temporal constraints on the learned visual representation
Develop a regularizer that uses contrastive loss over tuples of frames to achieve such mappings with CNNS( z ( b ) − z ( a ) ≈ z ( c ) − z ( b ) z(b)-z(a) ≈ z(c) - z(b) z(b)z(a)z(c)z(b))

2.Related Work

Target

  • Previous: Build a robust object recognition system

the image representation must incorporate some degree of invariance to changes. Invariance can be manully crafted, sucn as with spatial pooling operations or gradient descriptors.

The authors’ method: pad pad the training data by systematically perturbing raw images with label-preserving transformations

results: jittered versions originating from the same content all map close by in the learned feature space.
idea: Unlabeled video is an appealing resource for recovering invariance.

  • Authors’: Learn features from unlabeled video

idea: Preserve higher order feature steadiness

  • Authors’: Exploit transformation patterns in unlabeled video to learn features that are useful for recognition.

idea: Explicitly predict the transformation parameters

3.Approach

Tuned the hidden layers of the network not only with the backpropagation gradients from a classification loss, but also with gradients computed from the unlabeled video that exploit its temporal steadiness.

3.1 Notation and framework overview

在这里插入图片描述

4.Experimental set

4.1.Experimental setup

  • dataset
  • Baseline
  • 1.Unreg: An unregularized network trained only on the supervised training samples S
  • 2.SFA-1: Distance l2 to calculate R2
  • 3.SFA-2: Distance l2 to calculate R2

Siamese network configuration
在这里插入图片描述

  • Network architectures
  • 1.NORB -> NORB task: input → 25 hidden units → ReLU nonlinearity → D=25 features
  • 2.The other two tasks: (1)Resize images to 32 × 32;(2)standard CNN architectures known to work well with tiny images, producing D=64-dimensional features
  • Optimize Nesterov-accelerated stochastic gradient descent, validatin classification

4.2.Quantifying steadiness

    1. At test time, given the first two images of each triplet, the task is to predict what the third looks like
      Method: Indentify an image closest to zθ(x3)
    1. Take a large poo C of candidate images, map them all to their features via
    1. Rank in incresing order of distance from zθ(x3)

4.3.Recognition results

  • Unlabeled video as a prior for supervised recognition
  • Pairing unsupervised and supervised datasets
  • Unlabeled video as a prior for supervised recognition
  • Varying unsupervised training set size
  • Comparison to supervised pretraining and finetuning
  • how the proposed unsupervised feature learning idea competes with the popular supervised pretraining model
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值