Paper reading: Unsupervised Representation Learning by Sorting Sequences

Contents

Summary

Details

1、Task

2、The proposed Order Prediction Network (OPN)

3、Data sampling strategies

4、Ablation analysis

Trick 1

Trick 2

Trick 3

Thoughts & Reflections


Paper title: Unsupervised Representation Learning by Sorting Sequences

Download: https://openaccess.thecvf.com/content_ICCV_2017/papers/Lee_Unsupervised_Representation_Learning_ICCV_2017_paper.pdf


Summary

  • The upstream (pretext) task of this paper: given 4 frames sampled from a video and shuffled, recover their correct temporal order.
  • Tricks used (the paper runs extensive ablation experiments showing that each of them is effective):
  1. Data sampling strategy: we use a sliding-window approach on the optical flow fields to extract patch tuples with large motion magnitude.
  2. Apply spatial jittering and channel splitting to the selected patches to guide the network to focus on the semantics of the images rather than fixating on low-level features.
  3. The proposed Order Prediction Network (OPN) consists of three main components: (1) feature extraction, (2) pairwise feature extraction, and (3) order prediction. Features for each frame (fc6) are encoded by convolutional layers. The pairwise feature extraction stage then extracts features from every pair of frames. A final layer then takes these extracted features and predicts the order.
  • Downstream tasks: action recognition (UCF101), image classification (VOC), and object detection (VOC)
  • Backbone: CaffeNet [16], a slight modification of AlexNet


Details

1、Task

Specifically, we use up to four randomly shuffled frames sampled from a video as our input.

Similar to the jigsaw puzzle problem in the spatial domain [27], we formulate the sequence sorting problem as a multi-class classification task.

For each tuple of four frames, there are 4! = 24 possible permutations.

However, because some actions are coherent both forward and backward (e.g., opening/closing a door), we group each forward permutation and its backward (reversed) counterpart into the same class (i.e., 24/2 = 12 classes for four frames).
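
To make the 24 → 12 grouping concrete, here is a minimal Python sketch that enumerates every ordering of four frames and merges each permutation with its reverse. The names `build_order_classes` and `make_training_example` are hypothetical, and the specific class ids are an arbitrary assumption; only the count of 12 classes follows from the text above.

```python
import random
from itertools import permutations


def build_order_classes(n_frames=4):
    """Map every permutation of n_frames to a class id, giving a permutation
    and its time-reversed version the same id (forward == backward)."""
    perm_to_class = {}
    next_id = 0
    for perm in permutations(range(n_frames)):
        if perm in perm_to_class:
            continue  # already assigned via its reverse
        perm_to_class[perm] = next_id
        perm_to_class[perm[::-1]] = next_id
        next_id += 1
    return perm_to_class, next_id


def make_training_example(frames, perm_to_class, rng=random):
    """Shuffle a frame tuple and return (shuffled frames, order class label)."""
    perm = tuple(rng.sample(range(len(frames)), len(frames)))
    shuffled = [frames[i] for i in perm]
    return shuffled, perm_to_class[perm]


perm_to_class, n_classes = build_order_classes(4)
print(len(perm_to_class), n_classes)  # 24 12
```

Because no permutation of distinct frames equals its own reverse, the 24 permutations always pair up exactly, which is why the class count is 4!/2.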


2、The proposed Order Prediction Network (OPN)


3、Data sampling strategies


4、Ablation analysis

Trick 1

Data sampling strategy: we use a sliding-window approach on the optical flow fields to extract patch tuples with large motion magnitude.
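
A minimal Python sketch of motion-based patch selection. It assumes (this is an assumption, not the paper's exact recipe) that motion is measured as the per-pixel flow magnitude sqrt(u² + v²) summed over all flow fields of the tuple, and that the window with the largest total magnitude is kept. Window size, stride, and the helper names `total_magnitude` and `best_motion_window` are hypothetical; flow fields are plain H × W nested lists rather than real optical-flow output.

```python
import math


def total_magnitude(flows):
    """Sum the per-pixel magnitude sqrt(u^2 + v^2) over a list of (u, v) fields."""
    h, w = len(flows[0][0]), len(flows[0][0][0])
    total = [[0.0] * w for _ in range(h)]
    for u, v in flows:
        for r in range(h):
            for c in range(w):
                total[r][c] += math.hypot(u[r][c], v[r][c])
    return total


def best_motion_window(magnitude, win, stride):
    """Slide a win x win window over the magnitude map and return the
    (top, left) corner with the largest summed motion, plus that sum."""
    h, w = len(magnitude), len(magnitude[0])
    best_score, best_pos = -1.0, (0, 0)
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            score = sum(magnitude[r][c]
                        for r in range(top, top + win)
                        for c in range(left, left + win))
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos, best_score


# Toy example: one 6 x 6 flow field with motion only at pixel (5, 5).
u = [[0.0] * 6 for _ in range(6)]
v = [[0.0] * 6 for _ in range(6)]
u[5][5], v[5][5] = 3.0, 4.0
print(best_motion_window(total_magnitude([(u, v)]), win=3, stride=1))  # ((3, 3), 5.0)
```

The chosen (top, left) corner would then be used to crop the same region from every frame of the tuple, so the network sees patches where something is actually moving.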


Trick 2

Apply spatial jittering and channel splitting to the selected patches to guide the network to focus on the semantics of the images rather than fixating on low-level features.
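
The two augmentations can be sketched in Python as follows. As assumptions: spatial jittering is modeled as shifting each frame's crop by an independent random offset (clamped to the frame), and channel splitting as picking one RGB channel at random and copying it into all three channels, independently per frame. Frames are H × W nested lists of (R, G, B) tuples; `max_shift` and the function names are hypothetical.

```python
import random


def jittered_crop(frame, top, left, size, max_shift, rng=random):
    """Crop a size x size patch near (top, left), shifted by a random offset
    in [-max_shift, max_shift] on each axis and clamped to the frame."""
    h, w = len(frame), len(frame[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    t = min(max(top + dy, 0), h - size)
    l = min(max(left + dx, 0), w - size)
    return [row[l:l + size] for row in frame[t:t + size]]


def channel_split(patch, rng=random):
    """Pick one RGB channel at random and copy it into all three channels."""
    c = rng.randrange(3)
    return [[(px[c], px[c], px[c]) for px in row] for row in patch]


def augment_tuple(frames, top, left, size, max_shift, rng=random):
    """Apply independent jitter and channel splitting to every frame in a tuple."""
    return [channel_split(jittered_crop(f, top, left, size, max_shift, rng), rng)
            for f in frames]


rng = random.Random(0)
frames = [[[(y, x, 7) for x in range(10)] for y in range(10)] for _ in range(4)]
patches = augment_tuple(frames, top=3, left=3, size=4, max_shift=2, rng=rng)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 4 4 4
```

Jittering removes the shortcut of matching exact pixel positions across frames, and channel splitting removes cross-channel color cues, which is the motivation the text gives for both.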


Trick 3

The proposed Order Prediction Network (OPN) consists of three main components: (1) feature extraction, (2) pairwise feature extraction, and (3) order prediction. Features for each frame (fc6) are encoded by convolutional layers. The pairwise feature extraction stage then extracts features from every pair of frames. A final layer then takes these extracted features and predicts the order.
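
The three OPN stages can be illustrated with a toy forward pass in Python. This is a sketch under stated assumptions, not the paper's implementation: per-frame features are passed in directly (standing in for the shared CaffeNet fc6 output), each pair (i, j) with i < j is concatenated and passed through one shared fully connected layer with ReLU, and the concatenation of all six pair features feeds a final linear layer with 12 outputs (one per order class). Layer sizes, random initialization, and the class name `TinyOPN` are hypothetical.

```python
import random
from itertools import combinations


def linear(x, weights, bias):
    """Dense layer; weights has shape out_dim x in_dim."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero bias (hypothetical initialization)."""
    return ([[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)],
            [0.0] * out_dim)


class TinyOPN:
    def __init__(self, feat_dim, pair_dim, n_frames=4, n_classes=12, seed=0):
        rng = random.Random(seed)
        # Stage 2 operates on every pair of frames: C(4, 2) = 6 pairs.
        self.pairs = list(combinations(range(n_frames), 2))
        self.pair_layer = init_layer(2 * feat_dim, pair_dim, rng)
        # Stage 3 sees all pair features concatenated.
        self.order_layer = init_layer(len(self.pairs) * pair_dim, n_classes, rng)

    def forward(self, frame_feats):
        # Stage 1 (assumed done elsewhere): frame_feats[k] is frame k's feature.
        pair_feats = []
        for i, j in self.pairs:
            concat = list(frame_feats[i]) + list(frame_feats[j])
            pair_feats.extend(relu(linear(concat, *self.pair_layer)))
        # Stage 3: logits over the order classes.
        return linear(pair_feats, *self.order_layer)


model = TinyOPN(feat_dim=8, pair_dim=4)
feats = [[float(k)] * 8 for k in range(4)]  # fake per-frame features
logits = model.forward(feats)
print(len(model.pairs), len(logits))  # 6 12
```

With four frames the order layer sees 6 × `pair_dim` inputs; in training, the 12 logits would go through a softmax cross-entropy loss against the permutation class.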


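Thoughts & Reflections

I originally assumed that the unit of temporal order recognition would be video clips, so the backbone would be something like C3D. To my surprise, the unit being sorted here is individual frames, so the backbone is an ordinary 2D image CNN.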
