Paper Reading: Unsupervised Representation Learning by Sorting Sequences

Author: 小吴同学真棒

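Categories: Learning, Artificial Intelligence. Tags: self-supervised learning, video, action recognition, ICCV2017, unsupervised learning, sorting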

Article link: https://blog.csdn.net/qq_36627158/article/details/117035633

Summary

The upstream (pretext) task in this paper is to recover the correct temporal order of 4 shuffled frames sampled from a video.

Tricks used (the paper backs each of these tricks with extensive ablation experiments showing that it helps):

Trick 1: Data sampling strategies. (a) We use a sliding-window approach on the optical flow fields to extract patch tuples with large motion magnitude.
Trick 2: We apply spatial jittering and channel splitting on selected patches to guide the network to focus on the semantics of the images rather than fixating on low-level features.
Trick 3: The proposed Order Prediction Network (OPN) consists of three main components: (1) feature extraction, (2) pairwise feature extraction, and (3) order prediction. Features for each frame (fc6) are encoded by convolutional layers. The pairwise feature extraction stage then extracts features from every pair of frames. We then have a final layer that takes these extracted features to predict order.

Downstream tasks: action recognition (UCF101), image classification (VOC), and object detection (VOC)

Backbone: CaffeNet [16], a slight modification of AlexNet

Details

1、Task

Specifically, we use up to four randomly shuffled frames sampled from a video as our input.

Similar to the jigsaw puzzle problem in the spatial domain [27], we formulate the sequence sorting problem as a multi-class classification task.

For each tuple of four frames, there are 4! = 24 possible permutations.

However, as some actions are coherent both forward and backward (e.g., opening/closing a door), we group both forward and backward permutations into the same class (e.g., 24/2 = 12 classes for four frames).
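The class count above can be checked with a few lines of Python. This is only a minimal sketch of the labeling idea, not the authors' code: the helper names `order_classes` and `label_for` are made up for illustration, and picking the lexicographically smaller of a permutation and its reverse as the class representative is an assumption about how the grouping could be implemented.

```python
from itertools import permutations

def order_classes(n_frames=4):
    """Map each frame-order permutation to a class id, treating a
    permutation and its reverse (forward/backward) as the same class."""
    class_of = {}
    for perm in permutations(range(n_frames)):
        # Canonical representative: the smaller of perm and reversed perm.
        key = min(perm, perm[::-1])
        if key not in class_of:
            class_of[key] = len(class_of)
    return class_of

def label_for(perm, class_of):
    """Class id of an arbitrary frame order, e.g. (2, 0, 3, 1)."""
    perm = tuple(perm)
    return class_of[min(perm, perm[::-1])]

classes = order_classes(4)
print(len(classes))  # 12
print(label_for((0, 1, 2, 3), classes) == label_for((3, 2, 1, 0), classes))  # True
```

A shuffled tuple and its reversed version therefore get the same training label, which is what makes the 4-frame task a 12-way classification problem.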

2、The proposed Order Prediction Network (OPN)

3、Data sampling strategies

4、Ablation analysis

Trick 1

Data sampling strategies. (a) We use a sliding-window approach on the optical flow fields to extract patch tuples with large motion magnitude.
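To make the sliding-window idea concrete, here is a rough sketch in plain Python. Everything specific in it is an assumption for illustration: the function names `flow_magnitude` and `best_motion_window`, scoring a window by its summed flow magnitude, and keeping only the single best window. The paper's actual patch size, stride, and selection rule are not reproduced here.

```python
import math

def flow_magnitude(flow_x, flow_y):
    """Per-pixel motion magnitude from horizontal/vertical flow components."""
    return [[math.hypot(u, v) for u, v in zip(row_x, row_y)]
            for row_x, row_y in zip(flow_x, flow_y)]

def best_motion_window(mag, patch, stride=1):
    """Return (top, left, score) of the patch x patch window with the
    largest summed magnitude. Brute force, which is fine for a sketch."""
    height, width = len(mag), len(mag[0])
    best = None
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            score = sum(sum(row[left:left + patch])
                        for row in mag[top:top + patch])
            if best is None or score > best[2]:
                best = (top, left, score)
    return best

# Toy 4x6 flow field: motion only in the 2x2 block at rows 1-2, cols 3-4.
flow_x = [[0.0] * 6 for _ in range(4)]
flow_y = [[0.0] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (3, 4):
        flow_x[r][c], flow_y[r][c] = 3.0, 4.0  # magnitude 5 per pixel

top, left, score = best_motion_window(flow_magnitude(flow_x, flow_y), patch=2)
print(top, left, score)  # 1 3 20.0
```

The selected (top, left) location would then be used to cut a patch out of each frame in the tuple, so that the network sees the region where something is actually moving.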

Trick 2

We apply spatial jittering and channel splitting on selected patches to guide the network to focus on the semantics of the images rather than fixating on low-level features.
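A small sketch of what these two augmentations could look like, using nested lists of (r, g, b) tuples instead of a real image library. The details are assumptions, not taken from the paper: spatial jittering is modeled as a random shift of a few pixels applied independently to each frame's crop, channel splitting as keeping one randomly chosen color channel and copying it into all three, and the names `jittered_crop`, `split_channel`, `augment_tuple` and the `max_shift` parameter are hypothetical.

```python
import random

def jittered_crop(frame, top, left, size, max_shift, rng):
    """Crop a size x size patch whose origin is shifted by a random offset
    of at most max_shift pixels, clamped so the crop stays inside the frame."""
    height, width = len(frame), len(frame[0])
    y0 = min(max(top + rng.randint(-max_shift, max_shift), 0), height - size)
    x0 = min(max(left + rng.randint(-max_shift, max_shift), 0), width - size)
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]

def split_channel(patch, rng):
    """Keep one randomly chosen color channel and copy it into all three."""
    ch = rng.randrange(3)
    return [[(px[ch],) * 3 for px in row] for row in patch]

def augment_tuple(frames, top, left, size, max_shift=2, seed=None):
    """Apply jittering and channel splitting independently to each frame."""
    rng = random.Random(seed)
    return [split_channel(jittered_crop(f, top, left, size, max_shift, rng), rng)
            for f in frames]

# Four toy 8x8 RGB frames; each pixel is an (r, g, b) tuple.
frames = [[[(r, c, k) for c in range(8)] for r in range(8)] for k in range(4)]
patches = augment_tuple(frames, top=2, left=2, size=4, max_shift=2, seed=0)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 4 4 4
```

The intuition is that small per-frame shifts and the loss of color information make it harder to solve the ordering task from pixel-level cues alone.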

Trick 3

The proposed Order Prediction Network (OPN) consists of three main components: (1) feature extraction, (2) pairwise feature extraction, and (3) order prediction. Features for each frame (fc6) are encoded by convolutional layers. The pairwise feature extraction stage then extracts features from every pair of frames. We then have a final layer that takes these extracted features to predict order.
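The data flow of the pairwise stage can be sketched without any deep learning framework. The snippet below only builds the inputs to that stage: it assumes each frame already has a feature vector (standing in for fc6) and that a pair is represented by concatenating its two vectors. The function name `pairwise_inputs` and the toy 3-dimensional features are illustrative assumptions; the learned layers themselves are left out.

```python
from itertools import combinations

def pairwise_inputs(frame_features):
    """Concatenate the feature vectors of every unordered pair of frames."""
    return {(i, j): frame_features[i] + frame_features[j]
            for i, j in combinations(range(len(frame_features)), 2)}

# Four toy "fc6" vectors of dimension 3 (real features would be far larger).
fc6 = [[float(k)] * 3 for k in range(4)]
pairs = pairwise_inputs(fc6)

print(len(pairs))          # 6 pairs for 4 frames
print(len(pairs[(0, 1)]))  # 6 = 2 x 3 values per pair
```

In a full model, each pair vector would pass through a learned layer, and the resulting pair features would feed the final order-prediction layer, which outputs scores over the 12 order classes for four frames.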