Self-Supervised Learning in Video: Predicting Motion and Appearance Statistics

Republix

Published 2021-04-12 23:21:29

Category: Video Classification / Action Recognition

Article link: https://blog.csdn.net/weixin_42443072/article/details/115645112

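This post covers a CVPR 2019 paper on learning spatio-temporal representations by predicting motion and appearance statistics in video. Inspired by the human visual system, the authors design a pretext task built on extracting optical flow and computing the location and direction of motion, together with color changes. The network has a motion branch and an appearance branch, which predict the location and direction of the largest motion and the color changes, respectively. Training uses an MSE loss, with labels that span several dimensions of motion and appearance information.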

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

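This is a CVPR 2019 paper. The pretext task the authors design is built on statistics derived from motion and color: specifically, the location and direction of the largest motion, and the locations of the largest and smallest color change together with the corresponding color values. In the Introduction, the authors note that the representation of motion in the human visual system is based on a set of learned patterns, and the paper's approach is closely tied to this idea: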

The idea is inspired by Giese and Poggio’s work on human visual system [14], in which the representation of motion is found to be based on a set of learned patterns.

These patterns are encoded as sequences of snapshots of body shapes by neurons in the form pathway, and by sequences of complex optic flow patterns in the motion pathway.
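
To make the motion statistics mentioned above (the location and direction of the largest motion) more concrete, here is a minimal NumPy sketch of how such a label could be derived from a single optical-flow field. It is an illustration under stated assumptions, not the paper's implementation: the function name `motion_statistics`, the 4×4 grid, the 8 direction bins, and the magnitude-weighted histogram are all hypothetical choices made for this example, and the flow field is assumed to have been computed beforehand.

```python
import numpy as np


def motion_statistics(flow, grid=4, n_bins=8):
    """Return (block_index, direction_bin) for one optical-flow field.

    flow:   array of shape (H, W, 2) with per-pixel (dx, dy) displacements.
    grid:   number of blocks per side (illustrative value, an assumption).
    n_bins: number of direction bins over [0, 2*pi) (also an assumption).
    """
    h, w, _ = flow.shape
    magnitude = np.linalg.norm(flow, axis=2)
    # Map arctan2's (-pi, pi] output onto [0, 2*pi).
    angle = np.arctan2(flow[..., 1], flow[..., 0]) % (2 * np.pi)

    # Crop so the frame divides evenly into grid x grid blocks.
    bh, bw = h // grid, w // grid
    blocks = magnitude[: bh * grid, : bw * grid].reshape(grid, bh, grid, bw)

    # Total motion magnitude per block; the largest one is the "location".
    block_energy = blocks.sum(axis=(1, 3))
    block_index = int(block_energy.argmax())  # row-major index
    row, col = divmod(block_index, grid)

    # Dominant direction inside that block: magnitude-weighted histogram.
    rows = slice(row * bh, (row + 1) * bh)
    cols = slice(col * bw, (col + 1) * bw)
    hist, _ = np.histogram(
        angle[rows, cols],
        bins=n_bins,
        range=(0, 2 * np.pi),
        weights=magnitude[rows, cols],
    )
    direction_bin = int(hist.argmax())
    return block_index, direction_bin


# Usage: a synthetic 8x8 flow field with motion in a single 2x2 patch.
flow = np.zeros((8, 8, 2))
flow[4:6, 6:8] = (-1.0, 0.2)  # moving mostly in the -x direction
block_index, direction_bin = motion_statistics(flow)
```

A pair like `(block_index, direction_bin)` is the kind of target a motion branch could be trained to predict; the summary above states only that training uses an MSE loss, so how the labels are encoded for regression is left open here.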