《SlowFast Networks for Video Recognition》2019

sdusgq

Published 2022-01-08 10:05:46

Category: Paper Reading. Tags: computer vision, artificial intelligence, deep learning, pytorch

Article link: https://blog.csdn.net/sdusgq/article/details/122368511


1. Abstract
2. Introduction
3. SlowFast Networks
4. Code
- 4.1 readme.md
- 4.2 INSTALL.md
5. Personal Feeling

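1. Abstract

A two-pathway network: one pathway processes frames at a low frame rate to capture spatial information, and the other processes frames at a high frame rate to capture motion information in the temporal domain. The Fast pathway can be made lightweight by reducing its channel count and still learn useful temporal information.
The authors state that the model works well for action classification, so it may be helpful for my research.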

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA.
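To make the "low frame rate vs. high frame rate" idea concrete, here is a minimal sketch of how the two pathways might pick frames from the same raw clip. The function name `pathway_frame_indices` is hypothetical. The values `tau=16` and `alpha=8` and the 64-frame clip are what I recall as typical settings in the paper; they are not stated in the abstract quoted above, so treat them as assumptions.

```python
def pathway_frame_indices(num_raw_frames, tau=16, alpha=8):
    """Return (slow, fast) lists of raw-frame indices (illustrative only).

    Slow pathway: keeps one frame every `tau` raw frames.
    Fast pathway: keeps one frame every `tau // alpha` raw frames,
    so it sees `alpha` times more frames than the Slow pathway.
    """
    if tau % alpha != 0:
        raise ValueError("this sketch assumes tau is divisible by alpha")
    fast_stride = tau // alpha
    slow = list(range(0, num_raw_frames, tau))
    fast = list(range(0, num_raw_frames, fast_stride))
    return slow, fast


# Assumed example: a 64-frame raw clip.
slow, fast = pathway_frame_indices(64, tau=16, alpha=8)
print(slow)       # [0, 16, 32, 48]
print(len(fast))  # 32
```

In this sketch every Slow frame is also a Fast frame, which is one simple way to keep the two pathways temporally aligned.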

2. Introduction

In the first paragraph, the authors explain why slow motion matters; I did not fully understand this part.

Para.1:
It is customary in the recognition of images I(x, y) to treat the two spatial dimensions x and y symmetrically. This is justified by the statistics of natural images, which are to a first approximation isotropic—all orientations are equally likely—and shift-invariant [41, 26]. But what about video signals I(x, y, t)? Motion is the spatiotemporal counterpart of orientation [2], but all spatiotemporal orientations are not equally likely. Slow motions are more likely than fast motions (indeed most of the world we see is at rest at a given moment) and this has been exploited in Bayesian accounts of how humans perceive motion stimuli [58]. For example, if we see a moving edge in isolation, we perceive it as moving perpendicular to itself, even though in principle it could also have an arbitrary component of movement tangential to itself (the aperture problem in optical flow). This percept is rational if the prior favors slow movements.

The third paragraph is essentially a more detailed version of the abstract.

Para.3.
Based on this intuition, we present a two-pathway SlowFast model for video recognition (Fig. 1). One pathway is designed to capture semantic information that can be given by images or a few sparse frames, and it operates at low frame rates and slow refreshing speed. In contrast, the other pathway is responsible for capturing rapidly changing motion, by operating at fast refreshing speed and high temporal resolution. Despite its high temporal rate, this pathway is made very lightweight, e.g., 20% of total computation. This is because this pathway is designed to have fewer channels and weaker ability to process spatial information, while such information can be provided by the first pathway in a less redundant manner. We call the first a Slow pathway and the second a Fast pathway, driven by their different temporal speeds. The two pathways are fused by lateral connections.
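The paragraph above says the Fast pathway has fewer channels and that the two pathways are fused by lateral connections. The following is a rough shape-bookkeeping sketch of one way such a fusion could work: packing every `alpha` Fast frames into the channel axis so the Fast features match the Slow temporal length, then concatenating along channels. As far as I recall this "time-to-channel" reshaping is one of the options described later in the paper, but it is not in the paragraph quoted here. The helper names and the concrete sizes (64 channels, 56×56 feature maps, a channel ratio of 1/8) are assumptions for illustration only.

```python
def time_to_channel(fast_shape, alpha):
    """Pack every `alpha` Fast frames into the channels of one frame.

    Shapes are (channels, frames, height, width).
    """
    channels, frames, height, width = fast_shape
    if frames % alpha != 0:
        raise ValueError("frame count must be divisible by alpha")
    return (channels * alpha, frames // alpha, height, width)


def fuse_by_concat(slow_shape, lateral_shape):
    """Concatenate along the channel axis; all other axes must match."""
    if tuple(slow_shape[1:]) != tuple(lateral_shape[1:]):
        raise ValueError("temporal/spatial sizes differ, cannot concatenate")
    return (slow_shape[0] + lateral_shape[0],) + tuple(slow_shape[1:])


# Assumed sizes, for illustration only.
C, T, alpha = 64, 4, 8
beta_C = C // 8                              # Fast pathway: 1/8 of the Slow channels
slow_shape = (C, T, 56, 56)                  # few frames, many channels
fast_shape = (beta_C, alpha * T, 56, 56)     # many frames, few channels

lateral = time_to_channel(fast_shape, alpha)
fused = fuse_by_concat(slow_shape, lateral)
print(lateral)  # (64, 4, 56, 56)
print(fused)    # (128, 4, 56, 56)
```

The sketch only tracks tensor shapes; a real implementation would reshape or convolve actual feature tensors.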

Para.4.
Our conceptual idea leads to flexible and effective designs for video models. The Fast pathway, due to its lightweight nature, does not need to perform any temporal pooling—it can operate on high frame rates for all intermediate layers and maintain temporal fidelity. Meanwhile, thanks to the lower temporal rate, the Slow pathway can be more focused on the spatial domain and semantics. By treating the raw video at different temporal rates, our method allows the two pathways to have their own expertise on video modeling.

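In the fifth paragraph, the authors explain how their work differs from the Two-Stream method [44]:
(1) [44] does not use different temporal speeds;
(2) [44] uses the same network architecture for both streams, whereas this work uses a different (more lightweight) architecture for the Fast pathway;
(3) [44] uses optical flow, whereas this work learns from the raw data.
I think [44] is a useful reference for my own research.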

Para.5.
There is another well known architecture for video recognition which has a two-stream design [44], but provides conceptually different perspectives. The Two-Stream method [44] has not explored the potential of different temporal speeds, a key concept in our method. The two-stream method adopts the same backbone structure to both streams, whereas our Fast pathway is more lightweight. Our method does not compute optical flow, and therefore, our models are learned end-to-end from the raw data. In our experiments we observe that the SlowFast network is empirically more effective.

3. SlowFast Networks

4. Code

4.1 readme.md

Installation instructions for PyTorch and PySlowFast are in INSTALL.md. You can prepare the datasets by following the instructions in DATASET.md.

Please find installation instructions for PyTorch and PySlowFast in INSTALL.md. You may follow the instructions in DATASET.md to prepare the datasets.

Follow the example in GETTING_STARTED.md to start experimenting with video models in PySlowFast.

Follow the example in GETTING_STARTED.md to start playing video models with PySlowFast.

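PySlowFast provides a range of visualization tools for the train/eval/test processes, for model analysis, and for running inference with a trained model. See Visualization Tools for more information.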

We offer a range of visualization tools for the train/eval/test processes, model analysis, and for running inference with trained model. More information at Visualization Tools.

4.2 INSTALL.md

Error encountered during installation:
ERROR: Command errored out with exit status 128: git clone -q https://github.com/facebookresearch/fvcore /tmp/pip-req-build-e4td7gkk Check the logs for full command output.
Fix: install fvcore from PyPI instead of cloning it from GitHub: pip install fvcore
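After applying the fix, a quick check like the following can confirm that the package is visible to the current Python environment. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and it only checks that the module can be found, not that its version is the one PySlowFast expects.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("fvcore"):
    print("fvcore is importable")
else:
    print("fvcore is missing; try: pip install fvcore")
```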

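5. Personal Feeling

The structure of the paper is not hard to follow.
Learning the code, however, would take a lot of time: installing the modules, preparing the datasets, and reading through the modularized code until it makes sense.
I am also not sure how much it would help my own work, so the payoff for the effort looks low. I am setting it aside for now.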
