Video Action Recognition: Convolutional Two-Stream Network Fusion for Video Action Recognition

Convolutional Two-Stream Network Fusion for Video Action Recognition, CVPR 2016

http://www.robots.ox.ac.uk/~vgg/software/two_stream_action/
https://github.com/feichtenhofer/twostreamfusion

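Video action recognition commonly uses a two-stream CNN that processes spatial and temporal information separately. This paper studies how to fuse spatial and temporal information better inside the CNN.
The paper reports three findings: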
(i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters;
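In short: fusing the spatial and temporal networks at a convolutional layer does not hurt performance, and it reduces the number of network parameters.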

(ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class prediction layer can boost accuracy;
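In short: spatial fusion at the last convolutional layer works better than fusion at earlier layers, and additionally fusing at the class prediction layer improves accuracy.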

(iii) that pooling of abstract convolutional features over spatiotemporal neighbourhoods further boosts performance.
In short: adding pooling over spatiotemporal neighbourhoods improves performance further.



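As for why CNNs have not yet achieved very strong results in video action recognition, the authors suggest two reasons: 1) the training data may be too small, and 2) temporal information is not exploited well enough: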
current ConvNet architectures are not able to take full advantage of temporal information and their performance is consequently often dominated by spatial (appearance) recognition

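At the very least, the earlier two-stream architecture does not handle the following two problems well:
1) recognizing what is moving where, i.e. registering appearance recognition (spatial cue) with optical flow recognition (temporal cue); that is, the correspondence between spatial and temporal information
2) how these cues evolve over time; that is, how the information changes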

3 Approach
The earlier two-stream architecture cannot fuse spatial and temporal information well, because it establishes no correspondence between the two.
3.1. Spatial fusion
The paper describes several fusion methods: Sum fusion, Max fusion, Concatenation fusion, Conv fusion, and Bilinear fusion.
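To make the five names concrete, here is a minimal NumPy sketch of how each fusion could combine two feature maps taken from the same layer of the spatial and temporal streams. It is an illustration, not the authors' code: the function names, the (H, W, D) layout, and the treatment of Conv fusion as a 1x1 filter bank over the concatenated channels are assumptions made for this example.

```python
import numpy as np

# xa, xb: feature maps of shape (H, W, D) from the two streams (assumed layout).

def sum_fusion(xa, xb):
    # Element-wise sum of corresponding channels and locations.
    return xa + xb

def max_fusion(xa, xb):
    # Element-wise maximum of corresponding channels and locations.
    return np.maximum(xa, xb)

def concat_fusion(xa, xb):
    # Stack the two maps along the channel axis: (H, W, 2D).
    return np.concatenate([xa, xb], axis=-1)

def conv_fusion(xa, xb, filters, bias):
    # Concatenate, then apply a 1x1 convolution that maps 2D channels
    # to D_out channels. filters: (2D, D_out), bias: (D_out,).
    stacked = concat_fusion(xa, xb)
    return stacked @ filters + bias

def bilinear_fusion(xa, xb):
    # Outer product of the two feature vectors at each location,
    # summed over all locations, flattened to a D*D vector.
    h, w, d = xa.shape
    a = xa.reshape(h * w, d)
    b = xb.reshape(h * w, d)
    return (a.T @ b).reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xa = rng.normal(size=(7, 7, 4))
    xb = rng.normal(size=(7, 7, 4))
    filters = rng.normal(size=(8, 4))
    bias = np.zeros(4)
    print(sum_fusion(xa, xb).shape)
    print(concat_fusion(xa, xb).shape)
    print(conv_fusion(xa, xb, filters, bias).shape)
    print(bilinear_fusion(xa, xb).shape)
```

Note that Sum, Max, and Conv fusion keep the spatial layout, so the result can be fed to further convolutional layers, while the bilinear variant discards location information by summing over all positions.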

3.2. Where to fuse the networks
There are quite a few options here as well.

3.3. Temporal fusion
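Temporal fusion combines feature maps from several time steps, and the paper's third finding is about pooling over spatiotemporal neighbourhoods. A rough NumPy sketch of 3D max pooling is given below. The (T, H, W, D) layout, the function name, and the use of non-overlapping windows (stride equal to the window size) are simplifying assumptions for this example, not details taken from the paper.

```python
import numpy as np

def max_pool_3d(x, pt, ph, pw):
    """Max-pool a stack of feature maps over time and space.

    x: array of shape (T, H, W, D), feature maps for T time steps.
    pt, ph, pw: pooling window size along time, height, and width.
    Windows do not overlap; trailing elements that do not fill a
    whole window are cropped (an assumption of this sketch).
    """
    t, h, w, d = x.shape
    t2, h2, w2 = t // pt, h // ph, w // pw
    # Crop so that each pooled axis divides evenly into windows.
    x = x[: t2 * pt, : h2 * ph, : w2 * pw]
    # Split each pooled axis into (number of windows, window size).
    x = x.reshape(t2, pt, h2, ph, w2, pw, d)
    # Take the maximum inside each window, per channel.
    return x.max(axis=(1, 3, 5))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 14, 14, 8))
    pooled = max_pool_3d(feats, 2, 2, 2)
    print(pooled.shape)
```

Channels are pooled independently here, so the channel count D is unchanged; only the temporal and spatial extents shrink.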

3.4. Proposed architecture

We fuse the two networks, at the last convolutional layer (after ReLU) into the spatial stream to convert it into a spatiotemporal stream by using 3D Conv fusion followed by 3D pooling (see Fig. 4, left). Moreover, we do not truncate the temporal stream and also perform 3D Pooling in the temporal network (see Fig. 4, right). The losses of both streams are used for training and during testing we average the predictions of the two streams.
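The test-time step in the quote, averaging the predictions of the two streams, can be sketched as follows. Whether the average is taken over softmax probabilities or over raw scores is not stated in the quoted text; this sketch assumes softmax probabilities, and the function names are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def average_two_stream(spatiotemporal_logits, temporal_logits):
    # Convert each stream's class scores to probabilities,
    # then take the unweighted mean (assumed equal weighting).
    p_st = softmax(spatiotemporal_logits)
    p_t = softmax(temporal_logits)
    return 0.5 * (p_st + p_t)

if __name__ == "__main__":
    # One video, three hypothetical classes.
    st = np.array([[3.0, 0.0, 0.0]])
    tm = np.array([[0.0, 1.0, 0.0]])
    probs = average_two_stream(st, tm)
    print(probs, probs.argmax(axis=-1))
```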


Doesn't this all feel a bit overcomplicated?
