Two-Stream SR-CNNs for Action Recognition in Videos

paper:http://www.bmva.org/bmvc/2016/papers/paper108/index.html
code:https://github.com/yifita/action.sr_cnn
third author's homepage:http://wanglimin.github.io/

dataset      : UCF101    JHMDB (split 1)
accuracy (%) : 92.6      53.77

Framework

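The input is still two-stream, but both the RGB and the flow inputs are passed through Faster R-CNN. The resulting regions are divided into three categories (scene, person, and object), and each category is fed into the network separately for training.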

The inputs are first passed through standard convolutional and pooling layers. We replace the last pooling layer with a RoiPooling [2] layer, which separates features for different semantic cues into parallel fully connected layers (called channels) using bounding boxes proposed by a Faster R-CNN [18] object detector (see subsection 3.2).
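
To make the RoI pooling step concrete, here is a minimal NumPy sketch of max pooling one bounding box on a feature map into a fixed-size grid. It is an illustration only, not the layer from the linked repository: the function name roi_max_pool, the (x1, y1, x2, y2) box convention in feature-map coordinates, and the bin-splitting rule are all assumptions made for this example.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool the region `box` of `feature_map` into an out_size x out_size grid.

    feature_map: array of shape (C, H, W).
    box: (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive (assumed convention).
    """
    x1, y1, x2, y2 = box
    region = feature_map[:, y1:y2, x1:x2]
    c, h, w = region.shape
    if h == 0 or w == 0:
        raise ValueError("box must cover at least one feature-map cell")

    # Bin edges along each axis; integer truncation gives roughly equal bins.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)

    out = np.empty((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Force every bin to contain at least one cell.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, ys[i]:y_end, xs[j]:x_end].max(axis=(1, 2))
    return out

fmap = np.arange(16).reshape(1, 4, 4)
pooled = roi_max_pool(fmap, (0, 0, 4, 4), out_size=2)
```

Because every box is reduced to the same output shape regardless of its size, boxes for the scene, the person, and the objects can each feed a fully connected channel with a fixed input dimension.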

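Each channel produces its own scores. Because there can be multiple objects, the authors use MIL (Multiple Instance Learning) to combine the most useful information. Finally, all scores pass through a fusion layer to produce the final prediction.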
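
As a rough illustration of how several object boxes could be collapsed into one score vector, the sketch below takes the per-class maximum over instances. This post does not say which MIL operator is used, so the max is an assumption (it is one common MIL choice), and the name mil_max_pool is hypothetical.

```python
import numpy as np

def mil_max_pool(instance_scores):
    """Collapse per-instance class scores of shape (n_instances, n_classes)
    into one vector of shape (n_classes,).

    ASSUMPTION: per-class max over instances; the source text does not
    specify the operator.
    """
    instance_scores = np.asarray(instance_scores, dtype=float)
    if instance_scores.ndim != 2 or instance_scores.shape[0] == 0:
        raise ValueError("expected a non-empty (n_instances, n_classes) array")
    return instance_scores.max(axis=0)

# Two detected objects, three action classes.
object_scores = [[0.2, 0.1, 0.9],
                 [0.6, 0.3, 0.1]]
bag_scores = mil_max_pool(object_scores)
```

With a max, each class keeps the score of whichever object supports it most strongly, so uninformative boxes do not dilute the result.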

Fusion

The authors propose four fusion strategies:

  • Max fusion takes the maximum score value among all channels for each class, essentially picking the strongest channel.
  • Sum fusion directly adds up the scores from different channels, i.e. each channel is treated equally.
  • Category-wise weighted fusion (Weighted-1) combines channel scores via weighted sum, aiming to represent varied relative contribution of each channel for different classes using their corresponding weights.
  • As for correlationwise weighted fusion (Weighted-2)
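
The first three strategies can be sketched in a few lines of NumPy, assuming the channel scores are stacked into an array of shape (n_channels, n_classes). The function names and the weight layout (one weight per channel per class) are assumptions for this example, and the weights would be learned in practice rather than set by hand. Weighted-2 is left out because its description is cut off in the list above.

```python
import numpy as np

def max_fusion(scores):
    """Per class, keep the highest score across channels."""
    return np.asarray(scores, dtype=float).max(axis=0)

def sum_fusion(scores):
    """Per class, add the scores of all channels with equal weight."""
    return np.asarray(scores, dtype=float).sum(axis=0)

def weighted1_fusion(scores, weights):
    """Category-wise weighted sum.

    ASSUMPTION: `weights` has the same shape as `scores`,
    i.e. one weight per (channel, class) pair.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if scores.shape != weights.shape:
        raise ValueError("weights must match scores in shape")
    return (weights * scores).sum(axis=0)

# Rows: scene, person, object channels; columns: two action classes.
channel_scores = np.array([[0.1, 0.7],
                           [0.5, 0.2],
                           [0.3, 0.4]])
fused_max = max_fusion(channel_scores)
fused_sum = sum_fusion(channel_scores)
fused_w1 = weighted1_fusion(channel_scores, np.ones_like(channel_scores))
```

With all weights set to one, the Weighted-1 sketch reduces to sum fusion, which makes the relationship between the two strategies easy to see.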