READING NOTE: Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

Joshua_Li_

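Category: Computer Vision

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/joshua_1988/article/details/49849531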


TITLE: Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

AUTHOR: Zhao, Shichao and Liu, Yanbin and Han, Yahong and Hong, Richang

FROM: arXiv:1511.02126

CONTRIBUTIONS

Propose an efficient video representation framework based on VGGNet and Two-Stream ConvNets.
Trajectory pooling and line pooling are used together to extract features from convolutional layers.
A frame-diff layer is used to get local descriptors.
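
As a rough illustration of the frame-diff idea, the sketch below subtracts the convolutional feature maps of two successive frames element-wise and reads one descriptor per spatial position. The function names `frame_diff` and `local_descriptors`, the nested-list layout `maps[c][y][x]`, and the sign convention (later frame minus earlier frame) are assumptions made for this note, not details taken from the paper.

```python
def frame_diff(maps_a, maps_b):
    """Element-wise difference between conv maps of two successive frames.

    maps_a[c][y][x] and maps_b[c][y][x] must have the same shape.
    The sign convention (b - a) is an assumption for illustration.
    """
    return [
        [[b - a for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(chan_a, chan_b)]
        for chan_a, chan_b in zip(maps_a, maps_b)
    ]


def local_descriptors(diff_maps):
    """Read one C-dimensional descriptor per spatial position (row-major)."""
    height, width = len(diff_maps[0]), len(diff_maps[0][0])
    return [
        [chan[y][x] for chan in diff_maps]
        for y in range(height)
        for x in range(width)
    ]
```

With C channels and an H x W map, this yields H * W descriptors of length C for each pair of frames.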

METHOD

Two successive frames are fed to a Siamese VGGNet, and a frame-diff layer is used to extract spatial features.
Compute the temporal feature of a frame using the optical-flow net of the Two-Stream ConvNet.
Extract features from the ConvNet feature maps along point trajectories or along lines, in a dense sampling manner.
Use the bag-of-features (BoF) method to generate the video representation.
Classify the video using an SVM classifier.
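
The pooling and BoF steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sum pooling, nearest-cell rounding, the `scale` parameter (a stand-in for the ratio between image and feature-map resolution), and the hard-assignment histogram are all choices made for this note, and the codebook is assumed to be given (it would normally be learned, for example with k-means).

```python
def pool_along_points(feature_maps, points, scale=1.0):
    """Sum-pool feature vectors sampled at (t, y, x) points.

    feature_maps[t][c][y][x] holds channel c of frame t's conv map.
    `scale` maps image coordinates to feature-map coordinates
    (an assumed stand-in for the network's stride).
    """
    n_channels = len(feature_maps[0])
    pooled = [0.0] * n_channels
    for t, y, x in points:
        fmap = feature_maps[t]
        height, width = len(fmap[0]), len(fmap[0][0])
        # Round to the nearest map cell and clamp to the map borders.
        row = min(max(int(round(y * scale)), 0), height - 1)
        col = min(max(int(round(x * scale)), 0), width - 1)
        for c in range(n_channels):
            pooled[c] += fmap[c][row][col]
    return pooled


def line_points(y, x, t_start, length):
    """A 'line' keeps the same spatial position across consecutive frames."""
    return [(t, y, x) for t in range(t_start, t_start + length)]


def bof_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, word)) for word in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

In this sketch, trajectory pooling and line pooling share the same routine: a trajectory is a list of tracked `(t, y, x)` points, while a line is the special case produced by `line_points`, which needs no tracking. The resulting histograms would then be passed to an SVM for classification.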

ADVANTAGES

A deeper network is used to extract features, which makes them more discriminative.
Unlike the Two-Stream ConvNet, this work extracts spatial features from every frame, which should provide more information.

DISADVANTAGES

The two branches are trained independently. Training them jointly in a multi-task manner might improve performance.

OTHERS

The difficulty of human action recognition stems from inherent characteristics of action videos, such as intra-class variation, occlusions, viewpoint changes, background noise, and differences in motion speed and actors.
Despite their good performance, Dense Trajectory based action recognition algorithms suffer from high computational cost and large disk storage requirements.
