[Deep Learning Paper Notes][Video Classification] Large-scale Video Classification with Convolutional Neural Networks

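This paper explores treating a video as a bag of short, fixed-size clips and using spatio-temporal convolutional neural networks to learn features. A multiresolution CNN combines information from the context and the center region, reducing the input dimensionality and improving performance. The results show that even a single-frame model performs strongly, suggesting that local motion cues may not be critical.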
Karpathy, Andrej, et al. “Large-scale video classification with convolutional neural networks.” Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014. (Citations: 654).


1 Spatio-Temporal CNN

We treat every video as a bag of short, fixed-size clips (15 frames in our case). Since each clip contains several frames that are contiguous in time, we can extend the connectivity of the network along the time dimension to learn spatio-temporal features. There are four ways to fuse information across the temporal domain. See Fig.


[Single-frame] Process each frame independently.
[Late Fusion] Place two separate single-frame networks with shared parameters a distance of 15 frames apart, and then merge the two streams in the first fully connected layer.
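
The difference between the two patterns above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the `tower` function is a toy stand-in (one linear layer plus ReLU) for the real single-frame CNN, and the sizes `PIXELS`, `FEATURES`, `CLASSES` and all helper names are hypothetical, not taken from the paper. Only the 15-frame gap and the idea of sharing parameters between the two towers come from the notes.

```python
import random

FRAME_GAP = 15  # distance between the two frames, as stated in the notes
PIXELS = 8      # toy flattened frame size (hypothetical)
FEATURES = 4    # toy tower output size (hypothetical)
CLASSES = 3     # toy number of classes (hypothetical)


def dense(inputs, weights):
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def tower(frame, tower_w):
    """Toy stand-in for the single-frame CNN: linear layer followed by ReLU."""
    return [max(0.0, v) for v in dense(frame, tower_w)]


def single_frame_scores(frame, tower_w, head_w):
    """Single-frame: classify one frame on its own."""
    return dense(tower(frame, tower_w), head_w)


def late_fusion_scores(video, t, tower_w, fused_w):
    """Late fusion: the SAME tower weights process frames t and t + FRAME_GAP,
    and the two feature vectors are merged in one dense layer."""
    first = tower(video[t], tower_w)
    second = tower(video[t + FRAME_GAP], tower_w)
    return dense(first + second, fused_w)  # list concatenation, then dense


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# A toy "video": FRAME_GAP + 1 frames so that frames 0 and 15 both exist.
video = [[rng.random() for _ in range(PIXELS)] for _ in range(FRAME_GAP + 1)]
tower_w = random_matrix(FEATURES, PIXELS, rng)
head_w = random_matrix(CLASSES, FEATURES, rng)
fused_w = random_matrix(CLASSES, 2 * FEATURES, rng)

single = single_frame_scores(video[0], tower_w, head_w)
fused = late_fusion_scores(video, 0, tower_w, fused_w)
print(len(single), len(fused))  # prints: 3 3
```

Neither tower sees motion on its own; in this sketch the merging layer is the first place where information from both time steps meets, which is why its weight matrix has `2 * FEATURES` input columns.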