[Deep Learning Paper Notes][Video Classification] Large-scale Video Classification with Convolutional Neural Networks

Hao_Zhang_Vision

Category: CNN Papers. Tags: Video Classification CNN Computer Vision Deep Learning Papers

Article link: https://blog.csdn.net/hao_zhang_vision/article/details/53183780

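This post discusses treating a video as a bag of short, fixed-size clips and using spatio-temporal convolutional neural networks to learn features. A multiresolution CNN combines information from a context stream and a center-region stream, which reduces the input dimensionality and improves performance. The results show that even the single-frame model performs strongly, suggesting that local motion cues may not be critical.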

Karpathy, Andrej, et al. “Large-scale video classification with convolutional neural networks.” Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014. (Citations: 654).

1 Spatio-Temporal CNN

We treat every video as a bag of short, fixed-size clips (15 frames in our case). Since each clip contains several contiguous frames, we can extend the connectivity of the network along the time dimension to learn spatio-temporal features. There are four ways to fuse information across the temporal domain. See Fig.
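As a brief aside on the clip representation described above, the following is a minimal sketch of cutting a video into fixed-size 15-frame clips. The function name `split_into_clips`, the `(T, H, W, C)` array layout, and the non-overlapping split that drops leftover frames are assumptions made for illustration; the notes do not say how clips are sampled.

```python
import numpy as np

CLIP_LEN = 15  # frames per clip, the value given in the notes


def split_into_clips(video, clip_len=CLIP_LEN):
    """Split a (T, H, W, C) frame array into non-overlapping clips.

    Returns an array of shape (num_clips, clip_len, H, W, C).
    Trailing frames that do not fill a whole clip are dropped
    (an assumption of this sketch, not something the notes specify).
    """
    num_clips = video.shape[0] // clip_len
    usable = video[: num_clips * clip_len]
    return usable.reshape(num_clips, clip_len, *video.shape[1:])


# Toy video: 100 frames of 8x8 RGB.
video = np.zeros((100, 8, 8, 3), dtype=np.float32)
clips = split_into_clips(video)
print(clips.shape)  # (6, 15, 8, 8, 3)
```

Each entry of `clips` is one contiguous block of frames, which is the unit that the fusion variants listed next operate on.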

[Single-frame] Process each single frame independently.
[Late Fusion] Place two separate single-frame networks with shared parameters a distance of 15 frames apart, and then merge the two streams in the first fully co