1. ECO: Efficient Convolutional Network for Online Video Understanding. European Conference on Computer Vision (ECCV), 2018.
By Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox
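Code: https://github.com/mzolfaghari/ECO-efficient-video-understanding
Summary: N frames are sampled uniformly from the video, and a shared 2D CNN extracts a 2D feature map from each frame. The feature maps are then stacked and passed to a 3D CNN, which produces the final classification.
Zhihu explainer article: https://zhuanlan.zhihu.com/p/36795554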
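The data flow in the summary above (sample N frames, apply one shared 2D network per frame, stack, apply a 3D network) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the choice of the middle frame of each segment, and the `net2d` / `net3d` callables are all assumptions made here, standing in for real networks.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Feature = TypeVar("Feature")
Output = TypeVar("Output")


def sample_frame_indices(num_frames: int, n: int) -> List[int]:
    """Pick n frame indices spread evenly over a video of num_frames frames.

    The video is split into n equal-length segments and the middle frame of
    each segment is taken (an assumed, deterministic choice for this sketch).
    """
    if num_frames <= 0 or n <= 0:
        raise ValueError("num_frames and n must be positive")
    segment = num_frames / n
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(n)]


def eco_style_forward(
    frames: Sequence[Frame],
    net2d: Callable[[Frame], Feature],          # hypothetical shared 2D network
    net3d: Callable[[List[Feature]], Output],   # hypothetical 3D network
    n: int,
) -> Output:
    """Sample n frames, run the shared 2D net on each, stack, run the 3D net."""
    indices = sample_frame_indices(len(frames), n)
    # The same net2d object is applied to every frame, i.e. weights are shared.
    feature_maps = [net2d(frames[i]) for i in indices]
    # The list order is the temporal axis of the stacked input to net3d.
    return net3d(feature_maps)


if __name__ == "__main__":
    # Toy stand-ins: "frames" are integers, net2d doubles them, net3d sums them.
    video = list(range(100))
    print(sample_frame_indices(len(video), 4))
    print(eco_style_forward(video, lambda f: f * 2, sum, n=4))
```

In a real model `net2d` and `net3d` would be convolutional networks and the stacking would add a time dimension to a tensor; the toy callables are only there to make the control flow runnable.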
2. Describing Videos by Exploiting Temporal Structure. ICCV 2015.
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville
Paper: https://arxiv.org/pdf/1502.08029v4.pdf
Code: https://github.com/yaoli/arctic-capgen-vid
CSDN walkthrough of the code implementation: https://blog.csdn.net/sl_950313/article/details/79144153
Summary: based on soft attention, applied along the temporal dimension (over frames).
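As a rough illustration of soft attention over time: each frame feature gets a relevance score, the scores are normalized with a softmax, and the frame features are averaged with those weights. The sketch below assumes the scores are already given; how they are computed (in the paper, from the decoder state and the frame features) is left out, and the function names are made up for this example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_soft_attention(
    frame_features: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) over the time axis.

    frame_features: one feature vector per frame, all of the same length.
    scores: one unnormalized relevance score per frame (assumed given here).
    """
    if not scores or len(frame_features) != len(scores):
        raise ValueError("need one score per frame and at least one frame")
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context vector: weighted sum of the frame features, dimension by dimension.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w, ctx = temporal_soft_attention(feats, [0.1, 0.1, 2.0])
    print(w, ctx)
```

Because the weights are a softmax output they are all positive and sum to 1, so every frame contributes to the context vector; this is what makes the attention "soft" rather than a hard selection of one frame.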
3. Video Description with Spatial-Temporal Attention.
Tu, Yunbin, et al.
Video Caption Tutorial on CSDN: https://blog.csdn.net/u013010889/article/details/80087601
Code: https://github.com/tuyunbin/Video-Description-with-Spatial-Temporal-Attention#contact
4. Non-local Neural Networks
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Paper: https://arxiv.org/pdf/1711.07971.pdf
Code: https://github.com/facebookresearch/video-nonlocal-net