Video Understanding Paper Close-Reading Series: Index [Updating]

weixin_47341656

Last modified 2022-04-21 22:04:21


Category: Paper Reading Notes. Tags: computer vision, object detection, deep learning, visual inspection, neural networks

First published 2022-04-12 10:21:48

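Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.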

Article link: https://blog.csdn.net/weixin_47341656/article/details/124117561


Contents

0. Introduction

1. ConvNet+LSTM

2. Two-Stream Convolutional Networks

2.1 Two-stream network skim reading

3. 3D ConvNets

3.1 C3D skim reading

4. Temporal Segment Networks

4.1 TSN skim reading

5. Two-Stream Inflated 3D ConvNets

5.1 I3D skim reading

6. Temporal Shift Module

6.1 TSM skim reading

7. SlowFast Networks

7.1 SlowFast skim reading

8. VTN (Video Transformer Network)

9. ViViT: A Video Vision Transformer

10. TimeSformer

10.1 TimeSformer skim reading

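0. Introduction

This series covers deep-learning-based video understanding models. Traditional hand-crafted feature models are discussed where they become relevant; such methods mostly appear in papers published before 2014, and knowing them matters for studying this field in depth. The series focuses on the most influential deep network models for video understanding, and each model's paper is covered in four parts: skim reading, close reading, summary, and verification. A 2020 survey, A Comprehensive Study of Deep Video Action Recognition (December 2020), is a useful reference. Since 2021 the main trend has been to bring transformers into these networks; for video transformers, see the survey Video Transformers: A Survey (January 2022).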

A Comprehensive Study of Deep Video Action Recognition, paper download:

https://arxiv.org/pdf/2012.06567.pdf

Video Transformers: A Survey, paper download:

https://arxiv.org/pdf/2201.05991.pdf

1. ConvNet+LSTM

ConvNet+LSTM paper download:

Long-term Recurrent Convolutional Networks for Visual Recognition and Description: https://openaccess.thecvf.com/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf

To be continued...

2. Two-Stream Convolutional Networks

Two-stream network paper download:

Two-Stream Convolutional Networks for Action Recognition in Videos: https://proceedings.neurips.cc/paper/2014/file/00ec53c4682d36f5c4359f4ae7bd7ba1-Paper.pdf

2.1 Two-stream network skim reading

https://blog.csdn.net/weixin_47341656/article/details/124117023

To be continued...

3. 3D ConvNets

C3D paper download:

https://openaccess.thecvf.com/content_iccv_2015/papers/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.pdf

3.1 C3D skim reading

https://blog.csdn.net/weixin_47341656/article/details/124152947

To be continued...

4. Temporal Segment Networks

TSN paper download:

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition: https://arxiv.org/pdf/1608.00859.pdf

4.1 TSN skim reading

TSN skim reading [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition]: https://blog.csdn.net/weixin_47341656/article/details/124278722

5. Two-Stream Inflated 3D ConvNets

I3D paper download:

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset: https://openaccess.thecvf.com/content_cvpr_2017/papers/Carreira_Quo_Vadis_Action_CVPR_2017_paper.pdf

5.1 I3D skim reading

https://blog.csdn.net/weixin_47341656/article/details/124152968

To be continued...

6. Temporal Shift Module

TSM paper download:

TSM: Temporal Shift Module for Efficient Video Understanding: https://openaccess.thecvf.com/content_ICCV_2019/papers/Lin_TSM_Temporal_Shift_Module_for_Efficient_Video_Understanding_ICCV_2019_paper.pdf

6.1 TSM skim reading

TSM skim reading [TSM: Temporal Shift Module for Efficient Video Understanding]: https://blog.csdn.net/weixin_47341656/article/details/124279298

7. SlowFast Networks

SlowFast paper download:

SlowFast Networks for Video Recognition: https://openaccess.thecvf.com/content_ICCV_2019/papers/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.pdf

7.1 SlowFast skim reading

SlowFast skim reading [SlowFast Networks for Video Recognition]: https://blog.csdn.net/weixin_47341656/article/details/124286797

8. VTN (Video Transformer Network)

VTN (Video Transformer Network) paper download:

https://arxiv.org/pdf/2102.00719.pdf

9. ViViT: A Video Vision Transformer

ViViT paper download:

https://arxiv.org/pdf/2103.15691.pdf

10. TimeSformer

TimeSformer paper download:

Is Space-Time Attention All You Need for Video Understanding?: https://arxiv.org/pdf/2102.05095.pdf

10.1 TimeSformer skim reading

TimeSformer skim reading [Is Space-Time Attention All You Need for Video Understanding?]: https://blog.csdn.net/weixin_47341656/article/details/124253776

A skim reading covers four parts of the paper: the title, the abstract, the conclusion, and the figures and tables. It should answer three questions: what method is used, what problem it solves, and what result it achieves.

To be continued...

Multiscale Vision Transformers
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Multiview Transformers for Video Recognition

More topics to follow...
