VTN泛读【Video Transformer Network】

目录

0、前沿

1、标题

2、摘要

3、结论

4、重要图表

5、解决了什么问题

6、采用了什么方法

7、达到了什么效果


0、前沿

        泛读我们主要读文章标题,摘要、结论和图表数据四个部分。需要回答用什么方法,解决什么问题,达到什么效果这三个问题。 需要了解更多视频理解相关文章可以关注视频理解系列目录了解当前更新情况。

        Video Transformer Network论文下载: https://arxiv.org/pdf/2102.00719.pdf

1、标题

         Video Transformer Network

        视频Transformer

2、摘要

        This paper presents VTN, a transformer-based framework for video recognition. Inspired by recent developments in vision transformers, we ditch the standard approach in video action recognition that relies on 3D ConvNets and introduce a method that classifies actions by attending to the entire video sequence information. Our approach is generic and builds on top of any given 2D spatial network. In terms of wall runtime, it trains 16.1× faster and runs 5.1× faster during inference while maintaining competitive accuracy compared to other state-of-the-art methods. It enables whole video analysis, via a single end-to-end pass, while requiring 1.5× fewer GFLOPs. We report competitive results on Kinetics-400 and Moments in Time benchmarks and present an ablation study of VTN properties and the trade-off between accuracy and inference speed. We hope our approach will serve as a new baseline and start a fresh line of research in the video recognition domain. Code and models are available at: https://github.com/bomri/SlowFast/blob/master/projects/vt

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值