video transformer
Cherry_qy
[Video Transformer] Video Swin Transformer
Code: GitHub - SwinTransformer/Video-Swin-Transformer: This is an official implementation for "Video Swin Transformers".
Paper: https://arxiv.org/pdf/2106.13230.pdf
Code walkthrough: "Swin-Transformer Code Walkthrough: Video Swin-Transformer" (ly59782's blog, CSDN)
Swin Transformerht...
Original post · 2022-01-18 20:13:23 · 8023 views · 4 comments
[Video Transformer] VTN: Video Transformer Network
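https://arxiv.org/abs/2102.00719
SlowFast/README.md at master · bomri/SlowFast · GitHub
ICCV2021
Video action recognition
Summary: VTN is roughly equivalent to replacing the LSTM in a CNN+LSTM architecture with VTN's attention module. It is suited to long videos, and at inference time the whole video can be fed in at once. The framework is modular: the 2D backbone can be swapped for a different network, and the attention module can be set to a different transformer model...
Original post · 2022-01-18 20:05:49 · 1269 views · 0 comments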
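The modular design described in the VTN summary (a per-frame 2D backbone followed by an interchangeable attention module over the frame sequence) can be illustrated with a minimal NumPy sketch. This is not the VTN implementation: `mean_pool_backbone`, `self_attention` and `video_features` are hypothetical names, the backbone is a trivial pooling stand-in, the attention uses identity projections, and the zero-initialised classification token is an assumption made for illustration.

```python
import numpy as np


def mean_pool_backbone(frame):
    """Hypothetical stand-in for a 2D backbone: (H, W, C) frame -> (C,) feature."""
    return frame.mean(axis=(0, 1))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    Stands in for whichever transformer is plugged in as the temporal module.
    tokens: (N, D) -> (N, D)
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def video_features(video, backbone=mean_pool_backbone, temporal=self_attention):
    """Encode every frame with `backbone`, then mix frames with `temporal`.

    video: (T, H, W, C). Returns the output at the classification token, (D,).
    """
    frame_feats = np.stack([backbone(frame) for frame in video])  # (T, D)
    cls = np.zeros((1, frame_feats.shape[1]))  # assumed classification token
    tokens = np.concatenate([cls, frame_feats], axis=0)  # (T + 1, D)
    return temporal(tokens)[0]


rng = np.random.default_rng(0)
video = rng.random((8, 16, 16, 3))  # 8 frames of 16x16 RGB
clip_feature = video_features(video)

# Swapping the backbone only requires passing a different callable.
max_pool_feature = video_features(video, backbone=lambda f: f.max(axis=(0, 1)))
```

Because both components are passed in as callables, replacing the backbone or the attention module does not touch the rest of the pipeline, which is the modularity the summary refers to.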
[Video Transformer] TimeSformer: Is Space-Time Attention All You Need for Video Understanding?
Paper: https://arxiv.org/pdf/2102.05095.pdf
Code: https://github.com/lucidrains/TimeSformer-pytorch
Reference blog: https://mp.weixin.qq.com/s/E43AaQEcr2_Nm4FqcXXM7g
Accepted at: ICML2021
Author: Facebook AI
Input clips: H×W×3×F, i.e. F RGB frames of size H×W sampled from the original video.
Decomposi...
Original post · 2022-01-18 18:25:05 · 1553 views · 0 comments
[Video Transformer] ViViT: A Video Vision Transformer
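ICCV2021
Paper: https://arxiv.org/abs/2103.15691
Code: scenic/scenic/projects/vivit at main · google-research/scenic · GitHub
Reference blog: "A First Look at Video Transformers (2): Google Open-Sources ViViT, a More Comprehensive and Efficient Convolution-Free Video Classification Model" (qq.com)
1 Introduction
ViViT performs video classification with a pure Transformer architecture; it is the application of ViT to video input...
Original post · 2022-01-17 14:46:56 · 2032 views · 0 comments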
[Video Transformer] UniFormer:Unified Transformer for Efficient Spatial-Temporal Representation Lear
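UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning
Paper: https://openreview.net/pdf?id=nBU_u6DLvoK
ICLR 2022
1 Introduction
Learning multi-scale spatiotemporal semantics from high-dimensional video is difficult because the global dependencies between video frames are complex. In Figure 1, TimeSformer learns video information in its shallow layers, but both its spatial and temporal attention are highly redundant. Spatial attention...
Original post · 2022-01-14 19:03:38 · 1539 views · 0 comments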
[Video Transformer] X-ViT: Space-time Mixing Attention for Video Transformer
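Paper: https://arxiv.org/abs/2106.05968
Code: Home | Adrian Bulat; GitHub - 1adrianb/video-transformers
Blog: "X-ViT: a video Transformer based on space-time mixing attention that greatly reduces computational complexity" - Zhihu
Samsung
Video recognition
1 Introduction
When transformers are used for video recognition, the additional modeling of the temporal dimension significantly increases the computational cost. The complexity of the model proposed in this paper, with respect to the video..
Original post · 2022-01-14 15:13:44 · 1036 views · 0 comments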