Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Motivation:

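Why pick this paper? Because the results are stunning: its variants top the leaderboards on all kinds of datasets. For semantic segmentation, classification, and object detection alike, it shows up in the top 10.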
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
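【CC】This picks up the open question left by the ViT paper: can a Transformer serve as a backbone for CV? ViT only tried classification and left detection and semantic segmentation open. This paper answers directly: Swin Transformer can.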

Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows.
【CC】Transformers face two difficulties in CV: visual content comes at widely varying scales, and images are high-resolution (too much data to process). The shifted window looks a lot like a conv block; no wonder people call it a Transformer in CNN clothing.
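The quote above names shifted windows without describing the mechanism. As a rough sketch of the idea, assuming a square token grid that is cut into non-overlapping windows and then re-cut after a cyclic roll of half a window, the hypothetical helper `window_id` below reports which window a token falls into. The grid size, window size, and shift are illustrative values, and the sketch ignores the attention masking a real implementation needs for tokens that wrap around the border.

```python
def window_id(row, col, window, shift=0, size=8):
    """Window index of token (row, col) on a size x size token grid.

    shift > 0 models the shifted partition: the grid is cyclically rolled
    by `shift` tokens before it is cut into window x window blocks.
    (Hypothetical helper for illustration only.)
    """
    r = (row - shift) % size  # position of the token after the roll
    c = (col - shift) % size
    windows_per_row = size // window
    return (r // window) * windows_per_row + (c // window)


# Two neighbouring tokens that sit on opposite sides of a window border.
a, b = (3, 3), (4, 4)
print("regular:", window_id(*a, window=4), window_id(*b, window=4))
print("shifted:", window_id(*a, window=4, shift=2), window_id(*b, window=4, shift=2))
```

In the regular partition the two tokens land in different windows and cannot attend to each other; after the shift they share a window. Alternating the two partitions across layers is what lets information cross window borders, which is also why the comparison to a conv block comes up.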

Approach

Designed for sequence modeling and transduction tasks, the Transformer is notable for its use of attention to model long-range dependencies in the data.
【CC】The Transformer was originally designed to capture long-range dependencies between elements in sequence models.

Visual elements can vary substantially in scale, a problem that receives attention in tasks such as object detection. In existing Transformer-based models, tokens are all of a fixed scale, a property unsuitable for these vision applications.
【CC】In object detection, object sizes vary enormously. That is quite different from NLP, where a token is simply a word or a character. So processing images with fixed-scale tokens, as existing models do, is a poor fit. That is exactly ViT!

There exist many vision tasks such as semantic segmentation that require dense prediction at the pixel level, and this would be intractable for Transformer on high-resolution images, as the computational complexity of its self-attention is quadratic to image size.
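【CC】For tasks like semantic segmentation, pixel-level prediction is too expensive for a Transformer: the cost of self-attention grows with the square of the image size, so it is infeasible.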
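To make the quadratic growth concrete, here is a small counting sketch. It assumes a ViT-style setup where the image is cut into square patches and every patch token attends to every other token; the function name and the patch size of 16 are illustrative assumptions, not something stated in this post.

```python
def global_attention_pairs(height, width, patch=16):
    """Count query-key pairs when every patch token attends to every token."""
    tokens = (height // patch) * (width // patch)
    return tokens, tokens * tokens


for side in (224, 448, 896):
    tokens, pairs = global_attention_pairs(side, side)
    print(f"{side}x{side}: {tokens} tokens, {pairs} attention pairs")
```

Doubling the image side multiplies the token count by 4 and the number of attention pairs by 16, which is why dense prediction on high-resolution inputs gets out of hand.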

To overcome these issues, we propose a general purpose Transformer backbone, called Swin Transformer, which constructs hierarchical feature maps and has linear computational complexity to image size.
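A quick way to see where the linear complexity comes from: if attention is restricted to fixed-size, non-overlapping windows, each window costs a constant amount and only the number of windows grows with the image. The sketch below counts query-key pairs under that assumption. The function name is hypothetical, and the patch size of 4 and window size of 7 are taken as the paper's default settings rather than from this post.

```python
def windowed_attention_pairs(height, width, patch=4, window=7):
    """Count query-key pairs when attention stays inside non-overlapping windows."""
    rows, cols = height // patch, width // patch
    assert rows % window == 0 and cols % window == 0, "token grid must tile evenly"
    num_windows = (rows // window) * (cols // window)
    tokens_per_window = window * window
    # Each window does full attention among its own tokens only.
    return rows * cols, num_windows * tokens_per_window ** 2


for side in (224, 448, 896):
    tokens, pairs = windowed_attention_pairs(side, side)
    print(f"{side}x{side}: {tokens} tokens, {pairs} attention pairs")
```

Here doubling the image side multiplies both the token count and the pair count by 4, so the cost scales with the number of tokens rather than with its square.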
