Swin Transformer
Pain point: scale variation – proposes a hierarchical architecture
Change to attention: window-based (shifted-window) attention reduces the computational complexity of self-attention; shifting the windows between layers adds cross-window connections, approximating global attention
Difference from ViT: ViT downsamples by a fixed 16×, so its features have a single scale; Swin Transformer produces multi-scale features
The locality intuition: computing attention within a small window is mostly sufficient, and full global self-attention is wasteful (based on the assumption that objects with similar attributes tend to be spatially close to each other)
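A minimal, single-head PyTorch sketch of this idea: attention is computed inside each M×M window, with an optional cyclic shift so the next layer's windows mix tokens across window boundaries. The QKV projections, multi-head split, relative position bias, and the attention mask that the real Swin applies to shifted windows are all omitted for brevity; shapes and names are assumptions, not the official implementation.

```python
import torch
import torch.nn.functional as F

def window_attention(x, window_size=7, shift=0):
    """x: (B, H, W, C) with H, W divisible by window_size. Returns same shape."""
    B, H, W, C = x.shape
    M = window_size
    if shift > 0:
        # Cyclic shift: lets windows in this layer span the previous layer's
        # window boundaries (the "S" in Swin). The real model also masks
        # attention between wrapped-around tokens; omitted here.
        x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    # Partition into non-overlapping M x M windows -> (B * num_windows, M*M, C)
    windows = (x.reshape(B, H // M, M, W // M, M, C)
                 .permute(0, 1, 3, 2, 4, 5)
                 .reshape(-1, M * M, C))
    # Plain single-head attention inside each window (Q = K = V = window tokens),
    # so cost scales with (M*M)^2 per window instead of (H*W)^2 for the whole map.
    attn = F.softmax(windows @ windows.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = attn @ windows
    # Reverse the window partition back to (B, H, W, C)
    out = (out.reshape(B, H // M, W // M, M, M, C)
              .permute(0, 1, 3, 2, 4, 5)
              .reshape(B, H, W, C))
    if shift > 0:
        out = torch.roll(out, shifts=(shift, shift), dims=(1, 2))
    return out

# e.g. window_attention(torch.randn(1, 56, 56, 96), window_size=7, shift=3)
```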
Patch Partition: splits the image into non-overlapping patches (blocks)
Patch Merging: plays the role of pooling in a CNN (it concatenates each 2×2 group of neighboring patches and linearly projects them), providing multi-scale features
The feature map goes from H × W × C to H/2 × W/2 × 2C
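A rough PyTorch sketch of this step under the (B, H, W, C) layout assumed above: 2×2 neighboring patches are concatenated along the channel dimension (C → 4C) and then projected down to 2C (the LayerNorm used in the real implementation is left out).

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Linear projection from the concatenated 4C channels down to 2C
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):
        """x: (B, H, W, C) with H and W even -> (B, H/2, W/2, 2C)."""
        x0 = x[:, 0::2, 0::2, :]   # top-left patch of every 2x2 group
        x1 = x[:, 1::2, 0::2, :]   # bottom-left
        x2 = x[:, 0::2, 1::2, :]   # top-right
        x3 = x[:, 1::2, 1::2, :]   # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(x)                 # (B, H/2, W/2, 2C)
```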
Overall backbone:
A recap of Attention Is All You Need: moving from CNNs and RNNs to the Transformer
CNN: the Transformer borrows the multi-channel idea from CNNs to extract features with different characteristics (multi-head attention serves this role); in addition, a CNN's pyramid structure can bring together information that is spatially far apart
RNN: RNNs parallelize poorly, and their sequential execution is inefficient
Overall, the Transformer borrows their strengths and addresses their weaknesses
SwinIR: Swin Transformer for Image Restoration
Related Work (IR methods)
- Traditional model-based methods
- CNN-based methods (SRCNN…): a flurry of CNN-based models has been proposed to improve representation ability with more elaborate architecture designs, such as residual blocks, dense blocks, and others. Some of them exploit attention mechanisms inside the CNN framework, such as channel attention, non-local attention, and adaptive patch aggregation (a minimal channel-attention sketch follows this list).
- Vision Transformer-based methods
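As referenced in the CNN-based item above, a minimal squeeze-and-excitation-style channel-attention block could look like the following; the class name, channel count, and reduction factor are illustrative assumptions rather than the design of any specific paper.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average per channel
        self.fc = nn.Sequential(                 # excitation: learn one weight per channel
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Rescale each channel of x by its learned attention weight
        return x * self.fc(self.pool(x))
```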
SwinIR Model
SwinIR consists of three modules:
shallow feature extraction, deep feature extraction and high-quality (HQ) image reconstruction modules.
1. Shallow feature extraction
Shallow features: we use a 3×3 convolutional layer HSF(·) to extract the shallow feature F0 as F0 = HSF(ILQ), where ILQ is the low-quality input image
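A minimal PyTorch sketch of this step; the channel width `embed_dim` and the variable names are assumptions for illustration, not values from the paper.

```python
import torch
import torch.nn as nn

embed_dim = 96  # assumed feature width C for the shallow feature (placeholder value)

# HSF: a single 3x3 convolution with padding 1 keeps the spatial size unchanged
# and lifts the 3-channel RGB input to embed_dim feature channels.
H_SF = nn.Conv2d(in_channels=3, out_channels=embed_dim, kernel_size=3, padding=1)

I_LQ = torch.randn(1, 3, 64, 64)   # dummy low-quality input (B, 3, H, W)
F_0 = H_SF(I_LQ)                   # shallow feature F0: (1, embed_dim, 64, 64)
```

The spatial resolution is preserved, so the shallow feature can later be combined with the deep feature before reconstruction.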