【PSA】《Polarized Self-Attention: Towards High-quality Pixel-wise Regression》

[figure]

arXiv-2021



1 Background and Motivation

Origin of the paper's name (quoted from *Polarized Self-Attention: Towards High-quality Pixel-wise Regression*):

In photography, there are always random lights in transverse directions that produce glares/reflections. Polarized filtering, by only allowing light to pass orthogonal to the transverse direction, can potentially improve the contrast of the photo. Due to the loss of total intensity, the light after filtering usually has a small dynamic range, and therefore needs an additional boost, e.g. by High Dynamic Range (HDR), to recover the details of the original scene.

[figures]

To prevent glare/reflections, a polarizing filter is applied; the total intensity drops, so an additional boost (e.g. HDR) is needed to recover the details of the original scene.

This is very much like attention.

When the spatial and channel dimensions are modeled jointly without dimension reduction, computation and GPU memory explode. To address this, the authors adopt a polarized filtering mechanism in PSA.

(1) Filtering: completely collapse the features along one dimension (e.g. the channel dimension) while keeping high resolution along the orthogonal dimension (e.g. the spatial dimension).

(2) High Dynamic Range (HDR): first apply Softmax on the smallest tensor in the attention block to enlarge the dynamic range of the attention, then use Sigmoid for a dynamic mapping.
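As a rough illustration of the filtering + HDR idea (my own toy sketch, not code from the paper), the snippet below collapses the channel-only query to the smallest tensor, boosts it with Softmax, aggregates, and then maps the result through Sigmoid:

import torch

# toy channel-only pipeline on x of shape (B, C, H, W)
x = torch.randn(2, 8, 4, 4)
B, C, H, W = x.shape

q = x.mean(dim=1).reshape(B, H * W, 1)      # smallest tensor in the branch: (B, HW, 1)
q = torch.softmax(q, dim=1)                  # Softmax enlarges the dynamic range
v = x.reshape(B, C, H * W)                   # (B, C, HW), spatial dimension kept at full resolution
attn = torch.sigmoid(torch.matmul(v, q))     # (B, C, 1): Sigmoid does the dynamic mapping
out = x * attn.reshape(B, C, 1, 1)           # re-weight the input channel-wise
print(out.shape)                             # torch.Size([2, 8, 4, 4])

In the real PSA block the query/value projections are 1×1 convolutions rather than a plain mean, as shown in the code later in this post.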


Deep learning has moved from coarse-grained tasks (classification / detection) to fine-grained computer vision tasks (keypoints / segmentation).

The pixel-wise regression problem has higher complexity, on the order of the number of output elements. Its main difficulties are:

  • Keeping high internal resolution at a reasonable cost
  • Fitting output distribution such as that of the key-point heatmaps or segmentation masks

Facing the above difficulties, the authors take a plug-and-play approach to improving the accuracy of pixel-wise regression and propose Polarized Self-Attention.

In the attention design, the channel-only branch preserves as much channel information as possible, and the spatial-only branch preserves as much spatial information as possible.

At the output, Softmax (matching Gaussian-like distributions) is composed with Sigmoid (matching binomial-like distributions).

[figure]

2 Related Work

  • Pixel-wise Regression Tasks
    • keypoint estimation(heatmaps)
    • semantic segmentation
  • Self-attention and its Variants
  • Full-tensor and simplified attention blocks
    optimizations of the Non-local block

3 Advantages / Contributions

Borrowing the idea of polarized filtering from optics, the paper proposes Polarized Self-Attention, which brings clear gains on public datasets for keypoint estimation and semantic segmentation.

4 Method

2D Gaussian distribution (keypoint heatmaps)

2D Binomial distribution (binary segmentation masks)
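For reference, a standard way of writing these regression targets (my own formulation, not quoted from the paper): a keypoint heatmap centered at $(x_k, y_k)$ is

$$H_k(x, y) = \exp\!\left(-\frac{(x - x_k)^2 + (y - y_k)^2}{2\sigma^2}\right),$$

and a binary segmentation mask treats each pixel as a Bernoulli/binomial variable with foreground probability $p(x, y)$:

$$P\big(M(x, y) = 1\big) = p(x, y), \qquad M(x, y) \in \{0, 1\}.$$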

Fuse a Softmax-Sigmoid composition in both the channel-only and spatial-only attention branches.
[figure]

The channel-only attention produces a $C\times1\times1$ map.

The $C\times1\times1$ map can also be obtained as $(C\times HW) \times (HW\times1\times1)$.

The spatial-only attention produces a $1\times H\times W$ map.

The $1\times H\times W$ map can also be obtained from $(1\times C) \times (C\times HW)$, followed by a reshape.
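A minimal shape check of these two matrix products (toy tensors of my own choosing, with plain means standing in for the 1×1 convolutions of the actual block):

import torch

B, C, H, W = 1, 16, 8, 8
x = torch.randn(B, C, H, W)

# channel branch: (C x HW) @ (HW x 1) -> (C x 1), reshaped to C x 1 x 1
v_ch = x.reshape(B, C, H * W)                 # (B, C, HW)
q_ch = x.mean(dim=1).reshape(B, H * W, 1)     # (B, HW, 1)
ch_map = torch.matmul(v_ch, q_ch).reshape(B, C, 1, 1)
print(ch_map.shape)                           # torch.Size([1, 16, 1, 1])

# spatial branch: (1 x C) @ (C x HW) -> (1 x HW), reshaped to 1 x H x W
q_sp = x.mean(dim=(2, 3)).reshape(B, 1, C)    # (B, 1, C)
v_sp = x.reshape(B, C, H * W)                 # (B, C, HW)
sp_map = torch.matmul(q_sp, v_sp).reshape(B, 1, H, W)
print(sp_map.shape)                           # torch.Size([1, 1, 8, 8])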

Compared with CBAM (【Attention】《CBAM: Convolutional Block Attention Module》):

the channel-only branch exploits more spatial information (not just global pooling), and the spatial-only branch makes fuller use of channel information (not just a mean).

Let's look at the formulas.

(1) Channel-only attention

[figures]
$\sigma$ denotes a reshape operation, and $F_{SM}$ denotes Softmax.
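Putting the pieces together, the channel-only branch can be transcribed as follows (a hedged reconstruction from the notation above and the code below; see the paper for the exact equation):

$$A^{ch}(X) = F_{SG}\Big[ W_{z}\big( \sigma_1(W_v(X)) \times F_{SM}\big(\sigma_2(W_q(X))\big) \big) \Big], \qquad Z^{ch} = A^{ch}(X) \odot^{ch} X$$

where $W_v$, $W_q$, $W_z$ are $1\times1$ convolutions, $F_{SG}$ is Sigmoid, and $\odot^{ch}$ is channel-wise multiplication (the implementation also applies a LayerNorm before the Sigmoid).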

(2) Spatial-only attention

[figures]
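The spatial-only branch can be transcribed similarly (again a hedged reconstruction based on the code below):

$$A^{sp}(X) = F_{SG}\Big[ \sigma_3\big( F_{SM}\big(\sigma_1(F_{GP}(W_q(X)))\big) \times \sigma_2(W_v(X)) \big) \Big], \qquad Z^{sp} = A^{sp}(X) \odot^{sp} X$$

where $F_{GP}$ is global average pooling and $\odot^{sp}$ is spatial-wise multiplication.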
Channel and spatial attention composed in parallel:

[figure]

Channel and spatial attention composed sequentially:

[figure]
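In summary form (my own notation for the two layouts):

$$\mathrm{PSA}_p(X) = Z^{ch} + Z^{sp}, \qquad \mathrm{PSA}_s(X) = Z^{sp}\big(Z^{ch}(X)\big)$$

i.e. the parallel layout feeds the same input $X$ to both branches and sums their outputs, while the sequential layout lets the spatial branch operate on the channel-reweighted features.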

Relation of PSA to other Self-Attentions

  • Internal Resolution vs Complexity: a higher-resolution squeeze-and-excitation (contrast with the SE sketch below)
  • Output Distribution / Non-linearity: both the PSA channel-only and spatial-only branches use a Softmax-Sigmoid composition
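For contrast, a minimal squeeze-and-excitation block (my own sketch of SE for comparison, not code from this paper) squeezes the feature map down to $C\times1\times1$ by global pooling and reduces channels by a factor $r$, whereas PSA keeps a $C/2 \times HW$ internal tensor:

import torch
from torch import nn

class SEBlock(nn.Module):
    # Minimal squeeze-and-excitation block, shown only to contrast its internal resolution with PSA's.
    def __init__(self, channel, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),      # channel reduction by r
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                       # excitation: channel-wise re-weighting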

Let's look at the code.

import torch
from torch import nn

# Parallel PSA: the channel-only and spatial-only branches both read the input x, and their outputs are summed.
class ParallelPolarizedSelfAttention(nn.Module):
    def __init__(self, channel=512):
        super().__init__()
        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))
        self.softmax_channel=nn.Softmax(1)
        self.softmax_spatial=nn.Softmax(-1)
        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))
        self.ln=nn.LayerNorm(channel)
        self.sigmoid=nn.Sigmoid()
        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.agp=nn.AdaptiveAvgPool2d((1,1))

    def forward(self, x):
        b, c, h, w = x.size()

        #Channel-only Self-Attention
        channel_wv=self.ch_wv(x) #bs,c//2,h,w
        channel_wq=self.ch_wq(x) #bs,1,h,w
        channel_wv=channel_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        channel_wq=channel_wq.reshape(b,-1,1) #bs,h*w,1
        channel_wq=self.softmax_channel(channel_wq)
        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1
        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1
        channel_out=channel_weight*x

        #Spatial-only Self-Attention
        spatial_wv=self.sp_wv(x) #bs,c//2,h,w
        spatial_wq=self.sp_wq(x) #bs,c//2,h,w
        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1
        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2
        spatial_wq=self.softmax_spatial(spatial_wq) #bs,1,c//2
        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w
        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w
        spatial_out=spatial_weight*x
        out=spatial_out+channel_out
        return out

# Sequential PSA: the spatial-only branch operates on the output of the channel-only branch.
class SequentialPolarizedSelfAttention(nn.Module):
    def __init__(self, channel=512):
        super().__init__()
        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))
        self.softmax_channel=nn.Softmax(1)
        self.softmax_spatial=nn.Softmax(-1)
        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))
        self.ln=nn.LayerNorm(channel)
        self.sigmoid=nn.Sigmoid()
        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))
        self.agp=nn.AdaptiveAvgPool2d((1,1))

    def forward(self, x):
        b, c, h, w = x.size()

        #Channel-only Self-Attention
        channel_wv=self.ch_wv(x) #bs,c//2,h,w
        channel_wq=self.ch_wq(x) #bs,1,h,w
        channel_wv=channel_wv.reshape(b,c//2,-1) # bs,c//2,h*w
        channel_wq=channel_wq.reshape(b,-1,1) # bs,h*w,1
        channel_wq=self.softmax_channel(channel_wq) # bs,h*w,1
        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1
        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1
        channel_out=channel_weight*x

        #Spatial-only Self-Attention
        spatial_wv=self.sp_wv(channel_out) #bs,c//2,h,w
        spatial_wq=self.sp_wq(channel_out) #bs,c//2,h,w
        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1
        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w
        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2
        spatial_wq=self.softmax_spatial(spatial_wq)
        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w
        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w
        spatial_out=spatial_weight*channel_out
        return spatial_out

if __name__ == '__main__':
    input=torch.randn(1,512,7,7)
    psa = SequentialPolarizedSelfAttention(channel=512)
    output=psa(input)
    print(output.shape)

The code is fairly straightforward.
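For completeness, the parallel variant can be sanity-checked the same way (same assumed 1×512×7×7 input); PSA preserves the input shape in both cases:

input = torch.randn(1, 512, 7, 7)
psa_p = ParallelPolarizedSelfAttention(channel=512)
print(psa_p(input).shape)  # torch.Size([1, 512, 7, 7])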

5 Experiments

We add PSAs after the first 3 × 3 convolution in every residual block.
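A hedged sketch of what this placement could look like in a ResNet-style basic block, reusing the SequentialPolarizedSelfAttention class defined above (the block structure here is my assumption for illustration, not the authors' exact code):

from torch import nn

class BasicBlockWithPSA(nn.Module):
    # Residual block with PSA inserted after the first 3x3 convolution (illustrative only).
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.psa = SequentialPolarizedSelfAttention(channel=channels)   # PSA after the first 3x3 conv
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.psa(out)                  # re-weight the intermediate features with PSA
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # residual connection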

[figure]

5.1 Datasets and Metrics

  • MS-COCO 2017 human pose estimation (AP)

  • Pascal VOC 2012 semantic segmentation (mIoU)

5.2 PSA vs. Baselines

(1) Top-Down 2D Human Pose Estimation

[figure]

The output heatmap size is 96 × 72 (1/4 of the input resolution).

(2) Semantic Segmentation

[figure]

The improvement is not as pronounced as on keypoint estimation.

5.3 Semantic Segmentation

[figure]

Parallel (p) and sequential (s) compositions of the channel and spatial attention show no major difference, only marginal metric differences.

[figure]

All configurations show gains.

5.4 Ablation Study

[figure]

Using both channel and spatial attention outperforms using either alone; sequential and parallel compositions perform similarly.

[figure]

6 Conclusion (own)

For more paper notes, see 【Paper Reading】.

  • Channel-only attention blocks put the same weights on different spatial locations, such that the classification task still benefits since its spatial information eventually collapses by pooling, and the anchor displacement regression in object detection benefits since the channel-only attention unanimously highlights all foreground pixels

  • How PSA performs in complex DCNN heads has not yet been evaluated by the authors.

  • The optics backstory is well told and the gains are solid, but the paper does not keep reinforcing this theme throughout; the argument feels diffuse rather than concentrated into one precise strike.
