Following the previous post on DANet (Dual Attention Network for Scene Segmentation, CVPR 2019), this post covers DRANet, which can be seen as an upgraded version of DANet, or a version with lighter computational cost.
The paper states DANet's problem as follows: although it adds no model parameters, computing the affinity between every pair of pixels and every pair of channels increases both the computation and the GPU memory usage of the model:

attention modeling brings a heavy burden on computation and memory if the number of pixels/channels is huge
Hence, the original relationship between any two pixels/channels (PAM/CAM) is replaced by the relationship between each pixel/channel and a small set of gathering centers (the CPAM/CCAM modules).
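To see why this replacement helps, compare the sizes of the affinity matrices the two schemes must materialize. The sketch below is not the authors' code; it just builds random toy tensors (assumed sizes: 64 channels, a 60×60 feature map, and 50 gathering centers from pooling at sizes 1, 2, 3, 6) and shows that the compact variant shrinks the affinity from (HW, HW) to (HW, M):

```python
import torch

# Toy sizes (assumptions, not from the paper's experiments)
C, H, W = 64, 60, 60
M = 1 * 1 + 2 * 2 + 3 * 3 + 6 * 6   # 50 gathering centers from pyramid pooling

query = torch.randn(H * W, C)        # one feature vector per pixel
keys_full = torch.randn(C, H * W)    # PAM: keys are all pixels
keys_cpam = torch.randn(C, M)        # CPAM: keys are only the gathering centers

affinity_pam = query @ keys_full     # (HW, HW) = (3600, 3600) entries
affinity_cpam = query @ keys_cpam    # (HW, M)  = (3600, 50) entries

print(affinity_pam.shape, affinity_cpam.shape)
```

The quadratic (HW, HW) matrix is what makes PAM expensive on large inputs; CPAM's cost grows only linearly in the number of pixels since M stays fixed.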
Compact position attention module
Compared with PAM, CPAM mainly adds the boxed part of the figure, whose sole purpose is to reduce computation. The method: downsample with pooling at several scales, and use the pooled results as gathering centers over pixel subsets. The paper describes it as follows (a nice lesson in how to make a simple downsampling step sound more sophisticated):
construct the relationships between each pixel and a few numbers of gathering centers. The gathering centers are formally defined as a compact feature vector by gathering feature vectors from a pixel subset in the input tensor. They are implemented by a spatial pyramid pooling scheme that provides context information from different spatial scales.
Expressed plainly, together with the code:
- The feature map $\mathbf{A} \in \mathbb{R}^{C \times H \times W}$ is downsampled by pooling with several kernel sizes. Each scale produces an $L \times L$ map, i.e. a tensor in $\mathbb{R}^{C \times L^2}$ of feature bins (the smallest cubes in the figure). All bins, each keeping its $C$ channels, are then concatenated along the spatial axis into $\mathbf{F} \in \mathbb{R}^{C \times M}$, where $M$ is the total number of bins.
The corresponding code is `encoding.nn.dran_att.CPAMEnc`:
```python
from torch.nn import AdaptiveAvgPool2d, Conv2d, Module, ReLU, Sequential

class CPAMEnc(Module):
    """CPAM encoding module: builds the gathering centers via
    spatial pyramid pooling (output sizes 1, 2, 3 and 6)."""
    def __init__(self, in_channels, norm_layer):
        super(CPAMEnc, self).__init__()
        self.pool1 = AdaptiveAvgPool2d(1)
        self.pool2 = AdaptiveAvgPool2d(2)
        self.pool3 = AdaptiveAvgPool2d(3)
        self.pool4 = AdaptiveAvgPool2d(6)

        # One 1x1 conv + norm + ReLU branch per pyramid scale
        self.conv1 = Sequential(Conv2d(in_channels, in_channels, 1, bias=False),
                                norm_layer(in_channels),
                                ReLU(True))
        self.conv2 = Sequential(Conv2d(in_channels, in_channels, 1, bias=False),
                                norm_layer(in_channels),
                                ReLU(True))
```
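The snippet above only shows the pooling layers and the first conv branches; the flatten-and-concatenate step that actually produces $\mathbf{F} \in \mathbb{R}^{C \times M}$ happens in the module's forward pass, which is not shown. The sketch below (my own, not the repository's code) reproduces just that step, assuming the four pool sizes 1, 2, 3, 6 and omitting the conv/norm/ReLU branches for clarity:

```python
import torch
import torch.nn as nn

# Assumed toy batch: B=2, C=512 channels, 60x60 spatial resolution
B, C, H, W = 2, 512, 60, 60
A = torch.randn(B, C, H, W)

# Spatial pyramid pooling at the four scales used by CPAMEnc
pools = [nn.AdaptiveAvgPool2d(s) for s in (1, 2, 3, 6)]

# Flatten each pooled map to (B, C, s*s): shapes (B,C,1), (B,C,4), (B,C,9), (B,C,36)
bins = [p(A).view(B, C, -1) for p in pools]

# Concatenate along the spatial axis: M = 1 + 4 + 9 + 36 = 50 gathering centers
F = torch.cat(bins, dim=2)

print(F.shape)  # (B, C, 50)
```

Each of the 50 columns of $\mathbf{F}$ summarizes a pixel subset at one pyramid scale, which is exactly the "compact feature vector" the paper calls a gathering center.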