【Pytorch】Squeeze-and-Excitation Networks

Paper notes and a PyTorch implementation.

Squeeze-and-Excitation Blocks, CVPR 2018

(Figure: the Squeeze-and-Excitation block)

Let $\mathbf{V}=\left[\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_C\right]$ denote the learned set of filter kernels, where $\mathbf{v}_c$ refers to the parameters of the $c$-th filter. We can then write the outputs of $\mathbf{F}_{tr}$ as $\mathbf{U}=\left[\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_C\right]$, where

$$\mathbf{u}_c=\mathbf{v}_c * \mathbf{X}=\sum_{s=1}^{C^{\prime}} \mathbf{v}_c^s * \mathbf{x}^s .$$

Here $*$ denotes convolution, $\mathbf{v}_c=\left[\mathbf{v}_c^1, \mathbf{v}_c^2, \ldots, \mathbf{v}_c^{C^{\prime}}\right]$ and $\mathbf{X}=\left[\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{C^{\prime}}\right]$ (to simplify the notation, bias terms are omitted), while $\mathbf{v}_c^s$ is a 2D spatial kernel, and therefore represents a single channel of $\mathbf{v}_c$ which acts on the corresponding channel of $\mathbf{X}$.
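The equation above can be checked numerically in PyTorch (a minimal sketch; tensor sizes and variable names are illustrative): each output channel of a multi-channel convolution equals the sum of single-channel 2D convolutions over the input channels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C_in, H, W = 3, 5, 5
x = torch.randn(1, C_in, H, W)     # X with C' = 3 input channels
v_c = torch.randn(1, C_in, 3, 3)   # one filter v_c, holding C' spatial kernels

# u_c via the full multi-channel convolution
u_c = F.conv2d(x, v_c, padding=1)

# u_c via the per-channel sum from the equation: sum_s (v_c^s * x^s)
u_c_sum = sum(
    F.conv2d(x[:, s:s + 1], v_c[:, s:s + 1], padding=1) for s in range(C_in)
)

print(torch.allclose(u_c, u_c_sum, atol=1e-5))  # → True
```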

In other words, in a standard convolution each output channel is produced by convolving every input channel and summing the results, so channel dependencies are implicitly entangled with the spatial correlations captured by the filters. The authors state their goal as:

Our goal is to ensure that the network is able to increase its sensitivity to informative features so that they can be exploited by subsequent transformations, and to suppress less useful ones. We propose to achieve this by explicitly modelling channel interdependencies to recalibrate filter responses in two steps, squeeze and excitation, before they are fed into the next transformation.

Squeeze: Global Information Embedding

Each learned filter operates over a local receptive field, so each unit of the transformation output U cannot exploit contextual information outside that region. The problem is more severe in the lower layers of the network, whose receptive fields are small. To keep the network from attending only to local information, the authors average over the spatial dimensions to obtain a global descriptor.

Formally, a statistic $\mathbf{z} \in \mathbb{R}^C$ is generated by shrinking $\mathbf{U}$ through its spatial dimensions $H \times W$, where the $c$-th element of $\mathbf{z}$ is calculated by:

$$z_c=\mathbf{F}_{sq}\left(\mathbf{u}_c\right)=\frac{1}{H \times W} \sum_{i=1}^H \sum_{j=1}^W u_c(i, j)$$
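This statistic is exactly global average pooling, as a quick check shows (tensor sizes are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
U = torch.randn(2, 8, 7, 7)  # (batch, C, H, W)

# z computed manually, following the squeeze equation
z_manual = U.sum(dim=(2, 3)) / (U.size(2) * U.size(3))

# The same statistic via PyTorch's adaptive pooling layer
z_pool = nn.AdaptiveAvgPool2d(1)(U).view(2, 8)

print(torch.allclose(z_manual, z_pool, atol=1e-6))  # → True
```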

Excitation: Adaptive Recalibration

To make use of the information aggregated by the squeeze operation, a second operation follows whose aim is to fully capture channel-wise dependencies.
Concretely, it consists of two fully connected layers. The C-dimensional vector produced by the squeeze step's global pooling is passed through a fully connected layer that reduces it to C/r dimensions, followed by a ReLU activation; a second fully connected layer maps it back to C dimensions, and a sigmoid squashes the values into (0, 1). The result is the vector of channel weights.
This is essentially channel-level attention.1

The final output of the block is obtained by rescaling the transformation output $\mathbf{U}$ with the activations:

$$\widetilde{\mathbf{x}}_c=\mathbf{F}_{\text{scale}}\left(\mathbf{u}_c, s_c\right)=s_c \cdot \mathbf{u}_c,$$

where $\widetilde{\mathbf{X}}=\left[\widetilde{\mathbf{x}}_1, \widetilde{\mathbf{x}}_2, \ldots, \widetilde{\mathbf{x}}_C\right]$ and $\mathbf{F}_{\text{scale}}\left(\mathbf{u}_c, s_c\right)$ refers to channel-wise multiplication between the feature map $\mathbf{u}_c \in \mathbb{R}^{H \times W}$ and the scalar $s_c$.
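In PyTorch this rescaling is a plain broadcast multiplication (a sketch with illustrative sizes):

```python
import torch

torch.manual_seed(0)
U = torch.randn(1, 4, 5, 5)  # transformation output (batch, C, H, W)
s = torch.rand(1, 4)         # channel activations from the excitation step

# Broadcasting s_c over each H x W feature map gives X-tilde
X_tilde = U * s.view(1, 4, 1, 1)

# Each channel is the original channel scaled by its scalar s_c
print(torch.allclose(X_tilde[0, 2], s[0, 2] * U[0, 2]))  # → True
```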

Inserting SE blocks into ResNet

(Figure: the SE-ResNet module)

Two PyTorch implementations are provided below.

  1. The first implementation1
import torch
import torch.nn as nn

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)  # squeeze: global average pooling
        y = self.fc(y).view(b, c, 1, 1)  # excitation: FC -> ReLU -> FC -> sigmoid
        return x * y.expand_as(x)
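A quick sanity check of this layer (the class is repeated here so the snippet runs on its own): the SE layer preserves the input shape and, since the sigmoid weights lie in (0, 1), only scales channel magnitudes down.

```python
import torch
import torch.nn as nn

# SELayer as defined above, repeated so this snippet is standalone
class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)

se = SELayer(channel=64, reduction=16)
x = torch.randn(2, 64, 32, 32)
out = se(x)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```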
  2. The second implementation2
import torch
import torch.nn as nn
import torch.nn.functional as F

# Squeeze and Excitation module

class SqEx(nn.Module):

    def __init__(self, n_features, reduction=16):
        super(SqEx, self).__init__()

        if n_features % reduction != 0:
            raise ValueError('n_features must be divisible by reduction (default = 16)')

        self.linear1 = nn.Linear(n_features, n_features // reduction, bias=True)
        self.nonlin1 = nn.ReLU(inplace=True)
        self.linear2 = nn.Linear(n_features // reduction, n_features, bias=True)
        self.nonlin2 = nn.Sigmoid()

    def forward(self, x):

        y = F.avg_pool2d(x, kernel_size=x.size()[2:4])
        y = y.permute(0, 2, 3, 1)
        y = self.nonlin1(self.linear1(y))
        y = self.nonlin2(self.linear2(y))
        y = y.permute(0, 3, 1, 2)
        y = x * y
        return y

# Residual block using Squeeze and Excitation

class ResBlockSqEx(nn.Module):

    def __init__(self, n_features):
        super(ResBlockSqEx, self).__init__()

        # convolutions

        self.norm1 = nn.BatchNorm2d(n_features)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(n_features, n_features, kernel_size=3, stride=1, padding=1, bias=False)

        self.norm2 = nn.BatchNorm2d(n_features)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(n_features, n_features, kernel_size=3, stride=1, padding=1, bias=False)

        # squeeze and excitation

        self.sqex  = SqEx(n_features)

    def forward(self, x):
        
        # convolutions

        y = self.conv1(self.relu1(self.norm1(x)))
        y = self.conv2(self.relu2(self.norm2(y)))

        # squeeze and excitation

        y = self.sqex(y)

        # add residuals
        
        y = torch.add(x, y)

        return y
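A shape check of the residual block (both classes are repeated so the snippet is self-contained). Note the pre-activation BN-ReLU-Conv ordering, and that the SE recalibration is applied to the residual branch before the identity shortcut is added.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# SqEx and ResBlockSqEx as defined above, repeated so this runs standalone
class SqEx(nn.Module):
    def __init__(self, n_features, reduction=16):
        super(SqEx, self).__init__()
        if n_features % reduction != 0:
            raise ValueError('n_features must be divisible by reduction')
        self.linear1 = nn.Linear(n_features, n_features // reduction, bias=True)
        self.nonlin1 = nn.ReLU(inplace=True)
        self.linear2 = nn.Linear(n_features // reduction, n_features, bias=True)
        self.nonlin2 = nn.Sigmoid()

    def forward(self, x):
        y = F.avg_pool2d(x, kernel_size=x.size()[2:4])
        y = y.permute(0, 2, 3, 1)
        y = self.nonlin1(self.linear1(y))
        y = self.nonlin2(self.linear2(y))
        y = y.permute(0, 3, 1, 2)
        return x * y

class ResBlockSqEx(nn.Module):
    def __init__(self, n_features):
        super(ResBlockSqEx, self).__init__()
        self.norm1 = nn.BatchNorm2d(n_features)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(n_features, n_features, 3, 1, 1, bias=False)
        self.norm2 = nn.BatchNorm2d(n_features)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(n_features, n_features, 3, 1, 1, bias=False)
        self.sqex = SqEx(n_features)

    def forward(self, x):
        y = self.conv1(self.relu1(self.norm1(x)))
        y = self.conv2(self.relu2(self.norm2(y)))
        y = self.sqex(y)        # recalibrate the residual branch
        return torch.add(x, y)  # identity shortcut

block = ResBlockSqEx(n_features=32)
x = torch.randn(2, 32, 16, 16)
out = block(x)
print(out.shape)  # torch.Size([2, 32, 16, 16])
```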

  1. [Attention Mechanism] Channel Attention: SENet Paper Notes ↩︎ ↩︎

  2. https://github.com/jonnedtc/Squeeze-Excitation-PyTorch/blob/master/networks.py ↩︎
