Various Network Structures


小涵涵

Category: pytorch | Tags: neural networks, artificial intelligence, convolutional neural networks

Article link: https://blog.csdn.net/qq_34929889/article/details/107512126

Attention mechanisms
- Channel attention
- Spatial attention

Attention mechanisms

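Broadly, attention mechanisms fall into two kinds: soft attention and hard attention.
They differ as follows:
Soft attention focuses on regions or channels. It is deterministic: once trained, the attention weights are produced directly by the network. Most importantly, soft attention is differentiable, so gradients can be computed through it and the attention weights can be learned with ordinary forward and backward propagation. In computer vision, soft attention is used across many tasks (e.g., classification, detection, segmentation, generative models, and video processing); typical examples are SENet and SKNet.
Hard attention focuses on points: any location in the image can in principle be selected. It is a stochastic prediction process that emphasizes dynamic change. Crucially, hard attention is non-differentiable, so it is usually trained with reinforcement learning. It is essentially a 0/1 decision about which regions are attended to and which are ignored. A long-familiar application of hard attention to images is image cropping, which keeps the h×w patch whose top-left corner is at (x, y) in image I and discards everything else: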

g = I[y:y+h, x:x+w]
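To make the soft/hard contrast concrete, here is a minimal PyTorch sketch on a hypothetical toy setup (4 candidate regions with 8-dim features; `features`, `logits`, and the sizes are illustrative assumptions, not from the original post). Soft attention takes a softmax-weighted sum over all regions and lets gradients reach the scores; hard attention samples a single region, which leaves no gradient path back to the scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup: 4 candidate regions, each an 8-dim feature vector,
# plus one learnable relevance score (logit) per region.
features = torch.randn(4, 8)
logits = torch.randn(4, requires_grad=True)

# Soft attention: deterministic softmax weights over ALL regions; the output is a
# weighted sum, so gradients flow back into the logits through ordinary backprop.
weights = torch.softmax(logits, dim=0)                 # shape [4], sums to 1
soft_out = (weights.unsqueeze(1) * features).sum(0)    # shape [8]
soft_out.sum().backward()
print("soft grad:", logits.grad)                       # non-zero entries

# Hard attention: stochastically pick ONE region, i.e. a 0/1 mask. Sampling returns
# an integer index with no gradient path back to the logits, which is why hard
# attention is usually trained with reinforcement learning rather than backprop.
probs = torch.softmax(logits.detach(), dim=0)
idx = torch.multinomial(probs, num_samples=1)              # shape [1], int64
mask = F.one_hot(idx, num_classes=4).squeeze(0).float()    # one 1.0, three 0.0
hard_out = (mask.unsqueeze(1) * features).sum(0)           # equals features[idx]
print("hard output requires grad:", hard_out.requires_grad)  # False
```

The reinforcement-learning training loop that hard attention typically needs (e.g. a REINFORCE-style estimator) is not shown; the sketch only illustrates where differentiability breaks.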

Channel attention

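Apply global max pooling and global average pooling to the feature map of each channel, producing a max-pooled channel attention vector and an average-pooled channel attention vector, each of shape [C, 1, 1].
Both vectors are then fed into a shared multilayer perceptron (MLP) with a single hidden layer. The hidden layer has shape [C/r, 1, 1], where r is the reduction ratio, and the MLP outputs two processed channel attention vectors.
Finally, the two vectors are summed element-wise and passed through a sigmoid, and the result is multiplied element-wise with the original feature map (broadcast over H and W) to obtain the new feature map. The SELayer code below is the SENet version of channel attention: it uses only global average pooling, and its MLP is written with nn.Linear layers.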
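In formula form, following the formulation in the CBAM paper with the ReLU written explicitly as δ (σ is the sigmoid, r the reduction ratio, and F the input feature map of shape [C, H, W]):

```latex
M_c(F) = \sigma\left( W_1\,\delta\left(W_0\, F^{c}_{avg}\right) + W_1\,\delta\left(W_0\, F^{c}_{max}\right) \right),
\qquad W_0 \in \mathbb{R}^{(C/r)\times C},\; W_1 \in \mathbb{R}^{C \times (C/r)}

F' = M_c(F) \otimes F
```

Here F^c_avg and F^c_max are the two [C, 1, 1] pooled vectors, the same W_0 and W_1 are used for both (the shared MLP), and ⊗ is element-wise multiplication with M_c(F) broadcast over H and W.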

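import torch
import torch.nn as nn


class SELayer(nn.Module):
    """Squeeze-and-Excitation channel attention (average pooling only)."""

    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # squeeze: [B, C, H, W] -> [B, C, 1, 1]
        # excitation: C -> C/r -> C, then sigmoid gives per-channel weights
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)  # rescale each channel by its weight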
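The SELayer above uses only average pooling. The two-branch channel attention described in the text (max pooling plus average pooling through a shared MLP, as in CBAM) can be sketched as follows. The class name `ChannelAttention` is hypothetical, and the shared MLP is written with 1×1 convolutions, which on [B, C, 1, 1] inputs are equivalent to the nn.Linear layers in SELayer and avoid the reshapes.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM-style channel attention sketch (hypothetical name); `reduction` is r."""

    def __init__(self, channel, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # [B, C, H, W] -> [B, C, 1, 1]
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # [B, C, H, W] -> [B, C, 1, 1]
        # Shared MLP with one hidden layer of size C/r, as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, kernel_size=1, bias=False),
        )

    def forward(self, x):
        # Same MLP weights for both pooled vectors; sum element-wise, apply the
        # sigmoid, then broadcast the [B, C, 1, 1] weights over H and W.
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * attn


x = torch.randn(2, 64, 8, 8)
ca = ChannelAttention(64, reduction=16)
out = ca(x)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

With C = 64 and r = 16 the shared MLP has 64·4 + 4·64 = 512 weights; because both pooling branches reuse the same MLP, adding the max-pooling branch costs no extra parameters.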

Spatial attention

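This time, max pooling and average pooling are applied along the channel axis, i.e., over the values at the same pixel position across all feature maps. This yields two spatial attention maps, which are concatenated into a tensor of shape [2, H, W].
A 7×7 convolution is then applied to this tensor, followed by a sigmoid. The result is a [1, H, W] spatial attention weight map with the same spatial size as the original feature map, which is multiplied with the feature map (broadcast over the channels).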
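In formula form, again following the formulation in the CBAM paper:

```latex
M_s(F) = \sigma\left( f^{7\times 7}\left( \left[ F^{s}_{avg};\, F^{s}_{max} \right] \right) \right),
\qquad F^{s}_{avg},\, F^{s}_{max} \in \mathbb{R}^{1 \times H \times W}

F'' = M_s(F) \otimes F
```

Here [·;·] is concatenation along the channel axis (giving [2, H, W]) and f^{7×7} is the 7×7 convolution with one output channel, so M_s(F) has shape [1, H, W] and is broadcast over the C channels. The concatenation order (max first or average first) does not matter, since the convolution learns weights for each input map.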

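import torch
import torch.nn as nn


class ChannelPool(nn.Module):
    # Max- and mean-pool along the channel axis: [B, C, H, W] -> [B, 2, H, W]
    def forward(self, x):
        return torch.cat((torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1)


class BasicConv(nn.Module):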
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True, bias=False):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=bias)
        self.bn = nn.BatchNorm2d(out_planes,eps=1e-5, momentum=0.01, affine=True) if bn else None
        self.relu = nn.ReLU() if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x

class SpatialGate(nn.Module):
    def __init__(self):
        super(SpatialGate, self).__init__()
        kernel_size = 7
        self.compress = ChannelPool()
        # 2 input maps (max, mean) -> 1 attention map; this padding keeps H and W unchanged
        self.spatial = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size-1) // 2, relu=False)

    def forward(self, x):
        x_compress = self.compress(x)      # [B, 2, H, W]
        x_out = self.spatial(x_compress)   # [B, 1, H, W]
        scale = torch.sigmoid(x_out)       # per-pixel weights; F.sigmoid is deprecated
        return x * scale                   # broadcast over the C channels
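As a sanity check on the shapes and on the padding choice in SpatialGate, here is a functional re-statement of its forward pass. It is a sketch under stated assumptions: `spatial_attention` is a hypothetical helper, the 7×7 kernel is random rather than learned, and the BatchNorm that BasicConv applies after the convolution is omitted.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)


def spatial_attention(x, weight):
    """Hypothetical functional version of SpatialGate (no BatchNorm)."""
    pooled = torch.cat([x.max(dim=1, keepdim=True)[0],   # [B, 1, H, W]
                        x.mean(dim=1, keepdim=True)],     # [B, 1, H, W]
                       dim=1)                             # [B, 2, H, W]
    k = weight.shape[-1]
    # Stride-1 conv with padding (k - 1) // 2 keeps H and W for odd k.
    logits = F.conv2d(pooled, weight, padding=(k - 1) // 2)  # [B, 1, H, W]
    scale = torch.sigmoid(logits)
    return x * scale, scale   # scale broadcasts over the C channels


x = torch.randn(2, 16, 10, 12)
weight = 0.1 * torch.randn(1, 2, 7, 7)   # [out_channels=1, in_channels=2, 7, 7]
out, scale = spatial_attention(x, weight)
print(scale.shape, out.shape)  # torch.Size([2, 1, 10, 12]) torch.Size([2, 16, 10, 12])
```

For a stride-1 convolution the output height is H + 2p - k + 1, so p = (k - 1) // 2 = 3 with k = 7 gives back H (likewise for W); this is why SpatialGate sets padding=(kernel_size-1) // 2.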