YOLOv9 Effective Accuracy Boosts | Adding SGE, GE, Global Context, GAM, and Dozens of Other Attention Mechanisms (Part 4)

Latest recommended article published on 2024-04-30 10:46:14

今天炼丹了吗


Column: YOLOv9 Accuracy Improvement Column
Tags: YOLO, pytorch, machine learning, deep learning, artificial intelligence, object detection

Article link: https://blog.csdn.net/StopAndGoyyy/article/details/136421341


Column introduction: YOLOv9 improvement series | Covers the latest deep learning innovations, with a focus on efficient accuracy gains.

1. About This Article

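This article contains only the code and a brief description of each attention module. For instructions on adding them to YOLOv9, see this article:

YOLOv9 Effective Accuracy Boosts | Adding SE, CBAM, ECA, SimAM, and Dozens of Other Attention Mechanisms (Part 1)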

SGE: "Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks"

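SGE is a lightweight neural network module that adjusts the importance of each sub-feature in a convolutional neural network to improve image recognition performance. It generates attention factors that scale the strength of each sub-feature, which effectively suppresses noise. When integrated with popular CNN backbones, SGE can significantly improve image recognition performance.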

import torch
from torch import nn
from torch.nn import init

class SpatialGroupEnhance(nn.Module):
    def __init__(self, groups=8):
        super().__init__()
        self.groups=groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight=nn.Parameter(torch.zeros(1,groups,1,1))
        self.bias=nn.Parameter(torch.zeros(1,groups,1,1))
        self.sig=nn.Sigmoid()
        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, std=0.001)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

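    def forward(self, x):
        b, c, h, w = x.shape
        # Split channels into groups (c must be divisible by self.groups)
        x = x.view(b * self.groups, -1, h, w)  # bs*g, dim//g, h, w
        # Similarity between each position and the group's globally pooled feature
        xn = x * self.avg_pool(x)  # bs*g, dim//g, h, w
        xn = xn.sum(dim=1, keepdim=True)  # bs*g, 1, h, w
        t = xn.view(b * self.groups, -1)  # bs*g, h*w

        # Normalize the similarity map over spatial positions
        t = t - t.mean(dim=1, keepdim=True)  # bs*g, h*w
        std = t.std(dim=1, keepdim=True) + 1e-5
        t = t / std  # bs*g, h*w
        t = t.view(b, self.groups, h, w)  # bs, g, h, w

        # Per-group learnable scale and shift, followed by a sigmoid gate
        t = t * self.weight + self.bias  # bs, g, h, w
        t = t.view(b * self.groups, 1, h, w)  # bs*g, 1, h, w
        x = x * self.sig(t)
        x = x.view(b, c, h, w)
        return x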
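As a quick sanity check, the forward pass above can be restated as a standalone function. This is only a sketch: the name `sge_forward` and the explicit divisibility check are assumptions added for illustration and are not part of the original module. It shows two properties: the output keeps the input shape, and with `weight` and `bias` both zero (as in the constructor above) the gate is sigmoid(0) = 0.5 everywhere, so a freshly initialized module halves its input.

```python
import torch


def sge_forward(x, weight, bias, groups):
    """Functional restatement of the SGE forward pass (illustrative helper)."""
    b, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    xg = x.reshape(b * groups, c // groups, h, w)
    pooled = xg.mean(dim=(2, 3), keepdim=True)       # global average per channel
    sim = (xg * pooled).sum(dim=1, keepdim=True)     # b*g, 1, h, w
    t = sim.reshape(b * groups, -1)                  # b*g, h*w
    t = t - t.mean(dim=1, keepdim=True)
    t = t / (t.std(dim=1, keepdim=True) + 1e-5)
    t = t.reshape(b, groups, h, w) * weight + bias   # per-group scale and shift
    gate = torch.sigmoid(t).reshape(b * groups, 1, h, w)
    return (xg * gate).reshape(b, c, h, w)


x = torch.randn(2, 16, 8, 8)
zeros = torch.zeros(1, 8, 1, 1)
out = sge_forward(x, zeros, zeros, groups=8)
print(out.shape)                      # same shape as x
print(torch.allclose(out, 0.5 * x))   # gate is 0.5 when weight and bias are zero
```

When plugging the module into a YOLOv9 layer, the channel count of the incoming feature map therefore has to be a multiple of `groups`.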

GE: "Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks"

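GE introduces two operators, "gather" and "excite", to improve how convolutional neural networks (CNNs) exploit context. Gather efficiently aggregates feature responses over a large spatial extent, and excite redistributes the aggregated information back to local features. The method is simple and lightweight, integrates easily into existing CNN architectures, and adds very few parameters and little computational cost. The authors also propose a parameterized gather-excite operator pair that improves performance further, and relate it to the recently introduced Squeeze-and-Excitation Networks.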

This one has not been debugged yet. Code: https://github.com/hujie-frank/GENet
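To make the gather and excite steps concrete, here is a minimal sketch of a parameter-free variant. It is not the code from the repository above and has not been tested inside YOLOv9. The class name `ParamFreeGatherExcite`, the `extent` argument, and the choice of plain average pooling with kernel size equal to stride are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ParamFreeGatherExcite(nn.Module):
    """Illustrative gather-excite block with no learnable parameters."""

    def __init__(self, extent=0):
        super().__init__()
        self.extent = extent  # 0 means gather over the whole feature map

    def forward(self, x):
        h, w = x.shape[-2:]
        # Gather: aggregate responses over a spatial neighborhood
        if self.extent == 0:
            gathered = F.adaptive_avg_pool2d(x, 1)
        else:
            gathered = F.avg_pool2d(
                x, kernel_size=self.extent, stride=self.extent, ceil_mode=True
            )
        # Excite: resize back to the input resolution and gate the input
        gate = torch.sigmoid(F.interpolate(gathered, size=(h, w), mode="nearest"))
        return x * gate


x = torch.randn(2, 8, 16, 16)
print(ParamFreeGatherExcite(extent=0)(x).shape)  # global gather
print(ParamFreeGatherExcite(extent=4)(x).shape)  # local gather over 4x4 regions
```

With `extent=0` the block reduces to gating each channel by the sigmoid of its global average, which is the link to Squeeze-and-Excitation mentioned above, minus the learned bottleneck.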

"Global Context Networks"

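A global-context attention mechanism. The paper finds that the global context modeled by the non-local network is almost the same for different query positions. Based on this, the authors build a simpler network that considers only query-independent global context, which reduces computation. They also replace the single transformation function of the non-local block with a two-layer bottleneck, further reducing the number of parameters. The resulting network, the Global Context Network (GCNet), outperforms the non-local network on major benchmarks across a range of recognition tasks.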

Not debugged yet. Code: https://github.com/xvjiarui/GCNet
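The description above can be sketched as a small PyTorch module. This is a simplified illustration rather than the implementation from the repository, and it has not been tested inside YOLOv9. The class name `GlobalContextBlock`, the `ratio` default, and the use of additive fusion are assumptions.

```python
import torch
from torch import nn


class GlobalContextBlock(nn.Module):
    """Illustrative global context block: one shared context vector per image."""

    def __init__(self, channels, ratio=16):
        super().__init__()
        hidden = max(channels // ratio, 1)
        # Context modeling: a 1x1 conv produces one attention logit per position
        self.conv_mask = nn.Conv2d(channels, 1, kernel_size=1)
        # Bottleneck transform applied to the pooled context vector
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        attn = self.conv_mask(x).reshape(b, 1, h * w).softmax(dim=-1)  # b, 1, hw
        feats = x.reshape(b, c, h * w)                                 # b, c, hw
        context = torch.bmm(feats, attn.transpose(1, 2))               # b, c, 1
        context = context.reshape(b, c, 1, 1)
        # Query-independent: the same context is added at every position
        return x + self.transform(context)


x = torch.randn(2, 32, 10, 10)
print(GlobalContextBlock(32)(x).shape)  # same shape as x
```

Because the attention map does not depend on the query position, it is computed once per image instead of once per pixel, which is where the savings over a non-local block come from.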

GAM: "Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions"

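GAM is a global attention mechanism that improves deep neural network performance by reducing information loss and amplifying global channel-spatial interactions. The authors introduce 3D permutation with a multilayer perceptron for channel attention, together with a convolutional spatial attention submodule. Evaluations on the CIFAR-100 and ImageNet-1K image classification tasks show that the method outperforms several recent attention mechanisms on both ResNet and the lightweight MobileNet.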

import torch
import torch.nn as nn


class GAM_Attention(nn.Module):
    def __init__(self, in_channels, rate=4):
        super().__init__()
        hidden = in_channels // rate  # reduced width; in_channels should be >= rate

        # Channel attention: two-layer MLP applied to the channel vector at each position
        self.channel_attention = nn.Sequential(
            nn.Linear(in_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, in_channels)
        )

        # Spatial attention: two 7x7 convolutions with a channel bottleneck
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, in_channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(in_channels)
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Move channels last so the MLP acts on each position's channel vector
        x_permute = x.permute(0, 2, 3, 1).reshape(b, -1, c)  # b, h*w, c
        x_att_permute = self.channel_attention(x_permute).view(b, h, w, c)
        x_channel_att = x_att_permute.permute(0, 3, 1, 2).sigmoid()  # b, c, h, w

        x = x * x_channel_att

        x_spatial_att = self.spatial_attention(x).sigmoid()  # b, c, h, w
        out = x * x_spatial_att

        return out
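Before inserting GAM at a wide layer, it can help to estimate how many parameters it adds. The sketch below counts the learnable parameters of the layers in the `GAM_Attention` module above by hand. The helper name `gam_param_count` is hypothetical, and the count assumes that every convolution and linear layer has a bias and that each BatchNorm layer contributes one weight and one bias per channel.

```python
def gam_param_count(in_channels, rate=4, kernel_size=7):
    """Learnable parameters of GAM, split by submodule (illustrative helper)."""
    hidden = in_channels // rate
    # Channel attention: Linear(C, C/r) followed by Linear(C/r, C)
    linear1 = in_channels * hidden + hidden
    linear2 = hidden * in_channels + in_channels
    # Spatial attention: two k x k convolutions, each followed by BatchNorm
    conv1 = in_channels * hidden * kernel_size ** 2 + hidden
    conv2 = hidden * in_channels * kernel_size ** 2 + in_channels
    batch_norms = 2 * hidden + 2 * in_channels
    return {
        "channel": linear1 + linear2,
        "spatial": conv1 + conv2 + batch_norms,
    }


counts = gam_param_count(64)
print(counts)  # {'channel': 2128, 'spatial': 100592}
```

The two 7x7 convolutions dominate, and the total grows roughly with the square of the channel count, so GAM is considerably heavier than modules such as SGE, which has only two parameters per group.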