Table of Contents
- Dynamic convolution schematic
- How dynamic convolution works
- Parameters and FLOPs
- Adding ODConv to YOLO-series models
Learning a single static convolutional kernel per layer is the common training paradigm of modern convolutional neural networks (CNNs). However, recent research on dynamic convolution shows that learning a linear combination of n convolutional kernels weighted by input-dependent attention can significantly improve the accuracy of lightweight CNNs while keeping inference efficient. We observe, though, that existing works endow kernels with the dynamic property along only one dimension of the kernel space (the number of kernels), while the other three dimensions (the spatial size of each kernel, the number of input channels, and the number of output channels) are ignored. Motivated by this, we propose Omni-Dimensional Dynamic Convolution (ODConv), a more general yet elegant dynamic convolution design that advances this line of research. ODConv uses a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for kernels along all four dimensions of the kernel space. As a drop-in replacement for regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on ImageNet and MS-COCO show that ODConv brings solid accuracy gains to various popular CNN backbones, both lightweight and large ones; for example, on ImageNet it achieves absolute top-1 improvements of 3.77%~5.71% for the MobileNetV2 family and 1.86%~3.72% for the ResNet family. Interestingly, thanks to its improved feature-learning ability, ODConv with even a single kernel can compete with or outperform existing dynamic convolution counterparts that use multiple kernels, substantially reducing the extra parameters. Furthermore, ODConv also outperforms other attention modules that modulate output features or convolutional weights.
Paper: https://openreview.net/pdf?id=DmpCfq6Mg39
Dynamic convolution schematic
Figure 2: Illustration of progressively multiplying the four types of attentions onto the convolutional kernels in ODConv. (a) Location-wise multiplication along the spatial dimension, (b) channel-wise multiplication along the input channel dimension, (c) filter-wise multiplication along the output channel dimension, and (d) kernel-wise multiplication along the kernel dimension of the kernel space.
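Written out, the four attentions enter the convolution as follows (notation follows the paper; $\odot$ denotes element-wise multiplication along the corresponding dimension of the kernel space and $*$ denotes convolution):

$$
y = \left(\alpha_{w1} \odot \alpha_{f1} \odot \alpha_{c1} \odot \alpha_{s1} \odot W_1 + \cdots + \alpha_{wn} \odot \alpha_{fn} \odot \alpha_{cn} \odot \alpha_{sn} \odot W_n\right) * x
$$

where $W_i$ is the $i$-th convolutional kernel, $\alpha_{wi}$ is its scalar kernel-wise attention, and $\alpha_{si} \in \mathbb{R}^{k \times k}$, $\alpha_{ci} \in \mathbb{R}^{c_{in}}$, $\alpha_{fi} \in \mathbb{R}^{c_{out}}$ are the attentions along the spatial, input-channel, and output-channel dimensions; all four are produced from the input $x$ by the attention branch.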
How dynamic convolution works
Dynamic convolution means that the kernel used in the forward pass is not a fixed, static tensor: its weights are generated, or re-weighted, conditioned on the current input. Compared with a standard convolution, which applies the same kernel to every sample, this makes the layer adapt to each input.
Concretely, the procedure can be broken into the following steps (a minimal sketch is given after the list):
- First, define the dynamic kernel. In the usual formulation this is a set of n candidate kernels; in ODConv it is one or more kernels together with attention branches over the four dimensions of the kernel space (spatial size, input channels, output channels, and number of kernels).
- Next, compute input-dependent attention weights. The input feature map is squeezed by global average pooling, passed through a small bottleneck of 1x1 convolutions, and mapped to attention scores with sigmoid or softmax.
- These attention scores are then multiplied onto the kernels. Classic dynamic convolution only weights the combination over the n kernels, whereas ODConv additionally modulates the spatial, input-channel, and output-channel dimensions.
- Finally, the aggregated kernel is applied to the input just like an ordinary convolution, and the output is passed on to the following layers of the network.
In short, dynamic convolution replaces a single static kernel with an input-conditioned combination of kernels, which makes the layer more expressive at a modest extra parameter cost.
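To make these steps concrete, here is a minimal, illustrative sketch of the "classic" one-dimensional form of dynamic convolution (CondConv/DyConv style), where only the number-of-kernels dimension is dynamic. The class name `SimpleDynamicConv` and all hyper-parameters below are illustrative and are not part of the ODConv code given later:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDynamicConv(nn.Module):
    """Toy CondConv/DyConv-style layer: n candidate kernels combined per sample."""

    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, kernel_num=4):
        super().__init__()
        self.in_planes, self.out_planes = in_planes, out_planes
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding
        self.kernel_num = kernel_num
        # n candidate kernels, shape (n, out, in, k, k)
        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes, kernel_size, kernel_size) * 0.01)
        # attention head: global average pooling -> 1x1 conv -> softmax over the n kernels
        self.attention = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_planes, kernel_num, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        attn = F.softmax(self.attention(x).view(b, self.kernel_num), dim=1)  # (b, n)
        # per-sample kernel: attention-weighted sum of the n candidates -> (b, out, in, k, k)
        weight = (attn.view(b, self.kernel_num, 1, 1, 1, 1) * self.weight.unsqueeze(0)).sum(dim=1)
        # grouped-convolution trick: fold the batch into the channel axis so that
        # each sample is convolved with its own aggregated kernel
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(b * self.out_planes, self.in_planes, self.kernel_size, self.kernel_size)
        out = F.conv2d(x, weight, stride=self.stride, padding=self.padding, groups=b)
        return out.reshape(b, self.out_planes, out.size(-2), out.size(-1))


layer = SimpleDynamicConv(32, 64, kernel_size=3, padding=1)
print(layer(torch.randn(2, 32, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```

ODConv, given in full further below, extends this idea by generating additional spatial, input-channel, and output-channel attentions from the same pooled feature.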
Parameters and FLOPs
| Model | Parameters | GFLOPs |
| --- | --- | --- |
| 5s | 7235389 | 16.5 |
| 5s-backbone | 7028169 | 15.9 |
| 5s-neck | 8681417 | 16.6 |
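For reference, the parameter column is the plain total over all model weights, and the GFLOPs are the values YOLOv5 prints when the model is built (it profiles the network with thop internally). A generic sketch of how such a count is obtained (the `count_parameters` helper and the stand-in module are illustrative; for the table above, `model` would be the YOLOv5 network built from the corresponding yaml):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # total number of parameters (trainable and frozen) in a model
    return sum(p.numel() for p in model.parameters())

# example with a stand-in module
print(count_parameters(nn.Conv2d(64, 128, 3)))  # 73856
```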
Adding ODConv to YOLO-series models
common.py
# required imports for the module definitions below
import torch
import torch.nn as nn
import torch.nn.functional as F


class od_Attention(nn.Module):
    # Computes the four ODConv attentions (channel, filter, spatial, kernel)
    # from a globally average-pooled feature vector.
    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):
        super(od_Attention, self).__init__()
        attention_channel = max(int(in_planes * reduction), min_channel)
        self.kernel_size = kernel_size
        self.kernel_num = kernel_num
        self.temperature = 1.0

        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)
        self.bn = nn.BatchNorm2d(attention_channel)
        self.relu = nn.ReLU(inplace=True)

        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)
        self.func_channel = self.get_channel_attention

        if in_planes == groups and in_planes == out_planes:  # depth-wise convolution
            self.func_filter = self.skip
        else:
            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)
            self.func_filter = self.get_filter_attention

        if kernel_size == 1:  # point-wise convolution
            self.func_spatial = self.skip
        else:
            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)
            self.func_spatial = self.get_spatial_attention

        if kernel_num == 1:
            self.func_kernel = self.skip
        else:
            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)
            self.func_kernel = self.get_kernel_attention

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def update_temperature(self, temperature):
        self.temperature = temperature

    @staticmethod
    def skip(_):
        return 1.0

    def get_channel_attention(self, x):
        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return channel_attention

    def get_filter_attention(self, x):
        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return filter_attention

    def get_spatial_attention(self, x):
        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)
        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)
        return spatial_attention

    def get_kernel_attention(self, x):
        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)
        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)
        return kernel_attention

    def forward(self, x):
        x = self.avgpool(x)
        x = self.fc(x)  # note: self.bn is defined but not applied in this variant
        x = self.relu(x)
        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)
class ODConv2d(nn.Module):
    # Omni-dimensional dynamic convolution: kernel_num candidate kernels are
    # re-weighted by the four attentions before the convolution is applied.
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1,
                 reduction=0.0625, kernel_num=4):
        super(ODConv2d, self).__init__()
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.kernel_num = kernel_num
        self.attention = od_Attention(in_planes, out_planes, kernel_size, groups=groups,
                                      reduction=reduction, kernel_num=kernel_num)
        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes // groups, kernel_size, kernel_size),
                                   requires_grad=True)
        self._initialize_weights()

        if self.kernel_size == 1 and self.kernel_num == 1:
            self._forward_impl = self._forward_impl_pw1x
        else:
            self._forward_impl = self._forward_impl_common

    def _initialize_weights(self):
        for i in range(self.kernel_num):
            nn.init.kaiming_normal_(self.weight[i], mode='fan_out', nonlinearity='relu')

    def update_temperature(self, temperature):
        self.attention.update_temperature(temperature)

    def _forward_impl_common(self, x):
        # Multiplying the channel attention (or filter attention) into the weights or into the
        # feature maps is equivalent; applying it to the feature maps runs faster and uses less GPU memory.
        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)
        batch_size, in_planes, height, width = x.size()
        x = x * channel_attention
        x = x.reshape(1, -1, height, width)
        aggregate_weight = spatial_attention * kernel_attention * self.weight.unsqueeze(dim=0)
        aggregate_weight = torch.sum(aggregate_weight, dim=1).view(
            [-1, self.in_planes // self.groups, self.kernel_size, self.kernel_size])
        # grouped-convolution trick: fold the batch into the channel dimension so that
        # each sample is convolved with its own aggregated kernel
        output = F.conv2d(x, weight=aggregate_weight, bias=None, stride=self.stride, padding=self.padding,
                          dilation=self.dilation, groups=self.groups * batch_size)
        output = output.view(batch_size, self.out_planes, output.size(-2), output.size(-1))
        output = output * filter_attention
        return output

    def _forward_impl_pw1x(self, x):
        # fast path for a single 1x1 kernel: no per-sample kernel aggregation is needed
        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)
        x = x * channel_attention
        output = F.conv2d(x, weight=self.weight.squeeze(dim=0), bias=None, stride=self.stride, padding=self.padding,
                          dilation=self.dilation, groups=self.groups)
        output = output * filter_attention
        return output

    def forward(self, x):
        return self._forward_impl(x)
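A quick standalone sanity check of the module (the sizes are illustrative and assume the classes above are in scope):

```python
import torch

m = ODConv2d(in_planes=64, out_planes=128, kernel_size=3, stride=2, padding=1)
x = torch.randn(2, 64, 80, 80)
print(m(x).shape)  # expected: torch.Size([2, 128, 40, 40])
```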
yolo.py
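Since models/yolo.py imports everything from models/common.py, no extra import is usually needed; what does need changing is parse_model, so that the first yaml argument of ODConv2d is treated as the output-channel count. A sketch of the typical edit, assuming the stock YOLOv5 v6.x parse_model (the exact module list differs between versions):

```python
# models/yolo.py, inside parse_model(): add ODConv2d to the group of Conv-like modules
# so that c1 (input channels) and c2 (output channels) are filled in from the yaml args.
# Sketch only -- keep the rest of the function unchanged.
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d,
                 Focus, CrossConv, BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, ODConv2d):
            c1, c2 = ch[f], args[0]
            if c2 != no:  # not the Detect output
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
```

After this change, the yaml configurations below can be built as usual, for example via models/yolo.py with --cfg pointing at the new file.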
There are several ways to write the yaml file. If you want to use this module as a downsampling block, write it like this (the args are the output channels, kernel size, stride, and padding):
[-1, 1, ODConv2d, [256, 3, 2, 1]],
If you want to use the module as a regular (stride-1) convolution block, write it like this:
[-1, 1, ODConv2d, [256, 3, 1, 1]],
yolov5s-neck.yaml
# by CSDN 迪菲赫尔曼
# Parameters
nc: 80 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)
   [-1, 1, ODConv2d, [256, 3, 1, 1]], # 18
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 21 (P4/16-medium)
   [-1, 1, ODConv2d, [512, 3, 1, 1]], # 22
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # 24 cat head P5
   [-1, 3, C3, [1024, False]], # 25 (P5/32-large)
   [-1, 1, ODConv2d, [1024, 3, 1, 1]], # 26
   [[18, 22, 26], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
yolov5s-backbone.yaml
# by CSDN 迪菲赫尔曼
# Parameters
nc: 80 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32
# YOLOv5 v6.0 backbone + three ODConv2d modules
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, ODConv2d, [256, 3, 1, 1]],
   [-1, 1, Conv, [512, 3, 2]], # 6-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, ODConv2d, [512, 3, 1, 1]],
   [-1, 1, Conv, [1024, 3, 2]], # 9-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 11
   [-1, 1, ODConv2d, [1024, 3, 1, 1]],
  ]
# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 8], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 16
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 5], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 20 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 17], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 23 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 13], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 26 (P5/32-large)
   [[20, 23, 26], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]