CNN Plugin: Turning the Encoder from YOLOF into a PyTorch Plugin

This post covers the DilatedEncoder proposed in the YOLOF paper, which serves as a replacement for FPN in object detection networks. The DilatedEncoder consists of an FPN-style lateral connection followed by dilated residual blocks, so that its output covers objects at different scales. A pure-PyTorch implementation is provided so the module can be reused in other network architectures.

Contents

1. Purpose

2. About the Dilated Encoder

3. PyTorch Code


1. Purpose

A previous post covered the YOLOF paper, which proposes a Dilated Encoder as a replacement for FPN. Can we extract it and use it as a standalone plugin?

That idea is what this post is about.

2. About the Dilated Encoder

YOLOF proposes a SiSo (single-in, single-out) module that replaces the conventional MiMo (multi-in, multi-out) FPN; its structure is illustrated in the figure from the paper.

The design is straightforward. First, following FPN, two projection layers (a 1×1 convolution and a 3×3 convolution) are appended to the backbone to produce a 512-channel feature map. Then, so that the encoder's output features can cover objects at all scales, the paper adds residual blocks of three consecutive convolutions: a 1×1 convolution that reduces the channel dimension by a factor of 4, a 3×3 dilated convolution that enlarges the receptive field, and a final 1×1 convolution that restores the channel dimension.

In effect, the module keeps the lateral connection from FPN (the Projector in the figure): the C5 feature passes through the Projector and then through four consecutive dilated residual blocks to produce the P5 feature.
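As a rough sanity check (my own back-of-the-envelope estimate, not a number quoted from the paper), the receptive field that the projector plus the four dilated 3×3 convolutions build up on the C5 feature map can be computed like this:

# Rough receptive-field estimate on C5 (stride 32 w.r.t. the input image).
# A 3x3 conv with dilation d has an effective kernel size of 2*d + 1 and,
# at stride 1, grows the receptive field by 2*d; the 1x1 convs do not change it.
dilations = [2, 4, 6, 8]
rf = 1 + 2                               # starting point + the 3x3 fpn_conv
rf += sum(2 * d for d in dilations)      # the four dilated 3x3 convs
print(rf)                                # 43 on C5, i.e. about 43 * 32 input pixels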

3. PyTorch Code

The official YOLOF implementation of the encoder relies on detectron2 and fvcore APIs, which is inconvenient for anyone working in plain PyTorch. I therefore replaced or removed everything that goes beyond PyTorch, yielding a pure-PyTorch encoder that can be dropped as a plugin into any other network architecture.

Without further ado, here is the code:

"""
The define of Dilated Encoder from YOLOF:
No detectron2, only Pytorch.

"""

import torch
import torch.nn as nn


class DilatedEncoder(nn.Module):
    """
    Dilated Encoder for YOLOF.

    This module contains two types of components:
        - the original FPN lateral convolution layer and fpn convolution layer,
          which are 1x1 conv + 3x3 conv
        - the dilated residual block
    """

    def __init__(self,
                 in_channels=2048,
                 encoder_channels=512,
                 block_mid_channels=128,
                 num_residual_blocks=4,
                 block_dilations=[2, 4, 6, 8]
                 ):
        super(DilatedEncoder, self).__init__()
        self.in_channels = in_channels
        self.encoder_channels = encoder_channels
        self.block_mid_channels = block_mid_channels
        self.num_residual_blocks = num_residual_blocks
        self.block_dilations = block_dilations

        assert len(self.block_dilations) == self.num_residual_blocks

        # init
        self._init_layers()
        self._init_weight()

    def _init_layers(self):
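        # Projector (the FPN-style lateral connection): a 1x1 conv to reduce
        # channels, followed by a 3x3 conv, each with batch norm and no activation.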
        self.lateral_conv = nn.Conv2d(self.in_channels,
                                      self.encoder_channels,
                                      kernel_size=1)
        self.lateral_norm = nn.BatchNorm2d(self.encoder_channels)
        self.fpn_conv = nn.Conv2d(self.encoder_channels,
                                  self.encoder_channels,
                                  kernel_size=3,
                                  padding=1)
        self.fpn_norm = nn.BatchNorm2d(self.encoder_channels)
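        # Stack the dilated residual blocks, one per entry in block_dilations.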
        encoder_blocks = []
        for i in range(self.num_residual_blocks):
            dilation = self.block_dilations[i]
            encoder_blocks.append(
                Bottleneck(
                    self.encoder_channels,
                    self.block_mid_channels,
                    dilation=dilation
                )
            )
        self.dilated_encoder_blocks = nn.Sequential(*encoder_blocks)

    def xavier_init(self, layer):
        # Xavier (uniform) initialization for the projector convolutions.
        if isinstance(layer, nn.Conv2d):
            nn.init.xavier_uniform_(layer.weight, gain=1)

    def _init_weight(self):
        self.xavier_init(self.lateral_conv)
        self.xavier_init(self.fpn_conv)
        for m in [self.lateral_norm, self.fpn_norm]:
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        for m in self.dilated_encoder_blocks.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, mean=0, std=0.01)
                if hasattr(m, 'bias') and m.bias is not None:
                    nn.init.constant_(m.bias, 0)

            if isinstance(m, (nn.GroupNorm, nn.BatchNorm2d, nn.SyncBatchNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        out = self.lateral_norm(self.lateral_conv(feature))
        out = self.fpn_norm(self.fpn_conv(out))
        return self.dilated_encoder_blocks(out)


class Bottleneck(nn.Module):
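    """
    Dilated residual block: a 1x1 conv reduces the channel dimension, a 3x3
    dilated conv enlarges the receptive field, and a 1x1 conv restores the
    channels; the block input is added back through a skip connection.
    """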

    def __init__(self,
                 in_channels: int = 512,
                 mid_channels: int = 128,
                 dilation: int = 1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, padding=0),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(mid_channels, mid_channels,
                      kernel_size=3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(mid_channels, in_channels, kernel_size=1, padding=0),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = out + identity
        return out


if __name__ == '__main__':
    encoder = DilatedEncoder()
    print(encoder)

    x = torch.rand(1, 2048, 32, 32)
    y = encoder(x)
    print(y.shape)
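
To use the encoder as a plugin behind an existing backbone, it is enough to feed it the backbone's last feature map. A minimal sketch, assuming torchvision's resnet50 as the backbone (any backbone whose final feature map has 2048 channels works the same way):

import torch
import torchvision

# Keep everything up to and including layer4, i.e. the C5 feature map (stride 32).
backbone = torchvision.models.resnet50()
c5_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

encoder = DilatedEncoder(in_channels=2048, encoder_channels=512)

img = torch.rand(1, 3, 512, 512)
c5 = c5_extractor(img)   # (1, 2048, 16, 16)
p5 = encoder(c5)         # (1, 512, 16, 16)
print(c5.shape, p5.shape)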

 
