DetNet: A Backbone network for Object Detection (paper reading notes)


Bing_Shieh

Category: Object Detection

Article link: https://blog.csdn.net/bingbingxie1/article/details/95327835

Paper: https://arxiv.org/abs/1804.06215
GitHub: https://github.com/guoruoqian/DetNet_pytorch

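The paper opens by raising two issues:

1. Classification and detection are different tasks, so a classification model trained on classification data is not necessarily a suitable feature extractor for detection. For example, detection cares a great deal about object scale, whereas classification does not necessarily.
2. Detection has to localize objects as well as classify them, and this difference causes problems. The downsampling commonly used in classification networks helps classification because it enlarges the receptive field, but it may hurt detection, which needs to localize objects, because positional information is lost.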

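DetNet is motivated by these two issues. In other words, it is a feature extraction network designed specifically for object detection. Its main changes are:

1. Increase the resolution of the features output by the high-level layers; that is, the high-level stages no longer shrink the feature map.
2. Introduce dilated convolution layers to enlarge the receptive field of the high-level layers, compensating for the smaller receptive field caused by the first change.
3. Reduce the width of the high-level layers to offset the extra computation caused by the higher resolution.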

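Figure 1 compares several feature extraction networks (backbones).

A: FPN adds cross-layer fusion on top of a classification network, and its prediction layers even include P6 with a stride of 64, meaning the output feature map is 1/64 of the input image along each side. Such a small feature map is poorly suited to box regression. Because the high-level layers are mainly responsible for large objects, the coordinates regressed for large objects tend to be inaccurate. In addition, although FPN fuses high-level and shallow features to improve small-object detection in the shallow layers, the large stride of the high-level layers means much of the semantic information about small objects is already lost there, so the final result suffers even after fusion.
B: A classification network usually ends with a stride of 32, so the final feature map is 1/32 of the input size. For example, with the common 224×224 classification input, the output feature map is 7×7.
C: The DetNet backbone does not downsample the input as aggressively and keeps the final stride at 16, which enlarges the final feature map (in other words, raises its spatial resolution). DetNet still uses FPN-style feature fusion overall (this is not shown in Figure 1C); it only changes the stride of the high-level layers so that as little semantic information about small objects as possible is lost there.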
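
The stride figures above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `feature_map_size` is made up for this example, and it assumes that a size which does not divide evenly is rounded up, which in a real network depends on the padding of each layer.

```python
import math


def feature_map_size(input_size: int, stride: int) -> int:
    """Side length of the output feature map for a given total stride.

    Assumption: sizes that do not divide evenly are rounded up.
    """
    return math.ceil(input_size / stride)


# Total strides discussed for Figure 1, applied to a 224x224 input.
for name, stride in [("FPN P6", 64), ("classification backbone", 32), ("DetNet", 16)]:
    size = feature_map_size(224, stride)
    print(f"{name}: stride {stride} -> {size}x{size}")
```

Halving the final stride from 32 to 16 doubles each side of the feature map, so the number of spatial positions grows by a factor of four.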

[Figure 1: backbone comparison. A: FPN backbone, B: classification backbone, C: DetNet backbone]
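However, not downsampling the high-level features as much as a classification network does (changing the stride from 32 to 16) creates two problems:

1. More computation. This is easy to see: the feature maps are larger than before, so the cost inevitably grows.
2. A smaller receptive field in the high-level layers. Receptive field and information loss behave like a seesaw: having chosen to lose as little high-level feature information as possible, a smaller receptive field is the expected consequence.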

So how are these two problems addressed?

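For problem 1, the main remedy is to reduce the width of the high-level layers. Figure 2D shows this clearly: in the high-level stages, every block takes 256 input feature channels. In common classification networks such as ResNet, by contrast, the channel count usually grows as the stages get deeper.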
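
A rough cost estimate shows why the narrower width matters. The sketch below counts multiply-accumulates for one 1x1, 3x3, 1x1 bottleneck block; the helper `bottleneck_macs` is hypothetical, and several things are assumptions: a 224×224 input, the channel numbers 2048/512 of the last ResNet-50 stage, and 1024/256 as in the comments of the implementation quoted later in this post. BatchNorm, ReLU, and the shortcut branch are ignored.

```python
def bottleneck_macs(inplanes: int, planes: int, size: int) -> int:
    """Multiply-accumulates of a 1x1 -> 3x3 -> 1x1 bottleneck at stride 1.

    `size` is the side length of the (square) feature map. The block is
    assumed to expand back to `inplanes` channels (inplanes == 4 * planes).
    """
    per_position = (
        inplanes * planes          # 1x1 conv that reduces the channels
        + planes * planes * 3 * 3  # 3x3 conv (dilation does not change the cost)
        + planes * inplanes        # 1x1 conv that expands the channels again
    )
    return size * size * per_position


resnet_stride32 = bottleneck_macs(2048, 512, 7)   # ResNet-50 last stage, 7x7 map
resnet_stride16 = bottleneck_macs(2048, 512, 14)  # same width kept at stride 16
narrow_stride16 = bottleneck_macs(1024, 256, 14)  # halved width at stride 16

print(resnet_stride16 / resnet_stride32)   # 4.0
print(narrow_stride16 == resnet_stride32)  # True
```

Under these assumptions, keeping the ResNet width at stride 16 costs four times as much per block, while halving the channel widths brings the cost back to the stride-32 level.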

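For problem 2, dilated convolution layers are introduced to enlarge the receptive field, as shown in Figure 2A and 2B. Comparing them with the ResNet residual block (Figure 2C) shows that the main change is replacing the ordinary 3×3 convolution with a dilated convolution. A and B in Figure 2 are therefore the basic building blocks of the DetNet network (shown in Figure 2D).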
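[Figure 2: A: dilated bottleneck, B: dilated bottleneck with 1x1 conv projection, C: original bottleneck, D: DetNet network structure]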
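
The effect of dilation on the receptive field can be worked out with simple arithmetic. The following is a minimal sketch, not code from the paper or the repository: the three helper names are invented here, the output-size expression is the standard convolution shape formula, and the stacked receptive field assumes stride-1 layers with identical kernel and dilation.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output side length of a convolution (standard shape formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def stacked_receptive_field(num_layers: int, kernel: int = 3,
                            dilation: int = 1) -> int:
    """Receptive field of `num_layers` identical stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1) * dilation


print(effective_kernel(3, 2))                              # 5
print(conv_output_size(14, 3, padding=2, dilation=2))      # 14
print(stacked_receptive_field(3, dilation=1))              # 7
print(stacked_receptive_field(3, dilation=2))              # 13
```

A 3×3 kernel with dilation 2 spans 5×5 positions, and with padding 2 the feature map keeps its size, so the receptive field grows without any further downsampling.
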
Below are the PyTorch implementations of A: Dilated BottleNeck, B: Dilated BottleNeck with 1x1 conv projection, and C: Original BottleNeck.
Code source: https://github.com/guoruoqian/DetNet_pytorch/blob/master/lib/model/fpn/detnet_backbone.py

A: Dilated BottleNeck

import torch.nn as nn


class BottleneckA(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BottleneckA, self).__init__()
        assert inplanes == (planes * 4), 'inplanes != planes * 4'
        assert stride == 1, 'stride != 1'
        assert downsample is None, 'downsample is not None'
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)  # inplanes = 1024, planes = 256
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, dilation=2,
                               padding=2, bias=False)  # stride = 1, dilation = 2
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:  # never taken: __init__ asserts downsample is None (stride is 1 and inplanes == expansion * planes)
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

B: Dilated BottleNeck with 1x1 conv projection

class BottleneckB(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BottleneckB, self).__init__()
        assert inplanes == (planes * 4), 'inplanes != planes * 4'
        assert stride == 1, 'stride != 1'
        assert downsample is None, 'downsample is not None'
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)  # inplanes = 1024, planes = 256
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, dilation=2,
                               padding=2, bias=False)  # stride = 1, dilation = 2
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
        self.extra_conv = nn.Sequential(
            nn.Conv2d(inplanes, planes * 4, kernel_size=1, bias=False),
            nn.BatchNorm2d(planes * 4)
        )

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        residual = self.extra_conv(x)

        if self.downsample is not None:  # never taken: __init__ asserts downsample is None, so the 1x1 projection above is always used
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

C: Original BottleNeck

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

Reference blog: https://blog.csdn.net/u014380165/article/details/81582623
