VoVNetV2 Code Analysis in detectron2 (Backbone)

电饭锅22

Article link: https://blog.csdn.net/wenghd22/article/details/114371301

Contents

Preface
1. What is optimized
2. VoVNet
- 1. VoVNetV2
- 2. Code analysis
Summary

Preface

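ResNet is the most commonly used backbone for object detection. DenseNet actually extracts features better than ResNet while using fewer parameters and fewer FLOPs, and it gives good detection accuracy, but it is slow. This is mainly because its dense connections cause high memory access cost and energy consumption.
VoVNet was designed to fix exactly this problem of DenseNet: VoVNet-based detectors are more accurate than DenseNet-based ones and also faster, and they outperform ResNet-based detectors as well.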

Reference implementation: https://github.com/youngwanLEE/vovnet-detectron2

1. What is optimized

[Figure 1: (a) dense aggregation (DenseNet); (b) one-shot aggregation (OSA)]
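A major problem of DenseNet is that its dense connections are too heavy: every layer aggregates the features of all preceding layers, which mostly produces redundant features.
That redundancy is precisely what can be optimized. Based on this observation the paper proposes the OSA (One-Shot Aggregation) module, shown in Figure 1b: instead of feeding every layer's output to all later layers, the outputs of all layers are aggregated only once, at the end of the module.
Many intermediate features in DenseNet are likely redundant, and removing them matters especially for object detection, because detectors usually take large input images, which makes the memory traffic of dense connections even more expensive.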
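To make the difference concrete, the following minimal sketch counts the input channels each 3x3 conv sees under dense aggregation and under one-shot aggregation. The function names and the example numbers (a 128-channel input, 128 output channels per layer, 5 layers) are illustrative assumptions, not taken from the repo.

```python
# Illustrative sketch (assumed numbers): channel widths seen by each 3x3 conv
# under DenseNet-style dense aggregation vs. VoVNet-style one-shot aggregation.


def dense_input_channels(in_ch, growth, num_layers):
    """Dense aggregation: layer i sees the block input plus all i previous outputs."""
    return [in_ch + i * growth for i in range(num_layers)]


def osa_input_channels(in_ch, stage_ch, num_layers):
    """One-shot aggregation: layer 0 sees the block input, later layers only the previous output."""
    return [in_ch] + [stage_ch] * (num_layers - 1)


def osa_concat_channels(in_ch, stage_ch, num_layers):
    """OSA concatenates the input and every layer output exactly once, at the end."""
    return in_ch + num_layers * stage_ch


if __name__ == "__main__":
    print(dense_input_channels(128, 128, 5))  # [128, 256, 384, 512, 640]
    print(osa_input_channels(128, 128, 5))    # [128, 128, 128, 128, 128]
    print(osa_concat_channels(128, 128, 5))   # 768
```

Under dense aggregation the input of every layer keeps widening, while with OSA every 3x3 conv after the first works on a fixed number of channels and the wide concatenation happens only once.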

2. VoVNet

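VoVNet is built from OSA modules.
It starts with a stem block of three 3x3 conv layers,
followed by four stages of OSA modules. A stride-2 3x3 max-pooling layer down-samples the features between consecutive stages (in the code below it sits at the start of stages 3 to 5), so together with the stride-4 stem the final output stride of the model is 4 × 2³ = 32. As in other networks, the number of channels increases after each down-sampling.
Each OSA module contains five 3x3 convs whose outputs are concatenated and then fused by a 1x1 conv.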
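One way to check the output-stride arithmetic is to push a dummy image through a stem and three pooling layers. This is a minimal PyTorch sketch assuming plain Conv-BN-ReLU layers with the stem channels 64/64/128 of the VoVNet39_eSE config; the repo builds these layers with its own conv3x3 helper and a configurable normalization, so treat it as an approximation.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, stride):
    # 3x3 conv -> BN -> ReLU; padding=1 keeps the spatial size when stride=1
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


# Stem: three 3x3 convs with strides 2, 1, 2 -> overall stride 4
stem = nn.Sequential(
    conv_bn_relu(3, 64, stride=2),
    conv_bn_relu(64, 64, stride=1),
    conv_bn_relu(64, 128, stride=2),
).eval()

# Down-sampling placed before stages 3, 4 and 5 (same settings as in the post's code)
pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)


def feature_sizes(h, w):
    """Spatial size of the features entering the OSA modules of stages 2-5."""
    x = torch.zeros(1, 3, h, w)
    sizes = []
    with torch.no_grad():
        x = stem(x)
        sizes.append(tuple(x.shape[-2:]))  # stage2 resolution (stride 4)
        for _ in range(3):                 # stages 3, 4, 5
            x = pool(x)
            sizes.append(tuple(x.shape[-2:]))
    return sizes


print(feature_sizes(224, 224))  # [(56, 56), (28, 28), (14, 14), (7, 7)]
```

With a 224×224 input the four stage resolutions are 56, 28, 14 and 7, i.e. strides 4, 8, 16 and 32.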

1. VoVNetV2

VoVNetV2 adds ResNet's residual connection and an improved version of SENet's SE module (eSE).

[Figure 2: (b) OSA module with identity mapping; (c) OSA module with eSE]
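As Figure 2b shows, the improved OSA module adds its input directly to its output (a shortcut connection), which lets VoVNet train deeper networks; the paper goes up to VoVNet-99.
As Figure 2c shows, the other improvement is an eSE (effective Squeeze-Excitation) module applied to the module's final feature map to further strengthen the features. The original SE module has two FC layers, and the first one reduces the channel dimension (by a ratio r), which can lose some channel information. eSE removes this reduction and keeps a single FC layer that maps C channels to C channels.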
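The difference between SE and eSE can be sketched in PyTorch as follows. This is an illustrative version, not the repo's `eSEModule`: the hard-sigmoid gate used for eSE and the reduction ratio used for SE are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ESESketch(nn.Module):
    """eSE sketch: global average pooling + ONE FC (1x1 conv) layer, C -> C, no reduction."""

    def __init__(self, channels):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)                 # squeeze: B x C x 1 x 1
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)  # keeps all C channels

    def forward(self, x):
        w = F.hardsigmoid(self.fc(self.avg_pool(x)))  # per-channel gate in [0, 1] (assumed hard sigmoid)
        return x * w                                   # re-weight each channel


class SESketch(nn.Module):
    """Original SE for comparison: two FC layers with a C/r bottleneck."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)  # C -> C/r
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)  # C/r -> C

    def forward(self, x):
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(self.avg_pool(x)))))
        return x * w


x = torch.randn(2, 32, 8, 8)
print(ESESketch(32)(x).shape, SESketch(32, reduction=4)(x).shape)
```

Both modules only rescale channels, so the output shape equals the input shape; the only structural difference is that SE squeezes the channel descriptor through a C/r bottleneck while eSE does not.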

VoVNetV2 adds a small amount of computation compared with VoVNet, but improves accuracy.

For the model results, see https://blog.csdn.net/xiaohu2022/article/details/105318534/

2. Code analysis

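```python
VoVNet39_eSE = {
    "stem": [64, 64, 128],                  # output channels of the three stem 3x3 convs
    "stage_conv_ch": [128, 160, 192, 224],  # channels of the 3x3 convs inside the OSA modules of stages 2-5
    "stage_out_ch": [256, 512, 768, 1024],  # output channels of stages 2-5 (after the 1x1 concat conv)
    "layer_per_block": 5,                   # number of 3x3 convs per OSA module
    "block_per_stage": [1, 1, 2, 2],        # number of OSA modules per stage
    "eSE": True,                            # eSE channel attention
    "dw": False,                            # depthwise-separable (depthwise + pointwise) convs
}
```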
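The concat-channel bookkeeping from `_OSA_module` (`in_ch + layer_per_block * stage_ch`) and from `VoVNet.__init__` (`in_ch_list`) can be checked against this config with a few lines of plain Python. The helper `osa_channel_plan` is hypothetical, written only for illustration.

```python
# Values copied from VoVNet39_eSE above
vovnet39_spec = {
    "stem": [64, 64, 128],
    "stage_conv_ch": [128, 160, 192, 224],
    "stage_out_ch": [256, 512, 768, 1024],
    "layer_per_block": 5,
    "block_per_stage": [1, 1, 2, 2],
}


def osa_channel_plan(spec):
    """Per stage: (stage input channels, channels entering each OSA block's 1x1 concat conv)."""
    # stage2 takes the stem output; every later stage takes the previous stage's output
    in_ch_list = [spec["stem"][-1]] + spec["stage_out_ch"][:-1]
    plan = {}
    for i, in_ch in enumerate(in_ch_list):
        stage_ch = spec["stage_conv_ch"][i]
        concat_ch = spec["stage_out_ch"][i]
        concat_inputs = []
        block_in = in_ch
        for _ in range(spec["block_per_stage"][i]):
            # block input + layer_per_block outputs with stage_ch channels each
            concat_inputs.append(block_in + spec["layer_per_block"] * stage_ch)
            block_in = concat_ch  # later blocks take the previous block's output
        plan[f"stage{i + 2}"] = (in_ch, concat_inputs)
    return plan


for name, (in_ch, concat_inputs) in osa_channel_plan(vovnet39_spec).items():
    print(name, in_ch, concat_inputs)
# stage2 128 [768]
# stage3 256 [1056]
# stage4 512 [1472, 1728]
# stage5 768 [1888, 2144]
```

Each 1x1 concat conv then projects these widths back down to `stage_out_ch` (256, 512, 768, 1024).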
```python
from collections import OrderedDict

import torch
import torch.nn as nn

# conv1x1, conv3x3 and dw_conv3x3 (each returns a list of (name, module) pairs) and
# eSEModule are defined elsewhere in the repo's vovnet.py and are not shown here.


# OSA module
class _OSA_module(nn.Module):
    def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, module_name,
                 SE=False, identity=False, depthwise=False):
        super(_OSA_module, self).__init__()

        self.identity = identity
        self.depthwise = depthwise
        self.isReduced = False
        self.layers = nn.ModuleList()
        in_channel = in_ch
        # depthwise mode only: a depthwise 3x3 conv keeps the channel count,
        # so a 1x1 conv first reduces in_ch to stage_ch
        if self.depthwise and in_channel != stage_ch:
            self.isReduced = True
            self.conv_reduction = nn.Sequential(
                OrderedDict(conv1x1(in_channel, stage_ch,
                                    "{}_reduction".format(module_name), "0")))
        # layer_per_block (5) 3x3 convs in series
        for i in range(layer_per_block):
            if self.depthwise:
                self.layers.append(nn.Sequential(OrderedDict(
                    dw_conv3x3(stage_ch, stage_ch, module_name, i))))
            else:
                self.layers.append(nn.Sequential(OrderedDict(
                    conv3x3(in_channel, stage_ch, module_name, i))))
            in_channel = stage_ch

        # feature aggregation: the module input plus the outputs of all 3x3 convs
        in_channel = in_ch + layer_per_block * stage_ch
        self.concat = nn.Sequential(
            OrderedDict(conv1x1(in_channel, concat_ch, module_name, "concat")))
        # eSE is applied to the concatenated features after the 1x1 projection.
        # Note: it is created unconditionally; the SE argument is never read here.
        self.ese = eSEModule(concat_ch)

    def forward(self, x):
        identity_feat = x  # kept for the residual connection
        output = [x]
        if self.depthwise and self.isReduced:
            x = self.conv_reduction(x)
        for layer in self.layers:
            x = layer(x)
            output.append(x)
        x = torch.cat(output, dim=1)  # one-shot aggregation (concat along channels)
        xt = self.concat(x)           # 1x1 conv projects to concat_ch
        xt = self.ese(xt)
        if self.identity:             # residual; requires in_ch == concat_ch
            xt = xt + identity_feat
        return xt
```
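The module above depends on helpers (`conv3x3`, `conv1x1`, `eSEModule`) that this post does not show. Below is a self-contained PyTorch sketch of the same OSA module for experimenting with shapes. It assumes Conv-BN-ReLU for the conv helpers and a hard-sigmoid eSE, and it leaves out the depthwise option, so it approximates the repo's code rather than reproducing it; `OSABlock`, `ESE` and `conv_bn_relu` are hypothetical names.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel_size):
    # stand-in for the repo's conv3x3 / conv1x1 helpers (assumed Conv-BN-ReLU)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ESE(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # global average pooling -> single FC -> hard-sigmoid gate
        w = nn.functional.hardsigmoid(self.fc(x.mean(dim=(2, 3), keepdim=True)))
        return x * w


class OSABlock(nn.Module):
    def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, identity=False):
        super().__init__()
        if identity:
            assert in_ch == concat_ch, "the residual add needs matching channels"
        self.identity = identity
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(layer_per_block):
            self.layers.append(conv_bn_relu(ch, stage_ch, 3))
            ch = stage_ch
        # one-shot aggregation: input + every layer output, fused by a 1x1 conv
        self.concat = conv_bn_relu(in_ch + layer_per_block * stage_ch, concat_ch, 1)
        self.ese = ESE(concat_ch)

    def forward(self, x):
        outputs = [x]
        h = x
        for layer in self.layers:
            h = layer(h)
            outputs.append(h)
        out = self.ese(self.concat(torch.cat(outputs, dim=1)))
        return out + x if self.identity else out


block = OSABlock(128, 128, 256, 5).eval()
print(block(torch.randn(1, 128, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```

As in the post's `_OSA_stage`, the residual (`identity=True`) only makes sense for blocks whose input and output widths are both `concat_ch`, which is why the sketch asserts it.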
        
```python
# Stages 2-5; each stage contains a different number of OSA modules
class _OSA_stage(nn.Sequential):
    def __init__(self, in_ch, stage_ch, concat_ch, block_per_stage,
                 layer_per_block, stage_num, SE=False, depthwise=False):
        super(_OSA_stage, self).__init__()
        # every stage except stage2 starts with stride-2 max pooling (down-sampling)
        if not stage_num == 2:
            self.add_module("Pooling", nn.MaxPool2d(kernel_size=3, stride=2,
                                                    ceil_mode=True))
        # with more than one OSA module the SE flag is switched off
        # (as written it has no effect either way, since _OSA_module never reads SE)
        if block_per_stage != 1:
            SE = False
        module_name = f"OSA{stage_num}_1"
        # first OSA module: in_ch -> concat_ch, so no residual connection
        self.add_module(
            module_name, _OSA_module(in_ch, stage_ch, concat_ch, layer_per_block,
                                     module_name, SE, depthwise=depthwise))
        # remaining OSA modules: concat_ch -> concat_ch, so identity=True adds a residual
        for i in range(block_per_stage - 1):
            if i != block_per_stage - 2:  # not the last block
                SE = False
            module_name = f"OSA{stage_num}_{i + 2}"
            self.add_module(
                module_name,
                _OSA_module(concat_ch, stage_ch, concat_ch, layer_per_block,
                            module_name, SE, identity=True, depthwise=depthwise))
```
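The order in which `_OSA_stage` registers its children, and which blocks receive the residual connection, can be listed with a small pure-Python helper. `stage_layout` is hypothetical and only mirrors the `add_module` calls above.

```python
def stage_layout(stage_num, block_per_stage):
    """(module_name, kind, identity) in the order _OSA_stage registers them (illustrative)."""
    layout = []
    if stage_num != 2:  # stages 3-5 start with stride-2 max pooling
        layout.append(("Pooling", "MaxPool2d", None))
    # the first block changes the width (in_ch -> concat_ch): no residual
    layout.append((f"OSA{stage_num}_1", "OSA", False))
    for i in range(block_per_stage - 1):
        # later blocks keep concat_ch -> concat_ch, so identity=True is possible
        layout.append((f"OSA{stage_num}_{i + 2}", "OSA", True))
    return layout


# block_per_stage of VoVNet39_eSE
for stage_num, n_blocks in zip([2, 3, 4, 5], [1, 1, 2, 2]):
    print(stage_layout(stage_num, n_blocks))
```

For VoVNet-39 only `OSA4_2` and `OSA5_2` carry a residual connection, because they are the only blocks whose input already has `concat_ch` channels.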
```python
from detectron2.modeling.backbone import Backbone

# _STAGE_SPECS (maps cfg.MODEL.VOVNET.CONV_BODY names to specs such as VoVNet39_eSE)
# and _initialize_weights() are defined in the repo and are not shown here.


# Overall network
class VoVNet(Backbone):
    def __init__(self, cfg, input_ch, out_features=None):
        super(VoVNet, self).__init__()
        global _NORM  # module-level normalization setting
        # read parameters from the config
        _NORM = cfg.MODEL.VOVNET.NORM
        stage_specs = _STAGE_SPECS[cfg.MODEL.VOVNET.CONV_BODY]
        stem_ch = stage_specs["stem"]
        config_stage_ch = stage_specs["stage_conv_ch"]
        config_concat_ch = stage_specs["stage_out_ch"]
        block_per_stage = stage_specs["block_per_stage"]
        layer_per_block = stage_specs["layer_per_block"]
        SE = stage_specs["eSE"]
        depthwise = stage_specs["dw"]

        self._out_features = out_features  # names of the feature maps to return
        # Stem module
        conv_type = dw_conv3x3 if depthwise else conv3x3
        stem = conv3x3(input_ch, stem_ch[0], "stem", "1", 2)
        stem += conv_type(stem_ch[0], stem_ch[1], "stem", "2", 1)
        stem += conv_type(stem_ch[1], stem_ch[2], "stem", "3", 2)
        # add_module registers the submodule in self._modules (inherited from nn.Module)
        self.add_module("stem", nn.Sequential(OrderedDict(stem)))
        current_stirde = 4  # stem convs 1 and 3 have stride 2: 2 * 2 = 4
        # stride (down-sampling factor) of each feature map
        self._out_feature_strides = {"stem": current_stirde, "stage2": current_stirde}
        # output channels of each feature map
        self._out_feature_channels = {"stem": stem_ch[2]}

        stem_out_ch = [stem_ch[2]]
        # input channels of each stage: the stem output, then the previous stage's output
        in_ch_list = stem_out_ch + config_concat_ch[:-1]
        # OSA stages
        self.stage_names = []
        for i in range(4):  # num_stages
            name = "stage%d" % (i + 2)  # stage 2 ... stage 5
            self.stage_names.append(name)
            self.add_module(
                name,
                _OSA_stage(
                    in_ch_list[i],
                    config_stage_ch[i],
                    config_concat_ch[i],
                    block_per_stage[i],
                    layer_per_block,
                    i + 2,
                    SE,
                    depthwise,
                ),
            )
            # record the stage name and its output channels
            self._out_feature_channels[name] = config_concat_ch[i]
            if not i == 0:  # stages 3-5 start with max pooling, so the stride doubles
                self._out_feature_strides[name] = current_stirde = int(current_stirde * 2)
        # initialize weights
        self._initialize_weights()

    def forward(self, x):
        outputs = {}
        x = self.stem(x)
        if "stem" in self._out_features:
            outputs["stem"] = x
        for name in self.stage_names:
            x = getattr(self, name)(x)
            if name in self._out_features:
                outputs[name] = x
        return outputs
```
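The stride and channel dictionaries built in `VoVNet.__init__` can be reproduced without building the network. The sketch below is hypothetical (`feature_specs` and `output_shape` are not repo functions); the second helper mirrors the (channels, stride) pairs that detectron2's `Backbone.output_shape()` reports as `ShapeSpec` objects for each name in `_out_features`. The `out_features` list used here is an assumed example value.

```python
def feature_specs(stem_out_ch, stage_out_ch):
    """Rebuild the channel/stride dicts that VoVNet.__init__ fills in (illustrative)."""
    current_stride = 4  # stem: strides 2 * 1 * 2
    strides = {"stem": current_stride, "stage2": current_stride}
    channels = {"stem": stem_out_ch}
    for i, ch in enumerate(stage_out_ch):
        name = f"stage{i + 2}"
        channels[name] = ch
        if i != 0:  # stages 3-5 begin with a stride-2 max pooling
            current_stride *= 2
            strides[name] = current_stride
    return channels, strides


def output_shape(out_features, channels, strides):
    # (channels, stride) per requested feature, as plain tuples instead of ShapeSpec
    return {name: (channels[name], strides[name]) for name in out_features}


channels, strides = feature_specs(128, [256, 512, 768, 1024])
print(output_shape(["stage2", "stage3", "stage4", "stage5"], channels, strides))
# {'stage2': (256, 4), 'stage3': (512, 8), 'stage4': (768, 16), 'stage5': (1024, 32)}
```

These are the widths and strides a downstream module such as an FPN would see when it consumes the dict returned by `forward`.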

Summary

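Notes to self:
When reading source code, start with the overall structure; once you have a rough picture of the framework, dig into each module.
Going from the outside in seems fairly easy; with more practice it will probably start to feel natural.
Work out how the functions call each other: in what order, and what each one does.
Tracking how parameters are passed around is the tedious part.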

References:
https://blog.csdn.net/xiaohu2022/article/details/105318534/
