SSD Source Code Walkthrough

weixin_51363643

Published 2024-04-22 21:43:37

Category: Basic Network Architectures. Tags: programming languages, python

Article link: https://blog.csdn.net/weixin_51363643/article/details/138093416

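This article describes how to build an SSD network on top of ResNet50: the stride of the convolution in conv_4 is adjusted, and features are extracted from the first 7 modules. It then covers building the additional feature extractors, the use of bbox_view, the loss computation, and the post-processing algorithm, including the default box setup and non-maximum suppression.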

Building the SSD Network

The reference code uses resnet50 as the backbone. The conv_5x stage and everything after it are discarded, and the stride of the first block in conv_4 is changed from 2 to 1.

In the project directory, res50_backbone.py holds the structure copied directly from the resnet50 network.

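Checked against the resnet50 structure, only the first 7 modules are kept (in the standard ResNet50 layout these are conv1, bn1, relu, maxpool, and the three stages conv2_x to conv4_x).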

So in the backbone, net.children() is used to take the first 7 modules, and the stride of the block in conv_4 is modified at the same time.

At this point, the first four convolutional stages (up to and including conv_4) are in place.
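
The effect of changing the conv_4 stride can be checked with plain arithmetic. The sketch below is only an illustration, not part of the referenced code: it assumes a 300x300 input (SSD300) and the standard ResNet50 kernel, stride, and padding values for the layers that change resolution, and the helper names `conv_out` and `backbone_output_size` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_output_size(input_size=300, conv4_stride=1):
    """Trace the feature map side length through conv1 .. conv_4.

    Assumed layer settings (standard ResNet50): only the layer that
    changes resolution in each stage is modelled.
    """
    size = conv_out(input_size, 7, 2, 3)        # conv1: 7x7, stride 2, padding 3
    size = conv_out(size, 3, 2, 1)              # max pool: 3x3, stride 2, padding 1
    size = conv_out(size, 3, 1, 1)              # conv_2: stride 1
    size = conv_out(size, 3, 2, 1)              # conv_3: stride 2
    size = conv_out(size, 3, conv4_stride, 1)   # conv_4: stride 2 originally, 1 for SSD
    return size


print(backbone_output_size(conv4_stride=2))  # original ResNet50 stride
print(backbone_output_size(conv4_stride=1))  # stride used for the SSD backbone
```

Under these assumptions the original stride of 2 would leave a 19x19 map after conv_4, while the modified stride of 1 keeps it at 38x38, which is the size of the first prediction feature map.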

Next, the 5 additional blocks are built by calling the _build_additional_features function:

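from torch import nn  # needed for the layers below


def _build_additional_features(self, input_size):
    """
    Append a series of extra convolutional blocks to the backbone (resnet50),
    giving the corresponding series of additional feature extractors.
    :param input_size: channel counts of the prediction feature maps, in order
    :return: None; the blocks are stored in self.additional_blocks
    """
    additional_blocks = []
    # input_size = [1024, 512, 512, 256, 256, 256] for resnet50
    # channels between the two Conv2d layers of each block, starting from Feature Map 2
    middle_channels = [256, 256, 128, 128, 128]
    # slices are half-open and 0-indexed: input_ch runs from the first element to the
    # second-to-last, output_ch runs from the second element to the last
    for i, (input_ch, output_ch, middle_ch) in enumerate(zip(input_size[:-1], input_size[1:], middle_channels)):
        # all 5 blocks share the same structure; only the stride and padding of the
        # second conv differ: the first three blocks use (1, 2), the last two use (0, 1)
        padding, stride = (1, 2) if i < 3 else (0, 1)
        layer = nn.Sequential(
            nn.Conv2d(input_ch, middle_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(middle_ch),
            nn.ReLU(inplace=True),
            # the second conv takes the first conv's output channels (middle_ch) as input
            nn.Conv2d(middle_ch, output_ch, kernel_size=3, padding=padding, stride=stride, bias=False),
            nn.BatchNorm2d(output_ch),
            nn.ReLU(inplace=True),
        )
        additional_blocks.append(layer)
    self.additional_blocks = nn.ModuleList(additional_blocks)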
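
To see what the zip and the padding/stride switch in the function above produce, the channel triples and resulting feature map sizes can be tabulated without building any layers. This is a minimal sketch assuming the first feature map is 38x38 and the channel list from the comment above; `additional_block_plan` is a hypothetical helper, not a function from the referenced code.

```python
input_size = [1024, 512, 512, 256, 256, 256]   # channel list from the comment above
middle_channels = [256, 256, 128, 128, 128]


def additional_block_plan(start_size=38):
    """Return (in_ch, mid_ch, out_ch, out_size) for each additional block."""
    plan = []
    size = start_size  # assumed side length of the conv_4 feature map
    for i, (in_ch, out_ch, mid_ch) in enumerate(
            zip(input_size[:-1], input_size[1:], middle_channels)):
        padding, stride = (1, 2) if i < 3 else (0, 1)
        # the 1x1 conv keeps the size; only the 3x3 conv changes it
        size = (size + 2 * padding - 3) // stride + 1
        plan.append((in_ch, mid_ch, out_ch, size))
    return plan


for row in additional_block_plan():
    print(row)
```

Under these assumptions the side length goes 38 -> 19 -> 10 -> 5 -> 3 -> 1: the first three blocks roughly halve the map with stride 2, and the last two shrink it by 2 pixels each because the 3x3 convolution has no padding.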

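With the network built, the next step is constructing the predictors.

Take a look at its forward function.

detection_features: the list that stores the feature maps used for prediction.

The bbox_view method is used to obtain the loc and conf parameters.

In training mode, the loss also has to be computed.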

from typing import List

import torch
from torch import Tensor


def forward(self, image, targets=None):
    # pass the image through feature_extractor to get the feature map;
    # the output at conv4_x is 38x38x1024
    x = self.feature_extractor(image)

    # Feature Map 38x38x1024, 19x19x512, 10x10x512, 5x5x256, 3x3x256, 1x1x256
    # detection_features: list storing the prediction feature maps
    detection_features = torch.jit.annotate(List[Tensor], [])  # [x]
    detection_features.append(x)
    for layer in self.additional_blocks:
        # feed the previous layer's output into the current layer to get its output
        x = layer(x)
        detection_features.append(x)

    # Feature Map 38x38x4, 19x19x6, 10x10x6, 5x5x6, 3x3x4, 1x1x4
    # bbox_view returns the loc and conf parameters
    locs, confs = self.bbox_view(detection_features, self.loc, self.conf)

    # For SSD 300, shall return nbatch x 8732 x {nlabels, nlocs} results
    # 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732
    # in training mode, go on to compute the loss of the predicted parameters
    if self.training:
        if targets is None:
            raise ValueError("In training mode, targets should be passed")
        # bboxes_out (Tensor 8732 x 4), labels_out (Tensor 8732)
        bboxes_out = targets['boxes']
        bboxes_out = bboxes_out.transpose(1, 2).contiguous()
        # print(bboxes_out.is_contiguous())
        labels_out = targets['labels']
        # print(labels_out.is_contiguous())

        # ploc, plabel, gloc, glabel
        loss = self.compute_loss(locs, confs, bboxes_out, labels_out)
        return {"total_losses": loss}

    # outside training mode, run post-processing: apply the predicted regression
    # parameters to the default boxes to get the final predicted boxes, then run
    # non-maximum suppression to filter out overlapping boxes
    # results = self.encoder.decode_batch(locs, confs)
    results = self.postprocess(locs, confs)
    return results
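
The 8732 figure in the comments above can be verified with a few lines of arithmetic. This is a minimal sketch that assumes the six feature map sizes and the per-cell default box counts listed in those comments; the variable and function names are illustrative and do not come from the referenced code.

```python
feature_map_sizes = [38, 19, 10, 5, 3, 1]   # side length of each prediction feature map
boxes_per_cell = [4, 6, 6, 6, 4, 4]         # default boxes predicted at each cell


def count_default_boxes(sizes, per_cell):
    """Number of default boxes contributed by each feature map."""
    return [s * s * n for s, n in zip(sizes, per_cell)]


per_layer = count_default_boxes(feature_map_sizes, boxes_per_cell)
total = sum(per_layer)
print(per_layer)
print(total)
```

The first 38x38 map contributes well over half of all default boxes, which is why keeping it at 38x38 (rather than 19x19) matters for the total.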

The bbox_view method: