(20) Semantic Segmentation--STDC--Principles

Article link: https://blog.csdn.net/chencaw/article/details/127409468

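This post takes a close look at the STDC network, an efficient architecture for real-time semantic segmentation. The paper's authors rework BiSeNet's multi-path structure and propose the STDC module, which cuts computation while still extracting multi-scale features. The STDC network performs well on the Cityscapes and CamVid datasets and strikes a good balance between speed and accuracy. The post also walks through the implementation, covering how the individual modules work, with code.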


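1. Main references

(1) Blog posts consulted

[Semantic Segmentation] STDC-Seg: fast and strong, plus supervision on detail edges (农夫山泉2号's blog, CSDN)

[CVPR 2021 Semantic Segmentation] The STDC segmentation network: a lightweight, strengthened BiSeNet (Zhihu)

[STDC] "Rethinking BiSeNet For Real-time Semantic Segmentation" (bryant_meng's blog, CSDN)

(2) GitHub repository

https://github.com/chenjun2hao/STDC-Seg

(3) Paper download link

https://openaccess.thecvf.com/content/CVPR2021/papers/Fan_Rethinking_BiSeNet_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf

(4) Author affiliation:

Meituan. Perhaps this is for food-delivery robots?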

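2. Main principles

2.1 Paper title

Rethinking BiSeNet For Real-time Semantic Segmentation

The word "Rethinking" says it all.

For my notes on BiSeNet V1 and V2, see the post below:

(24) Semantic Segmentation--BiSeNetV1 and BiSeNetV2 (chencaw's blog, CSDN)

2.2 A first look at the abstract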

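BiSeNet [28,27] has proven to be a popular two-stream network for real-time segmentation. However, its approach of adding an extra path to encode spatial information is time-consuming. In addition, its backbone is borrowed from networks pretrained on other tasks, such as image classification, and may be inefficient for image segmentation because it lacks task-specific design.

To address these problems, we propose a novel and efficient structure, the Short-Term Dense Concatenate network (STDC network), which removes structural redundancy.

Specifically, we gradually reduce the dimension of the feature maps and use their aggregation to represent the image; this forms the basic module of the STDC network. In the decoder, we propose a Detail Aggregation module that integrates the learning of spatial information into the low-level layers in a single-stream manner. Finally, the low-level and deep features are fused to predict the final segmentation result.

Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method, which achieves a good trade-off between segmentation accuracy and inference speed.

On Cityscapes, we achieve 71.9% mIoU on the test set at 250.4 FPS on an NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and 76.8% mIoU at 97.0 FPS when inferring on higher-resolution images.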
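To put the two throughput figures in perspective, the short sketch below converts frames per second into per-frame latency. The helper name `latency_ms` is hypothetical and the conversion is plain arithmetic; it says nothing about how the paper measured speed.

```python
def latency_ms(fps):
    """Average time per frame, in milliseconds, for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# The two operating points quoted in the abstract.
for fps in (250.4, 97.0):
    print(f"{fps} FPS -> {latency_ms(fps):.2f} ms per frame")
```

Both operating points stay well under the roughly 33 ms budget of a 30 FPS video stream.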

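2.3 Research background

(1) Current real-time semantic segmentation methods:

First, some works, such as DFANet [18] and BiSeNetV1 [28], choose a lightweight backbone and study feature fusion or aggregation modules to compensate for the drop in accuracy. The drawbacks are that encoding spatial information through an extra path is time-consuming, and that a backbone pretrained for classification may be inefficient for segmentation.

Second, other works reduce the input resolution, which speeds up inference but tends to lose detail around boundaries and small objects.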

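2.4 The STDC network

The authors propose the STDC module, which extracts multi-scale features with relatively few parameters and plugs easily into U-Net-style semantic segmentation networks. They also rework BiSeNet's multi-path structure so that low-level detail features are extracted while the network's computation is reduced.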

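2.4.1 How the STDC module works

(1) The general STDC network architecture is shown in figure (a) below.

(2) The STDC module proposed in the paper, without stride=2 (that is, without downsampling), is shown in figure (b).

In the figures, ConvX denotes a "convolution + BN + ReLU" operation, M is the number of input feature channels, and N is the number of output feature channels. Each block is a ConvX operation with its own kernel size.

The passages below are quoted from another author's translation:

[CVPR 2021 Semantic Segmentation] The STDC segmentation network: a lightweight, strengthened BiSeNet (Zhihu)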

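In the STDC module, the first block uses a 1×1 convolution kernel and the remaining blocks use 3×3 kernels.

If the final output of the STDC module has N channels, then the i-th block in the module outputs $N/2^{i}$ channels, except for the last block, whose output channel count equals that of the second-to-last block.

Unlike traditional backbones, the STDC module uses fewer feature channels in its deeper blocks and more in its shallower ones. The authors argue that shallow layers need more channels to encode fine detail, whereas deep layers focus on high-level semantic information, where too many channels would only add redundancy.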

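The final output of the STDC module is the fusion of the block outputs, that is,

$$x_{output} = F(x_1, x_2, \ldots, x_n)$$

In the equation above, $F$ is the fusion function, $x_1, x_2, \ldots, x_n$ are the outputs of the n blocks, and $x_{output}$ is the output of the STDC module. The features of the n blocks are fused by concatenation.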
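As a quick check of the channel rule above, the sketch below lists the per-block output channels for a module with N output channels and n blocks, and confirms that concatenating them gives N channels again. The function name `stdc_block_channels` is hypothetical, chosen for illustration; it mirrors the rule stated in the text and is not a function from the authors' repository.

```python
def stdc_block_channels(n_out, num_blocks):
    """Output channels of each block in an STDC module.

    Block i (1-based) outputs n_out / 2**i channels, except the last
    block, which repeats the width of the second-to-last block.
    """
    if num_blocks < 2:
        raise ValueError("an STDC module needs at least 2 blocks")
    if n_out % (2 ** (num_blocks - 1)) != 0:
        raise ValueError("n_out must be divisible by 2**(num_blocks - 1)")
    widths = [n_out // 2 ** i for i in range(1, num_blocks)]
    widths.append(widths[-1])  # the last block matches the previous one
    return widths


widths = stdc_block_channels(256, 4)
print(widths)       # per-block output channels
print(sum(widths))  # channel count after concatenation
```

Because the last block repeats the previous width, the widths always sum to N, so concatenation needs no extra projection layer to reach the target channel count.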

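(3) The STDC module proposed in the paper, with stride=2, is shown in figure (c).

Note: in that figure, Block2 has stride=2, and the AVG Pool also has stride=2.

In the stride=2 version of the STDC module, downsampling happens in Block2. To keep the feature map sizes consistent at fusion time, the larger feature map is downsampled with a 3×3 average pooling of stride 2.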
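The size matching described above can be checked with the standard output-size formula for convolution and pooling layers, floor((size + 2*padding - kernel) / stride) + 1. The sketch below is plain arithmetic under the assumption that every 3×3 layer uses padding 1; the helper names `out_size` and `stride2_module_sizes` are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stride2_module_sizes(size):
    """Feature map sizes inside a stride-2 STDC module for a square input."""
    block1 = out_size(size, kernel=1, stride=1, padding=0)    # 1x1 ConvX, full size
    block2 = out_size(block1, kernel=3, stride=2, padding=1)  # stride-2 3x3 step
    block3 = out_size(block2, kernel=3, stride=1, padding=1)  # later blocks keep the size
    pooled = out_size(block1, kernel=3, stride=2, padding=1)  # 3x3 AvgPool, stride 2, on block 1
    return block1, block2, block3, pooled


print(stride2_module_sizes(64))
print(stride2_module_sizes(65))
```

The pooled copy of the first block's output has the same size as the downsampled blocks for both even and odd inputs, which is what makes the concatenation valid.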

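The STDC module has two characteristics: (1) the number of feature channels shrinks as the module goes deeper, which reduces computation; (2) the module's output fuses the output feature maps of several blocks and therefore carries multi-scale information.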
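To get a rough feel for the first point, the sketch below counts convolution weights in a stride-1 STDC module and compares them with a single plain 3×3 convolution from M to N channels. This is an illustrative estimate under stated assumptions: BatchNorm parameters and biases are ignored, the extra layers of the stride-2 variant are left out, and the helper names are hypothetical.

```python
def conv_weights(c_in, c_out, kernel):
    """Weight count of a bias-free 2D convolution."""
    return c_in * c_out * kernel * kernel


def stdc_module_weights(m_in, n_out, num_blocks=4):
    """Conv weights of an STDC module: one 1x1 block, then 3x3 blocks."""
    widths = [n_out // 2 ** i for i in range(1, num_blocks)]
    widths.append(widths[-1])  # the last block repeats the previous width
    total = conv_weights(m_in, widths[0], 1)
    for c_in, c_out in zip(widths, widths[1:]):
        total += conv_weights(c_in, c_out, 3)
    return total


stdc = stdc_module_weights(256, 256, num_blocks=4)
plain = conv_weights(256, 256, 3)
print(stdc, plain, round(stdc / plain, 3))
```

With M = N = 256 and four blocks, the module uses roughly a quarter of the weights of the plain 3×3 layer while still exposing features from four different receptive fields.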

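2.4.2 Structure of the network

(1) The figure below shows the STDC network built from STDC modules; it is figure (a) mentioned earlier.

The network has six stages. Stage1 through Stage5 each downsample the feature map with stride 2, and Stage6 outputs the prediction.

To reduce computation, Stage1 and Stage2 each use only one convolution layer. Stage3 through Stage5 each contain several STDC modules; the first STDC module in a stage performs the downsampling, and the rest keep the feature map size unchanged.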
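The stage layout described above can be sketched with two small helpers: one tracks the feature map size through the five stride-2 stages, the other lists the stride of every STDC module in Stage3 through Stage5. The helper names are hypothetical, the size formula assumes 3×3 layers with padding 1, and `[4, 5, 3]` is taken from the default `layers` argument of `STDCNet1446` in the code listed later in this post.

```python
def stage_sizes(input_size, num_stages=5):
    """Feature map size after each stride-2 stage (3x3 kernel, padding 1)."""
    sizes = []
    size = input_size
    for _ in range(num_stages):
        size = (size + 2 * 1 - 3) // 2 + 1
        sizes.append(size)
    return sizes


def module_strides(layers):
    """Per-stage strides: the first module downsamples, the rest keep the size."""
    return [[2] + [1] * (count - 1) for count in layers]


print(stage_sizes(224))           # sizes after Stage1..Stage5
print(module_strides([4, 5, 3]))  # strides of the modules in Stage3..Stage5
```

Five stride-2 stages give an overall downsampling factor of 32, so a 224×224 input reaches Stage5 as a 7×7 feature map.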

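Using the framework in the figure above, the authors build two STDC networks, named STDC1 and STDC2. Their structures are given in the table below:

In the table, ConvX denotes a "convolution + BN + ReLU" operation, and Stage3 through Stage5 each consist of several STDC modules. KSize is the kernel size, S the stride, R the number of repetitions, and C the number of output channels.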

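2.5 Classification test with the STDC network

The test uses the example below; see my earlier tutorial:

(2) pokeman_a simple convolutional classification example (chencaw's blog, CSDN)

2.5.1 Using the authors' network

(1) The simple file stdcnet.py, provided in the authors' GitHub repository; no changes needed apart from one extra import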

import torch
import torch.nn as nn
from torch.nn import init
import math
from torch.nn import functional as F  # extra import, added by Chen on 2022-11-04


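# ConvX: convolution + BatchNorm + ReLU. padding=kernel//2 keeps the spatial
# size unchanged for odd kernels when stride=1.
class ConvX(nn.Module):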
    def __init__(self, in_planes, out_planes, kernel=3, stride=1):
        super(ConvX, self).__init__()
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel, stride=stride, padding=kernel//2, bias=False)
        self.bn = nn.BatchNorm2d(out_planes)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn(self.conv(x)))
        return out


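# STDC module variant: block outputs are concatenated, then added to a skip branch.
class AddBottleneck(nn.Module):
    def __init__(self, in_planes, out_planes, block_num=3, stride=1):
        super(AddBottleneck, self).__init__()
        # The assertion message must be a string; print(...) would evaluate to None.
        assert block_num > 1, "block number should be larger than 1."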
        self.conv_list = nn.ModuleList()
        self.stride = stride
        if stride == 2:
            self.avd_layer = nn.Sequential(
                nn.Conv2d(out_planes//2, out_planes//2, kernel_size=3, stride=2, padding=1, groups=out_planes//2, bias=False),
                nn.BatchNorm2d(out_planes//2),
            )
            self.skip = nn.Sequential(
                nn.Conv2d(in_planes, in_planes, kernel_size=3, stride=2, padding=1, groups=in_planes, bias=False),
                nn.BatchNorm2d(in_planes),
                nn.Conv2d(in_planes, out_planes, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_planes),
            )
            stride = 1

        for idx in range(block_num):
            if idx == 0:
                self.conv_list.append(ConvX(in_planes, out_planes//2, kernel=1))
            elif idx == 1 and block_num == 2:
                self.conv_list.append(ConvX(out_planes//2, out_planes//2, stride=stride))
            elif idx == 1 and block_num > 2:
                self.conv_list.append(ConvX(out_planes//2, out_planes//4, stride=stride))
            elif idx < block_num - 1:
                self.conv_list.append(ConvX(out_planes//int(math.pow(2, idx)), out_planes//int(math.pow(2, idx+1))))
            else:
                self.conv_list.append(ConvX(out_planes//int(math.pow(2, idx)), out_planes//int(math.pow(2, idx))))
            
    def forward(self, x):
        out_list = []
        out = x

        for idx, conv in enumerate(self.conv_list):
            if idx == 0 and self.stride == 2:
                out = self.avd_layer(conv(out))
            else:
                out = conv(out)
            out_list.append(out)

        if self.stride == 2:
            x = self.skip(x)

        return torch.cat(out_list, dim=1) + x



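# STDC module variant: block outputs are fused by channel concatenation only.
class CatBottleneck(nn.Module):
    def __init__(self, in_planes, out_planes, block_num=3, stride=1):
        super(CatBottleneck, self).__init__()
        # The assertion message must be a string; print(...) would evaluate to None.
        assert block_num > 1, "block number should be larger than 1."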
        self.conv_list = nn.ModuleList()
        self.stride = stride
        if stride == 2:
            self.avd_layer = nn.Sequential(
                nn.Conv2d(out_planes//2, out_planes//2, kernel_size=3, stride=2, padding=1, groups=out_planes//2, bias=False),
                nn.BatchNorm2d(out_planes//2),
            )
            self.skip = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)
            stride = 1

        for idx in range(block_num):
            if idx == 0:
                self.conv_list.append(ConvX(in_planes, out_planes//2, kernel=1))
            elif idx == 1 and block_num == 2:
                self.conv_list.append(ConvX(out_planes//2, out_planes//2, stride=stride))
            elif idx == 1 and block_num > 2:
                self.conv_list.append(ConvX(out_planes//2, out_planes//4, stride=stride))
            elif idx < block_num - 1:
                self.conv_list.append(ConvX(out_planes//int(math.pow(2, idx)), out_planes//int(math.pow(2, idx+1))))
            else:
                self.conv_list.append(ConvX(out_planes//int(math.pow(2, idx)), out_planes//int(math.pow(2, idx))))
            
    def forward(self, x):
        out_list = []
        out1 = self.conv_list[0](x)

        for idx, conv in enumerate(self.conv_list[1:]):
            if idx == 0:
                if self.stride == 2:
                    out = conv(self.avd_layer(out1))
                else:
                    out = conv(out1)
            else:
                out = conv(out)
            out_list.append(out)

        if self.stride == 2:
            out1 = self.skip(out1)
        out_list.insert(0, out1)

        out = torch.cat(out_list, dim=1)
        return out

# STDC2 network
class STDCNet1446(nn.Module):
    def __init__(self, base=64, layers=[4,5,3], block_num=4, type="cat", num_classes=1000, dropout=0.20, pretrain_model='', use_conv_last=False):
        super(STDCNet1446, self).__init__()
        if type == "cat":
            block = CatBottleneck
        elif type == "add":
            block = AddBottleneck
        self.use_conv_last = use_conv_last
        self.features = self._make_layers(base, layers, block_num, block)
        self.conv_last = ConvX(base*16, max(1024, base*16), 1, 1)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(max(1024, base*16), max(1024, base*16), bias=False)
        self.bn = nn.BatchNorm1d(max(1024, base*16))
        self.