(24) Semantic Segmentation: BiSeNetV1 and BiSeNetV2

chencaw

Last modified: 2022-11-03 14:41:03

First published: 2022-10-22 09:50:04

Article link: https://blog.csdn.net/chencaw/article/details/127456996

1. Main references

(1) GitHub repository

https://github.com/CoinCheung/BiSeNet

(2) HUST (Huazhong University of Science and Technology) really is a good school

2. Testing the model

(1) Download the code from GitHub

(2) Download the weights

(3) Run a quick test

(1) BiSeNetV1

python tools/demo.py --config configs/bisenetv1_city.py --weight-path D:/pytorch_learning2022/5chen_segement_test2022/BiSeNet/weight/model_final_v1_city_new.pth  --img-path D:/pytorch_learning2022/5chen_segement_test2022/BiSeNet/example.png

(2) BiSeNetV2

python tools/demo.py --config configs/bisenetv2_city.py --weight-path D:/pytorch_learning2022/5chen_segement_test2022/BiSeNet/weight/model_final_v2_city.pth  --img-path D:/pytorch_learning2022/5chen_segement_test2022/BiSeNet/chen_test1.png

3. How BiSeNetV1 works

3.1 The paper

(1) Paper download link

https://arxiv.org/abs/1808.00897

(2) Title

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

(3) Chinese translation of the title (haha, it already tells you something)

BiSeNet：用于实时语义分割的双边分割网络

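3.2 The idea of the paper

Brief summary by Chen, 2022-10-23

(i) Semantic segmentation has two main requirements: sufficient spatial information (shallow layers) and good semantic information (deep layers, in other words a large receptive field).

(ii) After reviewing how traditional methods handle this, the paper adopts two branch networks: (1) one branch uses a few shallow convolutions to keep rich spatial information (Spatial Path), downsampling to 1/8; (2) the other branch uses fast downsampling (any lightweight backbone will do) to obtain a large receptive field (Context Path), downsampling to 1/32; (3) finally, the outputs of the two branches are fused.

(iii) This approach decouples spatial information from semantic information (receptive field)!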
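The two downsampling ratios can be made concrete with a little arithmetic. This is only a sketch: the 1024x2048 input size and the helper name `branch_sizes` are assumptions chosen for illustration, and it assumes the Context Path output is upsampled back to the Spatial Path resolution before the two are fused.

```python
def branch_sizes(height, width):
    """Return (spatial_hw, context_hw) for a two-branch BiSeNet-style network."""
    spatial = (height // 8, width // 8)     # Spatial Path: 1/8 of the input
    context = (height // 32, width // 32)   # Context Path: 1/32 of the input
    return spatial, context

# Hypothetical input size, not taken from the post
spatial, context = branch_sizes(1024, 2048)

# Factor by which the Context Path output would have to be upsampled
# to match the Spatial Path resolution (assumed alignment step)
upsample = spatial[0] // context[0]

print(spatial, context, upsample)  # (128, 256) (32, 64) 4
```

The Spatial Path keeps 16 times as many spatial positions as the deepest Context Path feature map, which is why it is the branch that preserves detail.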

The idea is illustrated below:

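3.3 Problems in semantic segmentation and traditional solutions

PS: speed obviously matters for semantic segmentation; talking about accuracy while ignoring speed is just messing around......

The paper summarizes three ways to speed up semantic segmentation:

(i) Crop or resize the input image to reduce the amount of computation. This is simple and effective, but it loses a lot of spatial information, especially around edges.

(ii) Prune the network channels to speed up inference. When the pruning happens in the early stages of the network in particular, spatial information is lost.

(iii) Drop the last stage of the network to get an extremely compact model, as ENet does. However, giving up the downsampling in the last stage leaves the receptive field too small to cover large objects, which leads to poor discriminative ability.

All three methods trade accuracy for speed.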

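3.4 Problems with the traditional U-shaped network

To make up for the loss of spatial detail described above, researchers widely use the U-shaped structure (PS: for example the familiar Unet and Unet2). By fusing the hierarchical features of the backbone, the U-shaped structure gradually restores spatial resolution and fills in some of the missing details. However, this technique also has two weaknesses:

(1) Because extra high-resolution feature maps are introduced, computation slows down.

(2) Most of the spatial information lost during pruning cannot be easily recovered through the shallow layers, as shown in the figure below.

The translation of the sentence below is really unreliable.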

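3.5 Our (the paper's) semantic segmentation method: BiSeNet

(i) The network structure is shown in the figure below:

(ii) The paper's three selling points:

(1) It proposes a novel approach that uses two paths (Spatial Path and Context Path) to decouple spatial information from a large receptive field.

(2) It designs two dedicated modules, the Feature Fusion Module (FFM) and the Attention Refinement Module (ARM), which improve accuracy at a modest extra computational cost.

(3) It achieves strong results on three datasets.

(iii) The overall network structure is as follows: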

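3.6 Implementing the Spatial Path

(1) In existing work, some approaches use dilated convolution to preserve the resolution of the input image, while others use a pyramid pooling module, ASPP (atrous spatial pyramid pooling), or large kernels to obtain a sufficient receptive field. (PS: these probably correspond to DeepLab v1, v2 and v3.)

These studies also show that both spatial information and the receptive field matter for high accuracy, but satisfying both at once is very hard, especially when real-time semantic segmentation is required.

The Spatial Path proposed in the paper contains three convolution layers, each with stride=2 and followed by BN and ReLU. The path therefore outputs a feature map at 1/8 of the original resolution, which retains rich spatial information. (PS: a 1/8 feature map is still fairly large.)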
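The 1/8 figure can be checked by hand with the standard convolution output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below is illustrative only: the kernel, stride and padding values are read from the SpatialPath listing in item (5) of this section (which adds a fourth 1x1 convolution), and the helper names are hypothetical.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output length of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of conv1, conv2, conv3, conv_out in SpatialPath
SPATIAL_PATH_LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (1, 1, 0)]

def spatial_path_size(height, width):
    """Propagate an input size through the four Spatial Path convolutions."""
    for k, s, p in SPATIAL_PATH_LAYERS:
        height = conv_out_size(height, k, s, p)
        width = conv_out_size(width, k, s, p)
    return height, width

# Same spatial size as the test input used in the listing: 640 x 480
print(spatial_path_size(640, 480))  # (80, 60)
```

Each stride-2 layer halves the size (640 to 320 to 160 to 80), and the final 1x1 convolution only changes the channel count, so the output is exactly 1/8 of the input.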

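(2) The corresponding figure from the paper:

(3) Chen redrew it from the model, 2022-10-24. PS: the implementation turns out to have 4 convolution layers (three stride-2 convolutions plus a 1x1 output convolution)

PS: the first convolution layer is the same as in deeplabv3

(4) Exported by Chen from the source code, 2022-10-24

(5) The author's corresponding implementation. PS: it includes the export code I used for testing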

import torch
import torch.nn as nn
import torch.onnx
from torch.nn import BatchNorm2d

# from resnet import Resnet18

# Print the network structure clearly
from torchinfo import summary

# Export with inferred tensor shapes
import onnx
import onnx.shape_inference

class ConvBNReLU(nn.Module):

    def __init__(self, in_chan, out_chan, ks=3, stride=1, padding=1, *args, **kwargs):
        super(ConvBNReLU, self).__init__()
        self.conv = nn.Conv2d(in_chan,
                out_chan,
                kernel_size = ks,
                stride = stride,
                padding = padding,
                bias = False)
        self.bn = BatchNorm2d(out_chan)
        self.relu = nn.ReLU(inplace=True)
        self.init_weight()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

    def init_weight(self):
        for ly in self.children():
            if isinstance(ly, nn.Conv2d):
                nn.init.kaiming_normal_(ly.weight, a=1)
                if ly.bias is not None:
                    nn.init.constant_(ly.bias, 0)


class SpatialPath(nn.Module):
    def __init__(self, *args, **kwargs):
        super(SpatialPath, self).__init__()
        self.conv1 = ConvBNReLU(3, 64, ks=7, stride=2, padding=3)
        self.conv2 = ConvBNReLU(64, 64, ks=3, stride=2, padding=1)
        self.conv3 = ConvBNReLU(64, 64, ks=3, stride=2, padding=1)
        self.conv_out = ConvBNReLU(64, 128, ks=1, stride=1, padding=0)
        self.init_weight()

    def forward(self, x):
        feat = self.conv1(x)
        feat = self.conv2(feat)
        feat = self.conv3(feat)
        feat = self.conv_out(feat)
        return feat

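    def init_weight(self):
        # Note: self.children() yields ConvBNReLU wrappers rather than nn.Conv2d,
        # so this loop matches nothing here; each ConvBNReLU already initializes
        # its own convolution in its constructor.
        for ly in self.children():
            if isinstance(ly, nn.Conv2d):
                nn.init.kaiming_normal_(ly.weight, a=1)
                if ly.bias is not None:
                    nn.init.constant_(ly.bias, 0)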

    def get_params(self):
        wd_params, nowd_params = [], []
        for name, module in self.named_modules():
            if isinstance(module, nn.Linear) or isinstance(module, nn.Conv2d):
                wd_params.append(module.weight)
                if module.bias is not None:
                    nowd_params.append(module.bias)
            elif isinstance(module, nn.modules.batchnorm._BatchNorm):
                nowd_params += list(module.parameters())
        return wd_params, nowd_params

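def save_onnx(model, x, model_file_name):
    # Export the model to ONNX, tracing it with the example input x
    torch.onnx.export(model, x,
                      model_file_name,
                      export_params=True,
                      verbose=True)

def save_scale_onnx(model_file_name):
    # Run ONNX shape inference and overwrite the file so that the
    # intermediate tensor shapes are stored in the graph
    inferred = onnx.shape_inference.infer_shapes(onnx.load(model_file_name))
    onnx.save(inferred, model_file_name)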

if __name__ == "__main__":

    sp_net = SpatialPath()
    x = torch.randn(16, 3, 640, 480)
    sp_out = sp_net(x)
    print(sp_out.shape)
    sp_net.get_params()

    model_file_name = "D:/pytorch_learning2022/5chen_segement_test2022/BiSeNet/chentest_print_mode/chen_sp.onnx"
    # Print the network structure
    summary(sp_net, input_size=(16, 3, 640, 480))
    # Save as ONNX
    save_onnx(sp_net,x,model_file_name)
    # Save as ONNX with tensor shapes
    save_scale_onnx(model_file_name)

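3.7 The Context Path

The Context Path is designed to provide a sufficient receptive field. The Context Path in this paper combines a lightweight model with global average pooling.

(1) A lightweight model such as Xception downsamples quickly and therefore obtains a large receptive field. PS: the code actually uses resnet18.

(2) The paper then appends a global average pooling layer to the tail of the lightweight model, which gives the Context Path the largest possible receptive field.

(3) Finally, the outputs are fused through the ARM and FFM modules.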
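To see why fast downsampling buys a large receptive field, the theoretical receptive field can be computed with the usual recurrence: each layer adds (kernel - 1) times the accumulated stride. The sketch below is an illustration under stated assumptions: the layer configurations are read from the SpatialPath and Resnet18 listings in this post, only the main (non-shortcut) branch of each residual block is counted, and the function names are hypothetical.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Returns (receptive_field, total_stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth scales with the accumulated stride
        jump *= stride
    return rf, jump

# Spatial Path: conv1 (7x7, s2), conv2 (3x3, s2), conv3 (3x3, s2), conv_out (1x1, s1)
spatial_path = [(7, 2), (3, 2), (3, 2), (1, 1)]

def basic_stage(stride):
    # Two BasicBlocks, each with two 3x3 convs; only the first conv carries the stride
    return [(3, stride), (3, 1), (3, 1), (3, 1)]

# ResNet18 stem (7x7 conv s2, 3x3 max-pool s2) followed by layer1..layer4
resnet18 = [(7, 2), (3, 2)] + basic_stage(1) + basic_stage(2) + basic_stage(2) + basic_stage(2)

print(receptive_field(spatial_path))  # (19, 8)
print(receptive_field(resnet18))      # (435, 32)
```

Under these assumptions the Spatial Path sees a 19-pixel window at stride 8, while the ResNet18 backbone reaches 435 pixels at stride 32. The global average pooling at the tail then extends this to the whole feature map.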

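3.8 The ResNet18 network used

For details of ResNet18, see

(26) resnet18 and resnet50, the basic networks that may be used everywhere (chencaw's blog, CSDN)

(1) The ResNet module used here is shown below; it has to be modified

(2) Diagram of resnet18 (PS: hand-drawn)

(3) Exported diagram of resnet18

(4) Corresponding test code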

#!/usr/bin/python
# -*- encoding: utf-8 -*-

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.model_zoo as modelzoo

resnet18_url = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'

from torch.nn import BatchNorm2d
# Print the network structure clearly
from torchinfo import summary

# Save as ONNX
import torch.onnx
from torch.autograd import Variable

# Export with inferred tensor shapes
import onnx
# from onnx import shape_inference


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    def __init__(self, in_chan, out_chan, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_chan, out_chan, stride)
        self.bn1 = BatchNorm2d(out_chan)
        self.conv2 = conv3x3(out_chan, out_chan)
        self.bn2 = BatchNorm2d(out_chan)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = None
        if in_chan != out_chan or stride != 1:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_chan, out_chan,
                          kernel_size=1, stride=stride, bias=False),
                BatchNorm2d(out_chan),
                )

    def forward(self, x):
        residual = self.conv1(x)
        residual = self.bn1(residual)
        residual = self.relu(residual)
        residual = self.conv2(residual)
        residual = self.bn2(residual)

        shortcut = x
        if self.downsample is not None:
            shortcut = self.downsample(x)

        out = shortcut + residual
        out = self.relu(out)
        return out


def create_layer_basic(in_chan, out_chan, bnum, stride=1):
    layers = [BasicBlock(in_chan, out_chan, stride=stride)]
    for i in range(bnum-1):
        layers.append(BasicBlock(out_chan, out_chan, stride=1))
    return nn.Sequential(*layers)


class Resnet18(nn.Module):
    def __init__(self):
        super(Resnet18, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = create_layer_basic(64, 64, bnum=2, stride=1)
        self.layer2 = create_layer_basic(64, 128, bnum=2, stride=2)
        self.layer3 = create_layer_basic(128, 256, bnum=2, stride=2)
        self.layer4 = create_layer_basic(256, 512, bnum=2, stride=2)
        self.init_weight()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        feat8 = self.layer2(x) # 1/8
        feat16 = self.layer3(feat8) # 1/16
        feat32 = self.layer4(feat16) # 1/32
        return feat8, feat16, feat32

    def init_weight(self):
        state_dict = modelzoo.load_url(resnet18_url)
        self_state_dict = self.state_dict()
        for k, v in state_dict.items():
            if 'fc' in k: continue
            self_state_dict.update({k: v})
        self.load_state_dict(self_state_dict)

    def get_params(self):
        wd_params, nowd_params = [], []
        for name, module in self.named_modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                wd_params.append(module.weight)
                if module.bias is not None:
                    nowd_params.append(module.bias)
            elif isinstance(module, nn.modules.batchnorm._BatchNorm):
                nowd_params += list(module.parameters())
        return wd_params, nowd_params


def save_onnx(model,x,model_file_name):
    torch_out = torch.onnx.export(model, x, 
                              model_file_name,
                               export_params=True,
                               verbose=True)

def save_scale_onnx