Semantic Segmentation: The DeepLab v1/v2/v3 Family of Network Models

花花少年

Last modified: 2024-01-30 18:23:53

Category: Deep Learning. Tags: DeepLab, semantic segmentation

First published: 2024-01-26 12:17:03

Article link: https://blog.csdn.net/m0_37605642/article/details/135863251

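Important note: this post is compiled from material found online and only records the blogger's own learning of the topic. It will be removed on request in case of infringement.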

I. References

Classic semantic segmentation network models

II. The DeepLab family of network models

1. DeepLab v1

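Original paper: [1]

A brief analysis of the DeepLabV1 network

bilibili video walkthrough: Introduction to the DeepLabV1 network (semantic segmentation)

DeepLab v1 also has a multi-scale variant (DeepLab-MSc), which adds multi-scale features on top of the LargeFOV configuration.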

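1.1 Introduction

For semantic segmentation, the paper identifies two problems: reduced resolution caused by signal downsampling, and spatial "insensitivity".

Reduced resolution from signal downsampling. The authors attribute this mainly to max pooling. To address it, they introduce the 'atrous' (with holes) algorithm, also known as atrous or dilated convolution.
Spatial "insensitivity". The authors attribute this to the classifier itself, which inherently has a degree of spatial invariance. To address it, they use a fully connected CRF (Conditional Random Field). This method is used only in DeepLab v1 and v2 and is dropped from v3 onward; it is also fairly time-consuming.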
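To make the effect of atrous convolution concrete, here is a small sketch of the arithmetic involved. The helper names `effective_kernel_size`, `same_padding`, and `conv_output_size` are made up for this illustration; the output-size formula is the one documented for PyTorch's `nn.Conv2d`, and the 28-pixel feature map is just an example value.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: holes are inserted between the taps."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size unchanged for stride 1 and odd kernels."""
    return dilation * (kernel_size - 1) // 2


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length along one spatial axis (same formula as nn.Conv2d)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


if __name__ == "__main__":
    # A 3x3 kernel always has 9 weights; only the area it spans grows with the rate.
    for rate in (1, 2, 4, 12):
        k_eff = effective_kernel_size(3, rate)
        pad = same_padding(3, rate)
        out = conv_output_size(28, 3, stride=1, padding=pad, dilation=rate)
        print(f"rate={rate:2d}  spans {k_eff}x{k_eff}  padding={pad}  output size={out}")
```

With stride 1 and padding equal to the dilation rate, the feature map keeps its size while the field of view grows, which is why atrous convolution can replace downsampling without losing resolution.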

1.2 Backbone

DeepLab v1 uses VGG-16 as its backbone.

2. DeepLab v2

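Original paper: [2]

A brief analysis of the DeepLabV2 network

Interpreting DeepLab v2

bilibili video walkthrough: Introduction to the DeepLabV2 network (semantic segmentation)

DeepLab v2 adds the ASPP module, which consists of four parallel atrous convolution branches, each using a different dilation rate. These atrous convolution layers are not followed by BatchNorm and use a bias term. The outputs of the four branches are then fused by element-wise addition.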

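2.1 Introduction

In the introduction of the paper, the authors list the problems that DCNNs run into when applied to semantic segmentation.

Reduced resolution (mainly caused by downsampling layers with stride > 1).
Objects appearing at multiple scales.
The invariance of DCNNs reduces localization accuracy.

Solutions

For the reduced resolution, the usual approach is to set the stride of the last few max pooling layers to 1 (if downsampling is done by convolution, as in ResNet, set that stride to 1 as well) and then use atrous convolution in combination.
For objects at multiple scales, the most obvious approach is to rescale the image to several sizes, run each through the network, and fuse the results. This works, but the computational cost is high. To address this, DeepLab v2 proposes the ASPP (atrous spatial pyramid pooling) module.
For the loss of localization accuracy caused by DCNN invariance, DeepLab v2 still relies on a CRF, much like DeepLab v1. Here it is a fully connected pairwise CRF, which is said to be somewhat more efficient than the fully connected CRF used in v1. The gain from the CRF is smaller in DeepLab v2: in DeepLab v1 it is roughly 4 points, whereas Table 4 of the DeepLab v2 paper shows a gain of only a little over 1 point.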
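The multi-scale baseline mentioned above (run the network on several rescaled copies of the image and fuse the results) can be sketched as follows. This is only an illustration: `multi_scale_predict`, the scale set, and the element-wise max fusion are assumptions chosen for the example, and the one-layer `toy_model` stands in for a real segmentation network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def multi_scale_predict(model, image, scales=(0.5, 0.75, 1.0)):
    """Run `model` on rescaled copies of `image` and fuse the score maps."""
    height, width = image.shape[-2:]
    fused = None
    with torch.no_grad():
        for scale in scales:
            # Rescale the input, run a full forward pass, and resize the scores back.
            scaled = F.interpolate(image, scale_factor=scale, mode="bilinear",
                                   align_corners=False)
            scores = model(scaled)
            scores = F.interpolate(scores, size=(height, width), mode="bilinear",
                                   align_corners=False)
            # Fuse by taking the element-wise maximum over scales (an assumption).
            fused = scores if fused is None else torch.max(fused, scores)
    return fused


if __name__ == "__main__":
    toy_model = nn.Conv2d(3, 21, kernel_size=3, padding=1)  # hypothetical stand-in
    image = torch.randn(1, 3, 64, 64)
    out = multi_scale_predict(toy_model, image)
    print(out.shape)  # torch.Size([1, 21, 64, 64])
```

Each extra scale costs one more full forward pass, which is the overhead ASPP avoids by probing a single feature map with several dilation rates.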

2.2 Backbone

DeepLab v2 uses ResNet-101 as its backbone (the paper also reports results with VGG-16).

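2.3 DeepLab v2 pipeline

As shown in the figure below, the pipeline is similar to v1: Input -> CNN feature extraction -> coarse score map (1/8 of the input size) -> bilinear interpolation back to the input size -> CRF post-processing -> final Output.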

[Figure: DeepLab v2 pipeline]
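The steps after the CNN in this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions: the random tensor stands in for the coarse score map, the 224x224 input size and 21 classes are example values, `align_corners=False` is a choice made here, and the CRF post-processing step is omitted entirely.

```python
import torch
import torch.nn.functional as F

# Stand-in for the coarse score map produced by the CNN (1/8 of a 224x224 input).
coarse_scores = torch.randn(1, 21, 28, 28)

# Bilinear interpolation back to the input resolution.
full_scores = F.interpolate(coarse_scores, size=(224, 224), mode="bilinear",
                            align_corners=False)

# Per-pixel class labels; in DeepLab v2 a CRF would refine the scores before this step.
label_map = full_scores.argmax(dim=1)

print(full_scores.shape)  # torch.Size([1, 21, 224, 224])
print(label_map.shape)    # torch.Size([1, 224, 224])
```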

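2.4 DeepLab v2 network structure

This section takes ResNet-101 as the backbone. In the original ResNet, Bottleneck1 of Layer3 performs downsampling (its 3x3 convolution has stride=2). DeepLab v2 sets this stride to 1, so no further downsampling takes place, and all 3x3 convolutions in Layer3 become atrous convolutions with a dilation rate of 2. Layer4 is treated the same way: downsampling is removed and all 3x3 convolutions become atrous convolutions with a dilation rate of 4. Finally, note the ASPP module: with ResNet-101 as the backbone, each branch has only a single 3x3 atrous convolution layer, and the number of kernels in each equals num_classes.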

[Figure: DeepLab v2 network structure with a ResNet-101 backbone]
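The stride-to-dilation change described in this section can be checked on a single layer. The sketch below is not the actual ResNet code: it compares one strided 3x3 convolution with its atrous replacement, and the 256 channels and 28x28 feature map are illustrative values chosen to resemble a Layer3 bottleneck.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)

# Original setting: the 3x3 convolution downsamples with stride 2.
strided = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1, bias=False)
# DeepLab-style replacement: stride 1, dilation 2 (padding 2 keeps H and W).
atrous = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2, dilation=2, bias=False)

print(strided(x).shape)  # torch.Size([1, 256, 14, 14])
print(atrous(x).shape)   # torch.Size([1, 256, 28, 28])

# Both layers have identically shaped weights, so the change adds no parameters.
print(strided.weight.shape == atrous.weight.shape)  # True
```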

2.5 Code example

This example uses VGG-16 as the backbone.

# Simplified DeepLab v2 sketch: VGG-16 style backbone followed by an ASPP head.
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Simplified ASPP head: four parallel atrous branches fused by summation."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        # One branch per dilation rate; padding == dilation keeps H and W unchanged.
        self.branch1 = self._make_branch(in_channels, num_classes, rate=6)
        self.branch2 = self._make_branch(in_channels, num_classes, rate=12)
        self.branch3 = self._make_branch(in_channels, num_classes, rate=18)
        self.branch4 = self._make_branch(in_channels, num_classes, rate=24)

    @staticmethod
    def _make_branch(in_channels, num_classes, rate, mid_channels=128):
        # 3x3 atrous conv -> 1x1 conv -> 1x1 classifier; no BatchNorm, bias enabled.
        return nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, stride=1, padding=rate, dilation=rate, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=1, stride=1, padding=0, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, num_classes, kernel_size=1, stride=1, padding=0, bias=True),
        )

    def forward(self, x):
        # Fuse the four branch outputs by element-wise addition.
        return self.branch1(x) + self.branch2(x) + self.branch3(x) + self.branch4(x)
    
    
class DeepLabv2(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 21):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=64, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
        )
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
        )
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
        )
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(inplace=True),
        )
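        # pool4 uses stride 1 (instead of 2), so the feature map stays at 1/8 of the input size.
        self.pool4 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        # conv5 uses dilation 2 to compensate for the removed downsampling.
        self.conv5 = nn.Sequential(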
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=2, dilation=2, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=2, dilation=2, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=2, dilation=2, bias=True),
            nn.ReLU(inplace=True),
        )
        self.pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.ASPP = ASPP(in_channels=512, num_classes=num_classes)
        
    def forward(self, x):
        conv1_x = self.conv1(x)
        print('# Conv1 output shape:', conv1_x.shape)
        pool1_x = self.pool1(conv1_x)
        print('# Pool1 output shape:', pool1_x.shape)
        conv2_x = self.conv2(pool1_x)
        print('# Conv2 output shape:', conv2_x.shape)
        pool2_x = self.pool2(conv2_x)
        print('# Pool2 output shape:', pool2_x.shape)
        conv3_x = self.conv3(pool2_x)
        print('# Conv3 output shape:', conv3_x.shape)
        pool3_x = self.pool3(conv3_x)
        print('# Pool3 output shape:', pool3_x.shape)
        conv4_x = self.conv4(pool3_x)
        print('# Conv4 output shape:', conv4_x.shape)
        pool4_x = self.pool4(conv4_x)
        print('# Pool4 output shape:', pool4_x.shape)
        conv5_x = self.conv5(pool4_x)
        print('# Conv5 output shape:', conv5_x.shape)
        pool5_x = self.pool5(conv5_x)
        print('# Pool5 output shape:', pool5_x.shape)
        out = self.ASPP(pool5_x)
        print('# Output shape:', out.shape)
        return out
            
    
if __name__ == '__main__':
    inputs = torch.randn(4, 3, 224, 224)
    print('# input shape:', inputs.shape)
    net = DeepLabv2(in_channels=3, num_classes=21)
    output = net(inputs)

Output

# input shape: torch.Size([4, 3, 224, 224])
# Conv1 output shape: torch.Size([4, 64, 224, 224])
# Pool1 output shape: torch.Size([4, 64, 112, 112])
# Conv2 output shape: torch.Size([4, 128, 112, 112])
# Pool2 output shape: torch.Size([4, 128, 56, 56])
# Conv3 output shape: torch.Size([4, 256, 56, 56])
# Pool3 output shape: torch.Size([4, 256, 28, 28])
# Conv4 output shape: torch.Size([4, 512, 28, 28])
# Pool4 output shape: torch.Size([4, 512, 28, 28])
# Conv5 output shape: torch.Size([4, 512, 28, 28])
# Pool5 output shape: torch.Size([4, 512, 28, 28])
# Output shape: torch.Size([4, 21, 28, 28])

3. DeepLab v3

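DeepLab v3: [3]

DeepLab v3+: [4]

Introduction to the DeepLab V3 network

A brief analysis of the DeepLabV3 network

bilibili video walkthrough: Introduction to the DeepLabV3 network (semantic segmentation)

DeepLab v3 improves the ASPP module, which now has five parallel branches: a 1x1 convolution layer, three 3x3 atrous convolution layers, and a global average pooling layer. The global average pooling layer is followed by a 1x1 convolution layer, and its output is restored to the input W and H by bilinear interpolation; this branch adds global context information. The outputs of the five branches are then concatenated along the channel dimension, and a final 1x1 convolution layer fuses the information further.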
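The five-branch module described above might look roughly like the sketch below. It is not the authors' code: the class name `ASPPv3`, the helper `conv_bn_relu`, the 256 output channels, and the dilation rates (12, 24, 36) are assumptions for illustration, since the text above does not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, kernel_size, dilation=1):
    """Conv -> BatchNorm -> ReLU; the padding keeps H and W unchanged."""
    padding = 0 if kernel_size == 1 else dilation
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ASPPv3(nn.Module):
    def __init__(self, in_channels, out_channels=256, rates=(12, 24, 36)):
        super().__init__()
        # Branch 1: 1x1 convolution. Branches 2-4: 3x3 atrous convolutions.
        self.branches = nn.ModuleList(
            [conv_bn_relu(in_channels, out_channels, 1)]
            + [conv_bn_relu(in_channels, out_channels, 3, dilation=r) for r in rates]
        )
        # Branch 5: global average pooling followed by a 1x1 convolution.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            conv_bn_relu(in_channels, out_channels, 1),
        )
        # Final 1x1 convolution that fuses the concatenated branches.
        self.project = conv_bn_relu((len(rates) + 2) * out_channels, out_channels, 1)

    def forward(self, x):
        size = x.shape[-2:]
        outputs = [branch(x) for branch in self.branches]
        # Restore the pooled branch to the input H and W by bilinear interpolation.
        pooled = F.interpolate(self.image_pool(x), size=size, mode="bilinear",
                               align_corners=False)
        outputs.append(pooled)
        return self.project(torch.cat(outputs, dim=1))


if __name__ == "__main__":
    # 64 input channels keep the demo light; a ResNet-101 backbone would pass 2048.
    aspp = ASPPv3(in_channels=64).eval()
    with torch.no_grad():
        y = aspp(torch.randn(2, 64, 28, 28))
    print(y.shape)  # torch.Size([2, 256, 28, 28])
```

The demo runs in eval mode because BatchNorm in training mode cannot compute batch statistics from the 1x1 pooled map when the batch contains a single image.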

3.1 DeepLab v3 network structure

This section takes ResNet-101 as the backbone.

[Figure: DeepLab v3 network structure with a ResNet-101 backbone]

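3.2 Training tips

Use larger input crops during training. The paper notes that with large dilation rates the input image must not be too small; otherwise a 3x3 atrous convolution may degenerate into an ordinary 1x1 convolution.
When computing the loss, upsample the prediction back to the original scale (that is, the network ends with an 8x bilinear upsampling) and compare it with the ground-truth label map. In DeepLab v1 and v2, by contrast, the ground-truth label map is downsampled by a factor of 8 and compared with the prediction before upsampling, which also speeds up training.
After the initial training, freeze the parameters of the BN layers and fine-tune the network.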
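The two ways of computing the loss and the BN freezing step can be sketched as follows. The function names `loss_v3_style`, `loss_v1v2_style`, and `freeze_batchnorm` are hypothetical, and `ignore_index=255` and nearest-neighbour label downsampling are assumptions made for this example rather than details taken from the papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def loss_v3_style(logits, target, ignore_index=255):
    """Upsample the logits to the label resolution, then compute cross-entropy."""
    logits = F.interpolate(logits, size=target.shape[-2:], mode="bilinear",
                           align_corners=False)
    return F.cross_entropy(logits, target, ignore_index=ignore_index)


def loss_v1v2_style(logits, target, ignore_index=255):
    """Downsample the labels to the logits resolution (nearest keeps class ids valid)."""
    small = F.interpolate(target[:, None].float(), size=logits.shape[-2:], mode="nearest")
    return F.cross_entropy(logits, small[:, 0].long(), ignore_index=ignore_index)


def freeze_batchnorm(model):
    """Put every BatchNorm layer in eval mode and stop updating its affine parameters."""
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()
            for param in module.parameters():
                param.requires_grad = False


if __name__ == "__main__":
    logits = torch.randn(2, 21, 28, 28)           # network output at 1/8 resolution
    target = torch.randint(0, 21, (2, 224, 224))  # full-resolution labels
    print(loss_v3_style(logits, target).item())
    print(loss_v1v2_style(logits, target).item())

    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
    net.train()
    freeze_batchnorm(net)
    print(net[1].training)  # False
```

Note that calling `net.train()` again switches the BatchNorm layers back to training mode, so in a real training loop `freeze_batchnorm` would need to be reapplied after every such call.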