Semantic Segmentation -- Understanding Convolution for Semantic Segmentation

O天涯海阁O

Article link: https://blog.csdn.net/zhangjunhit/article/details/71522144

Understanding Convolution for Semantic Segmentation
https://arxiv.org/abs/1702.08502v1
Model: https://goo.gl/DQMeun

For semantic segmentation, the paper makes two improvements: dense upsampling convolution (DUC) replaces bilinear upsampling, and hybrid dilated convolution (HDC) replaces standard dilated convolution.

3.1. Dense Upsampling Convolution (DUC)
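The feature map that a CNN extracts from the input image is many times smaller than the image itself. Semantic segmentation, however, must output a result at the original image size, so most methods apply bilinear upsampling or deconvolution to the CNN output to enlarge the feature map back to the input size.
This enlargement step has some problems. Bilinear upsampling is not learnable and may lose fine details.
Deconvolution requires zeros to be padded in the unpooling step before the convolution operation.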

The paper therefore proposes a small convolutional module, DUC, to enlarge the feature map.
[Figure: the DUC module]

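The input to DUC is the ResNet output feature map of size h×w×c. DUC produces a feature map of size h×w×(r*r × L), which is then reshaped to H × W × L, the size of the input image. This completes the upsampling.
Here L is the total number of semantic classes and r is the downsampling factor of ResNet, so h = H/r and w = W/r.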

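The core idea of DUC is to divide the full label map into r*r equally sized sub-blocks, each with the same size as the input feature map. In other words, the whole label map is mapped to a smaller label map with multiple channels. This mapping lets us apply convolution directly to the input feature map to obtain the output label maps.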
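The reshaping step can be sketched in NumPy as follows. This is only an illustration, not the authors' implementation: the function names `split_label_map` and `duc_reshape` are hypothetical, and the interleaved (pixel-shuffle-style) channel layout is an assumption, since the text above does not specify how channels are ordered.

```python
import numpy as np

def split_label_map(label_map, r):
    """Split an (H, W) label map into r*r sub-maps of size (H/r, W/r).

    Assumed layout: channel i*r + j holds the pixels at rows i::r, columns j::r.
    """
    H, W = label_map.shape
    assert H % r == 0 and W % r == 0, "H and W must be divisible by r"
    h, w = H // r, W // r
    # (h, r, w, r) -> (h, w, r, r) -> (h, w, r*r)
    return label_map.reshape(h, r, w, r).transpose(0, 2, 1, 3).reshape(h, w, r * r)

def duc_reshape(scores, r, L):
    """Rearrange DUC output (h, w, r*r*L) into a full-size (h*r, w*r, L) map.

    Assumed channel index: (i*r + j)*L + l for sub-block (i, j) and class l.
    """
    h, w, c = scores.shape
    assert c == r * r * L, "channel count must equal r*r*L"
    x = scores.reshape(h, w, r, r, L)   # axes: (a, b, i, j, l)
    x = x.transpose(0, 2, 1, 3, 4)      # axes: (a, i, b, j, l)
    return x.reshape(h * r, w * r, L)   # out[a*r+i, b*r+j, l]

# Example sizes chosen for illustration only.
scores = np.random.rand(4, 5, 2 * 2 * 3)   # h=4, w=5, r=2, L=3
print(duc_reshape(scores, r=2, L=3).shape)  # (8, 10, 3)
```

Under this layout, `duc_reshape` with L = 1 exactly inverts `split_label_map`, which is the sense in which the full label map and the multi-channel small label map carry the same information.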

Since DUC is learnable, it is capable of capturing and recovering fine-detailed information that is generally missing in the bilinear interpolation operation.

Finally, DUC is easy to integrate into an FCN.

3.2. Hybrid Dilated Convolution (HDC)
In an FCN, dilated convolution is mainly used to maintain high resolution of feature maps by replacing the max-pooling operation or strided convolution layer while maintaining the receptive field of the corresponding layer.
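To make this concrete, here is a minimal 1D sketch of a dilated convolution with stride 1 and zero padding. It is an illustration under stated assumptions, not code from the paper: the helper names are hypothetical, the kernel length is assumed odd, and the operation is written as a correlation (no kernel flip), as is usual in deep learning. A k-tap kernel with dilation rate r spans k + (k-1)(r-1) input positions, while the output keeps the input length.

```python
import numpy as np

def effective_kernel_size(k, r):
    """Number of input positions spanned by a k-tap kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)

def dilated_conv1d(x, kernel, r):
    """Stride-1, zero-padded 1D dilated correlation (odd kernel length assumed)."""
    k = len(kernel)
    assert k % 2 == 1, "this sketch assumes an odd kernel length"
    pad = (effective_kernel_size(k, r) - 1) // 2
    xp = np.pad(x, pad)  # zero padding on both sides
    # Taps are spaced r apart, so output i reads xp[i], xp[i + r], ..., xp[i + (k-1)*r].
    return np.array([
        sum(kernel[t] * xp[i + t * r] for t in range(k))
        for i in range(len(x))
    ])

x = np.arange(8, dtype=float)
y = dilated_conv1d(x, [1.0, 1.0, 1.0], r=2)
print(len(y), effective_kernel_size(3, 2))  # 8 5
```

The output has as many samples as the input, yet each output sees a span of 5 inputs instead of 3, which is how dilation preserves resolution without shrinking the receptive field.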

If all layers use the same dilation rate r, a problem known as gridding arises, as the paper's figure illustrates:
[Figure: the gridding effect]

With equal rates, the convolution samples the input very sparsely, which causes two problems: 1) local information is completely missing; 2) the information can be irrelevant across large distances. Another outcome of the gridding effect is that pixels in nearby r×r regions at layer l receive information from completely different sets of "grids", which may impair the consistency of local information.

To address this, the paper proposes hybrid dilated convolution (HDC): we use a different dilation rate for each layer.
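The difference between equal and mixed rates can be checked with a small 1D calculation. The sketch below is illustrative only: the helper names are hypothetical, and the rates [2, 2, 2] and [1, 2, 3] are example values chosen here, not settings taken from the text above. It counts which input offsets can influence a single output position after stacking 3-tap dilated convolutions.

```python
from itertools import product

def covered_offsets(rates, k=3):
    """Input offsets (1D) that reach one output position through stacked
    k-tap dilated convolutions with the given per-layer dilation rates."""
    half = k // 2
    offsets = {0}
    for r in rates:
        taps = [t * r for t in range(-half, half + 1)]
        # Each layer shifts every reachable offset by each of its tap positions.
        offsets = {o + t for o, t in product(offsets, taps)}
    return offsets

def coverage(rates, k=3):
    """Return (number of offsets actually used, width of the receptive field)."""
    offs = covered_offsets(rates, k)
    return len(offs), max(offs) - min(offs) + 1

print(coverage([2, 2, 2]))  # (7, 13): only even offsets contribute
print(coverage([1, 2, 3]))  # (13, 13): every offset in the field contributes
```

Both stacks have a receptive field 13 positions wide, but the equal-rate stack touches only 7 of them and leaves holes, whereas the mixed-rate stack covers all 13. In 2D the same argument applies along each axis.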

4 Experiments and Results
[Figure: experimental results]