[paper] 00037-Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Author: Liang-Chieh Chen et al., Google Inc.

Keywords:

DeepLabv3+: extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries.

Atrous Spatial Pyramid Pooling (ASPP):

1. Introduction

In this work, we consider two types of neural networks that use a spatial pyramid pooling module [18,19,20] or an encoder-decoder structure [21,22] for semantic segmentation, where the former captures rich contextual information by pooling features at different resolutions while the latter is able to obtain sharp object boundaries.

Contributions:

  1. We propose a novel encoder-decoder structure which employs DeepLabv3 as a powerful encoder module and a simple yet effective decoder module.
  2. In our structure, one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade off precision and runtime, which is not possible with existing encoder-decoder models.
  3. We adapt the Xception model for the segmentation task and apply depthwise separable convolution to both the ASPP module and the decoder module, resulting in a faster and stronger encoder-decoder network.
  4. Our proposed model attains a new state-of-the-art performance on the PASCAL VOC 2012 and Cityscapes datasets. We also provide detailed analysis of design choices and model variants.
  5. We make our TensorFlow-based implementation of the proposed model publicly available at https://github.com/tensorflow/models/tree/master/research/deeplab.

2 Related Work

Spatial pyramid pooling:

PASS

Encoder-decoder:

Use DeepLabv3 as the encoder module and add a simple yet effective decoder module to obtain sharper segmentations.

Depthwise separable convolution:

3 Methods

3.1 Encoder-Decoder with Atrous Convolution

Atrous convolution:
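A minimal 1-D sketch of what atrous (dilated) convolution does: with rate r, the filter taps are applied r samples apart, so the receptive field grows without adding parameters. The function name and the averaging filter here are illustrative, not from the paper's code.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution with 'valid' padding.

    With rate r, filter taps are spaced r samples apart, enlarging
    the receptive field while keeping the parameter count fixed.
    """
    k = len(w)
    span = (k - 1) * rate + 1          # effective receptive field
    out_len = len(x) - span + 1
    return np.array([
        sum(x[i + j * rate] * w[j] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(8, dtype=float)          # [0, 1, ..., 7]
w = np.array([1.0, 1.0, 1.0])          # 3-tap filter

print(atrous_conv1d(x, w, rate=1))     # standard convolution
print(atrous_conv1d(x, w, rate=2))     # same filter, receptive field of 5
```

Setting rate = 1 recovers the standard convolution; larger rates let the encoder extract denser or coarser features at will, which is the mechanism behind controlling output stride.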

Depthwise separable convolution:

Factoring a standard convolution into a depthwise convolution (one spatial filter per input channel) followed by a pointwise 1×1 convolution drastically reduces computational complexity while maintaining similar performance.
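The savings can be checked with a quick parameter count; the 3×3 kernel and 256-channel widths below are typical values chosen for illustration.

```python
# Parameter counts for a 3x3 convolution mapping 256 -> 256 channels.
k, c_in, c_out = 3, 256, 256

standard = k * k * c_in * c_out         # full 3x3 convolution
separable = k * k * c_in + c_in * c_out # depthwise 3x3 + pointwise 1x1

print(standard, separable, round(standard / separable, 2))
```

For these widths the separable factorization uses roughly 8.7× fewer parameters (and proportionally fewer multiply-adds), which is why DeepLabv3+ applies it in both the ASPP and decoder modules.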

DeepLabv3 as encoder:

We use the last feature map before logits in the original DeepLabv3 as the encoder output in our proposed encoder-decoder structure.

Proposed decoder:

We apply another 1×1 convolution on the low-level features to reduce the number of channels, since the corresponding low-level features usually contain a large number of channels (e.g., 256 or 512) which may outweigh the importance of the rich encoder features (only 256 channels in our model) and make the training harder. After the concatenation, we apply a few 3×3 convolutions to refine the features, followed by another simple bilinear upsampling by a factor of 4.
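The shape flow of the decoder can be sketched as follows. This is a simplified trace under assumed values: a 512×512 input, encoder output at output stride 16, 48 channels after the low-level 1×1 convolution, and nearest-neighbor upsampling standing in for bilinear; the refining 3×3 convolutions are elided.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbor upsampling (stand-in for bilinear, for brevity)."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

H = W = 512                                     # assumed input size
enc = np.zeros((H // 16, W // 16, 256))         # DeepLabv3 encoder output
low = np.zeros((H // 4, W // 4, 48))            # low-level features after 1x1 conv

enc_up = upsample(enc, 4)                       # 32x32 -> 128x128, stride 16 -> 4
fused = np.concatenate([enc_up, low], axis=-1)  # concat at stride 4
# ... a few 3x3 convolutions would refine `fused` here ...
out = upsample(fused, 4)                        # stride 4 -> full resolution

print(enc_up.shape, fused.shape, out.shape)
```

The key point the shapes make visible: the encoder features are only upsampled 4×, fused with boundary-preserving low-level features, refined, and then upsampled the remaining 4×, instead of a single naive 16× bilinear upsampling.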

3.2 Modified Aligned Xception

4. Experimental Evaluation

4.1 Decoder Design Choices

We define "DeepLabv3 feature map" as the last feature map computed by DeepLabv3 (i.e., the features containing ASPP features and image-level features), and [k × k; f] as a convolution operation with kernel size k × k and f filters.

In the decoder module, we consider three places for different design choices, namely (1) the 1×1 convolution used to reduce the channels of the low-level feature map from the encoder module, (2) the 3×3 convolution used to obtain sharper segmentation results, and (3) what encoder low-level features should be used.

We do not pursue an even denser output feature map (i.e., output stride < 4) given the limited GPU resources.
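The resource constraint behind that choice is easy to quantify: feature-map area grows quadratically as output stride shrinks. The 512 input size below is an assumed value for illustration.

```python
# output_stride = input resolution / final feature-map resolution.
# Halving the stride quadruples the number of spatial positions (and
# roughly the activation memory), which is why stride < 4 is not pursued.
input_size = 512                        # assumed crop size
for stride in (16, 8, 4, 2):
    side = input_size // stride
    print(stride, side, side * side)    # stride, feature side, positions
```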

4.2 ResNet-101 as Network Backbone

4.3 Xception as Network Backbone

ImageNet pretraining:

The result is

Baseline:

Table 5 second row

Adding decoder:

Table 5 third row

Pretraining on COCO:

Pretraining on JFT:

Test set results:

Qualitative results:

4.4 Improvement along Object Boundaries

4.5 Experimental Results on Cityscapes

5 Conclusion
