[Paper Notes] Learning Deconvolution Network for Semantic Segmentation


xiaxzhou




Article link: https://blog.csdn.net/xiaxzhou/article/details/74012137


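This post examines the role of the deconvolution network in semantic segmentation, which addresses two problems of FCN: the fixed-size receptive field and the loss of object detail. With unpooling and deconvolution layers, the network can reconstruct the detailed structure of objects and perform semantic segmentation at different scales. Training uses batch normalization and proceeds in stages, and combining the strengths of FCN and the deconvolution network yields more accurate pixel-level labeling.

Summary generated automatically by CSDN.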

Learning Deconvolution Network for Semantic Segmentation

The paper first reviews FCN-based semantic segmentation, as in the following papers:

Fully Convolutional Networks for Semantic Segmentation
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

FCN is explained as follows:

fully connected layers in the standard CNNs are interpreted as convolutions with large receptive fields, and segmentation is achieved using coarse class score maps obtained by feedforwarding an input image.
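
The equivalence described in this quote can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper `fc_as_conv` and all sizes (4 channels, a 3×3 window, 5 classes) are hypothetical choices for illustration. It reshapes the weights of a fully connected layer into convolution kernels and slides them over a larger feature map, which yields a coarse class score map instead of a single score vector.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weight, fc_bias, k):
    """Apply an FC layer (defined on C x k x k inputs) as a k x k convolution.

    feature_map: (C, H, W), fc_weight: (num_out, C*k*k), fc_bias: (num_out,)
    Returns a score map of shape (num_out, H-k+1, W-k+1).
    """
    c, h, w = feature_map.shape
    num_out = fc_weight.shape[0]
    # Each row of the FC weight matrix becomes one C x k x k kernel.
    kernels = fc_weight.reshape(num_out, c, k, k)
    out = np.empty((num_out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = feature_map[:, i:i + k, j:j + k]
            # Contract over the (C, k, k) axes: one score per output class.
            out[:, i, j] = np.tensordot(kernels, window, axes=3) + fc_bias
    return out

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
C, K, NUM_CLASSES = 4, 3, 5
fc_weight = rng.standard_normal((NUM_CLASSES, C * K * K))
fc_bias = rng.standard_normal(NUM_CLASSES)

# On an input of exactly the FC size, both views give the same scores.
small = rng.standard_normal((C, K, K))
fc_scores = fc_weight @ small.reshape(-1) + fc_bias
conv_scores = fc_as_conv(small, fc_weight, fc_bias, K)
same_on_small_input = np.allclose(fc_scores, conv_scores[:, 0, 0])

# On a larger input, the same weights produce a coarse class score map.
large = rng.standard_normal((C, 6, 8))
score_map = fc_as_conv(large, fc_weight, fc_bias, K)
print(same_on_small_input, score_map.shape)
```

Each spatial position of `score_map` holds the scores the fully connected layer would have produced for the corresponding window of the input.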

An interesting idea in this work is that a simple interpolation filter is employed for deconvolution and only the CNN part of the network is fine-tuned to learn deconvolution indirectly.
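
To make the "simple interpolation filter" concrete, here is a small NumPy sketch under the assumption that the filter is bilinear; the quote itself does not name the filter. The function names `bilinear_kernel` and `transposed_conv2d`, the kernel size 4, and the stride 2 are illustrative choices, not taken from the paper. The kernel is fixed rather than learned, which is why only the CNN part of the network is fine-tuned.

```python
import numpy as np

def bilinear_kernel(size):
    """Return a (size, size) bilinear interpolation kernel."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return ((1 - np.abs(rows - center) / factor)
            * (1 - np.abs(cols - center) / factor))

def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution (deconvolution) without padding."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input value stamps a scaled copy of the kernel on the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

coarse = np.ones((3, 3))        # a toy, constant coarse score map
kernel = bilinear_kernel(4)     # fixed weights, never updated by training
upsampled = transposed_conv2d(coarse, kernel, stride=2)
print(upsampled.shape)
```

Away from the border, a constant input stays constant after upsampling, which is the behaviour expected from an interpolation filter.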

FCN has the following problems:

As shown in the figure:
[Figure]

first, the network can handle only a single scale semantics within image due to the fixed-size receptive field
That is, because the receptive field has a fixed size, the network can only handle semantics at a single scale within an image.

second, the detailed structures of an object are often lost or smoothed because the label map, input to the deconvolutional layer is too coarse and deconvolution procedure is overly simple
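That is, the detailed structures of an object are often lost or smoothed out, because the label map fed into the deconvolution layer is too coarse and the deconvolution procedure is too simple.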
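
The first limitation, the fixed-size receptive field, can be illustrated with a short calculation. The sketch below is an illustration rather than something stated in these notes: it assumes a VGG-16-style stack of 3×3 convolutions (stride 1) and 2×2 max-pooling layers (stride 2), and the helper `receptive_field` is a hypothetical name. It applies the standard recurrence in which each layer grows the receptive field by `(kernel - 1)` times the cumulative stride of the layers before it.

```python
def receptive_field(layers):
    """Compute the receptive field of a stack of conv/pool layers.

    layers: iterable of (kernel_size, stride) pairs, ordered from input to output.
    Returns (receptive_field, cumulative_stride), both in input pixels.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth is scaled by the stride so far
        jump *= stride
    return rf, jump

CONV = (3, 1)   # 3x3 convolution, stride 1
POOL = (2, 2)   # 2x2 max pooling, stride 2

# Assumed VGG-16-style layout: blocks of 2, 2, 3, 3, 3 convolutions,
# each block followed by one pooling layer.
vgg_like = []
for num_convs in (2, 2, 3, 3, 3):
    vgg_like += [CONV] * num_convs + [POOL]

rf, stride = receptive_field(vgg_like)
print(rf, stride)   # 212 32
```

Under these assumptions, every output unit sees the same 212×212 window no matter how large or small the object is, and adjacent output units are 32 input pixels apart, which is also why the resulting label map is coarse.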

Introducing the deconvolution network:

Deconvolution network is introduced in [25] to reconstruct input images. As the reconstruction of an input
image is non-trivial due to max pooling layers, it prop