【SegNet】 A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Latest recommended article published on 2021-06-17 23:15:28

One__Coder

Categories: Paper Reading, Machine Learning

Article link: https://blog.csdn.net/github_37973614/article/details/84578187

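SegNet is a deep convolutional encoder-decoder architecture built on VGG16, designed mainly for semantic image segmentation. Its key idea is that the decoder uses the max-pooling indices recorded by the encoder to perform non-linear upsampling, which reduces the number of learned parameters and improves boundary localization. Compared with FCN, DeepLab, and DeconvNet, SegNet is better suited to scene understanding and precise boundary localization. The paper reports experiments on the CamVid and SUN RGB-D datasets and discusses the performance of several decoder variants.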

PAMI 2017 (IEEE Transactions on Pattern Analysis and Machine Intelligence)
The paper covers a lot of ground, so this post simply follows the paper's own order.

Contents

- Overview
- Introduction
- Literature review
- Architecture
  - One
  - Two
  - Three
- Training
- Some Q&A

Based on the paper's abstract, the main points can be summarized as follows:

Overview

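What is the network's task?
- Pixel-wise classification, i.e. semantic segmentation.
Method and innovation
- Method: the encoder is VGG16 with its trailing fully connected (FC) layers removed.
- Innovation (selling point): how the decoder upsamples. Each decoder stage uses the max-pooling indices recorded by its corresponding encoder stage to perform non-linear upsampling (encoder and decoder stages are paired one to one). The benefit is that the upsampling itself does not need to be learned.
Comparisons
- Against FCN
- Against DeepLab-LargeFOV
- Against DeconvNet
SegNet is primarily motivated by scene understanding applications
- In other words, the motivation is more accurate boundary localization.
Datasets
- Road scenes task (CamVid road scenes dataset)
- SUN RGB-D indoor scenes segmentation task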
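
The index-based upsampling summarized above is easier to follow on a tiny example. The following is a minimal pure-Python sketch, not the authors' implementation: the function names `max_pool_with_indices` and `max_unpool` and the 4x4 feature map are made up for illustration, and a single channel with a 2x2 window and stride 2 is assumed.

```python
def max_pool_with_indices(fmap, k=2):
    """k x k max pooling with stride k on a 2D list.

    Returns the pooled map and, for every pooled value, the (row, col)
    position in the input where that maximum was found.
    """
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for i in range(0, h - k + 1, k):
        pooled_row, index_row = [], []
        for j in range(0, w - k + 1, k):
            # Collect (value, position) pairs for the current window.
            window = [
                (fmap[i + di][j + dj], (i + di, j + dj))
                for di in range(k)
                for dj in range(k)
            ]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            index_row.append(pos)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; zeros elsewhere.

    Nothing is learned here: the upsampling is fully determined by the indices.
    """
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Hypothetical single-channel encoder feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled, indices = max_pool_with_indices(fmap)
sparse = max_unpool(pooled, indices, 4, 4)

print(pooled)  # [[4, 5], [6, 7]]
print(sparse)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

The unpooled map is sparse. In SegNet the decoder then convolves this sparse map with trainable filters to produce dense feature maps, so the learning sits in the convolutions rather than in the upsampling step.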

Introduction

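An introduction typically argues why the segmentation task matters, briefly reviews the history of the field, and then makes a short pitch for the authors' own method.
1. Opening: Semantic segmentation has a wide array of applications ranging from scene understanding, inferring support-relationships among objects to autonomous driving.

Early methods: relied on low-level vision cues (these have been quickly superseded by popular machine learning algorithms).
Deep learning: has achieved fairly good results, but the predictions are still coarse.

2. What the network needs to handle: the ability to model appearance (road, building), shape (cars, pedestrians), and understand the spatial-relationship (context) between different classes such as road and side-walk.
3. Inspiration: the key component is the decoder, which is inspired by unsupervised feature learning (from this paper).
4. Advantages of this decoder:

It improves boundary delineation;
It reduces the number of parameters, enabling end-to-end training;
This form of upsampling can be incorporated into any encoder-decoder architecture, such as FCN or Conditional random fields as recurrent neural networks.

5. Contributions

The decoder technique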
widely used Fully Convolutional Network (FCN)
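
To put a rough number on the parameter reduction listed among the advantages above, the sketch below counts the weights in VGG16's convolutional layers and in its fully connected layers. It assumes the standard VGG16 layout (thirteen 3x3 convolutions, a 224x224 input so the last feature map is 512x7x7, and a 1000-class ImageNet head); the constant and function names are chosen here for illustration. It covers only the encoder side and says nothing about the decoder's own filters.

```python
# (in_channels, out_channels) for the 13 convolutional layers of standard VGG16.
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# (in_features, out_features) for the 3 FC layers, assuming a 224x224 input.
VGG16_FC = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV)
fc_total = sum(fc_params(n_in, n_out) for n_in, n_out in VGG16_FC)

print(conv_total)  # 14714688
print(fc_total)    # 123642856
```

Under these assumptions the convolutional part holds about 14.7M parameters against about 123.6M in the FC layers, which is why an encoder that keeps only the convolutional layers is so much smaller and easier to train end to end.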

Literature review

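1. Random Forests (RF), Boosting, and similar classifiers were used to predict the class of the center pixel, with features extracted via SfM and a CRF added to improve accuracy. None of these methods performed well; the takeaway is that better features for classification were needed (The result of all these techniques indicate the need for improved features for classification).

Before the arrival of deep networks: the best-performing methods relied on hand-engineered features to classify pixels independently. Typically, a patch cropped from the image was fed to a classifier such as Random Forest or Boosting to predict the class of the center pixel, and a conditional random field (CRF) was then used to smooth the predictions.
More recent approaches: aimed at higher-quality predictions for all pixels in a patch rather than only the center pixel. This improved Random Forest based segmentation, but thin structured classes are classified poorly.
Another approach: combined hand-engineered features with spatio-temporal super-pixelization to obtain higher accuracy.
The best performing technique: addressed the class imbalance problem by combining object detection outputs with classifier predictions in a CRF framework.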

2. RGB-D images with depth became popular after the release of the NYU dataset. The following methods use hand-engineered features for classification.

In more recent work: both class segmentation and support relationships are inferred together using a combination of RGB and depth based cues.
Another approach: focuses on real-time joint reconstruction and semantic segmentation, where Random Forests are used as the classifier.
boundary detection and hierarchical grouping before performing category segmentation

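3. Deep learning networks: let the network learn better features on its own, but boundary delineation is poor.
4. Newer deep architectures: these are encoder-decoder structures. The encoder is usually VGG16 pre-trained on ImageNet with the FC layers removed, while the decoders differ. For the original FCN, appending a recurrent neural network (RNN) and fine-tuning on larger datasets improves performance.
5. Multi-scale deep architectures: these can be understood in two ways: (1) inputs at different scales; &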