SegNet

Abstract

This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s).

Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps.
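A minimal PyTorch sketch of this mechanism (the tensor and channel sizes are illustrative, not taken from the paper): `MaxPool2d` with `return_indices=True` records the argmax locations on the encoder side, `MaxUnpool2d` places decoder values back at those locations and leaves everything else zero (hence sparse), and a trainable convolution then produces the dense map.

```python
import torch
import torch.nn as nn

# Encoder side: max-pool and remember which locations held the maxima.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
# Decoder side: non-linear upsampling driven by those indices, then a
# trainable filter bank that turns the sparse map into a dense one.
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
densify = nn.Conv2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 32, 32)       # an encoder feature map
pooled, indices = pool(x)            # 1x64x16x16 plus the argmax indices
sparse = unpool(pooled, indices)     # 1x64x32x32, non-zero only at the maxima
dense = densify(sparse)              # dense decoder feature map
```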

1 Introduction

The encoder network in SegNet is topologically identical to the convolutional layers in VGG16. We remove the fully connected layers of VGG16. The key component of SegNet is the decoder network, which consists of a hierarchy of decoders, one corresponding to each encoder. The appropriate decoders use the max-pooling indices received from the corresponding encoder to perform non-linear upsampling of their input feature maps.
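As a hedged illustration (using torchvision is my assumption here, not something the paper prescribes, and it assumes a recent torchvision), the 13 convolutional layers can be taken from `vgg16(...).features`, while the fully connected `classifier` part is simply dropped; a full SegNet would additionally need its max-pooling layers to return indices for the decoder.

```python
import torch.nn as nn
from torchvision.models import vgg16

# `features` holds the 13 conv layers (interleaved with ReLU and max-pooling);
# the fully connected `classifier` is simply not used, as in SegNet.
backbone = vgg16(weights=None)          # ImageNet weights could be loaded instead
encoder = backbone.features

conv_layers = [m for m in encoder if isinstance(m, nn.Conv2d)]
assert len(conv_layers) == 13           # matches the SegNet encoder depth
```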

2 Related Work

3 Network Architecture

Encoder – decoder – pixel-wise classification
Encoder: the 13 convolutional layers of VGG16 are kept, with the layers converted from its fully connected layers removed. Each encoder feature map is produced by convolving with a filter bank, followed by batch normalization, an element-wise ReLU, and max-pooling with a 2×2 window and stride 2 (i.e., downsampling by a factor of 2). Downsampling loses detail, so boundary information should be captured and stored from the feature maps before downsampling; the paper proposes storing only the max-pooling indices (see the sketch after this list).
Decoder: the corresponding decoder upsamples its input using the max-pooling indices from the matching encoder to produce sparse feature maps, which are then convolved with the decoder's filter bank to produce dense feature maps, followed by batch normalization (no ReLU).
Classification: a soft-max classifier assigns each pixel the class label with the highest probability at that pixel.
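The three components above can be sketched as PyTorch modules. This is a rough sketch under assumed channel sizes; the names `EncoderBlock` and `DecoderBlock`, and the class count of 11, are illustrative rather than from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class EncoderBlock(nn.Module):
    """conv -> batch norm -> ReLU -> 2x2 max-pool (stride 2), keeping indices."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

    def forward(self, x):
        x = F.relu(self.bn(self.conv(x)))
        x, indices = self.pool(x)         # indices are handed to the decoder
        return x, indices

class DecoderBlock(nn.Module):
    """unpool with encoder indices -> conv -> batch norm (no ReLU here)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x, indices):
        x = self.unpool(x, indices)       # sparse upsampled map
        return self.bn(self.conv(x))      # dense decoder feature map

# Pixel-wise classification: a 1x1 conv produces per-pixel class scores; the
# soft-max (or a cross-entropy loss during training) picks the most probable
# label at each pixel.
classifier = nn.Conv2d(64, 11, kernel_size=1)
```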

3.1 Decoder Variants
Comparison of SegNet and FCN
SegNet upsamples feature maps using the max-pooling indices (no learning required) and convolves them with a trainable decoder filter bank. FCN upsamples the input feature maps with a learned deconvolution and adds the corresponding encoder feature map to produce the decoder output; that encoder feature map is the output of the max-pooling layer (i.e., after sub-sampling) in the corresponding encoder.
Note that there are no trainable decoder filters in FCN.
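A side-by-side sketch of the two decoding strategies (hedged: the channel counts and tensors are illustrative, and any channel-matching of the encoder feature map before the FCN addition is omitted for brevity):

```python
import torch
import torch.nn as nn

C = 64
dec_in = torch.randn(1, C, 16, 16)     # low-resolution decoder input
enc_feat = torch.randn(1, C, 32, 32)   # corresponding encoder feature map
_, idx = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)(enc_feat)

# SegNet-style: index-driven unpooling (nothing learned in the upsampling
# itself), followed by a trainable decoder convolution.
segnet_up = nn.MaxUnpool2d(kernel_size=2, stride=2)(dec_in, idx)
segnet_out = nn.Conv2d(C, C, kernel_size=3, padding=1)(segnet_up)

# FCN-style: upsampling is learned (transposed convolution) and the encoder
# feature map is added; no further trainable decoder filters follow the sum.
fcn_up = nn.ConvTranspose2d(C, C, kernel_size=2, stride=2)(dec_in)
fcn_out = fcn_up + enc_feat
```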
3.2 Training
3.3 Analysis
We can now summarize the above analysis with the following general points.

  1. The best performance is achieved when encoder feature maps are stored in full. This is reflected in the semantic contour delineation metric (BF) most clearly.
  2. When memory during inference is constrained, then compressed forms of encoder feature maps (dimensionality reduction, max-pooling indices) can be stored and used with an appropriate decoder (e.g. SegNet type) to improve performance.
  3. Larger decoders increase performance for a given encoder network.

4 Benchmarking

4.1 Road Scene Segmentation
4.2 Indoor Scenes

5 Discussion and Future Work

6 Conclusion
