SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
0. Abstract
0.1 Components of the segmentation engine
This core trainable segmentation engine consists of an encoder network, a corresponding decoder network, followed by a pixel-wise classification layer.
- Encoder network structure
Uses the 13 convolutional layers of VGG16, but drops VGG16's fully connected layers.
- Decoder network structure
Maps the low-resolution encoder feature maps to full-input-resolution feature maps for pixel-wise classification.
0.2 Advantages
- The decoder upsamples its lower-resolution input feature maps;
- The decoder performs non-linear upsampling using the pooling indices computed in the max-pooling step of the corresponding encoder;
- It is efficient in terms of memory and computation time during inference;
- It has significantly fewer trainable parameters.
1. Introduction (skipped)
2. Literature Review (skipped)
3. Network Architecture
As shown in Fig. 2, SegNet consists of an encoder network, a corresponding decoder network, and a final pixel-wise classification layer.
3.1 Encoder
- The encoder performs convolution and max-pooling operations.
- The convolutional part consists of the 13 convolutional layers of VGG16 (the fully connected layers are discarded).
- During 2×2 max pooling, the corresponding max-pooling indices are stored: for each encoder feature map, only the location of the maximum feature value within each pooling window is kept. This low-memory form of storage causes a slight drop in accuracy, but is still suitable for practical applications.
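The index-storing trick above can be sketched with PyTorch (an assumption for illustration; the original SegNet code is in Caffe): `nn.MaxPool2d` with `return_indices=True` returns, alongside the pooled map, the position of the maximum value in each 2×2 window.

```python
# Sketch of encoder-side max pooling with stored indices.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

x = torch.randn(1, 64, 8, 8)   # hypothetical encoder feature map
pooled, indices = pool(x)      # both have shape (1, 64, 4, 4)

# `indices` holds, per pooling window, the flattened position of the
# maximum value -- this is all the decoder needs for unpooling.
```

Because only one position per window is recorded, the stored skip information is far smaller than the full feature map.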
3.2 Decoder
In the decoder network, each decoder upsamples its input feature maps using the max-pooling indices memorized from the corresponding encoder feature maps.
- The decoder performs upsampling and convolution operations; at the end, a softmax classifier is applied to every pixel;
- During upsampling, the max-pooling indices of the corresponding encoder layer are taken as an input argument to the upsampling operation;
- Finally, a K-class softmax classifier predicts the class of each pixel.
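A minimal PyTorch sketch of this decoder step (names and sizes are illustrative assumptions, not the paper's code): `nn.MaxUnpool2d` scatters each pooled value back to the position recorded by the encoder's indices, a convolution densifies the resulting sparse map, and a 1×1 convolution plus softmax scores K classes per pixel.

```python
import torch
import torch.nn as nn

K = 12  # assumed number of classes, for illustration

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # densifies the sparse map
classifier = nn.Conv2d(64, K, kernel_size=1)        # per-pixel class scores

x = torch.randn(1, 64, 8, 8)
pooled, indices = pool(x)          # encoder side: pool and remember indices

up = unpool(pooled, indices)       # decoder side: sparse (1, 64, 8, 8) map
logits = classifier(conv(up))      # (1, K, 8, 8) logits
probs = torch.softmax(logits, dim=1)  # per-pixel class probabilities
```

The unpooled map is zero everywhere except at the remembered index positions, which is why the trailing convolutions are needed to produce dense feature maps.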
3.3 Differences from DeconvNet and U-Net
DeconvNet, U-Net, and SegNet have similar network structures.
3.3.1 Differences from DeconvNet
DeconvNet uses a similar upsampling approach, called unpooling.
However, DeconvNet has fully connected layers, which make the model much larger.
3.3.2 Differences from U-Net
U-Net is used for biomedical image segmentation.
Instead of using pooling indices, U-Net transfers the entire feature maps from the encoder to the decoder, concatenates them with the decoder features, and then performs convolution.
This makes the model larger and requires more memory.
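The memory difference can be made concrete with a back-of-the-envelope PyTorch sketch (the tensor sizes are invented for illustration): a U-Net-style skip keeps the whole pre-pooling feature map, while a SegNet-style skip keeps only the pooling indices, which have a quarter as many elements.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)   # hypothetical pre-pooling encoder map
pooled, indices = nn.MaxPool2d(kernel_size=2, stride=2,
                               return_indices=True)(x)

# U-Net-style skip: forward the entire feature map to the decoder.
unet_skip_bytes = x.numel() * x.element_size()

# SegNet-style skip: forward only the pooling indices (stored as int64
# in PyTorch; in principle 2 bits per 2x2 window would suffice).
segnet_skip_bytes = indices.numel() * indices.element_size()
```

Even with PyTorch's wide int64 index type, the index tensor is smaller than the full feature map; with a compact 2-bit encoding the saving would be far larger.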
3.4 Training
- Dataset: CamVid road scenes, 367 training and 233 test RGB images at 360×480 resolution; local contrast normalization is applied to the RGB inputs beforehand.
- Target classes: road, building, car, pedestrian, sign, pole, sidewalk, etc.
- Initialization: weights initialized with He initialization; SGD optimizer (learning rate 0.1, momentum 0.9).
- Loss function: cross-entropy.
- Epochs: the training set is shuffled before each epoch; mini-batch size 12.
- Framework: SegNet-Basic implemented in Caffe.
- Stopping criterion: training runs until the loss converges.
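The settings above can be sketched as a single PyTorch training step (SGD with learning rate 0.1 and momentum 0.9, per-pixel cross-entropy, mini-batch of 12). The one-layer model and random tensors are stand-ins for SegNet-Basic and CamVid, which are assumptions here; the original training used Caffe.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 12, kernel_size=3, padding=1)  # stand-in for SegNet-Basic
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()                   # per-pixel cross-entropy

# One (shuffled) mini-batch of 12 images at CamVid's 360x480 resolution,
# with random per-pixel labels standing in for the ground truth.
images = torch.randn(12, 3, 360, 480)
labels = torch.randint(0, 12, (12, 360, 480))

logits = model(images)             # (12, 12, 360, 480) class scores
loss = criterion(logits, labels)   # scalar cross-entropy loss
optimizer.zero_grad()
loss.backward()                    # backpropagate
optimizer.step()                   # SGD update
```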
3.5 Analysis
- Performance is compared using three metrics:
global accuracy (G), class average accuracy (C), and mean intersection over union (mIoU).
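All three metrics can be computed from a K×K confusion matrix M, where M[i][j] counts pixels of true class i predicted as class j. A NumPy sketch (not the paper's evaluation code; the 2-class counts are made up):

```python
import numpy as np

def segmentation_metrics(conf):
    """Global accuracy, class average accuracy, and mIoU from a KxK
    confusion matrix (rows = true class, columns = predicted class)."""
    tp = np.diag(conf).astype(float)   # correctly classified pixels
    true_total = conf.sum(axis=1)      # pixels of each true class
    pred_total = conf.sum(axis=0)      # pixels predicted as each class
    g = tp.sum() / conf.sum()          # global accuracy (G)
    c = np.mean(tp / true_total)       # class average accuracy (C)
    iou = tp / (true_total + pred_total - tp)
    return g, c, iou.mean()            # mean IoU over classes

# Tiny 2-class example:
conf = np.array([[50, 10],
                 [ 5, 35]])
g, c, miou = segmentation_metrics(conf)
# g = 85/100, c = (50/60 + 35/40)/2, miou = (50/65 + 35/50)/2
```

Class average accuracy and mIoU weight every class equally, which is why small-size classes pull these metrics down even when global accuracy is high.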
4. Results
Two datasets are evaluated: the CamVid dataset for road scene segmentation, and the SUN RGB-D dataset for indoor scene segmentation.
4.1 CamVid Dataset for Road Scene Segmentation
As shown above, SegNet obtains very good results for many classes, and also achieves the highest class average and global average accuracy.
SegNet obtains the highest global accuracy (G), class average accuracy (C), mIoU, and boundary F1-measure (BF), outperforming FCN, DeepLabv1, and DeconvNet.
4.2 SUN RGB-D Dataset for Indoor Scene Segmentation
Only RGB is used; the depth (D) information is not used.
- SegNet outperforms FCN, DeconvNet, and DeepLabv1.
- SegNet is only slightly inferior to DeepLabv1 in mIoU.
- Accuracy is higher for large-size classes.
- Accuracy is lower for small-size classes.
4.3 Memory and Inference Time
SegNet is slower than FCN and DeepLabv1 because it contains a decoder network, but faster than DeconvNet because it has no fully connected layers.
SegNet has low memory requirements during both training and testing, and its model size is much smaller than those of FCN and DeconvNet.
5. Code Implementations
PyTorch implementation: https://github.com/yassouali/pytorch-segmentation.git
TensorFlow/Keras implementation: https://github.com/divamgupta/image-segmentation-keras.git
MXNet implementation: https://github.com/osmr/imgclsmob.git
Caffe implementation: https://github.com/alexgkendall/caffe-segnet.git