Fully Convolutional Networks for Semantic Segmentation
——————————————————————————————————————————
[FCN Practice] 01 Common problems http://blog.csdn.net/binlearning/article/details/72854136
[FCN Practice] 02 Model transplanting and initialization http://blog.csdn.net/binlearning/article/details/72854244
[FCN Practice] 03 Training http://blog.csdn.net/binlearning/article/details/72854407
[FCN Practice] 04 Prediction http://blog.csdn.net/binlearning/article/details/72854583
[Project source code] https://github.com/binLearning/fcn_voc_32s
——————————————————————————————————————————
Paper:https://arxiv.org/abs/1605.06211
GitHub:https://github.com/shelhamer/fcn.berkeleyvision.org
This series uses the training of voc-fcn32s as its running example.
FAQ
1. Does the interpolation layer (deconvolution layer) need to be learned?
In the initial version, the kernels of the deconvolution layer were initialized to perform bilinear interpolation and were learnable. In later experiments, the parameters of these bilinear-interpolation kernels were fixed, i.e. made non-learnable.
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0  # fixed, not learned
  }
  convolution_param {
    num_output: 21
    bias_term: false
    kernel_size: 64
    stride: 32
  }
}
Fixing the upsampling to bilinear interpolation makes little difference in performance compared to the earlier learnable version, and freezing the parameters slightly speeds up training. Note that each output class gets its own interpolation kernel; higher-dimensional or nonlinear interpolation could give different results, so learnable kernels might perform better.
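The bilinear initialization mentioned above can be sketched as follows. This is the standard construction for a 2D bilinear interpolation kernel (the reference repo builds the deconvolution weights this way in its surgery utilities); for the per-class setup, each class channel of the 4D weight blob receives a copy of this 2D kernel on its diagonal.

```python
import numpy as np

def bilinear_kernel(size):
    """Build a (size, size) bilinear interpolation kernel.

    Each entry is the product of two triangular (tent) weights,
    one per spatial axis, peaking at the kernel center."""
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))
```

For the 64×64, stride-32 `upscore` layer above, one would fill `weights[c, c] = bilinear_kernel(64)` for each of the 21 classes and leave the off-diagonal (cross-class) slices at zero.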
2. Why is the input image zero-padded (pad)?
Padding the input image by 100 pixels ensures that the network output can be aligned with the input for inputs of arbitrary size. The alignment is handled automatically by the network configuration and the crop layer, but the offset used at crop time must be computed.
| pad: 100, 500×500 | pad: 1, 500×500 | pad: 1, 224×224 |
|---|---|---|
data (1, 3, 500, 500) | data (1, 3, 500, 500) | data (1, 3, 224, 224) |
conv1_1 (1, 64, 698, 698) | conv1_1 (1, 64, 500, 500) | conv1_1 (1, 64, 224, 224) |
conv1_2 (1, 64, 698, 698) | conv1_2 (1, 64, 500, 500) | conv1_2 (1, 64, 224, 224) |
pool1 (1, 64, 349, 349) | pool1 (1, 64, 250, 250) | pool1 (1, 64, 112, 112) |
conv2_1 (1, 128, 349, 349) | conv2_1 (1, 128, 250, 250) | conv2_1 (1, 128, 112, 112) |
conv2_2 (1, 128, 349, 349) | conv2_2 (1, 128, 250, 250) | conv2_2 (1, 128, 112, 112) |
pool2 (1, 128, 175, 175) | pool2 (1, 128, 125, 125) | pool2 (1, 128, 56, 56) |
conv3_1 (1, 256, 175, 175) | conv3_1 (1, 256, 125, 125) | conv3_1 (1, 256, 56, 56) |
conv3_2 (1, 256, 175, 175) | conv3_2 (1, 256, 125, 125) | conv3_2 (1, 256, 56, 56) |
conv3_3 (1, 256, 175, 175) | conv3_3 (1, 256, 125, 125) | conv3_3 (1, 256, 56, 56) |
pool3 (1, 256, 88, 88) | pool3 (1, 256, 63, 63) | pool3 (1, 256, 28, 28) |
conv4_1 (1, 512, 88, 88) | conv4_1 (1, 512, 63, 63) | conv4_1 (1, 512, 28, 28) |
conv4_2 (1, 512, 88, 88) | conv4_2 (1, 512, 63, 63) | conv4_2 (1, 512, 28, 28) |
conv4_3 (1, 512, 88, 88) | conv4_3 (1, 512, 63, 63) | conv4_3 (1, 512, 28, 28) |
pool4 (1, 512, 44, 44) | pool4 (1, 512, 32, 32) | pool4 (1, 512, 14, 14) |
conv5_1 (1, 512, 44, 44) | conv5_1 (1, 512, 32, 32) | conv5_1 (1, 512, 14, 14) |
conv5_2 (1, 512, 44, 44) | conv5_2 (1, 512, 32, 32) | conv5_2 (1, 512, 14, 14) |
conv5_3 (1, 512, 44, 44) | conv5_3 (1, 512, 32, 32) | conv5_3 (1, 512, 14, 14) |
pool5 (1, 512, 22, 22) | pool5 (1, 512, 16, 16) | pool5 (1, 512, 7, 7) |
fc6 (1, 4096, 16, 16) | fc6 (1, 4096, 10, 10) | fc6 (1, 4096, 1, 1) |
fc7 (1, 4096, 16, 16) | fc7 (1, 4096, 10, 10) | fc7 (1, 4096, 1, 1) |
score_fr (1, 21, 16, 16) | score_fr (1, 21, 10, 10) | score_fr (1, 21, 1, 1) |
upscore (1, 21, 544, 544) | upscore (1, 21, 352, 352) | upscore (1, 21, 64, 64) |
score (1, 21, 500, 500) | | |
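The shapes in the table can be reproduced by tracing one spatial dimension through the voc-fcn32s layers with Caffe's size formulas (convolution rounds down, pooling rounds up). The sketch below hardcodes the VGG-16 layout of the net; `first_pad` is the padding applied at conv1_1.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution output size (floor division)
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Caffe pooling output size (ceiling division)
    return math.ceil((size - kernel) / stride) + 1

def deconv_out(size, kernel, stride):
    # Caffe deconvolution output size (pad = 0)
    return (size - 1) * stride + kernel

def fcn32s_trace(h, first_pad=100):
    """Trace one spatial dimension of voc-fcn32s up to the upscore layer."""
    h = conv_out(h, 3, pad=first_pad)   # conv1_1 (pad 100 in FCN)
    h = conv_out(h, 3, pad=1)           # conv1_2
    h = pool_out(h)                     # pool1
    h = conv_out(conv_out(h, 3, pad=1), 3, pad=1)
    h = pool_out(h)                     # pool2
    for _ in range(3):                  # stages 3-5: three 3x3 convs + pool
        for _ in range(3):
            h = conv_out(h, 3, pad=1)
        h = pool_out(h)
    h = conv_out(h, 7)                  # fc6 as a 7x7 convolution
    # fc7 and score_fr are 1x1 convolutions: size unchanged
    return deconv_out(h, 64, 32)        # upscore
```

With `first_pad=100` and a 500×500 input this yields 544, which the crop layer trims back to 500; with `first_pad=1` the upscore output (352, or 64 for a 224×224 input) is smaller than the input, so no valid crop exists — this is why the 100-pixel pad is needed.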
3. Why are the outputs, gradients, and parameters all zero?
This is usually because the network was not initialized from a pre-trained model. Whether you are reproducing the official FCN training or training your own FCN model, the weights of the corresponding pre-trained network must be transplanted into the FCN network. The surgery.transplant() function performs this weight transplant.
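The idea behind the transplant can be sketched in plain NumPy: copy weights layer by layer by name, and reshape the flat inner-product weights of fc6/fc7 into convolution kernels when the element counts match. This is a simplified illustration only; the real surgery.transplant() in the reference repo operates on Caffe Net objects, not dicts.

```python
import numpy as np

def transplant(new_params, old_params):
    """Copy weights from old_params into new_params, matching by layer name.

    Layers absent from the new net (e.g. the classifier fc8) are skipped;
    same-sized but differently shaped blobs (fc layers turned into convs)
    are reshaped in place."""
    for name, old_w in old_params.items():
        if name not in new_params:
            continue                      # dropped layer, e.g. fc8
        new_w = new_params[name]
        if old_w.shape == new_w.shape:
            new_params[name] = old_w.copy()
        elif old_w.size == new_w.size:
            # fc6/fc7: flat inner-product weights -> conv kernels
            new_params[name] = old_w.reshape(new_w.shape).copy()
    return new_params
```

Without this step the score and upscore layers are zero-initialized, which is exactly the all-zero outputs/gradients symptom described above.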
4. Python Layer
Because FCN uses Python layers, the following must be set in Makefile.config when compiling Caffe:
WITH_PYTHON_LAYER := 1
5. Related downloads
trained VGG model
http://www.robots.ox.ac.uk/~vgg/research/very_deep/
Semantic Boundary Dataset (SBD)
http://home.bharathh.info/pubs/codes/SBD/download.html
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz
PASCAL VOC 2012
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
Note: the first three FAQ entries are the same as the FAQ on the GitHub project page.