Dilated Convolution
[Paper]: Multi-scale Context Aggregation by Dilated Convolutions
1. Definition in Caffe
Dilated convolution can be configured directly through the convolution layer parameters in official Caffe.
message ConvolutionParameter {
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting holes.
  // (Kernel dilation is sometimes referred to by its use in the
  // algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1
}
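To make the "zero-filling the resulting holes" comment concrete, here is a minimal Python sketch that dilates a small kernel explicitly. The helper name `dilate_kernel` and the sample kernel values are made up for illustration. Caffe fills the holes implicitly and does not need to build such a matrix, so this only shows the equivalent kernel.

```python
def dilate_kernel(kernel, dilation):
    """Return a square 2-D kernel with (dilation - 1) zeros inserted between taps."""
    k = len(kernel)
    # Effective extent of the dilated kernel along one axis.
    size = dilation * (k - 1) + 1
    out = [[0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * dilation][j * dilation] = kernel[i][j]
    return out


# Hypothetical 3x3 kernel, used only to make the zero pattern visible.
kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

for row in dilate_kernel(kernel, 2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

With `dilation: 2` a 3x3 kernel covers a 5x5 window while still having only nine weights; with `dilation: 16` the same nine weights span a 33x33 window.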
layer {
  name: "ct_conv1_1"
  type: "Convolution"
  bottom: "fc-final"
  top: "ct_conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  convolution_param {
    num_output: 42
    pad: 33
    kernel_size: 3
  }
}
layer {
  name: "ct_relu1_1"
  type: "ReLU"
  bottom: "ct_conv1_1"
  top: "ct_conv1_1"
}
layer {
  name: "ct_conv1_2"
  type: "Convolution"
  bottom: "ct_conv1_1"
  top: "ct_conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  convolution_param {
    num_output: 42
    pad: 0
    kernel_size: 3
  }
}
layer {
  name: "ct_relu1_2"
  type: "ReLU"
  bottom: "ct_conv1_2"
  top: "ct_conv1_2"
}
layer {
  name: "ct_conv2_1"
  type: "Convolution"
  bottom: "ct_conv1_2"
  top: "ct_conv2_1"
  convolution_param {
    num_output: 84
    kernel_size: 3
    dilation: 2
  }
}
layer {
  name: "ct_relu2_1"
  type: "ReLU"
  bottom: "ct_conv2_1"
  top: "ct_conv2_1"
}
layer {
  name: "ct_conv3_1"
  type: "Convolution"
  bottom: "ct_conv2_1"
  top: "ct_conv3_1"
  convolution_param {
    num_output: 168
    kernel_size: 3
    dilation: 4
  }
}
layer {
  name: "ct_relu3_1"
  type: "ReLU"
  bottom: "ct_conv3_1"
  top: "ct_conv3_1"
}
layer {
  name: "ct_conv4_1"
  type: "Convolution"
  bottom: "ct_conv3_1"
  top: "ct_conv4_1"
  convolution_param {
    num_output: 336
    kernel_size: 3
    dilation: 8
  }
}
layer {
  name: "ct_relu4_1"
  type: "ReLU"
  bottom: "ct_conv4_1"
  top: "ct_conv4_1"
}
layer {
  name: "ct_conv5_1"
  type: "Convolution"
  bottom: "ct_conv4_1"
  top: "ct_conv5_1"
  convolution_param {
    num_output: 672
    kernel_size: 3
    dilation: 16
  }
}
layer {
  name: "ct_relu5_1"
  type: "ReLU"
  bottom: "ct_conv5_1"
  top: "ct_conv5_1"
}
layer {
  name: "ct_fc1"
  type: "Convolution"
  bottom: "ct_conv5_1"
  top: "ct_fc1"
  convolution_param {
    num_output: 672
    kernel_size: 3
  }
}
layer {
  name: "ct_fc1_relu"
  type: "ReLU"
  bottom: "ct_fc1"
  top: "ct_fc1"
}
layer {
  name: "ct_final"
  type: "Convolution"
  bottom: "ct_fc1"
  top: "ct_final"
  convolution_param {
    num_output: 21
    kernel_size: 1
  }
}
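The `pad: 33` on the first layer may look arbitrary. The sketch below traces the spatial size along one axis through the layers above, using the Caffe output-size rule `(in + 2 * pad - extent) / stride + 1` with `extent = dilation * (kernel_size - 1) + 1`. The function names and the input size of 64 are assumptions chosen for illustration; the layer parameters are copied from the prototxt.

```python
def conv_output_size(size, kernel_size, pad=0, stride=1, dilation=1):
    """Output size along one spatial axis for a Caffe-style convolution."""
    extent = dilation * (kernel_size - 1) + 1  # span of the dilated kernel
    return (size + 2 * pad - extent) // stride + 1


# (name, kernel_size, pad, dilation) as listed in the prototxt above.
LAYERS = [
    ("ct_conv1_1", 3, 33, 1),
    ("ct_conv1_2", 3, 0, 1),
    ("ct_conv2_1", 3, 0, 2),
    ("ct_conv3_1", 3, 0, 4),
    ("ct_conv4_1", 3, 0, 8),
    ("ct_conv5_1", 3, 0, 16),
    ("ct_fc1", 3, 0, 1),
    ("ct_final", 1, 0, 1),
]


def trace(size):
    """Return [(layer name, output size)] for an input of the given size."""
    sizes = []
    for name, kernel_size, pad, dilation in LAYERS:
        size = conv_output_size(size, kernel_size, pad=pad, dilation=dilation)
        sizes.append((name, size))
    return sizes


# 64 is a hypothetical input size; ReLU layers are omitted since they keep the size.
for name, size in trace(64):
    print(f"{name}: {size}")
# ct_conv1_1: 128
# ct_conv1_2: 126
# ct_conv2_1: 122
# ct_conv3_1: 114
# ct_conv4_1: 98
# ct_conv5_1: 66
# ct_fc1: 64
# ct_final: 64
```

The unpadded 3x3 layers shrink each axis by 2 + 2 + 4 + 8 + 16 + 32 + 2 = 66 in total, which is exactly the 2 * 33 added by the padding on `ct_conv1_1`, so `ct_final` has the same spatial size as `fc-final`.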
2. Paper - Multi-scale Context Aggregation by Dilated Convolutions
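Semantic segmentation is a dense prediction problem, which sets it apart from image classification.
Dilated convolutions aggregate multi-scale contextual information without losing resolution, and they support exponential expansion of the receptive field.
Image classification networks integrate multi-scale contextual information through successive pooling and subsampling layers, which reduce the resolution until a global prediction is obtained.
Dense prediction, by contrast, requires multi-scale contextual reasoning combined with full-resolution output.
Two approaches have been used to resolve the conflict between multi-scale reasoning and full-resolution dense prediction:
- Repeated up-convolutions that recover the lost resolution while carrying over the global information from the downsampled layers.
- Feeding several rescaled versions of the image to the network and combining the resulting predictions. It is unclear, however, which of the rescaled inputs are actually needed.
Dilated convolutions aggregate multi-scale contextual information without reducing the resolution and without analyzing rescaled images. They can be plugged into existing architectures at any resolution.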
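The claim about exponential receptive field growth can be checked with a few lines of arithmetic. This is a minimal sketch assuming a stack of stride-1 convolutions with 3x3 kernels; the function name `receptive_field` is made up for illustration. Each layer with dilation d widens the receptive field by d * (kernel_size - 1) along each axis.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field along one axis after each layer of a stride-1 conv stack."""
    rf = 1
    fields = []
    for d in dilations:
        rf += d * (kernel_size - 1)  # a dilated kernel spans d * (k - 1) + 1 pixels
        fields.append(rf)
    return fields


print(receptive_field([1, 2, 4, 8]))  # doubling dilations -> [3, 7, 15, 31]
print(receptive_field([1, 1, 1, 1]))  # plain 3x3 stack    -> [3, 5, 7, 9]
```

With doubling dilations the receptive field roughly doubles per layer while the parameter count grows only linearly; without dilation it grows by just 2 per layer. Applied to the dilations of the 3x3 layers in the Caffe definition from section 1, `receptive_field([1, 1, 2, 4, 8, 16, 1])` ends at 67, i.e. a 67x67 receptive field.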
2.1 Dilated Convolution
Define a discrete function $F: \mathbb{Z}^2 \to \mathbb{R}$. Let $\Omega_r = [-r, r]^2 \cap \mathbb{Z}^2$, and let $k: \Omega_r \to \mathbb{R}$ be a discrete filter of size $(2r+1)^2$.