MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS

The paper replaces pooling layers with dilated convolutions. For an explanation of how dilated convolution works, see:

http://blog.csdn.net/u011961856/article/details/77141761
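
To illustrate the idea (a minimal NumPy sketch of my own, not code from the paper or the linked post): a dilated convolution spaces its kernel taps `dilation` samples apart, which enlarges the receptive field without adding any weights.

import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    # "Valid" 1-D dilated convolution: taps are `dilation` samples apart,
    # so a k-tap kernel covers (k - 1) * dilation + 1 input samples.
    k = len(kernel)
    span = (k - 1) * dilation + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

x = np.arange(10, dtype=float)
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # window of 3 samples
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # window of 5 samples, same 3 weights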

The front-end module:

The input is a padded three-channel color image, and the output is a feature map with 21 output channels (one per class). VGG-16 is adapted for dense prediction by removing its last two pooling/striding layers; the convolutional layers after the first removed pooling layer use dilated convolutions with dilation 2, and the layers after the second removed pooling layer use dilation 4.

The model code is as follows:

from caffe import layers as L, params as P  # pycaffe NetSpec helpers


def build_frontend_vgg(net, bottom, num_classes):
    prev_layer = bottom
    num_convolutions = [2, 2, 3, 3, 3]  # conv layers per VGG-16 stage
    # Per-stage dilation: 0 marks an ordinary stage followed by max pooling;
    # stage 5 uses dilation 2, and dilations[5] = 4 is used by fc6 below.
    dilations = [0, 0, 0, 0, 2, 4]
    for l in range(5):
        num_outputs = min(64 * 2 ** l, 512)
        for i in range(0, num_convolutions[l]):
            conv_name = 'conv{0}_{1}'.format(l+1, i+1)
            relu_name = 'relu{0}_{1}'.format(l+1, i+1)
            if dilations[l] == 0:
                setattr(net, conv_name,
                        L.Convolution(
                            prev_layer,
                            param=[dict(lr_mult=1, decay_mult=1),
                                   dict(lr_mult=2, decay_mult=0)],
                            convolution_param=dict(num_output=num_outputs,
                                                   kernel_size=3)))
            else:
                setattr(net, conv_name,
                        L.Convolution(
                            prev_layer,
                            param=[dict(lr_mult=1, decay_mult=1),
                                   dict(lr_mult=2, decay_mult=0)],
                            convolution_param=dict(num_output=num_outputs,
                                                   kernel_size=3,
                                                   dilation=dilations[l])))
            setattr(net, relu_name,
                    L.ReLU(getattr(net, conv_name), in_place=True))
            prev_layer = getattr(net, relu_name)
        # Pool only after stages whose successor is undilated: pool1-pool3
        # are kept, while VGG-16's pool4 and pool5 are removed.
        if dilations[l+1] == 0:
            pool_name = 'pool{0}'.format(l+1)
            setattr(net, pool_name, L.Pooling(
                prev_layer, pool=P.Pooling.MAX, kernel_size=2, stride=2))
            prev_layer = getattr(net, pool_name)

    net.fc6 = L.Convolution(
        prev_layer,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        convolution_param=dict(num_output=4096, kernel_size=7,
                               dilation=dilations[5]))
    net.relu6 = L.ReLU(net.fc6, in_place=True)
    net.drop6 = L.Dropout(net.relu6, in_place=True, dropout_ratio=0.5)
    net.fc7 = L.Convolution(
        net.drop6,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        convolution_param=dict(num_output=4096, kernel_size=1))
    net.relu7 = L.ReLU(net.fc7, in_place=True)
    net.drop7 = L.Dropout(net.relu7, in_place=True, dropout_ratio=0.5)
    net.final = L.Convolution(
        net.drop7,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        convolution_param=dict(
            num_output=num_classes, kernel_size=1,
            weight_filler=dict(type='gaussian', std=0.001),
            bias_filler=dict(type='constant', value=0)))
    return net.final, 'final'
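
For reference, a minimal usage sketch (my own, assuming pycaffe is installed; the input shape is illustrative, not prescribed by the paper):

import caffe
from caffe import layers as L

net = caffe.NetSpec()
net.data = L.Input(shape=dict(dim=[1, 3, 900, 900]))  # batch, channels, H, W
top, top_name = build_frontend_vgg(net, net.data, num_classes=21)
print(net.to_proto())  # prints the generated prototxt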

Performance comparison: according to the paper, the front-end module alone already outperforms the FCN-8s and DeepLab baselines it is compared against.

Context network

The structure of the context network is shown in the following table (reconstructed from the paper):

Layer                     1      2      3      4       5       6       7       8
Convolution               3×3    3×3    3×3    3×3     3×3     3×3     3×3     1×1
Dilation                  1      1      2      4       8       16      1       1
Truncation                yes    yes    yes    yes     yes     yes     yes     no
Receptive field           3×3    5×5    9×9    17×17   33×33   65×65   67×67   67×67
Output channels (Basic)   C      C      C      C       C       C       C       C
Output channels (Large)   2C     2C     4C     8C      16C     32C     32C     C

The pointwise truncation function is max(·, 0), i.e. ReLU. The first seven layers are 3×3 convolutions with dilations 1, 1, 2, 4, 8, 16, 1 respectively; the last layer is a 1×1 convolution with dilation 1. C denotes the number of output channels (equal to the number of classes). The table covers two variants, the basic context network and the large context network: their structure is identical, and they differ only in the number of output channels per layer, as shown in the Basic and Large rows.
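
The receptive-field row of the table can be checked with a few lines of Python (my own sketch): each 3×3 convolution with dilation d grows the receptive field side by 2·d.

# Receptive field of the context module, layer by layer.
dilations = [1, 1, 2, 4, 8, 16, 1]  # the seven 3x3 layers
rf = 1
for d in dilations:
    rf += 2 * d  # a 3x3 conv with dilation d spans 2*d + 1 samples per axis
    print('{0}x{0}'.format(rf))  # 3x3, 5x5, 9x9, 17x17, 33x33, 65x65, 67x67
# The final 1x1 convolution leaves the receptive field at 67x67.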

For parameter initialization, the common practice is random initialization, but the paper found it ineffective for the context module and proposes a more effective scheme, identity initialization, which makes each layer initially pass its input straight through to the next layer:

k^b(t, a) = 1_[t = 0] · 1_[a = b]

where a is the index of the input feature map and b the index of the output feature map. For two consecutive layers of the large network with c_i and c_{i+1} feature maps respectively, the initialization generalizes to:

k^b(t, a) = C / c_{i+1}   if t = 0 and ⌊a·C/c_i⌋ = ⌊b·C/c_{i+1}⌋
k^b(t, a) = ε             otherwise

where C is the number of classes and ε ∼ N(0, σ²) with σ ≪ C / c_{i+1}.
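
A minimal NumPy sketch of the equal-width case (my own illustration, with weights laid out as (out_channels, in_channels, kH, kW) as in Caffe; not the authors' filler code):

import numpy as np

def identity_init(channels, k):
    # weight[b, a, center, center] = 1 exactly when a == b, so the layer
    # initially copies each input feature map to the matching output map.
    w = np.zeros((channels, channels, k, k))
    c = k // 2
    for b in range(channels):
        w[b, b, c, c] = 1.0
    return w

print(identity_init(21, 3)[0, 0])  # 3x3 slice with a single 1 at the center

In the code below, this is presumably what the custom weight_filler=dict(type='identity', ...) from the authors' Caffe fork produces, with the std parameter contributing the small noise ε.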

The context network code is as follows:


def build_context(net, bottom, num_classes, layers=8):
    # Note: the 'identity' weight filler used below is a custom filler from
    # the authors' Caffe fork, implementing the initialization described above.
    prev_layer = bottom
    multiplier = 1  # channel multiplier (1 for the basic context network)
    for i in range(1, 3):
        conv_name = 'ctx_conv1_{}'.format(i)
        relu_name = 'ctx_relu1_{}'.format(i)
        setattr(net, conv_name,
                L.Convolution(
                    # pass the bottom blob unless it is None
                    *([] if prev_layer is None else [prev_layer]),
                    param=[dict(lr_mult=1, decay_mult=1),
                           dict(lr_mult=2, decay_mult=0)],
                    convolution_param=dict(
                        num_output=num_classes * multiplier, kernel_size=3,
                        pad=1,
                        weight_filler=dict(type='identity',
                                           num_groups=num_classes, std=0.01),
                        bias_filler=dict(type='constant', value=0))))
        setattr(net, relu_name,
                L.ReLU(getattr(net, conv_name), in_place=True))
        prev_layer = getattr(net, relu_name)

    # Middle layers: with layers=8, i runs over 2..5, producing dilations
    # 2, 4, 8, 16 (layers 3-6 of the module).
    for i in range(2, layers - 2):
        dilation = 2 ** (i - 1)
        multiplier = 1
        conv_name = 'ctx_conv{}_1'.format(i)
        relu_name = 'ctx_relu{}_1'.format(i)
        setattr(net, conv_name,
                L.Convolution(
                    prev_layer,
                    param=[dict(lr_mult=1, decay_mult=1),
                           dict(lr_mult=2, decay_mult=0)],
                    convolution_param=dict(
                        num_output=num_classes * multiplier, kernel_size=3,
                        dilation=dilation, pad=dilation,
                        weight_filler=dict(type='identity',
                                           num_groups=num_classes,
                                           std=0.01 / multiplier),
                        bias_filler=dict(type='constant', value=0))))
        setattr(net, relu_name,
                L.ReLU(getattr(net, conv_name), in_place=True))
        prev_layer = getattr(net, relu_name)

    net.ctx_fc1 = L.Convolution(
        prev_layer,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        convolution_param=dict(
            num_output=num_classes * multiplier, kernel_size=3, pad=1,
            weight_filler=dict(type='identity',
                               num_groups=num_classes,
                               std=0.01 / multiplier),
            bias_filler=dict(type='constant', value=0)))
    net.ctx_fc1_relu = L.ReLU(net.ctx_fc1, in_place=True)
    net.ctx_final = L.Convolution(
        net.ctx_fc1_relu,
        param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        convolution_param=dict(
            num_output=num_classes, kernel_size=1,
            weight_filler=dict(type='identity',
                               num_groups=num_classes,
                               std=0.01 / multiplier),
            bias_filler=dict(type='constant', value=0)))
    return net.ctx_final, 'ctx_final'
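
As with the front end, the context module attaches to an existing NetSpec, so the two builders chain naturally (continuing the usage sketch above; names are illustrative):

frontend_top, _ = build_frontend_vgg(net, net.data, num_classes=21)
context_top, _ = build_context(net, frontend_top, num_classes=21, layers=8)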

The authors compare segmentation results for the front-end network alone, for the front end with the context network added, and with CRF-RNN added on top; in the paper, each addition further improves accuracy.


