Learning Caffe through ENet, Part (2): the data input layer and initial_block()

For the ENet network structure, see the blog post: "Semantic Segmentation – ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation" paper walkthrough.

Overall structure

[Figure: overall ENet network structure]

The model is written with the pycaffe interface. Open scripts/create_enet_prototxt.py to see the main structure of the network:

    n = caffe.NetSpec()

    if args.mode == 'train_encoder_decoder':
        network = data_layer_train(n, 1)
        out_directory = args.out_dir + 'coal_enet_train_encoder_decoder.prototxt'
    elif args.mode == 'train_encoder':
        network = data_layer_train(n, 8)
        out_directory = args.out_dir + 'coal_enet_train_encoder.prototxt'
    elif args.mode == 'test':
        network = data_layer_test(n)
        out_directory = args.out_dir + 'coal_enet_deploy.prototxt'
    else:
        raise Exception("Wrong mode! Only train_encoder, train_encoder_decoder and test modes are available, "
                        "but received {}.".format(args.mode))

    network, prev_layer = initial_block(n)

    network, prev_layer = bottleneck(n, prev_layer, 1, 0, 64, 'downsampling')  # args: stage, number_bottleneck, num_input, type

    for i in xrange(1, 5):
        network, prev_layer = bottleneck(n, prev_layer, 1, i, 64, 'regular')

    network, prev_layer = bottleneck(n, prev_layer, 2, 0, 128, 'downsampling')

    for j in xrange(2, 4):
        network, prev_layer = bottleneck(n, prev_layer, j, 1, 128, 'regular')
        network, prev_layer = bottleneck(n, prev_layer, j, 2, 128, 'dilated', 2)
        network, prev_layer = bottleneck(n, prev_layer, j, 3, 128, 'asymmetric', 5)
        network, prev_layer = bottleneck(n, prev_layer, j, 4, 128, 'dilated', 4)
        network, prev_layer = bottleneck(n, prev_layer, j, 5, 128, 'regular')
        network, prev_layer = bottleneck(n, prev_layer, j, 6, 128, 'dilated', 8)
        network, prev_layer = bottleneck(n, prev_layer, j, 7, 128, 'asymmetric', 5)
        network, prev_layer = bottleneck(n, prev_layer, j, 8, 128, 'dilated', 16)

    if args.mode == 'train_encoder_decoder' or args.mode == 'test':
        network, prev_layer = bottleneck(n, prev_layer, 4, 0, 64, 'upsampling', 'relu')  # last one = additional flag,
        # that relu is used instead of prelu
        network, prev_layer = bottleneck(n, prev_layer, 4, 1, 64, 'regular', 'relu')
        network, prev_layer = bottleneck(n, prev_layer, 4, 2, 64, 'regular', 'relu')

        network, prev_layer = bottleneck(n, prev_layer, 5, 0, 16, 'upsampling', 'relu')
        network, prev_layer = bottleneck(n, prev_layer, 5, 1, 16, 'regular', 'relu')

    network, prev_layer = fullconv(n, prev_layer, 6, 0, args.num_of_classes)

    if args.mode == 'train_encoder_decoder' or args.mode == 'train_encoder':
        network = loss_layer(n, prev_layer)

The network is mainly composed of the data input layer, initial_block, and bottleneck modules. Let's analyze each in turn.
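The repetitive bottleneck schedule of encoder stages 2 and 3 in the listing above (regular → dilated 2 → asymmetric 5 → dilated 4 → regular → dilated 8 → asymmetric 5 → dilated 16) is easier to see written out as plain data. Here is a sketch that mirrors the loop; `STAGE_TEMPLATE` and `encoder_schedule` are illustrative names, not part of the original script:

```python
# The (type, extra) pairs applied in order within each of stages 2 and 3;
# "extra" is the dilation for 'dilated' blocks and the kernel length for
# 'asymmetric' ones, None for 'regular'.
STAGE_TEMPLATE = [
    ('regular', None), ('dilated', 2), ('asymmetric', 5), ('dilated', 4),
    ('regular', None), ('dilated', 8), ('asymmetric', 5), ('dilated', 16),
]

def encoder_schedule():
    """Enumerate (stage, index, num_output, type, extra) for stages 2 and 3."""
    layers = []
    for stage in (2, 3):
        for idx, (kind, extra) in enumerate(STAGE_TEMPLATE, start=1):
            layers.append((stage, idx, 128, kind, extra))
    return layers

for entry in encoder_schedule():
    print(entry)
```

All 16 of these bottlenecks keep 128 channels; only the receptive-field mechanism (dilation or asymmetric kernels) varies.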

Data input

In the data_layer_train() function, the author uses a DenseImageData layer. The official Caffe does not ship this layer, so it has to be added when compiling Caffe; see my previous article for how to add it.

    n.data, n.label = L.DenseImageData(dense_image_data_param=dict(source=args.source, new_height=args.new_height,
                                                                   new_width=args.new_width, batch_size=args.batch_size,
                                                                   shuffle=args.shuffle,
                                                                   label_divide_factor=label_divide_factor), ntop=2)

From the code we can see that the raw data is listed in a text file given by args.source. Input images are resized to (new_height, new_width), while the label is additionally shrunk by label_divide_factor, i.e. resized to (new_height/label_divide_factor, new_width/label_divide_factor):

top[1]->Reshape(batch_size, 1, height/label_divide_factor, width/label_divide_factor);

This parameter matters. When training the encoder, label_divide_factor=8, so a (512, 1024) label becomes (64, 128), i.e. 512/8=64 and 1024/8=128;
when training the decoder, the parameter is 1, so the label keeps the same size as the original image, (512, 1024).

    label_divide_factor   mode      input label    label after reshape
    8                     encoder   (512, 1024)    (64, 128)
    1                     decoder   (512, 1024)    (512, 1024)
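The label reshape in the table can be checked with a few lines of Python. This is a pure shape-arithmetic sketch of what the DenseImageData layer does to top[1], not the layer itself; `label_shape` is a hypothetical helper:

```python
def label_shape(height, width, label_divide_factor):
    # DenseImageData reshapes top[1] (the label blob) to
    # (batch, 1, height/label_divide_factor, width/label_divide_factor);
    # this returns the per-sample (channels, height, width) part.
    return (1, height // label_divide_factor, width // label_divide_factor)

print(label_shape(512, 1024, 8))  # encoder training -> (1, 64, 128)
print(label_shape(512, 1024, 1))  # decoder training -> (1, 512, 1024)
```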

ntop: the number of output blobs; here it is 2, namely n.data and n.label, i.e. the training images and their labels.
With W_in = new_width and H_in = new_height:
assuming the raw image is (C_in, W_in, H_in), after this layer the data blob becomes (C_in, new_width, new_height).

    blob    input shape          output shape
    data    (C_in, W_in, H_in)   (C_in, new_width, new_height)
    label   (1, W_in, H_in)      (1, new_width/label_divide_factor, new_height/label_divide_factor)

initial_block()

[Figure: initial_block structure]

The left branch is a 3×3 convolution with stride 2; the right branch is a 2×2 max pooling with stride 2. The two results are concatenated along the channel dimension. Doing this aggressive downsampling right at the start significantly reduces the memory footprint.

Code:

def initial_block(n):
    # custom BN layer: bn_mode 0 learns statistics (training), 1 uses stored statistics (test)
    bn_mode = 0
    if args.mode == 'test':
        bn_mode = 1
    # left branch: 3x3 convolution, stride 2, 13 output channels
    n.conv0_1 = L.Convolution(n.data, num_output=13, bias_term=1, pad=1, kernel_size=3, stride=2,
                              weight_filler=dict(type='msra'))
    # right branch: 2x2 max pooling, stride 2, keeps the input's C_in channels
    n.pool0_1 = L.Pooling(n.data, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # merge the two branches along the channel axis: 13 + C_in channels
    n.concat0_1 = L.Concat(n.conv0_1, n.pool0_1, axis=1)
    n.bn0_1 = L.BN(n.concat0_1, scale_filler=dict(type='constant', value=1), bn_mode=bn_mode,
                   shift_filler=dict(type='constant', value=0.001), param=[dict(lr_mult=1, decay_mult=1),
                                                                           dict(lr_mult=1, decay_mult=0)])

    n.prelu0_1 = L.PReLU(n.bn0_1)
    last_layer = 'prelu0_1'
    return n.to_proto(), last_layer

Assume the input image has shape (C_in, W_in, H_in). The left branch's convolution computes (H_in + 2*1 - 3)/2 + 1 = H_in/2, giving a (13, W_in/2, H_in/2) feature map; the right branch's pooling computes (H_in - 2)/2 + 1 = H_in/2, giving (C_in, W_in/2, H_in/2). Concatenating along the channel axis yields (13+C_in, W_in/2, H_in/2). So after this layer:

(C_in, W_in, H_in)---->(13+C_in, W_in/2, H_in/2)

    extra parameter   input shape          output shape
    num_output=13     (C_in, W_in, H_in)   (13+C_in, W_in/2, H_in/2)
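The shape bookkeeping above can be verified with a short pure-Python sketch based on Caffe's convolution and pooling output-size formulas (the helper names here are illustrative, not from the original script):

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    # Caffe convolution output: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Caffe pooling output: ceil((size - kernel) / stride) + 1
    return -((-(size - kernel)) // stride) + 1

def initial_block_shape(c_in, w_in, h_in, conv_channels=13):
    # left: 3x3/str=2 conv (13 channels); right: 2x2/str=2 max pool (c_in
    # channels); both are concatenated along the channel axis, so their
    # spatial sizes must agree.
    assert conv_out(w_in) == pool_out(w_in) and conv_out(h_in) == pool_out(h_in)
    return (conv_channels + c_in, conv_out(w_in), conv_out(h_in))

print(initial_block_shape(3, 512, 1024))  # -> (16, 256, 512)
```

For a 3-channel (512, 1024) input this reproduces the table: both branches halve the spatial size, and the concat gives 13 + 3 = 16 channels.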