For the ENet architecture itself, see the blog post: Semantic Segmentation – ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (paper walkthrough).

Overall structure

The model is built through the pycaffe interface. Opening scripts/create_enet_prototxt.py shows the main structure of the network.
```python
n = caffe.NetSpec()
if args.mode == 'train_encoder_decoder':
    network = data_layer_train(n, 1)
    out_directory = args.out_dir + 'coal_enet_train_encoder_decoder.prototxt'
elif args.mode == 'train_encoder':
    network = data_layer_train(n, 8)
    out_directory = args.out_dir + 'coal_enet_train_encoder.prototxt'
elif args.mode == 'test':
    network = data_layer_test(n)
    out_directory = args.out_dir + 'coal_enet_deploy.prototxt'
else:
    raise Exception("Wrong mode! Just train_encoder, train_encoder_decoder and test mode available, "
                    "but received {}.".format(args.mode))

network, prev_layer = initial_block(n)
# arguments: stage, number_bottleneck, num_input, type
network, prev_layer = bottleneck(n, prev_layer, 1, 0, 64, 'downsampling')
for i in xrange(1, 5):
    network, prev_layer = bottleneck(n, prev_layer, 1, i, 64, 'regular')
network, prev_layer = bottleneck(n, prev_layer, 2, 0, 128, 'downsampling')
for j in xrange(2, 4):
    network, prev_layer = bottleneck(n, prev_layer, j, 1, 128, 'regular')
    network, prev_layer = bottleneck(n, prev_layer, j, 2, 128, 'dilated', 2)
    network, prev_layer = bottleneck(n, prev_layer, j, 3, 128, 'asymmetric', 5)
    network, prev_layer = bottleneck(n, prev_layer, j, 4, 128, 'dilated', 4)
    network, prev_layer = bottleneck(n, prev_layer, j, 5, 128, 'regular')
    network, prev_layer = bottleneck(n, prev_layer, j, 6, 128, 'dilated', 8)
    network, prev_layer = bottleneck(n, prev_layer, j, 7, 128, 'asymmetric', 5)
    network, prev_layer = bottleneck(n, prev_layer, j, 8, 128, 'dilated', 16)
if args.mode == 'train_encoder_decoder' or args.mode == 'test':
    # last argument = additional flag: ReLU is used instead of PReLU
    network, prev_layer = bottleneck(n, prev_layer, 4, 0, 64, 'upsampling', 'relu')
    network, prev_layer = bottleneck(n, prev_layer, 4, 1, 64, 'regular', 'relu')
    network, prev_layer = bottleneck(n, prev_layer, 4, 2, 64, 'regular', 'relu')
    network, prev_layer = bottleneck(n, prev_layer, 5, 0, 16, 'upsampling', 'relu')
    network, prev_layer = bottleneck(n, prev_layer, 5, 1, 16, 'regular', 'relu')
network, prev_layer = fullconv(n, prev_layer, 6, 0, args.num_of_classes)
if args.mode == 'train_encoder_decoder' or args.mode == 'train_encoder':
    network = loss_layer(n, prev_layer)
```
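The loop structure above determines how many bottleneck modules each encoder stage gets. As a sanity check, here is a hypothetical standalone sketch (not part of the repo) that replays the same loops and lists a `(stage, index, type)` tuple per module:

```python
def encoder_layout():
    """Replay the encoder loops above; return (stage, index, type) per bottleneck."""
    layers = [(1, 0, 'downsampling')]
    for i in range(1, 5):                      # stage 1: four regular bottlenecks
        layers.append((1, i, 'regular'))
    layers.append((2, 0, 'downsampling'))
    for j in range(2, 4):                      # stages 2 and 3 share one pattern
        kinds = ['regular', 'dilated', 'asymmetric', 'dilated',
                 'regular', 'dilated', 'asymmetric', 'dilated']
        for idx, kind in enumerate(kinds, 1):
            layers.append((j, idx, kind))
    return layers

layout = encoder_layout()
print(len(layout))                             # 22 bottlenecks in the encoder
print(sum(1 for s, _, _ in layout if s == 2))  # 9 in stage 2 (incl. downsampling)
```

Stage 3 repeats stage 2's dilated/asymmetric pattern but without a downsampling module, which is why the `j` loop starts at 2 and only stage 2 gets a `bottleneck(..., 0, ..., 'downsampling')` before it.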
The network consists mainly of the data input layer, the initial_block, and the bottleneck modules; let's analyze them one by one.

Data input

In data_layer_train(), the author uses the DenseImageData layer. Official Caffe does not ship this layer, so it has to be added when Caffe is compiled; see my previous article for how to add it.
```python
n.data, n.label = L.DenseImageData(
    dense_image_data_param=dict(source=args.source, new_height=args.new_height,
                                new_width=args.new_width, batch_size=args.batch_size,
                                shuffle=args.shuffle,
                                label_divide_factor=label_divide_factor),
    ntop=2)
```
The code shows that args.source is a text file listing the training samples. Images are resized to (new_height, new_width), and the label is additionally downscaled by label_divide_factor, as the layer's reshape call shows:

```cpp
top[1]->Reshape(batch_size, 1, height / label_divide_factor, width / label_divide_factor);
```

This parameter matters: when training the encoder, label_divide_factor = 8, so a (512, 1024) label becomes (64, 128), i.e. 512/8 = 64 and 1024/8 = 128; when training the decoder it is 1, and the label keeps the original image size of (512, 1024).
label_divide_factor | mode | input label | label after reshape |
---|---|---|---|
8 | encoder | (512, 1024) | (64, 128) |
1 | decoder | (512, 1024) | (512, 1024) |
ntop: the number of outputs, here 2: n.data and n.label, i.e. the training images and their labels.

With W_in = new_width and H_in = new_height, an input image of shape (C_in, H_in, W_in) leaves this layer as the following pair:

input data | output data | output label |
---|---|---|
(C_in, H_in, W_in) | (C_in, new_height, new_width) | (1, new_height/label_divide_factor, new_width/label_divide_factor) |
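These shapes can be verified with a small standalone helper (hypothetical, mirroring the Reshape call quoted earlier):

```python
def dense_image_data_shapes(c_in, new_height, new_width, label_divide_factor, batch_size=1):
    """Output shapes of the DenseImageData layer's two tops (data, label)."""
    data = (batch_size, c_in, new_height, new_width)
    label = (batch_size, 1, new_height // label_divide_factor,
             new_width // label_divide_factor)
    return data, label

# Encoder training: labels are downscaled by 8
print(dense_image_data_shapes(3, 512, 1024, 8))  # ((1, 3, 512, 1024), (1, 1, 64, 128))
# Decoder training: labels keep full resolution
print(dense_image_data_shapes(3, 512, 1024, 1))  # ((1, 3, 512, 1024), (1, 1, 512, 1024))
```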
initial_block()

Network structure

The left branch applies a 3×3, stride-2 convolution; the right branch applies 2×2 max pooling. The two results are concatenated along the channel axis, which sharply reduces the feature-map size, and thus memory, right at the start of the network.

Code:
```python
def initial_block(n):
    bn_mode = 0
    if args.mode == 'test':
        bn_mode = 1
    n.conv0_1 = L.Convolution(n.data, num_output=13, bias_term=1, pad=1, kernel_size=3, stride=2,
                              weight_filler=dict(type='msra'))
    n.pool0_1 = L.Pooling(n.data, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.concat0_1 = L.Concat(n.conv0_1, n.pool0_1, axis=1)
    n.bn0_1 = L.BN(n.concat0_1, scale_filler=dict(type='constant', value=1), bn_mode=bn_mode,
                   shift_filler=dict(type='constant', value=0.001),
                   param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=1, decay_mult=0)])
    n.prelu0_1 = L.PReLU(n.bn0_1)
    last_layer = 'prelu0_1'
    return n.to_proto(), last_layer
```
Assume the input image has shape (C_in, W_in, H_in). The convolution branch computes (H_in + 2*1 - 3)/2 + 1 = H_in/2 (and likewise for W_in), giving a feature map of (13, W_in/2, H_in/2). The pooling branch computes (H_in - 2)/2 + 1 = H_in/2, giving (C_in, W_in/2, H_in/2). Concatenating the two along the channel axis yields (13 + C_in, W_in/2, H_in/2), so this layer maps

(C_in, W_in, H_in) ----> (13 + C_in, W_in/2, H_in/2)
extra parameter | input data | output data |
---|---|---|
num_output=13 | (C_in, W_in, H_in) | (13+C_in, W_in/2, H_in/2) |
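The size arithmetic can be checked with the standard Caffe output-size formulas. A sketch (note: Caffe's pooling actually uses ceiling division, which agrees with floor here because the input sizes are even):

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    # Caffe convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # max pooling, no padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

h, w, c_in = 512, 1024, 3
assert conv_out(h) == pool_out(h) == 256       # both branches halve the spatial size
print((13 + c_in, conv_out(h), conv_out(w)))   # (16, 256, 512)
```

For a 3-channel (512, 1024) input, initial_block therefore outputs a 16-channel (256, 512) feature map: the 13 convolution channels plus the 3 pooled input channels.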