DeepLab v2 Overview
Paper link:
See also this blog post: Semantic Segmentation – DeepLab (1, 2, 3) series summary
Compared with v1, DeepLab v2:
1. replaces the VGG-16 backbone with ResNet-101;
2. adds the ASPP (Atrous Spatial Pyramid Pooling) module.
Code walkthrough
GitHub repository:
https://github.com/DrSleep/tensorflow-deeplab-resnet
Network structure
DeepLab v2 is implemented on top of ResNet-101.
First, a look at the ResNet structure:
The code is in model.py.
Compared with vanilla ResNet, the main differences are:
1. Atrous (dilated) convolutions replace ordinary convolutions
The conv4_x blocks use rate=2 and the conv5_x blocks use rate=4.
2. No more stride-2 convolutions after conv3_x
Once the input has been downsampled 8x, no further 2x downsampling is applied. With the default 321x321 input, the final feature map is 41x41 rather than 11x11.
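The output-stride arithmetic can be checked with a quick sketch (the helper function is my own, not from the repo):

```python
import math

def feature_size(n, strides):
    """Spatial size after a chain of 'SAME'-padded strided layers."""
    for s in strides:
        n = math.ceil(n / s)
    return n

# Plain ResNet-101: stride 2 at conv1, pool1, conv3_x, conv4_x and conv5_x.
print(feature_size(321, [2, 2, 2, 2, 2]))  # 11  (output stride 32)

# DeepLab v2: conv4_x/conv5_x keep stride 1 and use dilation 2/4 instead.
print(feature_size(321, [2, 2, 2, 1, 1]))  # 41  (output stride 8)
```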
3. An ASPP head is added on top:
(self.feed('res5b_relu',
           'bn5c_branch2c')
     .add(name='res5c')
     .relu(name='res5c_relu')
     .atrous_conv(3, 3, num_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0'))

(self.feed('res5c_relu')
     .atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1'))

(self.feed('res5c_relu')
     .atrous_conv(3, 3, num_classes, 18, padding='SAME', relu=False, name='fc1_voc12_c2'))

(self.feed('res5c_relu')
     .atrous_conv(3, 3, num_classes, 24, padding='SAME', relu=False, name='fc1_voc12_c3'))

(self.feed('fc1_voc12_c0',
           'fc1_voc12_c1',
           'fc1_voc12_c2',
           'fc1_voc12_c3')
     .add(name='fc1_voc12'))
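The four branches apply 3x3 atrous convolutions with rates 6/12/18/24 to the same feature map and fuse them by element-wise addition. A single-channel numpy sketch of that idea (function and variable names are mine; the real layers are multi-channel and carry biases):

```python
import numpy as np

def atrous_conv3x3(x, w, rate):
    """3x3 dilated convolution, zero padding, 'SAME' output size.
    x: (H, W) feature map, w: (3, 3) kernel."""
    H, W = x.shape
    xp = np.pad(x, rate)               # pad by `rate` on every side
    out = np.zeros((H, W))
    for i in range(3):
        for j in range(3):
            # Taps are spaced `rate` pixels apart -- that is the dilation.
            out += w[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((41, 41))
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
# Four parallel atrous branches, rates 6/12/18/24, fused by addition
# (mirrors the four fc1_voc12_c* branches and the final add).
aspp = sum(atrous_conv3x3(x, w, r) for w, r in zip(kernels, (6, 12, 18, 24)))
print(aspp.shape)  # (41, 41)
```

Each rate sees a different effective receptive field over the same 41x41 map, which is what gives ASPP its multi-scale context.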
Training
Fetching the labels
label = tf.image.decode_png(label_contents, channels=1)
The colour label images must first be converted to single-channel class-index maps.
# Resize the labels to the network's output size.
label_proc = prepare_label(label_batch, tf.stack(raw_output.get_shape()[1:3]), num_classes=args.num_classes, one_hot=False)
The implementation:
input_batch = tf.image.resize_nearest_neighbor(input_batch, new_size) # as labels are integer numbers, need to use NN interp.
input_batch = tf.squeeze(input_batch, squeeze_dims=[3]) # reducing the channel dimension.
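Nearest-neighbour interpolation matters here: bilinear resizing would average class ids and manufacture classes that do not exist (the mean of class 0 and class 21 is not a class). A toy sketch of NN label resizing (helper name is mine):

```python
import numpy as np

def resize_nearest(label, new_h, new_w):
    """Nearest-neighbour resize of an integer label map of shape (H, W)."""
    h, w = label.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return label[rows[:, None], cols]

label = np.array([[0, 0, 15],
                  [0, 21, 15]], dtype=np.int32)
resized = resize_nearest(label, 4, 6)
# Every value in `resized` is one of the original class ids; no
# fractional "in-between" classes are introduced.
print(np.unique(resized))  # [ 0 15 21]
```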
The label tensor can still contain values greater than or equal to num_classes (notably the 255 "ignore" label), so those pixels have to be masked out:
# Predictions: ignoring all predictions with labels greater or equal than n_classes
raw_gt = tf.reshape(label_proc, [-1,])
indices = tf.squeeze(tf.where(tf.less_equal(raw_gt, args.num_classes - 1)), 1)
gt = tf.cast(tf.gather(raw_gt, indices), tf.int32)
prediction = tf.gather(raw_prediction, indices)
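In numpy terms, the masking above boils down to the following (toy values; 255 is the PASCAL VOC "ignore" label):

```python
import numpy as np

num_classes = 21
raw_gt = np.array([0, 5, 255, 20, 255, 7])            # flattened labels
raw_prediction = np.arange(len(raw_gt) * num_classes,
                           dtype=np.float32).reshape(-1, num_classes)

# Mirrors tf.where(tf.less_equal(raw_gt, num_classes - 1)).
keep = np.where(raw_gt <= num_classes - 1)[0]
gt = raw_gt[keep]
prediction = raw_prediction[keep]
print(gt)  # [ 0  5 20  7] -- ignored pixels dropped from labels and logits alike
```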
Computing the loss
# Pixel-wise softmax loss.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=prediction, labels=gt)
# Add up all L2 weight-decay losses.
l2_losses = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'weights' in v.name]
reduced_loss = tf.reduce_mean(loss) + tf.add_n(l2_losses)
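The total loss is the mean pixel-wise cross-entropy plus L2 weight decay. A numpy re-derivation with made-up logits (note that tf.nn.l2_loss(w) is sum(w**2)/2):

```python
import numpy as np

def sparse_softmax_ce(logits, labels):
    """Numerically stable per-pixel cross-entropy from unnormalised logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])           # (pixels, classes)
labels = np.array([0, 2])
weight_decay = 5e-4
weights = [np.ones((3, 3)), np.ones((1, 3))]   # stand-ins for the 'weights' variables

ce = sparse_softmax_ce(logits, labels)
l2 = sum(weight_decay * 0.5 * (w ** 2).sum() for w in weights)  # tf.nn.l2_loss
reduced_loss = ce.mean() + l2
print(reduced_loss)
```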
Setting the optimiser parameters
all_trainable = [v for v in tf.trainable_variables() if 'beta' not in v.name and 'gamma' not in v.name]
fc_trainable = [v for v in all_trainable if 'fc' in v.name]
conv_trainable = [v for v in all_trainable if 'fc' not in v.name] # lr * 1.0
fc_w_trainable = [v for v in fc_trainable if 'weights' in v.name] # lr * 10.0
fc_b_trainable = [v for v in fc_trainable if 'biases' in v.name] # lr * 20.0
assert(len(all_trainable) == len(fc_trainable) + len(conv_trainable))
assert(len(fc_trainable) == len(fc_w_trainable) + len(fc_b_trainable))
Note the trick here: the final "fc" (ASPP) layers are trained with a different learning rate from the backbone. (Alternatively, one could freeze the earlier layers, train only the fc layers first, and then fine-tune the whole network.)
# Define loss and optimisation parameters.
base_lr = tf.constant(args.learning_rate)
step_ph = tf.placeholder(dtype=tf.float32, shape=())
learning_rate = tf.scalar_mul(base_lr, tf.pow((1 - step_ph / args.num_steps), args.power))
opt_conv = tf.train.MomentumOptimizer(learning_rate, args.momentum)
opt_fc_w = tf.train.MomentumOptimizer(learning_rate * 10.0, args.momentum)
opt_fc_b = tf.train.MomentumOptimizer(learning_rate * 20.0, args.momentum)
grads = tf.gradients(reduced_loss, conv_trainable + fc_w_trainable + fc_b_trainable)
grads_conv = grads[:len(conv_trainable)]
grads_fc_w = grads[len(conv_trainable) : (len(conv_trainable) + len(fc_w_trainable))]
grads_fc_b = grads[(len(conv_trainable) + len(fc_w_trainable)):]
train_op_conv = opt_conv.apply_gradients(zip(grads_conv, conv_trainable))
train_op_fc_w = opt_fc_w.apply_gradients(zip(grads_fc_w, fc_w_trainable))
train_op_fc_b = opt_fc_b.apply_gradients(zip(grads_fc_b, fc_b_trainable))
train_op = tf.group(train_op_conv, train_op_fc_w, train_op_fc_b)
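Two things to note above: the fc weights and biases get 10x and 20x the base learning rate, and all three optimisers share the same "poly" decay schedule. The schedule itself is easy to verify (the hyper-parameter values below are illustrative, not necessarily the repo defaults):

```python
def poly_lr(base_lr, step, num_steps, power):
    """'Poly' decay: lr = base_lr * (1 - step / num_steps) ** power."""
    return base_lr * (1.0 - step / num_steps) ** power

base_lr, num_steps, power = 2.5e-4, 20000, 0.9
print(poly_lr(base_lr, 0, num_steps, power))       # starts at the full base_lr
print(poly_lr(base_lr, 10000, num_steps, power))   # decayed by ~0.54x at mid-training
print(poly_lr(base_lr, 20000, num_steps, power))   # reaches 0 at the final step
```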
More
MSC: multi-scale inputs
On top of the base network, the same weights are run on three input scales; the element-wise maximum over the three outputs is taken as the prediction and compared against the labels.
# Create network.
with tf.variable_scope('', reuse=False):
    net = DeepLabResNetModel({'data': image_batch}, is_training=args.is_training, num_classes=args.num_classes)
with tf.variable_scope('', reuse=True):
    net075 = DeepLabResNetModel({'data': image_batch075}, is_training=args.is_training, num_classes=args.num_classes)
with tf.variable_scope('', reuse=True):
    net05 = DeepLabResNetModel({'data': image_batch05}, is_training=args.is_training, num_classes=args.num_classes)
# Predictions.
raw_output100 = net.layers['fc1_voc12']
raw_output075 = net075.layers['fc1_voc12']
raw_output05 = net05.layers['fc1_voc12']
raw_output = tf.reduce_max(tf.stack([raw_output100,
                                     tf.image.resize_images(raw_output075, tf.shape(raw_output100)[1:3,]),
                                     tf.image.resize_images(raw_output05, tf.shape(raw_output100)[1:3,])]), axis=0)
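The fusion rule is just an element-wise maximum over the per-scale score maps, after resizing the smaller ones back to the full-scale shape. A numpy sketch (nearest-neighbour upsampling stands in for tf.image.resize_images, which is bilinear by default):

```python
import numpy as np

def upsample_nn(x, new_h, new_w):
    """Crude nearest-neighbour upsampling of a (h, w, c) score map."""
    h, w, _ = x.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return x[rows[:, None], cols]

rng = np.random.default_rng(0)
s100 = rng.standard_normal((8, 8, 21))                     # full-resolution scores
s075 = upsample_nn(rng.standard_normal((6, 6, 21)), 8, 8)  # 0.75x branch, resized
s050 = upsample_nn(rng.standard_normal((4, 4, 21)), 8, 8)  # 0.5x branch, resized

# Element-wise maximum across scales, as in tf.reduce_max(tf.stack(...), axis=0).
fused = np.max(np.stack([s100, s075, s050]), axis=0)
print(fused.shape)  # (8, 8, 21)
```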
CRF
It builds on:
https://github.com/lucasb-eyer/pydensecrf
See the crf branch, inference.py:
# CRF.
raw_output_up = tf.nn.softmax(raw_output_up)
raw_output_up = tf.py_func(dense_crf, [raw_output_up, tf.expand_dims(img_orig, dim=0)], tf.float32)
def dense_crf(probs, img=None, n_iters=10,
              sxy_gaussian=(1, 1), compat_gaussian=4,
              kernel_gaussian=dcrf.DIAG_KERNEL,
              normalisation_gaussian=dcrf.NORMALIZE_SYMMETRIC,
              sxy_bilateral=(49, 49), compat_bilateral=5,
              srgb_bilateral=(13, 13, 13),
              kernel_bilateral=dcrf.DIAG_KERNEL,
              normalisation_bilateral=dcrf.NORMALIZE_SYMMETRIC):
    """DenseCRF over unnormalised predictions.
       More details on the arguments at https://github.com/lucasb-eyer/pydensecrf.

    Args:
      probs: class probabilities per pixel.
      img: if given, the pairwise bilateral potential on raw RGB values will be computed.
      n_iters: number of iterations of MAP inference.
      sxy_gaussian: standard deviations for the location component of the colour-independent term.
      compat_gaussian: label compatibilities for the colour-independent term (can be a number, a 1D array, or a 2D array).
      kernel_gaussian: kernel precision matrix for the colour-independent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
      normalisation_gaussian: normalisation for the colour-independent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).
      sxy_bilateral: standard deviations for the location component of the colour-dependent term.
      compat_bilateral: label compatibilities for the colour-dependent term (can be a number, a 1D array, or a 2D array).
      srgb_bilateral: standard deviations for the colour component of the colour-dependent term.
      kernel_bilateral: kernel precision matrix for the colour-dependent term (can take values CONST_KERNEL, DIAG_KERNEL, or FULL_KERNEL).
      normalisation_bilateral: normalisation for the colour-dependent term (possible values are NO_NORMALIZATION, NORMALIZE_BEFORE, NORMALIZE_AFTER, NORMALIZE_SYMMETRIC).

    Returns:
      Refined predictions after MAP inference.
    """
    _, h, w, _ = probs.shape
    probs = probs[0].transpose(2, 0, 1).copy(order='C')  # Need a contiguous array.
    # n_classes is a constant defined at module level in inference.py.
    d = dcrf.DenseCRF2D(w, h, n_classes)  # DenseCRF2D expects (width, height, n_labels).
    U = -np.log(probs)  # Unary potential.
    U = U.reshape((n_classes, -1))  # Needs to be flat.
    d.setUnaryEnergy(U)
    d.addPairwiseGaussian(sxy=sxy_gaussian, compat=compat_gaussian,
                          kernel=kernel_gaussian, normalization=normalisation_gaussian)
    if img is not None:
        assert img.shape[1:3] == (h, w), "The image height and width must coincide with dimensions of the logits."
        d.addPairwiseBilateral(sxy=sxy_bilateral, compat=compat_bilateral,
                               kernel=kernel_bilateral, normalization=normalisation_bilateral,
                               srgb=srgb_bilateral, rgbim=img[0])
    Q = d.inference(n_iters)
    preds = np.array(Q, dtype=np.float32).reshape((n_classes, h, w)).transpose(1, 2, 0)
    return np.expand_dims(preds, 0)
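The key step in dense_crf is building the unary term: the energies are the negative logs of the softmax probabilities, flattened to shape (n_classes, h * w). A toy illustration:

```python
import numpy as np

n_classes = 3
# Softmax output with shape (1, h, w, n_classes): here h=1, w=2.
probs = np.array([[[[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]]]])
U = -np.log(probs[0].transpose(2, 0, 1))       # (n_classes, h, w)
U = U.reshape((n_classes, -1)).astype(np.float32)
print(U.shape)           # (3, 2): one column of energies per pixel
print(U.argmin(axis=0))  # [0 1]: lowest energy = most probable class per pixel
```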