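SSD Source Code Walkthrough 1: Overall Structure and Framework
SSD Source Code Walkthrough 2: input_pipeline()
SSD Source Code Walkthrough 3: ssd_model_fn()
SSD Source Code Walkthrough 4: Loss Function (Theory + Source)
Reference articles:
Code analyzed:
I looked at two versions of the code, as linked above.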
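The simple version is written in the same style as the source code I analyzed earlier, so it is easier to follow, but it only covers prediction and has no training code. It contains no label processing or loss computation, so even after understanding it I felt I had learned very little.
The complex version confused me at first, because it is written in a style I had not seen before. I went back and forth several times on whether it was worth the effort, and I also searched GitHub for a more suitable version without finding one, so in the end I pushed through this more complex code. At first I skipped the parts I did not understand and went straight to network construction, anchor generation, loss computation, and data preprocessing, but the overall execution flow was still unclear. Later I read a little about TensorFlow's Estimator, which helped somewhat, although I still do not understand it in depth. The main obstacle is that the Estimator style is unfamiliar; it becomes acceptable once you treat it simply as a framework to which you pass the required arguments in a fixed format. Network construction, anchor creation, and loss computation are the same as before.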
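input_pipeline() framework
The main job of this function is to read the images and labels and to generate the anchors.
Walkthrough of the input_pipeline() source:
1. It first creates an AnchorCreator object, passing in the anchor-generation parameters.
2. It then calls get_all_anchors() to generate all anchors. get_all_anchors() returns all_anchors, all_num_anchors_depth, and all_num_anchors_spatial. Taking the 38*38 feature map as an example (the other layers follow the same pattern), its entry in all_anchors holds y_on_image (38,38,1), x_on_image (38,38,1), list_h_on_image (4,), and list_w_on_image (4,), i.e. the anchors' center y, center x, heights, and widths. The matching entries of all_num_anchors_depth and all_num_anchors_spatial are 4 (anchors per point) and 38*38=1444 (points on the feature map).
3. It then uses lambda to define two functions:
   - preprocess_image(), which preprocesses the image and returns the image data, labels, and bounding-box information.
   - encode_all_anchors(), which encodes the GT boxes (computes the offsets of the GT boxes relative to the anchors).
4. Both functions are then called inside slim_get_batch(), which fetches images, boxes, labels, and related data in batches of batch_size (the box coordinates here are the encoded offsets).
5. It then fills the global dictionary global_anchor_info, in which a lambda defines decode_all_anchors(). That function decodes the predicted boxes by converting coordinate offsets back into actual coordinates; its formulas are the inverse of the encoding step.
6. Finally, input_pipeline() returns the images, the box coordinates (as offsets), the box class labels, and related information.
Now for the source code itself: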
'''
Generates the anchors and reads the images and ground-truth (GT) annotations.
'''
def input_pipeline(dataset_pattern='train-*', is_training=True, batch_size=FLAGS.batch_size):
    def input_fn():
        out_shape = [FLAGS.train_image_size] * 2  # train_image_size=300, so out_shape=[300, 300]
        # utility/anchor_manipulator.py: create an AnchorCreator object and pass in the anchor parameters
        anchor_creator = anchor_manipulator.AnchorCreator(out_shape,
                            layers_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
                            anchor_scales = [(0.1,), (0.2,), (0.375,), (0.55,), (0.725,), (0.9,)],
                            extra_anchor_scales = [(0.1414,), (0.2739,), (0.4541,), (0.6315,), (0.8078,), (0.9836,)],
                            anchor_ratios = [(1., 2., .5), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., .5), (1., 2., .5)],
                            layer_steps = [8, 16, 32, 64, 100, 300])
        # Anchors are generated separately for each of the 6 feature maps and collected in lists.
        # Each entry of all_anchors: (y_on_image, x_on_image, list_h_on_image, list_w_on_image),
        # with shapes (38,38,1), (38,38,1), (4,), (4,) for the 38*38 layer: anchor center y, center x, heights, widths.
        # all_num_anchors_depth=[4,6,6,6,4,4] (anchors per point)
        # all_num_anchors_spatial=[38*38,19*19,10*10,5*5,3*3,1*1]=[1444,361,100,25,9,1] (points per feature map)
        all_anchors, all_num_anchors_depth, all_num_anchors_spatial = anchor_creator.get_all_anchors()  # utility/anchor_manipulator.py
        # Anchors per layer = depth * spatial: [4,6,6,6,4,4] * [1444,361,100,25,9,1] = [5776,2166,600,150,36,4]
        num_anchors_per_layer = []
        for ind in range(len(all_anchors)):
            num_anchors_per_layer.append(all_num_anchors_depth[ind] * all_num_anchors_spatial[ind])
        # utility/anchor_manipulator.py: create the anchor encoder object
        anchor_encoder_decoder = anchor_manipulator.AnchorEncoder(allowed_borders = [1.0] * 6,
                                                            positive_threshold = FLAGS.match_threshold,  # IoU threshold for positive samples
                                                            ignore_threshold = FLAGS.neg_threshold,  # IoU threshold for negative samples
                                                            prior_scaling=[0.1, 0.1, 0.2, 0.2])
        # This only defines a function; nothing runs here, it runs when it is called.
        # preprocessing/ssd_preprocessing.py: image preprocessing (horizontal flips and similar) to augment the dataset.
        # lambda defines an anonymous function, e.g. add = lambda x, y: x + y; add(1, 2) returns 3
        image_preprocessing_fn = lambda image_, labels_, bboxes_ : ssd_preprocessing.preprocess_image(image_, labels_, bboxes_, out_shape, is_training=is_training, data_format=FLAGS.data_format, output_rgb=False)
        # Again only a definition; it runs when it is called.
        # utility/anchor_manipulator.py: encode (actual coordinates -> offsets) and mark positive/negative samples
        anchor_encoder_fn = lambda glabels_, gbboxes_: anchor_encoder_decoder.encode_all_anchors(glabels_, gbboxes_, all_anchors, all_num_anchors_depth, all_num_anchors_spatial)
        # The two functions defined above are called inside slim_get_batch().
        # dataset/dataset_common.py: fetch images and annotations in batches of batch_size
        image, _, shape, loc_targets, cls_targets, match_scores = dataset_common.slim_get_batch(FLAGS.num_classes,
                                                                                batch_size,
                                                                                ('train' if is_training else 'val'),
                                                                                os.path.join(FLAGS.data_dir, dataset_pattern),
                                                                                FLAGS.num_readers,
                                                                                FLAGS.num_preprocessing_threads,
                                                                                image_preprocessing_fn,  # preprocessing lambda defined above
                                                                                anchor_encoder_fn,  # encoding lambda defined above
                                                                                num_epochs=FLAGS.train_epochs,
                                                                                is_training=is_training)
        global global_anchor_info
        # utility/anchor_manipulator.py: decode predicted offsets into actual coordinates (ymin, xmin, ymax, xmax)
        global_anchor_info = {'decode_fn': lambda pred : anchor_encoder_decoder.decode_all_anchors(pred, num_anchors_per_layer),
                              'num_anchors_per_layer': num_anchors_per_layer,
                              'all_num_anchors_depth': all_num_anchors_depth }
        return image, {'shape': shape, 'loc_targets': loc_targets, 'cls_targets': cls_targets, 'match_scores': match_scores}
    return input_fn
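The factory structure of input_pipeline() (an outer function that captures the configuration and returns a zero-argument input_fn, with lambdas that run only when called) can be illustrated without TensorFlow. The following is a minimal sketch in plain Python; make_input_fn, preprocess, and the toy pixel list are hypothetical names introduced only for this illustration and do not appear in the repository.

```python
def make_input_fn(batch_size, is_training=True):
    """Hypothetical factory: captures configuration, returns a zero-argument input_fn."""
    calls = []  # records the order in which the pieces actually execute

    def preprocess(image):
        calls.append('preprocess')
        return [pixel / 255.0 for pixel in image]

    # Defining the lambda does not run preprocess(); it only runs when called.
    preprocess_fn = lambda image: preprocess(image)

    def input_fn():
        calls.append('input_fn')
        features = preprocess_fn([0, 51, 255])  # toy "image"
        labels = {'batch_size': batch_size, 'is_training': is_training}
        return features, labels

    return input_fn, calls


input_fn, calls = make_input_fn(batch_size=32)
calls_before = list(calls)        # nothing has executed yet
features, labels = input_fn()     # the framework would make this call
```

In the real code it is the Estimator that eventually calls input_fn, which is why the anchors and the global_anchor_info dictionary only exist after that call has happened.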
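Next, let's look at get_all_anchors().
get_all_anchors() framework
The function is defined in utility/anchor_manipulator.py.
The parameters needed for anchor generation are first passed in through the __init__ method of the AnchorCreator class: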
# Anchor generator
class AnchorCreator(object):
    # Store the anchor-generation parameters
    def __init__(self, img_shape, layers_shapes, anchor_scales, extra_anchor_scales, anchor_ratios, layer_steps):
        super(AnchorCreator, self).__init__()
        # img_shape -> (height, width)
        self._img_shape = img_shape  # [300, 300]
        self._layers_shapes = layers_shapes  # [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
        self._anchor_scales = anchor_scales  # [(0.1,), (0.2,), (0.375,), (0.55,), (0.725,), (0.9,)]
        self._extra_anchor_scales = extra_anchor_scales  # [(0.1414,), (0.2739,), (0.4541,), (0.6315,), (0.8078,), (0.9836,)]
        self._anchor_ratios = anchor_ratios  # [(1., 2., .5), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., .5), (1., 2., .5)]
        self._layer_steps = layer_steps  # [8, 16, 32, 64, 100, 300]
        self._anchor_offset = [0.5] * len(self._layers_shapes)  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
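The figure below shows what each of these parameters is for.
get_all_anchors() does not manipulate the data itself; it calls get_layer_anchors() to generate the anchors. In the end get_all_anchors() returns:
- all_anchors, # anchor coordinate information
- all_num_anchors_depth, # number of anchors generated per point
- all_num_anchors_spatial # number of points on the feature map
Each of these three variables holds the corresponding information for all 6 feature maps.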
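The per-layer counts can be reproduced from the parameters passed to AnchorCreator. The sketch below is plain Python, not the repository's TensorFlow code; it applies the same formula as get_layer_anchors() (scales * ratios + extra scales) and only the variable total_anchors is introduced for the illustration.

```python
layers_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_scales = [(0.1,), (0.2,), (0.375,), (0.55,), (0.725,), (0.9,)]
extra_anchor_scales = [(0.1414,), (0.2739,), (0.4541,), (0.6315,), (0.8078,), (0.9836,)]
anchor_ratios = [(1., 2., .5), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333),
                 (1., 2., 3., .5, 0.3333), (1., 2., .5), (1., 2., .5)]

# Anchors per point: one per (scale, ratio) pair plus one per extra scale
all_num_anchors_depth = [
    len(scales) * len(ratios) + len(extras)
    for scales, ratios, extras in zip(anchor_scales, anchor_ratios, extra_anchor_scales)
]
# Points per feature map
all_num_anchors_spatial = [h * w for h, w in layers_shapes]
# Anchors per layer and in total
num_anchors_per_layer = [
    depth * spatial
    for depth, spatial in zip(all_num_anchors_depth, all_num_anchors_spatial)
]
total_anchors = sum(num_anchors_per_layer)

print(all_num_anchors_depth)
print(all_num_anchors_spatial)
print(num_anchors_per_layer, total_anchors)
```

The total is the 8732 anchors per image that appears later in the comments of encode_all_anchors().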
Source of get_all_anchors():
'''
Generates anchors for each of the 6 feature maps and collects them in lists.
'''
def get_all_anchors(self):
    all_anchors = []
    all_num_anchors_depth = []  # all_num_anchors_depth=[4,6,6,6,4,4]
    all_num_anchors_spatial = []  # all_num_anchors_spatial=[38*38,19*19,10*10,5*5,3*3,1*1]=[1444,361,100,25,9,1]
    for layer_index, layer_shape in enumerate(self._layers_shapes):
        anchors_this_layer = self.get_layer_anchors(layer_shape,
                                                    self._anchor_scales[layer_index],
                                                    self._extra_anchor_scales[layer_index],
                                                    self._anchor_ratios[layer_index],
                                                    self._layer_steps[layer_index],
                                                    self._anchor_offset[layer_index])  # [0.5] * len(self._layers_shapes)
        all_anchors.append(anchors_this_layer[:-2])  # anchor coordinate information (y, x, h, w)
        all_num_anchors_depth.append(anchors_this_layer[-2])  # number of anchors per point
        all_num_anchors_spatial.append(anchors_this_layer[-1])  # number of points on this feature map
    return all_anchors, all_num_anchors_depth, all_num_anchors_spatial
Source of get_layer_anchors():
'''
get_layer_anchors(self, ...) generates the anchors for a single feature map.
# layer_shape          one of [38,38], [19,19], [10,10], [5,5], [3,3], [1,1]
# anchor_scale         one of [(0.1,), (0.2,), (0.375,), (0.55,), (0.725,), (0.9,)]
# extra_anchor_scale   one of [(0.1414,), (0.2739,), (0.4541,), (0.6315,), (0.8078,), (0.9836,)]
# anchor_ratio         one of [(1., 2., .5), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., 3., .5, 0.3333), (1., 2., .5), (1., 2., .5)]
# layer_step           one of [8, 16, 32, 64, 100, 300], the downscaling factor of the feature map relative to the input image
# offset               0.5 for every layer
# Returns:
# y_on_image, x_on_image, list_h_on_image, list_w_on_image, num_anchors_along_depth, num_anchors_along_spatial
# For the 38*38 layer: shapes (38,38,1), (38,38,1), (4,), (4,), then 4 and 38*38=1444,
# i.e. anchor center y, center x, heights, widths, anchors per point, number of points.
'''
def get_layer_anchors(self, layer_shape, anchor_scale, extra_anchor_scale, anchor_ratio, layer_step, offset = 0.5):
    '''
    Real layer_shape values: [38,38], [19,19], [10,10], [5,5], [3,3], [1,1]
    Suppose layer_shape[0] = 6, layer_shape[1] = 5
    x_on_layer = [[0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 4]]
    y_on_layer = [[0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 1],
                  [2, 2, 2, 2, 2],
                  [3, 3, 3, 3, 3],
                  [4, 4, 4, 4, 4],
                  [5, 5, 5, 5, 5]]
    '''
    with tf.name_scope('get_layer_anchors'):
        # x_on_layer, y_on_layer are the grid coordinates of every point on the feature map:
        # (0,0),(0,1),(0,2)...(0,37);(1,0),(1,1),(1,2)...(37,37)
        x_on_layer, y_on_layer = tf.meshgrid(tf.range(layer_shape[1]), tf.range(layer_shape[0]))
        # Map to the input image and normalize: (feature-map coordinate + 0.5) * layer_step,
        # then divide by the image height/width so the result lies in 0-1
        y_on_image = (tf.cast(y_on_layer, tf.float32) + offset) * layer_step / self._img_shape[0]  # _img_shape = [300, 300]
        x_on_image = (tf.cast(x_on_layer, tf.float32) + offset) * layer_step / self._img_shape[1]
        # e.g. num_anchors_along_depth = len((0.1,)) * len((1.0, 2.0, 0.5)) + len((0.1414,)) = 1*3+1 = 4
        num_anchors_along_depth = len(anchor_scale) * len(anchor_ratio) + len(extra_anchor_scale)
        num_anchors_along_spatial = layer_shape[1] * layer_shape[0]  # 38*38=1444
        list_h_on_image = []  # anchor heights and widths on the input image, normalized to 0-1
        list_w_on_image = []
        global_index = 0
        # for square anchors
        for _, scale in enumerate(extra_anchor_scale):
            list_h_on_image.append(scale)
            list_w_on_image.append(scale)
            global_index += 1
        # for other aspect ratio anchors
        for scale_index, scale in enumerate(anchor_scale):
            for ratio_index, ratio in enumerate(anchor_ratio):
                list_h_on_image.append(scale / math.sqrt(ratio))  # h = s / sqrt(ratio)
                list_w_on_image.append(scale * math.sqrt(ratio))  # w = s * sqrt(ratio), so w * h = s^2
                global_index += 1
        # tf.expand_dims adds a dimension of size 1 at the given axis; here (38,38) becomes (38,38,1)
        # Returns y_on_image, x_on_image, list_h_on_image, list_w_on_image, num_anchors_along_depth, num_anchors_along_spatial
        # with shapes (38,38,1), (38,38,1), (4,), (4,), then 4 and 38*38=1444 for the first layer
        return tf.expand_dims(y_on_image, axis=-1), tf.expand_dims(x_on_image, axis=-1), \
                tf.constant(list_h_on_image, dtype=tf.float32), \
                tf.constant(list_w_on_image, dtype=tf.float32), num_anchors_along_depth, num_anchors_along_spatial
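To check the arithmetic of get_layer_anchors() by hand, the same formulas can be written in plain Python. This is a simplified sketch rather than the repository code: it returns 1-D lists of center coordinates instead of the meshgrid tensors, and layer_anchor_geometry is a hypothetical name used only here.

```python
import math


def layer_anchor_geometry(layer_shape, anchor_scale, extra_anchor_scale,
                          anchor_ratio, layer_step, img_shape=(300, 300), offset=0.5):
    """Normalized anchor centers and sizes for one feature map (plain-Python sketch)."""
    # Centers: (grid index + offset) * step, normalized by the image size
    centers_y = [(y + offset) * layer_step / img_shape[0] for y in range(layer_shape[0])]
    centers_x = [(x + offset) * layer_step / img_shape[1] for x in range(layer_shape[1])]

    heights, widths = [], []
    # Square anchors from the extra scales
    for scale in extra_anchor_scale:
        heights.append(scale)
        widths.append(scale)
    # One anchor per (scale, ratio) pair: h = s / sqrt(r), w = s * sqrt(r)
    for scale in anchor_scale:
        for ratio in anchor_ratio:
            heights.append(scale / math.sqrt(ratio))
            widths.append(scale * math.sqrt(ratio))
    return centers_y, centers_x, heights, widths


# First layer: 38x38 feature map, step 8, scale 0.1, extra scale 0.1414
cy, cx, hs, ws = layer_anchor_geometry((38, 38), (0.1,), (0.1414,), (1., 2., .5), 8)
print(cy[0], cy[-1])
print(list(zip(hs, ws)))
```

Because h and w are the scale divided and multiplied by the same factor, every ratio anchor keeps the area s^2; the ratio only changes the shape.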
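preprocess_image()
Applies preprocessing to the image, such as horizontal flips. It is not analyzed here.
slim_get_batch()
Fetches images and annotations in batches of batch_size. It is not analyzed here.
encode_all_anchors()
Encodes the boxes (actual coordinates to offsets) and marks the positive and negative samples.
Function structure:
Source: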
'''
Encodes the boxes (actual coordinates -> offsets) and marks positive/negative samples.
'''
def encode_all_anchors(self, labels, bboxes, all_anchors, all_num_anchors_depth, all_num_anchors_spatial, debug=False):
    # y, x, h, w are all in range [0, 1] relative to the original image size
    assert (len(all_num_anchors_depth)==len(all_num_anchors_spatial)) and (len(all_num_anchors_depth)==len(all_anchors)), 'inconsist num layers for anchors.'
    with tf.name_scope('encode_all_anchors'):
        num_layers = len(all_num_anchors_depth)
        list_anchors_ymin = []  # collected over all 6 feature maps
        list_anchors_xmin = []
        list_anchors_ymax = []
        list_anchors_xmax = []
        tiled_allowed_borders = []  # allowed border per anchor: 8732 values, all 1.0
        for ind, anchor in enumerate(all_anchors):
            # convert (cy, cx, h, w) to (ymin, xmin, ymax, xmax)
            anchors_ymin_, anchors_xmin_, anchors_ymax_, anchors_xmax_ = self.center2point(anchor[0], anchor[1], anchor[2], anchor[3])
            list_anchors_ymin.append(tf.reshape(anchors_ymin_, [-1]))
            list_anchors_xmin.append(tf.reshape(anchors_xmin_, [-1]))
            list_anchors_ymax.append(tf.reshape(anchors_ymax_, [-1]))
            list_anchors_xmax.append(tf.reshape(anchors_xmax_, [-1]))
            tiled_allowed_borders.extend([self._allowed_borders[ind]] * all_num_anchors_depth[ind] * all_num_anchors_spatial[ind])
        # concatenate the 6 per-layer tensors into one
        anchors_ymin = tf.concat(list_anchors_ymin, 0, name='concat_ymin')
        anchors_xmin = tf.concat(list_anchors_xmin, 0, name='concat_xmin')
        anchors_ymax = tf.concat(list_anchors_ymax, 0, name='concat_ymax')
        anchors_xmax = tf.concat(list_anchors_xmax, 0, name='concat_xmax')
        if self._clip:
            # tf.clip_by_value(A, min, max) clamps every element of A to [min, max]:
            # values below min become min, values above max become max.
            # Here it clips the anchor coordinates to the image range [0, 1].
            # Example: A = np.array([[1,1,2,4], [3,4,8,5]])
            # tf.clip_by_value(A, 2, 5) -> [[2 2 2 4] [3 4 5 5]]
            anchors_ymin = tf.clip_by_value(anchors_ymin, 0., 1.)
            anchors_xmin = tf.clip_by_value(anchors_xmin, 0., 1.)
            anchors_ymax = tf.clip_by_value(anchors_ymax, 0., 1.)
            anchors_xmax = tf.clip_by_value(anchors_xmax, 0., 1.)
        # stack the Python list into a single tensor
        anchor_allowed_borders = tf.stack(tiled_allowed_borders, 0, name='concat_allowed_borders')
        # logical AND: an anchor counts as inside only if all four edges stay within the allowed border
        inside_mask = tf.logical_and(tf.logical_and(anchors_ymin > -anchor_allowed_borders * 1.,
                                                    anchors_xmin > -anchor_allowed_borders * 1.),
                                    tf.logical_and(anchors_ymax < (1. + anchor_allowed_borders * 1.),
                                                    anchors_xmax < (1. + anchor_allowed_borders * 1.)))
        anchors_point = tf.stack([anchors_ymin, anchors_xmin, anchors_ymax, anchors_xmax], axis=-1)
        # save_anchors_op = tf.py_func(save_anchors,
        #                 [bboxes,
        #                 labels,
        #                 anchors_point],
        #                 tf.int64, stateful=True)
        # with tf.control_dependencies([save_anchors_op]):
        # compute the IoU between the GT boxes and the anchors
        overlap_matrix = iou_matrix(bboxes, anchors_point) * tf.cast(tf.expand_dims(inside_mask, 0), tf.float32)
        # use the IoU to find the positive and negative samples and their IoU scores
        matched_gt, gt_scores = do_dual_max_match(overlap_matrix, self._ignore_threshold, self._positive_threshold)
        # get all positive matching positions
        matched_gt_mask = matched_gt > -1
        matched_indices = tf.clip_by_value(matched_gt, 0, tf.int64.max)
        # the labels here may be meaningless at the non-positive positions
        gt_labels = tf.gather(labels, matched_indices)  # tf.gather picks the entries at the given 1-D indices
        # filter the invalid labels
        gt_labels = gt_labels * tf.cast(matched_gt_mask, tf.int64)
        # set those ignored positions to -1
        gt_labels = gt_labels + (-1 * tf.cast(matched_gt < -1, tf.int64))
        gt_ymin, gt_xmin, gt_ymax, gt_xmax = tf.unstack(tf.gather(bboxes, matched_indices), 4, axis=-1)
        # transform to center / size: (cy, cx, h, w)
        gt_cy, gt_cx, gt_h, gt_w = self.point2center(gt_ymin, gt_xmin, gt_ymax, gt_xmax)
        anchor_cy, anchor_cx, anchor_h, anchor_w = self.point2center(anchors_ymin, anchors_xmin, anchors_ymax, anchors_xmax)
        # encode features.
        # prior_scaling (dividing by 0.1 and 0.2, i.e. multiplying by 10 and 5) balances
        # the regression loss between the center terms and the width/height terms
        gt_cy = (gt_cy - anchor_cy) / anchor_h / self._prior_scaling[0]  # prior_scaling = [0.1, 0.1, 0.2, 0.2]
        gt_cx = (gt_cx - anchor_cx) / anchor_w / self._prior_scaling[1]
        gt_h = tf.log(gt_h / anchor_h) / self._prior_scaling[2]
        gt_w = tf.log(gt_w / anchor_w) / self._prior_scaling[3]
        # now gt_localizations is our regression target, but it may also be meaningless at the non-positive positions
        if debug:
            gt_targets = tf.stack([anchors_ymin, anchors_xmin, anchors_ymax, anchors_xmax], axis=-1)
        else:
            gt_targets = tf.stack([gt_cy, gt_cx, gt_h, gt_w], axis=-1)
        # set all targets of non-positive positions to 0
        gt_targets = tf.expand_dims(tf.cast(matched_gt_mask, tf.float32), -1) * gt_targets
        self._all_anchors = (anchor_cy, anchor_cx, anchor_h, anchor_w)
        return gt_targets, gt_labels, gt_scores
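The matching step relies on iou_matrix() and do_dual_max_match(), whose bodies are not shown in this part. The sketch below illustrates only the general idea for a single GT box and a single anchor: compute the IoU, then label the anchor by comparing it with two thresholds. The function names iou and label_anchor and the threshold values 0.5 and 0.4 are assumptions made for this illustration; the real values come from FLAGS.match_threshold and FLAGS.neg_threshold, and the real matcher may apply further rules.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax) in normalized coordinates."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(best_iou, positive_threshold=0.5, ignore_threshold=0.4):
    """Simplified labeling rule (assumed thresholds, not the repository's exact logic)."""
    if best_iou >= positive_threshold:
        return 'positive'   # used for both classification and box regression
    if best_iou >= ignore_threshold:
        return 'ignored'    # excluded from the loss
    return 'negative'       # background


gt_box = (0.0, 0.0, 0.5, 0.5)
same_anchor = (0.0, 0.0, 0.5, 0.5)
shifted_anchor = (0.0, 0.25, 0.5, 0.75)
far_anchor = (0.6, 0.6, 0.9, 0.9)

for anchor in (same_anchor, shifted_anchor, far_anchor):
    score = iou(gt_box, anchor)
    print(anchor, round(score, 4), label_anchor(score))
```

In encode_all_anchors() the same three-way split shows up as matched_gt > -1 (positive), matched_gt < -1 (ignored, label set to -1), and the remaining anchors (negative).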
Two small helper functions used by encode_all_anchors():
'''
Converts (cy, cx, h, w) to (ymin, xmin, ymax, xmax).
'''
def center2point(self, center_y, center_x, height, width):
    return center_y - height / 2., center_x - width / 2., center_y + height / 2., center_x + width / 2.
'''
Converts (ymin, xmin, ymax, xmax) to (cy, cx, h, w).
'''
def point2center(self, ymin, xmin, ymax, xmax):
    height, width = (ymax - ymin), (xmax - xmin)
    return ymin + height / 2., xmin + width / 2., height, width
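The encoding formulas at the end of encode_all_anchors() can be tried on a single box. The following plain-Python sketch uses the same prior_scaling values [0.1, 0.1, 0.2, 0.2] and the same point2center conversion; encode_box is a hypothetical stand-alone function written for this illustration, not part of the repository.

```python
import math

PRIOR_SCALING = [0.1, 0.1, 0.2, 0.2]


def point2center(ymin, xmin, ymax, xmax):
    """(ymin, xmin, ymax, xmax) -> (cy, cx, h, w)."""
    height, width = ymax - ymin, xmax - xmin
    return ymin + height / 2.0, xmin + width / 2.0, height, width


def encode_box(gt_box, anchor_box, prior_scaling=PRIOR_SCALING):
    """Offsets of one GT box relative to one anchor; both given as corner boxes."""
    gt_cy, gt_cx, gt_h, gt_w = point2center(*gt_box)
    a_cy, a_cx, a_h, a_w = point2center(*anchor_box)
    t_cy = (gt_cy - a_cy) / a_h / prior_scaling[0]
    t_cx = (gt_cx - a_cx) / a_w / prior_scaling[1]
    t_h = math.log(gt_h / a_h) / prior_scaling[2]
    t_w = math.log(gt_w / a_w) / prior_scaling[3]
    return t_cy, t_cx, t_h, t_w


anchor = (0.1, 0.1, 0.5, 0.5)
# A GT box identical to the anchor encodes to all zeros
zero_offsets = encode_box(anchor, anchor)
# A GT box of the same size, shifted by 0.1 in y and x
shifted_offsets = encode_box((0.2, 0.2, 0.6, 0.6), anchor)
print(zero_offsets)
print(shifted_offsets)
```

The shift of 0.1 is a quarter of the anchor height (0.4), and dividing by the prior scaling of 0.1 turns that 0.25 into a target of about 2.5; the size terms stay at about 0 because the boxes have equal height and width.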
decode_all_anchors()
Converts the predicted box offsets into actual coordinates. The formulas are the inverse of the encoding formulas above.
Code structure:
Source:
'''
Converts predicted box offsets into actual coordinates.
Returns: (ymin, xmin, ymax, xmax) per layer,
shapes (5776,4), (2166,4), (600,4), (150,4), (36,4), (4,4)
'''
def decode_all_anchors(self, pred_location, num_anchors_per_layer):
    assert self._all_anchors is not None, 'no anchors to decode.'
    with tf.name_scope('decode_all_anchors', values=[pred_location]):
        anchor_cy, anchor_cx, anchor_h, anchor_w = self._all_anchors
        # inverse of the encoding; see the formulas at the end of encode_all_anchors()
        pred_h = tf.exp(pred_location[:, -2] * self._prior_scaling[2]) * anchor_h  # pred_h = e^(t_h * prior_scaling[2]) * anchor_h
        pred_w = tf.exp(pred_location[:, -1] * self._prior_scaling[3]) * anchor_w
        pred_cy = pred_location[:, 0] * self._prior_scaling[0] * anchor_h + anchor_cy
        pred_cx = pred_location[:, 1] * self._prior_scaling[1] * anchor_w + anchor_cx
        # convert to corner form, then split the 8732 boxes back into the 6 layers
        return tf.split(tf.stack(self.center2point(pred_cy, pred_cx, pred_h, pred_w), axis=-1), num_anchors_per_layer, axis=0)
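To see that decoding inverts the encoding, the decode formulas can be applied to one set of offsets in plain Python. This is a sketch for a single box under the same prior_scaling [0.1, 0.1, 0.2, 0.2]; decode_box is a hypothetical name, and the anchor is given directly in center form (cy, cx, h, w), the way self._all_anchors stores it.

```python
import math

PRIOR_SCALING = [0.1, 0.1, 0.2, 0.2]


def center2point(center_y, center_x, height, width):
    """(cy, cx, h, w) -> (ymin, xmin, ymax, xmax)."""
    return (center_y - height / 2.0, center_x - width / 2.0,
            center_y + height / 2.0, center_x + width / 2.0)


def decode_box(offsets, anchor_center, prior_scaling=PRIOR_SCALING):
    """Turn predicted offsets (t_cy, t_cx, t_h, t_w) back into a corner box."""
    t_cy, t_cx, t_h, t_w = offsets
    a_cy, a_cx, a_h, a_w = anchor_center
    pred_h = math.exp(t_h * prior_scaling[2]) * a_h
    pred_w = math.exp(t_w * prior_scaling[3]) * a_w
    pred_cy = t_cy * prior_scaling[0] * a_h + a_cy
    pred_cx = t_cx * prior_scaling[1] * a_w + a_cx
    return center2point(pred_cy, pred_cx, pred_h, pred_w)


# Anchor with corners (0.1, 0.1, 0.5, 0.5), written in center form
anchor_center = (0.3, 0.3, 0.4, 0.4)
# Offsets that correspond to a same-sized box shifted by 0.1 in y and x
decoded = decode_box((2.5, 2.5, 0.0, 0.0), anchor_center)
print(decoded)
```

The decoded corners come out at approximately (0.2, 0.2, 0.6, 0.6): zero size offsets keep the anchor's height and width, and the center offsets of 2.5 move the center by 2.5 * 0.1 * 0.4 = 0.1.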