0 Preface
Faster R-CNN is a rite of passage for almost everyone learning object detection. Taking a hands-on angle, this series walks through Faster R-CNN's structure, its loss functions, and the famously confusing anchors, following the code through these parts:
- a quick hands-on run on your own dataset
- a walkthrough of anchors
- from the loss functions to the whole network
My articles appear first on my Zhihu account; follow 周威 on Zhihu if you are interested.
Following on from the first two installments:
- FasterRCNN从实战到代码解析
- 深度解析FasterRCNN的先验框Anchor
this article starts from Faster R-CNN's loss functions and digs further into the code.
Reply "FasterRCNN" in the official-account backend to get the code, which is based on Python 3.5 and TensorFlow 1.x.
3 From the losses to the whole network
There are already plenty of paper-level write-ups of Faster R-CNN, and most readers will have some grasp of it by now. If not, I recommend this Zhihu article:
- https://zhuanlan.zhihu.com/p/31426458
Here, instead, we read the details and logic of Faster R-CNN straight out of the code.
Admittedly the Faster R-CNN codebase is long and jumps around a lot, which makes it painful to read. To be honest, I learned both Python and TensorFlow by reading this very codebase.
We start from the loss functions. They are defined in the function _add_losses in lib/nets/network.py:
def _add_losses(self, sigma_rpn=3.0):  # assemble all four losses
    with tf.variable_scope('loss_' + self._tag):  # self._tag = 'default'
        # RPN, class loss
        rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])  # RPN class scores: (1, 9*h, w, 2) -> [h*w*9, 2]
        rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])  # rpn_labels: (1, 1, 9*h, w) -> [h*w*9,]; 128 fg (1), 128 bg (0), everything else -1
        rpn_select = tf.where(tf.not_equal(rpn_label, -1))  # positions where rpn_label != -1, i.e. the 128 fg + 128 bg
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])  # tf.gather(tensor, indices) picks elements by index: keep only the sampled scores
        rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])  # keep only the sampled labels
        # rpn_cls_score shape --> [256, 2]
        # rpn_label shape --> [256,]
        rpn_cross_entropy = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))  # cross-entropy between the scores and the RPN labels

        # RPN, bbox loss
        rpn_bbox_pred = self._predictions['rpn_bbox_pred']  # (1, h, w, 36)
        rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']  # (1, h, w, 36)
        rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']  # (1, h, w, 36); (1.0, 1.0, 1.0, 1.0) for each fg anchor
        rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']  # (1, h, w, 36); (1/256, 1/256, 1/256, 1/256) for each fg/bg anchor
        rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                            rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])

        # RCNN, class loss
        cls_score = self._predictions["cls_score"]  # (256, num_classes)
        label = tf.reshape(self._proposal_targets["labels"], [-1])  # (256,); the first 128 are fg with labels in [1, num_classes - 1], the last 128 are bg with label 0
        cross_entropy = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))

        # RCNN, bbox loss
        bbox_pred = self._predictions['bbox_pred']  # (256, 4 * num_classes)
        bbox_targets = self._proposal_targets['bbox_targets']  # (256, 4 * 21); each row holds the targets at its own class's slot, zeros elsewhere
        bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
        bbox_outside_weights = self._proposal_targets['bbox_outside_weights']
        loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)

        self._losses['cross_entropy'] = cross_entropy
        self._losses['loss_box'] = loss_box
        self._losses['rpn_cross_entropy'] = rpn_cross_entropy
        self._losses['rpn_loss_box'] = rpn_loss_box
        loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
        self._losses['total_loss'] = loss
        self._event_summaries.update(self._losses)
        return loss
Let's unpack this. The function defines four losses, which map exactly onto the four losses in the paper:
- the two RPN losses:
  - the foreground/background binary cross-entropy loss
  - the first bbox regression loss
- the classification loss over num_classes object classes
- the second bbox regression loss
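In _add_losses these four terms are simply summed with equal weight (loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box), so the total objective is

```latex
L_{\text{total}} = L^{\text{RPN}}_{\text{cls}} + L^{\text{RPN}}_{\text{box}} + L^{\text{RCNN}}_{\text{cls}} + L^{\text{RCNN}}_{\text{box}}
```

(the paper introduces a balancing weight λ between the classification and regression terms; this implementation effectively fixes it at 1).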
Good. Now let's walk through how each of these losses is built.
3.1 The foreground/background binary cross-entropy loss
The foreground/background binary cross-entropy loss is defined like this in the code:
rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])  # RPN class scores: (1, 9*h, w, 2) -> [h*w*9, 2]
rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])  # rpn_labels: (1, 1, 9*h, w) -> [h*w*9,]; 128 fg (1), 128 bg (0), everything else -1
rpn_select = tf.where(tf.not_equal(rpn_label, -1))  # positions where rpn_label != -1, i.e. the 128 fg + 128 bg
rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])  # tf.gather(tensor, indices) picks elements by index: keep only the sampled scores
rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])  # keep only the sampled labels
# rpn_cls_score shape --> [256, 2]
# rpn_label shape --> [256,]
rpn_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))  # cross-entropy between the scores and the RPN labels
sparse_softmax_cross_entropy_with_logits takes two inputs, rpn_cls_score and rpn_label. rpn_cls_score comes out of the network's forward pass, so it is the prediction; rpn_label is the man-made label, the ground truth.
So what exactly is rpn_cls_score, and where does it come from?
Tracing it back, we end up in lib/nets/vgg16.py (despite the file's name, don't be fooled: this class is a subclass of Network and is the core model-building script).
It turns out rpn_cls_score is an output of the function build_rpn (so this is where the RPN is assembled): the feature map goes through a 3x3 convolution, then a 1x1 convolution that compresses the channels to 18, as shown in Figure 1.
Now look at its shape, (1, h, w, 18), and remember how many anchors we have: h*w*9 in total. So every anchor gets exactly two score slots, one for the foreground (fg) score and one for the background (bg) score. It clicks!
Let's flesh out Figure 1 using the build_rpn code (this is where the RPN branch is built):
rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
# Change it so that the score has 2 as its channel size
rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')  # (1, h, w, 18) -> (1, 9*h, w, 2)
rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")  # shape stays (1, 9*h, w, 2)
rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")  # undo the previous reshape: back to (1, h, w, 18)
# First bbox regression: output shape (1, h, w, 36)
rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
From this code we get Figure 2.
rpn_cls_score_reshape has shape (1, 9*h, w, 2): it frees up a dedicated dimension of size 2 so that softmax can be applied over it.
rpn_cls_score_reshape is rpn_cls_score passed through a reshape layer, and rpn_cls_prob_reshape is rpn_cls_score_reshape passed through a softmax layer.
As we know, a softmax layer outputs class probabilities, and it does not change the shape, so rpn_cls_prob_reshape is still (1, 9*h, w, 2).
A second reshape then turns rpn_cls_prob_reshape into rpn_cls_prob, so rpn_cls_score and rpn_cls_prob end up with the same shape, (1, h, w, 18).
If this branch's input and output shapes are identical, why go through two reshapes at all?
Because the softmax here solves a two-class problem, it needs a dedicated dimension of size 2.
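A minimal numpy sketch of that shape round trip; reshape_layer below is a hypothetical stand-in for _reshape_layer (which, judging by the shapes quoted above, regroups the channels through NCHW order internally):

```python
import numpy as np

def reshape_layer(x, d):
    # NHWC -> NCHW, regroup the channel dim to size d, then back to NHWC
    n, h, w, c = x.shape
    x = x.transpose(0, 3, 1, 2)      # e.g. (1, 18, h, w)
    x = x.reshape(n, d, -1, w)       # e.g. (1, 2, 9*h, w)
    return x.transpose(0, 2, 3, 1)   # e.g. (1, 9*h, w, 2)

scores = np.random.rand(1, 4, 5, 18).astype(np.float32)  # toy feature map: h=4, w=5
reshaped = reshape_layer(scores, 2)                      # (1, 36, 5, 2): room for softmax on the last axis
restored = reshape_layer(reshaped, 18)                   # (1, 4, 5, 18): the round trip is lossless
```

So the two reshapes only rearrange the layout: the softmax gets its dedicated size-2 axis, and the original (1, h, w, 18) layout is recovered afterwards.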
That covers this branch, so we now have the prediction for the first loss; we still need a label, the ground truth, before we can compute it.
To recap: the prediction for the first loss is rpn_cls_score, and its ground truth is rpn_label. We have rpn_cls_score, so now we hunt for rpn_label. It also shows up in vgg16.py, hiding inside the function build_proposals:
rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")  # map the anchor-vs-gt relations from the original image onto the RPN feature map
# rpn_labels: shape (1, 1, 9*h, w); 128 fg (1), 128 bg (0), everything else -1
Stepping into the function _anchor_target_layer, it is defined as follows:
def _anchor_target_layer(self, rpn_cls_score, name):  # name="anchor"; maps anchor-vs-gt relations from the original image onto the RPN feature map
    with tf.variable_scope(name):  # all variables are scoped under "anchor"
        rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
            anchor_target_layer,
            [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
            [tf.float32, tf.float32, tf.float32, tf.float32])
        """Inputs: the class scores,
           the gt boxes,
           the image info,
           the feature stride (16),
           the anchors,
           and the number of anchors.
        """
        """Shapes of the returned tensors:
           rpn_labels: (1, 1, 9*h, w); 128 fg (1), 128 bg (0), everything else -1
           rpn_bbox_targets: (1, h, w, 36); the transform (dx, dy, dw, dh) from each in-image anchor to its nearest gt
           rpn_bbox_inside_weights: (1, h, w, 36); (1.0, 1.0, 1.0, 1.0) for each fg anchor
           rpn_bbox_outside_weights: (1, h, w, 36); (1/256, 1/256, 1/256, 1/256) for each fg/bg anchor
        """
        rpn_labels.set_shape([1, 1, None, None])
        rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])
        rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
        self._anchor_targets['rpn_labels'] = rpn_labels
        self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets
        self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights
        self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights
        self._score_summaries.update(self._anchor_targets)
    return rpn_labels
Here is a puzzle worth flagging: why does computing rpn_label require rpn_cls_score as an input? In ordinary image classification the labels are man-made and have nothing to do with the network's output. Hold that thought; the answer comes shortly.
We can see that rpn_label is in turn an output of the function anchor_target_layer. Notice that producing it needs not only rpn_cls_score but, far more importantly, gt_boxes, the annotation that comes with the image; intuitively, rpn_labels should be derived from gt_boxes, not from rpn_cls_score.
So we step into anchor_target_layer...
...and discover that the input rpn_cls_score is only used to read off the feature map's height and width:
# map of shape (..., H, W)
height, width = rpn_cls_score.shape[1:3]
anchor_target_layer is a bit long, but I have annotated it to make it easier to follow.
def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, all_anchors, num_anchors):
    """Same as the anchor target layer in original Fast/er RCNN """
    """Inputs: class scores: shape (batch_size, h, w, 18)
       gt boxes: shape (None, 5), i.e. (number of boxes in the image, 4 coordinates + 1 class)
       image info: w, h, scale
       feature stride: 16
       anchors: shape (batch_size*h*w*9, 4)
       number of anchors per location
    """
    A = num_anchors  # 9
    total_anchors = all_anchors.shape[0]  # batch_size*h*w*9
    K = total_anchors / num_anchors  # batch_size*h*w
    im_info = im_info[0]

    # allow boxes to sit over the edge by a small amount
    _allowed_border = 0

    # map of shape (..., H, W)
    height, width = rpn_cls_score.shape[1:3]

    # only keep anchors inside the image
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_info[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_info[0] + _allowed_border)    # height
    )[0]

    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]

    # label: 1 is positive, 0 is negative, -1 is dont care
    labels = np.empty((len(inds_inside),), dtype=np.float32)  # 1-D vector of length len(inds_inside); will hold every anchor's fg/bg label
    labels.fill(-1)

    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt)
    # ascontiguousarray turns a non-contiguously stored array into a contiguous one, which runs faster
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float),
        np.ascontiguousarray(gt_boxes, dtype=np.float))  # overlaps has shape (num_anchors, num_gt)
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the index of the gt with the largest overlap
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, the value of that largest overlap
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt, the index of the anchor with the largest overlap
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt, the value of that largest overlap
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # the rows (anchors) achieving the per-gt maximum overlap

    if not cfg.FLAGS.rpn_clobber_positives:  # cfg.FLAGS.rpn_clobber_positives = False
        # assign bg labels first so that positive labels can clobber them
        # first set the negatives
        labels[max_overlaps < cfg.FLAGS.rpn_negative_overlap] = 0  # IoU < 0.3 -> bg

    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1

    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.FLAGS.rpn_positive_overlap] = 1

    if cfg.FLAGS.rpn_clobber_positives:  # cfg.FLAGS.rpn_clobber_positives = False
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.FLAGS.rpn_negative_overlap] = 0

    # subsample positive labels if we have too many
    num_fg = int(cfg.FLAGS.rpn_fg_fraction * cfg.FLAGS.rpn_batchsize)  # 0.5 * 256 = 128
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:  # downsample if there are too many
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)  # keep 128 fg from fg_inds; set the rest to -1
        labels[disable_inds] = -1

    # subsample negative labels if we have too many
    num_bg = cfg.FLAGS.rpn_batchsize - np.sum(labels == 1)  # 256 - 128 = 128
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # keep 128 bg from bg_inds; set the rest to -1

    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # the transform (dx, dy, dw, dh) from each anchor to its nearest (largest-overlap) gt

    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    # only the positive ones have regression targets; FLAGS2["bbox_inside_weights"] = (1.0, 1.0, 1.0, 1.0)
    bbox_inside_weights[labels == 1, :] = np.array(cfg.FLAGS2["bbox_inside_weights"])

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    if cfg.FLAGS.rpn_positive_weight < 0:  # cfg.FLAGS.rpn_positive_weight = -1
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0)  # total number of fg + bg samples
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples
    else:
        assert ((cfg.FLAGS.rpn_positive_weight > 0) &
                (cfg.FLAGS.rpn_positive_weight < 1))
        positive_weights = (cfg.FLAGS.rpn_positive_weight /
                            np.sum(labels == 1))
        negative_weights = ((1.0 - cfg.FLAGS.rpn_positive_weight) /
                            np.sum(labels == 0))
    bbox_outside_weights[labels == 1, :] = positive_weights  # 1/256
    bbox_outside_weights[labels == 0, :] = negative_weights  # 1/256

    # map up to original set of anchors: put the removed (out-of-image) anchors back
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # build a vector of length total_anchors and scatter the labels above into it
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)

    # labels
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)  # (1, h, w, 9) -> (1, 9, h, w)
    labels = labels.reshape((1, 1, A * height, width))  # then to (1, 1, 9*h, w)
    rpn_labels = labels
    # bbox_targets
    bbox_targets = bbox_targets \
        .reshape((1, height, width, A * 4))  # shape (1, h, w, 36)
    rpn_bbox_targets = bbox_targets
    # bbox_inside_weights
    bbox_inside_weights = bbox_inside_weights \
        .reshape((1, height, width, A * 4))  # shape (1, h, w, 36)
    rpn_bbox_inside_weights = bbox_inside_weights
    # bbox_outside_weights
    bbox_outside_weights = bbox_outside_weights \
        .reshape((1, height, width, A * 4))  # shape (1, h, w, 36)
    rpn_bbox_outside_weights = bbox_outside_weights
    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
Reading through it, you can see this code does the following:
- drop the anchors that cross the image boundary
- mark as foreground (fg) any anchor that has the largest overlap with some gt box, or whose IoU with a gt box exceeds 0.7
- mark as background (bg) any anchor whose IoU with every gt box is below 0.3, and set everything else to -1 (neither fg nor bg)
- subsample: randomly keep 128 fg and 128 bg from the candidates, setting the rest to -1
- scatter those 128 fg (1) and 128 bg (0) into a length (1*h*w*9,) vector filled with -1, then reshape it to (1, 1, 9*h, w) to line up with rpn_cls_score's anchors
So rpn_cls_score is what the network predicts by forward propagation through the RPN, while rpn_label is constructed by hand: each anchor's relation (overlap) with the gt boxes is mapped onto a (1, 1, 9*h, w) tensor.
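The labeling-plus-subsampling steps above can be sketched in plain numpy; label_anchors is a hypothetical toy version, with overlaps playing the role of the IoU matrix returned by bbox_overlaps:

```python
import numpy as np

def label_anchors(overlaps, pos_thresh=0.7, neg_thresh=0.3, batch=256, fg_frac=0.5):
    # overlaps: (num_anchors, num_gt) IoU matrix; returns 1 (fg), 0 (bg) or -1 per anchor
    labels = np.full(overlaps.shape[0], -1.0)
    max_overlaps = overlaps.max(axis=1)                      # best IoU per anchor
    gt_best = np.where(overlaps == overlaps.max(axis=0))[0]  # best anchor(s) per gt
    labels[max_overlaps < neg_thresh] = 0                    # background first...
    labels[gt_best] = 1                                      # ...so fg can clobber it
    labels[max_overlaps >= pos_thresh] = 1
    num_fg = int(fg_frac * batch)                            # at most 128 fg
    fg = np.where(labels == 1)[0]
    if len(fg) > num_fg:
        labels[np.random.choice(fg, len(fg) - num_fg, replace=False)] = -1
    num_bg = batch - int(np.sum(labels == 1))                # fg + bg capped at 256
    bg = np.where(labels == 0)[0]
    if len(bg) > num_bg:
        labels[np.random.choice(bg, len(bg) - num_bg, replace=False)] = -1
    return labels

# 4 anchors, 1 gt box: anchor 0 is fg (IoU 0.8), anchor 2 is neither (0.5), the rest are bg
overlaps = np.array([[0.8], [0.1], [0.5], [0.2]])
print(label_anchors(overlaps))   # [ 1.  0. -1.  0.]
```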
Finally the cross-entropy is computed between rpn_cls_score, shape (1, h, w, 18), and rpn_label, shape (1, 1, 9*h, w) (wait, did you notice I have glossed over a small detail?).
Let's jump back to the definition of the first loss:
rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])  # RPN class scores: (1, 9*h, w, 2) -> [h*w*9, 2]
rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])  # rpn_labels: (1, 1, 9*h, w) -> [h*w*9,]; 128 fg (1), 128 bg (0), everything else -1
rpn_select = tf.where(tf.not_equal(rpn_label, -1))  # positions where rpn_label != -1, i.e. the 128 fg + 128 bg
rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])  # tf.gather(tensor, indices) picks elements by index: keep only the sampled scores
rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])  # keep only the sampled labels
# rpn_cls_score shape --> [256, 2]
# rpn_label shape --> [256,]
rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))  # cross-entropy between the scores and the RPN labels
Note this line:
rpn_select = tf.where(tf.not_equal(rpn_label, -1))  # positions where rpn_label != -1, i.e. the 128 fg + 128 bg
It finds the indices of the fg and bg entries in rpn_label, uses them to pick the 256 corresponding positions out of both rpn_cls_score and rpn_label, and only then computes the cross-entropy (right, the 256 fg/bg samples are selected first; so nothing was skipped after all)!
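The same select-then-gather step in numpy, on a hypothetical six-anchor example:

```python
import numpy as np

rpn_label = np.array([-1, 1, -1, 0, 1, -1])      # only anchors 1, 3, 4 were sampled
rpn_cls_score = np.arange(12).reshape(6, 2)      # one (bg, fg) score pair per anchor
select = np.where(rpn_label != -1)[0]            # -> [1, 3, 4]
picked_scores = rpn_cls_score[select]            # shape (3, 2), like the [256, 2] above
picked_labels = rpn_label[select]                # shape (3,), like the [256,] above
```

The cross-entropy then only ever sees the sampled anchors; the -1 entries never contribute to the loss.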
With all that said, does the first loss make sense now? If so, the remaining losses will feel easy.
3.2 The first bbox regression loss
Next, the second loss:
rpn_bbox_pred = self._predictions['rpn_bbox_pred']  # (1, h, w, 36)
rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']  # (1, h, w, 36)
rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']  # (1, h, w, 36); (1.0, 1.0, 1.0, 1.0) for each fg anchor
rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']  # (1, h, w, 36); (1/256, 1/256, 1/256, 1/256) for each fg/bg anchor
rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                    rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
Computing this loss takes four inputs:
- rpn_bbox_pred
- rpn_bbox_targets
- rpn_bbox_inside_weights
- rpn_bbox_outside_weights
As with any loss, we separate the label (ground truth) from the logit (prediction). Clearly rpn_bbox_pred is the prediction, the logit, while the ground-truth label is rpn_bbox_targets; the other two must be weights used in computing the loss, which we will pin down when we reach their definitions.
Now let's look at how rpn_bbox_pred and rpn_bbox_targets are defined.
As before, we go into vgg16.py and find:
rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape = self.build_rpn(net, is_training, initializer)
So here is the picture: the RPN has two branches; the upper one predicts foreground/background, and the lower one performs the first box regression on the anchors. The feature map goes through a 1x1 convolution whose output, of shape (1, h, w, 36), is rpn_bbox_pred, as shown in Figure 3.
We have h*w*9 anchors, and rpn_bbox_pred has shape (1, h, w, 36), so rpn_bbox_pred holds a coordinate transform (dx, dy, dw, dh) for every anchor. If that transform is unfamiliar, it is worth reviewing; see:
- https://zhuanlan.zhihu.com/p/31426458
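For reference, the standard Faster R-CNN parameterization of (dx, dy, dw, dh) between an anchor with center/size (x_a, y_a, w_a, h_a) and a box (x, y, w, h) is

```latex
d_x = \frac{x - x_a}{w_a}, \qquad
d_y = \frac{y - y_a}{h_a}, \qquad
d_w = \log\frac{w}{w_a}, \qquad
d_h = \log\frac{h}{h_a}
```

i.e. translations are normalized by the anchor's size and scale changes are expressed in log space.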
Next, rpn_bbox_targets. By rights this should be the hand-built label. In network.py, the parent class of vgg16.py, we find the function _anchor_target_layer again, defined like this:
def _anchor_target_layer(self, rpn_cls_score, name):  # name="anchor"; maps anchor-vs-gt relations from the original image onto the RPN feature map
    with tf.variable_scope(name):  # all variables are scoped under "anchor"
        rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
            anchor_target_layer,
            [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
            [tf.float32, tf.float32, tf.float32, tf.float32])
        """Inputs: the class scores,
           the gt boxes,
           the image info,
           the feature stride (16),
           the anchors,
           and the number of anchors.
        """
        """Shapes of the returned tensors:
           rpn_labels: (1, 1, 9*h, w); 128 fg (1), 128 bg (0), everything else -1
           rpn_bbox_targets: (1, h, w, 36); the transform (dx, dy, dw, dh) from each in-image anchor to its nearest gt
           rpn_bbox_inside_weights: (1, h, w, 36); (1.0, 1.0, 1.0, 1.0) for each fg anchor
           rpn_bbox_outside_weights: (1, h, w, 36); (1/256, 1/256, 1/256, 1/256) for each fg/bg anchor
        """
        rpn_labels.set_shape([1, 1, None, None])
        rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
        rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])
        rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
        self._anchor_targets['rpn_labels'] = rpn_labels
        self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets
        self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights
        self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights
        self._score_summaries.update(self._anchor_targets)
    return rpn_labels
Although this function only returns rpn_labels, it also stores rpn_bbox_targets in the _anchor_targets dict, and sure enough
- rpn_bbox_inside_weights
- rpn_bbox_outside_weights
are stored there too. The call that produces
- rpn_bbox_targets
- rpn_bbox_inside_weights
- rpn_bbox_outside_weights
is this one:
rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
    anchor_target_layer,
    [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
    [tf.float32, tf.float32, tf.float32, tf.float32])
Stepping into the function anchor_target_layer... wait, this all looks very familiar (we were just in here while deriving the first loss).
(the long body of anchor_target_layer, already shown above, is omitted here)
Let's extract the part of anchor_target_layer that produces rpn_bbox_targets.
overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float),
    np.ascontiguousarray(gt_boxes, dtype=np.float))  # overlaps has shape (num_anchors, num_gt)
argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the index of the gt with the largest overlap
Here overlaps is the IoU between every anchor (h*w*9 of them, where h and w are the feature map's height and width) and every gt box. Then argmax_overlaps takes the argmax of each row of overlaps. What does that mean?
Each row holds one particular anchor's IoU with all of the image's gt boxes, so the row-wise argmax is the index of the gt box nearest to that anchor.
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # the transform (dx, dy, dw, dh) from each anchor to its nearest (largest-overlap) gt
This computes, for every anchor, the (dx, dy, dw, dh) values that map the anchor onto its nearest gt box.
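A numpy sketch of what _compute_targets computes; compute_targets below is a hypothetical re-implementation, assuming (x1, y1, x2, y2) corners and the +1 width convention used in py-faster-rcnn:

```python
import numpy as np

def compute_targets(anchors, gt):
    # anchors, gt: (N, 4) arrays of (x1, y1, x2, y2); returns (N, 4) of (dx, dy, dw, dh)
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    ax = anchors[:, 0] + 0.5 * aw
    ay = anchors[:, 1] + 0.5 * ah
    gw = gt[:, 2] - gt[:, 0] + 1.0
    gh = gt[:, 3] - gt[:, 1] + 1.0
    gx = gt[:, 0] + 0.5 * gw
    gy = gt[:, 1] + 0.5 * gh
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)

# An anchor that already coincides with its gt box needs no correction:
a = np.array([[0.0, 0.0, 15.0, 15.0]])
print(compute_targets(a, a))   # [[0. 0. 0. 0.]]
```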
Ah, now it clicks.
Next, the code that produces the two weights, rpn_bbox_inside_weights and rpn_bbox_outside_weights.
bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
# only the positive ones have regression targets; FLAGS2["bbox_inside_weights"] = (1.0, 1.0, 1.0, 1.0)
bbox_inside_weights[labels == 1, :] = np.array(cfg.FLAGS2["bbox_inside_weights"])
bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
if cfg.FLAGS.rpn_positive_weight < 0:  # cfg.FLAGS.rpn_positive_weight = -1
    # uniform weighting of examples (given non-uniform sampling)
    num_examples = np.sum(labels >= 0)  # total number of fg + bg samples
    positive_weights = np.ones((1, 4)) * 1.0 / num_examples
    negative_weights = np.ones((1, 4)) * 1.0 / num_examples
else:
    assert ((cfg.FLAGS.rpn_positive_weight > 0) &
            (cfg.FLAGS.rpn_positive_weight < 1))
    positive_weights = (cfg.FLAGS.rpn_positive_weight /
                        np.sum(labels == 1))
    negative_weights = ((1.0 - cfg.FLAGS.rpn_positive_weight) /
                        np.sum(labels == 0))
bbox_outside_weights[labels == 1, :] = positive_weights  # 1/256
bbox_outside_weights[labels == 0, :] = negative_weights  # 1/256
# map up to original set of anchors: put the removed (out-of-image) anchors back
labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # build a vector of length total_anchors and scatter the labels above into it
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4))  # shape (1, h, w, 36)
rpn_bbox_inside_weights = bbox_inside_weights
# bbox_outside_weights
bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4))  # shape (1, h, w, 36)
This is a longish chunk, but it says just two things:
rpn_bbox_inside_weights is a (1, h, w, 36) array: of all the anchors, the 128 positive ones get weight 1 at their positions, and since the weight pairs up with the four coordinates, that is (1.0, 1.0, 1.0, 1.0); every other anchor gets (0.0, 0.0, 0.0, 0.0).
rpn_bbox_outside_weights is also a (1, h, w, 36) array: of all the anchors, the 256 positive + negative ones get weight (1/256, 1/256, 1/256, 1/256) at their positions; every other anchor gets (0.0, 0.0, 0.0, 0.0).
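A tiny numpy check of how the two weights act together, with shapes reduced from (1, h, w, 36) to (num_anchors, 4) and a hypothetical four-anchor layout:

```python
import numpy as np

labels = np.array([1, 0, -1, 1])                  # 2 fg, 1 bg, 1 ignored anchor
box_diff = np.ones((4, 4))                        # pretend every coordinate delta is off by 1
inside = np.zeros((4, 4))
inside[labels == 1] = 1.0                         # only fg anchors regress
outside = np.zeros((4, 4))
outside[labels >= 0] = 1.0 / np.sum(labels >= 0)  # 1/(num fg + bg), here 1/3
weighted = outside * inside * box_diff            # nonzero only on the 2 fg rows
```

Only the foreground rows survive the inside weights, and the outside weights then average the remaining terms over all sampled (fg + bg) anchors, exactly the roles the two tensors play inside _smooth_l1_loss.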
Having understood
- rpn_bbox_targets
- rpn_bbox_inside_weights
- rpn_bbox_outside_weights
we jump back to where the loss itself is defined:
rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                    rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
It feeds
- rpn_bbox_pred
- rpn_bbox_targets
- rpn_bbox_inside_weights
- rpn_bbox_outside_weights
into the smooth L1 loss. Let's recall what smooth L1 looks like.
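In this implementation the loss carries a σ factor (sigma_rpn = 3 for the RPN term; the default σ = 1 for the RCNN term), giving, element-wise,

```latex
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\
|x| - 0.5/\sigma^2, & \text{otherwise}
\end{cases}
```

The two branches meet at |x| = 1/σ², so the function is continuous, and the linear branch avoids the exploding gradients a pure L2 loss would give on large errors.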
Now the concrete definition in code:
def _smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0, dim=[1]):  # define the smooth L1 loss
    sigma_2 = sigma ** 2  # 1.0 by default; 9.0 for the RPN term
    box_diff = bbox_pred - bbox_targets  # predicted deltas minus target deltas
    in_box_diff = bbox_inside_weights * box_diff  # keep only the fg anchors' diffs, zero elsewhere; shape (1, h, w, 36)
    abs_in_box_diff = tf.abs(in_box_diff)  # absolute error, still (1, h, w, 36); each group of four is (dx-dx_tar, dy-dy_tar, dw-dw_tar, dh-dh_tar)
    smoothL1_sign = tf.stop_gradient(tf.to_float(tf.less(abs_in_box_diff, 1. / sigma_2)))  # tf.less compares element-wise and returns a bool tensor;
    # tf.stop_gradient excludes the op from backprop: no gradient is computed for it or flows back through it
    # so smoothL1_sign is 1.0 where the element is below the 1/sigma^2 threshold, 0.0 elsewhere
    # smooth L1 definition: 0.5*(sigma*x)^2 below the threshold, |x| - 0.5/sigma^2 above it
    in_loss_box = tf.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
    out_loss_box = bbox_outside_weights * in_loss_box  # multiply the 256 fg/bg entries by 1/256
    loss_box = tf.reduce_mean(tf.reduce_sum(
        out_loss_box,
        axis=dim
    ))  # sum, then average
    return loss_box
In this code,
smoothL1_sign = tf.stop_gradient(tf.to_float(tf.less(abs_in_box_diff, 1. / sigma_2)))
marks the positions x that fall into the quadratic branch of smooth L1, i.e. where |x| < 1/sigma^2.
The loss is then assembled branch by branch from those positions:
in_loss_box = tf.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
Then rpn_bbox_outside_weights scales the per-element loss:
out_loss_box = bbox_outside_weights * in_loss_box  # multiply the 256 fg/bg entries by 1/256
Finally sum and average, and we are done!
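The branch logic is easy to sanity-check in numpy; smooth_l1 below is a hypothetical transcription of just the element-wise part, without the inside/outside weights:

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    # 0.5*(sigma*x)^2 below the 1/sigma^2 threshold, |x| - 0.5/sigma^2 above it
    s2 = sigma ** 2
    quadratic = np.abs(x) < 1.0 / s2
    return np.where(quadratic, 0.5 * s2 * x ** 2, np.abs(x) - 0.5 / s2)

vals = smooth_l1(np.array([0.05, 1.0]), sigma=3.0)
# 0.05 falls in the quadratic branch: 0.5 * 9 * 0.05**2 = 0.01125
# 1.0 falls in the linear branch: 1.0 - 0.5/9
```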
That completes the second loss. The remaining two losses are defined in essentially the same way as these two, so I won't repeat the analysis.
With the loss definitions understood, the rest, whether in TensorFlow or Caffe, is comparatively easy; the concrete train_op wiring can be read in train.py.
That wraps up our tour of Faster R-CNN. Across data handling, model building, loss definitions and train_op setup, the codebase has many subtle, well-crafted details. In my view, reading it closely not only deepens your understanding of Faster R-CNN but also raises your own coding level.
Of course, beginners should still work through the code carefully themselves rather than rush it; my articles can only serve as a reference and an aid, and the rest has to come from your own study.