proposal_layer Explained

Category: Object Detection

Article link: https://blog.csdn.net/m0_37663944/article/details/103728902

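This post walks through **_proposal_layer** from the previous post.
First, the code! Mind the comments.

_proposal_layer
As it turns out, this function just calls other functions. It is a thin intermediate function (it adds a single check), so we only look inside the if branch, which calls proposal_layer_tf.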

# Thin wrapper: one check decides between the pure-TF proposal_layer_tf and the py_func fallback
  def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
    with tf.variable_scope(name) as scope:
      if cfg.USE_E2E_TF:
        rois, rpn_scores = proposal_layer_tf(
          rpn_cls_prob,
          rpn_bbox_pred,
          self._im_info,
          self._mode,
          self._feat_stride,
          self._anchors,
          self._num_anchors
        )
      else:
        rois, rpn_scores = tf.py_func(proposal_layer,
                              [rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
                               self._feat_stride, self._anchors, self._num_anchors],
                              [tf.float32, tf.float32], name="proposal")

      rois.set_shape([None, 5])
      rpn_scores.set_shape([None, 1])

    return rois, rpn_scores

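OK, let's look at proposal_layer_tf; again, mind the comments.
The function ends up with a set of boxes in the form [x1, y1, x2, y2] (top-left and bottom-right corners in image coordinates), plus the corresponding scores.
First, the inputs deserve a short explanation:
rpn_cls_prob: shape=(1, 60, 40, 2*9), background and foreground probabilities for the 9 anchors at each location
rpn_bbox_pred: the predicted offsets [ $\Delta x$ , $\Delta y$ , $\Delta w$ , $\Delta h$ ], shape=(1, 60, 40, 36), i.e. 4 values for each of the 9 anchors
im_info: [3], image height, image width, and the resize scale
_feat_stride: 16, the ratio between the input image and the feature map (the feature map is downsampled 16 times)
anchors: shape=(21600, 4), all the anchor boxes (60*40*9 = 21600)
num_anchors: 9, the number of anchors at each feature-map location
cfg_key: the config key (the caller passes self._mode), which selects the section of cfg to read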
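
The shapes listed above all follow from the feature-map size, the anchor count, and the stride. A minimal sketch of that arithmetic is below; `rpn_shapes` is a hypothetical helper written for this illustration, and the image size it reports assumes the feature map is exactly 1/16 of the input.

```python
# Hypothetical helper for illustration; not part of the original code base.
def rpn_shapes(feat_h, feat_w, num_anchors, feat_stride):
    """Return the tensor shapes implied by a feat_h x feat_w feature map."""
    total_anchors = feat_h * feat_w * num_anchors
    return {
        "rpn_cls_prob": (1, feat_h, feat_w, 2 * num_anchors),
        "rpn_bbox_pred": (1, feat_h, feat_w, 4 * num_anchors),
        "anchors": (total_anchors, 4),
        # Assumption: the feature map is exactly 1/feat_stride of the image.
        "image_hw": (feat_h * feat_stride, feat_w * feat_stride),
    }


shapes = rpn_shapes(feat_h=60, feat_w=40, num_anchors=9, feat_stride=16)
for name, shape in shapes.items():
    print(name, shape)
```

With the numbers used in this post, the anchor count comes out as 60*40*9 = 21600, matching the first dimension of anchors.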

def proposal_layer_tf(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
  if type(cfg_key) == bytes:
    cfg_key = cfg_key.decode('utf-8')
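  pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N  # NMS setting: boxes to keep before NMS (read here, not used below)
  post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N  # NMS setting: max boxes to keep after NMS
  nms_thresh = cfg[cfg_key].RPN_NMS_THRESH  # NMS setting: IoU threshold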

  # Get the scores and bounding boxes
  scores = rpn_cls_prob[:, :, :, num_anchors:]
  # rpn_cls_prob has shape (1, 60, 40, 18)
  # Keep the last 9 channels of the 4th axis: the "is an object" probabilities, shape (1, 60, 40, 9)
  scores = tf.reshape(scores, shape=(-1,))
  # shape (1*60*40*9,)
  rpn_bbox_pred = tf.reshape(rpn_bbox_pred, shape=(-1, 4))
  # shape (1, 60, 40, 36) => (1*60*40*9, 4)

  proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
  # Apply the predicted offsets to the anchors to get the candidate boxes
  # See the detailed code walkthrough below
  proposals = clip_boxes_tf(proposals, im_info[:2])
  # Handle boxes that extend past the image boundary
  # See the detailed code walkthrough below
  indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)
  # Non-maximum suppression; returns the indices of the boxes that survive
  # Call the number of surviving boxes N; N is at most post_nms_topN
  # See the detailed code walkthrough below
  boxes = tf.gather(proposals, indices)  # pick the surviving proposals by index, shape (N, 4)
  boxes = tf.to_float(boxes)  # shape (N, 4)
  scores = tf.gather(scores, indices)  # the matching scores, shape (N,)
  scores = tf.reshape(scores, shape=(-1, 1))  # make it 2-D, shape (N, 1)

  # Only support single image as input
  batch_inds = tf.zeros((tf.shape(indices)[0], 1), dtype=tf.float32)  # shape (N, 1)
  blob = tf.concat([batch_inds, boxes], 1)  # concatenate, shape (N, 5); yes, it simply prepends a column of zeros, and the reason is covered later
  # N above is the number of boxes that survive NMS
  return blob, scores
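
To make the last few steps concrete, here is a small pure-Python sketch of the gather and concatenate logic: pick the rows that NMS kept, then prepend a batch index of 0 to each box. `build_rois` is a hypothetical helper for this illustration, and the boxes, scores, and `keep` indices are made-up values rather than real network output.

```python
# Hypothetical helper mirroring the tf.gather / tf.concat steps on plain lists.
def build_rois(proposals, scores, keep):
    """Gather the kept proposals and prepend a batch index of 0 to each row."""
    rois = [[0.0] + [float(v) for v in proposals[i]] for i in keep]
    kept_scores = [[float(scores[i])] for i in keep]
    return rois, kept_scores


# Made-up inputs: three candidate boxes as [x1, y1, x2, y2] and their scores.
proposals = [[0, 0, 15, 15], [8, 8, 31, 31], [100, 40, 163, 103]]
scores = [0.2, 0.9, 0.6]
keep = [1, 2]  # assumed NMS output, for illustration only

rois, kept_scores = build_rois(proposals, scores, keep)
print(rois)         # each row has 5 values: batch index, x1, y1, x2, y2
print(kept_scores)  # one score per kept row
```

Each output row has 5 values, which is why the wrapper calls rois.set_shape([None, 5]).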

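Three functions come up here:
1.bbox_transform_inv_tf: applies the predicted offsets to the anchors to get the candidate boxes
2.clip_boxes_tf: handles boxes that extend past the image boundary
3.tf.image.non_max_suppression: TensorFlow's built-in NMS function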

Detailed code walkthrough

1.bbox_transform_inv_tf
The function takes the following inputs:
boxes: shape=(21600, 4), the anchors as [x1, y1, x2, y2]
deltas: the predicted offsets [ $\Delta x$ , $\Delta y$ , $\Delta w$ , $\Delta h$ ], shape=(21600, 4), already reshaped from (1, 60, 40, 36) by the caller
Remember how the formulas work?
anchor box: center coordinates $x_a$ , $y_a$ and width/height $w_a$ , $h_a$
ground truth: the labelled box likewise has center coordinates $x^*$ , $y^*$ and width/height $w^*$ , $h^*$
So the offsets are:
$\Delta x=(x^*-x_a)/w_a$ , $\Delta y=(y^*-y_a)/h_a$
$\Delta w=\log(w^*/w_a)$ , $\Delta h=\log(h^*/h_a)$

Here we do not know the ground-truth box. The network predicts the $\Delta$ values, so we invert the formulas:
$\Delta x=(x^*-x_a)/w_a$ ==> $x^*=\Delta x \cdot w_a +x_a$
$\Delta y=(y^*-y_a)/h_a$ ==> $y^*=\Delta y \cdot h_a +y_a$
$\Delta w=\log(w^*/w_a)$ ==> $w^*=w_a \cdot e^{\Delta w}$
$\Delta h=\log(h^*/h_a)$ ==> $h^*=h_a \cdot e^{\Delta h}$
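
A quick way to check the inversion is a round trip: encode a target box relative to an anchor, then decode it again. The sketch below works on (center x, center y, width, height) tuples and divides the x offset by the anchor width, as the code for bbox_transform_inv_tf does. `encode` and `decode` are hypothetical helper names, and the numbers are made up.

```python
import math


def encode(anchor, target):
    """Offsets (dx, dy, dw, dh) mapping anchor to target; both are (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    x, y, w, h = target
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(anchor, deltas):
    """Invert encode(): recover (cx, cy, w, h) from an anchor and its offsets."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = deltas
    return (dx * wa + xa, dy * ha + ya, wa * math.exp(dw), ha * math.exp(dh))


anchor = (50.0, 50.0, 20.0, 40.0)   # made-up anchor
target = (55.0, 60.0, 30.0, 20.0)   # made-up ground-truth box

deltas = encode(anchor, target)
recovered = decode(anchor, deltas)
print(deltas)
print(recovered)
```

Decoding the encoded offsets gives back the target box up to floating-point error, which is what the four inverted formulas claim.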

def bbox_transform_inv_tf(boxes, deltas):
  boxes = tf.cast(boxes, deltas.dtype)
  widths = tf.subtract(boxes[:, 2], boxes[:, 0]) + 1.0
  # x2 - x1, plus 1 because pixel coordinates are treated as inclusive
  heights = tf.subtract(boxes[:, 3], boxes[:, 1]) + 1.0
  # y2 - y1, plus 1 for the same reason
  ctr_x = tf.add(boxes[:, 0], widths * 0.5)  # center x
  ctr_y = tf.add(boxes[:, 1], heights * 0.5)  # center y

  dx = deltas[:, 0]  # delta x
  dy = deltas[:, 1]  # delta y
  dw = deltas[:, 2]  # delta w
  dh = deltas[:, 3]  # delta h

  # Following the derivation above, compute the predicted box's center and width/height
  pred_ctr_x = tf.add(tf.multiply(dx, widths), ctr_x)
  pred_ctr_y = tf.add(tf.multiply(dy, heights), ctr_y)
  pred_w = tf.multiply(tf.exp(dw), widths)
  pred_h = tf.multiply(tf.exp(dh), heights)
  # Convert back to [x1, y1, x2, y2]
  pred_boxes0 = tf.subtract(pred_ctr_x, pred_w * 0.5)
  pred_boxes1 = tf.subtract(pred_ctr_y, pred_h * 0.5)
  pred_boxes2 = tf.add(pred_ctr_x, pred_w * 0.5)
  pred_boxes3 = tf.add(pred_ctr_y, pred_h * 0.5)
  # Return the refined boxes
  return tf.stack([pred_boxes0, pred_boxes1, pred_boxes2, pred_boxes3], axis=1)
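
The function above can be traced by hand on a single box. The sketch below repeats the same arithmetic in plain Python for one box; `decode_corners` is a hypothetical name for this illustration. One detail worth seeing: because of the `+ 1.0` in the width and height, all-zero deltas do not return the input box exactly. The right and bottom edges come back one pixel larger.

```python
import math


def decode_corners(box, deltas):
    """Single-box, pure-Python version of the arithmetic in bbox_transform_inv_tf."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w = x2 - x1 + 1.0            # inclusive pixel width
    h = y2 - y1 + 1.0            # inclusive pixel height
    cx = x1 + 0.5 * w            # anchor center x
    cy = y1 + 0.5 * h            # anchor center y
    pred_cx = dx * w + cx        # shifted center
    pred_cy = dy * h + cy
    pred_w = math.exp(dw) * w    # rescaled size
    pred_h = math.exp(dh) * h
    return [pred_cx - 0.5 * pred_w, pred_cy - 0.5 * pred_h,
            pred_cx + 0.5 * pred_w, pred_cy + 0.5 * pred_h]


box = [0.0, 0.0, 15.0, 15.0]  # made-up 16x16 anchor
same = decode_corners(box, [0.0, 0.0, 0.0, 0.0])
shifted = decode_corners(box, [0.5, 0.0, 0.0, 0.0])
print(same)     # [0.0, 0.0, 16.0, 16.0]
print(shifted)  # [8.0, 0.0, 24.0, 16.0]
```

A $\Delta x$ of 0.5 moves the box right by half its width (8 pixels here), while the size is unchanged because $\Delta w$ and $\Delta h$ are 0.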

2.clip_boxes_tf
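boxes: shape=(21600, 4), the decoded boxes as [x1, y1, x2, y2]
im_info: the caller passes im_info[:2], i.e. the image height and width

# This code is simple: it clamps every coordinate into the image, cutting off whatever sticks out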
def clip_boxes_tf(boxes, im_info):
  b0 = tf.maximum(tf.minimum(boxes[:, 0], im_info[1] - 1), 0)
  b1 = tf.maximum(tf.minimum(boxes[:, 1], im_info[0] - 1), 0)
  b2 = tf.maximum(tf.minimum(boxes[:, 2], im_info[1] - 1), 0)
  b3 = tf.maximum(tf.minimum(boxes[:, 3], im_info[0] - 1), 0)
  return tf.stack([b0, b1, b2, b3], axis=1)
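
The same clamping on a single box in plain Python, as a sketch. `clip_box` is a hypothetical helper, and the 960x640 image size is an assumption taken from the 60x40 feature map and the stride of 16 used in this post.

```python
def clip_box(box, im_hw):
    """Clamp [x1, y1, x2, y2] into the image; im_hw is (height, width)."""
    height, width = im_hw

    def clamp(value, upper):
        # Same order as the TF code: cap at the upper bound, then floor at 0.
        return max(min(value, upper), 0.0)

    x1, y1, x2, y2 = box
    return [clamp(x1, width - 1), clamp(y1, height - 1),
            clamp(x2, width - 1), clamp(y2, height - 1)]


# Made-up box that sticks out on the left, right, and bottom of the image.
clipped = clip_box([-12.0, 30.0, 700.0, 1000.0], im_hw=(960.0, 640.0))
print(clipped)  # [0.0, 30.0, 639.0, 959.0]
```

Note that x coordinates are limited by the width (im_info[1]) and y coordinates by the height (im_info[0]).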

3.tf.image.non_max_suppression
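This is TensorFlow's built-in NMS function. Its job is to filter out redundant, useless boxes. Let's start with an NMS example:

Set an IoU threshold of 0.7, i.e. keep only the locally highest-scoring boxes whose overlap with an already kept box does not exceed 0.7 (a coarse filter). This leaves roughly 2000 anchors, of which the top N boxes (say 300) are then taken. That way only about 300 region proposals go on to the next stage, ROI Pooling.

Suppose there are 6 boxes detected as a person, each with a confidence score.
We now need to eliminate the redundant ones:
· Sort by confidence: 0.95, 0.9, 0.9, 0.8, 0.7, 0.7
· Take the highest-scoring box, 0.95, as an object box
· Of the remaining 5 boxes, drop those whose IoU with the 0.95 box exceeds 0.6 (this threshold is configurable); that leaves three boxes: 0.9, 0.8, 0.7
· Repeat the steps above until no boxes are left: 0.9 becomes an object box, and the 0.8 and 0.7 boxes overlap it too much and are dropped
· The selected boxes are: 0.95 and 0.9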

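That's the example. Now for the function itself.
tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)
First argument: the input boxes
Second argument: the corresponding confidence scores
Third argument: the maximum number of boxes to output
Fourth argument: the IoU threshold, as in the example above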
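
The greedy procedure from the six-box example can be sketched in a few lines of plain Python. This is an illustration of the idea, not TensorFlow's implementation: `iou` and `greedy_nms` are hypothetical helpers, the parameter names simply mirror the TensorFlow call, and the six boxes are made-up coordinates arranged as two clusters of three so that the scores match the example.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up data: two people, three overlapping detections each.
boxes = [
    [0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300],
    [205, 205, 305, 305], [10, 0, 110, 100], [210, 200, 310, 300],
]
scores = [0.95, 0.9, 0.9, 0.8, 0.7, 0.7]

keep = greedy_nms(boxes, scores, max_output_size=300, iou_threshold=0.6)
print(keep, [scores[i] for i in keep])  # [0, 2] [0.95, 0.9]
```

Lowering max_output_size to 1 would return only the 0.95 box, which is how the third argument caps the number of proposals.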
