Faster RCNN Source Code Walkthrough 3.3: _region_proposal() filters anchors, _proposal_target_layer() (core part 2)

业余狙击手19

Category: Object Detection Algorithms

Article link: https://blog.csdn.net/sxlsxl119/article/details/101769705

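Faster RCNN Source Code Walkthrough 2: _anchor_component() builds the anchors for an image (core part 1)

Faster RCNN Source Code Walkthrough 3.1: _region_proposal() filters anchors, _proposal_layer() (core part 2)

Faster RCNN Source Code Walkthrough 3.2: _region_proposal() filters anchors, _anchor_target_layer() (core part 2)

Faster RCNN Source Code Walkthrough 3.3: _region_proposal() filters anchors, _proposal_target_layer() (core part 2)

Faster RCNN Source Code Walkthrough 4: remaining pieces (ROI_pooling, classification, regression, etc.)

Faster RCNN Source Code Walkthrough 5: loss functions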

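Theory: there are plenty of articles on the theory behind Faster RCNN that you can search for, so the theory is not covered here.

Reproduction: I did not record how the code was configured; you will need to look up how to get the source running yourself.

The Faster RCNN source is genuinely complex. Even after stepping through it piece by piece, I feel I have only scratched the surface rather than grasped its essence. I am sharing the walkthrough here in the hope that it helps. I first went through the overall structure of the code and then analyzed each sub-structure. Some of the comments in the code come from the original source, some are adapted from other people's posts, and some are my own understanding. They will contain some mistakes, so corrections are welcome.

Source code analyzed in this post: https://github.com/lijianaiml/tf-faster-rcnn-windows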

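Processing flow at the RPN: [figure]

Function dependencies of _region_proposal(): [figure]

Continuing from the previous post, we now walk through the module below.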

3 _proposal_target_layer()

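_proposal_target_layer calls proposal_target_layer, which in turn calls _sample_rois to select 256 RoIs from the 2000 proposals kept earlier by _proposal_layer. _sample_rois caps the number of positive samples at 64 (filling the remainder with negatives), computes the box regression targets using equations 6-9 of my handwritten notes and normalizes them, and obtains bbox_targets through _get_bbox_regression_labels. The results feed the RCNN classification and regression heads. This layer is used only during training; at test time 300 proposals are selected directly and the layer is not needed.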

  def _proposal_target_layer(self, rois, roi_scores, name):
    """
    Training-only layer. Calls proposal_target_layer(), which uses _sample_rois()
    to pick cfg.TRAIN.BATCH_SIZE (256) RoIs from the post_nms_topN (2000) proposals
    kept by _proposal_layer, and builds the labels and bbox targets for the RCNN head.
    At test time 300 proposals are used directly and this layer is skipped.
    """
    # Inputs: positions of the post_nms_topN proposals and their foreground probabilities
    with tf.variable_scope(name) as scope:
      # rois: the 256 RoIs sampled from the post_nms_topN proposals; column 0 is the batch index (all 0)
      # roi_scores: foreground probability of each of the 256 RoIs
      # labels: ground-truth class of each sampled RoI (0 for negatives)
      # bbox_targets: 256 x (4*21) matrix; only the 4 columns of a positive RoI's class are non-zero
      # bbox_inside_weights: 256 x (4*21) matrix; for positives the 4 columns of their class are 1, all others 0
      # bbox_outside_weights: 256 x (4*21) matrix; 1 wherever bbox_inside_weights > 0, otherwise 0
      rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = tf.py_func(
        proposal_target_layer,  # defined in lib/layer_utils/proposal_target_layer.py
        [rois, roi_scores, self._gt_boxes, self._num_classes],
        [tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32],
        name="proposal_target")

      rois.set_shape([cfg.TRAIN.BATCH_SIZE, 5])
      roi_scores.set_shape([cfg.TRAIN.BATCH_SIZE])
      labels.set_shape([cfg.TRAIN.BATCH_SIZE, 1])
      bbox_targets.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_inside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_outside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])

      self._proposal_targets['rois'] = rois
      self._proposal_targets['labels'] = tf.to_int32(labels, name="to_int32")
      self._proposal_targets['bbox_targets'] = bbox_targets
      self._proposal_targets['bbox_inside_weights'] = bbox_inside_weights
      self._proposal_targets['bbox_outside_weights'] = bbox_outside_weights

      self._score_summaries.update(self._proposal_targets)

      return rois, roi_scores

3.1 proposal_target_layer()

Matches the generated proposals against the ground truth to produce classification labels and regression targets.

# rpn_rois: post_nms_topN x 5 matrix
# rpn_scores: one foreground probability per proposal (post_nms_topN entries)
def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
  """
  Assign object detection proposals to ground-truth targets. Produces proposal
  classification labels and bounding-box regression targets.
  """
  # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
  # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
  all_rois = rpn_rois
  all_scores = rpn_scores

  # Include ground-truth boxes in the set of candidate rois
  if cfg.TRAIN.USE_GT:
    zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
    all_rois = np.vstack((all_rois, np.hstack((zeros, gt_boxes[:, :-1]))))
    # not sure if it a wise appending, but anyway i am not using it
    all_scores = np.vstack((all_scores, zeros))

  num_images = 1  # this code processes one image at a time
  rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # RoIs finally kept per image; cfg.TRAIN.BATCH_SIZE = 256
  fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)  # positive quota: 0.25 * rois_per_image

  # labels: ground-truth class of each sampled RoI (0 for negatives)
  # rois: the 256 RoIs sampled from the post_nms_topN proposals; column 0 stays the batch index (all 0), shape (256, 5)
  # roi_scores: foreground probability of each of the 256 RoIs, shape (256, 1)
  # bbox_targets: only the 4 columns of a positive RoI's class are non-zero, shape (256, 4*21)
  # bbox_inside_weights: for positives the 4 columns of their class are 1, all others 0, shape (256, 4*21)
  labels, rois, roi_scores, bbox_targets, bbox_inside_weights = _sample_rois(
    all_rois, all_scores, gt_boxes, fg_rois_per_image,
    rois_per_image, _num_classes)  # sample the 256 RoIs
  rois = rois.reshape(-1, 5)  # shape(256,5)
  roi_scores = roi_scores.reshape(-1)
  labels = labels.reshape(-1, 1)
  bbox_targets = bbox_targets.reshape(-1, _num_classes * 4)
  bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes * 4)
  bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)

  return rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
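
The two preparatory steps in proposal_target_layer (appending the ground-truth boxes as extra candidate RoIs, and deriving the foreground quota) can be sketched with NumPy alone. This is only an illustration: the toy arrays are made up, and the values 256 and 0.25 are the ones quoted in the comments above rather than values read from a config file.

```python
import numpy as np

# Toy inputs, made up for illustration.
rpn_rois = np.array([[0, 10, 10, 50, 50],
                     [0, 20, 20, 80, 90]], dtype=np.float32)  # (batch_idx, x1, y1, x2, y2)
gt_boxes = np.array([[12, 8, 52, 48, 7]], dtype=np.float32)   # (x1, y1, x2, y2, class)

# Prepend a zero batch-index column to the gt coordinates,
# then stack them under the RPN proposals.
zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
all_rois = np.vstack((rpn_rois, np.hstack((zeros, gt_boxes[:, :-1]))))

# Assumed values, taken from the comments in this post.
BATCH_SIZE = 256
FG_FRACTION = 0.25
num_images = 1
rois_per_image = BATCH_SIZE // num_images
fg_rois_per_image = int(round(FG_FRACTION * rois_per_image))

print(all_rois.shape)      # (3, 5)
print(fg_rois_per_image)   # 64
```

The gt box loses its class column and gains a batch-index column, so it has the same layout as an RPN proposal and can be sampled like any other RoI.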

3.1.1 _sample_rois()

Selects 256 positive and negative samples from the 2000 RoIs for training the Fast RCNN head.

# all_rois: column 0 is all zeros (batch index), the last 4 columns are coordinates
# gt_boxes: the first 4 columns are coordinates, the last column is the class
def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
  """
  Generate a random sample of RoIs comprising foreground and background examples.
  """
  # overlaps: (rois x gt_boxes)
  # overlap ratio (IoU) between every RoI and every gt box
  overlaps = bbox_overlaps(
    np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float64),  # all_rois.shape (N, 5), N is about 2000
    np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float64))   # gt_boxes.shape (n, 5), n = number of gt boxes

  # For each RoI, find the gt box it overlaps most: gt_assignment.
  # With several gt boxes, each RoI is matched to the one with the highest overlap.
  # gt_assignment holds gt indices: 0, 1, ..., len(gt) - 1
  gt_assignment = overlaps.argmax(axis=1)  # index of the maximum along axis 1, shape (N,)

  max_overlaps = overlaps.max(axis=1)  # overlap of each RoI with its matched gt box, shape (N,)
  labels = gt_boxes[gt_assignment, 4]  # class of the gt box matched to each RoI, shape (N,)

  # Select foreground RoIs as those with >= FG_THRESH overlap
  fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
  # Guard against the case when an image has fewer than fg_rois_per_image
  # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
  bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                     (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]

  # Small modification to the original version where we ensure a fixed number of regions are sampled
  # 256 RoIs are selected in the end
  if fg_inds.size > 0 and bg_inds.size > 0:  # both exist: take at most fg_rois_per_image positives, fill the rest with negatives
    fg_rois_per_image = min(fg_rois_per_image, fg_inds.size)  # fg_rois_per_image is at most 64
    fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_image), replace=False)
    bg_rois_per_image = rois_per_image - fg_rois_per_image  # negatives = 256 - positives
    to_replace = bg_inds.size < bg_rois_per_image  # sample with replacement if there are too few negatives
    bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_image), replace=to_replace)
  elif fg_inds.size > 0:  # positives only: sample rois_per_image positives
    to_replace = fg_inds.size < rois_per_image
    fg_inds = npr.choice(fg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = rois_per_image
  elif bg_inds.size > 0:  # negatives only: sample rois_per_image negatives
    to_replace = bg_inds.size < rois_per_image
    bg_inds = npr.choice(bg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = 0
  else:  # neither positives nor negatives: the original code drops into the debugger
    import pdb
    pdb.set_trace()

  # The indices that we're selecting (both fg and bg)
  keep_inds = np.append(fg_inds, bg_inds)  # positives first, then negatives, 256 in total
  # Select sampled values from various arrays:
  labels = labels[keep_inds]  # ground-truth class of each sampled RoI
  # Clamp labels for the background RoIs to 0
  labels[int(fg_rois_per_image):] = 0
  rois = all_rois[keep_inds]  # the 256 RoIs sampled from the post_nms_topN proposals
  roi_scores = all_scores[keep_inds]  # foreground probability of the 256 RoIs

  # Compute the regression offsets from the sampled RoIs and their matched gt boxes.
  # The returned array carries the class label in column 0 (rois itself is not modified).
  bbox_target_data = _compute_targets(
    rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)

  bbox_targets, bbox_inside_weights = \
    _get_bbox_regression_labels(bbox_target_data, num_classes)

  return labels, rois, roi_scores, bbox_targets, bbox_inside_weights
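
A small, self-contained sketch of the foreground/background split and sampling may help here. It covers only the first branch (both positives and negatives exist). Several things are assumptions for illustration: `iou_matrix` is a hypothetical pure-NumPy stand-in for the compiled bbox_overlaps, the threshold values are not read from cfg, and the quota is shrunk to 4 RoIs with 1 positive so the result is easy to inspect.

```python
import numpy as np

def iou_matrix(boxes, gts):
    """Pairwise IoU between boxes (N, 4) and gts (K, 4), using the +1 pixel convention."""
    x1 = np.maximum(boxes[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1 + 1, 0, None) * np.clip(y2 - y1 + 1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gts[:, 2] - gts[:, 0] + 1) * (gts[:, 3] - gts[:, 1] + 1)
    return inter / (area_b[:, None] + area_g[None, :] - inter)

# Assumed thresholds (not read from cfg).
FG_THRESH, BG_THRESH_HI, BG_THRESH_LO = 0.5, 0.5, 0.0

rois = np.array([[0, 0, 9, 9],        # identical to the gt box, IoU 1.0
                 [0, 0, 9, 4],        # top half of the gt box, IoU 0.5
                 [50, 50, 59, 59],    # disjoint, IoU 0.0
                 [5, 0, 14, 9]],      # shifted right, IoU 1/3
                dtype=np.float64)
gt_boxes = np.array([[0, 0, 9, 9, 3]], dtype=np.float64)  # one gt box of class 3

overlaps = iou_matrix(rois, gt_boxes[:, :4])
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4]

fg_inds = np.where(max_overlaps >= FG_THRESH)[0]
bg_inds = np.where((max_overlaps < BG_THRESH_HI) & (max_overlaps >= BG_THRESH_LO))[0]

rng = np.random.default_rng(0)
rois_per_image, fg_quota = 4, 1              # toy quota instead of 256 and 64
n_fg = min(fg_quota, fg_inds.size)
n_bg = rois_per_image - n_fg
fg_keep = rng.choice(fg_inds, size=n_fg, replace=False)
# Too few negatives for the quota: sample with replacement.
bg_keep = rng.choice(bg_inds, size=n_bg, replace=bg_inds.size < n_bg)

keep_inds = np.append(fg_keep, bg_keep)      # positives first, then negatives
sampled_labels = labels[keep_inds].copy()
sampled_labels[n_fg:] = 0                    # background RoIs get class 0
print(sampled_labels)
```

Because positives always come first in keep_inds, clamping everything after index n_fg to 0 is enough to mark the negatives, which is what the `labels[int(fg_rois_per_image):] = 0` line relies on.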

3.1.1.1 _compute_targets()

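Computes the coordinate offsets from the 256 sampled RoIs and the gt_boxes matched to them, and prepends each RoI's ground-truth class as the first column of the returned array (the rois themselves keep their all-zero first column).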

def _compute_targets(ex_rois, gt_rois, labels):
  """Compute bounding-box regression targets for an image."""

  assert ex_rois.shape[0] == gt_rois.shape[0]
  assert ex_rois.shape[1] == 4
  assert gt_rois.shape[1] == 4

  targets = bbox_transform(ex_rois, gt_rois)  # offsets (tx, ty, tw, th) between the 256 RoIs and their matched gt boxes
  if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
    # Optionally normalize targets by a precomputed mean and stdev
    targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
               / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))  # subtract the mean and divide by the standard deviation
  return np.hstack(
    (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)  # column 0 is the class, columns 1-4 are the targets
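
As a rough illustration of the normalization and the compact (class, tx, ty, tw, th) layout returned above, here is a sketch with a single made-up target row. The MEANS and STDS values are assumptions for the example and are not read from cfg.

```python
import numpy as np

# Assumed normalization constants (illustrative, not read from cfg).
MEANS = np.array([0.0, 0.0, 0.0, 0.0])
STDS = np.array([0.1, 0.1, 0.2, 0.2])

targets = np.array([[0.05, -0.02, 0.10, 0.30]])  # made-up raw (tx, ty, tw, th)
labels = np.array([7.0])                         # class of the matched gt box

# Subtract the mean and divide by the standard deviation, per coordinate.
normalized = (targets - MEANS) / STDS

# Prepend the class label as column 0 to get the compact N x 5 form.
compact = np.hstack((labels[:, np.newaxis], normalized)).astype(np.float32, copy=False)
print(compact.shape)   # (1, 5)
print(compact)
```

Dividing by small standard deviations scales the targets up, so the four coordinates contribute on a comparable scale to the regression loss.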

3.1.1.2 bbox_transform()

Computes tx, ty, tw, th using equations 6-9 from my handwritten notes.
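
The handwritten sheet is not reproduced in this post. Assuming its equations 6-9 are the usual Faster RCNN box parameterization (center offsets divided by the RoI size, and log ratios of widths and heights), the computation can be sketched as below. `bbox_transform_sketch` is a hypothetical name, and the +1 in the width and height is an assumed pixel convention, so treat this as an approximation of bbox_transform rather than its actual source.

```python
import numpy as np

def bbox_transform_sketch(ex_rois, gt_rois):
    """Return (tx, ty, tw, th) for each RoI / gt pair; both inputs are (N, 4) as x1, y1, x2, y2."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h

    tx = (gt_cx - ex_cx) / ex_w      # center shift in units of the RoI width
    ty = (gt_cy - ex_cy) / ex_h      # center shift in units of the RoI height
    tw = np.log(gt_w / ex_w)         # log width ratio
    th = np.log(gt_h / ex_h)         # log height ratio
    return np.stack((tx, ty, tw, th), axis=1)

ex = np.array([[0.0, 0.0, 9.0, 9.0]])    # 10 x 10 RoI centered at (5, 5)
gt = np.array([[5.0, 0.0, 24.0, 9.0]])   # 20 x 10 gt box centered at (15, 5)
t = bbox_transform_sketch(ex, gt)
print(t)   # tx = 1.0, ty = 0.0, tw = log(2), th = 0.0
```

In the example the gt box is one RoI-width to the right and twice as wide, hence tx = 1 and tw = log 2, while the vertical terms are zero.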

3.1.1.3 _get_bbox_regression_labels()

def _get_bbox_regression_labels(bbox_target_data, num_classes):
  """
  Bounding-box regression targets (bbox_target_data) are stored in a
  compact form N x (class, tx, ty, tw, th)

  This function expands those targets into the 4-of-4*K representation used
  by the network (i.e. only one class has non-zero targets).

  Returns:
      bbox_target (ndarray): N x 4K blob of regression targets
      bbox_inside_weights (ndarray): N x 4K blob of loss weights
  """

  clss = bbox_target_data[:, 0]  # column 0: the class of each RoI
  bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)  # 256 x (4*21) matrix
  bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
  inds = np.where(clss > 0)[0]  # indices of the positive samples
  for ind in inds:
    cls = clss[ind]  # class of this positive sample
    start = int(4 * cls)  # first column of this class's 4-column slot
    end = start + 4  # one past the last column of the slot (4 coordinates per class)
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]  # write the offsets into the slot of the matching class
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS  # weights (1.0, 1.0, 1.0, 1.0)
  return bbox_targets, bbox_inside_weights
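
The expansion from the compact N x 5 form to the N x 4K form is easiest to see on a toy input. The sketch below uses 4 classes instead of 21 and made-up target values. The inside weights (1.0, 1.0, 1.0, 1.0) are the ones quoted in the comment above, and the last line mirrors how proposal_target_layer derives bbox_outside_weights.

```python
import numpy as np

num_classes = 4                          # toy value; the post uses 21
INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)    # value quoted in the code comment above

# Compact form: (class, tx, ty, tw, th). Row 0 is a positive of class 2, row 1 is background.
compact = np.array([[2, 0.5, -0.2, 0.1, 0.3],
                    [0, 0.0, 0.0, 0.0, 0.0]], dtype=np.float32)

clss = compact[:, 0]
bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
bbox_inside_weights = np.zeros_like(bbox_targets)

for ind in np.where(clss > 0)[0]:        # positives only
    start = int(4 * clss[ind])           # class 2 -> columns 8..11
    bbox_targets[ind, start:start + 4] = compact[ind, 1:]
    bbox_inside_weights[ind, start:start + 4] = INSIDE_WEIGHTS

# Outside weights: 1 wherever an inside weight is set, else 0.
bbox_outside_weights = (bbox_inside_weights > 0).astype(np.float32)

print(bbox_targets[0])          # non-zero only in columns 8..11
print(bbox_inside_weights[1])   # all zeros for the background row
```

Only the slot of the ground-truth class carries targets and weights, so the regression loss ignores the predictions for every other class and for all background RoIs.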