TensorFlow Object Detection API source-reading notes: RPN

Update:
I recommend first reading the Zhihu article 从编程实现角度学习Faster R-CNN (learning Faster R-CNN from the perspective of its implementation), which is more intuitive. The source code here is highly abstracted, so these notes may look somewhat disorganized.

  • In faster_rcnn_meta_arch.py, these two correspond to the 3×3 and 1×1 convolutions that make up the RPN in the Zhihu article:
    rpn_box_predictor_features = slim.conv2d(rpn_features_to_crop
    self._first_stage_box_predictor=box_predictor.ConvolutionalBoxPredictor

  • The AnchorTargetCreator in the Zhihu article uses IoU to pick 256 anchors out of the 20,000+ candidates for classification and box regression (i.e. for computing the RPN loss). This corresponds to:
    target_assigner.batch_assign_targets;
    self._first_stage_sampler=sampler.BalancedPositiveNegativeSampler, applied with first_stage_minibatch_size;
    the 20,000 is determined by the size of the feature map fed to the RPN and the number of anchor types, while 256 corresponds to first_stage_minibatch_size (see protos/faster_rcnn.proto);
    in short, all of this happens inside def _loss_rpn (see the sampler sketch after this list).

  • (proposals = 2000) The ProposalCreator in the Zhihu article: within the RPN, select a certain number of anchors (e.g. 12000/6000) from the tens of thousands of candidates according to their objectness scores, adjust their sizes and positions, apply NMS, and keep the 2000/300 highest-scoring ones to generate RoIs. This corresponds to:
    def _postprocess_rpn
    first_stage_max_proposals=300

  • The ProposalTargetCreator in the Zhihu article selects a subset (e.g. 128) of the 2000/300 candidates and pools them to train Fast R-CNN. This corresponds to:
    hard_example_miner is not used
    _unpad_proposals_and_sample_box_classifier_batch
    second_stage_batch_size=64
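
To make the first-stage sampling concrete, below is a minimal NumPy sketch of what a balanced positive/negative sampler does with first_stage_minibatch_size=256 and a positive fraction of 0.5. It is illustrative only, not the library's BalancedPositiveNegativeSampler; the function name and its arguments are invented for the example.

import numpy as np

def balanced_sample(is_positive, is_valid, batch_size=256, positive_fraction=0.5):
    # is_positive: bool array over all anchors, True for anchors matched to a
    # ground-truth box; is_valid: bool array marking anchors eligible for sampling.
    # Returns a bool mask selecting at most batch_size anchors.
    pos_idx = np.flatnonzero(is_positive & is_valid)
    neg_idx = np.flatnonzero(~is_positive & is_valid)
    num_pos = min(len(pos_idx), int(batch_size * positive_fraction))
    num_neg = min(len(neg_idx), batch_size - num_pos)  # backfill with negatives
    chosen = np.concatenate([
        np.random.choice(pos_idx, num_pos, replace=False),
        np.random.choice(neg_idx, num_neg, replace=False)]).astype(np.int64)
    mask = np.zeros(len(is_positive), dtype=bool)
    mask[chosen] = True
    return mask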

Old:

'''Overview of the RPN. Note that the analysis frequently uses terms taken directly from the original paper, which differ from the names used in the code.
'''
FasterRCNNFeatureExtractor.extract_proposal_features actually calls
FasterRCNNResnetV1FeatureExtractor._extract_proposal_features,
which produces the first-stage RPN features used as the input to the RPN.

_extract_rpn_feature_map of class FasterRCNNMetaArch(model.DetectionModel):
calls the _extract_proposal_features of the feature extractor above, and returns
      rpn_box_predictor_features: A 4-D float32 tensor with shape
        [batch, height, width, depth] to be used for predicting proposal boxes
        and corresponding objectness scores. '''the intermediate layer obtained from the sliding window'''
      rpn_features_to_crop: A 4-D float32 tensor with shape
        [batch, height, width, depth] representing image features to crop using
        the proposal boxes. '''this is simply the feature map produced by the feature extractor above.'''
      anchors: A BoxList representing anchors (for the RPN) in
        absolute coordinates.

'''Here grid_anchor_generator.GridAnchorGenerator is used to generate 9 anchor boxes per location (3 different scales and 3 aspect ratios). See the detailed analysis below.
'''
    anchors = self._first_stage_anchor_generator.generate(
        [(feature_map_shape[1], feature_map_shape[2])])

'''The sliding window is applied to the conv feature map to obtain the intermediate layer. first_stage_box_predictor_kernel_size: Kernel size to use for the convolution op just prior to RPN box predictions.
'''
    with slim.arg_scope(self._first_stage_box_predictor_arg_scope):
      kernel_size = self._first_stage_box_predictor_kernel_size
      rpn_box_predictor_features = slim.conv2d(
          rpn_features_to_crop,
          self._first_stage_box_predictor_depth,
          kernel_size=[kernel_size, kernel_size],
          rate=self._first_stage_atrous_rate,
          activation_fn=tf.nn.relu6)

'''According to the paper, the next step is for the intermediate layer to go into the cls and reg layers.
'''
def _predict_rpn_proposals(self, rpn_box_predictor_features):
which goes into
self._first_stage_box_predictor.predict
self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor

'''Box predictors are classes that take a high level image feature map as input and produce two predictions, (1) a tensor encoding box locations, and (2) a tensor encoding classes for each box. class ConvolutionalBoxPredictor(BoxPredictor) is examined in detail below.
'''

'''One step further and we reach the loss. This predict function returns a single prediction_dict covering both stages, which then goes into loss.
'''
def predict(self, preprocessed_inputs)
def loss(self, prediction_dict, scope=None):

'''loss invokes the first-stage loss computation. Examined in detail below.
'''
def _loss_rpn
'''Anchor generation:
object_detection/anchor_generators/grid_anchor_generator.py
'''
def _generate  # called through the generate function of the parent class core/anchor_generator.py
    grid_height, grid_width = feature_map_shape_list[0]
    # Multidimensional analog of numpy.meshgrid
    scales_grid, aspect_ratios_grid = ops.meshgrid(self._scales,
                                                   self._aspect_ratios)
    scales_grid = tf.reshape(scales_grid, [-1])
    aspect_ratios_grid = tf.reshape(aspect_ratios_grid, [-1])
    return tile_anchors(grid_height,
                        grid_width,
                        scales_grid,
                        aspect_ratios_grid,
                        self._base_anchor_size,
                        self._anchor_stride,
                        self._anchor_offset)
'''Let's verify by hand against the test script.
    base_anchor_size = [10, 10]#default=[256, 256]
    anchor_stride = [19, 19]#default=[16, 16]
    anchor_offset = [0, 0]
    scales = [0.5, 1.0, 2.0]
    aspect_ratios = [1.0]

    exp_anchor_corners = [[-2.5, -2.5, 2.5, 2.5], [-5., -5., 5., 5.],
                          [-10., -10., 10., 10.], [-2.5, 16.5, 2.5, 21.5],
                          [-5., 14., 5, 24], [-10., 9., 10, 29],
                          [16.5, -2.5, 21.5, 2.5], [14., -5., 24, 5],
                          [9., -10., 29, 10], [16.5, 16.5, 21.5, 21.5],
                          [14., 14., 24, 24], [9., 9., 29, 29]]
    feature_map_shape_list=[(2, 2)]  # asks for anchors that correspond to a 2x2 layer
grid_height, grid_width = 2,2
scales_grid, aspect_ratios_grid omitted; three combinations in total, so the whole feature map yields 2*2*3 = 12 anchors.
The anchor heights and widths are determined by scales, aspect_ratios and base_anchor_size; straightforward.
The anchor centers are determined by range(grid), anchor_stride and anchor_offset, where grid refers to grid_height and grid_width. For instance, the first center is clearly 0 and the second is 19. Once you see that the grid is simply the set of feature-map cells at which anchors are generated, it all becomes simple. Question to ponder: how are parameters such as base_anchor_size and anchor_stride configured?
How do the anchor centers line up with the sliding-window centers? The answer is that a group of anchors is generated at every cell of the input feature map:
    feature_map_shape = tf.shape(rpn_features_to_crop)
    anchors = self._first_stage_anchor_generator.generate(
        [(feature_map_shape[1], feature_map_shape[2])])
It follows that the anchor stride should be 1 × 16 = 16 (because each feature-map cell corresponds to a 16×16 receptive field in the original image), which matches expectations. The basic idea is that the receptive field, mapped back to the original image, is large and has a single fixed shape, so k anchor boxes of different sizes and shapes are introduced at every cell in order to frame objects in the original image more tightly. This idea is further refined and extended in later papers such as YOLOv2 and SSD.
'''      
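'''A minimal NumPy re-computation of the hand calculation above (parameter values copied from the test case; the height/width formulas follow grid_anchor_generator, and the loop order matches exp_anchor_corners):
'''
import numpy as np

base_anchor_size = [10., 10.]
anchor_stride = [19., 19.]
anchor_offset = [0., 0.]
scales = np.array([0.5, 1.0, 2.0])
aspect_ratios = np.array([1.0])
grid_height, grid_width = 2, 2

# Anchor shapes: height = scale / sqrt(ratio) * base_h, width = scale * sqrt(ratio) * base_w.
ratio_sqrts = np.sqrt(aspect_ratios)
heights = (scales[:, None] / ratio_sqrts[None, :]).ravel() * base_anchor_size[0]
widths = (scales[:, None] * ratio_sqrts[None, :]).ravel() * base_anchor_size[1]

# Anchor centers: one group of len(scales) * len(aspect_ratios) anchors per grid cell.
y_centers = np.arange(grid_height) * anchor_stride[0] + anchor_offset[0]
x_centers = np.arange(grid_width) * anchor_stride[1] + anchor_offset[1]

anchors = [[yc - h / 2., xc - w / 2., yc + h / 2., xc + w / 2.]
           for yc in y_centers for xc in x_centers
           for h, w in zip(heights, widths)]
print(np.round(anchors, 1))  # 2*2*3 = 12 rows, matching exp_anchor_corners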
def tile_anchors
"""Generating locations and classes
object_detection/core/box_predictor.py.
This acts on the intermediate layer produced by the sliding window; the output is, for each anchor, a tensor encoding box locations and a tensor encoding classes for each box.
Additional convolutional layers may be introduced along the way.
Learning the locations is nothing special either, just a convolution:
        box_encodings = slim.conv2d(
            net, num_predictions_per_location * self._box_code_size,
            [self._kernel_size, self._kernel_size],
            scope='BoxEncodingPredictor')
num_predictions_per_location is the number of anchors per location. Learning the classes is similar:
        class_predictions_with_background = slim.conv2d(
            net, num_predictions_per_location * num_class_slots,
            [self._kernel_size, self._kernel_size], scope='ClassPredictor',
            biases_initializer=tf.constant_initializer(
                self._class_prediction_bias_init))
Question: what exactly is the box location learned here? It is simply the 'rpn_box_encodings' obtained when the predict function calls _predict_rpn_proposals.
    self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor
It is later used to compute the loss. How does it relate to the anchor's own location? It is in fact the coordinate offset between the predicted box and the anchor box, as in the paper.
"""
class ConvolutionalBoxPredictor(BoxPredictor)
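'''A quick shape check for the two prediction heads above, assuming the RPN setting of 9 anchors per location, box_code_size = 4 and 2 class slots (background + object); tf.keras.layers.Conv2D is used here only as a stand-in for slim.conv2d, and the feature-map size is made up:
'''
import tensorflow as tf

num_predictions_per_location = 9   # anchors per feature-map cell
box_code_size = 4                  # (ty, tx, th, tw)
num_class_slots = 2                # objectness: background + object

net = tf.zeros([1, 38, 50, 512])   # fake intermediate layer [batch, H, W, depth]
box_encodings = tf.keras.layers.Conv2D(
    num_predictions_per_location * box_code_size, kernel_size=1)(net)
class_predictions_with_background = tf.keras.layers.Conv2D(
    num_predictions_per_location * num_class_slots, kernel_size=1)(net)
print(box_encodings.shape)                      # (1, 38, 50, 36): one 4-vector per anchor
print(class_predictions_with_background.shape)  # (1, 38, 50, 18): one 2-vector per anchor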
"""cls, reg loss
object_detection/meta_architectures/faster_rcnn_meta_arch.py
Here rpn_box_encodings is used directly to compute the loss. Presumably batch_reg_targets is the offset between the anchor boxes and the ground truth. Checking the target_assigner.batch_assign_targets code:
  def assign(self, anchors, groundtruth_boxes, groundtruth_labels=None,
             **params):
      reg_targets = self._create_regression_targets(anchors,
                                                    groundtruth_boxes,
                                                    match)
_create_regression_targets
        matched_reg_targets = self._box_coder.encode(matched_gt_boxes,
                                                 matched_anchors)
box_coders/faster_rcnn_box_coder.py
    tx = (xcenter - xcenter_a) / wa
    ty = (ycenter - ycenter_a) / ha
    tw = tf.log(w / wa)
    th = tf.log(h / ha) 
Following the chain we find it; consistent with the paper.
"""
def _loss_rpn
      (batch_cls_targets, batch_cls_weights, batch_reg_targets,
       batch_reg_weights, _) = target_assigner.batch_assign_targets(
           self._proposal_target_assigner, box_list.BoxList(anchors),
           groundtruth_boxlists, len(groundtruth_boxlists)*[None])
      batch_cls_targets = tf.squeeze(batch_cls_targets, axis=2)

      localization_losses = self._first_stage_localization_loss(
          rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices)
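
'''batch_reg_targets above holds the encoded offsets produced by faster_rcnn_box_coder.py. A NumPy sketch of that encode step (boxes in [ymin, xmin, ymax, xmax] order; the scale_factors multiplication used by the library is omitted):
'''
import numpy as np

def encode(boxes, anchors):
    # Encode boxes relative to anchors as (ty, tx, th, tw), following the formulas above.
    ha, wa = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ycenter_a, xcenter_a = anchors[:, 0] + ha / 2., anchors[:, 1] + wa / 2.
    h, w = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    ycenter, xcenter = boxes[:, 0] + h / 2., boxes[:, 1] + w / 2.
    ty = (ycenter - ycenter_a) / ha
    tx = (xcenter - xcenter_a) / wa
    th = np.log(h / ha)
    tw = np.log(w / wa)
    return np.stack([ty, tx, th, tw], axis=1)

# An anchor that coincides with its matched ground-truth box encodes to all zeros.
print(encode(np.array([[0., 0., 10., 10.]]), np.array([[0., 0., 10., 10.]])))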

Next, let's look at how the loss formula is actually implemented.

"""The ground-truth label is 1 if the anchor is positive, and is 0 if the anchor is negative. 
An anchor is labeled as positive if:
(a) the anchor is the one with highest IoU overlap with a ground-truth box
(b) the anchor has an IoU overlap with a ground-truth box higher than 0.7
Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth
boxes.
50%/50% ratio of positive/negative anchors in a minibatch.
"""
From the analysis above, the corresponding code should be
      (batch_cls_targets, batch_cls_weights, batch_reg_targets,
       batch_reg_weights, _) = target_assigner.batch_assign_targets(
           self._proposal_target_assigner, box_list.BoxList(anchors),
           groundtruth_boxlists, len(groundtruth_boxlists)*[None])
The target_assigner object called here is constructed like this:
    self._proposal_target_assigner = target_assigner.create_target_assigner(
        'FasterRCNN', 'proposal')
Entering the create_target_assigner function in core/target_assigner.py:
  elif reference == 'FasterRCNN' and stage == 'proposal':
    similarity_calc = sim_calc.IouSimilarity()
    matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.7,
                                           unmatched_threshold=0.3,
                                           force_match_for_each_row=True)
    box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
        scale_factors=[10.0, 10.0, 5.0, 5.0])
The concrete implementation lives in:
from object_detection.matchers import argmax_matcher
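
To see what those thresholds do, here is a simplified NumPy sketch of the argmax matching rule. It covers rule (b) and the negative threshold only; force_match_for_each_row, which implements rule (a), is left out for brevity, and the 1/0/-1 label convention is chosen just for this example.

import numpy as np

def label_anchors(iou, matched_threshold=0.7, unmatched_threshold=0.3):
    # iou: [num_anchors, num_gt] IoU matrix.
    # Returns per-anchor labels: 1 = positive, 0 = negative,
    # -1 = ignored (best IoU falls between the two thresholds).
    best_iou = iou.max(axis=1)
    labels = np.full(iou.shape[0], -1, dtype=np.int64)
    labels[best_iou >= matched_threshold] = 1   # (b) IoU above 0.7 with some gt box
    labels[best_iou < unmatched_threshold] = 0  # IoU below 0.3 for all gt boxes
    return labels

iou = np.array([[0.8, 0.1], [0.5, 0.2], [0.05, 0.1]])
print(label_anchors(iou))   # -> [ 1 -1  0]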