TensorFlow Object Detection API source code reading notes: Fast R-CNN

Update:
I suggest first reading 从编程实现角度学习Faster R-CNN (learning Faster R-CNN from the implementation side), which is more intuitive. The source code here is highly abstracted, so it can feel rather tangled.

Old:

I've previously gone through the overall code architecture of the detection API, the RPN part, and a few basic classes. This time I take a careful look at the Fast R-CNN part and tie together what I've read before.

"""object_detection/meta_architectures/faster_rcnn_meta_arch.py"""
class FasterRCNNFeatureExtractor(object) declares abstract methods such as _extract_proposal_features and _extract_box_classifier_features.

But first look at the restore_from_classification_checkpoint_fn function, which returns a variables_to_restore dict. Reading the restore_map function right after it makes the purpose clear: it builds a mapping between the model's variables and the variable names in a checkpoint, and it distinguishes a classification checkpoint (used for initialization) from a detection checkpoint.
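"""A rough standalone sketch of that name mapping; the scope string and the variable name below are made up for illustration."""
import tensorflow as tf

first_stage_scope = 'FirstStageFeatureExtractor'
with tf.variable_scope(first_stage_scope):
  # A stand-in variable, named the way a slim ResNet checkpoint would name it.
  _ = tf.Variable(tf.zeros([7, 7, 3, 64]), name='resnet_v1_101/conv1/weights')

variables_to_restore = {}
for variable in tf.global_variables():
  if variable.op.name.startswith(first_stage_scope):
    # Strip the detection scope prefix so the name matches the one stored
    # in the classification checkpoint.
    ckpt_name = variable.op.name.replace(first_stage_scope + '/', '')
    variables_to_restore[ckpt_name] = variable
# variables_to_restore maps checkpoint names to model variables and can be
# handed to tf.train.Saver for initialization.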
class FasterRCNNMetaArch(model.DetectionModel)
Note the heavy use of @property, which lets a method be called like an attribute. Decorating a method with @property keeps caller code short while still allowing the necessary checks on the value, and omitting the setter makes it a read-only attribute, as the sketch after the next docstring shows.

"""这个max_num_proposals为什么这样设定?,暂时不太明白,后续研究"""
def max_num_proposals(self):
    Max number of proposals (to pad to) for each image in the input batch.
    At training time, this is set to be the `second_stage_batch_size` if hard
    example miner is not configured, else it is set to
    `first_stage_max_proposals`. At inference time, this is always set to
    `first_stage_max_proposals`.
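"""A minimal sketch of a read-only @property expressing that docstring's logic; the body is my paraphrase, not the actual implementation (the real class likely precomputes the value in __init__, and the attribute names here are assumptions)."""
@property
def max_num_proposals(self):
  # No setter is defined, so this is read-only for callers.
  if self._is_training and not self._hard_example_miner:
    return self._second_stage_batch_size
  return self._first_stage_max_proposals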

"""这个函数调用了image_resizer_fn,把归一化等其他预处理都留给feature_extractor来做"""
def preprocess(self, inputs)
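"""Roughly how I'd expect the division of labor to look; a sketch, not quoted source:"""
def preprocess(self, inputs):
  with tf.name_scope('Preprocessor'):
    # Resize every image in the batch with the configured resizer fn...
    resized_inputs = tf.map_fn(self._image_resizer_fn,
                               elems=inputs, dtype=tf.float32)
    # ...then hand normalization (e.g. ResNet mean subtraction) to the
    # feature extractor.
    return self._feature_extractor.preprocess(resized_inputs)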

"""这个函数比较复杂,首先是一些特殊的处理需要注意:
    + Anchor pruning vs. clipping: following the recommendation of the Faster
    R-CNN paper, we prune anchors that venture outside the image window at
    training time and clip anchors to the image window at inference time.
    + Proposal padding: as described at the top of the file, proposals are
    padded to self._max_num_proposals and flattened so that proposals from all
    images within the input batch are arranged along the same batch dimension.
"""
def predict(self, preprocessed_inputs)
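"""A standalone sketch of the prune-vs-clip behavior in plain TF (the real code uses the API's box_list_ops; the window and anchor values here are made up):"""
import tensorflow as tf

# Anchors as [ymin, xmin, ymax, xmax] in absolute coordinates;
# the image window is 100x100.
anchors = tf.constant([[-10., -10., 50., 50.],
                       [20., 20., 80., 80.],
                       [90., 90., 200., 200.]])
win = (0., 0., 100., 100.)

# Training: prune anchors that venture outside the window.
inside = tf.logical_and(
    tf.logical_and(anchors[:, 0] >= win[0], anchors[:, 1] >= win[1]),
    tf.logical_and(anchors[:, 2] <= win[2], anchors[:, 3] <= win[3]))
pruned_anchors = tf.boolean_mask(anchors, inside)   # keeps only the middle anchor

# Inference: clip anchors to the window instead of dropping them.
clipped_anchors = tf.stack(
    [tf.clip_by_value(anchors[:, 0], win[0], win[2]),
     tf.clip_by_value(anchors[:, 1], win[1], win[3]),
     tf.clip_by_value(anchors[:, 2], win[0], win[2]),
     tf.clip_by_value(anchors[:, 3], win[1], win[3])], axis=1)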
    The returned prediction_dict contains 11 entries:
        1) rpn_box_predictor_features: A 4-D float32 tensor with shape
          [batch_size, height, width, depth] to be used for predicting proposal
          boxes and corresponding objectness scores.
          # The intermediate layer produced by the sliding window, fed directly to a box predictor; the code also calls it the RPN feature map.
        2) rpn_features_to_crop: A 4-D float32 tensor with shape
          [batch_size, height, width, depth] representing image features to crop
          using the proposal boxes predicted by the RPN.  
          # This is simply the feature map from the feature extractor earlier: the block3 activations at the truncation point, which then pass through one conv layer to produce rpn_box_predictor_features.
        3) image_shape: a 1-D tensor of shape [4] representing the input
          image shape.
        6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
          for the first stage RPN (in absolute coordinates).  Note that
          `num_anchors` can differ depending on whether the model is created in
          training or inference mode.
        # The four entries above come from self._extract_rpn_feature_maps(preprocessed_inputs), which in turn calls FasterRCNNResnetV1FeatureExtractor._extract_proposal_features.

        4) rpn_box_encodings:  3-D float tensor of shape
          [batch_size, num_anchors, self._box_coder.code_size] containing
          predicted boxes.
        5) rpn_objectness_predictions_with_background: 3-D float tensor of shape
          [batch_size, num_anchors, 2] containing class
          predictions (logits) for each of the anchors.  Note that this
          tensor *includes* background class predictions (at class index 0).
        # These two come from self._predict_rpn_proposals(rpn_box_predictor_features); the concrete implementation lives in box_predictor, e.g. class ConvolutionalBoxPredictor(BoxPredictor). Nothing special: in essence it fits the locations and classes with convolutions. Combined with earlier reading, the RPN stage now feels familiar (though I haven't yet looked at how external parameters are plumbed into the model; that should be in the train script).
        # Note: num_anchors_per_location=self._first_stage_anchor_generator.num_anchors_per_location() is a list, [len(self._scales) * len(self._aspect_ratios)], usually 3*3=9, meaning 9 anchors are taken at every position of rpn_features_to_crop. The list having a single element means anchors live on one feature map only (see the SSD paper for anchors taken on multiple feature maps; we'll read the SSD code later, and there is a toy sketch after this list).

        (and if first_stage_only=False):
        7) refined_box_encodings: a 3-D tensor with shape
          [total_num_proposals, num_classes, 4] representing predicted
          (final) refined box encodings, where
          total_num_proposals=batch_size*self._max_num_proposals
        8) class_predictions_with_background: a 3-D tensor with shape
          [total_num_proposals, num_classes + 1] containing class
          predictions (logits) for each of the anchors, where
          total_num_proposals=batch_size*self._max_num_proposals.
          Note that this tensor *includes* background class predictions
          (at class index 0).
        9) num_proposals: An int32 tensor of shape [batch_size] representing the
          number of proposals generated by the RPN.  `num_proposals` allows us
          to keep track of which entries are to be treated as zero paddings and
          which are not since we always pad the number of proposals to be
          `self.max_num_proposals` for each image.
        10) proposal_boxes: A float32 tensor of shape
          [batch_size, self.max_num_proposals, 4] representing
          decoded proposal bounding boxes in absolute coordinates.
        11) mask_predictions: (optional) a 4-D tensor with shape
          [total_num_padded_proposals, num_classes, mask_height, mask_width]
          containing instance mask predictions.
        # With the paper and the code side by side, the first stage is now fairly clear. All second-stage outputs come from self._predict_second_stage, whose inputs are rpn_box_encodings, rpn_objectness_predictions_with_background, rpn_features_to_crop, anchors, and image_shape. Its logic is straightforward: post-process the RPN results, then extract features. A few key points follow, after the anchor-layout sketch promised above.
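"""The toy anchor-layout sketch referenced under entry 5: 3 scales x 3 aspect ratios give 9 anchors per feature-map location (the base size and values are made up, and this is not the API's GridAnchorGenerator):"""
scales = [0.5, 1.0, 2.0]
aspect_ratios = [0.5, 1.0, 2.0]     # aspect ratio = width / height
base_size = 256.0
anchors = []
for s in scales:
  for a in aspect_ratios:
    h = base_size * s / a ** 0.5
    w = base_size * s * a ** 0.5
    anchors.append([-h / 2, -w / 2, h / 2, w / 2])
num_anchors_per_location = [len(scales) * len(aspect_ratios)]  # [9]
# A single list element == anchors are taken on one feature map only;
# SSD-style models would have one entry per feature map instead.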
"""self._postprocess_rpn: decodes the raw RPN predictions, runs non-max suppression."""
    proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
        rpn_box_encodings, rpn_objectness_predictions_with_background,
        anchors, image_shape)
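"""A standalone sketch of the decode-NMS-pad pipeline for one image (decoding is omitted; the boxes and scores are random stand-ins and the thresholds are assumptions):"""
import tensorflow as tf

max_num_proposals = 300
boxes = tf.random_uniform([6000, 4])   # already-decoded [ymin, xmin, ymax, xmax]
scores = tf.random_uniform([6000])     # objectness, background column dropped
keep = tf.image.non_max_suppression(boxes, scores,
                                    max_output_size=max_num_proposals,
                                    iou_threshold=0.7)
proposals = tf.gather(boxes, keep)
num_proposals = tf.shape(proposals)[0]
# Pad so every image contributes exactly max_num_proposals rows;
# num_proposals records how many of them are real (cf. entry 9 above).
proposals = tf.pad(proposals,
                   [[0, max_num_proposals - num_proposals], [0, 0]])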

"""_compute_second_stage_input_feature_maps函数中实现ROI pooling,注意self._maxpool_kernel_size是在config中设置的,然后通过model_builder传递给模型. 和faster r-cnn paper中不一样,是先将feature map resize到固定大小,然后用固定大小的Kernel进行池化,而不是使用自适应的kernel大小。这一点在作者的论文中有描述。
"""
    flattened_proposal_feature_maps = (
        self._compute_second_stage_input_feature_maps(
            rpn_features_to_crop, proposal_boxes_normalized))
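"""A standalone sketch of that crop-then-resize-then-pool scheme (the 14x14 crop and 2x2 kernel mirror common config values but are assumptions here):"""
import tensorflow as tf

rpn_features = tf.random_normal([1, 38, 50, 1024])    # stand-in feature map
proposal_boxes = tf.constant([[0.1, 0.1, 0.5, 0.6],
                              [0.3, 0.2, 0.9, 0.8]])  # normalized coordinates
box_ind = tf.zeros([2], dtype=tf.int32)               # both boxes from image 0
# Resize every proposal's crop to a fixed 14x14 first...
crops = tf.image.crop_and_resize(rpn_features, proposal_boxes, box_ind,
                                 crop_size=[14, 14])
# ...then max-pool with a fixed kernel instead of an adaptive one.
pooled = tf.nn.max_pool(crops, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
# pooled: [num_proposals, 7, 7, 1024]; proposals ride along the batch dimension.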

"""使用resnet的block4.
"""
    box_classifier_features = (
        self._feature_extractor.extract_box_classifier_features(
            flattened_proposal_feature_maps,
            scope=self.second_stage_feature_extractor_scope))

"""class MaskRCNNBoxPredictor(BoxPredictor),ValueError: if num_predictions_per_location is not 1,这个大概解释了为啥前面要把anchor维度merge到batch维度中去。The mask prediction head is based on the Mask RCNN paper with the following modifications: We replace the deconvolution layer with a bilinear resize and a convolution. 分类和定位,各用一个fc.可以看到比fast r-cnn paper中少了两个ROI pooling后的fc,可能因为原文使用的特征提取器是vgg,所以多了两个fc。
"""
    box_predictions = self._mask_rcnn_box_predictor.predict(
        box_classifier_features,
        num_predictions_per_location=1,
        scope=self.second_stage_box_predictor_scope)
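"""A sketch of the two fc heads (sizes are illustrative; the average-pool-then-flatten step matches my reading of the predictor, but treat the details as assumptions):"""
import tensorflow as tf
slim = tf.contrib.slim

num_classes = 90
box_classifier_features = tf.random_normal([128, 7, 7, 2048])  # stand-in
# Average-pool the spatial dims of each proposal's features, then flatten.
spatial_avg = tf.reduce_mean(box_classifier_features, [1, 2], keep_dims=True)
flat = slim.flatten(spatial_avg)
# One fc for localization, one fc for classification (with background class).
box_encodings = slim.fully_connected(flat, num_classes * 4, activation_fn=None)
class_predictions_with_background = slim.fully_connected(
    flat, num_classes + 1, activation_fn=None)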
"""最后,代码中提到了Mask r-cnn,等先读完paper再研究。"""
    if self._predict_keypoints:
      raise ValueError('Keypoint prediction is unimplemented.')

    if self._predict_instance_masks:
      with slim.arg_scope(self._conv_hyperparams):
        # A bilinear resize plus a conv replaces the deconv layer of the
        # Mask R-CNN paper.
        upsampled_features = tf.image.resize_bilinear(
            image_features,
            [self._mask_height, self._mask_width],
            align_corners=True)
        upsampled_features = slim.conv2d(
            upsampled_features,
            num_outputs=self._mask_prediction_conv_depth,
            kernel_size=[2, 2])
        # One mask logit map per class; no activation on the logits.
        mask_predictions = slim.conv2d(upsampled_features,
                                       num_outputs=self.num_classes,
                                       activation_fn=None,
                                       kernel_size=[3, 3])
        # Transpose to [batch, num_classes, height, width] and insert a
        # singleton num-predictions-per-location axis.
        instance_masks = tf.expand_dims(tf.transpose(mask_predictions,
                                                     perm=[0, 3, 1, 2]),
                                        axis=1,
                                        name='MaskPredictor')
      predictions_dict[MASK_PREDICTIONS] = instance_masks