TensorFlow Object Detection API source code reading notes: Fast R-CNN

Latest recommended article published on 2022-12-14 23:56:24

Wayne2019

Category: TensorFlow. Tags: TensorFlow, computer vision, object detection, deep learning, Python

Article link: https://blog.csdn.net/Wayne2019/article/details/78808147

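Update:
I recommend first reading "从编程实现角度学习Faster R-CNN" (Learning Faster R-CNN from the implementation side), which is more intuitive. The source code covered here is highly abstract, so these notes come across as rather disorganized.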

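Old:

I have previously gone through the overall code architecture of the detection API, the RPN part, and several base classes. This time I take a close look at the Fast R-CNN part and tie together what I read before.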

"""object_detection/meta_architectures/faster_rcnn_meta_arch.py"""
class FasterRCNNFeatureExtractor(object) declares abstract functions such as _extract_proposal_features and _extract_box_classifier_features.

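First, though, look at restore_from_classification_checkpoint_fn. It returns the variables_to_restore dictionary, and its purpose becomes clear once you read the restore_map function further down: it builds a mapping between the variable names stored in a checkpoint and the variables in the model, and it distinguishes a classification checkpoint (used for initialization) from a detection checkpoint.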
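
The name mapping described above can be sketched in plain Python. This is a minimal illustration, not the API's code: plain strings stand in for TensorFlow variables, and build_restore_map and all variable names are hypothetical. It assumes that a classification checkpoint stores names without the detection model's scope prefix.

```python
def build_restore_map(model_variable_names, scope_names):
    """Map checkpoint variable names to model variable names.

    Only variables under one of the given scopes are kept, and the scope
    prefix is stripped so the key matches the name a classification
    checkpoint would use (an assumption of this sketch).
    """
    variables_to_restore = {}
    for name in model_variable_names:
        for scope in scope_names:
            prefix = scope + "/"
            if name.startswith(prefix):
                variables_to_restore[name[len(prefix):]] = name
    return variables_to_restore


# Hypothetical variable names, for illustration only.
model_vars = [
    "FirstStageFeatureExtractor/resnet_v1_101/conv1/weights",
    "SecondStageFeatureExtractor/resnet_v1_101/block4/unit_1/weights",
    "FirstStageBoxPredictor/ClassPredictor/weights",
]
restore_map = build_restore_map(
    model_vars, ["FirstStageFeatureExtractor", "SecondStageFeatureExtractor"])
for checkpoint_name, model_name in sorted(restore_map.items()):
    print(checkpoint_name, "<-", model_name)
```

The box predictor variable is left out of the map because it has no counterpart in a classification checkpoint.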

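class FasterRCNNMetaArch(model.DetectionModel)
Note the heavy use of @property, which lets a method be accessed as if it were an attribute. Decorating a function with @property in a class definition lets callers write short code while the class can still run the necessary checks on the value. It can also define read-only attributes: a property with no setter method is read-only.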
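
A small sketch of both uses of @property mentioned above: a read-only attribute and a validated setter. ProposalConfig is a hypothetical class for illustration and is not part of the Object Detection API.

```python
class ProposalConfig(object):
    """Hypothetical class, not part of the Object Detection API."""

    def __init__(self, max_proposals):
        self._num_stages = 2
        self.max_proposals = max_proposals  # goes through the setter below

    @property
    def num_stages(self):
        # No setter is defined, so this attribute is read-only.
        return self._num_stages

    @property
    def max_proposals(self):
        return self._max_proposals

    @max_proposals.setter
    def max_proposals(self, value):
        # The setter is the place to validate the new value.
        if not isinstance(value, int) or value <= 0:
            raise ValueError("max_proposals must be a positive integer")
        self._max_proposals = value


config = ProposalConfig(300)
print(config.max_proposals)  # accessed like an attribute, no parentheses
try:
    config.num_stages = 3  # assigning to a read-only property fails
except AttributeError:
    print("num_stages is read-only")
```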

"""Why max_num_proposals is set this way is not clear to me yet; to be investigated later."""
def max_num_proposals(self):
    Max number of proposals (to pad to) for each image in the input batch.
    At training time, this is set to be the `second_stage_batch_size` if hard
    example miner is not configured, else it is set to
    `first_stage_max_proposals`. At inference time, this is always set to
    `first_stage_max_proposals`.
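
The rule in the docstring above can be restated as a small function. This sketch only encodes what the docstring says and does not explain the reason behind it. The function signature and the example values (64 and 300) are assumptions for illustration.

```python
def max_num_proposals(is_training, has_hard_example_miner,
                      second_stage_batch_size, first_stage_max_proposals):
    """Number of proposals each image is padded to, per the docstring."""
    if is_training and not has_hard_example_miner:
        return second_stage_batch_size
    # Training with a hard example miner, or inference.
    return first_stage_max_proposals


# Hypothetical configuration values.
for is_training, has_miner in [(True, False), (True, True), (False, False)]:
    print(is_training, has_miner,
          max_num_proposals(is_training, has_miner, 64, 300))
```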

"""This function calls image_resizer_fn and leaves normalization and the rest of the preprocessing to the feature_extractor."""
def preprocess(self, inputs)

"""This function is fairly complex. First, some special handling to be aware of:
    + Anchor pruning vs. clipping: following the recommendation of the Faster
    R-CNN paper, we prune anchors that venture outside the image window at
    training time and clip anchors to the image window at inference time.
    + Proposal padding: as described at the top of the file, proposals are
    padded to self._max_num_proposals and flattened so that proposals from all
    images within the input batch are arranged along the same batch dimension.
"""
def predict(self, preprocessed_inputs)
    The returned prediction_dict contains 11 items.
        1) rpn_box_predictor_features: A 4-D float32 tensor with shape
          [batch_size, height, width, depth] to be used for predicting proposal
          boxes and corresponding objectness scores.
          # The intermediate layer produced by the sliding window, fed directly to a box predictor. The code also calls it the RPN feature map.
        2) rpn_features_to_crop: A 4-D float32 tensor with shape
          [batch_size, height, width, depth] representing image features to crop
          using the proposal boxes predicted by the RPN.  
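          # This is simply the feature map produced by the feature extractor, i.e. the activations of block3 at the truncation point. Passing it through one more convolution gives rpn_box_predictor_features.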
        3) image_shape: a 1-D tensor of shape [4] representing the input
          image shape.
        6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
          for the first stage RPN (in absolute coordinates).  Note that
          `num_anchors` can differ depending on whether the model is created in
          training or inference mode.
        # The four items above (1, 2, 3 and 6) come from self._extract_rpn_feature_maps(preprocessed_inputs), which calls FasterRCNNResnetV1FeatureExtractor._extract_proposal_features.

        4) rpn_box_encodings:  3-D float tensor of shape
          [batch_size, num_anchors, self._box_coder.code_size] containing
          predicted boxes.
        5) rpn_objectness_predictions_with_background: 3-D float tensor of shape
          [batch_size, num_anchors, 2] containing class
          predictions (logits) for each of the anchors.  Note that this
          tensor *includes* background class predictions (at class index 0).
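        # The two items above come from self._predict_rpn_proposals(rpn_box_predictor_features). The actual implementation lives in box_predictor, e.g. class ConvolutionalBoxPredictor(BoxPredictor). Nothing special here: in essence it uses convolutions to fit box locations and classes.
        # Together with what I read earlier, the RPN stage is now fairly familiar (although I have not yet looked at how external parameters are brought into the model; that should be in the train script).
        # Note: num_anchors_per_location=self._first_stage_anchor_generator.num_anchors_per_location() is a list, [len(self._scales) * len(self._aspect_ratios)], typically 3*3=9, meaning 9 anchors are taken at each location of rpn_features_to_crop. A list with a single element means anchors are taken on one feature map only (for anchors on several feature maps, see the SSD paper, or possibly the YOLOv2 paper; I will read the SSD code later as well).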

        (and if first_stage_only=False):
        7) refined_box_encodings: a 3-D tensor with shape
          [total_num_proposals, num_classes, 4] representing predicted
          (final) refined box encodings, where
          total_num_proposals=batch_size*self._max_num_proposals
        8) class_predictions_with_background: a 3-D tensor with shape
          [total_num_proposals, num_classes + 1] containing class
          predictions (logits) for each of the anchors, where
          total_num_proposals=batch_size*self._max_num_proposals.
          Note that this tensor *includes* background class predictions
          (at class index 0).
        9) num_proposals: An int32 tensor of shape [batch_size] representing the
          number of proposals generated by the RPN.  `num_proposals` allows us
          to keep track of which entries are to be treated as zero paddings and
          which are not since we always pad the number of proposals to be
          `self.max_num_proposals` for each image.
        10) proposal_boxes: A float32 tensor of shape
          [batch_size, self.max_num_proposals, 4] representing
          decoded proposal bounding boxes in absolute coordinates.
        11) mask_predictions: (optional) a 4-D tensor with shape
          [total_num_padded_proposals, num_classes, mask_height, mask_width]
          containing instance mask predictions.
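        # With the paper and the code side by side, the first stage is now fairly clear. All second-stage results come from self._predict_second_stage. Note that its inputs are rpn_box_encodings, rpn_objectness_predictions_with_background, rpn_features_to_crop, anchors and image_shape. Its logic is simple: post-process the RPN results, then extract features. A few key points follow.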
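
The two special cases listed at the top of predict (anchor pruning vs. clipping, and proposal padding) can be sketched with plain lists. This is a simplified illustration, not the API's implementation. Boxes are assumed to be [ymin, xmin, ymax, xmax] in absolute coordinates, and every function name and value below is hypothetical.

```python
def prune_outside_window(boxes, window):
    """Training-time behaviour: drop anchors not fully inside the window."""
    win_ymin, win_xmin, win_ymax, win_xmax = window
    return [b for b in boxes
            if b[0] >= win_ymin and b[1] >= win_xmin
            and b[2] <= win_ymax and b[3] <= win_xmax]


def clip_to_window(boxes, window):
    """Inference-time behaviour: clamp every coordinate to the window."""
    win_ymin, win_xmin, win_ymax, win_xmax = window
    return [[min(max(b[0], win_ymin), win_ymax),
             min(max(b[1], win_xmin), win_xmax),
             min(max(b[2], win_ymin), win_ymax),
             min(max(b[3], win_xmin), win_xmax)] for b in boxes]


def pad_and_flatten(proposals_per_image, max_num_proposals):
    """Zero-pad each image's proposals to max_num_proposals and flatten them
    along one dimension of size batch_size * max_num_proposals."""
    num_proposals, flattened = [], []
    for boxes in proposals_per_image:
        kept = boxes[:max_num_proposals]
        num_proposals.append(len(kept))  # remembers which rows are real
        padding = [[0.0, 0.0, 0.0, 0.0]
                   for _ in range(max_num_proposals - len(kept))]
        flattened.extend(kept + padding)
    return flattened, num_proposals


window = [0, 0, 100, 100]
anchors = [[10, 10, 50, 50], [-20, 30, 40, 120]]
pruned = prune_outside_window(anchors, window)
clipped = clip_to_window(anchors, window)

proposals_per_image = [[[5.0, 5.0, 20.0, 20.0], [30.0, 30.0, 60.0, 60.0]],
                       [[0.0, 0.0, 10.0, 10.0]]]
flattened, num_proposals = pad_and_flatten(proposals_per_image, 3)
print(len(flattened), num_proposals)
```

num_proposals plays the role described in item 9: it records how many entries per image are real and how many are zero padding.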

"""self._postprocess_rpn: decodes the raw RPN predictions, runs non-max suppression."""
    proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
        rpn_box_encodings, rpn_objectness_predictions_with_background,
        anchors, image_shape)
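
The non-max suppression step of self._postprocess_rpn can be illustrated with a greedy pure-Python version. It is a sketch of the general technique only: box decoding is omitted, the function names and values are hypothetical, and the API's own implementation operates on tensors.

```python
def iou(a, b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold, max_output_size):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0],
         [20.0, 20.0, 30.0, 30.0]]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores, 0.5, 10))  # [0, 2]
```

The second box overlaps the first with IoU 81/119, which is above the 0.5 threshold, so it is suppressed.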
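
"""ROI pooling is implemented in _compute_second_stage_input_feature_maps. Note that self._maxpool_kernel_size is set in the config and then passed to the model through model_builder. Unlike the Faster R-CNN paper, the feature map is first resized to a fixed size and then pooled with a fixed-size kernel, rather than using an adaptively sized kernel. This is described in the authors' paper.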
"""
    flattened_proposal_feature_maps = (
        self._compute_second_stage_input_feature_maps(
            rpn_features_to_crop, proposal_boxes_normalized))
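
The "resize to a fixed size, then pool with a fixed kernel" idea can be sketched on a single-channel feature map stored as a list of lists. This is an illustration under assumptions: the bilinear sampling grid, the function names, and the sizes are chosen for the example and are not taken from the API's code.

```python
def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2-D list at fractional position (y, x)."""
    height, width = len(feature_map), len(feature_map[0])
    y0, x0 = int(y), int(x)  # floor, valid because coordinates are >= 0
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_and_resize(feature_map, box, crop_size):
    """Resample the region `box` (normalized [ymin, xmin, ymax, xmax] inside
    [0, 1]) onto a crop_size x crop_size grid. Assumes crop_size > 1."""
    height, width = len(feature_map), len(feature_map[0])
    ymin, xmin, ymax, xmax = box
    crop = []
    for i in range(crop_size):
        y = ymin * (height - 1) + i * (ymax - ymin) * (height - 1) / (crop_size - 1)
        row = []
        for j in range(crop_size):
            x = xmin * (width - 1) + j * (xmax - xmin) * (width - 1) / (crop_size - 1)
            row.append(bilinear_sample(feature_map, y, x))
        crop.append(row)
    return crop


def max_pool(feature_map, kernel_size, stride):
    """Max pooling with a fixed kernel, independent of the box size."""
    height, width = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + dr][c + dc]
                 for dr in range(kernel_size) for dc in range(kernel_size))
             for c in range(0, width - kernel_size + 1, stride)]
            for r in range(0, height - kernel_size + 1, stride)]


feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
crop = crop_and_resize(feature_map, [0.0, 0.0, 1.0, 1.0], 4)
pooled = max_pool(crop, 2, 2)
print(pooled)
```

Because every proposal is resampled to the same grid first, the pooling kernel never has to adapt to the size of the proposal.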

"""Uses ResNet's block4.
"""
    box_classifier_features = (
        self._feature_extractor.extract_box_classifier_features(
            flattened_proposal_feature_maps,
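            scope=self.second_stage_feature_extractor_scope))

"""class MaskRCNNBoxPredictor(BoxPredictor) raises ValueError if num_predictions_per_location is not 1, which probably explains why the anchor dimension was merged into the batch dimension earlier. The mask prediction head is based on the Mask RCNN paper with the following modifications: We replace the deconvolution layer with a bilinear resize and a convolution. Classification and localization each use one fc layer. Compared with the Fast R-CNN paper, the two fc layers after ROI pooling are missing, possibly because the original paper used VGG as the feature extractor and therefore had two extra fc layers.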
"""
    box_predictions = self._mask_rcnn_box_predictor.predict(
        box_classifier_features,
        num_predictions_per_location=1,
        scope=self.second_stage_box_predictor_scope)

"""Finally, the code mentions Mask R-CNN; I will study that after reading the paper."""
    if self._predict_keypoints:
      raise ValueError('Keypoint prediction is unimplemented.')

    if self._predict_instance_masks:
      with slim.arg_scope(self._conv_hyperparams):
        upsampled_features = tf.image.resize_bilinear(
            image_features,
            [self._mask_height, self._mask_width],
            align_corners=True)
        upsampled_features = slim.conv2d(
            upsampled_features,
            num_outputs=self._mask_prediction_conv_depth,
            kernel_size=[2, 2])
        mask_predictions = slim.conv2d(upsampled_features,
                                       num_outputs=self.num_classes,
                                       activation_fn=None,
                                       kernel_size=[3, 3])
        instance_masks = tf.expand_dims(tf.transpose(mask_predictions,
                                                     perm=[0, 3, 1, 2]),
                                        axis=1,
                                        name='MaskPredictor')
      predictions_dict[MASK_PREDICTIONS] = instance_masks
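
To follow how the mask tensor's shape changes in the last two operations of the snippet above, here is a shape-only sketch in plain Python. The two helper functions and the example sizes (64 boxes, 14x14 masks, 90 classes) are hypothetical. The sketch assumes the convolutions keep the spatial size, so the input shape is [num_boxes, mask_height, mask_width, num_classes].

```python
def transpose_shape(shape, perm):
    """Shape after a transpose: output axis i takes input axis perm[i]."""
    return [shape[axis] for axis in perm]


def expand_dims_shape(shape, axis):
    """Shape after inserting a new axis of size 1 at position `axis`."""
    return shape[:axis] + [1] + shape[axis:]


# Hypothetical sizes: [num_boxes, mask_height, mask_width, num_classes].
mask_predictions_shape = [64, 14, 14, 90]
transposed_shape = transpose_shape(mask_predictions_shape, [0, 3, 1, 2])
instance_masks_shape = expand_dims_shape(transposed_shape, 1)
print(transposed_shape)      # [64, 90, 14, 14]
print(instance_masks_shape)  # [64, 1, 90, 14, 14]
```

The inserted axis of size 1 appears to correspond to num_predictions_per_location, which must be 1 for this predictor.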

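Update: in the Zhihu article recommended above, ProposalTargetCreator selects a subset of the RoIs (for example 128) for training. This should correspond to def _loss_box_classifier, but the implementation is not exactly the same, and it ends up back in def _postprocess_rpn, unified in Tensorflow object dete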