A First Look at the Tensorflow Object Detection API Source Code (2.1.1): FasterRCNNMetaArch.predict

Article link: https://blog.csdn.net/godlessspirit/article/details/79354684

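This article takes a close look at the implementation details of the FasterRCNNMetaArch.predict function in the Tensorflow Object Detection API, covering the key steps: extracting the RPN feature maps, applying the extra convolutional layer, computing anchors, predicting proposals, and removing invalid anchors and predictions. By following how the clip window is applied and how the later prediction stages are run, it shows how the API performs object detection.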


_extract_rpn_feature_maps
_predict_rpn_proposals
- self._first_stage_box_predictor.predict
  - self._first_stage_box_predictor
clip_window
Check whether the model is training
- _remove_invalid_anchors_and_predictions
self._predict_second_stage
self._predict_third_stage
Return prediction_dict

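This function is fairly complex.
Internally it calls N private functions, and those private functions in turn call M other functions.
In short, the call chain is long.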

_extract_rpn_feature_maps

(rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist,
     image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)

Take the output of block3 as rpn_features_to_crop

rpn_features_to_crop = self._feature_extractor.extract_proposal_features(preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)

Add a new convolutional layer

with slim.arg_scope(self._first_stage_box_predictor_arg_scope):
  kernel_size = self._first_stage_box_predictor_kernel_size
  rpn_box_predictor_features = slim.conv2d(
      rpn_features_to_crop,
      self._first_stage_box_predictor_depth,
      kernel_size=[kernel_size, kernel_size],
      rate=self._first_stage_atrous_rate,
      activation_fn=tf.nn.relu6)

Yet another convolutional layer shows up here, oh well...
rpn_box_predictor_features is the output of this new convolutional layer:
block1 -> block2 -> block3 -> new conv2d -> rpn_box_predictor_features
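
To make the shapes concrete, here is a minimal NumPy sketch (not the library's code) of a stride-1, SAME-padded convolution followed by ReLU6. The sizes (a 4x5 feature map, 8 input channels, 16 output channels, a 3x3 kernel), the random weights, and the helper names relu6 and conv2d_same are all assumptions made for illustration, and the atrous rate is fixed at 1.

```python
import numpy as np


def relu6(x):
    """ReLU6: clamp activations to the range [0, 6]."""
    return np.minimum(np.maximum(x, 0.0), 6.0)


def conv2d_same(features, kernel):
    """Naive stride-1 convolution with SAME (zero) padding.

    features: [height, width, in_channels]
    kernel:   [k, k, in_channels, out_channels], with odd k
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)))
    height, width, _ = features.shape
    out = np.zeros((height, width, kernel.shape[3]))
    for y in range(height):
        for x in range(width):
            patch = padded[y:y + k, x:x + k, :]  # [k, k, in_channels]
            # Contract the patch with the kernel over (k, k, in_channels).
            out[y, x] = np.tensordot(patch, kernel, axes=3)
    return out


rng = np.random.default_rng(0)
rpn_features_to_crop = rng.normal(size=(4, 5, 8))  # stand-in for the block3 output
kernel = rng.normal(size=(3, 3, 8, 16))            # hypothetical weights
rpn_box_predictor_features = relu6(conv2d_same(rpn_features_to_crop, kernel))
print(rpn_box_predictor_features.shape)  # (4, 5, 16)
```

The spatial size is unchanged and only the channel depth differs, which is why the anchors can be laid out on the same grid as rpn_features_to_crop.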

anchors

_predict_rpn_proposals

(rpn_box_encodings, rpn_objectness_predictions_with_background
    ) = self._predict_rpn_proposals(rpn_box_predictor_features)

The input is the output of the new convolutional layer

graph

self._first_stage_box_predictor.predict

box_predictions = self._first_stage_box_predictor.predict(
        [rpn_box_predictor_features],
        num_anchors_per_location,
        scope=self.first_stage_box_predictor_scope)

self._first_stage_box_predictor

In the __init__ function:

self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor(
        self._is_training, num_classes=1,
        conv_hyperparams=self._first_stage_box_predictor_arg_scope,
        min_depth=0, max_depth=0, num_layers_before_predictor=0,
        use_dropout=False, dropout_keep_prob=1.0, kernel_size=1,
        box_code_size=self._box_coder.code_size)

box
BoxPredictor
Returns (box_encodings, objectness_predictions_with_background)
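
With kernel_size=1 and num_layers_before_predictor=0, the predictor boils down to a linear map applied at every feature-map location. The NumPy sketch below illustrates that idea and the resulting per-anchor layout. It is not the library's implementation: the sizes, the 3 anchors per location, the code size of 4, the random weights, and the exact output shapes are assumptions made for illustration.

```python
import numpy as np

batch, height, width, depth = 2, 4, 5, 16  # illustrative sizes
num_anchors_per_location = 3               # hypothetical value
code_size = 4                              # assumed box code size
num_classes = 1                            # objectness only, as in __init__

rng = np.random.default_rng(0)
features = rng.normal(size=(batch, height, width, depth))
# A 1x1 convolution is a matrix multiply applied at every location.
w_box = rng.normal(size=(depth, num_anchors_per_location * code_size))
w_cls = rng.normal(size=(depth, num_anchors_per_location * (num_classes + 1)))

# One row per anchor: height * width * num_anchors_per_location rows per image.
box_encodings = (features @ w_box).reshape(batch, -1, code_size)
objectness_predictions_with_background = (features @ w_cls).reshape(
    batch, -1, num_classes + 1)

print(box_encodings.shape)                           # (2, 60, 4)
print(objectness_predictions_with_background.shape)  # (2, 60, 2)
```

Because num_classes=1, each anchor gets two scores here, one for background and one for "object".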

clip_window

clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]]))

Check whether the model is training

if self._is_training:
  (rpn_box_encodings, rpn_objectness_predictions_with_background,
   anchors_boxlist) = self._remove_invalid_anchors_and_predictions(
       rpn_box_encodings, rpn_objectness_predictions_with_background,
       anchors_boxlist, clip_window)
else:
  anchors_boxlist = box_list_ops.clip_to_window(
      anchors_boxlist, clip_window)
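
For the inference branch, the following NumPy sketch shows what clipping anchors to the image window means. It assumes image_shape is laid out as [batch, height, width, channels] and that boxes are [ymin, xmin, ymax, xmax] in absolute pixels. The helper clip_to_window_sketch is hypothetical, and the real box_list_ops.clip_to_window may do more than this (for example, dropping boxes that have no area left after clipping).

```python
import numpy as np


def clip_to_window_sketch(boxes, window):
    """Clamp [ymin, xmin, ymax, xmax] boxes so they lie inside window."""
    win_ymin, win_xmin, win_ymax, win_xmax = window
    clipped = boxes.copy()
    clipped[:, [0, 2]] = np.clip(boxes[:, [0, 2]], win_ymin, win_ymax)
    clipped[:, [1, 3]] = np.clip(boxes[:, [1, 3]], win_xmin, win_xmax)
    return clipped


image_shape = (1, 600, 800, 3)  # assumed [batch, height, width, channels]
clip_window = np.array([0, 0, image_shape[1], image_shape[2]], dtype=np.float32)

anchors = np.array([
    [-20.0, -10.0, 100.0, 120.0],  # sticks out at the top-left corner
    [50.0, 60.0, 200.0, 300.0],    # fully inside the image
    [500.0, 700.0, 650.0, 850.0],  # sticks out at the bottom-right corner
], dtype=np.float32)

clipped_anchors = clip_to_window_sketch(anchors, clip_window)
print(clipped_anchors)
```

Clipping keeps the number of anchors unchanged, so the predictions still line up with the anchors one to one.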

_remove_invalid_anchors_and_predictions

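Computes the indices of the anchors in anchors_boxlist that conflict with clip_window (that is, extend beyond its bounds), then keeps only the non-conflicting entries of the anchors and of the predictions that correspond to them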
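
A minimal NumPy sketch of this pruning step, under stated assumptions: boxes are [ymin, xmin, ymax, xmax], anchors are shared across the batch, and the predictions carry one row per anchor. The helper prune_outside_window_sketch and all the data are made up for illustration and are not the library's code.

```python
import numpy as np


def prune_outside_window_sketch(boxes, window):
    """Return the boxes lying fully inside window, plus their indices."""
    win_ymin, win_xmin, win_ymax, win_xmax = window
    violations = np.stack([
        boxes[:, 0] < win_ymin, boxes[:, 1] < win_xmin,
        boxes[:, 2] > win_ymax, boxes[:, 3] > win_xmax,
    ], axis=1)
    keep_indices = np.where(~violations.any(axis=1))[0]
    return boxes[keep_indices], keep_indices


clip_window = np.array([0.0, 0.0, 600.0, 800.0])
anchors = np.array([
    [-20.0, -10.0, 100.0, 120.0],  # crosses the top-left border: dropped
    [50.0, 60.0, 200.0, 300.0],    # inside: kept
    [500.0, 700.0, 650.0, 850.0],  # crosses the bottom-right border: dropped
    [0.0, 0.0, 600.0, 800.0],      # exactly on the border: kept
])
batch = 2
rpn_box_encodings = np.arange(batch * 4 * 4, dtype=float).reshape(batch, 4, 4)
rpn_objectness = np.arange(batch * 4 * 2, dtype=float).reshape(batch, 4, 2)

anchors, keep_indices = prune_outside_window_sketch(anchors, clip_window)
# Apply the same selection to every per-anchor prediction, for each image.
rpn_box_encodings = rpn_box_encodings[:, keep_indices]
rpn_objectness = rpn_objectness[:, keep_indices]

print(keep_indices)  # [1 3]
print(rpn_box_encodings.shape, rpn_objectness.shape)  # (2, 2, 4) (2, 2, 2)
```

Unlike clipping, pruning changes the number of anchors, which is why the predictions must be filtered with the same indices.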

self._predict_second_stage

if self._number_of_stages >= 2:
  prediction_dict.update(self._predict_second_stage(
      rpn_box_encodings,
      rpn_objectness_predictions_with_background,
      rpn_features_to_crop,
      self._anchors.get(), image_shape, true_image_shapes))

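Check the number of stages (that is, whether only the first stage is being run).
If there are at least two stages, update prediction_dict with the output of _predict_second_stage.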

self._predict_third_stage

if self._number_of_stages == 3:
  prediction_dict = self._predict_third_stage(
      prediction_dict, true_image_shapes)
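
The stage gating in the two snippets above can be summarized with a toy sketch. Everything here is a placeholder: the function predict_sketch, the stubbed stage functions, and the dictionary keys are hypothetical and only mirror the control flow, not the real tensors or key names.

```python
def predict_sketch(number_of_stages):
    """Toy control flow; every key and stub below is a placeholder."""

    def predict_second_stage():
        return {'second_stage_output': 'stub'}

    def predict_third_stage(prediction_dict):
        # Receives the whole dict and returns the dict to use from now on.
        result = dict(prediction_dict)
        result['third_stage_output'] = 'stub'
        return result

    prediction_dict = {'first_stage_output': 'stub'}
    if number_of_stages >= 2:
        # Second stage: merge the new entries into the existing dict.
        prediction_dict.update(predict_second_stage())
    if number_of_stages == 3:
        # Third stage: the return value replaces the dict.
        prediction_dict = predict_third_stage(prediction_dict)
    return prediction_dict


for stages in (1, 2, 3):
    print(stages, sorted(predict_sketch(stages)))
```

Note the asymmetry: the second stage merges into prediction_dict with update, while the third stage is handed the whole dictionary and its return value is assigned back.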