TensorFlow Object Detection API source code reading notes: RPN

Article link: https://blog.csdn.net/Wayne2019/article/details/78966558

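These are source code reading notes on the RPN (region proposal network) in the TensorFlow Object Detection API. The post first recommends understanding Faster R-CNN from a programming-implementation perspective. It then points out where the RPN's 3*3 and 1*1 convolutions live in `faster_rcnn_meta_arch.py`, and explains how `target_assigner.batch_assign_targets` and `sampler.BalancedPositiveNegativeSampler` select anchors by IoU for classification and regression. The loss computation, the `_loss_rpn` function, is where the number of RPN training samples comes in. Finally, it discusses how `_postprocess_rpn` generates RoIs and uses NMS to keep the final proposals (2000/300), which are then used to train Fast R-CNN.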


Update:
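I recommend first reading the Zhihu article "从编程实现角度学习Faster R-CNN" ("Learning Faster R-CNN from a programming-implementation perspective"), which is more intuitive. The source code discussed here is highly abstracted, so these notes come across as rather disorganized.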

In faster_rcnn_meta_arch.py, the following two lines correspond to the 3*3 and 1*1 convolutions that make up the RPN in the Zhihu article:
rpn_box_predictor_features = slim.conv2d(rpn_features_to_crop
self._first_stage_box_predictor=box_predictor.ConvolutionalBoxPredictor
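To make the role of the two layers concrete, here is a shape-only sketch in plain Python (no TensorFlow). It assumes stride-1, 'SAME'-padded convolutions and the output layout of the Faster R-CNN paper (2 objectness scores and 4 box offsets per anchor at each location); the API's ConvolutionalBoxPredictor may arrange its outputs differently. The function name `rpn_head_shapes` and all sizes in the usage note are hypothetical.

```python
def rpn_head_shapes(batch, height, width, depth, mid_depth, anchors_per_location):
    """Return the tensor shapes produced by a Faster R-CNN style RPN head.

    Assumption of this sketch: both convolutions use stride 1 and 'SAME'
    padding, so height and width are unchanged.
    """
    # Input: the backbone feature map (rpn_features_to_crop in the notes).
    features_to_crop = (batch, height, width, depth)
    # 3*3 conv: slides over the feature map and changes only the depth.
    intermediate = (batch, height, width, mid_depth)
    # 1*1 conv heads: per location, 2 scores and 4 box offsets per anchor.
    objectness = (batch, height, width, 2 * anchors_per_location)
    box_offsets = (batch, height, width, 4 * anchors_per_location)
    return {
        "features_to_crop": features_to_crop,
        "intermediate": intermediate,
        "objectness": objectness,
        "box_offsets": box_offsets,
        "num_anchors": height * width * anchors_per_location,
    }
```

For example, a 38 x 63 feature map (roughly a 600 x 1000 image at stride 16) with 9 anchors per location gives `rpn_head_shapes(1, 38, 63, 1024, 512, 9)["num_anchors"] == 21546`, which is on the order of the "more than 20000" anchors mentioned in the Zhihu article.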
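The AnchorTargetCreator in the Zhihu article uses IoU to pick 256 anchors out of more than 20000 candidate anchors for classification and box regression (that is, for computing the RPN loss). This corresponds to:
target_assigner.batch_assign_targets;
self._first_stage_sampler=sampler.BalancedPositiveNegativeSampler, which operates on first_stage_minibatch_size;
Here the 20000 is determined by the size of the feature map fed into the RPN and the number of anchor types, and 256 corresponds to first_stage_minibatch_size (see protos/faster_rcnn.proto);
In short, all of this happens in def _loss_rpn.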
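The labeling and sampling described above can be sketched in plain Python. This is a simplified illustration, not the API's implementation: the 0.7/0.3 IoU thresholds are the ones from the Faster R-CNN paper, the 0.5 positive fraction is an assumption, the paper's extra rule (the best anchor for each ground-truth box is always positive) is omitted, and the names `iou`, `label_anchors` and `sample_minibatch` are hypothetical.

```python
import random


def iou(a, b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor 1 (object), 0 (background) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # in between: contributes nothing to the loss
    return labels


def sample_minibatch(labels, batch_size=256, positive_fraction=0.5, seed=0):
    """Sample anchor indices: cap positives, fill the rest with negatives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(pos), int(batch_size * positive_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    return rng.sample(pos, num_pos), rng.sample(neg, num_neg)
```

When positives are scarce, the batch is topped up with negatives, so the minibatch still reaches 256 anchors as long as enough negatives exist.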
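(proposal=2000) The ProposalCreator in the Zhihu article: in the RPN, a certain number of anchors (e.g. 12000/6000) are selected by probability from more than ten thousand anchors, their sizes and positions are adjusted, and after NMS the 2000/300 with the highest probability are kept to generate the RoIs. This corresponds to: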
def _postprocess_rpn
first_stage_max_proposals=300
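The selection step can be sketched in plain Python as below. This is a minimal greedy NMS under stated assumptions, not the code in `_postprocess_rpn`: the function name `select_proposals`, the (ymin, xmin, ymax, xmax) box layout and the 0.7 IoU threshold (the value used in the Faster R-CNN paper) are assumptions of the sketch, and the box decoding and clipping steps are left out.

```python
def iou(a, b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, pre_nms_top_n, post_nms_top_n, iou_threshold=0.7):
    """Return indices of the boxes kept as proposals, best score first."""
    # Step 1: keep only the pre_nms_top_n highest-scoring boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    # Step 2: greedy NMS, stopping once post_nms_top_n boxes are kept.
    keep = []
    for i in order:
        if len(keep) >= post_nms_top_n:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In the Zhihu article's numbers, `pre_nms_top_n` would be 12000/6000 and `post_nms_top_n` 2000/300; in this API the latter plays the role of first_stage_max_proposals.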
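The ProposalTargetCreator in the Zhihu article selects a subset (e.g. 128) of the 2000/300 candidates, which are pooled and used to train Fast R-CNN. This corresponds to:
hard_example_miner is not used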
_unpad_proposals_and_sample_box_classifier_batch
second_stage_batch_size=64

Old:

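'''RPN overview. Note that much of the terminology in this analysis is taken directly from the original paper and differs from the names used in the code.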
'''
FasterRCNNFeatureExtractor.extract_proposal_features actually calls
FasterRCNNResnetV1FeatureExtractor._extract_proposal_features
to generate the first stage RPN features that serve as the input to the RPN.
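The delegation described above, a public method on the base class forwarding to a private method that a subclass overrides, can be illustrated with a toy sketch. The two classes below are hypothetical stand-ins, not the API's classes, and the "feature map" is just a tagged tuple.

```python
class FeatureExtractorSketch:
    """Toy stand-in for a base feature extractor (hypothetical class)."""

    def extract_proposal_features(self, preprocessed_inputs):
        # Public entry point: shared logic would live here, then delegate.
        return self._extract_proposal_features(preprocessed_inputs)

    def _extract_proposal_features(self, preprocessed_inputs):
        raise NotImplementedError("subclasses provide the backbone")


class ResnetExtractorSketch(FeatureExtractorSketch):
    """Toy stand-in for a backbone-specific subclass (hypothetical class)."""

    def _extract_proposal_features(self, preprocessed_inputs):
        # A real backbone would run convolutions; here we only tag the input.
        return ("rpn_feature_map", preprocessed_inputs)
```

Callers only ever use `extract_proposal_features`; which backbone runs is decided by the subclass that was instantiated.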

_extract_rpn_feature_map in class FasterRCNNMetaArch(model.DetectionModel):
Calls the _extract_proposal_features of the feature extractor above and returns
      rpn_box_predictor_features: A 4-D float32 tensor with shape
        [batch, height, width, depth] to be used for predicting proposal boxes
        and corresponding objectness scores.'''the intermediate layer obtained by the sliding window'''
      rpn_features_to_crop: A 4-D float32 tensor with shape
        [batch, height, width, depth] representing image features to crop using
        the proposals boxes. '''This is really just the feature map produced by the feature extractor above.'''