Source Code Walkthrough of the TensorFlow Open-Source Object Detection API (Part 2): faster_rcnn_meta_arch.py

haixwang

Category: Deep/Machine Learning. Tags: object-det, object detection, FasterRCNN, tensorflow, source code analysis

Article link: https://blog.csdn.net/HaixWang/article/details/78904242

```python
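"""Faster R-CNN meta-architecture definition.

General tensorflow implementation of Faster R-CNN detection models.

Two modes are allowed: first_stage_only=True and first_stage_only=False.
In the former setting, all of the user-facing methods (e.g., predict,
postprocess, loss) can be used as if the model consisted only of the RPN,
returning class-agnostic proposals (these can be thought of as approximate
detections with no associated class information).
[Note: in other words, these methods also work with the RPN alone; the results
simply carry no class information. This corresponds to the first stage in the
paper, separated from the later Fast R-CNN stage.]
In the latter setting, proposals are computed and then passed through a
second-stage "box classifier" to yield (multi-class) detections.

Implementations of Faster R-CNN models must define a new
FasterRCNNFeatureExtractor and override three methods:
  preprocess (preprocessing);
  _extract_proposal_features (the first stage of the model: extracts the
    features used to generate proposals);
  _extract_box_classifier_features (the second stage of the model: extracts
    the features used to classify boxes).
Optionally, the restore_fn method can be overridden. See the tests for an
example.

A few important notes:
+ Batching conventions:
  1. Batched inference and training are supported; all images within a batch
     must have the same resolution.
  2. Batch sizes are determined dynamically via the shape of the input tensors
     (rather than being specified directly in the model constructor).
  3. A complication is that, due to non-max suppression, the first-stage RPN
     (region proposal network) is not guaranteed to return the same number of
     proposals for each image. For this reason, proposals are padded to a
     maximum number per image within a batch.
  4. The self.max_num_proposals property is set to the
     first_stage_max_proposals parameter at inference time and to
     second_stage_batch_size at training time, because during training the
     batch sent through the box classifier is subsampled.
  The proposals for all images within the batch are arranged along a single
  batch dimension. For example, the input to _extract_box_classifier_features
  is a tensor of shape [total_num_proposals, crop_height, crop_width, depth],
  where total_num_proposals is batch_size * self.max_num_proposals. (Note
  that, per the comment above, a subset of these entries corresponds to zero
  padding.)

+ Coordinate representations:
  Following the API (see the model.DetectionModel definition), outputs after
  postprocessing are always normalized boxes, whereas anchors and
  proposal_boxes are both represented in absolute coordinates.

TODO: Support TPU implementations and sigmoid loss.
"""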
from abc import abstractmethod
from functools import partial
import tensorflow as tf

from object_detection.anchor_generators import grid_anchor_generator
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import box_predictor
from object_detection.core import losses
from object_detection.core import model
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
from object_detection.utils import shape_utils

slim = tf.contrib.slim
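
# NOTE (illustrative sketch added to this walkthrough, not part of the
# original file): the batching convention from the module docstring can be
# mimicked with plain NumPy. After NMS each image may keep a different number
# of proposals, so they are zero-padded to a common max_num_proposals and then
# laid out along a single batch dimension. The helper name
# `_example_pad_and_flatten_proposals` and the toy inputs are assumptions made
# for illustration only; the real model does this with TensorFlow ops.
import numpy as np


def _example_pad_and_flatten_proposals(per_image_boxes, max_num_proposals):
  """Zero-pads each [n_i, 4] box array to max_num_proposals, then flattens."""
  padded = []
  num_valid = []
  for boxes in per_image_boxes:
    # Keep at most max_num_proposals boxes per image.
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    boxes = boxes[:max_num_proposals]
    pad_rows = max_num_proposals - boxes.shape[0]
    padded.append(np.pad(boxes, ((0, pad_rows), (0, 0)), mode='constant'))
    num_valid.append(boxes.shape[0])
  batch = np.stack(padded)     # [batch_size, max_num_proposals, 4]
  flat = batch.reshape(-1, 4)  # [batch_size * max_num_proposals, 4]
  return flat, np.array(num_valid, dtype=np.int32)

# Usage idea: with two images holding 1 and 3 proposals and
# max_num_proposals=2, the flattened result has 2 * 2 = 4 rows, one of which
# is zero padding, and num_valid records how many rows per image are real.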

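class FasterRCNNFeatureExtractor(object):
  """Faster R-CNN feature extractor definition."""

  def __init__(self,
               is_training,
               first_stage_features_stride,
               reuse_weights=None,
               weight_decay=0.0):
    """Constructor.

    Args:
      is_training: A boolean indicating whether the training version of the
        computation graph should be constructed.
      first_stage_features_stride: Output stride of the extracted RPN feature
        map.
      reuse_weights: Whether to reuse variables. Default is None.
      weight_decay: Weight decay for the feature extractor. Default is 0.0.
        [Note: controls how strongly model complexity contributes to the
        loss; used for regularization.]
    """
    self._is_training = is_training
    self._first_stage_features_stride = first_stage_features_stride
    self._reuse_weights = reuse_weights
    self._weight_decay = weight_decay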

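  @abstractmethod
  def preprocess(self, resized_inputs):
    """Feature-extractor-specific preprocessing (excluding image resizing).

    Abstract method; implemented by subclasses such as
    FasterRCNNInceptionResnetV2FeatureExtractor. It is abstract because the
    Faster R-CNN architecture can be paired with several classic convolutional
    networks. `pass` does nothing and only serves as a placeholder statement.
    """
    pass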

  def extract_proposal_features(self, preprocessed_inputs, scope):
    """Extracts the features used by the first-stage RPN.

    This function is responsible for extracting feature maps from preprocessed
    images. The region proposal network (RPN) uses these features to predict
    proposals.

    Args:
      preprocessed_inputs: A [batch, height, width, channels] float tensor
        representing a batch of images.
      scope: A scope name.

    Returns:
      rpn_feature_map: A tensor with shape [batch, height, width, depth]
    """
    with tf.variable_scope(scope, values=[preprocessed_inputs]):
      # The concrete implementation lives in a subclass such as
      # FasterRCNNInceptionResnetV2FeatureExtractor.
      return self._extract_proposal_features(preprocessed_inputs, scope)

  @abstractmethod
  def _extract_proposal_features(self, preprocessed_inputs, scope):
    pass

  def extract_box_classifier_features(self, proposal_feature_maps, scope):
    """Extracts the features used for second-stage box classification."""
    with tf.variable_scope(scope, values=[proposal_feature_maps]):
      return self._extract_box_classifier_features(proposal_feature_maps, scope)

  @abstractmethod
  def _extract_box_classifier_features(self, proposal_feature_maps, scope):
    pass

  def restore_from_classification_checkpoint_fn(
      self,
      first_stage_feature_extractor_scope,
      second_stage_feature_extractor_scope):
    """Returns a dict of variables to load from an external checkpoint.

    Args:
      first_stage_feature_extractor_scope: Scope name of the first-stage
        feature extractor.
      second_stage_feature_extractor_scope: Scope name of the second-stage
        feature extractor.

    Returns:
      A dict mapping variable names (to load from a checkpoint) to variables in
      the model graph.
    """
    variables_to_restore = {}
    for variable in tf.global_variables():
      for scope_name in [first_stage_feature_extractor_scope,
                         second_stage_feature_extractor_scope]:
        if variable.op.name.startswith(scope_name):
          # Strip the scope prefix so the key matches the checkpoint's name.
          var_name = variable.op.name.replace(scope_name + '/', '')
          variables_to_restore[var_name] = variable
    return variables_to_restore
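

# NOTE (illustrative sketch added to this walkthrough, not part of the
# original file): restore_from_classification_checkpoint_fn above strips the
# feature-extractor scope prefix so that graph variable names line up with the
# names stored in a classification checkpoint. The same renaming is shown here
# on plain strings, with no TensorFlow graph involved. The helper name
# `_example_strip_scope_prefix` and the scope/variable names in the usage
# comment are hypothetical.
def _example_strip_scope_prefix(variable_names, scope_names):
  """Maps checkpoint-style names to the full graph variable names."""
  names_to_restore = {}
  for name in variable_names:
    for scope_name in scope_names:
      if name.startswith(scope_name):
        names_to_restore[name.replace(scope_name + '/', '')] = name
  return names_to_restore

# Usage idea: given the graph name 'FirstStage/Net/conv1/weights' and the
# scope 'FirstStage', the checkpoint key becomes 'Net/conv1/weights'. Names
# that start with neither scope are left out of the mapping.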

class FasterRCNNMetaArch(model.DetectionModel):
  """Faster R-CNN Meta-architecture definition."""

  def __init__(self,
               is_training,
               num_classes,
               image_resizer_fn,
               feature_extractor,
               first_stage_only,
               first_stage_anchor_generator,
               first_stage_atrous_rate,
               first_stage_box_predictor_arg_scope,
               first_stage_box_predictor_kernel_size,
               first_stage_box_predictor_depth,
               first_stage_minibatch_size,
               first_stage_positive_balance_fraction,
               first_stage_nms_score_threshold,
               first_stage_nms_iou_threshold,
               first_stage_max_proposals,
               first_stage_localization_loss_weight,
               first_stage_objectness_loss_weight,
               initial_crop_size,
               maxpool_kernel_size,
               maxpool_stride,
               second_stage_mask_rcnn_box_predictor,
               second_stage_batch_size,
               second_stage_balance_fraction,
               second_stage_non_max_suppression_fn,
               second_stage_score_conversion_fn,
               second_stage_localization_loss_weight,
               second_stage_classification_loss_weight,
               hard_example_miner,
               parallel_iterations=16):
    """FasterRCNNMetaArch Constructor.

    Args:
      image_resizer_fn: A callable for image resizing.  This callable always
        takes a rank-3 image tensor (corresponding to a single image) and
        returns a rank-3 image tensor, possibly with new spatial dimensions.
        See builders/image_resizer_builder.py.
      feature_extractor: A FasterRCNNFeatureExtractor object.
      first_stage_only: Whether to construct only the first stage of the
        model, i.e. the Region Proposal Network (RPN).
      first_stage_anchor_generator: An anchor_generator.AnchorGenerator object
        (note that currently only grid_anchor_generator.GridAnchorGenerator
        objects are supported).
      first_stage_atrous_rate: A single integer indicating the atrous rate for
        the single convolution op which is applied to the rpn_features_to_crop
        tensor to obtain a tensor to be used for box prediction. Some feature
        extractors optionally allow for producing feature maps computed at
        denser resolutions.  The atrous rate is used to compensate for the
        denser feature maps by using an effectively larger receptive field.
        (This should typically be set to 1).
      first_stage_box_predictor_arg_scope: Slim arg_scope for the conv2d,
        separable_conv2d and fully_connected ops of the RPN box predictor.
      first_stage_box_predictor_kernel_size: Kernel size of the convolution op
        applied just before the RPN box predictions.
      first_stage_box_predictor_depth: Output depth of the convolution op
        applied just before the RPN box predictions.
      first_stage_minibatch_size: The "batch size" used for computing the
        objectness (foreground vs. background) and location loss of the region
        proposal network. This "batch size" refers to the number of anchors
        selected as contributing to the loss function for any given image
        within the image batch and is only called "batch_size" due to
        terminology from the Faster R-CNN paper.
      first_stage_positive_balance_fraction: Fraction of positive examples
        per image for the RPN. The recommended value for Faster RCNN is 0.5.
      first_stage_nms_score_threshold: Score threshold for non max suppression
        for the Region Proposal Network (RPN).  This value is expected to be in
        [0, 1] as it is applied directly after a softmax transformation.  The
        recommended value for Faster R-CNN is 0.
      first_stage_nms_iou_threshold: IOU threshold for the non-max suppression
        applied to the boxes predicted by the RPN (a box whose IOU with a
        higher-scoring box exceeds this threshold is removed).
      first_stage_max_proposals: Maximum number of boxes to retain after
        performing non-max suppression (NMS) on the boxes predicted by the
        Region Proposal Network (RPN).
      first_stage_localization_loss_weight: A float
      first_stage_objectness_loss_weight: A float
      initial_crop_size: A single integer indicating the output size
        (width and height are set to be the same) of the initial bilinear
        interpolation based cropping during ROI pooling.
      maxpool_kernel_size: A single integer indicating the kernel size of the
        max pool op on the cropped feature map during ROI pooling.
      maxpool_stride: A single integer indicating the stride of the max pool
        op on the cropped feature map during ROI pooling.
      second_stage_mask_rcnn_box_predictor: Mask R-CNN box predictor to use for
        the second stage.
      second_stage_batch_size: The batch size used for computing the
        classification and refined location loss of the box classifier.  This
        "batch size" refers to the number of proposals selected as contributing
        to the loss function for any given image within the image batch and is
        only called "batch_size" due to terminology from the Faster R-CNN paper.
      second_stage_balance_fraction: Fraction of positive examples to use
        per image for the box classifier. The recommended value for Faster RCNN
        is 0.25.
      second_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression
        callable that takes `boxes`, `scores`, optional `clip_window` and
        optional (kwarg) `mask` inputs (with all other inputs already set)
        and returns a dictionary containing tensors with keys:
        `detection_boxes`, `detection_scores`, `detection_classes`,
        `num_detections`, and (optionally) `detection_masks`. See
        `post_processing.batch_multiclass_non_max_suppression` for the type and
        shape of these tensors.
      second_stage_score_conversion_fn: Callable elementwise nonlinearity
        (that takes tensors as inputs and returns tensors).  This is usually
        used to convert logits to probabilities.
      second_stage_localization_loss_weight: A float
      second_stage_classification_loss_weight: A float
      hard_example_miner: A losses.HardExampleMiner object (can be None).
      parallel_iterations: (Optional) The number of iterations allowed to run
        in parallel for calls to tf.map_fn.

    Raises:
      ValueError: If second_stage_batch_size is greater than
        first_stage_max_proposals.
      ValueError: If first_stage_anchor_generator is not of type
        grid_anchor_generator.GridAnchorGenerator.
    """
    # A common use of super() is to make sure the parent class is initialized
    # correctly inside __init__().
    super(FasterRCNNMetaArch, self).__init__(num_classes=num_classes)

    if second_stage_batch_size > first_stage_max_proposals:
      raise ValueError('second_stage_batch_size should be no greater than '
                       'first_stage_max_proposals.')
    # isinstance() checks whether an object is an instance of a class or of
    # one of its subclasses.
    if not isinstance(first_stage_anchor_generator,
                      grid_anchor_generator.GridAnchorGenerator):
      raise ValueError('first_stage_anchor_generator must be of type '
                       'grid_anchor_generator.GridAnchorGenerator.')

    self._is_training = is_training
    self._image_resizer_fn = image_resizer_fn
    self._feature_extractor = feature_extractor
    self._first_stage_only = first_stage_only

    # The first class is reserved as background.
    unmatched_cls_target = tf.constant(
        [1] + self._num_classes * [0], dtype=tf.float32)
    self._proposal_target_assigner = target_assigner.create_target_assigner(
        'FasterRCNN', 'proposal')
    self._detector_target_assigner = target_assigner.create_target_assigner(
        'FasterRCNN', 'detection', unmatched_cls_target=unmatched_cls_target)
    # The proposal and detector target assigners use the same box coder.
    self._box_coder = self._proposal_target_assigner.box_coder

    # (First stage) Region proposal network parameters.
    self._first_stage_anchor_generator = first_stage_anchor_generator
    self._first_stage_atrous_rate = first_stage_atrous_rate
    self._first_stage_box_predictor_arg_scope = (
        first_stage_box_predictor_arg_scope)
    self._first_stage_box_predictor_kernel_size = (
        first_stage_box_predictor_kernel_size)
    self._first_stage_box_predictor_depth = first