[MaskRCNN] Source Code Series 4: The RCNN Part of Training

mjiansun

Categories: Keras, Paper Notes

Article link: https://blog.csdn.net/u013066730/article/details/102664978

DetectionTargetLayer

        if mode == "training":
            # Class ID mask to mark class IDs supported by the dataset the image
            # came from.
            active_class_ids = KL.Lambda(
                lambda x: parse_image_meta_graph(x)["active_class_ids"]
                )(input_image_meta)

            if not config.USE_RPN_ROIS:
                # Ignore predicted ROIs and use ROIs provided as an input.
                input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                                      name="input_roi", dtype=np.int32)
                # Normalize coordinates
                target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
            else:
                target_rois = rpn_rois

            # Generate detection targets
            # Subsamples proposals and generates target outputs for training
            # Note that proposal class IDs, gt_boxes, and gt_masks are zero
            # padded. Equally, returned rois and targets are zero padded.
            rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

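active_class_ids: as the post linked in the original article shows, all classes are active, so its shape is (1, 81);

target_rois: in [MaskRCNN] Source Code Series 3: The RPN in train & test, the final output is rpn_rois, the top 2000 boxes selected by the RPN.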

DetectionTargetLayer

class DetectionTargetLayer(KE.Layer):
    def __init__(self, config, **kwargs):
        super(DetectionTargetLayer, self).__init__(**kwargs)
        self.config = config

    def call(self, inputs):
        proposals = inputs[0]
        gt_class_ids = inputs[1]
        gt_boxes = inputs[2]
        gt_masks = inputs[3]

        # Slice the batch and run a graph for each slice
        # TODO: Rename target_bbox to target_deltas for clarity
        names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
        outputs = utils.batch_slice(
            [proposals, gt_class_ids, gt_boxes, gt_masks],
            lambda w, x, y, z: detection_targets_graph(
                w, x, y, z, self.config),
            self.config.IMAGES_PER_GPU, names=names)
        return outputs

    def compute_output_shape(self, input_shape):
        return [
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # rois
            (None, self.config.TRAIN_ROIS_PER_IMAGE),  # class_ids
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # deltas
            (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0],
             self.config.MASK_SHAPE[1])  # masks
        ]

    def compute_mask(self, inputs, mask=None):
        return [None, None, None, None]

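Input parameters

target_rois: in [MaskRCNN] Source Code Series 3: The RPN in train & test, the final output is rpn_rois, the top 2000 boxes selected by the RPN, with shape (?, 2000, 4);

The next three variables are inputs fed at run time. For their actual values, see [MaskRCNN] Source Code Series 1: train data processing (3):

input_gt_class_ids: batch_gt_class_ids, shape (1, 100), where 100 is the maximum number of instances in one image.

gt_boxes: batch_gt_boxes, shape (1, 100, 4)

input_gt_masks: batch_gt_masks, shape (1, 56, 56, 100)

The DetectionTargetLayer class is essentially a wrapper: call() slices the batch and delegates the real work to detection_targets_graph.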

detection_targets_graph

def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config):

Note that the variables below no longer have the first (batch size) dimension, because batch_slice processes one image at a time.

proposals: the target_rois above

gt_class_ids: the input_gt_class_ids above

gt_boxes: the gt_boxes above

gt_masks: the input_gt_masks above
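
To see why the batch dimension disappears, here is a minimal pure-Python sketch of what a per-image slicing helper does. The names batch_slice_sketch and count_boxes are hypothetical, and nested lists stand in for tensors; this is an illustration of the idea, not the actual utils.batch_slice implementation.

```python
def batch_slice_sketch(inputs, graph_fn, batch_size):
    """Run graph_fn once per image and regroup the results per output."""
    per_image = []
    for i in range(batch_size):
        # Index away the batch dimension: each argument is now one image's data.
        image_inputs = [x[i] for x in inputs]
        per_image.append(graph_fn(*image_inputs))
    # Turn [(a0, b0), (a1, b1), ...] into [[a0, a1, ...], [b0, b1, ...]].
    return [list(group) for group in zip(*per_image)]


def count_boxes(proposals, gt_boxes):
    # Inside graph_fn there is no batch axis any more.
    return len(proposals), len(gt_boxes)


proposals = [[[0.1, 0.1, 0.5, 0.5]] * 3, [[0.2, 0.2, 0.6, 0.6]] * 3]  # (2, 3, 4)
gt_boxes = [[[0.0, 0.0, 1.0, 1.0]] * 2, [[0.0, 0.0, 1.0, 1.0]] * 2]   # (2, 2, 4)
num_props, num_gts = batch_slice_sketch([proposals, gt_boxes], count_boxes, 2)
```

Each call to count_boxes sees a (3, 4) and a (2, 4) input, and the per-image results are regrouped so that the batch axis comes back as the first dimension of every output.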

    # Assertions
    asserts = [
        tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals],
                  name="roi_assertion"),
    ]
    with tf.control_dependencies(asserts):
        proposals = tf.identity(proposals)

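At first glance it is not obvious what this block is for. It is a runtime sanity check: tf.Assert fails if there are no proposals, and wrapping tf.identity in tf.control_dependencies forces the assertion to run before proposals is used.

Removing zero padding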

    # Remove zero padding
    proposals, _ = trim_zeros_graph(proposals, name="trim_proposals")
    gt_boxes, non_zeros = trim_zeros_graph(gt_boxes, name="trim_gt_boxes")
    gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros,
                                   name="trim_gt_class_ids")
    gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2,
                         name="trim_gt_masks")

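This removes the zero rows that were added earlier to give the tensors a fixed shape. The results are:

proposals: the refined proposals with the zero rows removed, at most 2000 of them;

gt_boxes: the real instances in the image, shape (?, 4), where ? is at most 100;

non_zeros: shape (100,), the same length as the input; True for non-zero rows and False for zero rows;

gt_class_ids: shape (?,), the class of each non-zero instance, selected with non_zeros;

gt_masks: shape (56, 56, ?), the mask of each non-zero instance, selected with non_zeros.

trim_zeros_graph
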
def trim_zeros_graph(boxes, name='trim_zeros'):
    """Often boxes are represented with matrices of shape [N, 4] and
    are padded with zeros. This removes zero boxes.

    boxes: [N, 4] matrix of boxes.
    non_zeros: [N] a 1D boolean mask identifying the rows to keep
    """
    non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool)
    boxes = tf.boolean_mask(boxes, non_zeros, name=name)
    return boxes, non_zeros
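
The input has shape (?, 4) and holds box coordinates. The function takes absolute values and sums along axis 1; rows whose sum is greater than 0 become True and all-zero rows become False. boolean_mask then keeps only the True rows, so the returned boxes have shape (M, 4), where M is the number of non-zero rows.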
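
The same logic can be sketched without TensorFlow. The function name trim_zeros and the sample boxes are made up for illustration, and plain lists stand in for tensors.

```python
def trim_zeros(boxes):
    """Drop all-zero rows; return the kept rows and the boolean keep-mask."""
    # A row is padding when the sum of its absolute values is 0.
    non_zeros = [sum(abs(v) for v in box) > 0 for box in boxes]
    kept = [box for box, keep in zip(boxes, non_zeros) if keep]
    return kept, non_zeros


padded_boxes = [
    [0.1, 0.2, 0.5, 0.6],
    [0.0, 0.0, 0.0, 0.0],  # padding
    [0.3, 0.3, 0.9, 0.8],
    [0.0, 0.0, 0.0, 0.0],  # padding
]
padded_class_ids = [7, 0, 12, 0]

boxes, non_zeros = trim_zeros(padded_boxes)
# The same mask trims the class IDs, as tf.boolean_mask does in the real code.
class_ids = [c for c, keep in zip(padded_class_ids, non_zeros) if keep]
```

The mask keeps its original length, which is what lets it be reused on gt_class_ids and gt_masks.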

Handling crowd instances

    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = tf.where(gt_class_ids < 0)[:, 0]
    non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0]
    crowd_boxes = tf.gather(gt_boxes, crowd_ix)
    gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix)
    gt_boxes = tf.gather(gt_boxes, non_crowd_ix)
    gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2)

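A gt_class_ids value below 0 marks a crowd box; a value above 0 is the class of a real instance in the image. The shapes are:

crowd_ix: (?,) where ? is the number of crowd instances

non_crowd_ix: (?,) where ? is the number of normal instances

crowd_boxes: (?, 4) where ? is the number of crowd instances

gt_class_ids: (?,) where ? is the number of normal instances

gt_boxes: (?, 4) where ? is the number of normal instances

gt_masks: (56, 56, ?) where ? is the number of normal instances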

    # Compute overlaps with crowd boxes [proposals, crowd_boxes]
    crowd_overlaps = overlaps_graph(proposals, crowd_boxes)
    crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1)
    no_crowd_bool = (crowd_iou_max < 0.001)

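This computes the IoU between the proposals and the crowd boxes (crowd_boxes) and then takes, for each proposal, its largest IoU with any crowd box. If that value is below 0.001, the proposal does not touch a crowd region and can be kept; no_crowd_bool is used further down when the negative samples are chosen.

Computing overlaps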

    # Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)

def overlaps_graph(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].
    """
    # 1. Tile boxes2 and repeat boxes1. This allows us to compare
    # every boxes1 against every boxes2 without loops.
    # TF doesn't have an equivalent to np.repeat() so simulate it
    # using tf.tile() and tf.reshape.
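    # b1 has shape (N1*N2, 4): each box in boxes1 is repeated N2 times in a
    # row (the first box N2 times, then the second box N2 times, and so on).
    b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1),
                            [1, 1, tf.shape(boxes2)[0]]), [-1, 4])
    # b2 has shape (N1*N2, 4): the whole set of N2 boxes is repeated N1 times.
    b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1])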
    # 2. Compute intersections
    b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1)
    b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1)
    y1 = tf.maximum(b1_y1, b2_y1)
    x1 = tf.maximum(b1_x1, b2_x1)
    y2 = tf.minimum(b1_y2, b2_y2)
    x2 = tf.minimum(b1_x2, b2_x2)
    intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0)
    # 3. Compute unions
    b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1)
    b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1)
    union = b1_area + b2_area - intersection
    # 4. Compute IoU and reshape to [boxes1, boxes2]
    iou = intersection / union
    overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]])  # IoU of each roi_box with each gt_box
    return overlaps

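The code above computes the IoU between the (at most 2000) proposals and the box of every non-crowd instance in the image.

Input parameters

proposals: the refined proposals with the zero rows removed, at most 2000; call the count N1;

gt_boxes: the boxes of the non-crowd instances; call the count N2.

Return value

overlaps: the IoU of every roi_box (that is, every proposal) with every gt_box, so its shape is (N1, N2).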
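
As a small worked example of the (N1, N2) matrix, here is a loop-based sketch of the same IoU arithmetic. The helper names iou and overlaps_matrix and the boxes are illustrative; the real overlaps_graph does the same computation with tiling instead of loops.

```python
def iou(box1, box2):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    y1 = max(box1[0], box2[0])
    x1 = max(box1[1], box2[1])
    y2 = min(box1[2], box2[2])
    x2 = min(box1[3], box2[3])
    # Negative extents mean no overlap, so clamp them to 0.
    intersection = max(x2 - x1, 0) * max(y2 - y1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union


def overlaps_matrix(boxes1, boxes2):
    """Row i, column j holds the IoU of boxes1[i] with boxes2[j]."""
    return [[iou(b1, b2) for b2 in boxes2] for b1 in boxes1]


proposals = [[0, 0, 2, 2], [1, 1, 3, 3], [4, 4, 5, 5]]  # N1 = 3
gt_boxes = [[0, 0, 2, 2], [1, 1, 3, 3]]                 # N2 = 2
overlaps = overlaps_matrix(proposals, gt_boxes)
```

The first proposal coincides with the first GT box (IoU 1), overlaps the second in a 1x1 square (IoU 1/7), and the third proposal touches neither GT box (IoU 0).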

Candidate positive and negative ROIs

    # Determine positive and negative ROIs
    roi_iou_max = tf.reduce_max(overlaps, axis=1)
    # 1. Positive ROIs are those with >= 0.5 IoU with a GT box
    positive_roi_bool = (roi_iou_max >= 0.5)
    positive_indices = tf.where(positive_roi_bool)[:, 0]
    # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds.
    negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0]

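For each proposal, take its maximum IoU over all gt_boxes from overlaps. A proposal whose maximum IoU is at least 0.5 is a positive candidate. One whose maximum IoU is below 0.5 and which does not overlap a crowd box is a negative candidate.

positive_indices: indices of the candidate positive samples;

negative_indices: indices of the candidate negative samples.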
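
The selection rule can be sketched as follows. The function split_candidates and the sample numbers are hypothetical, and the sketch assumes there is at least one GT box, so every row of overlaps is non-empty.

```python
def split_candidates(overlaps, crowd_iou_max, iou_threshold=0.5,
                     crowd_threshold=0.001):
    """Return (positive_indices, negative_indices) for the given IoU rows."""
    positive, negative = [], []
    for i, row in enumerate(overlaps):
        roi_iou_max = max(row)  # best IoU of this ROI with any GT box
        if roi_iou_max >= iou_threshold:
            positive.append(i)
        elif crowd_iou_max[i] < crowd_threshold:
            # Low IoU with every GT box and no overlap with a crowd box.
            negative.append(i)
    return positive, negative


overlaps = [[0.7, 0.1], [0.2, 0.5], [0.3, 0.0], [0.1, 0.2]]
crowd_iou_max = [0.0, 0.0, 0.0, 0.4]  # ROI 3 overlaps a crowd box
positive_indices, negative_indices = split_candidates(overlaps, crowd_iou_max)
```

ROI 3 ends up in neither list: its IoU with the GT boxes is too low for a positive, and its overlap with a crowd box rules it out as a negative.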

Subsampling positive and negative ROIs for training

    # Subsample ROIs. Aim for 33% positive
    # Positive ROIs
    positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
                         config.ROI_POSITIVE_RATIO) #200*0.33
    positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
    positive_count = tf.shape(positive_indices)[0]
    # Negative ROIs. Add enough to maintain positive:negative ratio.
    r = 1.0 / config.ROI_POSITIVE_RATIO
    negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count
    negative_indices = tf.random_shuffle(negative_indices)[:negative_count]
    # Gather selected ROIs
    positive_rois = tf.gather(proposals, positive_indices)
    negative_rois = tf.gather(proposals, negative_indices)

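A total of 200 ROIs (TRAIN_ROIS_PER_IMAGE) take part in the later training. At most int(200*0.33) = 66 of the positive candidates are sampled, and positive_count is then reset to the number actually selected. The number of negatives follows from it: negative_count = (1/0.33)*positive_count - positive_count, which is roughly 134 when all 66 positive slots are filled.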
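
The effect of recomputing positive_count is easiest to see with few positives. Below is a sketch using Python's random module in place of tf.random_shuffle; the function name subsample and the seed parameter are assumptions made for the example.

```python
import random


def subsample(positive_indices, negative_indices, rois_per_image=200,
              positive_ratio=0.33, seed=0):
    """Shuffle and truncate both candidate lists, keeping the 1:2 ratio."""
    rng = random.Random(seed)
    positive_cap = int(rois_per_image * positive_ratio)
    positives = list(positive_indices)
    rng.shuffle(positives)
    positives = positives[:positive_cap]
    # From here on, the count is the number actually selected, not the cap.
    positive_count = len(positives)
    negative_count = int((1.0 / positive_ratio) * positive_count) - positive_count
    negatives = list(negative_indices)
    rng.shuffle(negatives)
    negatives = negatives[:negative_count]
    return positives, negatives


# Only 10 positive candidates but 900 negative candidates.
positives, negatives = subsample(range(10), range(100, 1000))
```

With 10 positives only 20 negatives are kept, so the remaining 170 of the 200 slots are later filled with zero padding rather than with extra negatives.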

Assigning labels to the training ROIs

    # Assign positive ROIs to GT boxes, i.e. find the coordinates and class of the GT box that each positive ROI corresponds to.
    positive_overlaps = tf.gather(overlaps, positive_indices)
    roi_gt_box_assignment = tf.cond(
        tf.greater(tf.shape(positive_overlaps)[1], 0),
        true_fn = lambda: tf.argmax(positive_overlaps, axis=1),
        false_fn = lambda: tf.cast(tf.constant([]),tf.int64)
    )
    roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment)
    roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment)

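For the sampled positive ROIs, gather their IoU rows against gt_boxes and find, for each ROI, the gt_box with the highest IoU. The coordinates and class of that gt_box become the label of the ROI.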
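
A small sketch of this argmax-and-gather step, with lists in place of tensors. The name assign_gt and the sample values are illustrative only.

```python
def assign_gt(positive_overlaps, gt_boxes, gt_class_ids):
    """For each positive ROI, pick the GT box with the highest IoU."""
    # Index of the largest value in each row (the first one on ties).
    assignment = [max(range(len(row)), key=row.__getitem__)
                  for row in positive_overlaps]
    roi_gt_boxes = [gt_boxes[j] for j in assignment]
    roi_gt_class_ids = [gt_class_ids[j] for j in assignment]
    return assignment, roi_gt_boxes, roi_gt_class_ids


positive_overlaps = [[0.7, 0.1], [0.2, 0.55], [0.6, 0.65]]  # 3 ROIs x 2 GT boxes
gt_boxes = [[0.1, 0.1, 0.4, 0.4], [0.5, 0.5, 0.9, 0.9]]
gt_class_ids = [3, 17]

assignment, roi_gt_boxes, roi_gt_class_ids = assign_gt(
    positive_overlaps, gt_boxes, gt_class_ids)
```

Several ROIs may share the same GT box, as the second and third ROIs do here.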

Computing the offsets between positive training ROIs and gt_box

    # Compute bbox refinement for positive ROIs
    deltas = utils.box_refinement_graph(positive_rois, roi_gt_boxes)
    deltas /= config.BBOX_STD_DEV

def box_refinement_graph(box, gt_box):
    """Compute refinement needed to transform box to gt_box.
    box and gt_box are [N, (y1, x1, y2, x2)]
    """
    box = tf.cast(box, tf.float32)
    gt_box = tf.cast(gt_box, tf.float32)

    height = box[:, 2] - box[:, 0]
    width = box[:, 3] - box[:, 1]
    center_y = box[:, 0] + 0.5 * height
    center_x = box[:, 1] + 0.5 * width

    gt_height = gt_box[:, 2] - gt_box[:, 0]
    gt_width = gt_box[:, 3] - gt_box[:, 1]
    gt_center_y = gt_box[:, 0] + 0.5 * gt_height
    gt_center_x = gt_box[:, 1] + 0.5 * gt_width

    dy = (gt_center_y - center_y) / height
    dx = (gt_center_x - center_x) / width
    dh = tf.log(gt_height / height)
    dw = tf.log(gt_width / width)

    result = tf.stack([dy, dx, dh, dw], axis=1)
    return result

This computes the offsets between each positive training ROI and its gt_box. The result has shape (?, 4).
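
For a single box the arithmetic works out as below. This is a sketch for one (y1, x1, y2, x2) pair using the math module; the BBOX_STD_DEV value of [0.1, 0.1, 0.2, 0.2] is an assumption about the configuration, so check it against your own config.

```python
import math


def box_refinement(box, gt_box):
    """Deltas (dy, dx, dh, dw) that move box onto gt_box."""
    height = box[2] - box[0]
    width = box[3] - box[1]
    center_y = box[0] + 0.5 * height
    center_x = box[1] + 0.5 * width

    gt_height = gt_box[2] - gt_box[0]
    gt_width = gt_box[3] - gt_box[1]
    gt_center_y = gt_box[0] + 0.5 * gt_height
    gt_center_x = gt_box[1] + 0.5 * gt_width

    # Center shifts are relative to the ROI size; size changes are log ratios.
    dy = (gt_center_y - center_y) / height
    dx = (gt_center_x - center_x) / width
    dh = math.log(gt_height / height)
    dw = math.log(gt_width / width)
    return [dy, dx, dh, dw]


BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]  # assumed config value

roi = [0.0, 0.0, 0.5, 0.5]
gt_box = [0.0, 0.0, 1.0, 1.0]
deltas = box_refinement(roi, gt_box)
normalized = [d / s for d, s in zip(deltas, BBOX_STD_DEV)]
```

The GT box is twice as large as the ROI and its center lies half an ROI size away, so the raw deltas are (0.5, 0.5, log 2, log 2) before the division by BBOX_STD_DEV.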

Assigning masks to the training ROIs

    # Assign positive ROIs to GT masks
    # Permute masks to [N, height, width, 1]
    transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1)
    # Pick the right mask for each ROI
    roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment)

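gt_masks has shape (56, 56, ?). After the transpose and expand_dims, transposed_masks has shape (?, 56, 56, 1).

Each positive training ROI is then given the mask of its assigned GT instance, which yields roi_masks.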

    # Compute mask targets
    boxes = positive_rois
    if config.USE_MINI_MASK:
        # Transform ROI coordinates from normalized image space
        # to normalized mini-mask space.
        y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1)
        gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1)
        gt_h = gt_y2 - gt_y1
        gt_w = gt_x2 - gt_x1
        y1 = (y1 - gt_y1) / gt_h
        x1 = (x1 - gt_x1) / gt_w
        y2 = (y2 - gt_y1) / gt_h
        x2 = (x2 - gt_x1) / gt_w
        boxes = tf.concat([y1, x1, y2, x2], 1)
    box_ids = tf.range(0, tf.shape(roi_masks)[0])
    masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes,
                                     box_ids,
                                     config.MASK_SHAPE)
    # Remove the extra dimension from masks.
    masks = tf.squeeze(masks, axis=3)

    # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
    # binary cross entropy loss.
    masks = tf.round(masks)

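As the figure in the original post shows, a training ROI and its target do not overlap perfectly, so the part of the mask that falls inside the ROI (the shaded region in that figure) is cropped out and resized to the 28*28 shape used downstream. squeeze then removes the extra dimension, giving masks with shape (?, 28, 28).

If tf.image.crop_and_resize is unfamiliar, see https://blog.csdn.net/u013066730/article/details/100583484.

Resizing produces values that are not exactly 0 or 1, but the labels must be 0 or 1, so round is applied: values above 0.5 become 1 and values below 0.5 become 0 (tf.round rounds exact ties to the nearest even integer, so a value of exactly 0.5 becomes 0).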
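
When USE_MINI_MASK is on, the stored mask covers only the GT box, so the ROI has to be re-expressed relative to that box before cropping. The sketch below shows the coordinate change for one ROI; the function name roi_in_gt_box_space and the sample boxes are made up for the example.

```python
def roi_in_gt_box_space(roi, gt_box):
    """Express roi (y1, x1, y2, x2) in coordinates where gt_box is [0, 1]."""
    y1, x1, y2, x2 = roi
    gt_y1, gt_x1, gt_y2, gt_x2 = gt_box
    gt_h = gt_y2 - gt_y1
    gt_w = gt_x2 - gt_x1
    return [(y1 - gt_y1) / gt_h, (x1 - gt_x1) / gt_w,
            (y2 - gt_y1) / gt_h, (x2 - gt_x1) / gt_w]


gt_box = [0.25, 0.25, 0.75, 0.75]
roi = [0.375, 0.25, 0.875, 0.75]  # same width, shifted down by a quarter
box = roi_in_gt_box_space(roi, gt_box)
```

Here the ROI starts a quarter of the way down the mini mask and ends at 1.25, beyond its lower edge. crop_and_resize fills the part outside the source image with its extrapolation value, which is 0 by default, so that strip becomes background in the target mask.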

Normalizing the output shapes

    # Append negative ROIs and pad bbox deltas and masks that
    # are not used for negative ROIs with zeros.
    rois = tf.concat([positive_rois, negative_rois], axis=0)
    N = tf.shape(negative_rois)[0]
    P = tf.maximum(config.TRAIN_ROIS_PER_IMAGE - tf.shape(rois)[0], 0)
    rois = tf.pad(rois, [(0, P), (0, 0)])
    roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)])
    roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)])
    deltas = tf.pad(deltas, [(0, N + P), (0, 0)])
    masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)])

    return rois, roi_gt_class_ids, deltas, masks

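rois contains both positive and negative samples. The target is 200, but the sampling above may yield fewer, so to keep a fixed shape the tensor is zero-padded up to 200.

roi_gt_boxes, roi_gt_class_ids, deltas and masks exist only for positive samples, so they are padded with N+P zero rows. Each of them ends up with 200 entries.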
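
The two padding amounts can be checked on a toy case. This sketch uses a hypothetical pad_targets helper and a target of 6 ROIs instead of 200, and leaves out the masks for brevity.

```python
def pad_targets(positive_rois, negative_rois, roi_gt_class_ids, deltas,
                rois_per_image):
    """Concatenate ROIs and zero-pad everything to rois_per_image rows."""
    rois = positive_rois + negative_rois
    n = len(negative_rois)                   # N: negatives have no targets
    p = max(rois_per_image - len(rois), 0)   # P: rows missing from rois
    rois = rois + [[0.0] * 4 for _ in range(p)]
    class_ids = roi_gt_class_ids + [0] * (n + p)
    deltas = deltas + [[0.0] * 4 for _ in range(n + p)]
    return rois, class_ids, deltas


positive_rois = [[0.1, 0.1, 0.4, 0.4], [0.5, 0.5, 0.9, 0.9]]
negative_rois = [[0.0, 0.0, 0.1, 0.1], [0.6, 0.1, 0.7, 0.2], [0.2, 0.7, 0.3, 0.8]]
rois, class_ids, deltas = pad_targets(
    positive_rois, negative_rois, [3, 17],
    [[0.5, 0.5, 0.1, 0.1], [0.0, 0.2, 0.3, 0.0]], rois_per_image=6)
```

With 2 positives and 3 negatives, rois needs only P = 1 padding row, while the target arrays need N + P = 4. A class ID of 0 therefore marks both negative ROIs and padding.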

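The DetectionTargetLayer as a whole produces the following variables:

rois: (1, 200, 4); the 200 rows consist of three parts: positive samples, negative samples and zero padding;

target_class_ids: (1, 200); the 200 entries consist of two parts: positive samples and zero padding;

target_bbox: (1, 200, 4); the 200 rows consist of two parts: positive samples and zero padding;

target_mask: (1, 200, 28, 28); the 200 masks consist of two parts: positive samples and zero padding.

fpn_classifier_graph

Back in the build function: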

            # Network Heads
            # TODO: verify that this handles zero padded ROIs
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Two 1024 FC layers (implemented with Conv2D for consistency)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)

    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

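Input parameters

rois: (1, 200, 4); the 200 rows consist of three parts: positive samples, negative samples and zero padding;

feature_maps: [P2, P3, P4, P5]; see [MaskRCNN] Source Code Series 2: Feature maps in train & test;

image_meta: records how the original image was rescaled; see https://blog.csdn.net/u013066730/article/details/102501128;

pool_size: 7;

num_classes: 81;

train_bn: False;

fc_layers_size: 1024.

The following first covers the key building blocks of this network and then its overall structure.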

PyramidROIAlign

class PyramidROIAlign(KE.Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

The following code shows how the layer is used:

    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

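The layer is constructed with pool_size=7, that is pool_shape = [7, 7], and is then called on rois, image_meta and feature_maps.

The main logic is in the call function: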

    def call(self, inputs):
        # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords
        boxes = inputs[0]

        # Image meta
        # Holds details about the image. See compose_image_meta()
        image_meta = inputs[1]

        # Feature Maps. List of feature maps from different level of the
        # feature pyramid. Each is [batch, height, width, channels]
        feature_maps = inputs[2:]

        # Assign each ROI to a level in the pyramid based on the ROI area.
        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1
        # Use shape of first image. Images in a batch must have the same size.
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Equation 1 in the Feature Pyramid Networks paper. Account for
        # the fact that our coordinates are normalized here.
        # e.g. a 224x224 ROI (in pixels) maps to P4
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32))) # shape is (1,?,1)
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        for i, level in enumerate(range(2, 6)):
            # ix has shape (?, 2): the indices of all ROIs assigned to this
            # pyramid level. Column 0 is the image index within the batch and
            # column 1 is the ROI index within that image. With a batch of 3,
            # for example, column 0 takes the values 0-2.
            ix = tf.where(tf.equal(roi_level, level))
            # Gather the matching ROIs from every image in the batch.
            level_boxes = tf.gather_nd(boxes, ix)

            # Box indices for crop_and_resize: which image in the batch
            # each of these ROIs belongs to.
            box_indices = tf.cast(ix[:, 0], tf.int32)

            # Keep track of which box is mapped to which level
            box_to_level.append(ix)

            # Stop gradient propogation to ROI proposals
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)

            # Crop and Resize
            # Result: [batch * num_boxes, pool_height, pool_width, channels]
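            # With a batch of 3, box_indices might be
            # [0, 2, 0, 0, 1, 0, 1, 2, 0, 2, 2, 2, 1], where 0 means image 0 of the batch.
            # level_boxes has shape (?, 4), where ? equals the length of box_indices;
            # each row is one ROI at this pyramid level from image n of the batch.
            # feature_maps[i] is the feature map of the corresponding level.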
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))

        # Pack pooled features into one tensor, shape is (?, 7, 7, 256)
        pooled = tf.concat(pooled, axis=0)

        # Pack box_to_level mapping into one array and add another
        # column representing the order of pooled boxes
        # box_to_level may start as [[0,2],[0,13],[0,21],[1,5],[2,8],[2,10]...];
        # the steps below turn it into
        # [[0,2,0],[0,13,1],[0,21,2],[1,5,3],[2,8,4],[2,10,5]...]
        box_to_level = tf.concat(box_to_level, axis=0)  # concatenate the 4 (?, 2) tensors
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)

        # Rearrange pooled features to match the order of the original boxes
        # Sort box_to_level by batch then box index
        # TF doesn't have a way to sort by two columns, so merge them and sort.
        # sorting_tensor spreads the images of a batch apart: image 0 keeps
        # its box index, image 1 adds 100000, image 2 adds 200000, and so on.
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort in ascending order: first by image within the batch, then by
        # box index within the image.
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
            box_to_level)[0]).indices[::-1]
        # First pick the pooled positions (column 2 of box_to_level) in that order,
        ix = tf.gather(box_to_level[:, 2], ix)
        # then reorder pooled accordingly.
        pooled = tf.gather(pooled, ix)

        # Re-add the batch dimension
        # shape combines the first two dims of boxes (images per batch, 200 ROIs
        # per image) with the pooled feature dims 7*7*256.
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        # Reshape to (b, 200, 7, 7, 256).
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
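
The reordering at the end of call() is the least obvious step, so here is a plain-Python sketch of the single-key sort. Strings stand in for pooled feature tensors and all values are made up; sorted() replaces the tf.nn.top_k trick used in the real code.

```python
# Rows are (image index, box index) in the order the levels produced them,
# which is also the order of the entries in `pooled`.
box_to_level = [[0, 2], [1, 0], [0, 0], [1, 1], [0, 1]]
pooled = ["img0_box2", "img1_box0", "img0_box0", "img1_box1", "img0_box1"]

# Merge both columns into one key: the image index dominates as long as
# there are fewer than 100000 boxes per image.
sorting_keys = [image * 100000 + box for image, box in box_to_level]

# Positions of the pooled entries, sorted by image first and box second.
order = sorted(range(len(sorting_keys)), key=sorting_keys.__getitem__)
restored = [pooled[j] for j in order]
```

After the reordering the features are back in the original (image, box) order, which is what makes the final reshape to (batch, num_boxes, 7, 7, 256) valid.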

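First, one question needs answering. The rois at hand (whose heights and widths mostly differ after the RPN) come from the FPN levels (P2-P5), and the default anchor scale differs per level, for example (16, 32, 64, 128, 256). Which feature_map (among those of P2-P5) does a given roi come from? In other words, on which feature_map should ROIAlign be applied for a given roi? Because so many anchors were generated, the level each one came from was not recorded, so the paper proposes an approximate formula (Equation 1 of the original FPN paper, shown as a screenshot in the original post):

k = floor(k0 + log2(sqrt(w * h) / 224)), with k0 = 4

With that in mind the code is much easier to read. For ROIAlign itself, the code simply calls tf.image.crop_and_resize. Here is a good post on roi_pooling: https://blog.deepsense.ai/region-of-interest-pooling-explained/

The line roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) is a slightly modified form of this formula. Because h and w are normalized coordinates, 224 is divided by sqrt(image_area), which is equivalent to log2(sqrt(h * w * image_area) / 224). The code then adds 4, rounds instead of taking the floor, and clamps the result to the range [2, 5].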
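
The level assignment can be tried out for a single ROI with the math module. The function name roi_level_for and the 1024x1024 image size are assumptions chosen so that the numbers come out exactly.

```python
import math


def roi_level_for(h, w, image_height, image_width):
    """Pyramid level (2-5) for an ROI with normalized height h and width w."""
    image_area = float(image_height * image_width)
    # Same expression as in call(): 224 pixels expressed in normalized units.
    level = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
    # 4 is k0; round and clamp to the available levels P2..P5.
    return min(5, max(2, 4 + int(round(level))))


size = 1024
level_224 = roi_level_for(224 / size, 224 / size, size, size)    # reference size
level_112 = roi_level_for(112 / size, 112 / size, size, size)    # half the size
level_448 = roi_level_for(448 / size, 448 / size, size, size)    # double the size
level_tiny = roi_level_for(32 / size, 32 / size, size, size)     # clamped up to P2
level_huge = roi_level_for(1000 / size, 1000 / size, size, size) # clamped down to P5
```

A 224x224 pixel ROI lands on P4, as the comment in the source code says. Halving the side length moves it one level down to the finer P3, and doubling it moves it one level up.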

TimeDistributed

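For details, see https://blog.csdn.net/u013066730/article/details/100737059.

After PyramidROIAlign the tensor has shape (?, 200, 7, 7, 256). An ordinary 2D convolution does not fit because there is one dimension too many, and a 3D convolution is not right either: the goal is simply to apply the same convolution to each of the 200 ROIs. TimeDistributed does exactly that, sharing the parameters along the dimension of size 200.

Now look at how the feature map changes through the whole head (batch size set to 1 for the example):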

Layer	input_shape	output_shape
PyramidROIAlign	[rois, image_meta] + feature_maps	(1,200,7,7,256)
TimeDistributed-Conv2D(k=7,i=256,o=1024)	(1,200,7,7,256)	(1,200,1,1,1024)
TimeDistributed-BN	(1,200,1,1,1024)	(1,200,1,1,1024)
Relu	(1,200,1,1,1024)	(1,200,1,1,1024)
TimeDistributed-Conv2D(k=1,i=1024,o=1024)	(1,200,1,1,1024)	(1,200,1,1,1024)
TimeDistributed-BN	(1,200,1,1,1024)	(1,200,1,1,1024)
Relu	(1,200,1,1,1024)	(1,200,1,1,1024)
squeeze	(1,200,1,1,1024)	shared(1,200,1024)
TimeDistributed-Dense	shared(1,200,1024)	mrcnn_class_logits(1,200,81)
TimeDistributed-Softmax	mrcnn_class_logits(1,200,81)	mrcnn_class_probs(1,200,81)
TimeDistributed-Dense	shared(1,200,1024)	x(1,200,81*4)
reshape	x(1,200,81*4)	mrcnn_bbox(1,200,81,4)

The function finally returns mrcnn_class_logits (1, 200, 81), mrcnn_class_probs (1, 200, 81), which is named mrcnn_probs in the code, and mrcnn_bbox (1, 200, 81, 4).
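
The shapes in the table can be reproduced with a few lines of tuple arithmetic. This is only a shape tracer under the assumption of stride 1 and "valid" padding; conv2d_valid_shape is a hypothetical helper, not a Keras function.

```python
def conv2d_valid_shape(shape, kernel, filters):
    """Output shape of a stride-1 'valid' convolution applied to every ROI."""
    batch, rois, height, width, _ = shape
    return (batch, rois, height - kernel + 1, width - kernel + 1, filters)


num_classes = 81

x = (1, 200, 7, 7, 256)                            # PyramidROIAlign output
x = conv2d_valid_shape(x, kernel=7, filters=1024)  # 7x7 kernel collapses 7x7 to 1x1
x = conv2d_valid_shape(x, kernel=1, filters=1024)  # 1x1 kernel keeps the size
shared = (x[0], x[1], x[4])                        # squeeze the two 1-sized dims

class_logits_shape = shared[:2] + (num_classes,)
bbox_shape = shared[:2] + (num_classes, 4)         # after the final reshape
```

The 7x7 "valid" convolution over a 7x7 input acts like a fully connected layer per ROI, which is why the source comments call these layers FC layers implemented with Conv2D.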
