【MaskRCNN】Source Code Series 4: the RCNN in train

Table of Contents

DetectionTargetLayer

detection_targets_graph

Removing zero padding

trim_zeros_graph

Handling crowd instances

Computing overlaps

Candidate positive and negative ROIs

Subsampling positive and negative ROIs for training

Assigning labels to the training ROIs

Computing deltas between positive ROIs and gt_boxes

Assigning masks to the training ROIs

Normalizing the output shapes

fpn_classifier_graph

PyramidROIAlign

TimeDistributed


 

        if mode == "training":
            # Class ID mask to mark class IDs supported by the dataset the image
            # came from.
            active_class_ids = KL.Lambda(
                lambda x: parse_image_meta_graph(x)["active_class_ids"]
                )(input_image_meta)

            if not config.USE_RPN_ROIS:
                # Ignore predicted ROIs and use ROIs provided as an input.
                input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                                      name="input_roi", dtype=np.int32)
                # Normalize coordinates
                target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
            else:
                target_rois = rpn_rois

            # Generate detection targets
            # Subsamples proposals and generates target outputs for training
            # Note that proposal class IDs, gt_boxes, and gt_masks are zero
            # padded. Equally, returned rois and targets are zero padded.
            rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

active_class_ids: as the linked post shows, all classes are active here; its shape is (1, 81).

target_rois: in 【MaskRCNN】Source Code Series 3: the RPN in train & test, the final output is rpn_rois, the top 2000 boxes picked by the RPN.

DetectionTargetLayer

class DetectionTargetLayer(KE.Layer):
    def __init__(self, config, **kwargs):
        super(DetectionTargetLayer, self).__init__(**kwargs)
        self.config = config

    def call(self, inputs):
        proposals = inputs[0]
        gt_class_ids = inputs[1]
        gt_boxes = inputs[2]
        gt_masks = inputs[3]

        # Slice the batch and run a graph for each slice
        # TODO: Rename target_bbox to target_deltas for clarity
        names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
        outputs = utils.batch_slice(
            [proposals, gt_class_ids, gt_boxes, gt_masks],
            lambda w, x, y, z: detection_targets_graph(
                w, x, y, z, self.config),
            self.config.IMAGES_PER_GPU, names=names)
        return outputs

    def compute_output_shape(self, input_shape):
        return [
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # rois
            (None, self.config.TRAIN_ROIS_PER_IMAGE),  # class_ids
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # deltas
            (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0],
             self.config.MASK_SHAPE[1])  # masks
        ]

    def compute_mask(self, inputs, mask=None):
        return [None, None, None, None]

Input parameters

target_rois: in 【MaskRCNN】Source Code Series 3: the RPN in train & test, the final output is rpn_rois, the top 2000 boxes picked by the RPN; its shape is (?, 2000, 4).

The following three variables are model inputs; for their actual values see 【MaskRCNN】Source Code Series 1: train data processing, part 3.

input_gt_class_ids (fed by batch_gt_class_ids): shape (1, 100), where 100 is the maximum number of instances allowed in one image.

gt_boxes (fed by batch_gt_boxes): shape (1, 100, 4).

input_gt_masks (fed by batch_gt_masks): shape (1, 56, 56, 100).

The heart of DetectionTargetLayer is the detection_targets_graph call inside call(); that function does all the real work.

detection_targets_graph

def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config):

Note that the variables below have lost their first (batch) dimension; utils.batch_slice strips it by running the graph once per image.

proposals: the target_rois above;

gt_class_ids: the input_gt_class_ids above;

gt_boxes: the gt_boxes above;

gt_masks: the input_gt_masks above.

    # Assertions
    asserts = [
        tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals],
                  name="roi_assertion"),
    ]
    with tf.control_dependencies(asserts):
        proposals = tf.identity(proposals)

This block is a runtime guard: tf.Assert raises an error if an empty proposals tensor comes in, and re-wrapping proposals with tf.identity inside tf.control_dependencies forces the assertion to execute before proposals is used downstream.
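
As a minimal runnable sketch of the same pattern (toy placeholder input, TF 1.x graph mode as used by this codebase):

import numpy as np
import tensorflow as tf  # TF 1.x graph mode, matching this codebase

boxes_in = tf.placeholder(tf.float32, shape=[None, 4])
assert_op = tf.Assert(tf.greater(tf.shape(boxes_in)[0], 0), [boxes_in])
with tf.control_dependencies([assert_op]):
    # tf.identity creates a node that depends on the assert, so the
    # check is guaranteed to run before boxes is consumed downstream.
    boxes = tf.identity(boxes_in)

with tf.Session() as sess:
    # A non-empty input passes through unchanged.
    print(sess.run(boxes, {boxes_in: np.ones((2, 4), np.float32)}))
    # Feeding an empty (0, 4) array here would raise InvalidArgumentError.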

Removing zero padding

    # Remove zero padding
    proposals, _ = trim_zeros_graph(proposals, name="trim_proposals")
    gt_boxes, non_zeros = trim_zeros_graph(gt_boxes, name="trim_gt_boxes")
    gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros,
                                   name="trim_gt_class_ids")
    gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2,
                         name="trim_gt_masks")

This block removes the zero padding that was added earlier to keep tensor shapes fixed. The results:

proposals: the refined proposals with zero rows removed, at most 2000 of them;

gt_boxes: the instances actually present in the image, shape (?, 4) with ? ≤ 100;

non_zeros: shape (100,); True where a row is a real box, False where it is padding;

gt_class_ids: shape (?,); the class of each instance kept by non_zeros;

gt_masks: shape (56, 56, ?); the mask of each instance kept by non_zeros.

trim_zeros_graph

def trim_zeros_graph(boxes, name='trim_zeros'):
    """Often boxes are represented with matrices of shape [N, 4] and
    are padded with zeros. This removes zero boxes.

    boxes: [N, 4] matrix of boxes.
    non_zeros: [N] a 1D boolean mask identifying the rows to keep
    """
    non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool)
    boxes = tf.boolean_mask(boxes, non_zeros, name=name)
    return boxes, non_zeros

The input has shape (?, 4), holding box coordinates. Taking the absolute value and summing along axis 1 gives True wherever the sum is greater than 0 and False where it is 0; boolean_mask then keeps only the True rows, so the returned boxes have shape (new_count, 4), where new_count is no larger than the original count.
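
A NumPy sketch of the same logic, using made-up padded boxes rather than repo data:

import numpy as np

boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                  [0.0, 0.0, 0.0, 0.0],   # zero padding
                  [0.2, 0.3, 0.6, 0.9]])
# Rows whose absolute sum is non-zero are real boxes.
non_zeros = np.abs(boxes).sum(axis=1) > 0   # [True, False, True]
trimmed = boxes[non_zeros]                  # shape (2, 4)
print(non_zeros, trimmed.shape)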

Handling crowd instances

    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = tf.where(gt_class_ids < 0)[:, 0]
    non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0]
    crowd_boxes = tf.gather(gt_boxes, crowd_ix)
    gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix)
    gt_boxes = tf.gather(gt_boxes, non_crowd_ix)
    gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2)

A gt_class_id below 0 marks a crowd; above 0 it is the class of a single real instance in the image. The shapes:

crowd_ix: (?,), where ? is the number of crowd instances;

non_crowd_ix: (?,), where ? is the number of normal instances;

crowd_boxes: (?, 4), where ? is the number of crowd instances;

gt_class_ids: (?,), where ? is the number of normal instances;

gt_boxes: (?, 4), where ? is the number of normal instances;

gt_masks: (56, 56, ?), where ? is the number of normal instances.

    # Compute overlaps with crowd boxes [proposals, crowd_boxes]
    crowd_overlaps = overlaps_graph(proposals, crowd_boxes)
    crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1)
    no_crowd_bool = (crowd_iou_max < 0.001)

This computes the overlap of the proposals obtained above with the crowd boxes (crowd_boxes). For each proposal, take its maximum IoU over all crowd boxes; if that maximum is below 0.001, the proposal is far from every crowd and may be kept (as a potential negative).

Computing overlaps

    # Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)

def overlaps_graph(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].
    """
    # 1. Tile boxes2 and repeat boxes1. This allows us to compare
    # every boxes1 against every boxes2 without loops.
    # TF doesn't have an equivalent to np.repeat() so simulate it
    # using tf.tile() and tf.reshape.
    b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1),
                            [1, 1, tf.shape(boxes2)[0]]), [-1, 4])  # shape (N1*N2, 4): box 1 of boxes1 repeated N2 times, then box 2, ..., up to box N1
    b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1])  # shape (N1*N2, 4): the N2 boxes tiled N1 times
    # 2. Compute intersections
    b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1)
    b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1)
    y1 = tf.maximum(b1_y1, b2_y1)
    x1 = tf.maximum(b1_x1, b2_x1)
    y2 = tf.minimum(b1_y2, b2_y2)
    x2 = tf.minimum(b1_x2, b2_x2)
    intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0)
    # 3. Compute unions
    b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1)
    b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1)
    union = b1_area + b2_area - intersection
    # 4. Compute IoU and reshape to [boxes1, boxes2]
    iou = intersection / union
    overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]])  # IoU of each roi box with every gt box
    return overlaps

This computes the IoU between the (at most 2000) foreground proposals and every non-crowd ground-truth box in the image.

Input parameters

proposals: the de-padded, refined proposals, at most 2000; call their count N1;

gt_boxes: the non-crowd ground-truth boxes; call their count N2.

Return value

overlaps: the IoU of every roi_box (i.e. proposal) against every gt_box, so the shape is (N1, N2).
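
To make the tile/repeat trick concrete, here is a NumPy sketch of overlaps_graph with two toy proposals and one toy gt box (hypothetical values):

import numpy as np

def iou_matrix(boxes1, boxes2):
    """NumPy sketch of overlaps_graph: pairwise IoU, shape (N1, N2)."""
    b1 = np.repeat(boxes1, len(boxes2), axis=0)   # like tf.tile + tf.reshape
    b2 = np.tile(boxes2, (len(boxes1), 1))
    y1 = np.maximum(b1[:, 0], b2[:, 0])
    x1 = np.maximum(b1[:, 1], b2[:, 1])
    y2 = np.minimum(b1[:, 2], b2[:, 2])
    x2 = np.minimum(b1[:, 3], b2[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (b1[:, 2] - b1[:, 0]) * (b1[:, 3] - b1[:, 1])
    area2 = (b2[:, 2] - b2[:, 0]) * (b2[:, 3] - b2[:, 1])
    return (inter / (area1 + area2 - inter)).reshape(len(boxes1), len(boxes2))

props = np.array([[0.0, 0.0, 0.5, 0.5], [0.4, 0.4, 0.9, 0.9]])
gts = np.array([[0.0, 0.0, 0.5, 0.5]])
print(iou_matrix(props, gts))   # (2, 1): first row 1.0, second row ~0.02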

Candidate positive and negative ROIs

    # Determine positive and negative ROIs
    roi_iou_max = tf.reduce_max(overlaps, axis=1)
    # 1. Positive ROIs are those with >= 0.5 IoU with a GT box
    positive_roi_bool = (roi_iou_max >= 0.5)
    positive_indices = tf.where(positive_roi_bool)[:, 0]
    # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds.
    negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0]

For each proposal, overlaps gives its maximum IoU over all gt_boxes. A maximum of at least 0.5 makes the proposal a candidate positive; below 0.5, and not near a crowd box, makes it a candidate negative.

positive_indices: indices of the candidate positives;

negative_indices: indices of the candidate negatives.
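
A toy NumPy version of this selection (made-up IoU values):

import numpy as np

roi_iou_max = np.array([0.8, 0.2, 0.55, 0.1])   # hypothetical max IoU per proposal
no_crowd_bool = np.array([True, True, False, True])
positive_indices = np.where(roi_iou_max >= 0.5)[0]                   # [0, 2]
negative_indices = np.where((roi_iou_max < 0.5) & no_crowd_bool)[0]  # [1, 3]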

Subsampling positive and negative ROIs for training

    # Subsample ROIs. Aim for 33% positive
    # Positive ROIs
    positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
                         config.ROI_POSITIVE_RATIO)  # 200 * 0.33 = 66
    positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
    positive_count = tf.shape(positive_indices)[0]
    # Negative ROIs. Add enough to maintain positive:negative ratio.
    r = 1.0 / config.ROI_POSITIVE_RATIO
    negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count
    negative_indices = tf.random_shuffle(negative_indices)[:negative_count]
    # Gather selected ROIs
    positive_rois = tf.gather(proposals, positive_indices)
    negative_rois = tf.gather(proposals, negative_indices)

In total 200 ROIs take part in the later training. Up to 200 * 0.33 = 66 are drawn from the candidate positives; the negative count is then negative_count = (1 / 0.33) * positive_count - positive_count, which is 134 when all 66 positives are available.
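
A quick check of the arithmetic with the default config values (a plain-Python sketch; the repo does this with float32 tensors):

positive_count = int(200 * 0.33)    # TRAIN_ROIS_PER_IMAGE * ROI_POSITIVE_RATIO = 66
negative_count = round(positive_count / 0.33) - positive_count   # 200 - 66 = 134
print(positive_count, negative_count)   # 66 134
# If fewer than 66 candidate positives exist, positive_count shrinks and the
# negative count scales with it, keeping roughly a 1:2 positive:negative ratio.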

Assigning labels to the training ROIs

    # Assign positive ROIs to GT boxes: i.e. find the gt box whose
    # coordinates and class serve as each positive ROI's label.
    positive_overlaps = tf.gather(overlaps, positive_indices)
    roi_gt_box_assignment = tf.cond(
        tf.greater(tf.shape(positive_overlaps)[1], 0),
        true_fn = lambda: tf.argmax(positive_overlaps, axis=1),
        false_fn = lambda: tf.cast(tf.constant([]),tf.int64)
    )
    roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment)
    roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment)

Gather the rows of overlaps for the selected positives, find the gt_box with the highest IoU for each training positive, and take that gt_box's coordinates and class as the positive's label.
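
A toy NumPy version of the assignment (made-up IoU values and class ids):

import numpy as np

positive_overlaps = np.array([[0.7, 0.2],    # ROI 0 vs the two gt boxes
                              [0.1, 0.9]])   # ROI 1 vs the two gt boxes
roi_gt_box_assignment = positive_overlaps.argmax(axis=1)   # [0, 1]
gt_class_ids = np.array([3, 17])                           # hypothetical classes
roi_gt_class_ids = gt_class_ids[roi_gt_box_assignment]     # [3, 17]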

Computing deltas between positive ROIs and gt_boxes

    # Compute bbox refinement for positive ROIs
    deltas = utils.box_refinement_graph(positive_rois, roi_gt_boxes)
    deltas /= config.BBOX_STD_DEV

def box_refinement_graph(box, gt_box):
    """Compute refinement needed to transform box to gt_box.
    box and gt_box are [N, (y1, x1, y2, x2)]
    """
    box = tf.cast(box, tf.float32)
    gt_box = tf.cast(gt_box, tf.float32)

    height = box[:, 2] - box[:, 0]
    width = box[:, 3] - box[:, 1]
    center_y = box[:, 0] + 0.5 * height
    center_x = box[:, 1] + 0.5 * width

    gt_height = gt_box[:, 2] - gt_box[:, 0]
    gt_width = gt_box[:, 3] - gt_box[:, 1]
    gt_center_y = gt_box[:, 0] + 0.5 * gt_height
    gt_center_x = gt_box[:, 1] + 0.5 * gt_width

    dy = (gt_center_y - center_y) / height
    dx = (gt_center_x - center_x) / width
    dh = tf.log(gt_height / height)
    dw = tf.log(gt_width / width)

    result = tf.stack([dy, dx, dh, dw], axis=1)
    return result

This computes the (dy, dx, dh, dw) offsets from each training positive to its assigned gt_box; the result has shape (?, 4).
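
A worked numeric example of the refinement, using toy boxes in normalized coordinates:

import numpy as np

box = np.array([0.2, 0.2, 0.6, 0.6])     # y1, x1, y2, x2 -> h = w = 0.4, center (0.4, 0.4)
gt_box = np.array([0.3, 0.3, 0.7, 0.7])  # h = w = 0.4, center (0.5, 0.5)

h, w = box[2] - box[0], box[3] - box[1]
cy, cx = box[0] + 0.5 * h, box[1] + 0.5 * w
gh, gw = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
gcy, gcx = gt_box[0] + 0.5 * gh, gt_box[1] + 0.5 * gw

dy, dx = (gcy - cy) / h, (gcx - cx) / w   # ~0.25, ~0.25: shift by a quarter of the box size
dh, dw = np.log(gh / h), np.log(gw / w)   # 0.0, 0.0: sizes already match
print(dy, dx, dh, dw)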

Assigning masks to the training ROIs

    # Assign positive ROIs to GT masks
    # Permute masks to [N, height, width, 1]
    transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1)
    # Pick the right mask for each ROI
    roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment)

gt_masks has shape (56, 56, ?); transposing and expanding gives transposed_masks with shape (?, 56, 56, 1).

Each training ROI is then assigned its labeled ground-truth mask, giving roi_masks.

    # Compute mask targets
    boxes = positive_rois
    if config.USE_MINI_MASK:
        # Transform ROI coordinates from normalized image space
        # to normalized mini-mask space.
        y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1)
        gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1)
        gt_h = gt_y2 - gt_y1
        gt_w = gt_x2 - gt_x1
        y1 = (y1 - gt_y1) / gt_h
        x1 = (x1 - gt_x1) / gt_w
        y2 = (y2 - gt_y1) / gt_h
        x2 = (x2 - gt_x1) / gt_w
        boxes = tf.concat([y1, x1, y2, x2], 1)
    box_ids = tf.range(0, tf.shape(roi_masks)[0])
    masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes,
                                     box_ids,
                                     config.MASK_SHAPE)
    # Remove the extra dimension from masks.
    masks = tf.squeeze(masks, axis=3)

    # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
    # binary cross entropy loss.
    masks = tf.round(masks)

A training ROI and its target gt box differ and never overlap perfectly, so the ROI region is cropped out of the mask and resized to the 28x28 shape used later; squeeze then removes the extra channel dimension, giving masks of shape (?, 28, 28).

If tf.image.crop_and_resize is unfamiliar, see https://blog.csdn.net/u013066730/article/details/100583484

Resizing produces values other than 0 and 1, but the label must be exactly 0 or 1, so tf.round is applied: values of at least 0.5 become 1, the rest become 0.
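
To see the USE_MINI_MASK coordinate transform in numbers (toy boxes): the ROI is re-expressed relative to its gt box, because the mini-mask covers only the gt box region that crop_and_resize samples from.

# ROI (0.3, 0.3, 0.5, 0.5) sits inside gt box (0.2, 0.2, 0.6, 0.6).
gt_y1, gt_x1, gt_y2, gt_x2 = 0.2, 0.2, 0.6, 0.6
y1, x1, y2, x2 = 0.3, 0.3, 0.5, 0.5
gt_h, gt_w = gt_y2 - gt_y1, gt_x2 - gt_x1   # 0.4, 0.4
box = ((y1 - gt_y1) / gt_h, (x1 - gt_x1) / gt_w,
       (y2 - gt_y1) / gt_h, (x2 - gt_x1) / gt_w)
print(box)   # ~(0.25, 0.25, 0.75, 0.75): the ROI in mini-mask space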

Normalizing the output shapes

    # Append negative ROIs and pad bbox deltas and masks that
    # are not used for negative ROIs with zeros.
    rois = tf.concat([positive_rois, negative_rois], axis=0)
    N = tf.shape(negative_rois)[0]
    P = tf.maximum(config.TRAIN_ROIS_PER_IMAGE - tf.shape(rois)[0], 0)
    rois = tf.pad(rois, [(0, P), (0, 0)])
    roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)])
    roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)])
    deltas = tf.pad(deltas, [(0, N + P), (0, 0)])
    masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)])

    return rois, roi_gt_class_ids, deltas, masks

rois contains both positives and negatives, but we require exactly 200; the computation above may produce fewer, so rois is zero-padded up to 200.

roi_gt_boxes, roi_gt_class_ids, deltas, and masks carry labels only for the positives, so they are padded with N + P zeros and also end up with 200 entries each.
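
Tracing the padding with hypothetical counts, say only 20 candidate positives were found:

positive_count = 20
negative_count = int((1.0 / 0.33) * positive_count) - positive_count   # 60 - 20 = 40
total = positive_count + negative_count   # 60 real ROIs
P = 200 - total                           # 140 zero rows appended to rois
# The label tensors cover only the positives, so they get N + P extra zeros:
print(P, negative_count + P)              # 140 180 -> every output has 200 rows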

The whole DetectionTargetLayer therefore produces the following variables:

rois: (1, 200, 4); the 200 entries consist of three parts: positives, negatives, and zero padding;

target_class_ids: (1, 200); the 200 entries consist of two parts: positives and zero padding;

target_bbox: (1, 200, 4); the 200 entries consist of two parts: positives and zero padding;

target_mask: (1, 200, 28, 28); the 200 entries consist of two parts: positives and zero padding.

fpn_classifier_graph

Back inside the build function:

            # Network Heads
            # TODO: verify that this handles zero padded ROIs
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Two 1024 FC layers (implemented with Conv2D for consistency)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)

    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

Input parameters

rois: (1, 200, 4); the 200 entries consist of positives, negatives, and zero padding;

feature_maps: [P2, P3, P4, P5]; see 【MaskRCNN】Source Code Series 2: train & test feature maps;

image_meta: records how the original image was resized and padded; see https://blog.csdn.net/u013066730/article/details/102501128

pool_size: 7;

num_classes: 81;

train_bn: False;

fc_layers_size: 1024.

The key pieces of this network are introduced first, followed by the overall structure.

PyramidROIAlign

class PyramidROIAlign(KE.Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

The snippet below shows how the layer is used:

    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

pool_size = 7 is set at initialization; the layer then takes rois, image_meta, and feature_maps as inputs.

The main logic lives in call():

    def call(self, inputs):
        # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords
        boxes = inputs[0]

        # Image meta
        # Holds details about the image. See compose_image_meta()
        image_meta = inputs[1]

        # Feature Maps. List of feature maps from different level of the
        # feature pyramid. Each is [batch, height, width, channels]
        feature_maps = inputs[2:]

        # Assign each ROI to a level in the pyramid based on the ROI area.
        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1
        # Use shape of first image. Images in a batch must have the same size.
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Equation 1 in the Feature Pyramid Networks paper. Account for
        # the fact that our coordinates are normalized here.
        # e.g. a 224x224 ROI (in pixels) maps to P4
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32))) # shape is (1,?,1)
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        for i, level in enumerate(range(2, 6)):
            # ix has shape (?, 2): the indices of the ROIs assigned to this level.
            # Column 0 says which image of the batch the ROI comes from (0..batch-1),
            # column 1 is the ROI's index within that image.
            ix = tf.where(tf.equal(roi_level, level))
            # Gather the matching ROIs from every image of the batch
            level_boxes = tf.gather_nd(boxes, ix)

            # Which image of the batch each of these ROIs belongs to.
            # Box indices for crop_and_resize.
            box_indices = tf.cast(ix[:, 0], tf.int32)

            # Keep track of which box is mapped to which level
            box_to_level.append(ix)

            # Stop gradient propogation to ROI proposals
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)

            # Crop and Resize
            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            # e.g. with batch size 3, box_indices might be [0,2,0,0,1,0,1,2,0,2,2,2,1]:
            # 0 means image 0 of the batch
            # level_boxes has shape (?, 4), the same length as box_indices: the ROIs
            # of this level drawn from each image of the batch
            # feature_maps[i] is the feature map of the matching level
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))

        # Pack pooled features into one tensor, shape (?, 7, 7, 256)
        pooled = tf.concat(pooled, axis=0)

        # Pack box_to_level mapping into one array and add another
        # column representing the order of pooled boxes
        # e.g. box_to_level might be [[0,2],[0,13],[0,21],[1,5],[2,8],[2,10]...];
        # after this step it becomes [[0,2,0],[0,13,1],[0,21,2],[1,5,3],[2,8,4],[2,10,5]...]
        box_to_level = tf.concat(box_to_level, axis=0)  # concatenate the four (?, 2) pieces
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)

        # Rearrange pooled features to match the order of the original boxes
        # Sort box_to_level by batch then box index
        # TF doesn't have a way to sort by two columns, so merge them and sort.
        # sorting_tensor spreads the batch's images apart: image 0 keeps its
        # box index, image 1 adds 100000, image 2 adds 200000, and so on
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sorting ascending orders the boxes first by image, then by box index
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
            box_to_level)[0]).indices[::-1]
        # Take the pooled-order column (column 2) in that sorted order
        ix = tf.gather(box_to_level[:, 2], ix)
        # Then reorder pooled to match the original box order
        pooled = tf.gather(pooled, ix)

        # Re-add the batch dimension
        # shape = (images in the batch, 200 ROIs per image, 7, 7, 256)
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        # Reshape, giving the final output shape (b, 200, 7, 7, 256)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )

First, an important question: the rois we now have (after the RPN, most heights and widths no longer match any fixed anchor) were generated from the FPN levels (P2-P5), and each of P2-P5 is assigned different anchor sizes, e.g. (16, 32, 64, 128, 256). So which feature map (the one behind P2-P5) should a given roi be pooled from, i.e. on which feature map should we run ROIAlign? Because the anchors were generated in bulk without recording their source feature map, the FPN paper proposes an approximate assignment (Equation 1 of the paper):

k = floor(k0 + log2(sqrt(w * h) / 224)), with k0 = 4,

so, for example, a 224x224 ROI (in pixels) maps to P4.

With that settled, the code reads much more clearly. For the ROIAlign itself, the code directly calls tf.image.crop_and_resize; a good post on roi_pooling: https://blog.deepsense.ai/region-of-interest-pooling-explained/

The line roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) adapts the paper's formula slightly for normalized coordinates: h and w are fractions of the image, so sqrt(h * w) is compared against 224 / tf.sqrt(image_area), i.e. a 224-pixel box expressed in normalized units; the level then becomes min(5, max(2, 4 + round(roi_level))).
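
A small sanity check of the level assignment, assuming a hypothetical 1024x1024 input image:

import math

# A ROI that is 224x224 pixels, i.e. h = w = 224/1024 in normalized units.
image_area = 1024.0 * 1024.0
h = w = 224.0 / 1024.0
roi_level = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
level = min(5, max(2, 4 + int(round(roi_level))))
print(roi_level, level)   # 0.0 4 -> a 224x224 ROI maps to P4, as the code comment says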

TimeDistributed

For details, see https://blog.csdn.net/u013066730/article/details/100737059

After PyramidROIAlign the tensor has shape (?, 200, 7, 7, 256). A plain Conv2D cannot cope with the extra ROI dimension, and a 3D convolution is not the intent either: the goal is to apply one and the same convolution to each of the 200 ROIs. TimeDistributed does exactly that, sharing the wrapped layer's weights across the ROI dimension.
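
A minimal sketch with tf.keras (shapes only, weights untrained) showing one Conv2D shared across the 200-ROI axis:

from tensorflow import keras
KL = keras.layers

# The same 7x7, 256->1024 convolution is applied to each of the 200 ROIs.
inp = KL.Input(shape=(200, 7, 7, 256))
out = KL.TimeDistributed(KL.Conv2D(1024, (7, 7), padding="valid"))(inp)
model = keras.Model(inp, out)
print(model.output_shape)   # (None, 200, 1, 1, 1024)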

Now trace how the tensor shapes change through the whole head (batch size set to 1 in the example below):

Layer                                        | input_shape                        | output_shape
PyramidROIAlign                              | [rois, image_meta] + feature_maps  | (1, 200, 7, 7, 256)
TimeDistributed-Conv2D (k=7, i=256, o=1024)  | (1, 200, 7, 7, 256)                | (1, 200, 1, 1, 1024)
TimeDistributed-BN                           | (1, 200, 1, 1, 1024)               | (1, 200, 1, 1, 1024)
ReLU                                         | (1, 200, 1, 1, 1024)               | (1, 200, 1, 1, 1024)
TimeDistributed-Conv2D (k=1, i=1024, o=1024) | (1, 200, 1, 1, 1024)               | (1, 200, 1, 1, 1024)
TimeDistributed-BN                           | (1, 200, 1, 1, 1024)               | (1, 200, 1, 1, 1024)
ReLU                                         | (1, 200, 1, 1, 1024)               | (1, 200, 1, 1, 1024)
squeeze                                      | (1, 200, 1, 1, 1024)               | shared (1, 200, 1024)
TimeDistributed-Dense                        | shared (1, 200, 1024)              | mrcnn_class_logits (1, 200, 81)
TimeDistributed-Softmax                      | mrcnn_class_logits (1, 200, 81)    | mrcnn_class_probs (1, 200, 81)
TimeDistributed-Dense                        | shared (1, 200, 1024)              | x (1, 200, 81*4)
Reshape                                      | x (1, 200, 81*4)                   | mrcnn_bbox (1, 200, 81, 4)

The final outputs are mrcnn_class_logits (1, 200, 81), mrcnn_class_probs (1, 200, 81), and mrcnn_bbox (1, 200, 81, 4).

 
