[MaskRCNN] Source Code Series 1: Training Data Processing, Part 3

mjiansun

Categories: Keras, Paper Notes

Article link: https://blog.csdn.net/u013066730/article/details/102504951

data_generator (continuing from Data Processing Part 2, which left part of data_generator uncovered)

build_rpn_targets

            rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors,
                                                    gt_class_ids, gt_boxes, config)

The build_rpn_targets function is defined in mrcnn/model.py.
def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):
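Input parameters

image_shape: the shape of the preprocessed input image (not the image itself); here it is (1024, 1024, 3).

anchors: the predefined anchor boxes, with shape (261888, 4).

gt_class_ids: the class ID of each instance in the image, so its length equals the number of instances. Continuing the example from Data Processing Part 2, its shape is again (26,).

gt_boxes: the box coordinates of the 26 instances, with shape (26, 4).

config: the configuration object set up earlier.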
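
The anchor count of 261888 can be reproduced with a little arithmetic. The sketch below assumes feature-map strides of 4, 8, 16, 32, and 64, three aspect ratios per location, and an anchor stride of 1; none of these values are stated in this article, so check them against your own config.

```python
# Assumed values (not stated in this article): five pyramid levels with
# these strides, 3 anchor ratios per location, anchor stride 1.
image_size = 1024
backbone_strides = [4, 8, 16, 32, 64]
anchors_per_location = 3

# Each level has (image_size // stride)^2 locations.
locations_per_level = [(image_size // s) ** 2 for s in backbone_strides]
num_anchors = sum(locations_per_level) * anchors_per_location
print(locations_per_level, num_anchors)
```

Under these assumptions the five levels contribute 65536, 16384, 4096, 1024, and 256 locations, which matches the first dimension of anchors.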
    # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
    rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
    # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
    rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))
Initialization: rpn_match has shape (261888,), and rpn_bbox has shape (256, 4).
    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = np.where(gt_class_ids < 0)[0]
    if crowd_ix.shape[0] > 0:
        # Filter out crowds from ground truth class IDs and boxes
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        gt_class_ids = gt_class_ids[non_crowd_ix]
        gt_boxes = gt_boxes[non_crowd_ix]
        # Compute overlaps with crowd boxes [anchors, crowds]
        crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
        crowd_iou_max = np.amax(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)
    else:
        # All anchors don't intersect a crowd
        no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)
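This block checks whether the ground truth contains COCO crowd boxes, that is, boxes drawn around several instances and marked with a negative class ID. Crowd boxes are removed from the ground truth, and anchors that overlap them are kept out of the negative set. The details are not covered here, because such data rarely appears in practice.

In this example no_crowd_bool is all True: there are no crowd boxes, so every anchor can be used.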
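
For readers who do have crowd annotations, here is a minimal sketch of the index filtering on toy data. The class IDs and boxes are made up for illustration; only the np.where pattern comes from the code above.

```python
import numpy as np

# Hypothetical toy data: a class ID of -1 marks a crowd box.
gt_class_ids = np.array([3, -1, 7])
gt_boxes = np.array([[0, 0, 10, 10],
                     [20, 20, 60, 60],
                     [5, 5, 15, 15]])

crowd_ix = np.where(gt_class_ids < 0)[0]      # rows that are crowd boxes
non_crowd_ix = np.where(gt_class_ids > 0)[0]  # rows kept for training

crowd_boxes = gt_boxes[crowd_ix]
kept_class_ids = gt_class_ids[non_crowd_ix]
kept_boxes = gt_boxes[non_crowd_ix]
```

After filtering, only the two ordinary instances remain as ground truth, while crowd_boxes is used solely to find anchors that should not become negatives.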
    overlaps = utils.compute_overlaps(anchors, gt_boxes)
compute_overlaps is defined in mrcnn/utils.py:
def compute_overlaps(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].

    For better performance, pass the largest set first and the smaller second.
    """
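Input parameters

boxes1: the predefined anchor boxes for one image, with shape (261888, 4).

boxes2: the box coordinates of each instance in the image; with 26 instances, the shape is (26, 4).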
    # Areas of anchors and GT boxes
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Compute overlaps to generate matrix [boxes1 count, boxes2 count]
    # Each cell contains the IoU value.
    overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
    for i in range(overlaps.shape[1]):
        box2 = boxes2[i]
        overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1)
    return overlaps
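Here area1 holds the area of each predefined anchor box, and area2 holds the area of the bounding rectangle of each instance in the image.

overlaps has shape (261888, 26): each row holds the IoU between one anchor box and each of the 26 gt_boxes.

Now look at compute_iou, which the loop above calls once per gt_box. Its code and input parameters are as follows: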
def compute_iou(box, boxes, box_area, boxes_area):
    """Calculates IoU of the given box with the array of the given boxes.
    box: 1D vector [y1, x1, y2, x2]
    boxes: [boxes_count, (y1, x1, y2, x2)]
    box_area: float. the area of 'box'
    boxes_area: array of length boxes_count.

    Note: the areas are passed in rather than calculated here for
    efficiency. Calculate once in the caller to avoid duplicate work.
    """
    # Calculate intersection areas
    y1 = np.maximum(box[0], boxes[:, 0])
    y2 = np.minimum(box[2], boxes[:, 2])
    x1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    union = box_area + boxes_area[:] - intersection[:]
    iou = intersection / union
    return iou
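
To check the IoU arithmetic by hand, the following sketch computes the same kind of overlap matrix on three toy anchors and one toy ground-truth box. It is not the library code: iou_matrix is a hypothetical helper that uses NumPy broadcasting instead of the per-column loop shown above.

```python
import numpy as np

def iou_matrix(boxes1, boxes2):
    """Pairwise IoU. Boxes are [y1, x1, y2, x2].
    Returns an array of shape [len(boxes1), len(boxes2)]."""
    boxes1 = np.asarray(boxes1, dtype=np.float64)
    boxes2 = np.asarray(boxes2, dtype=np.float64)
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    # Broadcast [N, 1] against [1, M] to get every pairwise intersection.
    y1 = np.maximum(boxes1[:, None, 0], boxes2[None, :, 0])
    x1 = np.maximum(boxes1[:, None, 1], boxes2[None, :, 1])
    y2 = np.minimum(boxes1[:, None, 2], boxes2[None, :, 2])
    x2 = np.minimum(boxes1[:, None, 3], boxes2[None, :, 3])
    # Clamp at 0 so disjoint boxes get an empty intersection.
    intersection = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    union = area1[:, None] + area2[None, :] - intersection
    return intersection / union

toy_anchors = np.array([[0, 0, 10, 10],    # identical to the GT box
                        [0, 5, 10, 15],    # shifted right by half a width
                        [20, 20, 30, 30]]) # disjoint from the GT box
toy_gt_boxes = np.array([[0, 0, 10, 10]])
toy_overlaps = iou_matrix(toy_anchors, toy_gt_boxes)
```

The shifted anchor shares an intersection of 50 with a union of 150, giving an IoU of 1/3; the identical anchor gives 1 and the disjoint one gives 0.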
    # 1. Set negative anchors first. They get overwritten below if a GT box is
    # matched to them. Skip boxes in crowd areas.
    anchor_iou_argmax = np.argmax(overlaps, axis=1)
    anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
    rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
    # 2. Set an anchor for each GT box (regardless of IoU value).
    # If multiple anchors have the same IoU match all of them
    gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:,0]
    rpn_match[gt_iou_argmax] = 1
    # 3. Set anchors with high overlap as positive.
    rpn_match[anchor_iou_max >= 0.7] = 1
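anchor_iou_argmax is, for each anchor box, the index of the gt_box it overlaps most, so its shape is (261888,).

anchor_iou_max uses those indices to pick out each anchor box's largest IoU over all gt_boxes; its shape is also (261888,).

Step 1 sets rpn_match to -1 wherever anchor_iou_max is below 0.3 and the anchor is not in a crowd area. At this point rpn_match contains only 0 and -1, with shape (261888,). The -1 entries are the negative candidates.

gt_iou_argmax: each gt_box has a single maximum IoU, but more than one anchor box can reach that maximum. That is why its shape in this example is (99,) rather than (26,).

Step 2 sets rpn_match to 1 at the indices in gt_iou_argmax, marking those anchor boxes as positives. This guards against a gt_box whose best IoU with any anchor is below 0.7: without this step no anchor would be matched to it, and the instance would not take part in training.

Step 3 sets rpn_match to 1 wherever anchor_iou_max is greater than or equal to 0.7. These are also positives.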
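
The three matching steps are easier to follow on a tiny IoU matrix. The values below are invented for illustration, and the crowd mask is left out for brevity; the matching lines themselves mirror the code above.

```python
import numpy as np

# Hypothetical IoU matrix: 5 anchors (rows) x 2 GT boxes (columns).
overlaps = np.array([
    [0.10, 0.05],  # low IoU with both GT boxes -> negative
    [0.75, 0.20],  # IoU >= 0.7 with GT 0 -> positive
    [0.40, 0.50],  # best anchor for GT 1 although IoU < 0.7 -> positive
    [0.35, 0.50],  # ties for best anchor of GT 1 -> also positive
    [0.45, 0.30],  # between 0.3 and 0.7, not a best match -> neutral
])
rpn_match = np.zeros(overlaps.shape[0], dtype=np.int32)

# Step 1: negatives.
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[anchor_iou_max < 0.3] = -1
# Step 2: best anchor(s) for every GT box, ties included.
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
rpn_match[gt_iou_argmax] = 1
# Step 3: any anchor with high overlap.
rpn_match[anchor_iou_max >= 0.7] = 1
```

Here gt_iou_argmax has three entries for two GT boxes because of the tie, which is the same effect that produces (99,) instead of (26,) in the walkthrough.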
    # Subsample to balance positive and negative anchors
    # Don't let positives be more than half the anchors
    ids = np.where(rpn_match == 1)[0]
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
    # Same for negative proposals
    ids = np.where(rpn_match == -1)[0]
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE -
                        np.sum(rpn_match == 1))
    if extra > 0:
        # Rest the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
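The first ids holds the indices where rpn_match equals 1, that is, the positive anchors. Suppose there are 121 positives; its shape is then (121,).

extra = number of positives - (256 // 2). With 121 positives, extra is -7, so the if branch is skipped.

What does the branch do when it runs? It balances the samples. Only 256 anchors contribute to the loss, and positives must not dominate, so any positives beyond 128 are reset to 0. This keeps positives to at most half of the sampled anchors.

The second ids holds the indices of the negative anchors. With, say, 258003 negatives, its shape is (258003,).

extra = number of negatives - (256 - number of positives). With 121 positives, extra is 258003 - 135 = 257868.

The condition holds, so the branch runs and resets those extra negatives in rpn_match to 0, which means they do not contribute to the loss.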
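
The counts in this example can be verified with a short sketch. The helper name subsample is hypothetical, and it draws from np.random.default_rng rather than the global np.random.choice so that the run is reproducible; otherwise it follows the same two-stage logic.

```python
import numpy as np

def subsample(rpn_match, anchors_per_image, rng):
    """Return a copy with at most anchors_per_image non-zero entries,
    at most half of which are positive."""
    rpn_match = rpn_match.copy()
    # Cap the positives at half of the budget.
    ids = np.where(rpn_match == 1)[0]
    extra = len(ids) - anchors_per_image // 2
    if extra > 0:
        rpn_match[rng.choice(ids, extra, replace=False)] = 0
    # Fill the rest of the budget with negatives.
    ids = np.where(rpn_match == -1)[0]
    extra = len(ids) - (anchors_per_image - np.sum(rpn_match == 1))
    if extra > 0:
        rpn_match[rng.choice(ids, extra, replace=False)] = 0
    return rpn_match

rng = np.random.default_rng(0)
# Same counts as the walkthrough: 121 positives, 258003 negatives, rest neutral.
rpn_match = np.zeros(261888, dtype=np.int32)
rpn_match[:121] = 1
rpn_match[121:121 + 258003] = -1
balanced = subsample(rpn_match, 256, rng)
print(np.sum(balanced == 1), np.sum(balanced == -1))
```

All 121 positives survive, and 135 negatives are kept to fill the budget of 256.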
    ids = np.where(rpn_match == 1)[0]
    ix = 0  # index into rpn_bbox
    # TODO: use box_refinement() rather than duplicating the code here
    for i, a in zip(ids, anchors[ids]):
        # Closest gt box (it might have IoU < 0.7)
        gt = gt_boxes[anchor_iou_argmax[i]]

        # Convert coordinates to center plus width/height.
        # GT Box
        gt_h = gt[2] - gt[0]
        gt_w = gt[3] - gt[1]
        gt_center_y = gt[0] + 0.5 * gt_h
        gt_center_x = gt[1] + 0.5 * gt_w
        # Anchor
        a_h = a[2] - a[0]
        a_w = a[3] - a[1]
        a_center_y = a[0] + 0.5 * a_h
        a_center_x = a[1] + 0.5 * a_w

        # Compute the bbox refinement that the RPN should predict.
        rpn_bbox[ix] = [
            (gt_center_y - a_center_y) / a_h,
            (gt_center_x - a_center_x) / a_w,
            np.log(gt_h / a_h),
            np.log(gt_w / a_w),
        ]
        # Normalize
        rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
        ix += 1
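ids holds the indices of the positive anchors; following the example above, its shape is (121,).

i is the index of one positive anchor among the 261888 anchor boxes.

a is the coordinates of that positive anchor, taken from the 261888 anchor boxes at index i.

gt is the coordinates of the ground-truth box matched to this positive anchor, that is, the gt_box it overlaps most.

gt_h, gt_w, gt_center_y, gt_center_x are the height, width, and center coordinates of the ground-truth box.

a_h, a_w, a_center_y, a_center_x are the height, width, and center coordinates of the anchor box.

What is the computation that follows? It is the standard bounding-box regression parameterization: the center offsets are normalized by the anchor size, and the size ratios are taken in log space, mainly so that training converges more easily.

This walkthrough uses config.RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]. Dividing by these values scales the targets up by factors of 10 and 5, presumably so that they are not too small to be learned well.

The loop continues until every positive anchor has its regression target.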
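
A worked example makes the encoding concrete. In the sketch below, encode_box repeats the arithmetic of the loop body for a single anchor, and decode_box is a hypothetical inverse added only to show that the target maps the anchor back onto the ground-truth box. The box coordinates are invented.

```python
import numpy as np

RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])  # value quoted in this walkthrough

def encode_box(anchor, gt, std_dev=RPN_BBOX_STD_DEV):
    """Target (dy, dx, log(dh), log(dw)) that maps anchor onto gt."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gt_h, gt_w = gt[2] - gt[0], gt[3] - gt[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    gt_cy, gt_cx = gt[0] + 0.5 * gt_h, gt[1] + 0.5 * gt_w
    delta = np.array([(gt_cy - a_cy) / a_h, (gt_cx - a_cx) / a_w,
                      np.log(gt_h / a_h), np.log(gt_w / a_w)])
    return delta / std_dev

def decode_box(anchor, target, std_dev=RPN_BBOX_STD_DEV):
    """Hypothetical inverse of encode_box: apply a target to an anchor."""
    dy, dx, dh, dw = target * std_dev
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * a_h + dy * a_h
    cx = anchor[1] + 0.5 * a_w + dx * a_w
    h, w = a_h * np.exp(dh), a_w * np.exp(dw)
    return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])

anchor = np.array([0.0, 0.0, 10.0, 10.0])
gt = np.array([2.0, 2.0, 12.0, 22.0])
target = encode_box(anchor, gt)
restored = decode_box(anchor, target)
```

For this pair the unnormalized deltas are (0.2, 0.7, 0, log 2), so the normalized target is (2, 7, 0, log 2 / 0.2), and decoding it returns the original ground-truth box.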
    return rpn_match, rpn_bbox
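rpn_match: contains the three values -1, 0, and 1, where -1 marks a negative anchor, 0 an anchor that is ignored, and 1 a positive anchor. Its shape is (261888,).

rpn_bbox: contains two kinds of rows. The leading rows, one per positive anchor, hold the targets computed from the center coordinates and the height and width. The remaining rows are padding and hold four zeros. The final shape is therefore (256, 4).

Because random_rois is 0, every branch that depends on it is skipped.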

Back in data_generator:

            # Init batch arrays
            if b == 0:
                batch_image_meta = np.zeros(
                    (batch_size,) + image_meta.shape, dtype=image_meta.dtype)
                batch_rpn_match = np.zeros(
                    [batch_size, anchors.shape[0], 1], dtype=rpn_match.dtype)
                batch_rpn_bbox = np.zeros(
                    [batch_size, config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4], dtype=rpn_bbox.dtype)
                batch_images = np.zeros(
                    (batch_size,) + image.shape, dtype=np.float32)
                batch_gt_class_ids = np.zeros(
                    (batch_size, config.MAX_GT_INSTANCES), dtype=np.int32)
                batch_gt_boxes = np.zeros(
                    (batch_size, config.MAX_GT_INSTANCES, 4), dtype=np.int32)
                batch_gt_masks = np.zeros(
                    (batch_size, gt_masks.shape[0], gt_masks.shape[1],
                     config.MAX_GT_INSTANCES), dtype=gt_masks.dtype)

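batch_image_meta: shape (1, 93); stores the image metadata, including how the image was transformed from its original size to the target size (1024, 1024).

batch_rpn_match: shape (1, 261888, 1).

batch_rpn_bbox: shape (1, 256, 4), where 256 is the number of RPN anchors used for training per image.

batch_images: shape (1, 1024, 1024, 3).

batch_gt_class_ids: shape (1, 100), where 100 is the maximum number of instances kept per image.

batch_gt_boxes: shape (1, 100, 4).

batch_gt_masks: shape (1, 56, 56, 100).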

            # If more instances than fits in the array, sub-sample from them.
            if gt_boxes.shape[0] > config.MAX_GT_INSTANCES:
                ids = np.random.choice(
                    np.arange(gt_boxes.shape[0]), config.MAX_GT_INSTANCES, replace=False)
                gt_class_ids = gt_class_ids[ids]
                gt_boxes = gt_boxes[ids]
                gt_masks = gt_masks[:, :, ids]

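In this example the image has only 26 instances, fewer than 100, so this branch is skipped. If there were more than 100, a random subset of 100 would be kept and the rest discarded.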

            batch_image_meta[b] = image_meta
            batch_rpn_match[b] = rpn_match[:, np.newaxis]
            batch_rpn_bbox[b] = rpn_bbox
            batch_images[b] = mold_image(image.astype(np.float32), config)
            batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
            batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes
            batch_gt_masks[b, :, :, :gt_masks.shape[-1]] = gt_masks

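The code above is plain assignment. Note that batch_gt_class_ids, batch_gt_boxes, and batch_gt_masks are filled only for as many instances as exist; the remaining entries stay 0, and that zero padding is stripped out later.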
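
The padding behaviour can be sketched with NumPy alone. The instance data below is made up, and the final line is only one possible way to identify the padded rows; it is an assumption for illustration, not the code the model uses later.

```python
import numpy as np

MAX_GT_INSTANCES = 100  # value used in this walkthrough
batch_size = 1

# Hypothetical ground truth for one image with 26 instances.
gt_class_ids = np.arange(1, 27, dtype=np.int32)
gt_boxes = np.ones((26, 4), dtype=np.int32)

batch_gt_class_ids = np.zeros((batch_size, MAX_GT_INSTANCES), dtype=np.int32)
batch_gt_boxes = np.zeros((batch_size, MAX_GT_INSTANCES, 4), dtype=np.int32)

b = 0
# Only the first 26 slots are written; slots 26..99 stay zero.
batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes

# One way to find the real rows again: a box of all zeros is padding.
valid = np.any(batch_gt_boxes[b] != 0, axis=1)
```

Fixed-size arrays let every image in a batch share one shape regardless of how many instances it contains.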

First, a look at the mold_image function:
def mold_image(images, config):
    """Expects an RGB image (or array of images) and subtracts
    the mean pixel and converts it to float. Expects image
    colors in RGB order.
    """
    return images.astype(np.float32) - config.MEAN_PIXEL
It simply subtracts the mean pixel value. This also helps convergence, although in my view the effect is not large.
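
As a small illustration of mean subtraction, the sketch below uses an assumed RGB mean of [123.7, 116.8, 103.9]; this article does not state the value of config.MEAN_PIXEL, so substitute the one from your config. The unmold_image helper here is a hypothetical inverse for checking the round trip.

```python
import numpy as np

# Assumed RGB mean; not given in this article.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

def mold_image(images, mean_pixel=MEAN_PIXEL):
    """Convert to float and subtract the per-channel mean."""
    return images.astype(np.float32) - mean_pixel

def unmold_image(normalized, mean_pixel=MEAN_PIXEL):
    """Hypothetical inverse: add the mean back and return uint8 pixels."""
    return np.rint(normalized + mean_pixel).astype(np.uint8)

image = np.full((2, 2, 3), 128, dtype=np.uint8)
molded = mold_image(image)
restored_image = unmold_image(molded)
```

The mean is broadcast over the last axis, so the same call works for a single image or a batch of images.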

            # Batch full?
            if b >= batch_size:
                inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox,
                          batch_gt_class_ids, batch_gt_boxes, batch_gt_masks]
                outputs = []

                if random_rois:
                    inputs.extend([batch_rpn_rois])
                    if detection_targets:
                        inputs.extend([batch_rois])
                        # Keras requires that output and targets have the same number of dimensions
                        batch_mrcnn_class_ids = np.expand_dims(
                            batch_mrcnn_class_ids, -1)
                        outputs.extend(
                            [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask])

                yield inputs, outputs

                # start a new batch
                b = 0

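This checks whether the batch has reached the configured batch_size (the counter b is incremented after each image is stored, in code not shown above). Since batch_size is 1 here, the condition holds after the first image; with a batch_size of 2, the steps above would run once more before it does.

Inside the branch, the arrays are gathered into inputs and outputs and yielded. Do not forget that b is reset to 0 afterwards.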
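
The fill, yield, and reset pattern can be reduced to a few lines. The generator below is a simplified sketch with a hypothetical name, batch_generator; it handles a single array per sample and stops when the samples run out, whereas the generator discussed here yields several arrays per batch.

```python
import numpy as np

def batch_generator(samples, batch_size):
    """Group samples into batches and yield (inputs, outputs) pairs."""
    b = 0  # index of the next free slot in the current batch
    batch = None
    for sample in samples:
        if b == 0:
            # Allocate a fresh array at the start of every batch.
            batch = np.zeros((batch_size,) + sample.shape, dtype=sample.dtype)
        batch[b] = sample
        b += 1
        # Batch full?
        if b >= batch_size:
            yield [batch], []
            b = 0  # start a new batch

samples = [np.full((2, 2), i, dtype=np.float32) for i in range(4)]
batches = list(batch_generator(samples, batch_size=2))
```

With four samples and a batch size of 2, the generator yields two batches, each holding an input array of shape (2, 2, 2) and an empty outputs list.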