[MaskRCNN] Source Code Series 1: Train Data Processing, Part 3

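Contents

data_generator (continued from Data Processing Part 2, which left part of data_generator uncovered)

build_rpn_targets


data_generator (continued from Data Processing Part 2, which left part of data_generator uncovered)

build_rpn_targets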

            rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors,
                                                    gt_class_ids, gt_boxes, config)

The build_rpn_targets function is defined in mrcnn/model.py.

def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):

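Input parameters

image_shape: the shape of the preprocessed input image, here (1024, 1024, 3);

anchors: the predefined anchor boxes, with shape (261888, 4);

gt_class_ids: one class ID per instance in the image, so its length equals the number of instances. Continuing the example from Data Processing Part 2, the shape here is still (26,);

gt_boxes: the box coordinates of the 26 instances, with shape (26, 4);

config: the configuration object set up earlier.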
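As a sanity check on the 261888 figure, the anchor count can be reproduced with a few lines of arithmetic. This is only a sketch: the strides, the three anchors per location, and an anchor stride of 1 are assumptions about the configuration in use and are not shown in this article, so adjust them if your config differs.

```python
# Assumed settings (not shown in this article): a 1024x1024 input,
# five feature-pyramid levels with these strides, three aspect ratios
# per location, and one anchor position per feature-map cell.
image_size = 1024
backbone_strides = [4, 8, 16, 32, 64]
anchors_per_location = 3

# Side length of the feature map at each pyramid level.
feature_sizes = [image_size // s for s in backbone_strides]

# One set of anchors at every cell of every level.
num_anchors = sum(anchors_per_location * f * f for f in feature_sizes)
print(feature_sizes, num_anchors)
```

Under these assumptions the feature maps are 256, 128, 64, 32 and 16 cells wide, and the total matches the first dimension of anchors.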

    # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
    rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
    # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
    rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))

Initialization: rpn_match has shape (261888,) and rpn_bbox has shape (256, 4).

 

    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = np.where(gt_class_ids < 0)[0]
    if crowd_ix.shape[0] > 0:
        # Filter out crowds from ground truth class IDs and boxes
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        gt_class_ids = gt_class_ids[non_crowd_ix]
        gt_boxes = gt_boxes[non_crowd_ix]
        # Compute overlaps with crowd boxes [anchors, crowds]
        crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
        crowd_iou_max = np.amax(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)
    else:
        # All anchors don't intersect a crowd
        no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)

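This block handles COCO crowd annotations. A crowd box surrounds several instances and is marked with a negative class ID; such boxes are removed from the ground truth, and anchors that overlap them are not used as negatives. The details are skipped here, because this kind of data does not usually appear.

In this example no_crowd_bool is all True, meaning there are no crowd boxes and every anchor can be used.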

 

    overlaps = utils.compute_overlaps(anchors, gt_boxes)

compute_overlaps is defined in mrcnn/utils.py:

def compute_overlaps(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].

    For better performance, pass the largest set first and the smaller second.
    """

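Input parameters

boxes1: the predefined anchor boxes for one image, with shape (261888, 4);

boxes2: the box coordinates of each instance in the image. With 26 instances, the shape is (26, 4).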

    # Areas of anchors and GT boxes
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Compute overlaps to generate matrix [boxes1 count, boxes2 count]
    # Each cell contains the IoU value.
    overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
    for i in range(overlaps.shape[1]):
        box2 = boxes2[i]
        overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1)
    return overlaps

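Here area1 holds the area of each predefined anchor box, and area2 holds the area of the bounding rectangle of each instance in the image.

overlaps then has shape (261888, 26): each row gives the IoU between one anchor box and each of the 26 gt_boxes.

Now let's look at compute_iou, which compute_overlaps calls once per gt_box with that box, all anchor boxes, and their precomputed areas: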

def compute_iou(box, boxes, box_area, boxes_area):
    """Calculates IoU of the given box with the array of the given boxes.
    box: 1D vector [y1, x1, y2, x2]
    boxes: [boxes_count, (y1, x1, y2, x2)]
    box_area: float. the area of 'box'
    boxes_area: array of length boxes_count.

    Note: the areas are passed in rather than calculated here for
    efficiency. Calculate once in the caller to avoid duplicate work.
    """
    # Calculate intersection areas
    y1 = np.maximum(box[0], boxes[:, 0])
    y2 = np.minimum(box[2], boxes[:, 2])
    x1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    union = box_area + boxes_area[:] - intersection[:]
    iou = intersection / union
    return iou
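To make the IoU computation concrete, here is a small self-contained sketch that repeats the same logic on toy boxes. The helper name iou_matrix and the box values are made up for illustration; it is not the library function, just a compact restatement of the two functions above.

```python
import numpy as np

def iou_matrix(boxes1, boxes2):
    """Hypothetical helper: IoU between every box in boxes1 and boxes2.
    Boxes are [y1, x1, y2, x2]. Returns shape [len(boxes1), len(boxes2)]."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
    for i in range(boxes2.shape[0]):
        box = boxes2[i]
        # Corners of the intersection rectangle.
        y1 = np.maximum(box[0], boxes1[:, 0])
        y2 = np.minimum(box[2], boxes1[:, 2])
        x1 = np.maximum(box[1], boxes1[:, 1])
        x2 = np.minimum(box[3], boxes1[:, 3])
        # Clamp at 0 so disjoint boxes get zero intersection.
        inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
        overlaps[:, i] = inter / (area2[i] + area1 - inter)
    return overlaps

# Three toy anchors against one toy GT box.
toy_anchors = np.array([[0, 0, 10, 10],     # identical to the GT box
                        [0, 5, 10, 15],     # shifted right by half its width
                        [20, 20, 30, 30]],  # disjoint
                       dtype=np.float32)
toy_gt = np.array([[0, 0, 10, 10]], dtype=np.float32)
toy_overlaps = iou_matrix(toy_anchors, toy_gt)
print(toy_overlaps.ravel())
```

The identical box gets IoU 1, the half-shifted box gets 50 / 150 = 1/3, and the disjoint box gets 0.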

 

    # 1. Set negative anchors first. They get overwritten below if a GT box is
    # matched to them. Skip boxes in crowd areas.
    anchor_iou_argmax = np.argmax(overlaps, axis=1)
    anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
    rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
    # 2. Set an anchor for each GT box (regardless of IoU value).
    # If multiple anchors have the same IoU match all of them
    gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:,0]
    rpn_match[gt_iou_argmax] = 1
    # 3. Set anchors with high overlap as positive.
    rpn_match[anchor_iou_max >= 0.7] = 1

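anchor_iou_argmax is, for each anchor box, the index of the gt_box it overlaps most, so its shape is (261888,).

anchor_iou_max uses the indices in anchor_iou_argmax to pick out each anchor box's largest IoU over all gt_boxes. Its shape is (261888,).

rpn_match is set to -1 wherever anchor_iou_max is below 0.3 and the anchor is not in a crowd area. At this point rpn_match contains only 0 and -1, with shape (261888,). The -1 entries are the negative candidates.

gt_iou_argmax: each gt_box has exactly one maximum IoU, but that maximum may be reached by several anchor boxes rather than just one, so in this example the final shape is (99,) rather than (26,).

rpn_match is set to 1 at the indices in gt_iou_argmax, marking those anchor boxes as positives. This step guards against a gt_box whose best IoU with any anchor is below 0.7: without it, no anchor would be matched to that gt_box and the instance would not take part in training.

rpn_match is also set to 1 wherever anchor_iou_max is at least 0.7; these anchors are positives as well.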
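The three matching rules are easier to follow on a tiny IoU matrix. The sketch below reuses the same statements on made-up values (5 anchors, 2 GT boxes) and leaves out the crowd mask for brevity.

```python
import numpy as np

# Toy IoU matrix: 5 anchors (rows) x 2 GT boxes (columns). Values are made up.
overlaps = np.array([
    [0.10, 0.05],  # low IoU everywhere                  -> negative
    [0.75, 0.20],  # IoU >= 0.7 with GT 0                -> positive
    [0.40, 0.25],  # ties for best anchor of GT 1        -> positive
    [0.40, 0.15],  # between thresholds, best for nobody -> neutral
    [0.00, 0.25],  # ties for best anchor of GT 1        -> positive
])
rpn_match = np.zeros(overlaps.shape[0], dtype=np.int32)

# 1. Negatives: best IoU below 0.3 (crowd mask omitted in this sketch).
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[anchor_iou_max < 0.3] = -1

# 2. Every GT box keeps its best anchor(s), whatever the IoU value.
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
rpn_match[gt_iou_argmax] = 1

# 3. High-overlap anchors are positive.
rpn_match[anchor_iou_max >= 0.7] = 1

print(gt_iou_argmax, rpn_match)
```

GT 1 never reaches 0.7 here, yet anchors 2 and 4 are still marked positive, and anchor 4 is first labeled negative and then overwritten. With two tied anchors, gt_iou_argmax has 3 entries for 2 GT boxes, which is the same effect that produces (99,) instead of (26,) in the article's example.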

 

    # Subsample to balance positive and negative anchors
    # Don't let positives be more than half the anchors
    ids = np.where(rpn_match == 1)[0]
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
    # Same for negative proposals
    ids = np.where(rpn_match == -1)[0]
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE -
                        np.sum(rpn_match == 1))
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0

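ids holds the indices where rpn_match equals 1, i.e. the positives. Suppose there are 121 positives; its shape is then (121,).

extra = number of positives - (256 // 2). With 121 positives, extra is -7, so the if branch is not entered.

What does the branch do? It balances the data. Only 256 anchors contribute to the loss, and positives must not dominate, so the branch resets the positives beyond 128 to 0 (chosen at random). This keeps positives at no more than half of the anchors that are used.

ids then holds the indices of the negatives. With 258003 of them, for example, the shape is (258003,).

extra = number of negatives - (256 - number of positives). With 121 positives, extra is 258003 - 135 = 257868.

The condition holds, so the branch is entered and these extra negatives are reset to 0 in rpn_match, which means they do not contribute to the loss.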
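The balancing step can be tried on a small label vector. In this sketch anchors_per_image stands in for config.RPN_TRAIN_ANCHORS_PER_IMAGE and is set to 8 instead of 256, and a seeded np.random.default_rng generator replaces the np.random.choice call so the run is repeatable; both are choices made for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
anchors_per_image = 8           # stand-in for config.RPN_TRAIN_ANCHORS_PER_IMAGE

# Toy labels: 6 positives, 20 negatives, 4 neutral anchors.
rpn_match = np.array([1] * 6 + [-1] * 20 + [0] * 4, dtype=np.int32)

# Positives may fill at most half of the budget.
ids = np.where(rpn_match == 1)[0]
extra = len(ids) - anchors_per_image // 2
if extra > 0:
    rpn_match[rng.choice(ids, extra, replace=False)] = 0

# Negatives fill whatever the positives left over.
ids = np.where(rpn_match == -1)[0]
extra = len(ids) - (anchors_per_image - np.sum(rpn_match == 1))
if extra > 0:
    rpn_match[rng.choice(ids, extra, replace=False)] = 0

n_pos = int(np.sum(rpn_match == 1))
n_neg = int(np.sum(rpn_match == -1))
print(n_pos, n_neg)
```

Here 2 of the 6 positives and 16 of the 20 negatives are reset to neutral, leaving 4 of each. Which particular anchors survive depends on the random draw.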

 

    ids = np.where(rpn_match == 1)[0]
    ix = 0  # index into rpn_bbox
    # TODO: use box_refinement() rather than duplicating the code here
    for i, a in zip(ids, anchors[ids]):
        # Closest gt box (it might have IoU < 0.7)
        gt = gt_boxes[anchor_iou_argmax[i]]

        # Convert coordinates to center plus width/height.
        # GT Box
        gt_h = gt[2] - gt[0]
        gt_w = gt[3] - gt[1]
        gt_center_y = gt[0] + 0.5 * gt_h
        gt_center_x = gt[1] + 0.5 * gt_w
        # Anchor
        a_h = a[2] - a[0]
        a_w = a[3] - a[1]
        a_center_y = a[0] + 0.5 * a_h
        a_center_x = a[1] + 0.5 * a_w

        # Compute the bbox refinement that the RPN should predict.
        rpn_bbox[ix] = [
            (gt_center_y - a_center_y) / a_h,
            (gt_center_x - a_center_x) / a_w,
            np.log(gt_h / a_h),
            np.log(gt_w / a_w),
        ]
        # Normalize
        rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
        ix += 1

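ids holds the indices of the positives. Following the example above, its shape is (121,).

i is the index of one positive anchor, an index into the 261888 anchor boxes.

a is the coordinates of that positive anchor, taken from the 261888 anchor boxes at that index.

gt is the coordinates of the gt_box matched to this positive anchor box.

gt_h, gt_w, gt_center_y, gt_center_x are the height, width and center coordinates of the gt_box.

a_h, a_w, a_center_y, a_center_x are the height, width and center coordinates of the anchor box.

What is the computation that follows? It is the standard bounding-box regression target: the center offset is divided by the anchor size and the size ratio is taken in log space. The main purpose is to make convergence easier.

This article uses config.RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]. Dividing by these values scales the targets up, presumably so that they are not too small during the loss computation to be learned well.

The loop continues until every positive anchor has its regression target.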
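A worked example for a single anchor shows what the target looks like and that it can be inverted. The function names box_delta and apply_delta and the box values are hypothetical, introduced only for this sketch; the formulas are the ones in the loop above.

```python
import numpy as np

def box_delta(anchor, gt, std_dev):
    """Hypothetical helper: normalized (dy, dx, log(dh), log(dw)) from anchor to gt."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    gt_h, gt_w = gt[2] - gt[0], gt[3] - gt[1]
    gt_cy, gt_cx = gt[0] + 0.5 * gt_h, gt[1] + 0.5 * gt_w
    delta = np.array([(gt_cy - a_cy) / a_h, (gt_cx - a_cx) / a_w,
                      np.log(gt_h / a_h), np.log(gt_w / a_w)])
    return delta / std_dev

def apply_delta(anchor, delta, std_dev):
    """Hypothetical inverse: recover the box that the delta points to."""
    dy, dx, dh, dw = delta * std_dev
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * a_h + dy * a_h
    cx = anchor[1] + 0.5 * a_w + dx * a_w
    h, w = a_h * np.exp(dh), a_w * np.exp(dw)
    return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])

std_dev = np.array([0.1, 0.1, 0.2, 0.2])   # value quoted in this article
anchor = np.array([10.0, 10.0, 30.0, 50.0])  # h=20, w=40, center (20, 30)
gt = np.array([12.0, 14.0, 36.0, 46.0])      # h=24, w=32, center (24, 30)

delta = box_delta(anchor, gt, std_dev)
recovered = apply_delta(anchor, delta, std_dev)
print(delta, recovered)
```

The vertical offset is (24 - 20) / 20 = 0.2, which becomes 2.0 after dividing by 0.1; the horizontal offset is 0 because the centers share the same x. Applying the delta back to the anchor reproduces the GT box.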

    return rpn_match, rpn_bbox

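rpn_match: contains three values, -1, 0 and 1. -1 marks a negative, 0 marks an anchor that is not used, and 1 marks a positive. Its shape is (261888,).

rpn_bbox: contains two kinds of rows. The first rows, one per positive, hold the deltas converted from the center coordinates, width and height; the remaining rows are left as four zeros. The final shape is therefore (256, 4).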

 

 

Since random_rois is 0, every branch that checks it is skipped.

Back in data_generator:

            # Init batch arrays
            if b == 0:
                batch_image_meta = np.zeros(
                    (batch_size,) + image_meta.shape, dtype=image_meta.dtype)
                batch_rpn_match = np.zeros(
                    [batch_size, anchors.shape[0], 1], dtype=rpn_match.dtype)
                batch_rpn_bbox = np.zeros(
                    [batch_size, config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4], dtype=rpn_bbox.dtype)
                batch_images = np.zeros(
                    (batch_size,) + image.shape, dtype=np.float32)
                batch_gt_class_ids = np.zeros(
                    (batch_size, config.MAX_GT_INSTANCES), dtype=np.int32)
                batch_gt_boxes = np.zeros(
                    (batch_size, config.MAX_GT_INSTANCES, 4), dtype=np.int32)
                batch_gt_masks = np.zeros(
                    (batch_size, gt_masks.shape[0], gt_masks.shape[1],
                     config.MAX_GT_INSTANCES), dtype=gt_masks.dtype)

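batch_image_meta: shape (1, 93). It stores the image metadata, including how the image was transformed from its original size to the target size (1024, 1024).

batch_rpn_match: shape (1, 261888, 1).

batch_rpn_bbox: shape (1, 256, 4), where 256 is the number of RPN boxes used in training.

batch_images: shape (1, 1024, 1024, 3).

batch_gt_class_ids: shape (1, 100), where 100 is the maximum number of instances allowed in one image.

batch_gt_boxes: shape (1, 100, 4).

batch_gt_masks: shape (1, 56, 56, 100).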

 

            # If more instances than fits in the array, sub-sample from them.
            if gt_boxes.shape[0] > config.MAX_GT_INSTANCES:
                ids = np.random.choice(
                    np.arange(gt_boxes.shape[0]), config.MAX_GT_INSTANCES, replace=False)
                gt_class_ids = gt_class_ids[ids]
                gt_boxes = gt_boxes[ids]
                gt_masks = gt_masks[:, :, ids]

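In this example the image has only 26 instances, clearly fewer than 100, so the branch is not entered. If there were more than 100, a random subset of 100 would be kept and the rest dropped.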

 

            batch_image_meta[b] = image_meta
            batch_rpn_match[b] = rpn_match[:, np.newaxis]
            batch_rpn_bbox[b] = rpn_bbox
            batch_images[b] = mold_image(image.astype(np.float32), config)
            batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
            batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes
            batch_gt_masks[b, :, :, :gt_masks.shape[-1]] = gt_masks

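The code above is plain assignment. Note that batch_gt_class_ids, batch_gt_boxes and batch_gt_masks are filled with as many entries as exist and the rest stay 0; the zero part is stripped later.

Let's look at the mold_image function first: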
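The zero padding can be seen on a toy batch. In this sketch MAX_GT_INSTANCES is shrunk to 5 and the instance values are made up; the last step, which recovers the valid rows by dropping all-zero boxes, is only one simple way to illustrate stripping the padding and is not taken from the code shown in this article.

```python
import numpy as np

MAX_GT_INSTANCES = 5  # toy value; the article's config uses 100
batch_size = 1

# Two made-up instances.
gt_class_ids = np.array([3, 7], dtype=np.int32)
gt_boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20]], dtype=np.int32)

# Fixed-size batch arrays, zero-initialized.
batch_gt_class_ids = np.zeros((batch_size, MAX_GT_INSTANCES), dtype=np.int32)
batch_gt_boxes = np.zeros((batch_size, MAX_GT_INSTANCES, 4), dtype=np.int32)

# Fill the leading slots; the rest stay zero.
b = 0
batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes

# Illustration only: rows whose box is all zeros are treated as padding.
valid = np.any(batch_gt_boxes[b] != 0, axis=1)
print(batch_gt_class_ids[b], valid)
```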

def mold_image(images, config):
    """Expects an RGB image (or array of images) and subtracts
    the mean pixel and converts it to float. Expects image
    colors in RGB order.
    """
    return images.astype(np.float32) - config.MEAN_PIXEL

It simply subtracts the mean pixel, again to speed up convergence, though I personally feel the effect is not very large.
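A quick check on a tiny image shows the effect of the subtraction. The MEAN_PIXEL value below is an assumption (the article does not state the one in its config), so substitute the value from your own configuration.

```python
import numpy as np

# Assumed per-channel RGB mean; replace with config.MEAN_PIXEL from your setup.
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

# A 2x2 RGB image where every channel value is 128.
image = np.full((2, 2, 3), 128, dtype=np.uint8)

# Same operation as mold_image: cast to float, subtract the mean per channel.
molded = image.astype(np.float32) - MEAN_PIXEL
print(molded[0, 0])
```

Broadcasting applies the three-element mean to the last axis, so each pixel becomes roughly (4.3, 11.2, 24.1) and the image shape is unchanged.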

 

            # Batch full?
            if b >= batch_size:
                inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox,
                          batch_gt_class_ids, batch_gt_boxes, batch_gt_masks]
                outputs = []

                if random_rois:
                    inputs.extend([batch_rpn_rois])
                    if detection_targets:
                        inputs.extend([batch_rois])
                        # Keras requires that output and targets have the same number of dimensions
                        batch_mrcnn_class_ids = np.expand_dims(
                            batch_mrcnn_class_ids, -1)
                        outputs.extend(
                            [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask])

                yield inputs, outputs

                # start a new batch
                b = 0

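This checks whether the batch has reached the configured batch_size. In the source, b is incremented by 1 after the assignments above, so with batch_size set to 1 the branch is entered right away. With a batch size of 2, the steps above would simply run once more first.

Inside the branch, the arrays are assembled into inputs and outputs and yielded. Don't forget that b is reset to 0 afterwards.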
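The accumulate-then-yield pattern can be reduced to a few lines. The generator below is a hypothetical, heavily simplified stand-in for data_generator: the name toy_batch_generator and the sample data are made up, and it handles a single array per sample instead of the seven batch arrays.

```python
import numpy as np

def toy_batch_generator(samples, batch_size):
    """Hypothetical sketch: cycle over samples forever and yield full batches."""
    b = 0        # position inside the current batch
    index = -1   # position inside the sample list
    while True:
        index = (index + 1) % len(samples)
        sample = samples[index]
        # Allocate fresh batch storage at the start of each batch.
        if b == 0:
            batch = np.zeros((batch_size,) + sample.shape, dtype=sample.dtype)
        batch[b] = sample
        b += 1
        # Batch full?
        if b >= batch_size:
            yield [batch], []
            b = 0  # start a new batch

samples = [np.full((2,), i, dtype=np.float32) for i in range(3)]
gen = toy_batch_generator(samples, batch_size=2)
inputs, outputs = next(gen)
first_batch = inputs[0].copy()
second_batch = next(gen)[0][0]
print(first_batch, second_batch)
```

With three samples and a batch size of 2, the first batch holds samples 0 and 1, and the second wraps around to hold samples 2 and 0.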
