data_generator (continued from Data Processing Part 2, which left part of data_generator uncovered)
build_rpn_targets
rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors,
                                        gt_class_ids, gt_boxes, config)
The build_rpn_targets function is defined in mrcnn/model.py.
def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):
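Input parameters
image_shape: shape of the preprocessed input image, (1024, 1024, 3);
anchors: the preset anchor boxes, shape (261888, 4);
gt_class_ids: one entry per instance in the image; continuing the example from Data Processing Part 2, the shape is still (26,);
gt_boxes: the box coordinates of the 26 instances, shape (26, 4);
config: the configuration object set up earlier.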
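Where does 261888 come from? Below is a quick check, assuming the Matterport defaults for a 1024×1024 input: BACKBONE_STRIDES = [4, 8, 16, 32, 64], three anchor ratios, and RPN_ANCHOR_STRIDE = 1 (these values are assumptions, not stated in this article). Under those settings each FPN level places one anchor per ratio at every feature-map cell.

```python
# Assumed Matterport defaults (not given in this article):
# IMAGE_SHAPE 1024x1024, BACKBONE_STRIDES [4, 8, 16, 32, 64],
# 3 aspect ratios per location, RPN_ANCHOR_STRIDE = 1.
image_size = 1024
backbone_strides = [4, 8, 16, 32, 64]
num_ratios = 3

total = 0
for stride in backbone_strides:
    side = image_size // stride        # feature-map side length at this level
    count = side * side * num_ratios   # one anchor per ratio per cell
    print(f"stride {stride:>2}: {side}x{side} cells -> {count} anchors")
    total += count

print("total anchors:", total)  # total anchors: 261888
```

The stride-4 level alone contributes 196608 anchors, which is why the fine FPN levels dominate the anchor count.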
# RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
# RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))
Initialize the outputs: rpn_match has shape (261888,), and rpn_bbox has shape (256, 4), where 256 is config.RPN_TRAIN_ANCHORS_PER_IMAGE.
# Handle COCO crowds
# A crowd box in COCO is a bounding box around several instances. Exclude
# them from training. A crowd box is given a negative class ID.
crowd_ix = np.where(gt_class_ids < 0)[0]
if crowd_ix.shape[0] > 0:
    # Filter out crowds from ground truth class IDs and boxes
    non_crowd_ix = np.where(gt_class_ids > 0)[0]
    crowd_boxes = gt_boxes[crowd_ix]
    gt_class_ids = gt_class_ids[non_crowd_ix]
    gt_boxes = gt_boxes[non_crowd_ix]
    # Compute overlaps with crowd boxes [anchors, crowds]
    crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
    crowd_iou_max = np.amax(crowd_overlaps, axis=1)
    no_crowd_bool = (crowd_iou_max < 0.001)
else:
    # All anchors don't intersect a crowd
    no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)
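This block checks whether the ground truth contains crowd boxes (marked with a negative class ID). If it does, the crowd boxes are removed from the ground truth, and anchors that overlap a crowd box are kept out of the negative set. I won't go into more detail, because this kind of data usually doesn't come up.
Here no_crowd_bool is all True, meaning there are no crowd boxes and every anchor can be used.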
overlaps = utils.compute_overlaps(anchors, gt_boxes)
compute_overlaps is defined in mrcnn/utils.py:
def compute_overlaps(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].

    For better performance, pass the largest set first and the smaller second.
    """
Input parameters
boxes1: the preset anchor boxes of an image, shape (261888, 4);
boxes2: the box coordinates of each instance in the image; with 26 instances the shape is (26, 4).
    # Areas of anchors and GT boxes
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Compute overlaps to generate matrix [boxes1 count, boxes2 count]
    # Each cell contains the IoU value.
    overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
    for i in range(overlaps.shape[1]):
        box2 = boxes2[i]
        overlaps[:, i] = compute_iou(box2, boxes1, area2[i], area1)
    return overlaps
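area1 holds the area of each preset anchor box, and area2 holds the area of each instance's bounding box.
The resulting overlaps matrix has shape (261888, 26): entry [i, j] is the IoU between anchor i and gt_box j.
Next, let's read compute_iou, which is called inside the loop. In that call its inputs are: box, one gt box with shape (4,); boxes, all anchors with shape (261888, 4); box_area, the area of that gt box (a scalar); and boxes_area, the areas of all anchors with shape (261888,).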
def compute_iou(box, boxes, box_area, boxes_area):
    """Calculates IoU of the given box with the array of the given boxes.
    box: 1D vector [y1, x1, y2, x2]
    boxes: [boxes_count, (y1, x1, y2, x2)]
    box_area: float. the area of 'box'
    boxes_area: array of length boxes_count.

    Note: the areas are passed in rather than calculated here for
    efficiency. Calculate once in the caller to avoid duplicate work.
    """
    # Calculate intersection areas
    y1 = np.maximum(box[0], boxes[:, 0])
    y2 = np.minimum(box[2], boxes[:, 2])
    x1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    union = box_area + boxes_area[:] - intersection[:]
    iou = intersection / union
    return iou
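To make the intersection/union arithmetic concrete, here is a small worked example. The function body repeats the compute_iou logic above; the three toy anchor boxes are made up for illustration: one identical to the gt box, one shifted diagonally, one disjoint.

```python
import numpy as np

def compute_iou(box, boxes, box_area, boxes_area):
    # Same logic as compute_iou above; boxes are [y1, x1, y2, x2]
    y1 = np.maximum(box[0], boxes[:, 0])
    y2 = np.minimum(box[2], boxes[:, 2])
    x1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[3], boxes[:, 3])
    # Negative widths/heights mean no overlap, so clip them to 0
    intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    union = box_area + boxes_area - intersection
    return intersection / union

gt = np.array([0, 0, 10, 10], dtype=np.float64)            # area 100
toy_anchors = np.array([[0, 0, 10, 10],                    # identical box
                        [5, 5, 15, 15],                    # shifted by 5, 5
                        [20, 20, 30, 30]], dtype=np.float64)  # disjoint
areas = (toy_anchors[:, 2] - toy_anchors[:, 0]) * (toy_anchors[:, 3] - toy_anchors[:, 1])
iou = compute_iou(gt, toy_anchors, 100.0, areas)
print(iou)  # approximately [1.0, 0.1429, 0.0]
```

For the shifted box the intersection is 5 × 5 = 25 and the union is 100 + 100 − 25 = 175, so IoU = 25 / 175 ≈ 0.143: even a fairly close box can fall well below the 0.7 positive threshold used next.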
# 1. Set negative anchors first. They get overwritten below if a GT box is
# matched to them. Skip boxes in crowd areas.
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
# 2. Set an anchor for each GT box (regardless of IoU value).
# If multiple anchors have the same IoU match all of them
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
rpn_match[gt_iou_argmax] = 1
# 3. Set anchors with high overlap as positive.
rpn_match[anchor_iou_max >= 0.7] = 1
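anchor_iou_argmax: for each preset anchor, the index of the gt_box it overlaps most, so its shape is (261888,).
anchor_iou_max: using the indices in anchor_iou_argmax, each anchor's highest IoU with any gt_box; shape (261888,).
rpn_match: every anchor whose anchor_iou_max is below 0.3 and that is not in a crowd area is set to -1. At this point rpn_match contains only 0 and -1, with shape (261888,). The -1 entries are the negative candidates.
gt_iou_argmax: each gt_box has exactly one maximum IoU value, but more than one anchor can reach it (ties), so in this example its shape is (99,) rather than (26,).
rpn_match: the positions in gt_iou_argmax are set to 1, marking those anchors as positives. This step ensures that a gt_box whose best IoU with any anchor is below 0.7 is still matched to at least one anchor; otherwise that instance would never take part in training.
rpn_match: finally, every anchor with anchor_iou_max of 0.7 or more is set to 1; these are the regular positives.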
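The three matching rules are easiest to see on a tiny made-up IoU matrix (6 anchors × 2 gt boxes; the values are illustrative, not from the article's data). The lines below repeat the code above; note how GT 1 never reaches 0.7 but still gets two positive anchors through the tie rule, and how an anchor between 0.3 and 0.7 stays neutral.

```python
import numpy as np

# Toy IoU matrix: 6 anchors x 2 GT boxes (made-up values)
overlaps = np.array([
    [0.10, 0.05],
    [0.75, 0.20],   # clearly positive for GT 0
    [0.40, 0.10],   # between 0.3 and 0.7 -> stays neutral
    [0.05, 0.55],   # best anchor for GT 1, but below 0.7
    [0.20, 0.55],   # ties with the row above for GT 1
    [0.00, 0.00],
])
no_crowd_bool = np.ones(overlaps.shape[0], dtype=bool)
rpn_match = np.zeros(overlaps.shape[0], dtype=np.int32)

# 1. Negatives: best IoU below 0.3 (outside crowd areas)
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[(anchor_iou_max < 0.3) & no_crowd_bool] = -1
# 2. For every GT box, all anchors tied for its highest IoU become positive
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
rpn_match[gt_iou_argmax] = 1
# 3. Any anchor with IoU >= 0.7 is positive
rpn_match[anchor_iou_max >= 0.7] = 1

print(gt_iou_argmax)  # [1 3 4]
print(rpn_match)      # [-1  1  0  1  1 -1]
```

gt_iou_argmax has three entries for two gt boxes, which is the same effect that gives (99,) instead of (26,) in the real example.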
# Subsample to balance positive and negative anchors
# Don't let positives be more than half the anchors
ids = np.where(rpn_match == 1)[0]
extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
if extra > 0:
    # Reset the extra ones to neutral
    ids = np.random.choice(ids, extra, replace=False)
    rpn_match[ids] = 0
# Same for negative proposals
ids = np.where(rpn_match == -1)[0]
extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE -
                    np.sum(rpn_match == 1))
if extra > 0:
    # Reset the extra ones to neutral
    ids = np.random.choice(ids, extra, replace=False)
    rpn_match[ids] = 0
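ids holds the indices where rpn_match equals 1, i.e. the positives. Suppose there are 121 positives; then its shape is (121,).
extra = number of positives − (256 // 2). With the 121 positives above, extra = 121 − 128 = −7, so the if block is skipped.
So what would the if block do? It balances the data: only 256 anchors contribute to the RPN loss, and positives must not take up too many of them. If there are more than 128 positives, randomly chosen extras are reset to 0 (neutral), so positives make up at most half of the 256 and negatives fill the rest.
Now ids holds the indices of the negatives. Say there are 258003 of them; its shape is then (258003,).
extra = number of negatives − (256 − number of positives). With 121 positives, extra = 258003 − 135 = 257868.
The condition holds, so the if block runs: a random 257868 of the negatives are reset to 0 in rpn_match and no longer contribute to the loss, leaving 121 positives and 135 negatives, 256 in total.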
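The counts above can be reproduced directly. This sketch rebuilds an rpn_match with 121 positives and 258003 negatives (their positions are arbitrary here) and runs the same subsampling code; which anchors get dropped is random, but the final counts are not.

```python
import numpy as np

RPN_TRAIN_ANCHORS_PER_IMAGE = 256   # value used in this article

# Rebuild the example from the text: 121 positives, 258003 negatives,
# the rest of the 261888 anchors neutral (positions chosen arbitrarily).
rpn_match = np.zeros(261888, dtype=np.int32)
rpn_match[:121] = 1
rpn_match[121:121 + 258003] = -1

# Cap positives at half of the anchors per image
ids = np.where(rpn_match == 1)[0]
extra = len(ids) - RPN_TRAIN_ANCHORS_PER_IMAGE // 2
print("extra positives:", extra)   # extra positives: -7 (nothing dropped)
if extra > 0:
    rpn_match[np.random.choice(ids, extra, replace=False)] = 0

# Negatives fill whatever is left of the 256
ids = np.where(rpn_match == -1)[0]
extra = len(ids) - (RPN_TRAIN_ANCHORS_PER_IMAGE - np.sum(rpn_match == 1))
print("extra negatives:", extra)   # extra negatives: 257868
if extra > 0:
    rpn_match[np.random.choice(ids, extra, replace=False)] = 0

print(np.sum(rpn_match == 1), np.sum(rpn_match == -1))  # 121 135
```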
ids = np.where(rpn_match == 1)[0]
ix = 0  # index into rpn_bbox
# TODO: use box_refinement() rather than duplicating the code here
for i, a in zip(ids, anchors[ids]):
    # Closest gt box (it might have IoU < 0.7)
    gt = gt_boxes[anchor_iou_argmax[i]]

    # Convert coordinates to center plus width/height.
    # GT Box
    gt_h = gt[2] - gt[0]
    gt_w = gt[3] - gt[1]
    gt_center_y = gt[0] + 0.5 * gt_h
    gt_center_x = gt[1] + 0.5 * gt_w
    # Anchor
    a_h = a[2] - a[0]
    a_w = a[3] - a[1]
    a_center_y = a[0] + 0.5 * a_h
    a_center_x = a[1] + 0.5 * a_w

    # Compute the bbox refinement that the RPN should predict.
    rpn_bbox[ix] = [
        (gt_center_y - a_center_y) / a_h,
        (gt_center_x - a_center_x) / a_w,
        np.log(gt_h / a_h),
        np.log(gt_w / a_w),
    ]
    # Normalize
    rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
    ix += 1
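ids: the indices of the positives; following the example above, its shape is (121,).
i: the index of one positive anchor among all 261888 anchors.
a: the coordinates of that positive anchor, taken from the 261888 anchors at index i.
gt: the coordinates of the gt box matched to this anchor (the instance's bounding box).
gt_h, gt_w, gt_center_y, gt_center_x: the height, width and center coordinates of the gt box.
a_h, a_w, a_center_y, a_center_x: the height, width and center coordinates of the anchor.
So what is all this arithmetic doing? It is the standard bounding-box regression parameterization: the center offsets are divided by the anchor's height and width, and the size change is expressed as the log of the height and width ratios. Because the targets are relative to the anchor, they have a similar range for small and large boxes, which mainly makes training easier to converge.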
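Written as equations, the loop computes the following for each positive anchor. The notation is mine: (a_y, a_x, a_h, a_w) are the anchor's center and size, (g_y, g_x, g_h, g_w) the matched gt box's, and σ is config.RPN_BBOX_STD_DEV.

```latex
\begin{aligned}
d_y &= \frac{g_y - a_y}{a_h}, & d_x &= \frac{g_x - a_x}{a_w}, \\
d_h &= \log\frac{g_h}{a_h},   & d_w &= \log\frac{g_w}{a_w},
\end{aligned}
\qquad
\mathrm{rpn\_bbox}[ix] = \left(\frac{d_y}{\sigma_y},\ \frac{d_x}{\sigma_x},\ \frac{d_h}{\sigma_h},\ \frac{d_w}{\sigma_w}\right)
```

A perfectly placed anchor gives all four values equal to 0; a gt box twice as tall as its anchor gives d_h = log 2 ≈ 0.693.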
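The config used here sets config.RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]. Dividing by these values scales the targets up (10× for the center offsets, 5× for the log sizes); my impression is that this keeps the values from being too small in the loss, so that they are learned properly.
The loop repeats until every positive anchor has its regression target.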
return rpn_match, rpn_bbox
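rpn_match: contains three values, -1, 0 and 1: -1 marks a negative, 0 an anchor that does not take part in the loss, and 1 a positive. Its shape is (261888,).
rpn_bbox: the first rows, one per positive anchor (121 in this example), hold the regression targets computed from the centers and sizes; all remaining rows stay at zero. Its shape is (256, 4).
Since random_rois is 0, none of the branches that depend on it are executed.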
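A quick way to convince yourself the targets are correct is to encode a box and then apply the deltas back to the anchor. encode follows the loop in build_rpn_targets; decode is a hypothetical inverse written only for this check (it is not a function from this article), and the anchor and gt box values are made up.

```python
import numpy as np

STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])  # RPN_BBOX_STD_DEV used in this article

def encode(anchor, gt):
    """Regression target for one anchor, same formula as the loop above.
    Boxes are [y1, x1, y2, x2]."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    g_h, g_w = gt[2] - gt[0], gt[3] - gt[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    g_cy, g_cx = gt[0] + 0.5 * g_h, gt[1] + 0.5 * g_w
    delta = np.array([(g_cy - a_cy) / a_h, (g_cx - a_cx) / a_w,
                      np.log(g_h / a_h), np.log(g_w / a_w)])
    return delta / STD_DEV

def decode(anchor, target):
    """Hypothetical inverse for illustration: undo the std-dev scaling,
    then shift the anchor center and rescale its height/width."""
    dy, dx, dh, dw = target * STD_DEV
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * a_h + dy * a_h
    cx = anchor[1] + 0.5 * a_w + dx * a_w
    h, w = a_h * np.exp(dh), a_w * np.exp(dw)
    return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])

anchor = np.array([100.0, 100.0, 164.0, 164.0])  # 64 x 64 anchor
gt = np.array([110.0, 96.0, 190.0, 156.0])       # 80 x 60 gt box
t = encode(anchor, gt)
print(t)                  # approximately [2.8125, -0.9375, 1.1157, -0.3227]
print(decode(anchor, t))  # recovers gt up to float rounding
```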
Back in data_generator:
# Init batch arrays
if b == 0:
    batch_image_meta = np.zeros(
        (batch_size,) + image_meta.shape, dtype=image_meta.dtype)
    batch_rpn_match = np.zeros(
        [batch_size, anchors.shape[0], 1], dtype=rpn_match.dtype)
    batch_rpn_bbox = np.zeros(
        [batch_size, config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4], dtype=rpn_bbox.dtype)
    batch_images = np.zeros(
        (batch_size,) + image.shape, dtype=np.float32)
    batch_gt_class_ids = np.zeros(
        (batch_size, config.MAX_GT_INSTANCES), dtype=np.int32)
    batch_gt_boxes = np.zeros(
        (batch_size, config.MAX_GT_INSTANCES, 4), dtype=np.int32)
    batch_gt_masks = np.zeros(
        (batch_size, gt_masks.shape[0], gt_masks.shape[1],
         config.MAX_GT_INSTANCES), dtype=gt_masks.dtype)
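batch_image_meta: shape (1, 93); stores the image metadata, including how the image was transformed from its original size to the target size (1024, 1024).
batch_rpn_match: shape (1, 261888, 1).
batch_rpn_bbox: shape (1, 256, 4), where 256 is the number of anchors whose boxes are used to train the RPN.
batch_images: shape (1, 1024, 1024, 3).
batch_gt_class_ids: shape (1, 100), where 100 (config.MAX_GT_INSTANCES) is the maximum number of instances per image.
batch_gt_boxes: shape (1, 100, 4).
batch_gt_masks: shape (1, 56, 56, 100); the masks are 56×56 because the mini-mask option is used (MINI_MASK_SHAPE = (56, 56)).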
# If more instances than fits in the array, sub-sample from them.
if gt_boxes.shape[0] > config.MAX_GT_INSTANCES:
    ids = np.random.choice(
        np.arange(gt_boxes.shape[0]), config.MAX_GT_INSTANCES, replace=False)
    gt_class_ids = gt_class_ids[ids]
    gt_boxes = gt_boxes[ids]
    gt_masks = gt_masks[:, :, ids]
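In this example the image has only 26 instances, fewer than 100, so the branch is skipped. With more than 100 instances, a random subset of 100 would be kept and the rest discarded.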
batch_image_meta[b] = image_meta
batch_rpn_match[b] = rpn_match[:, np.newaxis]
batch_rpn_bbox[b] = rpn_bbox
batch_images[b] = mold_image(image.astype(np.float32), config)
batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes
batch_gt_masks[b, :, :, :gt_masks.shape[-1]] = gt_masks
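The code above is plain assignment. Note that batch_gt_class_ids, batch_gt_boxes and batch_gt_masks are filled only up to the actual number of instances; the remaining slots stay 0, and these zero-padded entries are removed later.
Let's look at the mold_image function first: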
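Here is a minimal sketch of the padding idea with 26 made-up instances in 100 slots. The trimming step at the end (keeping rows whose box is not all zeros) is only an illustration of how padded slots can be recognised; it is not the exact code the model uses to remove them.

```python
import numpy as np

MAX_GT_INSTANCES = 100   # value used in this article
batch_size = 1

gt_class_ids = np.arange(1, 27, dtype=np.int32)          # 26 made-up class IDs
gt_boxes = np.tile(np.array([[10, 20, 50, 80]], dtype=np.int32), (26, 1))

batch_gt_class_ids = np.zeros((batch_size, MAX_GT_INSTANCES), dtype=np.int32)
batch_gt_boxes = np.zeros((batch_size, MAX_GT_INSTANCES, 4), dtype=np.int32)

b = 0
# Fill only as many slots as there are instances; the rest stays 0
batch_gt_class_ids[b, :gt_class_ids.shape[0]] = gt_class_ids
batch_gt_boxes[b, :gt_boxes.shape[0]] = gt_boxes

# Illustrative trimming: a padded slot has an all-zero box
valid = np.any(batch_gt_boxes[b] != 0, axis=1)
print(valid.sum())                      # 26
print(batch_gt_boxes[b][valid].shape)   # (26, 4)
```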
def mold_image(images, config):
    """Expects an RGB image (or array of images) and subtracts
    the mean pixel and converts it to float. Expects image
    colors in RGB order.
    """
    return images.astype(np.float32) - config.MEAN_PIXEL
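It simply subtracts the per-channel mean pixel, again to speed up convergence, although personally I don't think the effect is very large.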
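The subtraction broadcasts a 3-element mean over the channel axis of an (H, W, 3) image. The MEAN_PIXEL value below is an assumption (the Matterport default), since this article doesn't state it, and the 2×2 image is made up.

```python
import numpy as np

# Assumed value: Matterport's default MEAN_PIXEL (RGB); not given in this article
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

def mold_image(images):
    # Cast to float, then subtract the per-channel mean;
    # broadcasting applies it along the last (channel) axis
    return images.astype(np.float32) - MEAN_PIXEL

image = np.full((2, 2, 3), [124, 117, 104], dtype=np.uint8)
molded = mold_image(image)
print(molded.shape)   # (2, 2, 3)
print(molded[0, 0])   # approximately [0.3, 0.2, 0.1]
```

Because every pixel shares the same mean, the result is just the original image shifted so that an "average" pixel sits near 0.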
# Add to batch
b += 1

# Batch full?
if b >= batch_size:
    inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox,
              batch_gt_class_ids, batch_gt_boxes, batch_gt_masks]
    outputs = []

    if random_rois:
        inputs.extend([batch_rpn_rois])
        if detection_targets:
            inputs.extend([batch_rois])
            # Keras requires that output and targets have the same number of dimensions
            batch_mrcnn_class_ids = np.expand_dims(
                batch_mrcnn_class_ids, -1)
            outputs.extend(
                [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask])

    yield inputs, outputs

    # start a new batch
    b = 0
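This checks whether the batch has reached batch_size. Since batch_size is set to 1 here, b becomes 1 after the first image and the branch is entered immediately; with a batch_size of 2, the steps above would simply run once more first.
Inside the branch, the arrays are packed into inputs and outputs and yielded. Don't forget that b is reset to 0 to start a new batch.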
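The batching pattern can be seen in isolation with a stripped-down generator. toy_generator is a hypothetical name and keeps only the image array; it mirrors the structure above: allocate fresh arrays when b == 0, fill slot b, yield once the batch is full, then reset b.

```python
import numpy as np

def toy_generator(images, batch_size):
    """Hypothetical, stripped-down version of the data_generator batching loop."""
    b = 0  # position within the current batch
    i = 0  # index into the dataset (cycles forever, like data_generator)
    while True:
        image = images[i % len(images)]
        i += 1
        if b == 0:
            # Fresh arrays per batch, so an already-yielded batch is never overwritten
            batch_images = np.zeros((batch_size,) + image.shape, dtype=np.float32)
        batch_images[b] = image
        b += 1
        if b >= batch_size:
            yield [batch_images], []
            b = 0  # start a new batch

images = [np.full((4, 4, 3), k, dtype=np.uint8) for k in range(3)]
gen = toy_generator(images, batch_size=2)
inputs, outputs = next(gen)
print(inputs[0].shape)        # (2, 4, 4, 3)
print(inputs[0][:, 0, 0, 0])  # [0. 1.]
inputs, outputs = next(gen)
print(inputs[0][:, 0, 0, 0])  # [2. 0.]  (wraps around to the first image)
```

Because it is a Python generator, Keras can pull batches on demand with next() and the loop state (b, i) survives between calls.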