Mask R-CNN Road Segmentation (Part 5): Source Code Analysis

Appendix 1: Mask R-CNN Source Code Analysis

1.1 Overview of model.py

def log(text, array=None):
class BatchNorm(KL.BatchNormalization):
def compute_backbone_shapes(config, image_shape):
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
def resnet_graph(input_image, architecture, stage5=False, train_bn=True): 
	return [C1, C2, C3, C4, C5]  
def apply_box_deltas_graph(boxes, deltas):
def clip_boxes_graph(boxes, window):

1.1.1 ProposalLayer

class ProposalLayer(KE.Layer):

Description:
Receives anchor scores and selects a subset to pass as proposals to the second stage. Filtering is done based on anchor scores and non-max suppression to remove overlaps. It also applies bounding box refinement deltas to anchors.
Inputs:
rpn_probs: [batch, num_anchors, (bg prob, fg prob)]
rpn_bbox: [batch, num_anchors, (dy, dx, log(dh), log(dw))]
anchors: [batch, num_anchors, (y1, x1, y2, x2)] anchors in normalized coordinates
Returns:
Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
Called from:
the build() method of the MaskRCNN class:
rpn_rois = ProposalLayer(
proposal_count=proposal_count,
nms_threshold=config.RPN_NMS_THRESHOLD,
name="ROI",
config=config)([rpn_class, rpn_bbox, anchors])
Implementation:
First, ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True, name="top_anchors").indices sorts the anchors by rpn_probs[:, 1], i.e. the foreground probability, and keeps the top config.PRE_NMS_LIMIT (6000) boxes (or all anchors, if PRE_NMS_LIMIT exceeds the anchor count). Then apply_box_deltas_graph(x, y) applies the predicted deltas to the anchors to obtain refined box coordinates, and clip_boxes_graph(x, window) clips away the parts of boxes that extend beyond the image window. Finally, the inner nms(boxes, scores) function performs non-maximum suppression, and the resulting config.POST_NMS_ROIS_TRAINING proposals are returned as the final proposals.
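To make that pipeline concrete, here is a minimal per-image sketch. The helper names apply_deltas, clip_to_window, and propose are mine, not the repo's; the real layer operates on batches via utils.batch_slice and also pads its output to a fixed size.

import tensorflow as tf

def apply_deltas(boxes, deltas):
    # (y1, x1, y2, x2) -> center/size, apply (dy, dx, log(dh), log(dw)), convert back
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h + deltas[:, 0] * h
    cx = boxes[:, 1] + 0.5 * w + deltas[:, 1] * w
    h = h * tf.exp(deltas[:, 2])
    w = w * tf.exp(deltas[:, 3])
    return tf.stack([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w], axis=1)

def clip_to_window(boxes, window):
    # Clamp every coordinate into the (y1, x1, y2, x2) window
    wy1, wx1, wy2, wx2 = window
    y1 = tf.clip_by_value(boxes[:, 0], wy1, wy2)
    x1 = tf.clip_by_value(boxes[:, 1], wx1, wx2)
    y2 = tf.clip_by_value(boxes[:, 2], wy1, wy2)
    x2 = tf.clip_by_value(boxes[:, 3], wx1, wx2)
    return tf.stack([y1, x1, y2, x2], axis=1)

def propose(scores, deltas, anchors, pre_nms_limit=6000,
            proposal_count=2000, nms_threshold=0.7):
    """scores are per-anchor foreground probabilities (rpn_probs[:, 1])."""
    # 1. Keep the top-scoring anchors (all of them if fewer than the limit)
    limit = tf.minimum(pre_nms_limit, tf.shape(anchors)[0])
    ix = tf.nn.top_k(scores, limit, sorted=True).indices
    scores = tf.gather(scores, ix)
    boxes = apply_deltas(tf.gather(anchors, ix), tf.gather(deltas, ix))
    # 2. Clip to the normalized image window
    boxes = clip_to_window(boxes, (0.0, 0.0, 1.0, 1.0))
    # 3. NMS down to the final proposal count (fixed-size padding omitted)
    keep = tf.image.non_max_suppression(boxes, scores, proposal_count,
                                        iou_threshold=nms_threshold)
    return tf.gather(boxes, keep)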

def log2_graph(x):
class PyramidROIAlign(KE.Layer):
def overlaps_graph(boxes1, boxes2):
def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config):

Description: Generates detection targets for one image. Subsamples proposals and generates target class IDs, bounding box deltas, and masks for each.
Inputs:
proposals: [POST_NMS_ROIS_TRAINING, (y1, x1, y2, x2)] in normalized coordinates. Might be zero padded if there are not enough proposals.
gt_class_ids: [MAX_GT_INSTANCES] int class IDs
gt_boxes: [MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized coordinates.
gt_masks: [height, width, MAX_GT_INSTANCES] of boolean type.
Returns: Target ROIs and corresponding class IDs, bounding box shifts, and masks.
rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates
class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs. Zero padded.
deltas: [TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw))]
masks: [TRAIN_ROIS_PER_IMAGE, height, width]. Masks cropped to bbox boundaries and resized to neural network output size.
Note: Returned arrays might be zero padded if not enough target ROIs.
Called from: the call() method of the DetectionTargetLayer(KE.Layer) class:
names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
outputs = utils.batch_slice([proposals, gt_class_ids, gt_boxes, gt_masks],
lambda w, x, y, z: detection_targets_graph(w, x, y, z, self.config),
self.config.IMAGES_PER_GPU, names=names)
Implementation:
First, trim_zeros_graph(proposals, name="trim_proposals") removes the zero-padded proposals; crowd_ix = tf.where(gt_class_ids < 0)[:, 0] then excludes the boxes marked as crowds; overlaps = overlaps_graph(proposals, gt_boxes) and crowd_overlaps = overlaps_graph(proposals, crowd_boxes) compute the IoU between the proposals and the GT/crowd boxes.
Next, roi_iou_max = tf.reduce_max(overlaps, axis=1) and positive_roi_bool = (roi_iou_max >= 0.5) filter the proposals by whether their best IoU reaches the threshold.
Finally, positive and negative samples are split, and each positive ROI is assigned to its best-matching GT box via:
roi_gt_box_assignment = tf.cond(
    tf.greater(tf.shape(positive_overlaps)[1], 0),
    true_fn=lambda: tf.argmax(positive_overlaps, axis=1),
    false_fn=lambda: tf.cast(tf.constant([]), tf.int64)
)
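A toy NumPy rendering of that positive/negative split may help. This is a hedged sketch: the real function is pure TF, also filters negatives against crowd boxes and zero padding, and the names split_rois_sketch and its defaults are illustrative (0.33 mirrors config.ROI_POSITIVE_RATIO).

import numpy as np

def split_rois_sketch(overlaps, positive_fraction=0.33, train_rois=200):
    """overlaps: [num_proposals, num_gt] IoU matrix (overlaps_graph output)."""
    roi_iou_max = overlaps.max(axis=1)             # best IoU per proposal
    positive_ix = np.where(roi_iou_max >= 0.5)[0]  # positives: IoU >= 0.5
    negative_ix = np.where(roi_iou_max < 0.5)[0]   # negatives: IoU < 0.5
    # Subsample so positives are at most positive_fraction of the ROIs
    positive_count = int(train_rois * positive_fraction)
    positive_ix = np.random.permutation(positive_ix)[:positive_count]
    # Each positive ROI regresses toward its highest-IoU GT box
    roi_gt_assignment = overlaps[positive_ix].argmax(axis=1)
    return positive_ix, negative_ix, roi_gt_assignment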

class DetectionTargetLayer(KE.Layer):
    def __init__(self, config, **kwargs):
        super(DetectionTargetLayer, self).__init__(**kwargs)
        self.config = config
    def call(self, inputs):
        proposals = inputs[0]
        gt_class_ids = inputs[1]
        gt_boxes = inputs[2]
        gt_masks = inputs[3]

        # Slice the batch and run a graph for each slice
        # TODO: Rename target_bbox to target_deltas for clarity
        names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
        outputs = utils.batch_slice(
            [proposals, gt_class_ids, gt_boxes, gt_masks],
            lambda w, x, y, z: detection_targets_graph(
                w, x, y, z, self.config),
            self.config.IMAGES_PER_GPU, names=names)
        return outputs

Description:
Subsamples proposals and generates target box refinement, class_ids, and masks for each.
Inputs:
proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might be zero padded if there are not enough proposals.
gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
coordinates.
gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type
Returns:
Target ROIs and corresponding class IDs, bounding box shifts, and masks.
rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates. These rois are fed to the second stage for classification and regression.
target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw))]
target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width] Masks cropped to bbox boundaries and resized to neural network output size.
Note:
Returned arrays might be zero padded if not enough target ROIs.
Called from:
the build() method of the MaskRCNN class:
rois, target_class_ids, target_bbox, target_mask = DetectionTargetLayer(config, name="proposal_targets")(
[target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
where target_rois = rpn_rois,
rpn_rois = ProposalLayer(proposal_count=proposal_count, nms_threshold=config.RPN_NMS_THRESHOLD, name="ROI", config=config)([rpn_class, rpn_bbox, anchors]),
rpn_class_logits, rpn_class, rpn_bbox = outputs,
and outputs comes from stacking the results of for p in rpn_feature_maps: layer_outputs.append(rpn([p])), i.e. from the RPN,
with rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE, len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
Usage: where are the returned rois consumed?
mrcnn_class_logits, mrcnn_class, mrcnn_bbox = fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta, config.POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN, fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)
mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps, input_image_meta, config.MASK_POOL_SIZE, config.NUM_CLASSES, train_bn=config.TRAIN_BN)
Implementation:
See the call() method shown above: the batch is sliced and each slice is delegated to detection_targets_graph().

def refine_detections_graph(rois, probs, deltas, window, config):
class DetectionLayer(KE.Layer):
def rpn_graph(feature_map, anchors_per_location, anchor_stride):
def build_rpn_model(anchor_stride, anchors_per_location, depth):
def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
def smooth_l1_loss(y_true, y_pred):
def rpn_class_loss_graph(rpn_match, rpn_class_logits):
def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
                           active_class_ids):
def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):

def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):

Description: Called from the data_generator function.

def build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config):

Description: Called from the data_generator function. Generates targets for training the second-stage classifier and mask heads. This function is not used in normal training; it is useful for debugging and for training the Mask R-CNN heads without the RPN.
Inputs:
rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes.
gt_class_ids: [instance count] Integer class IDs
gt_boxes: [instance count, (y1, x1, y2, x2)]
gt_masks: [height, width, instance count] Ground truth masks. Can be full
size or mini-masks.
Returns:
rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)]
class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specific
bbox refinements.
masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). Class specific masks cropped
to bbox boundaries and resized to neural network output size.

def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):
    """Given the anchors and GT boxes, compute overlaps and identify positive
    anchors and deltas to refine them to match their corresponding GT boxes.

    anchors: [num_anchors, (y1, x1, y2, x2)]
    gt_class_ids: [num_gt_boxes] Integer class IDs.
    gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]

    Returns:
    rpn_match: [N] (int32) matches between anchors and GT boxes.
               1 = positive anchor, -1 = negative anchor, 0 = neutral
    rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
    """
    # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
    rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
    # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
    rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))

    # Handle COCO crowds
    crowd_ix = np.where(gt_class_ids < 0)[0]
    if crowd_ix.shape[0] > 0:
        # Filter out crowds from ground truth class IDs and boxes
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        gt_class_ids = gt_class_ids[non_crowd_ix]
        gt_boxes = gt_boxes[non_crowd_ix]
        # Compute overlaps with crowd boxes [anchors, crowds]
        crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
        crowd_iou_max = np.amax(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)
    else:
        # All anchors don't intersect a crowd
        no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)
    # Compute overlaps [num_anchors, num_gt_boxes]
    overlaps = utils.compute_overlaps(anchors, gt_boxes)

    # Match anchors to GT Boxes
    # If an anchor overlaps a GT box with IoU >= 0.7 then it's positive.
    # If an anchor overlaps a GT box with IoU < 0.3 then it's negative.
    # Neutral anchors are those that don't match the conditions above,
    # and they don't influence the loss function.
    # However, don't keep any GT box unmatched (rare, but happens). Instead,
    # match it to the closest anchor (even if its max IoU is < 0.3).
    #
    # 1. Set negative anchors first. They get overwritten below if a GT box is
    # matched to them. Skip boxes in crowd areas.
    anchor_iou_argmax = np.argmax(overlaps, axis=1) # Key step: argmax over axis=1 gives, for each row (anchor), the index of the GT box with the highest IoU. The axis=1 dimension is reduced, so the result has shape [overlaps.shape[0]].
    anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax] # Key step: pick each row's maximum, i.e. each anchor's best IoU.
    rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
    # 2. Set an anchor for each GT box (regardless of IoU value).
    # If multiple anchors have the same IoU match all of them
    gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:,0] # Key step: np.max(overlaps, axis=0) is the per-column (per-GT-box) maximum; argwhere returns the row (anchor) indices where those maxima occur, i.e. the best anchor(s) for each GT box.
    rpn_match[gt_iou_argmax] = 1
    # 3. Set anchors with high overlap as positive.
    rpn_match[anchor_iou_max >= 0.7] = 1

    # Subsample to balance positive and negative anchors
    # Don't let positives be more than half the anchors
    ids = np.where(rpn_match == 1)[0] 
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
    # Same for negative proposals
    ids = np.where(rpn_match == -1)[0]
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE -
                        np.sum(rpn_match == 1))
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0

    # For positive anchors, compute shift and scale needed to transform them
    # to match the corresponding GT boxes.
    ids = np.where(rpn_match == 1)[0] # ids holds the indices of the anchors marked positive (rpn_match == 1)
    ix = 0  # index into rpn_bbox
    # TODO: use box_refinement() rather than duplicating the code here
    # Iterate over the positive anchors and write the matching deltas into rpn_bbox. If rpn_match has shape [N], the first M rows of rpn_bbox (M = number of positive anchors) are filled in the same order as the positive entries of rpn_match; this ordering is what ties each delta back to its anchor later.
    for i, a in zip(ids, anchors[ids]):
        # Closest gt box (it might have IoU < 0.7)
        gt = gt_boxes[anchor_iou_argmax[i]] # the GT box this anchor was matched to

        # Convert coordinates to center plus width/height.
        # GT Box
        gt_h = gt[2] - gt[0]
        gt_w = gt[3] - gt[1]
        gt_center_y = gt[0] + 0.5 * gt_h
        gt_center_x = gt[1] + 0.5 * gt_w
        # Anchor
        a_h = a[2] - a[0]
        a_w = a[3] - a[1]
        a_center_y = a[0] + 0.5 * a_h
        a_center_x = a[1] + 0.5 * a_w

        # Compute the bbox refinement that the RPN should predict.
        rpn_bbox[ix] = [
            (gt_center_y - a_center_y) / a_h,
            (gt_center_x - a_center_x) / a_w,
            np.log(gt_h / a_h),
            np.log(gt_w / a_w),
        ]
        # Normalize
        rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
        ix += 1

    return rpn_match, rpn_bbox

Description: Called from the data_generator function. Given the anchors and GT boxes, computes IoU overlaps, identifies the positive anchors, and computes the deltas needed to refine them to match their corresponding GT boxes.
Inputs:

  • anchors: [num_anchors, (y1, x1, y2, x2)]
  • gt_class_ids: [num_gt_boxes] Integer class IDs.
  • gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]
    Returns:
  • rpn_match: [N] (int32) matches between anchors and GT boxes. 1 = positive anchor, -1 = negative anchor, 0 = neutral
  • rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
    Note: rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32) while rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4)). So the N in rpn_match is the total number of anchors, whereas the N in rpn_bbox is config.RPN_TRAIN_ANCHORS_PER_IMAGE, which defaults to 256.
    The overlaps matrix
    overlaps = utils.compute_overlaps(anchors, gt_boxes) builds a matrix of IoU scores with anchors along the rows and gt_boxes along the columns; each row's maximum is that anchor's best IoU.
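The axis semantics are worth pinning down with toy numbers:

import numpy as np

# Toy overlaps matrix: 4 anchors (rows) x 2 GT boxes (columns)
overlaps = np.array([[0.1, 0.8],
                     [0.6, 0.2],
                     [0.0, 0.0],
                     [0.4, 0.9]])

anchor_iou_argmax = np.argmax(overlaps, axis=1)  # best GT per anchor: [1 0 0 1]
anchor_iou_max = overlaps[np.arange(4), anchor_iou_argmax]  # [0.8 0.6 0.0 0.9]

# Best anchor per GT box: rows where a column-wise maximum occurs
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
print(gt_iou_argmax)  # [1 3]: anchor 1 for GT 0, anchor 3 for GT 1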
def generate_random_rois(image_shape, count, gt_class_ids, gt_boxes):

Description: Called from the data_generator function. Given the input image size and the GT boxes, randomly generates the requested number of ROI proposals, similar to what the RPN network would produce. Returns: [count, (y1, x1, y2, x2)] ROI boxes in pixels. A simplified sketch follows.
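The sketch below keeps only the rough shape of the idea. It is hedged: the real implementation in model.py biases roughly 90% of the boxes to lie around GT boxes and resamples away zero-area boxes, both of which this toy version glosses over; generate_random_rois_sketch is my name, not the repo's.

import numpy as np

def generate_random_rois_sketch(image_shape, count, gt_boxes):
    """gt_boxes assumed to be integer pixel coordinates [N, (y1, x1, y2, x2)]."""
    h, w = image_shape[:2]
    rois = []
    per_box = int(0.9 * count / max(len(gt_boxes), 1))
    for y1, x1, y2, x2 in gt_boxes.astype(int):
        bh, bw = y2 - y1, x2 - x1
        # Sample corners inside a window of roughly twice the GT box's extent
        ys = np.sort(np.random.randint(max(y1 - bh, 0), min(y2 + bh, h), (per_box, 2)), axis=1)
        xs = np.sort(np.random.randint(max(x1 - bw, 0), min(x2 + bw, w), (per_box, 2)), axis=1)
        rois.append(np.stack([ys[:, 0], xs[:, 0], ys[:, 1], xs[:, 1]], axis=1))
    # The remaining ROIs are uniform over the whole image
    remaining = count - per_box * len(gt_boxes)
    ys = np.sort(np.random.randint(0, h, (remaining, 2)), axis=1)
    xs = np.sort(np.random.randint(0, w, (remaining, 2)), axis=1)
    rois.append(np.stack([ys[:, 0], xs[:, 0], ys[:, 1], xs[:, 1]], axis=1))
    return np.concatenate(rois, axis=0)  # [count, (y1, x1, y2, x2)] in pixels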

def data_generator(dataset, config, shuffle=True, augment=False, augmentation=None,
                   random_rois=0, batch_size=1, detection_targets=False,
                   no_augmentation_sources=None):

Description: A data generator that draws data from the input dataset in batches of batch_size and returns it via yield inputs, outputs.
Returned values
inputs list:

  • images: [batch, H, W, C]
  • image_meta: [batch, (meta data)]. Holds basic image attributes such as the resized and original height/width; see compose_image_meta() and the layout sketch after this list.
  • rpn_match: [batch, N] Integer (1=positive anchor, -1=negative, 0=neutral)
  • rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
  • gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs
  • gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)]
  • gt_masks: [batch, height, width, MAX_GT_INSTANCES]. The height and width
    are those of the image unless use_mini_mask is True, in which
    case they are defined in MINI_MASK_SHAPE.
outputs list: Usually empty in regular training. But if detection_targets is True then the outputs list contains target class_ids, bbox deltas, and masks.
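For reference, image_meta is a flat 1-D concatenation of the image's attributes. A condensed sketch matching the compose_image_meta signature listed later in the file (the comments note the slot widths):

import numpy as np

def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
    """Packs image attributes into one 1-D array."""
    return np.array(
        [image_id]                    # 1
        + list(original_image_shape)  # 3: (H, W, C) before resizing
        + list(image_shape)           # 3: (H, W, C) after resizing
        + list(window)                # 4: (y1, x1, y2, x2) area excluding padding
        + [scale]                     # 1: resizing scale factor
        + list(active_class_ids)      # NUM_CLASSES: 1 for classes in the dataset
    )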
Basic flow:
  1. b = 0 tracks the position within the batch; dataset.image_ids => image_ids; image_index = (image_index + 1) % len(image_ids); image_id = image_ids[image_index] selects the current image.
  2. config / config.IMAGE_SHAPE => compute_backbone_shapes(config, config.IMAGE_SHAPE) => backbone_shapes; together with the anchor scales, ratios, and strides plus the backbone strides from config => utils.generate_pyramid_anchors => anchors.
  3. image, image_meta, gt_class_ids, gt_boxes, gt_masks = load_image_gt(dataset, config, image_id, augment=augment, augmentation=augmentation, use_mini_mask=config.USE_MINI_MASK) fetches the image data together with its bbox and mask data.
  4. rpn_match, rpn_bbox = build_rpn_targets(image.shape, anchors, gt_class_ids, gt_boxes, config) builds the RPN targets.
  5. rpn_rois = generate_random_rois(image.shape, random_rois, gt_class_ids, gt_boxes) computes random rpn_rois if random_rois is set. #TO-DO
  6. rois, mrcnn_class_ids, mrcnn_bbox, mrcnn_mask = build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config) computes the ROI targets if detection_targets is set. #TO-DO
  7. A loop over b = 0..batch_size assembles the samples into a batch:
    inputs = [batch_images, batch_image_meta, batch_rpn_match, batch_rpn_bbox, batch_gt_class_ids, batch_gt_boxes, batch_gt_masks]; outputs = []. If random_rois, then inputs.extend([batch_rpn_rois]); if detection_targets, then inputs.extend([batch_rois]), batch_mrcnn_class_ids = np.expand_dims(batch_mrcnn_class_ids, -1), and outputs.extend([batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask]).
  8. Once b reaches batch_size, the generator yields inputs, outputs and resets b = 0. A minimal sketch of this loop follows the call examples below.
    Call site
    Called from the train() method of the MaskRCNN class:
    train_generator = data_generator(train_dataset, self.config, shuffle=True,
    augmentation=augmentation,
    batch_size=self.config.BATCH_SIZE,
    no_augmentation_sources=no_augmentation_sources)
    The generator is ultimately consumed here:
    self.keras_model.fit_generator(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=True,
    )
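As promised in step 8, here is a minimal sketch of the batching loop. It is simplified and hedged: shuffling, augmentation, error handling, and the random_rois/detection_targets branches are omitted, and load_image_gt, build_rpn_targets, and mold_image are the model.py functions described above.

import numpy as np

def data_generator_sketch(dataset, config, anchors, batch_size=1):
    b, image_index = 0, -1
    image_ids = np.copy(dataset.image_ids)
    while True:
        # Step 1: advance the image index, wrapping around the dataset
        image_index = (image_index + 1) % len(image_ids)
        image_id = image_ids[image_index]
        # Step 3: per-image data and ground truth
        image, image_meta, gt_class_ids, gt_boxes, gt_masks = load_image_gt(
            dataset, config, image_id, use_mini_mask=config.USE_MINI_MASK)
        # Step 4: RPN targets for this image
        rpn_match, rpn_bbox = build_rpn_targets(
            image.shape, anchors, gt_class_ids, gt_boxes, config)
        # Step 7: allocate batch arrays from the first sample, then fill slot b
        if b == 0:
            batch_images = np.zeros((batch_size,) + image.shape, dtype=np.float32)
            ...  # batch_image_meta, batch_rpn_match, batch_rpn_bbox, GT arrays
        batch_images[b] = mold_image(image.astype(np.float32), config)
        ...  # fill the remaining batch arrays the same way
        b += 1
        # Step 8: emit a full batch and reset
        if b >= batch_size:
            yield [batch_images, ...], []  # plus the other batch arrays
            b = 0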
class MaskRCNN():
    def __init__(self, mode, config, model_dir):
    def build(self, mode, config):
    def find_last(self):
    def load_weights(self, filepath, by_name=False, exclude=None):
    def get_imagenet_weights(self):
    def set_trainable(self, layer_regex, keras_model=None, indent=0, verbose=1):
    def set_log_dir(self, model_path=None):
    def train(self, train_dataset, val_dataset, learning_rate, epochs, layers,
              augmentation=None, custom_callbacks=None, no_augmentation_sources=None):
    def mold_inputs(self, images):
    def unmold_detections(self, detections, mrcnn_mask, original_image_shape,
                          image_shape, window):
    def detect(self, images, verbose=0):
    def detect_molded(self, molded_images, image_metas, verbose=0):
    def get_anchors(self, image_shape):
    def ancestor(self, tensor, name, checked=None):
    def find_trainable_layer(self, layer):
    def get_trainable_layers(self):
    def run_graph(self, images, outputs, image_metas=None):
def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
def parse_image_meta(meta):
def parse_image_meta_graph(meta):
def mold_image(images, config):
def unmold_image(normalized_images, config):
def trim_zeros_graph(boxes, name='trim_zeros'):
def batch_pack_graph(x, counts, num_rows):
def norm_boxes_graph(boxes, shape):
def denorm_boxes_graph(boxes, shape):

1.2 Analysis of the MaskRCNN.build() Method

1.2.1 Model Inputs

input_image
input_image_meta
input_rpn_match
input_rpn_bbox
input_gt_class_ids
input_gt_boxes
input_gt_masks
input_anchors

1.2.2 Model Outputs

rpn_class_logits
rpn_class
rpn_bbox
mrcnn_class_logits
mrcnn_class
mrcnn_bbox
mrcnn_mask
rpn_rois
output_rois
rpn_class_loss
rpn_bbox_loss
class_loss
bbox_loss
mask_loss

1.2.3 Model Construction Flow

  • input_image => resnet_graph => _, C2, C3, C4, C5 => Conv Ops => [P2, P3, P4, P5, P6] => rpn_feature_maps = [P2, P3, P4, P5, P6] and mrcnn_feature_maps = [P2, P3, P4, P5] (the "Conv Ops" are the FPN top-down pathway; see the sketch after this list)
  • rpn_feature_maps and anchors => the rpn = build_rpn_model model => rpn_class_logits, rpn_class, rpn_bbox (outputs) => ProposalLayer(…) => rpn_rois (output) or input_rois => target_rois
  • target_rois, input_gt_class_ids, gt_boxes, input_gt_masks => DetectionTargetLayer => rois, target_class_ids, target_bbox, target_mask
  • rois, mrcnn_feature_maps, input_image_meta => fpn_classifier_graph => mrcnn_class_logits, mrcnn_class, mrcnn_bbox (outputs)
  • rois, mrcnn_feature_maps, input_image_meta => build_fpn_mask_graph => mrcnn_mask (output)
  • output_rois (output) = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)
  • [input_rpn_match, rpn_class_logits] => rpn_class_loss_graph => rpn_class_loss
  • [input_rpn_bbox, input_rpn_match, rpn_bbox] => rpn_bbox_loss_graph => rpn_bbox_loss
  • [target_class_ids, mrcnn_class_logits, active_class_ids] => mrcnn_class_loss_graph => class_loss
  • [target_bbox, target_class_ids, mrcnn_bbox] => mrcnn_bbox_loss_graph => bbox_loss
  • [target_mask, target_class_ids, mrcnn_mask] => mrcnn_mask_loss_graph => mask_loss
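The "Conv Ops" step in the first bullet is the FPN top-down pathway. A condensed sketch with the middle levels elided (layer names follow the repo's build() code; treat this as an outline rather than the verbatim source):

# Condensed FPN top-down pathway (P3/P2 follow the same pattern as P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
P4 = KL.Add(name='fpn_p4add')([
    KL.UpSampling2D(size=(2, 2), name='fpn_p5upsampled')(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
# ... P3 from (P4, C3), P2 from (P3, C2) ...
# A 3x3 conv on each level smooths upsampling artifacts (shown for P2 only)
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding='SAME', name='fpn_p2')(P2)
# P6, used only by the RPN, is a stride-2 subsampling of P5
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name='fpn_p6')(P5)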

1.2.4 Code Walkthrough

    def build(self, mode, config):
        input_image = KL.Input(
        input_image_meta = KL.Input(
        if mode == "training":
        	input_rpn_match = KL.Input(
        	input_rpn_bbox = KL.Input(
        	input_gt_class_ids = KL.Input(
        	input_gt_boxes = KL.Input(
        	gt_boxes = KL.Lambda(...)(input_gt_boxes)
        	if/else config.USE_MINI_MASK:
        		input_gt_masks = KL.Input(
        elif mode == "inference":
        	input_anchors = KL.Input(
        if callable(config.BACKBONE):
        	_, C2, C3, C4, C5 = resnet_graph(input_image, ...)
        rpn_feature_maps = [P2, P3, P4, P5, P6]
        mrcnn_feature_maps = [P2, P3, P4, P5]
        if mode == "training":
        	anchors = self.get_anchors(config.IMAGE_SHAPE)
        	anchors = np.broadcast_to(anchors, ...)
        	anchors = KL.Lambda(...)(input_image)
        else:
        	anchors = input_anchors
        # build_rpn_model returns an instance of the RPN model
        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE, ...)
        ...
        rpn_class_logits, rpn_class, rpn_bbox = outputs
        proposal_count = config.POST_NMS_ROIS_TRAINING ...
        # ProposalLayer turns the RPN outputs and the anchors into proposals
        rpn_rois = ProposalLayer(...)([rpn_class, rpn_bbox, anchors])
        if mode == "training":
        	active_class_ids = KL.Lambda(...)(input_image_meta)
        	if not config.USE_RPN_ROIS:
        		input_rois = KL.Input(
        		# normalize the externally supplied ROIs with norm_boxes_graph
        		target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
           	else:
           		target_rois = rpn_rois
           	# DetectionTargetLayer subsamples the proposals and generates training targets
           	rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
            # classification/bbox head: fpn_classifier_graph
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)
            # mask head: build_fpn_mask_graph
            mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)
            output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)
            # build the loss layers
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])
            # assemble the inputs, outputs, and the model
            inputs = [input_image, input_image_meta, input_rpn_match, input_rpn_bbox, 
            		  input_gt_class_ids, input_gt_boxes, input_gt_masks]
            if not config.USE_RPN_ROIS:
                inputs.append(input_rois)
            outputs = [rpn_class_logits, rpn_class, rpn_bbox, mrcnn_class_logits, 
            		   mrcnn_class, mrcnn_bbox, mrcnn_mask, rpn_rois, output_rois,
            		   rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
            model = KM.Model(inputs, outputs, name='mask_rcnn')
        else:
        	mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)
            detections = DetectionLayer(config, name="mrcnn_detection")(
                [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])
            detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections)
            mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            model = KM.Model([input_image, input_image_meta, input_anchors],
                             [detections, mrcnn_class, mrcnn_bbox,
                                 mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
                             name='mask_rcnn')
		return model

Code walkthrough: MaskRCNN.train()

def train(self, train_dataset, val_dataset, learning_rate, epochs, layers,
              augmentation=None, custom_callbacks=None, no_augmentation_sources=None):

1.3 Analysis of utils.py

def batch_slice(inputs, graph_fn, batch_size, names=None):

Description:
Batch Slicing
Some custom layers support a batch size of 1 only, and require a lot of work to support batches greater than 1. This function slices an input tensor across the batch dimension and feeds batches of size 1. Effectively, an easy way to support batches > 1 quickly with little code modification. In the long run, it's more efficient to modify the code to support large batches and get rid of this function. Consider it a temporary solution.
It splits inputs into slices and feeds each slice to a copy of the given computation graph, then combines the results. It allows you to run a graph on a batch of inputs even if the graph is written to support one instance only.
inputs: list of tensors. All must have the same first dimension length.
graph_fn: A function that returns a TF tensor that's part of a graph.
batch_size: number of slices to divide the data into.
names: If provided, assigns names to the resulting tensors.
Note this code fragment:
for i in range(batch_size):
    inputs_slice = [x[i] for x in inputs]
    output_slice = graph_fn(*inputs_slice)
    outputs.append(output_slice)
It shows that inputs is a list; each element is sliced along its first dimension, the per-element slices are collected into inputs_slice, and that list is unpacked as positional arguments to graph_fn. The result outputs is likewise a list, and its elements keep the same first-dimension (batch) correspondence as the inputs.
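As a toy illustration of the slicing semantics (NumPy standing in for TF, and batch_slice_sketch plus the lambda are hypothetical stand-ins for the real tensors and graph_fn):

import numpy as np

def batch_slice_sketch(inputs, graph_fn, batch_size):
    """Run graph_fn once per batch element, then re-stack each output
    along a new first dimension, mirroring utils.batch_slice."""
    outputs = []
    for i in range(batch_size):
        inputs_slice = [x[i] for x in inputs]      # one element per input
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)
    # Regroup: list over batch -> list over outputs, stacked on axis 0
    return [np.stack(o, axis=0) for o in zip(*outputs)]

a = np.arange(6).reshape(2, 3)  # batch of 2
b = np.ones((2, 3))
summed, diffed = batch_slice_sketch(
    [a, b], lambda x, y: (x + y, x - y), batch_size=2)
print(summed.shape)  # (2, 3): per-slice results restacked along the batch dim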
