Mask R-CNN Source Code Analysis 5: Loss Functions

Mask R-CNN Source Code Analysis 1: Overall Architecture

Mask R-CNN Source Code Analysis 2: Feature Maps and Anchor Generation

Mask R-CNN Source Code Analysis 3: RPN, ProposalLayer, DetectionTargetLayer

Mask R-CNN Source Code Analysis 4-0: ROI Pooling and ROI Align Theory

Mask R-CNN Source Code Analysis 4: Network Heads

Mask R-CNN Source Code Analysis 5: Loss Functions

 

Contents

Mask R-CNN Overview

D) Loss Functions

1. RPN classification loss: cross-entropy

2. RPN regression loss: Smooth L1

3. MRCNN classification loss: cross-entropy

4. MRCNN regression loss: Smooth L1

5. Mask loss: per-mask binary cross-entropy

Smooth-L1


Mask R-CNN Overview

       Mask R-CNN is a compact, flexible, general framework for object instance segmentation. It not only detects the objects in an image but also produces a high-quality segmentation mask for each of them. It extends Faster R-CNN [1] by adding, in parallel with the bounding-box recognition branch, a new branch that predicts an object mask. The framework also extends easily to other tasks, such as estimating human pose, i.e. person keypoint detection. It achieved the best results on a series of COCO challenge tasks, including instance segmentation, bounding-box object detection, and person keypoint detection.


Source code analyzed:

https://github.com/matterport/Mask_RCNN

D) Loss Functions

       Mask R-CNN has five loss functions in total: two for the RPN, two for the MRCNN (detection) head, and one for the mask branch. The first four are the same as in Faster R-CNN. For the last one, the mask branch outputs K*m^2 values per RoI: K binary masks (one per class) of resolution m*m. A per-pixel sigmoid is applied, and Lmask is the average binary cross-entropy loss. For an RoI whose ground-truth class is k, Lmask only considers the k-th mask (the other mask outputs do not contribute to the loss). This definition lets a mask be generated for every class without inter-class competition.

The loss part of the code as a whole looks like this:

            # *************************8. Compute the individual losses******************************************************************
            # Mask R-CNN has five loss functions: two for the RPN, two for the MRCNN head,
            # and one for the mask branch. The first four are the same as in Faster R-CNN.
            # For the last one, the mask branch outputs K*m^2 values per RoI: K binary
            # masks (one per class) of resolution m*m.
            # A per-pixel sigmoid is applied, and Lmask is defined as the average binary
            # cross-entropy loss.
            # For an RoI with ground-truth class k, Lmask only considers the k-th mask
            # (the other mask outputs do not contribute to the loss).
            # This lets a mask be generated for every class without inter-class competition.

            # Losses
            # rpn 分类损失
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            # rpn 回归损失
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            # mrcnn 分类损失
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            # mrcnn 回归损失
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            # mask 损失
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])
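These five Lambda layers only attach named loss tensors to the graph. In the matterport implementation they are collected later, in MaskRCNN.compile(), where each one is averaged, scaled by config.LOSS_WEIGHTS (all 1.0 by default), and added to the model's total loss. A simplified sketch of that step:

def add_weighted_losses(keras_model, config):
    # Simplified sketch of what MaskRCNN.compile() in model.py does with
    # the five named loss layers built above.
    loss_names = ["rpn_class_loss", "rpn_bbox_loss",
                  "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
    for name in loss_names:
        layer = keras_model.get_layer(name)
        weighted = (tf.reduce_mean(layer.output, keepdims=True)
                    * config.LOSS_WEIGHTS.get(name, 1.))
        keras_model.add_loss(weighted)  # total loss = weighted sum of all five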

1. RPN classification loss: cross-entropy

rpn_match comes from GT matching: 1 = positive (foreground), -1 = negative (background), 0 = neutral;
rpn_class_logits is produced in rpn_graph: the feature map reshaped to [batch, anchors, 2], before any softmax activation.

# RPN classification loss: cross-entropy
def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classifier loss.

    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values
    # (positives -> 1, negatives and neutrals -> 0).
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and negative anchors contribute to the loss, but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))  # keep non-neutral anchors, i.e. positives AND negatives
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))  # mean over the selected anchors; 0 if none were selected
    return loss
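
To make the filtering and gathering steps concrete, here is a small NumPy mirror of the same logic on a toy batch (all values hypothetical):

import numpy as np

# Toy batch: 1 image, 4 anchors. rpn_match uses 1 / -1 / 0 = positive / negative / neutral.
rpn_match = np.array([[1, -1, 0, 1]])          # [batch, anchors]
logits = np.array([[[0.2, 1.0],                # [batch, anchors, 2] (BG/FG logits)
                    [2.0, -1.0],
                    [0.0, 0.0],
                    [-0.5, 0.5]]])

anchor_class = (rpn_match == 1).astype(np.int64)   # -1 and 0 -> 0, +1 -> 1
keep = rpn_match != 0                              # drop neutral anchors only
labels, kept_logits = anchor_class[keep], logits[keep]

# Softmax cross-entropy, equivalent to K.sparse_categorical_crossentropy(from_logits=True)
shifted = kept_logits - kept_logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(labels)), labels].mean()
print(loss)  # mean cross-entropy over the 3 non-neutral anchors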

2. RPN regression loss: Smooth L1

target_bbox is the GT box deltas;
rpn_match comes from GT matching: 1 = positive, -1 = negative, 0 = neutral;
rpn_bbox is produced in rpn_graph: the feature map reshaped to [batch, anchors, 4].

Smooth L1 loss was already covered when analyzing other detectors, so it is not repeated here; its code is given in the last section below.

# RPN regression loss: Smooth L1
def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """Return the RPN bounding box loss graph.

    config: the model config object.
    target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unused bbox deltas.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)  # remove the trailing size-1 dimension
    indices = tf.where(K.equal(rpn_match, 1))  # indices of positive anchors only

    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)   # predicted deltas of the positive anchors

    # Trim target bounding box deltas to the same length as rpn_bbox.
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts, config.IMAGES_PER_GPU)

    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss
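
The helper batch_pack_graph used above is defined elsewhere in model.py; it picks the first counts[i] rows from each image of the batch, so that target_bbox lines up with the gathered positive rpn_bbox:

def batch_pack_graph(x, counts, num_rows):
    """Picks different number of values from each row
    in x depending on the values in counts.
    """
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)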

3. MRCNN classification loss: cross-entropy

target_class_ids: the GT class IDs;
pred_class_logits: the predicted class logits, produced by the fully connected layers of the head network;
active_class_ids: marks which of the dataset's classes (e.g. the 80 COCO classes) are active for this image.
The cross-entropy itself is computed from target_class_ids and pred_class_logits;
active_class_ids is used to erase the losses of predictions for classes that are not active in the image's dataset.

# MRCNN classification loss: cross-entropy
def mrcnn_class_loss_graph(target_class_ids, pred_class_logits, active_class_ids):
    """Loss for the classifier head of Mask RCNN.

    target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
        padding to fill in the array.
    pred_class_logits: [batch, num_rois, num_classes]
    active_class_ids: [batch, num_classes]. Has a value of 1 for
        classes that are in the dataset of the image, and 0
        for classes that are not in the dataset.
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    loss = loss * pred_active

    # Compute loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss
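
The effect of pred_active is easiest to see on a toy example; a small NumPy sketch (all values hypothetical):

import numpy as np

# Toy case: 4 classes {0 (BG), 1, 2, 3}; class 3 is not active for this image.
active_class_ids = np.array([1, 1, 1, 0])       # [num_classes]
pred_class_ids = np.array([1, 3, 2])            # argmax over the logits, one per ROI
per_roi_loss = np.array([0.7, 2.0, 0.4])        # cross-entropy per ROI

pred_active = active_class_ids[pred_class_ids]  # [1, 0, 1]
masked = per_roi_loss * pred_active             # the ROI predicting class 3 is erased
mean_loss = masked.sum() / pred_active.sum()    # average only over contributing ROIs
print(mean_loss)                                # (0.7 + 0.4) / 2 = 0.55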

4. MRCNN regression loss: Smooth L1

target_bbox: the GT box deltas;
target_class_ids: the class IDs of the GT boxes;
pred_bbox: the predicted boxes, obtained from the feature map through the head network (one set of deltas per class).

# MRCNN regression loss: Smooth L1
def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Loss for Mask R-CNN bounding box refinement.

    target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    target_class_ids: [batch, num_rois]. Integer class IDs.
    pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss
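
The indices trick above (stacking each positive ROI's index with its GT class id) selects, for every ROI, only the 4 deltas predicted for its own class. A toy NumPy equivalent (hypothetical shapes and values):

import numpy as np

num_rois, num_classes = 3, 4
target_class_ids = np.array([2, 0, 1])              # ROI 1 is background
pred_bbox = np.arange(num_rois * num_classes * 4,
                      dtype=np.float32).reshape(num_rois, num_classes, 4)

positive_roi_ix = np.where(target_class_ids > 0)[0]  # [0, 2]
positive_class = target_class_ids[positive_roi_ix]   # [2, 1]

# Equivalent of tf.gather_nd(pred_bbox, tf.stack([ix, class_id], axis=1)):
selected = pred_bbox[positive_roi_ix, positive_class]  # shape [2, 4]
print(selected)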

5. Mask loss: per-mask binary cross-entropy

Lmask is the loss of the mask branch, whose output has size K*m*m: it encodes K binary masks of resolution m*m, one per class. A sigmoid is applied to every pixel, and Lmask is the average binary cross-entropy. For an RoI whose ground-truth class is k, Lmask is defined only on the k-th mask; the other masks have no influence on it. That is, although K binary masks are predicted for each RoI during training, only the mask of the ground-truth class k contributes to the loss; at inference time, it is the class predicted by the classification branch that selects which mask to output.

Mask R-CNN therefore has no inter-class competition, because the other classes contribute no loss. The mask branch predicts a mask for every class and relies on the classification layer to select the output mask (which then has size m*m: since only one class is predicted, only that class's mask needs to be output). The usual FCN approach applies a per-pixel softmax with a multinomial cross-entropy loss, which introduces competition between classes; per-pixel sigmoid with binary cross-entropy lets each class's mask stand on its own instead of competing with the masks of the other classes.

target_mask: the GT masks;
target_class_ids: the class IDs of the GT boxes;
mrcnn_mask: the masks predicted by the network.

def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Mask binary cross-entropy loss for the masks head.

    target_masks: [batch, num_rois, height, width].
        A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
                with values from 0 to 1.
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss
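
Again, the class-specific selection is the key step; a toy NumPy sketch of the gather plus the binary cross-entropy (hypothetical values, with pred_masks already permuted to [N, num_classes, height, width]):

import numpy as np
np.random.seed(0)

# Toy case: 2 ROIs, 3 classes, 2x2 masks. ROI 0 has class 2; ROI 1 is background.
target_class_ids = np.array([2, 0])
target_masks = np.random.randint(0, 2, (2, 2, 2)).astype(np.float32)
pred_masks = np.random.rand(2, 3, 2, 2)            # sigmoid outputs in (0, 1)

positive_ix = np.where(target_class_ids > 0)[0]    # [0]
positive_class = target_class_ids[positive_ix]     # [2]

y_true = target_masks[positive_ix]                 # GT mask of ROI 0
y_pred = pred_masks[positive_ix, positive_class]   # only the class-2 mask of ROI 0

eps = 1e-7  # numerical safety, analogous to Keras' clipping in binary_crossentropy
bce = -(y_true * np.log(y_pred + eps)
        + (1 - y_true) * np.log(1 - y_pred + eps)).mean()
print(bce)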

Smooth-L1

In the code, x = K.abs(y_true - y_pred).
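
Written out as a formula (x is already non-negative here):

$$
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } x < 1 \\
x - 0.5 & \text{otherwise}
\end{cases}
$$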

"""
Implements Smooth-L1 loss.
y_true and y_pred are typically: [N, 4], but could be any shape.
"""
def smooth_l1_loss(y_true, y_pred):
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
    return loss
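
A quick numeric check of the piecewise rule (a NumPy mirror of the code above; values are made up):

import numpy as np

diff = np.array([0.2, 1.5])
loss = np.where(diff < 1.0, 0.5 * diff**2, diff - 0.5)
print(loss)  # [0.02 1.  ] : quadratic below 1, linear (shifted by 0.5) above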
