RetinaNet

被自己蠢哭了

Category: Deep Learning

Article link: https://blog.csdn.net/jicong44/article/details/86500252

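Personal summary

Structure overview:

ResNet backbone:

RetinaNet takes the outputs of the last three ResNet stages, C3, C4 and C5, and builds five feature maps P3, P4, P5, P6 and P7 from them through upsampling and downsampling. All five feature maps have 256 channels, and their spatial size decreases from P3 to P7.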
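
To make "sizes decrease from P3 to P7" concrete, here is a minimal sketch of the spatial size of each level. It assumes the usual FPN convention that level l has stride 2**l (so P3 has stride 8 and P7 has stride 128) and that sizes are rounded up; the function name `pyramid_shapes` is my own and not part of the original code.

```python
def pyramid_shapes(image_hw, levels=(3, 4, 5, 6, 7)):
    """Return the (height, width) of each pyramid level P3..P7.

    Assumption: level l has stride 2**l and sizes are rounded up.
    """
    height, width = image_hw
    return [
        ((height + 2 ** l - 1) // 2 ** l, (width + 2 ** l - 1) // 2 ** l)
        for l in levels
    ]


shapes = pyramid_shapes((512, 512))
print(shapes)  # [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4)]
```

Each level halves the height and width of the previous one, while the channel count stays at 256.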

RetinaNet:

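On each P feature map, a regression convolution and a classification convolution are applied separately. The regression output has 4 * num_anchors channels, where num_anchors is the number of anchors per feature-map location, and is reshaped so that each row corresponds to one anchor. The classification output has num_classes * num_anchors channels and is reshaped the same way, one row per anchor.
Note: the regression results from P3, P4, P5, P6 and P7 are then concatenated along the anchor dimension (keras.layers.Concatenate), and the classification results are concatenated along the anchor dimension in the same way.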
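
The reshape-then-concatenate step can be sketched with plain NumPy arrays standing in for the convolution outputs of a single image. The values `num_anchors = 9`, `num_classes = 3` and the 512x512 level sizes are illustrative assumptions, not taken from the post; in the real model the same operation runs on batched tensors.

```python
import numpy as np

num_anchors, num_classes = 9, 3  # illustrative values (assumption)
level_sizes = [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4)]  # P3..P7, 512x512 input

regression_per_level = []
classification_per_level = []
for h, w in level_sizes:
    # Stand-ins for the head outputs of one image on this level.
    reg_map = np.zeros((h, w, 4 * num_anchors), dtype=np.float32)
    cls_map = np.zeros((h, w, num_classes * num_anchors), dtype=np.float32)
    # Reshape so that every row describes exactly one anchor.
    regression_per_level.append(reg_map.reshape(-1, 4))
    classification_per_level.append(cls_map.reshape(-1, num_classes))

# Stack all pyramid levels along the anchor axis.
regression = np.concatenate(regression_per_level, axis=0)
classification = np.concatenate(classification_per_level, axis=0)
print(regression.shape, classification.shape)  # (49104, 4) (49104, 3)
```

The row count is 9 * (64*64 + 32*32 + 16*16 + 8*8 + 4*4) = 49104, which is the N that shows up in the target arrays later on.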

Function that computes the targets the network is expected to output

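import keras
import numpy as np

# compute_gt_annotations and bbox_transform are helper functions assumed to be defined alongside this function.


def anchor_targets_bbox(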
    anchors,
    image_group,
    annotations_group,
    num_classes,
    negative_overlap=0.4,
    positive_overlap=0.5
):
    """ Generate anchor targets for bbox detection.
    Computes, from a set of anchors and annotations, the targets the network has to fit; they consist of a classification part and a regression part.

    Args
        anchors: np.array of anchors of shape (N, 4) for (x1, y1, x2, y2). They are generated by 1) taking the input image size,
                2) deriving the sizes of the multi-scale feature maps, and 3) generating anchors on every feature map, so N is very large.
        image_group: List of BGR images.
        annotations_group: List of annotations (one dict per image with 'bboxes', an np.array of shape (M, 4) for (x1, y1, x2, y2), and 'labels', an np.array of shape (M,)).
        num_classes: Number of classes to predict.
        negative_overlap: IoU overlap for negative anchors (all anchors with overlap < negative_overlap are negative).
        positive_overlap: IoU overlap for positive anchors (all anchors with overlap > positive_overlap are positive).

    Returns
        regression_batch: batch that contains bounding-box regression targets for an image & anchor states (np.array of shape (batch_size, N, 4 + 1)),
                      where N is the number of anchors for an image, the first 4 columns define regression targets for (x1, y1, x2, y2) and the
                      last column defines anchor states (-1 for ignore, 0 for bg, 1 for fg).
        labels_batch: batch that contains labels & anchor states (np.array of shape (batch_size, N, num_classes + 1)),
                      where N is the number of anchors for an image and the last column defines the anchor state (-1 for ignore, 0 for bg, 1 for fg).
    """

    assert(len(image_group) == len(annotations_group)), "The length of the images and annotations need to be equal."
    assert(len(annotations_group) > 0), "No data received to compute anchor targets for."
    for annotations in annotations_group:
        assert('bboxes' in annotations), "Annotations should contain bboxes."
        assert('labels' in annotations), "Annotations should contain labels."

    batch_size = len(image_group)

    # the extra "+ 1" column stores the anchor state
    regression_batch  = np.zeros((batch_size, anchors.shape[0], 4 + 1), dtype=keras.backend.floatx())
    labels_batch      = np.zeros((batch_size, anchors.shape[0], num_classes + 1), dtype=keras.backend.floatx())

    
    for index, (image, annotations) in enumerate(zip(image_group, annotations_group)):
        if annotations['bboxes'].shape[0]:
            # obtain indices of gt annotations with the greatest overlap;
            # argmax_overlaps_inds holds, for every anchor, the index of the GT box with the highest IoU
            positive_indices, ignore_indices, argmax_overlaps_inds = compute_gt_annotations(anchors, annotations['bboxes'], negative_overlap, positive_overlap)

            # the last column stores the state of each anchor
            labels_batch[index, ignore_indices, -1]       = -1
            labels_batch[index, positive_indices, -1]     = 1

            regression_batch[index, ignore_indices, -1]   = -1
            regression_batch[index, positive_indices, -1] = 1

            # compute target class labels: give every positive anchor the class of its best-matching GT box
            labels_batch[index, positive_indices, annotations['labels'][argmax_overlaps_inds[positive_indices]].astype(int)] = 1

            # compute regression targets for all anchors
            regression_batch[index, :, :-1] = bbox_transform(anchors, annotations['bboxes'][argmax_overlaps_inds, :])

        # ignore anchors whose center lies outside the image
        if image.shape:
            anchors_centers = np.vstack([(anchors[:, 0] + anchors[:, 2]) / 2, (anchors[:, 1] + anchors[:, 3]) / 2]).T
            indices = np.logical_or(anchors_centers[:, 0] >= image.shape[1], anchors_centers[:, 1] >= image.shape[0])

            labels_batch[index, indices, -1]     = -1
            regression_batch[index, indices, -1] = -1

    return regression_batch, labels_batch
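
The function above relies on compute_gt_annotations, which is not shown in the post. As a rough idea of what such a helper has to do, here is a pure-Python sketch that assigns a state to each anchor from its best IoU. The names `iou` and `anchor_states` are hypothetical, and the boundary handling (>= positive_overlap is positive, > negative_overlap is ignored) is an assumption. It expects at least one ground-truth box, which matches the `shape[0]` guard in the function above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def anchor_states(anchors, gt_boxes, negative_overlap=0.4, positive_overlap=0.5):
    """Return (state, best_gt_index) per anchor: 1 positive, 0 background, -1 ignore."""
    result = []
    for anchor in anchors:
        overlaps = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda i: overlaps[i])
        if overlaps[best] >= positive_overlap:
            state = 1       # good match: foreground
        elif overlaps[best] > negative_overlap:
            state = -1      # in between the two thresholds: ignored
        else:
            state = 0       # poor match: background
        result.append((state, best))
    return result


gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 22), (20, 20, 30, 30)]
print([state for state, _ in anchor_states(anchors, gt_boxes)])  # [1, -1, 0]
```

The second anchor has an IoU of 100 / 220, about 0.45, so it falls between the two thresholds and is ignored during training.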

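Classification:

np.array() of shape (batch_size, anchors.shape[0], num_classes + 1)
num_classes + 1: stores the class information plus the state information. Only anchors whose state is positive carry class information; for all other anchors the class entries are all zeros.
The state is encoded as 1 (positive sample), 0 (negative sample) or -1 (ignored sample); the class information is a one-hot vector of 0s and 1s.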
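
As a small illustration of this layout, the sketch below builds a single row of the classification target. The helper `label_row` is hypothetical and only mirrors the description above; the real code fills whole arrays at once.

```python
def label_row(state, class_id, num_classes):
    """Build one target row: one-hot class entries followed by the anchor state."""
    row = [0.0] * (num_classes + 1)
    row[-1] = float(state)       # last column: 1 positive, 0 negative, -1 ignore
    if state == 1:
        row[class_id] = 1.0      # only positive anchors carry a class
    return row


print(label_row(1, 2, 3))    # [0.0, 0.0, 1.0, 1.0]
print(label_row(0, 2, 3))    # [0.0, 0.0, 0.0, 0.0]
print(label_row(-1, 2, 3))   # [0.0, 0.0, 0.0, -1.0]
```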

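Regression:

np.array() of shape (batch_size, anchors.shape[0], 4 + 1)
4 + 1: stores the four regression targets plus the state information. The state is again encoded as 1 (positive sample), 0 (negative sample) or -1 (ignored sample).
Every anchor gets regression targets. They are computed as follows: subtract the anchor coordinates from the coordinates of the matched ground-truth box, then divide by the anchor width and height.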

    anchor_widths  = anchors[:, 2] - anchors[:, 0]
    anchor_heights = anchors[:, 3] - anchors[:, 1]

    targets_dx1 = (gt_boxes[:, 0] - anchors[:, 0]) / anchor_widths
    targets_dy1 = (gt_boxes[:, 1] - anchors[:, 1]) / anchor_heights
    targets_dx2 = (gt_boxes[:, 2] - anchors[:, 2]) / anchor_widths
    targets_dy2 = (gt_boxes[:, 3] - anchors[:, 3]) / anchor_heights
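
A worked example with one anchor and one ground-truth box may help here. The function `regression_targets` is a hypothetical scalar version of the array code above, and the box coordinates are made up for illustration.

```python
def regression_targets(anchor, gt_box):
    """Corner offsets of gt_box relative to anchor, scaled by the anchor size."""
    x1, y1, x2, y2 = anchor
    width = x2 - x1
    height = y2 - y1
    return (
        (gt_box[0] - x1) / width,
        (gt_box[1] - y1) / height,
        (gt_box[2] - x2) / width,
        (gt_box[3] - y2) / height,
    )


anchor = (10, 10, 50, 30)   # width 40, height 20
gt_box = (14, 12, 54, 35)
print(regression_targets(anchor, gt_box))  # (0.1, 0.1, 0.1, 0.25)
```

Dividing by the anchor size makes the targets independent of scale: a 4-pixel shift on a 40-pixel-wide anchor gives the same target as an 8-pixel shift on an 80-pixel-wide one.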

Loss:

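import keras
from keras_retinanet import backend  # assumption: `backend` is the keras-retinanet backend module, which provides where() and gather_nd()


def focal(alpha=0.25, gamma=2.0):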
    """ Create a functor for computing the focal loss.

    Args
        alpha: Scale the focal weight with alpha.
        gamma: Take the power of the focal weight with gamma.

    Returns
        A functor that computes the focal loss using the alpha and gamma.
    """
    def _focal(y_true, y_pred):
        """ Compute the focal loss given the target tensor and the predicted tensor.

        As defined in https://arxiv.org/abs/1708.02002

        Args
            y_true: Tensor of target data from the generator with shape (B, N, num_classes + 1); the last column is the anchor state.
            y_pred: Tensor of predicted data from the network with shape (B, N, num_classes).

        Returns
            The focal loss of y_pred w.r.t. y_true.
        """
        labels         = y_true[:, :, :-1]
        anchor_state   = y_true[:, :, -1]  # -1 for ignore, 0 for background, 1 for object
        classification = y_pred

        # filter out "ignore" anchors
        indices        = backend.where(keras.backend.not_equal(anchor_state, -1))
        labels         = backend.gather_nd(labels, indices)
        classification = backend.gather_nd(classification, indices)

        # compute the focal loss
        alpha_factor = keras.backend.ones_like(labels) * alpha
        alpha_factor = backend.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor)
        focal_weight = backend.where(keras.backend.equal(labels, 1), 1 - classification, classification)
        focal_weight = alpha_factor * focal_weight ** gamma

        cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification)

        # compute the normalizer: the number of positive anchors
        normalizer = backend.where(keras.backend.equal(anchor_state, 1))
        normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx())
        normalizer = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer)

        return keras.backend.sum(cls_loss) / normalizer

    return _focal
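
To see what the focal weight does, here is a scalar sketch of the same formula for a single predicted probability. The function `focal_loss_single` is my own simplification: it skips the filtering of ignored anchors and the normalization by the number of positive anchors.

```python
import math


def focal_loss_single(p, label, alpha=0.25, gamma=2.0):
    """Focal loss for one predicted probability p and one binary label (0 or 1)."""
    alpha_factor = alpha if label == 1 else 1.0 - alpha
    focal_weight = (1.0 - p) if label == 1 else p   # large when the prediction is wrong
    p_t = p if label == 1 else 1.0 - p              # probability given to the true label
    cross_entropy = -math.log(p_t)
    return alpha_factor * focal_weight ** gamma * cross_entropy


easy_negative = focal_loss_single(0.1, 0)   # background anchor, confidently correct
hard_negative = focal_loss_single(0.9, 0)   # background anchor, confidently wrong
print(easy_negative, hard_negative)
```

The easy negative ends up with a loss more than a thousand times smaller than the hard one, which is how the huge number of background anchors is kept from dominating training.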


def smooth_l1(sigma=3.0):
    """ Create a smooth L1 loss functor.

    Args
        sigma: This argument defines the point where the loss changes from L2 to L1.

    Returns
        A functor for computing the smooth L1 loss given target data and predicted data.
    """
    sigma_squared = sigma ** 2

    def _smooth_l1(y_true, y_pred):
        """ Compute the smooth L1 loss of y_pred w.r.t. y_true.

        Args
            y_true: Tensor from the generator of shape (B, N, 5). The last value for each box is the state of the anchor (ignore, negative, positive).
            y_pred: Tensor from the network of shape (B, N, 4).

        Returns
            The smooth L1 loss of y_pred w.r.t. y_true.
        """
        # separate target and state
        regression        = y_pred
        regression_target = y_true[:, :, :-1]
        anchor_state      = y_true[:, :, -1]

        # keep only positive anchors (background and "ignore" anchors get no regression loss)
        indices           = backend.where(keras.backend.equal(anchor_state, 1))
        regression        = backend.gather_nd(regression, indices)
        regression_target = backend.gather_nd(regression_target, indices)

        # compute smooth L1 loss
        # f(x) = 0.5 * (sigma * x)^2          if |x| < 1 / sigma / sigma
        #        |x| - 0.5 / sigma / sigma    otherwise
        regression_diff = regression - regression_target
        regression_diff = keras.backend.abs(regression_diff)
        regression_loss = backend.where(
            keras.backend.less(regression_diff, 1.0 / sigma_squared),
            0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
            regression_diff - 0.5 / sigma_squared
        )

        # compute the normalizer: the number of positive anchors
        normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
        normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
        return keras.backend.sum(regression_loss) / normalizer

    return _smooth_l1
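
The piecewise definition is easier to check on single numbers. The sketch below, with the hypothetical name `smooth_l1_single`, applies the same two branches to one residual, without the masking and normalization done above.

```python
def smooth_l1_single(diff, sigma=3.0):
    """Smooth L1 loss of one residual: quadratic near zero, linear elsewhere."""
    sigma_squared = sigma ** 2
    x = abs(diff)
    if x < 1.0 / sigma_squared:
        return 0.5 * sigma_squared * x ** 2
    return x - 0.5 / sigma_squared


print(smooth_l1_single(0.05))  # quadratic branch, since 0.05 < 1/9
print(smooth_l1_single(1.0))   # linear branch
```

With sigma = 3 the switch happens at |x| = 1/9, and both branches give 1/18 there, so the loss is continuous at the switch point.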

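Input: an image
Output: the regression and classification predictions for a set of anchors.

Data processing

The input data is a txt file in which each line describes one image: the image path followed by its boxes, each written as five comma-separated values (presumably x1,y1,x2,y2,label, matching the annotation layout described above). For example: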

/media/tf/Elements/nj_Univercity/all_pics_sp/100_06726.jpg 568,69,586,88,1 
/media/tf/Elements/nj_Univercity/all_pics_sp/100_06727.jpg 582,67,604,90,1 
/media/tf/Elements/nj_Univercity/all_pics_sp/100_06728.jpg 584,68,607,90,1 
/media/tf/Elements/nj_Univercity/all_pics_sp/1_03550.jpg 568,94,584,110,1 585,93,601,110,1 603,93,619,110,1 620,94,636,111,1 
/media/tf/Elements/nj_Univercity/all_pics_sp/1_03551.jpg 570,94,587,111,1 587,94,603,111,1 604,94,621,111,1 622,94,639,111,1

The output of the data processing is the expected ground-truth classification and regression target for every anchor.
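
A line in this format could be parsed roughly as follows. This is only a sketch: `parse_annotation_line` is a hypothetical helper, it assumes the path contains no whitespace and that each box is x1,y1,x2,y2,label, and it returns plain lists rather than the NumPy arrays the target function works with.

```python
def parse_annotation_line(line):
    """Split one line into (image_path, bboxes, labels)."""
    parts = line.split()
    path, bboxes, labels = parts[0], [], []
    for token in parts[1:]:
        x1, y1, x2, y2, label = (int(value) for value in token.split(","))
        bboxes.append([x1, y1, x2, y2])
        labels.append(label)
    return path, bboxes, labels


sample = "/media/tf/Elements/nj_Univercity/all_pics_sp/100_06726.jpg 568,69,586,88,1 "
path, bboxes, labels = parse_annotation_line(sample)
print(path)     # /media/tf/Elements/nj_Univercity/all_pics_sp/100_06726.jpg
print(bboxes)   # [[568, 69, 586, 88]]
print(labels)   # [1]
```

The two lists map onto the 'bboxes' and 'labels' entries that anchor_targets_bbox checks for in each annotation.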
