MaskRcnn (Part 5): Source Code Walkthrough of the RPN Layer

Latest recommended article published 2022-07-16 10:11:33

血狼傲骨

Article link: https://blog.csdn.net/sjjsbsbbs/article/details/120432975

RPN Layer

1. Introduction
2. How It Works
3. Code Walkthrough

1. Introduction

Before the RPN runs, anchor boxes must be generated over the feature maps at every scale. The code, with comments, is as follows:

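import numpy as np


def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    # scales: anchor side length in pixels for this level, e.g. 64 (a 64x64 box when the ratio is 1).
    # ratios: width/height aspect ratios, e.g. [0.5, 1, 2]; these can be tuned.
    # shape: [height, width] of the input feature map.
    # feature_stride: stride of this feature map relative to the input image (e.g. 4 for P2),
    #   used to map feature-map positions back to image pixel coordinates.
    # anchor_stride: step between anchor positions on the feature map; the default of 1
    #   places anchors at every feature-map pixel.
    # These are only candidate boxes; they are refined later by the regression branch.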
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()
    ratios = ratios.flatten()
    # Compute heights and widths: one scale and three aspect ratios give three of each.
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    # Pair every anchor position with every (height, width), so each position
    # gets three boxes, one per aspect ratio.
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

    # Flatten into an [N, 2] array of (y, x) centers and a matching [N, 2] array of (h, w) sizes.
    box_centers = np.stack(
        [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

    # Offset each center by half the size to get corner coordinates (y1, x1, y2, x2).
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    return boxes


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
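    # Returns anchors in image pixel coordinates. Feature maps are smaller than the
    # input image, so every anchor is mapped back to the original image's coordinate frame.
    anchors = []
    # Generate anchors level by level: each feature map uses its own scale and stride,
    # while all levels share the same ratios. Concatenating the results gives every
    # candidate box across the pyramid in a single array.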
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
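To make the output size concrete, the check below runs the two functions above (condensed, same logic) with a configuration assumed for illustration: a 1024×1024 input, backbone strides 4, 8, 16, 32, 64, one anchor scale per level (32 to 512), ratios 0.5, 1, 2 and anchor_stride 1. These values resemble the defaults of the matterport Mask_RCNN config, but treat them as assumptions here.

```python
import numpy as np


def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    # Condensed copy of the function above (same logic)
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales, ratios = scales.flatten(), ratios.flatten()
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    return np.concatenate([box_centers - 0.5 * box_sizes,
                           box_centers + 0.5 * box_sizes], axis=1)


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    # One call per pyramid level, then stack all levels together
    return np.concatenate(
        [generate_anchors(scales[i], ratios, feature_shapes[i],
                          feature_strides[i], anchor_stride)
         for i in range(len(scales))], axis=0)


# Assumed configuration (illustrative values)
IMAGE_SIZE = 1024
STRIDES = [4, 8, 16, 32, 64]
SCALES = [32, 64, 128, 256, 512]
RATIOS = [0.5, 1, 2]
feature_shapes = [(IMAGE_SIZE // s, IMAGE_SIZE // s) for s in STRIDES]

anchors = generate_pyramid_anchors(SCALES, RATIOS, feature_shapes, STRIDES, 1)
print(anchors.shape)  # (261888, 4)
```

Each level contributes (H / stride) × (W / stride) × 3 anchors, so about 75% of the 261,888 anchors come from the finest level (stride 4). The first three rows share the same position and differ only in aspect ratio.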

2. How It Works

The RPN (Region Proposal Network) layer scans the image and generates proposals, that is, regions that may contain an object.
The network structure from the original paper is shown below:

[Figure: RPN network structure from the original paper]

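The RPN has two branches. The upper branch uses softmax to classify each anchor as foreground or background, where the objects to be detected are foreground. The lower branch computes bounding box regression offsets for the anchors so that accurate proposals can be obtained. The final Proposal layer combines the foreground anchors with the regression offsets to produce proposals, and discards proposals that are too small or that extend beyond the image boundary. At this point the network has effectively performed object localization.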
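The Proposal step can be illustrated with a simplified NumPy sketch. This is not the Mask R-CNN ProposalLayer itself, and several details are assumptions. Deltas are taken as (dy, dx, log(dh), log(dw)) relative to the anchor size. Out-of-bounds boxes are clipped to the image instead of being removed, which is one common choice. The `min_size` filter and the greedy NMS threshold are illustrative. The function names `apply_box_deltas`, `iou` and `propose` are chosen for this sketch only.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Decode (dy, dx, log(dh), log(dw)) deltas onto (y1, x1, y2, x2) boxes."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height + deltas[:, 0] * height
    center_x = boxes[:, 1] + 0.5 * width + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    return np.stack([center_y - 0.5 * height, center_x - 0.5 * width,
                     center_y + 0.5 * height, center_x + 0.5 * width], axis=1)


def iou(box, boxes):
    """IoU between one box and an [N, 4] array of boxes."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def propose(anchors, fg_scores, deltas, image_shape,
            min_size=1.0, iou_thresh=0.7, max_out=100):
    # 1. Refine anchors with the regression offsets
    boxes = apply_box_deltas(anchors, deltas)
    # 2. Clip boxes to the image (assumption: clip rather than drop)
    h, w = image_shape
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, h)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, w)
    # 3. Remove boxes that are too small
    keep = ((boxes[:, 2] - boxes[:, 0]) >= min_size) & \
           ((boxes[:, 3] - boxes[:, 1]) >= min_size)
    boxes, scores = boxes[keep], fg_scores[keep]
    # 4. Greedy NMS in descending order of foreground score
    order = np.argsort(-scores)
    selected = []
    while order.size > 0 and len(selected) < max_out:
        i = order[0]
        selected.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return boxes[selected], scores[selected]


# Toy example: the second anchor overlaps the first heavily and is suppressed
anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 11], [20, 20, 30, 30]], dtype=float)
fg_scores = np.array([0.9, 0.8, 0.7])
deltas = np.zeros((3, 4))
boxes, kept = propose(anchors, fg_scores, deltas, image_shape=(32, 32))
print(boxes, kept)
```

With zero deltas the boxes are just the anchors. The first two overlap with an IoU of about 0.91, so only the higher-scoring one survives NMS.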

3. Code Walkthrough

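import tensorflow as tf
from tensorflow.keras import layers as KL  # assumes TF2's bundled Keras


def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    # Shared 3x3 convolution applied to the feature map before both branches.
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)
    # Classification logits: 2 per anchor (background, foreground), i.e. 6 channels
    # when anchors_per_location is 3. Reshaped to [batch, num_anchors, 2].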
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
    # Softmax over (background, foreground) to get probabilities.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)
    # Box regression: 4 deltas (dy, dx, log(dh), log(dw)) per anchor, reshaped to [batch, num_anchors, 4].
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)
    # Return three tensors: class logits, class probabilities and box deltas.
    return [rpn_class_logits, rpn_probs, rpn_bbox]
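Running `rpn_graph` requires TensorFlow, so the NumPy sketch below only imitates what happens to the tensor shapes. Random numbers stand in for the outputs of the two 1×1 convolutions, using hypothetical sizes: a batch of 1, a 2×3 feature map and 3 anchors per location. The sketch applies the same reshape as the `Lambda` layers, then a softmax. It also checks which feature-map position and anchor each output row belongs to.

```python
import numpy as np

# Hypothetical sizes: batch 1, 2x3 feature map, 3 anchors per location
batch, H, W, A = 1, 2, 3, 3
rng = np.random.default_rng(0)
raw_class = rng.normal(size=(batch, H, W, 2 * A))  # stands in for rpn_class_raw
raw_bbox = rng.normal(size=(batch, H, W, 4 * A))   # stands in for rpn_bbox_pred

# Same reshape as the Lambda layers: one row per anchor
rpn_class_logits = raw_class.reshape(batch, -1, 2)
rpn_bbox = raw_bbox.reshape(batch, -1, 4)

# Numerically stable softmax over the last axis (background, foreground)
shifted = rpn_class_logits - rpn_class_logits.max(axis=-1, keepdims=True)
exp = np.exp(shifted)
rpn_probs = exp / exp.sum(axis=-1, keepdims=True)

print(rpn_class_logits.shape, rpn_probs.shape, rpn_bbox.shape)
# (1, 18, 2) (1, 18, 2) (1, 18, 4)

# Row k comes from position k // A (row-major over H, W) and anchor k % A
k = 7
y, x, a = (k // A) // W, (k // A) % W, k % A
assert np.allclose(rpn_class_logits[0, k], raw_class[0, y, x, 2 * a:2 * a + 2])
```

The reshape walks the feature map row by row and then through the anchors of each position. This is the same order in which `generate_anchors` emits anchors for a level, so row k of the scores and deltas lines up with anchor k by index.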