Preset anchors
The RPN is defined in the build function of the MaskRCNN() class in mrcnn/model.py.
# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)  # shape is [261888, 4]: anchors for every scale, already normalized to the image
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors
This post covers the RPN for both training and inference. As the snippet shows, the anchors obtained in the training branch and those used otherwise (input_anchors, fed in at inference time) are effectively the same: both ultimately come from the get_anchors function.
Let's take a look at get_anchors.
get_anchors
def get_anchors(self, image_shape):
    """Returns anchor pyramid for the given image size."""
    backbone_shapes = compute_backbone_shapes(self.config, image_shape)
    # Cache anchors and reuse if image shape is the same
    if not hasattr(self, "_anchor_cache"):
        self._anchor_cache = {}
    if not tuple(image_shape) in self._anchor_cache:
        # Generate Anchors
        a = utils.generate_pyramid_anchors(
            self.config.RPN_ANCHOR_SCALES,   # [32, 64, 128, 256, 512]
            self.config.RPN_ANCHOR_RATIOS,   # [0.5, 1, 2]
            backbone_shapes,                 # [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
            self.config.BACKBONE_STRIDES,    # [4, 8, 16, 32, 64]
            self.config.RPN_ANCHOR_STRIDE)   # 1
        # Keep a copy of the latest anchors in pixel coordinates because
        # it's used in inspect_model notebooks.
        # TODO: Remove this after the notebook are refactored to not use it
        self.anchors = a  # shape is [261888, 4]
        # Normalize coordinates
        self._anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
    return self._anchor_cache[tuple(image_shape)]
Here backbone_shapes is computed from the image shape and the config. From the previous post we know the RPN uses the five feature maps P2, P3, P4, P5 and P6, which line up exactly with backbone_shapes; the concrete values are noted in the code comments above.
Next, generate_pyramid_anchors is called; I won't re-derive it here, see the earlier post.
Finally the anchors are normalized once more; see norm_boxes:
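As a quick sanity check, those shapes can be reproduced in a few lines. This is only a minimal sketch of what compute_backbone_shapes does for a ResNet backbone, assuming the default 1024x1024 input:

import math
import numpy as np

# Sketch of compute_backbone_shapes for a ResNet backbone: one
# [height, width] pair per FPN level P2..P6.
IMAGE_SHAPE = [1024, 1024, 3]
BACKBONE_STRIDES = [4, 8, 16, 32, 64]   # strides of P2..P6
backbone_shapes = np.array(
    [[int(math.ceil(IMAGE_SHAPE[0] / stride)),
      int(math.ceil(IMAGE_SHAPE[1] / stride))]
     for stride in BACKBONE_STRIDES])
print(backbone_shapes)   # [[256 256] [128 128] [64 64] [32 32] [16 16]]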
def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates.
    (Anything labelled "normalized" means the coordinates have already been normalized.)
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
    shape: [..., (height, width)] in pixels
    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.
    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)
Note that this normalization does not force values into 0-1; it only rescales by the image height and width. So a normalized coordinate can come out as, say, -0.04 or 1.15, slightly outside [0, 1], since the generated preset anchor boxes can extend beyond the image.
So the preset anchors finally returned lie roughly in the range -0.2 to 1.25 and have shape (261888, 4).
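A small numeric sketch with a made-up anchor makes the "slightly outside 0-1" point concrete:

import numpy as np

def norm_boxes(boxes, shape):
    # Same formula as utils.norm_boxes: shift (y2, x2) by one pixel, then
    # divide everything by (h - 1, w - 1).
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)

# A hypothetical anchor that sticks out past the border of a 1024x1024 image.
anchor = np.array([[-45.0, -45.0, 1069.0, 1069.0]])
print(norm_boxes(anchor, (1024, 1024)))
# ~[[-0.044 -0.044  1.044  1.044]] -- slightly outside [0, 1], as expected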
The RPN model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1
                      len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)  # 3, 256
build_rpn_model
Following this call in build, let's step into build_rpn_model.
def build_rpn_model(anchor_stride, anchors_per_location, depth):  # 1, 3, 256
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    # inputs: (?, ?, ?, 256), 3, 1  ->  outputs: [(?, ?, 2), (?, ?, 2), (?, ?, 4)]
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")
Now step into rpn_graph for a closer look.
def rpn_graph(feature_map, anchors_per_location, anchor_stride):  # (?, ?, ?, 256), 3, 1
    """Builds the computation graph of Region Proposal Network.
    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
    """
    # TODO: check if stride of 2 causes alignment issues if the feature map
    # is not even.
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)  # [batch, H * W * anchors_per_location, 2]

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)  # [batch, H * W * anchors_per_location, 4]

    return [rpn_class_logits, rpn_probs, rpn_bbox]
First the input arguments:
feature_map: one of the feature maps in rpn_feature_maps = [P2, P3, P4, P5, P6] from the previous post; they are fed into rpn_graph one at a time
anchors_per_location: 3
anchor_stride: 1
Taking P2 as an example, feature_map has shape (1, 256, 256, 256).
rpn_graph returns rpn_class_logits, rpn_probs and rpn_bbox. Filling in the shape for P2 and repeating the same reasoning for P3, P4, P5 and P6 gives the following table:
|    | rpn_class_logits  | rpn_probs         | rpn_bbox          |
| -- | ----------------- | ----------------- | ----------------- |
| P2 | (1, 256*256*3, 2) | (1, 256*256*3, 2) | (1, 256*256*3, 4) |
| P3 | (1, 128*128*3, 2) | (1, 128*128*3, 2) | (1, 128*128*3, 4) |
| P4 | (1, 64*64*3, 2)   | (1, 64*64*3, 2)   | (1, 64*64*3, 4)   |
| P5 | (1, 32*32*3, 2)   | (1, 32*32*3, 2)   | (1, 32*32*3, 4)   |
| P6 | (1, 16*16*3, 2)   | (1, 16*16*3, 2)   | (1, 16*16*3, 4)   |
Back in build_rpn_model, a new Keras model is built whose input is a feature map and whose output is the corresponding [rpn_class_logits, rpn_probs, rpn_bbox].
Jumping back up one level, we land at the following code in build:
# Feed each feature map through the RPN to get its outputs.
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
# shape is [3, ?]: first the logits, then the probs, then the 4 box deltas
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs
Each feature map is simply fed through the same RPN one by one, and the per-level outputs are concatenated, so that finally:
rpn_class_logits: shape (1, 261888, 2);
rpn_class: shape (1, 261888, 2), i.e. the logits after softmax;
rpn_bbox: shape (1, 261888, 4).
Here 261888 = 256*256*3 + 128*128*3 + 64*64*3 + 32*32*3 + 16*16*3.
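That total is easy to verify with a couple of lines (per-level cell counts times 3 ratios, anchor stride 1):

levels = [256, 128, 64, 32, 16]          # P2..P6 feature-map sizes
counts = [s * s * 3 for s in levels]     # anchors per level
print(counts)       # [196608, 49152, 12288, 3072, 768]
print(sum(counts))  # 261888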
ProposalLayer
The RPN has now predicted scores and refinements for all 261888 anchor boxes, but most of them cannot be used directly; a selection step is needed.
# Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.
proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
    else config.POST_NMS_ROIS_INFERENCE
# Output shape is [batch, N=2000, (y1, x1, y2, x2)]: after NMS, padded with
# zeros up to 2000 if fewer boxes survive, truncated if more. Boxes never
# extend outside the image because they are clipped.
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
Here proposal_count is set to 2000.
Now let's step into the ProposalLayer layer.
class ProposalLayer(KE.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
The initialization arguments are:
proposal_count: 2000
nms_threshold: 0.7
config: config
name: "ROI"
The layer is then called directly:
def call(self, inputs): #[rpn_class, rpn_bbox, anchors]
The input argument is
inputs: [rpn_class, rpn_bbox, anchors]
Rather than explaining it line by line, from here on I'll go through the code block by block.
def call(self, inputs):  # [rpn_class, rpn_bbox, anchors]
    # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
    scores = inputs[0][:, :, 1]  # foreground scores after softmax, shape (1, 261888)
    # Box deltas [batch, num_rois, 4]
    deltas = inputs[1]  # the deltas used to refine the box coordinates
    deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
    # Anchors
    anchors = inputs[2]  # the preset anchor boxes
The above is just variable assignment. The multiplication by self.config.RPN_BBOX_STD_DEV undoes the division applied when the RPN targets were built (see the build_rpn_targets section of https://blog.csdn.net/u013066730/article/details/102504951#build_rpn_targets); it cancels out that scaling.
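A tiny sketch of the round trip, using the default RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]: the targets the network learns are divided by these values, so its predictions must be multiplied by them before use (the delta values here are made up):

import numpy as np

RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

true_delta = np.array([0.05, -0.02, 0.30, 0.10])  # hypothetical (dy, dx, log(dh), log(dw))
target = true_delta / RPN_BBOX_STD_DEV            # what the RPN is trained to regress
recovered = target * RPN_BBOX_STD_DEV             # what ProposalLayer does before applying deltas
print(np.allclose(recovered, true_delta))         # True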
# Improve performance by trimming to top anchors by score
# and doing the rest on the smaller subset.
# Keep whichever is smaller: the number of input anchors or
# self.config.PRE_NMS_LIMIT = 6000.
pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                 name="top_anchors").indices
scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
Since 261888 anchor boxes exceed 6000, only the indices of the top 6000 scores are kept (the higher the probability, the more likely the anchor is foreground). batch_slice is explained below; as for the three results:
scores: the top 6000 foreground scores selected by those indices; the higher the score, the more likely the box is foreground;
deltas: the top 6000 sets of offsets selected by those indices, produced by build_rpn_model above, with shape (1, 6000, 4);
pre_nms_anchors: the preset anchor boxes selected by those indices, i.e. the anchors most likely to contain foreground.
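The selection itself is just top-k plus gather; a toy sketch with made-up scores and batch size 1:

import tensorflow as tf

scores = tf.constant([[0.1, 0.9, 0.3, 0.7, 0.2]])              # [batch=1, num_anchors=5]
pre_nms_limit = tf.minimum(3, tf.shape(scores)[1])             # keep the 3 best
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True).indices   # [[1, 3, 2]]
top_scores = tf.gather(scores[0], ix[0])                       # [0.9, 0.7, 0.3]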
batch_slice
Here is a quick look at batch_slice, since it is used several times later. Take [scores, ix] as an example: scores has shape (1, 261888), ix has shape (1, 6000), lambda x, y: tf.gather(x, y) is the function passed in, and batch_size is the number of images handled per GPU.
def batch_slice(inputs, graph_fn, batch_size, names=None):
    if not isinstance(inputs, list):
        inputs = [inputs]

    outputs = []
    for i in range(batch_size):
        # Take the i-th slice of every input; in this example that is the
        # i-th image's scores and its top-6000 score indices.
        inputs_slice = [x[i] for x in inputs]
        # Gather the values at those indices from this sample.
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)
    # Change outputs from a list of slices where each is
    # a list of outputs to a list of outputs and each has
    # a list of slices
    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result
Unpacking *inputs_slice splits scores[i] and ix[i] out per image: graph_fn is applied one sample at a time, the result is stored in output_slice, and the per-sample results are then stacked back along the batch dimension to form the final result.
With batch_slice out of the way, let's keep going.
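A toy usage of batch_slice (assuming the mrcnn package is importable; the numbers are made up):

import tensorflow as tf
from mrcnn import utils

# Two images in the batch, 5 anchor scores each, and the top-2 indices per image.
scores = tf.constant([[0.1, 0.9, 0.3, 0.7, 0.2],
                      [0.5, 0.4, 0.8, 0.6, 0.1]])  # [batch=2, 5]
ix = tf.constant([[1, 3],
                  [2, 3]])                          # [batch=2, 2]

# For each image i, run tf.gather(scores[i], ix[i]) and stack the per-image
# results back along the batch dimension -> shape [2, 2].
top_scores = utils.batch_slice([scores, ix],
                               lambda x, y: tf.gather(x, y),
                               batch_size=2)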
# Apply deltas to anchors to get refined anchors.
# [batch, N, (y1, x1, y2, x2)]
boxes = utils.batch_slice([pre_nms_anchors, deltas],
                          lambda x, y: apply_box_deltas_graph(x, y),
                          self.config.IMAGES_PER_GPU,
                          names=["refined_anchors"])
This code combines the deltas predicted by build_rpn_model with the preset anchor boxes, producing refined anchor boxes; we will call these refined boxes the predicted boxes.
It relies on apply_box_deltas_graph(), so let's look at that function.
apply_box_deltas_graph
def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.
    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
    """
    # Convert to y, x, h, w
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply deltas
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to y1, x1, y2, x2
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result
This is just the standard box-refinement formula:

y = Ay + Ah * dy
x = Ax + Aw * dx
h = Ah * exp(dh)
w = Aw * exp(dw)

where Ax, Ay, Aw, Ah are the center coordinates and width/height of the preset anchor box, and dx, dy, dw, dh are the predicted offsets. Refining the preset anchor boxes with these offsets gives the final predicted boxes that the RPN needs.
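A NumPy sketch of the same refinement on a single, made-up anchor, just to see the formulas in action:

import numpy as np

# Hypothetical normalized anchor (y1, x1, y2, x2) and predicted deltas.
anchor = np.array([0.20, 0.30, 0.60, 0.70])      # height 0.4, width 0.4
dy, dx, log_dh, log_dw = 0.10, -0.05, np.log(1.5), np.log(0.8)

h, w = anchor[2] - anchor[0], anchor[3] - anchor[1]
cy, cx = anchor[0] + 0.5 * h, anchor[1] + 0.5 * w

cy += dy * h                  # shift the center by a fraction of the height
cx += dx * w                  # and of the width
h *= np.exp(log_dh)           # scale height/width multiplicatively
w *= np.exp(log_dw)

refined = np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])
print(refined)                # [0.14 0.32 0.74 0.64]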
# Clip to image boundaries. Since we're in normalized coordinates,
# clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
window = np.array([0, 0, 1, 1], dtype=np.float32)
boxes = utils.batch_slice(boxes,
                          lambda x: clip_boxes_graph(x, window),
                          self.config.IMAGES_PER_GPU,
                          names=["refined_anchors_clipped"])
This handles predicted boxes that fall outside the 0-1 range, which is not acceptable: a detection box must lie inside the image. So values below 0 are forced to 0 and values above 1 are forced to 1, ensuring every predicted box stays inside the image.
clip_boxes_graph
def clip_boxes_graph(boxes, window):
    """
    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
    """
    # Split
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    # Clip
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped
In short, any box that strays outside the 0-1 range is forced back into the 0-1 interval.
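The same clipping in NumPy, for intuition (the boxes are made up):

import numpy as np

boxes = np.array([[-0.04, 0.10, 0.50, 1.15],   # sticks out on top and right
                  [ 0.20, 0.30, 0.60, 0.70]])  # already inside the window
window = np.array([0.0, 0.0, 1.0, 1.0])        # (y1, x1, y2, x2)

clipped = np.concatenate([
    np.clip(boxes[:, :2], window[:2], window[2:]),   # clip (y1, x1)
    np.clip(boxes[:, 2:], window[:2], window[2:]),   # clip (y2, x2)
], axis=1)
print(clipped)
# [[0.   0.1  0.5  1. ]
#  [0.2  0.3  0.6  0.7]]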
NMS
Everything needed is now ready, so the NMS step begins.
# Non-max suppression
def nms(boxes, scores):
    indices = tf.image.non_max_suppression(
        boxes, scores, self.proposal_count,
        self.nms_threshold, name="rpn_non_max_suppression")
    proposals = tf.gather(boxes, indices)
    # Pad if needed
    padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
    # Paddings are (top, bottom), (left, right): zero rows are appended at the bottom.
    proposals = tf.pad(proposals, [(0, padding), (0, 0)])
    return proposals
proposals = utils.batch_slice([boxes, scores], nms,
                              self.config.IMAGES_PER_GPU)  # final shape is [batch_size, N=2000, 4]
boxes: the predicted boxes, with shape (1, 6000, 4);
scores: the top 6000 foreground probabilities selected by score, one-to-one with the predicted boxes, with shape (1, 6000);
self.proposal_count: 2000, only 2000 foreground boxes are kept;
self.nms_threshold: 0.7.
tf.image.non_max_suppression() returns the indices of the boxes to keep, and gathering the predicted boxes at those indices gives proposals. These proposals might have a shape like (?, 1500, 4) if only 1500 boxes survive, but the output is required to have shape (?, 2000, 4), so it is padded with zeros.
The proposals returned here are exactly rpn_rois.
That essentially wraps up the RPN; the RPN losses will be covered later.
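A toy sketch of the NMS-then-pad step with made-up boxes (here proposal_count is 4 instead of 2000):

import tensorflow as tf

boxes = tf.constant([[0.10, 0.10, 0.50, 0.50],
                     [0.12, 0.11, 0.52, 0.51],   # heavy overlap with the first box
                     [0.60, 0.60, 0.90, 0.90]])
scores = tf.constant([0.9, 0.8, 0.7])
proposal_count = 4                                # pretend POST_NMS_ROIS is 4

indices = tf.image.non_max_suppression(boxes, scores, proposal_count,
                                        iou_threshold=0.7)
proposals = tf.gather(boxes, indices)             # the overlapping box is suppressed
# Pad with all-zero rows at the bottom so the output always has 4 rows.
padding = tf.maximum(proposal_count - tf.shape(proposals)[0], 0)
proposals = tf.pad(proposals, [(0, padding), (0, 0)])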
A quick summary:
A convolutional network first outputs foreground/background confidence scores and offsets for the preset anchor boxes; the top 6000 anchors are then selected by foreground confidence; those anchors are refined with the predicted offsets to obtain the predicted boxes; and finally NMS on the predicted boxes yields the final proposals.
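To tie it all together, here is a small single-image NumPy sketch of the whole ProposalLayer flow described above (toy sizes, random data, and a naive greedy NMS; it only mirrors the logic, not the actual TensorFlow implementation):

import numpy as np

def toy_proposal_layer(scores, deltas, anchors, pre_nms_limit=6000,
                       nms_threshold=0.7, proposal_count=2000,
                       bbox_std_dev=(0.1, 0.1, 0.2, 0.2)):
    """Single-image sketch of the ProposalLayer steps described above."""
    deltas = deltas * np.array(bbox_std_dev)
    # 1. Keep the highest-scoring anchors (top-k).
    ix = np.argsort(-scores)[:pre_nms_limit]
    scores, deltas, anchors = scores[ix], deltas[ix], anchors[ix]
    # 2. Apply the deltas (same math as apply_box_deltas_graph).
    h = anchors[:, 2] - anchors[:, 0]
    w = anchors[:, 3] - anchors[:, 1]
    cy = anchors[:, 0] + 0.5 * h + deltas[:, 0] * h
    cx = anchors[:, 1] + 0.5 * w + deltas[:, 1] * w
    h, w = h * np.exp(deltas[:, 2]), w * np.exp(deltas[:, 3])
    boxes = np.stack([cy - 0.5 * h, cx - 0.5 * w,
                      cy + 0.5 * h, cx + 0.5 * w], axis=1)
    # 3. Clip to the 0-1 window.
    boxes = np.clip(boxes, 0.0, 1.0)
    # 4. Greedy NMS (boxes are already sorted by score), then pad with zeros.
    keep, order = [], list(range(len(boxes)))
    while order and len(keep) < proposal_count:
        i = order.pop(0)
        keep.append(i)
        remaining = []
        for j in order:
            yy1, xx1 = max(boxes[i, 0], boxes[j, 0]), max(boxes[i, 1], boxes[j, 1])
            yy2, xx2 = min(boxes[i, 2], boxes[j, 2]), min(boxes[i, 3], boxes[j, 3])
            inter = max(0.0, yy2 - yy1) * max(0.0, xx2 - xx1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_j = (boxes[j, 2] - boxes[j, 0]) * (boxes[j, 3] - boxes[j, 1])
            if inter / (area_i + area_j - inter + 1e-9) <= nms_threshold:
                remaining.append(j)
        order = remaining
    proposals = boxes[keep]
    pad = proposal_count - len(proposals)
    return np.vstack([proposals, np.zeros((pad, 4))])

# Toy run: 500 random anchors, keep 300 before NMS, return exactly 100 ROIs.
rng = np.random.default_rng(0)
tl = rng.uniform(0.0, 0.8, (500, 2))
anchors = np.concatenate([tl, tl + rng.uniform(0.05, 0.2, (500, 2))], axis=1)
rois = toy_proposal_layer(rng.uniform(0, 1, 500), rng.normal(0, 1, (500, 4)),
                          anchors, pre_nms_limit=300, proposal_count=100)
print(rois.shape)  # (100, 4)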